STrenD: Subspace Trend Discovery

From FarsightWiki
Revision as of 17:42, 12 December 2011 by Yan Xu

Clustering

Clustering reduces the dimensionality of the data. This speeds up the analysis and produces a cleaner progression tree. Clustering can be performed on both sides of the data: samples and features.

Sample clustering uses only one parameter, "coherence". Coherence is measured as the average Pearson correlation coefficient within each module, and the parameter takes values from 0 to 1. The larger the coherence, the more tightly correlated the module.

Feature clustering uses a second parameter, "merge coherence", in addition to coherence. Merge coherence is a threshold on the Pearson correlation coefficient between two clustered modules: if the correlation of two modules exceeds it, the two modules are merged. Its range is also 0 to 1. Click Sample cluster for sample clustering and Feature cluster for feature clustering.
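The sketch below gives a rough illustration of the two clustering parameters. It computes a module's coherence as the mean pairwise Pearson correlation between its features, and it checks the merge rule for two modules. It assumes NumPy and a samples-by-features matrix for each module. The page does not say how the correlation between two modules is computed, so comparing their per-sample mean profiles is an assumption. The function names are hypothetical and are not STrenD's API. For sample clustering, the same coherence function can be applied to the transposed matrix.

```python
import numpy as np

def module_coherence(data: np.ndarray) -> float:
    """Average pairwise Pearson correlation between the columns (features) of one module.

    data: samples x features matrix with at least two features.
    """
    corr = np.corrcoef(data, rowvar=False)   # features x features correlation matrix
    n = corr.shape[0]
    upper = corr[np.triu_indices(n, k=1)]    # each distinct feature pair once
    return float(upper.mean())

def should_merge(module_a: np.ndarray, module_b: np.ndarray,
                 merge_coherence: float = 0.9) -> bool:
    """Merge rule: merge when the module-to-module correlation exceeds merge_coherence.

    Assumption: a module is summarised by its per-sample mean across its features.
    """
    profile_a = module_a.mean(axis=1)
    profile_b = module_b.mean(axis=1)
    r = np.corrcoef(profile_a, profile_b)[0, 1]
    return bool(r > merge_coherence)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=100)
    # Four noisy copies of the same signal form a highly coherent module.
    module = np.column_stack([base + 0.1 * rng.normal(size=100) for _ in range(4)])
    print(module_coherence(module))  # near 1 for tightly correlated features
```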

Recommended param setting:

Sample clustering: coherence = 0.95. Sample clustering is not necessary for small sample sizes.

Feature clustering: coherence = 0.9 or higher for a small number of features, 0.7 for a large number of features; merge coherence = 0.9.

Minimum Spanning Tree

An MST is built for each clustered module to show how the samples relate to each other within that module. Each MST represents the progression of its module. For this step, click the MST button when it becomes available. MSTs are then generated automatically for all clustered modules, and progress is reported in the command window.
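The following is a minimal sketch of building one module's MST. It uses Prim's algorithm on the complete graph of samples. The page names neither the distance metric nor the MST algorithm, so Euclidean distance over the module's features and Prim's algorithm are assumptions. `build_mst` is a hypothetical name.

```python
import numpy as np

def build_mst(data: np.ndarray) -> list[tuple[int, int]]:
    """Prim's algorithm on the complete graph of samples with Euclidean edge weights.

    data: samples x features matrix restricted to one module's features.
    Returns n - 1 edges as (parent, child) in the order they join the tree.
    """
    n = data.shape[0]
    dist = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                  # start the tree from sample 0
    best = dist[0].copy()              # cheapest known link from each sample to the tree
    parent = np.zeros(n, dtype=int)    # tree vertex that provides that cheapest link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        v = int(np.argmin(candidates))
        edges.append((int(parent[v]), v))
        in_tree[v] = True
        closer = dist[v] < best        # samples now reachable more cheaply via v
        best = np.where(closer, dist[v], best)
        parent = np.where(closer, v, parent)
    return edges
```

The full pairwise distance matrix keeps the code short. It is quadratic in the number of samples, which is one reason clustering the samples first helps with large data sets.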

Earth Mover's Distance

Based on the MSTs, the Earth Mover's Distance (EMD) method is applied to each pair of MST and module to measure how well each MST fits each module. A large value indicates that the MST represents that module's progression reasonably well. For this step, click the EMD button when it becomes available. It matches all MSTs against all modules, and progress is reported in the command window.
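The page does not specify which distributions STrenD compares when matching an MST to a module. The sketch below therefore shows only how EMD itself compares two 1-D histograms defined on the same equally spaced bins. In one dimension, the optimal transport cost equals the sum of absolute differences between the two cumulative distributions. Raw EMD is a distance: it is 0 for identical distributions and grows as they differ. The matching score described above is therefore presumably derived from it rather than being the raw distance. `emd_1d` is a hypothetical helper.

```python
import numpy as np

def emd_1d(hist_a, hist_b) -> float:
    """Earth Mover's Distance between two 1-D histograms over the same bins (bin width 1).

    Both histograms are normalised to unit mass before comparison.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    # In 1-D, the minimal work equals the L1 distance between the cumulative sums.
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).sum())

if __name__ == "__main__":
    # Moving all mass from the first bin to the third costs 2 bin-widths.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```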

Select similar modules

Once the matching between MSTs and modules has finished, a progression similarity matrix (PSM) of the modules is built. The PSM threshold determines whether two modules are considered similar. The similarity is one when a module is matched with its own MST, and it approaches zero when an MST does not fit the module at all. "Selecting percentage" is the proportion of entries in the similarity matrix that are higher than the threshold.

How to set this parameter depends on which measure you trust more. If you trust the similarity values, focus on the threshold itself. If you trust the percentage, adjust the threshold until the percentage falls within the range you want.

Click Show PSM and a heatmap pops up, from which you can select similar modules. The heatmap is colored from red to blue: dark red represents high values and dark blue low values. Because the matrix is symmetric, it is recommended to select along the diagonal. Press the left mouse button on the upper-left starting block and release it on the lower-right ending block. All blocks in this square are chosen, and their corresponding module IDs are filled in to "Input hand-picked modules".
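Two pieces of that description can be sketched as plain functions: the selecting percentage for a given threshold, and the module IDs covered by a square dragged along the diagonal. Both function names are hypothetical. The heatmap rows may not appear in module ID order, so the sketch assumes the caller passes the row order shown on screen.

```python
import numpy as np

def selecting_percentage(psm: np.ndarray, threshold: float) -> float:
    """Proportion of similarity-matrix entries strictly higher than the threshold."""
    return float((psm > threshold).mean())

def block_module_ids(row_order, start: int, end: int) -> str:
    """Comma-separated module IDs for a square block selected along the diagonal.

    row_order: module ID shown on each heatmap row (assumed, not from the page).
    start, end: heatmap row positions where the drag begins and ends; because the
    block lies on the diagonal, its rows and columns cover the same modules.
    """
    lo, hi = sorted((start, end))
    return ",".join(str(m) for m in row_order[lo:hi + 1])

if __name__ == "__main__":
    psm = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
    print(selecting_percentage(psm, 0.5))
    print(block_module_ids([0, 1, 2], 0, 1))
```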

Recommended param setting for the PSM threshold:

Set the threshold so that the selecting percentage is between 0.2 and 0.3.
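One way to follow this recommendation is to derive the threshold from a quantile of the matrix entries instead of searching by hand. This is a sketch under that assumption and is not the tool's own procedure. Taking the (1 - target) quantile leaves roughly the target fraction of entries above it. Ties, such as the diagonal of ones, can move the achieved fraction slightly, so the result should still be checked against the 0.2 to 0.3 range. `threshold_for_percentage` is a hypothetical name.

```python
import numpy as np

def threshold_for_percentage(psm: np.ndarray, target: float = 0.25) -> float:
    """Threshold that leaves roughly `target` of the PSM entries above it.

    The default 0.25 sits in the middle of the recommended 0.2-0.3 range.
    """
    return float(np.quantile(psm, 1.0 - target))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.uniform(size=(8, 8))
    psm = (m + m.T) / 2          # symmetric, like a similarity matrix
    np.fill_diagonal(psm, 1.0)   # a module always matches its own MST
    t = threshold_for_percentage(psm)
    achieved = (psm > t).mean()  # check that this lands in the 0.2-0.3 range
    print(t, achieved)
```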

Progression Tree

The module IDs are filled in automatically from your selection on the heatmap, and you can also edit them yourself. When adding IDs, make sure they are separated by commas. Once everything is set, click View Progression to build the final progression tree. You can select nodes in the tree and view the corresponding items in other views. If the samples have been clustered, each vertex represents a sample cluster, and its size is shown next to the vertex.
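The hand-picked module field expects a comma-separated list of integer IDs. The sketch below shows one reasonable way to read such a list. STrenD's own parsing rules are not documented here. Ignoring blank entries and dropping duplicates are assumptions, and `parse_module_ids` is hypothetical.

```python
def parse_module_ids(text: str) -> list[int]:
    """Parse a comma-separated list of module IDs such as "0, 3,5".

    Blank entries (for example a trailing comma) are skipped, and repeated IDs
    are kept only at their first position.
    """
    ids: list[int] = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        value = int(token)  # raises ValueError for anything that is not an integer
        if value not in ids:
            ids.append(value)
    return ids

if __name__ == "__main__":
    print(parse_module_ids("0, 3,5,"))
```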

Progression Heatmap

The progression heatmap is built from the progression tree. Its row order follows the tree node order, and its column order follows the feature cluster order. The corresponding hierarchical clustering dendrogram is shown above the heatmap. The heatmap is colored by the normalized feature values. When you click a vertex in the progression tree, the corresponding rows and the selected feature columns are selected in the heatmap.
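The matrix behind such a heatmap can be sketched by normalizing each feature and then reordering rows and columns. The page says neither how values are normalized nor how the tree nodes are ordered. The sketch therefore assumes per-feature z-scores and a depth-first preorder walk of the tree from a chosen root. Both function names are hypothetical, and the feature cluster order is taken as given.

```python
import numpy as np

def dfs_order(edges, root: int = 0) -> list[int]:
    """Depth-first preorder of a tree given as (u, v) edges; smaller IDs visited first."""
    adj: dict[int, list[int]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    order, stack, seen = [], [root], {root}
    while stack:
        v = stack.pop()
        order.append(v)
        # Push neighbours in reverse so the smallest ID is popped (visited) next.
        for w in sorted(adj.get(v, []), reverse=True):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return order

def progression_heatmap(data, node_order, feature_order) -> np.ndarray:
    """Rows in tree-node order, columns in feature-cluster order, z-scored per feature."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0            # constant features become all zeros
    normalized = (data - mean) / std
    return normalized[np.ix_(node_order, feature_order)]

if __name__ == "__main__":
    tree = [(0, 1), (0, 2), (1, 3)]
    samples = np.arange(8, dtype=float).reshape(4, 2)
    print(progression_heatmap(samples, dfs_order(tree), [1, 0]))
```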
