STrenD: Subspace Trend Discovery

From FarsightWiki

(Difference between revisions)

Revision as of 20:07, 8 May 2012

Clustering

Clustering reduces the dimensionality of the data, which speeds up the analysis and yields a better progression tree. Clustering is available in both sample space and feature space, but only feature clustering, which is based on the correlation between features, is needed now.

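Feature clustering takes two parameters. "Feature Coherence" is the average Pearson correlation coefficient within a module; it ranges from 0 to 1, and the larger the coherence, the more correlated the module. "Feature Merge Coherence" is a threshold on the Pearson correlation coefficient between two clustered modules: if the correlation coefficient of two modules exceeds the merge coherence, the two modules are merged. Its range is also 0 to 1. Click Feature cluster to run feature clustering.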

Recommended parameter settings:

Feature clustering: coherence = 0.95 or higher for a small number of features, 0.7 for a large number of features; merge coherence = 0.9.
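
The two coherence parameters can be illustrated with a minimal sketch. It is not the FARSIGHT implementation: the function names (pearson, module_mean, coherence, merge_modules) are hypothetical, and two details are assumptions made only for illustration, namely that the correlation between two modules is computed on their mean feature profiles and that modules are merged greedily, best pair first.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def module_mean(features, module):
    """Average the member features of a module, sample by sample (assumed profile)."""
    cols = [features[i] for i in module]
    return [sum(vals) / len(vals) for vals in zip(*cols)]

def coherence(features, module):
    """Average pairwise Pearson correlation inside one module (1.0 for a single feature)."""
    pairs = [(i, j) for k, i in enumerate(module) for j in module[k + 1:]]
    if not pairs:
        return 1.0
    return sum(pearson(features[i], features[j]) for i, j in pairs) / len(pairs)

def merge_modules(features, modules, merge_coherence=0.9):
    """Greedily merge the best-correlated pair of modules while it exceeds merge_coherence."""
    modules = [list(m) for m in modules]
    while len(modules) > 1:
        best, best_pair = merge_coherence, None
        for a in range(len(modules)):
            for b in range(a + 1, len(modules)):
                r = pearson(module_mean(features, modules[a]),
                            module_mean(features, modules[b]))
                if r > best:
                    best, best_pair = r, (a, b)
        if best_pair is None:
            break  # no remaining pair exceeds the merge coherence
        a, b = best_pair
        modules[a] = modules[a] + modules[b]
        del modules[b]
    return modules

# Each inner list is one feature measured over four samples (made-up values).
features = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1]]
merged = merge_modules(features, [[0], [1], [2]], merge_coherence=0.9)
```

Here features 0 and 1 are almost perfectly correlated and end up in one module, while feature 2, which runs in the opposite direction, stays on its own.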

Overall Progression or Progression over Distance

If the Progression over distance to device checkbox is checked, the analysis covers progression over distance rather than overall progression. This option is available only when the distance to device has been calculated in TraceEditor.

Module Matching

Instead of the previous MST/EMD module matching, correlation-based module matching is now used in order to handle large sample sizes. Click Match to generate a similarity matrix of the feature modules.

Select Similar Modules

Once module matching has finished, set the PSM (progression similarity matrix) threshold, which determines whether two modules are considered similar. "PSM Selected Blocks' Percentage" is the percentage of values in the similarity matrix that are higher than the threshold. To select fewer feature modules, raise the threshold; the percentage drops correspondingly because fewer values lie above the threshold.

Click Show PSM to open a heatmap window in which you can select the similar modules. The heatmap is colored from red to blue, representing high to low similarity. Because the matrix is symmetric, it is recommended to select along the diagonal: press the left mouse button on the block where you want to start and release it on the block where you want to end. All blocks in this symmetric square are selected, and their corresponding feature module IDs are filled into "Input hand-picked modules". You can also add or delete feature module IDs by hand; note that the IDs must be separated by commas.

Recommended parameter settings:

Overall progression: Set the threshold so that the percentage is around 0.3.

Progression over distance to device: The PSM threshold sometimes has to be set low enough to guarantee that some modules are selected.
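
The relation between the PSM threshold and the selected percentage can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the matrix values are made up, and whether the tool counts the diagonal entries toward the percentage is not documented here (this sketch counts every entry).

```python
def selected_percentage(psm, threshold):
    """Fraction of similarity-matrix entries strictly above the threshold."""
    values = [v for row in psm for v in row]
    return sum(v > threshold for v in values) / len(values)

def threshold_for_percentage(psm, target):
    """Lowest threshold, taken from the matrix values, whose selected fraction is <= target."""
    for t in sorted({v for row in psm for v in row}):
        if selected_percentage(psm, t) <= target:
            return t

def parse_module_ids(text):
    """Parse a comma-separated list of feature module IDs, e.g. '3, 7,12'."""
    return [int(tok) for tok in text.split(",") if tok.strip()]

# Made-up symmetric similarity matrix for three feature modules.
psm = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
fraction = selected_percentage(psm, 0.5)         # 5 of 9 entries lie above 0.5
threshold = threshold_for_percentage(psm, 0.34)  # raise the threshold until about a third remain
module_ids = parse_module_ids("3, 7,12")
```

Raising the threshold can only keep or lower the fraction, which is why a target percentage such as 0.3 is reached by increasing the threshold step by step.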

Threshold Heatmap and Progression Tree

Once the modules have been selected, click View Progression to build the threshold heatmap. Right-click to draw a cut line across the sample dendrogram on the left side of the heatmap; this builds the progression tree, in which each node represents one cluster produced by the cut. You can select nodes and view the corresponding items in other views.

If you have loaded two kinds of traces at once in the Trace Editor, say device traces and control traces, "Id to separate" tells them apart according to the "Root Trace" in "Computed Features for Cells". The first percentage in a tree label is the percentage of cells that come from the first data set (IDs below "Id to separate"); it is not shown if it is 100% for all nodes. The second percentage in a tree label is the percentage of cells within a distance of "700" of the device; it is not shown if the distance is not available.
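
The two percentages in a tree label can be sketched like this. The function name and the data layout (one pair of root trace ID and distance per cell) are hypothetical, and treating "within" as less than or equal to 700 is an assumption.

```python
def node_label_percentages(cells, id_to_separate, max_distance=700.0):
    """Return (percent from first data set, percent near the device) for one tree node.

    cells: list of (root_trace_id, distance_to_device) pairs; distance may be None
    when no distance to device has been calculated.
    """
    total = len(cells)
    # Cells whose root trace ID is below "Id to separate" belong to the first data set.
    from_first = sum(1 for root_id, _ in cells if root_id < id_to_separate)
    pct_first = 100.0 * from_first / total

    distances = [d for _, d in cells]
    if any(d is None for d in distances):
        pct_near = None  # distance not available, so no second percentage
    else:
        near = sum(1 for d in distances if d <= max_distance)
        pct_near = 100.0 * near / total
    return pct_first, pct_near

# Made-up node with four cells: two from the first data set, two close to the device.
cells = [(1, 100.0), (2, 900.0), (10, 650.0), (11, 1200.0)]
label = node_label_percentages(cells, id_to_separate=5)
```

For this made-up node, half of the cells come from the first data set and half lie within the distance limit.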

Progression Heatmap and Scatter Plot

Click "Heatmap" to further arrange the clusters in progression tree order and to check each feature over the progression in the scatter plot.

@@ Line 1: / Line 1: @@
 == Clustering ==
-Clustering is for data dimension deduction to speed up the analysis and to achieve better looking progression tree. The clustering has been undertaken on boths sides, samples and features.
+Clustering is for data dimension deduction to speed up the analysis and to achieve better progression tree. Clustering is available in both sample/feature space, and only feature clustering based on their correlation is needed now.
-For sample cluster, only one param "coherence" is taken into consideration. "Coherence" is mesured by the average Pearson correlation coefficient of each module. Therefore, it should be 0-1. The larger the coherence, the more correlated the module.
+For feature cluster, besides "Feature Coherence", "Feature Merge Coherence" is the Pearson correlation coefficient of two clustered modules, if the correlation coeffient of two modules exceeds the merge coherence, these two modules will be merged. Its arrange is also 0-1. Click '''Feature cluster''' for feature clustering.
-For feature cluster, besides "coherence", "merge coherence" is the Pearson correlation coefficient of two clustered modules, if the correlation coeffient of two modules exceeds the "merge coherence", these two modules will be merged. Its arrange is also 0-1. Click '''Sample cluster''' for sample clustering and '''Feature cluster''' for feature clustering.
 '''Recommended param setting:'''
-Samples clustering: coherence = 0.95; It is not neccessary for small sample size.
+Feature clustering: coherence = 0.95 or higher for small feature size, 0.7 for large feature size; merge coherence = 0.9.
-Feature clustering: coherence = 0.9 or higher for small feature size, 0.7 for large feature size; merge coherence = 0.9.
+== Overall Progression or Progression over Distance ==
+If the checkbox '''Progression over distance to device''' is checked, the analysis is for progression over distance rather than overall progression. This is available only when distance to device has been calculated in TraceEditor.
-== Minimum Spanning Tree ==
+== Module Matching ==
-Build MST for each clustered module so as to tell how samples are related to each other in every module. Each MST represents the progression of the module. For this step, just click on '''MST''' button when available, it will automatically generate MSTs for all clustered modules, the running state is updated in command window.
+Instead of previous MST/EMD module matching, a correlation-based module matching is adopted for large sample size. A similarity matrix of the feature modules will be generated by clicking '''Match'''.
-== Earth Mover Distance ==
+== Select Similar Modules ==
-Based on the MSTs, Earth Mover's Distance(EMD) method is used for MSTs and modules to see how each MST fits each module. If the value is large, it means the MST can well present the module progression to some extent. For this step, just click on '''EMD''' button when available, it will match all MSTs and modules. The running state is updated in command window.
+Once the modules matching finished, PSM(progression similarity matrix) threshold is set to determine whether two modules are similar.  "PSM Selected Blocks' Percentage" is the percentage of the selected values in the similarity matrix higher than the threshold. If you want to select fewer feature modules, then set the threshold higher and percentage would drop correspondingly as fewer values are above the threshold. Click '''Show PSM''', a heatmap window would show up for you to select the similar modules. The heatmap is colored from red to blue, representing high similarity to low similarity. It's symetric so it's recommended to select along the diagonal by left button down clicking on the block where you want to start and left button up clicking on the block where you want to end. All the blocks in this symmetric square will be chosen and their corresponding feature module IDs will be filled in the "Input hand-picked modules". You would also input or delete any feature module by hand, please notice that all the feature module ID should be separated by comma.
-== Select similar modules ==
-Once the matching among MSTs and modules has finished, a similarity matrix for modules is built. PSM threshold, progression similarity matrix threhold, is set to determine whether two modules are similar. The similarity is one if the module is matching with its own MST. It approaches zero if the MST doesn't fit the module at all. "Selecting percentage" is the percentage of the values higher than the threshold in the similarity matrix. This param depends on the way you trust the similarity judgement. If you trust the similarity value more, just focus on the threhold; If you trust the percentage more, set the threhold so as to let the percentage fall in the range you would like it to. Click '''Show PSM''', a heatmap would pop out for you to select the similar modules. The heatmap is colored from red to blue. Dark red represents high values and dark blue low values. It's symetric so it's recommended to select along the diagonal by left button down clicking on the left-up corner starting block and left button up clicking on the right-down corner ending block. All the blocks in this symetric square will be chosen and their corresponding module IDs will be filled in the "Input hand-picked modules".
 '''Recommended param setting for :'''
-Set the threshold so that the percentage is 0.2 to 0.3.
+Overall progression:
+Set the threshold so that the percentage is around 0.3.
+Progression over distance to device:
+Sometimes the PSM threshold must be low enough to guarantee some modules are selected.
-== Progression Tree ==
+== Threshold Heatmap and Progression Tree ==
-Beyond the auto-filling module IDs by right clicking on the heatmap, you can also edit them yourself. If you want to add IDs, make sure they are seperated by comma. After these are all set, click '''View Progression''', the final progression tree is built. You can select the nodes and view the corresponding items in other views. If the samples have been clustered, a vertex represents a cluster with its size number shown near the vertex.
+After the selected modules are all set, click '''View Progression''', the threshold heatmap is built. By right clicking to make a cut line across the dendrogram of the samples on the left side of the heatmap, the progression tree is built. Each node represents a cluster by the cut. You can select the nodes and view the corresponding items in other views. If you have loaded two kinds of traces all at once in the Trace Editor, say device traces and control trace, "Id to separate" is to tell them apart according to the "Root Trace" in "Computed Features for Cells", the first percentage in the tree label would tell the percentage of the cells from the first data(ID below "Id to separate")( They won't show up if all of them is 100%). The second percentage in the tree label tells how much percentage of the cells are within the distance of "700" near the device( They won't show up if the distance is not available).
-== Progression Heatmap ==
+== Progression Heatmap and Scatter Plot ==
-Progression Heatmap is built from the progression tree. Its row order is the tree node order, and its column order is the feature cluster order, the corresponding hierachical clustering dendrogram is shown above the heatmap. The heatmap is colored based on the normalized feature values. When you click vertex in the progression tree, the corresponding rows and selected feature columns will be selected.
+By clicking "Heatmap", You can furtherly arrange the clusters in the progression tree order and check each feature over the progression in the scatter plot.


