Harmonic Co-clustering Heatmap

From FarsightWiki
[[Image:rawh.png|300px|left|thumb|Raw Data Heatmap]]
 
[[Image:reh.png|300px|right|thumb|Processed Data Heatmap]]
 
== Progression Tree ==
 
Besides filling in module IDs automatically by right-clicking on the heatmap, you can also edit them yourself. If you add IDs, make sure they are separated by commas. Once the IDs are set, click '''View Progression''' to build the final progression tree. You can select nodes in the tree and view the corresponding items in the other views. If the samples have been clustered, each vertex represents a cluster, and the cluster size is shown next to the vertex.
 
 
== Progression Heatmap ==
 
The progression heatmap is built from the progression tree. Its row order follows the tree node order, and its column order follows the feature cluster order; the corresponding hierarchical clustering dendrogram is shown above the heatmap. The heatmap is colored according to the normalized feature values. When you click a vertex in the progression tree, the corresponding rows and the selected feature columns are selected in the heatmap.
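The ordering described above can be sketched in a few lines of Python with NumPy. This is only an illustration, not FARSIGHT's own code: the function name <code>progression_heatmap_matrix</code> is hypothetical, the row and column orders are assumed to be already available as index lists, and z-scoring each feature is an assumed choice of normalization (the page does not say which one is used).

```python
import numpy as np

def progression_heatmap_matrix(data, row_order, col_order):
    """Normalize each feature column, then reorder rows and columns.

    row_order: sample indices in progression-tree node order (assumed given).
    col_order: feature indices in feature-cluster order (assumed given).
    """
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0                      # constant features: avoid dividing by zero
    normalized = (data - data.mean(axis=0)) / std   # z-score per feature (an assumption)
    # np.ix_ selects the full row-by-column cross product in the requested order.
    return normalized[np.ix_(row_order, col_order)]

# Three samples, two features; the orders here are made up for the example.
data = [[1, 10], [2, 20], [3, 30]]
heat = progression_heatmap_matrix(data, row_order=[2, 0, 1], col_order=[1, 0])
```

The returned matrix is what a plotting routine would color; row <code>i</code> of the result is sample <code>row_order[i]</code> of the input.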
 

Revision as of 22:26, 15 November 2012

Introduction

Co-clustering is a data mining technique that clusters the rows and columns of a data matrix simultaneously. It looks for clusters of similar objects together with the correlated features that distinguish those groups of objects. Harmonic co-clustering is a co-clustering method based on discrete harmonic analysis that finds clusters in both the row space and the column space, revealing block structure in the data matrix. The harmonic basis is induced from a coupled geometry constructed over the two spaces. We obtain the coupled geometry by taking the tensor product of the local geometries, each of which is captured by a hierarchical partition tree on the row space or the column space. The data set is considered smooth, and its block structure recovered, when the expansion coefficients of the data matrix have low entropy. The algorithm is iterative: the local geometry of each space is constructed using the updated geometry of the other space. With each iteration the data set becomes more compact and smoother, and the entropy of the expansion coefficients decreases; iteration stops once the entropy reaches a set threshold.

Hierarchical Partition Tree

Orthogonal Basis & Hierarchical Partition Tree

The geometric structure of a high-dimensional data cloud is assumed to be captured by a hierarchical partition tree whose levels partition the data more and more finely. Different levels represent different resolutions of the data set. The bottom level of the tree consists of N folders (N being the number of points), each containing a single element. The top level consists of one folder that contains all elements. At the intermediate levels, each folder is a union of folders from the level below. The folders are obtained by applying a simple heuristic, k-means clustering, to the data points of the level below. Each folder is then treated as a new data point, defined as a linear combination of all the elements that make up the folder. Each level of the partition tree is a representation of the data cloud, and each level spans a subspace of the cloud. The bottom level, which contains every single element, spans the whole data space. Since each higher level is a linear combination of the level below, the space it spans is a subspace of the space spanned by the lower level; there is therefore an orthogonal complement which, together with that subspace, makes up the space spanned by the lower level. The basis elements of this orthogonal complement are supported on the folders of the tree. A brief schematic illustration of the concept is shown in the figure on the right.
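The bottom-up construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FARSIGHT implementation: the function names, the plain Lloyd's k-means, the choice of the folder mean as the representing linear combination, and the halving of the folder count at each level (<code>shrink=2</code>) are all assumptions made for the example.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def build_partition_tree(data, shrink=2):
    """Return the tree as a list of levels, bottom level first.

    Each level is a list of folders; each folder is a sorted list of the
    original point indices it contains.
    """
    data = np.asarray(data, dtype=float)
    levels = [[[i] for i in range(len(data))]]   # bottom level: N singleton folders
    while len(levels[-1]) > 1:
        folders = levels[-1]
        # Represent each folder by the mean of its members (assumed linear combination).
        reps = np.array([data[f].mean(axis=0) for f in folders])
        k = max(1, len(folders) // shrink)
        labels = kmeans_labels(reps, k) if k > 1 else np.zeros(len(folders), dtype=int)
        # Merge all folders that received the same label into one coarser folder.
        merged = [sorted(i for f, lab in zip(folders, labels) if lab == j for i in f)
                  for j in np.unique(labels)]
        levels.append(merged)
    return levels

points = np.random.default_rng(1).normal(size=(8, 3))
tree = build_partition_tree(points)
```

Here <code>tree[0]</code> holds the N singleton folders and <code>tree[-1]</code> holds the single folder containing every point; every level in between is a partition of the same index set, nested inside the level above it.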

Coupled Geometry

For a given data matrix we consider two spaces: the row space and the column space. A coupled geometry of the two spaces is derived by taking the tensor product of the hierarchical partition trees of the two spaces. We again use the concept of basis functions for the coupled geometry; in the tensor space, folders become rectangles. A basis function in the tensor space is simply the tensor product of the basis functions induced by each local geometry. A schematic illustration is shown on the right.
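As a small numerical illustration of the tensor-product basis, the sketch below assumes the simplest possible partition tree on each side: a balanced binary tree, for which the induced orthonormal basis is the classical Haar basis. The trees built by k-means in practice are generally not balanced, so <code>haar_basis</code> and <code>expand</code> are hypothetical helpers for this example only.

```python
import numpy as np

def haar_basis(n):
    """Orthonormal Haar basis (as rows) for n = 2**m points on a balanced binary tree."""
    if n == 1:
        return np.ones((1, 1))
    h = haar_basis(n // 2)
    coarse = np.kron(h, [1.0, 1.0])                  # vectors inherited from coarser folders
    detail = np.kron(np.eye(n // 2), [1.0, -1.0])    # differences between sibling folders
    return np.vstack([coarse, detail]) / np.sqrt(2.0)

def expand(X, row_basis, col_basis):
    """Expansion coefficients of X in the tensor-product basis."""
    return row_basis @ X @ col_basis.T

row_basis = haar_basis(4)                      # local geometry of the row space
col_basis = haar_basis(2)                      # local geometry of the column space
tensor_basis = np.kron(row_basis, col_basis)   # each row is one rectangular basis function

# A matrix with two constant blocks of rows.
X = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [5.0, 5.0],
              [5.0, 5.0]])
coeffs = expand(X, row_basis, col_basis)
```

Applying <code>tensor_basis</code> to the row-major flattening of <code>X</code> gives the same numbers as <code>expand</code>, which avoids forming the large Kronecker matrix. For this block-structured <code>X</code> only a few coefficients are non-zero, which is the sense in which block structure corresponds to a compact expansion.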

rectangular basis

Smoothness

As is known in harmonic analysis, a signal or data set is smooth when the entropy of its expansion coefficients is low. In each iteration we compute the first-order norm entropy of the expansion in the orthogonal basis functions induced by the global coupled geometry, and we iterate until the entropy reaches the threshold that we set.
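The sketch below illustrates the idea on a toy matrix. It assumes that "first-order norm entropy" means the sum of absolute values of the expansion coefficients (their l1 norm); the page does not give the exact formula, so <code>l1_entropy</code> is a hypothetical helper. The fixed Haar bases <code>H4</code> and <code>H2</code> stand in for the bases induced by the coupled geometry.

```python
import numpy as np

def l1_entropy(coeffs):
    """Sum of absolute expansion coefficients (assumed reading of 'first-order norm entropy')."""
    return float(np.abs(coeffs).sum())

s = np.sqrt(2.0)
# Orthonormal Haar bases for 4 rows and 2 columns (stand-ins for the induced bases).
H4 = np.array([[0.5,  0.5,  0.5,  0.5],
               [0.5,  0.5, -0.5, -0.5],
               [1/s, -1/s,  0.0,  0.0],
               [0.0,  0.0,  1/s, -1/s]])
H2 = np.array([[1.0,  1.0],
               [1.0, -1.0]]) / s

# Same rows, two different orderings: grouped into blocks vs. interleaved.
blocked = np.array([[1.0, 1.0], [1.0, 1.0], [5.0, 5.0], [5.0, 5.0]])
interleaved = blocked[[0, 2, 1, 3]]

entropy_blocked = l1_entropy(H4 @ blocked @ H2.T)
entropy_interleaved = l1_entropy(H4 @ interleaved @ H2.T)
```

Both matrices have the same total energy, but the ordering that groups similar rows together yields the lower entropy. In the iterative algorithm, a quantity of this kind would be compared against the chosen threshold after each update of the geometry.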

Heatmap

A data matrix is easy to visualize as a heatmap. The pictures below show a heatmap of the raw data matrix and a heatmap of the same data after applying the harmonic co-clustering algorithm.

Raw Data Heatmap
Processed Data Heatmap