The Computer Science Colloquium

Thursday, October 8, 4:15pm, room 9204/9205



Rave Harpaz
(Columbia University)

"Model-based Linear Manifold Clustering"

A new clustering paradigm called "Linear Manifold Clustering" will be presented. A linear manifold is a translated subspace, which can be visualized as a line, plane, hyperplane, etc., depending on its dimensionality. Classical clustering algorithms are based on the concept that a cluster center is a single point, and that clusters are sets of points compact around this central point. The linear manifold clustering paradigm introduces the concept of linear manifold clusters, which are groups of points compact around a linear manifold.
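
To make the notion of points being "compact around a linear manifold" concrete, the following is a minimal sketch, not the speaker's method. It assumes a manifold is given by a hypothetical origin point and a matrix whose linearly independent columns span the subspace, and it measures how far a point lies from that manifold. The function name and parameters are illustrative only.

```python
import numpy as np

def distance_to_manifold(x, origin, basis):
    """Euclidean distance from point x to the manifold {origin + basis @ z}.

    Assumes the columns of `basis` are linearly independent; they need not
    be orthonormal, since QR factorization orthonormalizes them first.
    """
    q, _ = np.linalg.qr(basis)            # orthonormal basis for the subspace
    diff = x - origin                     # undo the translation
    residual = diff - q @ (q.T @ diff)    # remove the component inside the subspace
    return float(np.linalg.norm(residual))

# A line in 3-D through (0, 0, 1) with direction (1, 1, 0).
origin = np.array([0.0, 0.0, 1.0])
basis = np.array([[1.0], [1.0], [0.0]])

# This point sits directly above the line, so its distance is about 3.0.
print(distance_to_manifold(np.array([2.0, 2.0, 4.0]), origin, basis))
```

With an empty subspace (a zero-dimensional manifold) this quantity reduces to the ordinary distance to a single center point, which is the classical case described above.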

The "birth" of this paradigm of clustering is a consequence of what we believe is a need for an important yet overlooked cluster model. Moreover, in many problem domains it is assumed that linear models are sufficient to describe and capture the data's inherent structure. Yet very few attempts have been made to devise clustering methods able to identify or learn mixtures of linear manifolds. Linear manifold clustering is based on a stochastic model which describes a "process" responsible for generating sets of points that lie on linear manifolds. We show that this model is a generalization of other more common cluster models. This generalization allows for fewer assumptions to be imposed on the data, and more freedom for the data to "speak for itself".

Based on the linear manifold cluster model, we present a set of clustering and modeling techniques along with experiments that demonstrate the applicability and efficacy of the linear manifold clustering paradigm across a wide range of applications. Emphasis is placed on DNA microarray analysis, where the goal is to identify clusters of genes that exhibit similar expression patterns, from which gene function may be inferred.




The Colloquium is supported by generous contributions from Bloomberg, Information Builders, Inc., and Netlogic, Inc.



365 Fifth Ave, New York City 10016 | Room 4319 | Phone: 212.817.8190 | Fax: 212.817.1510 | compsci@gc.cuny.edu