University of Wollongong

A generalized motif bicluster algorithm

conference contribution
posted on 2024-11-17, 12:25 authored by Sebastian Kaiser, Friedrich Leisch
In many application domains, different clusters in data may be defined by different sets of variables. For example, in marketing one group of consumers could be concerned mainly with the price and technical features of a product, while others care most about design and how "cool" the product is (almost regardless of price). Standard clustering algorithms use all variables for all clusters and hence may fail to detect such structures in the data. Biclustering is the simultaneous clustering of the columns and rows of a data set: each cluster is defined by a different subset of variables, and these subsets may overlap. The R package biclust (Kaiser & Leisch 2008, Kaiser et al. 2008) contains a comprehensive collection of bicluster algorithms, preprocessing methods, and validation and visualization techniques for bicluster results. The main focus of this presentation will be on recent additions to the package. There are new functions for bicluster validation and comparison. A new generalization of the well-known motif bicluster algorithm has been developed which is particularly suited to biclustering marketing survey data. While the standard motif algorithm searches only for constant entries in the data matrix, our generalization is better suited to ordinal and metric data. The user can specify "neighborhood patterns" such as intervals or density kernels of pre-specified size for metric data. In addition to finding more general patterns than constant groups, this also allows posterior probabilities of cluster membership to be calculated and can be seen as a first step towards fully model-based biclustering. All new methods will be demonstrated using real data from marketing applications.
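
The contrast between constant motifs and interval neighborhoods can be illustrated with a minimal sketch. It is not the biclust implementation and does not reproduce the authors' search procedure. The function `rows_matching_seed`, its `width` parameter, and the toy data are hypothetical names chosen for illustration. The sketch shows only the membership test: a row joins the candidate bicluster if, on every selected column, its entry lies within `width` of a seed row's entry. Setting `width` to zero gives the constant case.

```python
from typing import List, Sequence


def rows_matching_seed(
    data: Sequence[Sequence[float]],
    seed_row: int,
    columns: Sequence[int],
    width: float = 0.0,
) -> List[int]:
    """Return indices of rows that agree with the seed row on `columns`.

    Hypothetical helper for illustration only.
    width == 0.0 -> entries must equal the seed's entries (constant motif).
    width  > 0.0 -> entries must lie in [seed - width, seed + width]
                    (an interval neighborhood around the seed value).
    """
    seed = data[seed_row]
    return [
        i
        for i, row in enumerate(data)
        if all(abs(row[j] - seed[j]) <= width for j in columns)
    ]


# Toy metric data: rows are respondents, columns are survey items.
data = [
    [1.0, 5.0, 3.0],
    [1.2, 5.1, 9.0],
    [1.0, 5.0, 7.0],
    [4.0, 2.0, 3.0],
]

# Constant motif on the first two columns: only exact matches qualify.
constant_rows = rows_matching_seed(data, seed_row=0, columns=[0, 1], width=0.0)

# Interval neighborhood: values within 0.25 of the seed also qualify.
interval_rows = rows_matching_seed(data, seed_row=0, columns=[0, 1], width=0.25)

print(constant_rows)
print(interval_rows)
```

With exact matching, the second respondent is excluded even though the answers differ only slightly from the seed. The interval version admits that row, which is why a neighborhood criterion fits metric data better than a constant one.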

Citation

Kaiser, S. & Leisch, F. (2009). A generalized motif bicluster algorithm. UseR 2009 (p. 101). Rennes, France: Agrocampus Ouest.

Parent title

R User Conference

Pagination

101

Language

English

RIS ID

29170
