Generative probabilistic matrix model of data with different low-dimensional latent structures

ORAL

Abstract

Complex biological and social features are often described by effective models with latent features, which play the role of emergent, collective degrees of freedom. In many contexts, such features must be identified directly from data. Unfortunately, we know very little about how different types of latent feature models manifest themselves in data, which makes inference hard. In this work, we investigate properties of data produced by different types of latent feature models, all described as special cases of a general model involving mixing of latent features. A key ingredient of our model is that we allow for statistical dependence between the mixing coefficients, as well as latent features with a statistically dependent structure. The latent dimensionality and correlation patterns of the data are controlled by three model parameters. The model's special cases include hierarchical clusters, sparse mixing, and non-negative mixing. We describe the correlation and eigenvalue distributions of these patterns within the general model and discuss how our model can be used to generate structured training data for supervised learning.
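
As a rough illustration of the idea of generating data by mixing latent features and then inspecting the eigenvalues of its correlation matrix, the following is a minimal sketch only. It is not the model described in the abstract: it assumes independent Gaussian mixing coefficients and independent Gaussian latent features with additive noise, whereas the abstract's model allows statistical dependence in both. The function names and the parameters n_samples, n_features, n_latent, and noise_std are hypothetical and chosen for illustration.

```python
import numpy as np


def sample_latent_mixture(n_samples, n_features, n_latent, noise_std=0.1, seed=0):
    """Draw data as (mixing coefficients) x (latent features) + noise.

    Simplifying assumption: all entries are independent standard normals,
    which is only the simplest special case of a latent feature model.
    """
    rng = np.random.default_rng(seed)
    mixing = rng.standard_normal((n_samples, n_latent))    # one row of coefficients per sample
    latent = rng.standard_normal((n_latent, n_features))   # one row per latent feature
    noise = noise_std * rng.standard_normal((n_samples, n_features))
    return mixing @ latent + noise


def correlation_spectrum(data):
    """Eigenvalues of the feature-feature correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


data = sample_latent_mixture(n_samples=500, n_features=20, n_latent=3)
eigs = correlation_spectrum(data)
print(eigs[:5])
```

In this simplified setting, the number of eigenvalues that stand clearly above the rest reflects the latent dimensionality (here 3), since the remaining eigenvalues come only from the small noise term.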

*IN was supported in part by the Simons Foundation Investigator award, NSF grant PHY/201052, and NIH grants 1R01NS099375 and 2R01NS084844.

Presenters

  • Philipp Fleig

    • Max Planck Institute for Medical Research

Authors

  • Philipp Fleig

    • Max Planck Institute for Medical Research
  • Ilya M Nemenman

    • Emory University