Wednesday, November 28, 2012

Josh Merel: Nov 28th

Tensor decompositions for learning latent variable models

Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models---including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation---which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
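
To make the mechanics concrete, here is a minimal sketch of the kind of tensor power iteration the paper analyzes, assuming the observable moments have already been whitened into a symmetric, orthogonally decomposable third-order tensor. The function names and the deflation step are illustrative, not taken from the authors' implementation.

import numpy as np

def tensor_power_iteration(T, n_restarts=10, n_iters=100, tol=1e-8):
    # Estimate one eigenvector/eigenvalue pair of a symmetric,
    # orthogonally decomposable 3rd-order tensor T (shape k x k x k).
    k = T.shape[0]
    best_v, best_lam = None, -np.inf
    for _ in range(n_restarts):
        v = np.random.randn(k)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            # Power update: v <- T(I, v, v) / ||T(I, v, v)||
            u = np.einsum('ijk,j,k->i', T, v, v)
            norm_u = np.linalg.norm(u)
            if norm_u < tol:
                break
            u /= norm_u
            if np.linalg.norm(u - v) < tol:
                v = u
                break
            v = u
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # the value T(v, v, v)
        if lam > best_lam:
            best_v, best_lam = v, lam
    return best_v, best_lam

def deflate(T, v, lam):
    # Subtract the recovered rank-one component lam * (v ⊗ v ⊗ v)
    # so the next pair can be extracted by another round of iteration.
    return T - lam * np.einsum('i,j,k->ijk', v, v, v)

Repeatedly alternating tensor_power_iteration and deflate on the whitened third-order moment tensor recovers the component directions one at a time; the paper's contribution is the perturbation analysis showing that this procedure remains stable when the tensor is only estimated from samples.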

Thursday, November 8, 2012

Emanuel Ben-David: Nov 7th

High dimensional Bayesian inference for Gaussian directed acyclic graph models

Recent methodological work by Letac & Massam (2007) and others has introduced classes of flexible multi-parameter Wishart distributions for high dimensional Bayesian inference for undirected graphical models. A parallel analysis that extends these results to the class of DAGs, or Bayesian networks, arguably one of the most widely used classes of graphical models, is however not available. The parameter space of interest for Gaussian undirected graphical models is the space of sparse inverse covariance matrices with fixed zeros corresponding to the missing edges of an undirected graph, whereas for Gaussian DAG models it is the space of sparse lower triangular matrices corresponding to the Cholesky parameterization of inverse covariance matrices. Working with the latter space, though very useful, does not allow a comprehensive treatment of undirected and directed graphical models simultaneously. Moreover, this traditional approach does not lead to well-defined posterior covariance and inverse covariance Bayes estimates that respect the conditional independences encoded by a DAG, since these quantities lie on a curved manifold.

In this paper we first extend the traditional priors that have been proposed in the literature for Gaussian DAGs to allow multiple shape parameters. We then use a novel approach and proceed to define new spaces that retain only the functionally independent elements of the covariance and inverse covariance matrices corresponding to DAG models. These spaces can be viewed as projections of the DAG models' parameter space onto lower dimensions. We demonstrate that this parameter reduction pays several dividends for high dimensional Bayesian posterior analysis. By introducing new families of DAG Wishart and inverse DAG Wishart distributions on these projected spaces, we succeed in:

a) deriving closed-form analytic expressions for posterior quantities that would normally only be available through intractable numerical simulations,
b) simultaneously providing a unifying treatment of undirected and directed Gaussian graphical model priors and comparisons thereof, and
c) obtaining posterior covariance and inverse covariance Bayes estimates that actually correspond to DAG models.
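
For readers unfamiliar with the Cholesky parameterization mentioned above, here is a minimal sketch of how a Gaussian DAG model's inverse covariance is assembled from its sparse lower triangular factor, assuming the variables are topologically ordered so that every parent precedes its child. The function and parameter names are illustrative, not from the paper.

import numpy as np

def dag_precision(B, d):
    # Inverse covariance (precision) of a Gaussian DAG model, built from the
    # structural equations
    #   x_i = sum_{j in pa(i)} B[i, j] * x_j + eps_i,   eps_i ~ N(0, d[i]),
    # where B is strictly lower triangular and B[i, j] is constrained to zero
    # whenever j is not a parent of i.
    p = len(d)
    L = np.eye(p) - B                     # unit lower-triangular factor
    D_inv = np.diag(1.0 / np.asarray(d))  # inverse conditional variances
    return L.T @ D_inv @ L                # Omega = (I - B)' D^{-1} (I - B)

# Example: chain DAG x1 -> x2 -> x3 with unit conditional variances;
# the resulting precision has a zero in the (1, 3) entry, reflecting the
# conditional independence of x1 and x3 given x2.
B = np.array([[0.0, 0.0, 0.0],
              [0.7, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
Omega = dag_precision(B, d=[1.0, 1.0, 1.0])

Roughly, the free entries of B (one per edge of the DAG) together with the conditional variances d are the functionally independent elements the abstract refers to; the new DAG Wishart and inverse DAG Wishart families are placed on such reduced spaces rather than on full covariance or precision matrices.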