Sample-Efficient Learning of Mixtures (Journal Article)

abstract

  • We consider PAC learning of probability distributions (a.k.a. density estimation), where we are given an i.i.d. sample generated from an unknown target distribution, and want to output a distribution that is close to the target in total variation distance. Let $\mathcal{F}$ be an arbitrary class of probability distributions, and let $\mathcal{F}^k$ denote the class of $k$-mixtures of elements of $\mathcal{F}$. Assuming the existence of a method for learning $\mathcal{F}$ with sample complexity $m_{\mathcal{F}}(\epsilon)$, we provide a method for learning $\mathcal{F}^k$ with sample complexity $O(k \log k \cdot m_{\mathcal{F}}(\epsilon)/\epsilon^2)$. Our mixture learning algorithm has the property that, if the $\mathcal{F}$-learner is proper/agnostic, then the $\mathcal{F}^k$-learner is proper/agnostic as well. This general result enables us to improve the best known sample complexity upper bounds for a variety of important mixture classes. First, we show that the class of mixtures of $k$ axis-aligned Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with $\widetilde{O}(kd/\epsilon^4)$ samples, which is tight in $k$ and $d$ up to logarithmic factors. Second, we show that the class of mixtures of $k$ Gaussians in $\mathbb{R}^d$ is PAC-learnable in the agnostic setting with sample complexity $\widetilde{O}(kd^2/\epsilon^4)$, which improves the previously known bounds of $\widetilde{O}(k^3 d^2/\epsilon^4)$ and $\widetilde{O}(k^4 d^4/\epsilon^2)$ in their dependence on $k$ and $d$. Finally, we show that the class of mixtures of $k$ log-concave distributions over $\mathbb{R}^d$ is PAC-learnable using $\widetilde{O}(d^{(d+5)/2} \epsilon^{-(d+9)/2} k)$ samples.

publication date

  • June 2017