Journal article

Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes

Abstract

We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that Θ̃(kd²/ε²) samples are necessary and sufficient for learning a mixture of k Gaussians in ℝ^d, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε²) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in ℝ^d admits a small compression scheme.
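To make the two rates in the abstract concrete, the following minimal sketch evaluates only their leading terms, kd²/ε² for general Gaussian mixtures and kd/ε² for axis-aligned ones. The function names are hypothetical and not taken from the paper, and the constants and polylogarithmic factors hidden by the Θ̃ and Õ notation are ignored, so the values indicate scaling rather than actual sample counts.

```python
def general_mixture_scale(k: int, d: int, eps: float) -> float:
    """Leading term k * d^2 / eps^2 for mixtures of k general Gaussians in R^d.

    Illustrative only: constants and polylog factors are omitted.
    """
    return k * d**2 / eps**2


def axis_aligned_mixture_scale(k: int, d: int, eps: float) -> float:
    """Leading term k * d / eps^2 for mixtures of k axis-aligned Gaussians.

    Illustrative only: constants and polylog factors are omitted.
    """
    return k * d / eps**2


if __name__ == "__main__":
    k, d, eps = 10, 100, 0.25  # arbitrary illustrative values
    general = general_mixture_scale(k, d, eps)
    axis = axis_aligned_mixture_scale(k, d, eps)
    print(general)         # 1600000.0
    print(axis)            # 16000.0
    print(general / axis)  # 100.0, i.e. a factor of d
```

Under these simplifications the two leading terms differ by exactly a factor of d, and halving ε multiplies both by four.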

Authors

Ashtiani H; Ben-David S; Harvey NJA; Liaw C; Mehrabian A; Plan Y

Journal

Journal of the ACM, Vol. 67, No. 6, pp. 1–42

Publisher

Association for Computing Machinery (ACM)

Publication Date

December 31, 2020

DOI

10.1145/3417994

ISSN

0004-5411
