Mixtures of Gaussians are Privately Learnable with a Polynomial Number
of Samples
Abstract
We study the problem of estimating mixtures of Gaussians under the constraint
of differential privacy (DP). Our main result is that
$\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient
to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation
distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the
first finite sample complexity upper bound for the problem that does not make
any structural assumptions on the Gaussian mixture models (GMMs).
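In symbols, and only as an illustrative reading of this guarantee (the concrete
polynomial and the failure probability are left abstract rather than quoted
from the formal statement): there is an $(\varepsilon,\delta)$-DP algorithm
that, given
\[
  n \;\geq\; \text{poly}\bigl(k,\, d,\, 1/\alpha,\, 1/\varepsilon,\, \log(1/\delta)\bigr)
\]
i.i.d. samples from an unknown mixture $f$ of $k$ Gaussians in $\mathbb{R}^d$,
outputs a distribution $\hat{f}$ satisfying $d_{\mathrm{TV}}(f, \hat{f}) \leq \alpha$
with high probability.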
To solve the problem, we devise a new framework which may be useful for other
tasks. At a high level, we show that if a class of distributions (such as
Gaussians) is (1) list decodable and (2) admits a ``locally small'' cover (Bun
et al., 2021) with respect to total variation distance, then the class of its
mixtures is privately learnable. The proof circumvents a known barrier, namely
that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et
al., 2021b).
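Schematically, and using $k\text{-mix}(\mathcal{F})$ only as shorthand
introduced here for the class of mixtures of $k$ members of a class
$\mathcal{F}$, the framework reads
\[
  \underbrace{\mathcal{F} \text{ list decodable}}_{(1)}
  \;\wedge\;
  \underbrace{\mathcal{F} \text{ has a locally small cover w.r.t.\ } d_{\mathrm{TV}}}_{(2)}
  \;\Longrightarrow\;
  k\text{-mix}(\mathcal{F}) \text{ is privately learnable};
\]
for GMMs, $\mathcal{F}$ is the class of $d$-dimensional Gaussians.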