Sample-Efficient Private Learning of Mixtures of Gaussians
Abstract
We study the problem of learning mixtures of Gaussians with approximate
differential privacy. We prove that roughly $kd^2 + k^{1.5} d^{1.75} + k^2 d$
samples suffice to learn a mixture of $k$ arbitrary $d$-dimensional Gaussians
up to low total variation distance, with differential privacy. Our work
improves over the previous best result [AAL24b] (which required roughly $k^2
d^4$ samples) and is provably optimal when $d$ is much larger than $k^2$.
Moreover, we give the first optimal bound for privately learning mixtures of
$k$ univariate (i.e., $1$-dimensional) Gaussians. Importantly, we show that the
sample complexity for privately learning mixtures of univariate Gaussians is
linear in the number of components $k$, whereas the previous best sample
complexity [AAL21] was quadratic in $k$. Our algorithms utilize various
techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23],
sample compression for distributions [ABDH+20], and methods for bounding
volumes of sumsets.