Benefits of Additive Noise in Composing Classes with Bounded Capacity
Abstract
We observe that given two (compatible) classes of functions $\mathcal{F}$ and
$\mathcal{H}$ with small capacity as measured by their uniform covering
numbers, the capacity of the composition class $\mathcal{H} \circ \mathcal{F}$
can become prohibitively large or even unbounded. We then show that adding a
small amount of Gaussian noise to the output of $\mathcal{F}$ before composing
it with $\mathcal{H}$ can effectively control the capacity of $\mathcal{H}
\circ \mathcal{F}$, offering a general recipe for modular design. To prove our
results, we define new notions of the uniform covering number of random functions
with respect to the total variation and Wasserstein distances. We instantiate
our results for the case of multi-layer sigmoid neural networks. Preliminary
empirical results on the MNIST dataset indicate that the amount of noise required
to improve over existing uniform bounds can be numerically negligible (i.e.,
element-wise i.i.d. Gaussian noise with standard deviation $10^{-240}$). The
source code is available at
https://github.com/fathollahpour/composition_noise.
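The sketch below illustrates the core idea of the abstract, injecting element-wise i.i.d. Gaussian noise into the output of $\mathcal{F}$ before composing with $\mathcal{H}$, in a PyTorch-style module. It is an illustrative assumption, not the linked repository's implementation: the class name NoisyComposition, the default sigma, and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn


class NoisyComposition(nn.Module):
    """Compose h with f, adding element-wise i.i.d. Gaussian noise to f's output."""

    def __init__(self, f: nn.Module, h: nn.Module, sigma: float = 1e-2):
        super().__init__()
        self.f = f          # first class of functions, F
        self.h = h          # second class of functions, H
        self.sigma = sigma  # noise standard deviation (illustrative default)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.f(x)
        # Inject zero-mean Gaussian noise with standard deviation sigma before
        # passing the result to h. Note: extremely small sigmas such as the
        # paper's 1e-240 are only representable in float64; in float32 they
        # underflow to zero.
        z = z + self.sigma * torch.randn_like(z)
        return self.h(z)


# Example usage with two small sigmoid networks standing in for F and H.
f = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
h = nn.Linear(128, 10)
model = NoisyComposition(f, h, sigma=1e-2)
out = model(torch.randn(32, 784))  # shape (32, 10)
```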