Modifications to a neural network's input and output layers are often
required to accommodate the specific requirements of most practical learning tasks.
However, the impact of such changes on an architecture's approximation
capabilities is not well understood. We present general conditions
on feature and readout maps that preserve an architecture's ability to
approximate any continuous function uniformly on compacts. As an application,
we show that if an architecture is capable of universal approximation, then
modifying its final layer to produce binary values creates a new architecture
capable of deterministically approximating any classifier. In particular, we
obtain guarantees for deep CNNs and deep feed-forward networks. Our results
also have consequences for geometric deep learning.
Specifically, when the input and output spaces are Cartan-Hadamard manifolds,
we obtain geometrically meaningful feature and readout maps satisfying our
criteria. Consequently, commonly used non-Euclidean regression models between
spaces of symmetric positive definite matrices are extended to universal DNNs.
The same result allows us to show that hyperbolic feed-forward networks,
used for hierarchical learning, are universal. Our results are also used to show
that the common practice of randomizing all but the last two layers of a DNN
produces a universal family of functions with probability one. We also provide
conditions on the connections and activation function of a DNN's first (resp. last)
few layers which guarantee that these layers can have a width equal to the input
(resp. output) space's dimension without negatively affecting the
architecture's approximation capabilities.