The universal approximation property of various machine learning models is
currently understood only on a case-by-case basis, limiting the rapid
development of new, theoretically justified neural network architectures and
obscuring our understanding of existing models' potential. This paper works
towards overcoming these challenges by presenting a characterization, a
representation, a construction method, and an existence result, each of which
applies to any universal approximator on most function spaces of practical
interest. Our characterization result is used to describe which activation
functions allow the feed-forward architecture to maintain its universal
approximation capabilities when multiple constraints are imposed on its final
layers and its remaining layers are only sparsely connected. Such activation
functions include a rescaled and shifted Leaky ReLU, but not the ReLU itself.
Our construction and representation results are used to exhibit a
simple modification of the feed-forward architecture, which can approximate any
continuous function with non-pathological growth, uniformly on the entire
Euclidean input space. This improves upon the known approximation capabilities
of the feed-forward architecture.
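
To make the activation condition concrete, a rescaled and shifted Leaky ReLU of the kind referenced above can be written in the following generic form; the slope $\alpha \in (0,1)$ and the constants $a \neq 0$ and $b$ are illustrative placeholders, not the specific values singled out by the characterization result:
\[
\sigma(x) \;=\; a\,\mathrm{LeakyReLU}_{\alpha}(x) + b,
\qquad
\mathrm{LeakyReLU}_{\alpha}(x) \;=\;
\begin{cases}
x, & x \ge 0,\\
\alpha x, & x < 0.
\end{cases}
\]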