Journal article

Universal Regular Conditional Distributions via Probabilistic Transformers

Abstract

We introduce a deep learning model that can universally approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map; next, a deep feedforward neural network processes these linearized features; finally, the network's outputs are transformed to the 1-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism of Bahdanau et al. (Neural machine translation by jointly learning to align and translate, 2014. arXiv:1409.0473). Our model, called the probabilistic transformer (PT), can quantitatively approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$, uniformly on compact sets. We identify two ways in which the PT avoids the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy builds functions in $C(\mathbb{R}^d, \mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated by a PT, uniformly on any given compact subset of $\mathbb{R}^d$. In the second approach, given any function $f$ in $C(\mathbb{R}^d, \mathcal{P}_1(\mathbb{R}^D))$, we build compact subsets of $\mathbb{R}^d$ on which $f$ can be efficiently approximated by a PT.
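
To give a concrete picture of the three-phase architecture described in the abstract, the following is a minimal, illustrative sketch, not the authors' implementation. It assumes the feature map is the identity on $\mathbb{R}^d$, a single hidden layer stands in for the deep feedforward network, and the probabilistic attention stage produces softmax mixing weights over N trainable atoms in $\mathbb{R}^D$, so the output is a finitely supported probability measure; all names, dimensions, and the one-hidden-layer network are hypothetical choices made for illustration.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

# Hypothetical toy parameters: input dimension d, output dimension D,
# N atoms in R^D, and one hidden layer of the given width.
rng = np.random.default_rng(0)
d, D, N, width = 4, 2, 8, 16

W1, b1 = rng.normal(size=(width, d)), np.zeros(width)
W2, b2 = rng.normal(size=(N, width)), np.zeros(N)
atoms = rng.normal(size=(N, D))      # y_1, ..., y_N in R^D

def probabilistic_transformer(x):
    # Phase 1 (feature map) is taken as the identity on R^d in this sketch.
    h = relu(W1 @ x + b1)            # Phase 2: feedforward network (one hidden layer here)
    w = softmax(W2 @ h + b2)         # Phase 3: probabilistic attention -> mixing weights
    return w, atoms                  # encodes the measure sum_n w[n] * delta_{atoms[n]}

w, y = probabilistic_transformer(rng.normal(size=d))
print(w.sum())                       # weights sum to 1, so the output is a probability measure on R^D

In this reading, the softmax weights play the role of attention scores, and the atoms are the "values" being attended over; the pair (weights, atoms) represents a point of $\mathcal{P}_1(\mathbb{R}^D)$ as a mixture of Dirac masses.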

Authors

Kratsios A

Journal

Constructive Approximation, Vol. 57, No. 3, pp. 1145–1212

Publisher

Springer Nature

Publication Date

June 1, 2023

DOI

10.1007/s00365-023-09635-3

ISSN

0176-4276