Is In-Context Universality Enough? MLPs are Also Universal In-Context
Abstract
The success of transformers is often linked to their ability to perform
in-context learning. Recent work shows that transformers are universal
in-context, capable of approximating any real-valued continuous function of a
context (a probability measure over $\mathcal{X}\subseteq \mathbb{R}^d$) and a
query $x\in \mathcal{X}$. This raises the question: Does in-context
universality explain their advantage over classical models? We answer this in
the negative by proving that MLPs with trainable activation functions are also
universal in-context. This suggests that the success of transformers is likely
due to other factors, such as inductive bias or training stability.
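
As a minimal sketch of the notion of in-context universality used above (assuming the context enters the target as a probability measure and continuity is taken with respect to a standard topology on measures, e.g. weak convergence; the paper's precise setting may differ in detail), the target class consists of continuous maps
\[
  f \colon \mathcal{P}(\mathcal{X}) \times \mathcal{X} \to \mathbb{R},
  \qquad (\mu, x) \mapsto f(\mu, x),
\]
where $\mathcal{P}(\mathcal{X})$ denotes the probability measures on $\mathcal{X} \subseteq \mathbb{R}^d$, $\mu$ encodes the context (for instance, the empirical measure of the in-context examples), and $x$ is the query. A model family is universal in-context if it can approximate every such $f$ to arbitrary accuracy on a suitable compact subset of $\mathcal{P}(\mathcal{X}) \times \mathcal{X}$.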