On the geometry and topology of representations: the manifolds of modular addition
In brief:
The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield distinct circuits for…
In this work, we show that this is not the case, and that both uniform attention and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent…
About this source
Source: arXiv cs.LG
Type: Research Preprint
Credibility: Peer-submitted research paper on arXiv