"Not Another Manifold"
When people hear 'Deep Manifold,' their first reaction is usually, 'Great, not another manifold...
The manifold structure implied by modern neural networks defies classical mathematical intuition: it is neither globally defined nor locally consistent in the traditional sense, instead exhibiting nested hierarchies and paralogical transitions across layers. Within this structure, there is no 'memory' or 'reasoning' in the cognitive or symbolic sense; rather, what emerges is an Intrinsic Pathway, a mathematically plausible construct that encodes continuity, transformation, and convergence in the high-order nonlinear space of representation.
Classical manifold theory, except in its piecewise form, is fundamentally incompatible with neural networks. Traditional manifolds require a pre-defined structure, neural network manifolds are formed through training. Furthermore, neural networks are not purely forward mappings; they embody a combined forward-inverse dynamic representations.
In approaching such systems, we must resist the temptation to 'overfit' a mathematical framework to match a problem. Instead, we must start with the right question. For neural networks, the foundational question is: What is the nature of the relationship between tokens? Are these relationships linear, low-order nonlinear, high-order nonlinear, or discontinuous? Answering this determines the appropriate mathematical and representational models, rather than assuming them a priori.
Recently, A new mathematical proof shows that dimension 126 can support rare, twisted geometries , fascinatingly, that echoes our intuition about certain transformer embedding sizes... In theory, transformers, with their large embedding spaces, should be well-suited to capture such complexity. But in practice, the sheer size of these embeddings often becomes a liability. Neural networks can be understood as **piecewise** manifolds; transformers go even further, shaping **pointwise** manifolds. While such representations are flexible enough to bend into highly nonlinear, even discontinuous, forms in high-dimensional space, they are notoriously difficult to manage. We’ve seen something similar before, in the 1990s (was part of my PhD), when overly complex function spaces outpaced our ability to train or control them effectively.
see The Mathematical Lineage of Deep Manifold
In our view, neural networks embody 大智若愚—"Great wisdom appears like foolishness.", even “worse” than 大道至简 ("Great wisdom lies in simplicity"). We’ve found that neural networks often operate in ways that seem mathematically primitive, even counterintuitive, yet they reveal surprising depth and power beneath that apparent simplicity.



