Kazani

@kazani

Epiplexity

How can next-token prediction on human text lead to superhuman skills? How can synthetic data sometimes beat "real" data? And how did AlphaZero learn so much from nothing but the rules of chess? Classic information theory seems to say this shouldn't happen. Yet it clearly does.

The problem is that traditional information theory assumes an observer with unlimited computing power. An unbounded observer can crack any code and reverse any function instantly. To them, a cryptographically encrypted message is "simple": they can recover the seed that generated it and so distinguish it from pure random noise. If you ignore time, ciphertext isn't "random"; it's the output of a short recipe plus a key. But if you can't afford the computation, it behaves like noise.

AI systems don't have infinite compute. They're bounded. And once time and compute matter, a new distinction appears:

- Time-bounded entropy (randomness): data that is computationally hard to predict. This includes true noise, but also things like encryption keys or complex hashes that look random to a neural network.
- Epiplexity (structure): patterns, abstractions, and rules that a model can actually learn and use to compress the data within a reasonable time.

The authors formalize it roughly like this:

1. Find the smallest model that can predict the data within a time limit.
2. The size of that model is the epiplexity. Whatever remains unpredictable is time-bounded entropy.

This resolves the paradox. Random noise has high entropy but low epiplexity: no amount of computing power helps you find a pattern, so the model learns nothing. A strategy game or a textbook, by contrast, has high epiplexity. It forces the model to build complex internal circuits (shortcuts and concepts) to predict the data efficiently.

A neat example from the paper: training a model to predict chess moves is standard. But training it to predict the game in reverse (inferring moves from the final board) is computationally harder. That difficulty forces the model to learn deeper representations of the board state (higher epiplexity), which actually improves its performance on new, unseen chess puzzles. The computation "created" information by converting the implicit consequences of the rules into explicit, usable structure that the model can now use to play well.

In summary: the value of data isn't just about how unpredictable it is. It's about how much reusable structure it induces in a learner with real-world limits. Epiplexity is the amount of structure that is worth learning, because it reduces prediction error enough to justify the added model complexity under a time limit.

Read the paper: https://arxiv.org/abs/2601.03220
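You can get a feel for the entropy/epiplexity split with a toy experiment. This is my own analogy, not the paper's formalism: I use an off-the-shelf compressor (zlib) as a crude stand-in for a compute-bounded learner. Structured data lets the bounded "learner" find a short description; cryptographic-quality randomness does not, no matter that an unbounded observer could in principle invert the generator.

```python
import os
import zlib

def bounded_compression_ratio(data: bytes) -> float:
    """Compressed size / original size.

    zlib here is a crude proxy for a compute-bounded learner: it can
    only exploit patterns it can find quickly, which is the spirit of
    the time-bounded setting (assumption: this analogy is mine, not
    the paper's definition).
    """
    return len(zlib.compress(data, level=9)) / len(data)

# Structured data: a simple rule repeated many times (an opening line,
# purely illustrative). Low time-bounded entropy, pattern is findable.
structured = b"e4 e5 Nf3 Nc6 Bb5 a6 " * 200

# "Random" data: os.urandom output looks like pure noise to any
# bounded observer, even though it came from a deterministic process.
noise = os.urandom(len(structured))

print(bounded_compression_ratio(structured))  # well below 1: structure found
print(bounded_compression_ratio(noise))       # about 1: nothing to learn
```

The compressor "learns" the repeated pattern and shrinks the structured data dramatically, while the noise stays essentially incompressible. Epiplexity asks the analogous question about models instead of compressors: how much reusable structure can a bounded learner actually extract?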