Kazani

@kazani

How LLMs Actually Work

An LLM is basically a machine that reads text and guesses the next word, over and over, really fast.

• Models can't work with letters directly, so the text gets chopped into chunks called tokens ("tokenization"), and each chunk gets an ID number
• Each ID gets looked up in a giant table that turns it into a long list of numbers (a "vector," also called an embedding)
• During training, words with similar meanings end up with similar vectors: "king" sits near "queen," "Paris" near "France"
• Tracking word order. The vector for "dog" is the same whether it's the first or fifth word, which is a problem since order changes meaning. So the model adds position information to each vector
• Letting words talk to each other ("attention"). Each word looks at the other words and decides which ones matter to it. In "The cat that I saw yesterday was sleeping," when the model hits "was," attention figures out that "cat" is the thing doing the sleeping, not "yesterday." It does this by scoring how well words match and focusing on the strong matches
• Each layer runs attention many times in parallel (often dozens), and each copy is called a "head": one might track grammar, another which pronoun refers to which name, etc. Across all its stacked layers, a big model has thousands of these heads
• After words share info, each word gets processed individually by a small feed-forward network. This is where much of the model's stored "knowledge" is thought to live; facts like "Paris is France's capital" are largely baked into these layers
• Producing the next word. At the end, the model turns its final numbers into a score for every possible next token, converts those scores into probabilities, and picks one. A "temperature" setting controls how predictable vs. creative the pick is. Then it adds that token to the text and runs the whole loop again
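To make the tokenize, look-up, and "similar vectors" bullets concrete, here is a minimal Python sketch. It assumes a tiny made-up vocabulary that splits on spaces (real tokenizers such as BPE split text into learned sub-word pieces) and hand-picked 3-number vectors; the names `VOCAB`, `EMBEDDINGS`, `tokenize`, `embed` and `cosine` are hypothetical, and none of the numbers come from a real model.

```python
import math

# Toy vocabulary (hypothetical): real tokenizers learn tens of thousands of
# sub-word chunks; here each whole lowercase word is one token.
VOCAB = ["<unk>", "the", "king", "queen", "paris", "france", "dog"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}

# The "giant table": row i is the vector for token ID i. These 3-number
# vectors are hand-picked for illustration, NOT learned values. The three
# slots loosely mean [royalty, geography, animal].
EMBEDDINGS = [
    [0.1, 0.1, 0.1],  # <unk>
    [0.2, 0.2, 0.2],  # the
    [0.9, 0.1, 0.0],  # king
    [0.8, 0.2, 0.1],  # queen
    [0.1, 0.9, 0.0],  # paris
    [0.2, 0.8, 0.0],  # france
    [0.0, 0.1, 0.9],  # dog
]


def tokenize(text: str) -> list[int]:
    """Chop text into word chunks and map each to its ID (unknown words -> 0)."""
    return [TOKEN_ID.get(word, 0) for word in text.lower().split()]


def embed(ids: list[int]) -> list[list[float]]:
    """Look up each token ID's row in the embedding table."""
    return [EMBEDDINGS[i] for i in ids]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(tokenize("The king sat"))  # [1, 2, 0]  ("sat" is not in the vocabulary)
king, queen, paris = embed(tokenize("king queen paris"))
print(round(cosine(king, queen), 2))  # 0.98
print(round(cosine(king, paris), 2))  # 0.22
```

In a trained model the "king near queen" effect comes out of training rather than being typed in by hand; the toy table just shows what "near" means numerically.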
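The word-order bullet can be sketched with the sinusoidal position vectors from the original 2017 Transformer paper ("Attention Is All You Need"). This is only one scheme; many recent models use others (rotary embeddings, for example), and the post doesn't say which one it means. The `dog` vector and the helper names below are hypothetical.

```python
import math


def positional_encoding(pos: int, dim: int) -> list[float]:
    """Sinusoidal position vector: even slots use sin, odd slots use cos,
    with frequencies that shrink geometrically across the slots."""
    vec = []
    for i in range(dim):
        # Slots 2k and 2k+1 share the frequency 1 / 10000^(2k/dim).
        angle = pos / (10000 ** ((i - i % 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec


def add_positions(vectors: list[list[float]]) -> list[list[float]]:
    """Add each word's position vector to its meaning vector, element by element."""
    return [
        [x + p for x, p in zip(vec, positional_encoding(pos, len(vec)))]
        for pos, vec in enumerate(vectors)
    ]


dog = [0.0, 0.1, 0.9, 0.0]  # hypothetical 4-number vector for "dog"
filler = [0.0, 0.0, 0.0, 0.0]
# The same "dog" vector as word 1 and word 5 of a five-word sentence:
encoded = add_positions([dog, filler, filler, filler, dog])
print(encoded[0] == encoded[4])  # False: position now changes the vector
```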
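Here is a rough sketch of a single attention head, written as scaled dot-product attention (the form used in standard Transformers). To keep it short it skips the learned query/key/value matrices a real model applies first, and the 2-number vectors for "cat", "yesterday" and "was" are invented so the example behaves the way the post describes.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for a single head.

    For each query vector: score it against every key (dot product divided by
    sqrt(dimension)), softmax the scores into weights, and return the weighted
    average of the value vectors, along with the weights themselves.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical vectors. A real model would first multiply each one by learned
# query/key/value matrices; this sketch uses the vectors directly.
words = ["cat", "yesterday", "was"]
x = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
_, weights = attention(x, x, x)
was_weights = dict(zip(words, weights[2]))  # how much "was" attends to each word
print(max(["cat", "yesterday"], key=was_weights.get))  # cat
```

A multi-head layer would run several copies of this, each with its own learned matrices, and join their outputs.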
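The "processed individually" step is usually a small two-layer network applied to every word's vector with the same weights. This sketch uses ReLU (as the original Transformer did; many newer models use variants such as GELU) and made-up weights `W1`, `B1`, `W2`, `B2`. It only shows the per-word structure, not how knowledge ends up stored in those weights.

```python
# Hypothetical weights for a tiny feed-forward network: 2 inputs -> 3 hidden
# units -> 2 outputs. Real models learn these and use far larger sizes.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
B1 = [0.0, 0.1, -0.5]
W2 = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
B2 = [0.0, 0.0]


def relu(x: float) -> float:
    """Keep positive values, zero out negative ones."""
    return max(0.0, x)


def feed_forward(vec: list[float]) -> list[float]:
    """Two-layer network applied to ONE word vector: expand, ReLU, project back."""
    hidden = [relu(sum(w * x for w, x in zip(row, vec)) + b) for row, b in zip(W1, B1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, B2)]


def feed_forward_layer(vectors: list[list[float]]) -> list[list[float]]:
    """Same weights, applied to each word's vector independently."""
    return [feed_forward(v) for v in vectors]


sentence = [[1.0, 0.0], [0.2, 0.9], [0.5, 0.5]]
out = feed_forward_layer(sentence)
# Each word's output depends only on its own vector, not on its neighbors:
print(out[1] == feed_forward([0.2, 0.9]))  # True
```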
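The last bullet (score every word, turn scores into probabilities with a temperature, pick one, append, repeat) can be sketched like this. It assumes a hypothetical `toy_model` that only looks at the last word through a hand-written score table, standing in for the whole network; `probabilities` is a softmax over the scores divided by the temperature, and `generate` is the append-and-repeat loop.

```python
import math
import random

# Hypothetical stand-in for the whole network: given the text so far, return a
# score ("logit") for every word in a tiny vocabulary. A real model computes
# these scores with all the layers above; this toy only looks at the last word.
VOCAB = ["the", "cat", "was", "sleeping", "."]
NEXT_SCORES = {
    "the": [0.0, 3.0, 0.5, 0.5, 0.0],
    "cat": [0.0, 0.0, 3.0, 0.5, 0.5],
    "was": [0.5, 0.0, 0.0, 3.0, 0.5],
    "sleeping": [0.0, 0.0, 0.0, 0.0, 3.0],
    ".": [3.0, 0.0, 0.0, 0.0, 0.0],
}


def toy_model(words: list[str]) -> list[float]:
    return NEXT_SCORES[words[-1]]


def probabilities(logits: list[float], temperature: float) -> list[float]:
    """Softmax with temperature (must be > 0): divide scores by T, then normalize.
    T < 1 sharpens the distribution (more predictable), T > 1 flattens it."""
    scaled = [s / temperature for s in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt: list[str], steps: int, temperature: float, seed: int = 0) -> list[str]:
    """Autoregressive loop: score, convert to probabilities, pick, append, repeat."""
    rng = random.Random(seed)
    words = list(prompt)
    for _ in range(steps):
        probs = probabilities(toy_model(words), temperature)
        next_word = rng.choices(VOCAB, weights=probs, k=1)[0]
        words.append(next_word)
    return words


print(generate(["the"], steps=4, temperature=0.1))  # ['the', 'cat', 'was', 'sleeping', '.']
```

With a low temperature the top-scoring word wins almost every time; raising `temperature` flattens the probabilities so less likely words get picked more often. A temperature of exactly 0 would divide by zero in this sketch; elsewhere that setting is usually treated as "always take the highest score."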
A paragraph is just this loop running word by word.

What makes models different is mostly (1) what text they were trained on, (2) size and configuration choices, and (3) the fine-tuning done afterward to make them follow instructions and behave.

TLDR: an LLM converts text to numbers, lets those numbers repeatedly compare and update each other through stacked layers, and uses the result to guess the next word, and "intelligence" is what emerges from doing that at massive scale.

https://www.0xkato.xyz/how-llms-actually-work/