Soda

PM | Crypto Pay | Fediverse

37 Followers

Recent casts

A deep dive into 2025 LLM architectures reveals a fascinating divergence beneath the surface of the standard Transformer block.

Attention Isn't Settled: GQA is the new baseline, but it's being challenged. DeepSeek's Multi-Head Latent Attention (MLA) compresses the KV cache for memory savings, while Gemma 3's sliding window attention prioritizes local-context efficiency.

The MoE Philosophy Split: The Mixture-of-Experts paradigm is fragmenting. The debate continues between many small experts (DeepSeek, Qwen3-Next) and fewer, wider ones (gpt-oss). The inconsistent use of a "shared expert" (present in GLM-4.5/Grok, absent in recent Qwen/gpt-oss) indicates no single best practice has been established.

The Return of Old Ideas: gpt-oss has revived attention biases, a feature largely abandoned post-GPT-2, and introduced learned "attention sinks" for stability, a contrast to the cleaner designs seen in models like Llama.

  • 0 replies
  • 0 recasts
  • 0 reactions
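The cast above mentions the shared-expert split in MoE designs. The sketch below is only an illustration of that idea, not any named model's actual code: the layer sizes, module names, and the naive per-expert routing loop are assumptions. It shows a top-k routed MoE block in PyTorch where an optional shared expert processes every token in addition to the routed ones.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEBlock(nn.Module):
    def __init__(self, d_model=256, d_ff=512, n_experts=8, top_k=2, use_shared_expert=True):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Shared expert: when enabled, it runs on every token regardless of routing.
        self.shared = (
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            if use_shared_expert else None
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                        # (B, S, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Naive dispatch: loop over routing slots and experts (clear, not fast).
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)              # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        if self.shared is not None:
            out = out + self.shared(x)                 # always-on shared path
        return out

x = torch.randn(2, 16, 256)
print(MoEBlock(use_shared_expert=True)(x).shape)       # torch.Size([2, 16, 256])

Flipping use_shared_expert off leaves all of a token's FFN capacity to the router, which is roughly the design trade-off the cast describes between the GLM-4.5/Grok and recent Qwen/gpt-oss choices.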

Has Lens considered users other than artists and e-beggars?

  • 0 replies
  • 0 recasts
  • 1 reaction

Top casts

The killer app is the "elephant in the room" of web3. From VCs to startups, everyone wants to build infrastructure with high valuations and high demand certainty. Even application developers prefer to copy already-validated demand across newly emerged L2 chains. Few dare to face uncertainty or pursue product innovation.

  • 0 replies
  • 0 recasts
  • 2 reactions

Curious what the Day 7 retention will be after Farcaster moves to permissionless.

  • 1 reply
  • 0 recasts
  • 1 reaction
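For the Day 7 retention question above, here is a hypothetical sketch of how that metric is commonly computed: of the users who sign up on a given day, the fraction who are active again exactly seven days later. The data and field names are illustrative, not Farcaster's actual schema.

from datetime import date, timedelta

signups = {                     # user -> signup date
    "alice": date(2024, 1, 1),
    "bob":   date(2024, 1, 1),
    "carol": date(2024, 1, 2),
}
activity = {                    # user -> set of days they cast or reacted
    "alice": {date(2024, 1, 8)},
    "bob":   {date(2024, 1, 3)},
    "carol": {date(2024, 1, 9)},
}

def day7_retention(cohort_day: date) -> float:
    # Cohort: everyone who signed up on cohort_day.
    cohort = [u for u, d in signups.items() if d == cohort_day]
    # Retained: members of that cohort active exactly 7 days later.
    retained = [u for u in cohort if cohort_day + timedelta(days=7) in activity.get(u, set())]
    return len(retained) / len(cohort) if cohort else 0.0

print(day7_retention(date(2024, 1, 1)))   # 0.5 -> alice returned on day 7, bob did not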

Onchain profile

Ethereum addresses