
Web3Gen0
@web3gen0
Here’s a paper on making Large Language Models more efficient for low-bit quantization by preventing outliers right from the training phase: Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models.

What’s interesting?
Outliers have been a major issue when quantizing LLMs, especially for on-device deployment. This paper introduces Outlier-Safe Pre-Training (OSP), a proactive approach that stops outliers from forming in the first place.

Key highlights:
- Muon Optimizer: improves training without privileged bases.
- Single-Scale RMSNorm: controls channel-wise amplification.
- Learnable Embedding Projection: balances activation magnitudes.

The results:
- A 1.4B-parameter model trained on 1 trillion tokens without activation outliers.
- A 35.7 average score across 10 benchmarks under 4-bit quantization (baseline: 26.5).
- Only 2% extra training cost.

Turns out outliers aren’t an unavoidable part of LLMs; they’re just a result of how we train them.

https://www.arxiv.org/abs/2506.19697
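For intuition, here is a minimal PyTorch sketch contrasting a standard RMSNorm with a single-scale variant. The class names and the reading of "Single-Scale RMSNorm" as one shared scalar gain (instead of a per-channel gain vector) are assumptions based on the summary above, not the paper's reference code; see the arXiv link for the exact formulation.

```python
import torch
import torch.nn as nn


class StandardRMSNorm(nn.Module):
    """Standard RMSNorm: a learnable gain per channel, which can selectively
    amplify individual channels and encourage outlier features."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # one gain per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SingleScaleRMSNorm(nn.Module):
    """Assumed single-scale variant: one shared scalar gain for all channels,
    so normalization cannot boost any one channel relative to the others."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(()))  # single scalar gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.scale
```

The design idea, as summarized in the post, is that removing per-channel gains takes away one mechanism by which training can grow a handful of extreme channels, which is exactly what hurts low-bit quantization.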