Chaos 🎩 on Farcaster

https://opensea.io/collection/science-14

@multifractal.eth

"Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad": all evaluated LLMs performed very poorly, with even the best-performing model achieving an average accuracy of less than 5%. https://arxiv.org/abs/2503.21934
