gcmac.eth on Farcaster

Giuliano Giacaglia 🌲

Anthropic just announced Claude Opus 4 and Claude Sonnet 4! They lead on SWE-bench (72.5%), which tests practical software engineering skills, and on Terminal-bench (43.2%). https://www.anthropic.com/news/claude-4

gcmac.eth

What are your thoughts on the parallel execution for benchmarks? Makes it hard to compare to prior models imo
