Jordan on Farcaster

Dan Romero

Jordan

What does ARC-AGI measure?

Arti Villa

Agost Biro

ARC-AGI consists of logical reasoning tasks that are easy for humans but difficult for LLMs. The human baseline for v2 is around 60%, so Grok's 16% is pretty bad.