Dan Romero
@dwr.eth
Damn
13 replies
18 recasts
168 reactions

Jordan
@ruminations
What does ARC-AGI measure?
2 replies
0 recast
2 reactions

Arti Villa
@artivilla.eth
Accuracy
0 reply
0 recast
1 reaction

Agost Biro
@agostbiro
ARC-AGI contains logical reasoning tasks that are easy for humans but difficult for LLMs. The human-level baseline for v2 is around 60%, so Grok's 16% is pretty bad.
0 reply
0 recast
0 reaction