Dan Romero
@dwr.eth
Damn
Jordan
@ruminations
What does ARC-AGI measure?
Arti Villa
@artivilla.eth
Accuracy
Agost Biro
@agostbiro
ARC-AGI contains logical reasoning tasks that are easy for humans but difficult for LLMs. The human-level baseline for v2 is around 60%, so Grok's 16% is pretty bad.