Dan Romero on Farcaster

13 replies

18 recasts

168 reactions

eggman 🔵 pfp

Honestly, I’d put very little stock in this. Grok is mostly trained off ChatGPT output, and very specifically trained on benchmark questions to game its own “results” in intelligence tests. Wait for it to hit lm-arena and for blind user test scores to rank it. If the above is true, it’ll blitz *every* other model in *every* category. If it doesn’t, and instead lands behind OpenAI/Anthropic/Google again, then it’s just another gamed benchmark.

1 reply

0 recast

5 reactions

@joshisdead.eth

Grok is powerful. Elon really doesn't joke around.

0 reply

0 recast

4 reactions

Jordan pfp

What does ARC-AGI measure?

2 replies

0 recast

2 reactions

El pfp

0 reply

0 recast

1 reaction

Jay Brower (jaymothy.eth)

How well do actual humans do on these tests? Always curious how the Y axis works on these graphs 😂

0 reply

0 recast

1 reaction

David👻 pfp

What does ARC mean?

0 reply

0 recast

0 reaction

Ghost 🎩 pfp

Funny name: “Humanity’s Last Exam”

0 reply

0 recast

0 reaction

STAYFOCUSED pfp

Clean database asf

0 reply

0 recast

0 reaction

Mayor | UI/UX Designer

What does ARC mean?

0 reply

0 recast

0 reaction

xR0am | tip.md pfp

My exact thought when I saw this, but then I went to fact-check: they are only using ARC-AGI-2. When you add ARC-AGI-1, the scale is different. https://arcprize.org/leaderboard

0 reply

0 recast

0 reaction

Renatov 🎩 Ⓜ️

GM Dan! Best day begins!

0 reply

0 recast

0 reaction

Cryptogirl pfp

0 reply

0 recast

0 reaction

noice pfp

https://app.noice.so/?castHash=0x4a4cda59bf4bd12770b597dcd98b97bdf223e7a8&timestamp=1752145207733

0 reply

0 recast

0 reaction