Giuliano Giacaglia π²
@giu
We're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning. What's the RLHF equivalent of AlphaGo's reward model? It's what I call a vibe check. Imagine training AlphaGo with RLHF: you'd give two people two boards and ask, which one do you prefer?
4 replies
1 recast
21 reactions
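The "which one do you prefer?" setup in the post is the pairwise preference comparison behind RLHF reward models, usually formalized with the Bradley-Terry model. Below is a minimal sketch of that idea; the linear reward model, feature vectors, and toy data are all illustrative assumptions, not anyone's actual implementation.

```python
import numpy as np

# Sketch of a Bradley-Terry pairwise reward model (the formulation
# commonly used for RLHF preference training). Everything concrete
# here (linear model, 2-D features, toy pairs) is illustrative.

def reward(w, x):
    # Scalar reward for one outcome (a board state or a model response).
    return float(w @ x)

def preference_prob(w, x_a, x_b):
    # P(annotator prefers A over B) = sigmoid(r(A) - r(B)).
    return 1.0 / (1.0 + np.exp(-(reward(w, x_a) - reward(w, x_b))))

def train(pairs, dim, lr=0.5, epochs=200):
    # pairs: list of (preferred, rejected) feature vectors.
    w = np.zeros(dim)
    for _ in range(epochs):
        for x_win, x_lose in pairs:
            p = preference_prob(w, x_win, x_lose)
            # Gradient ascent on log p: d/dw log p = (1 - p)(x_win - x_lose).
            w += lr * (1.0 - p) * (x_win - x_lose)
    return w

# Toy preference data: annotators prefer outcomes with a larger first feature.
pairs = [(np.array([1.0, 0.2]), np.array([0.1, 0.9])),
         (np.array([0.8, 0.1]), np.array([0.2, 0.5]))]
w = train(pairs, dim=2)

good, bad = np.array([0.9, 0.0]), np.array([0.0, 0.9])
print(reward(w, good) > reward(w, bad))  # → True: the learned reward ranks preferred outcomes higher
```

The point of the comparison in the post: AlphaGo's reward is a ground-truth win/loss signal, while a reward model learned from comparisons like this only captures which outcome annotators liked more.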
ted (not lasso)
@ted
this taught me a lot about AI, i hope you keep posting AI content -- have you thought about cutting your clips and uploading to @10kdotworld.eth? or can you start a new AI channel where i can follow your thinking? also 4500 $degen
2 replies
0 recast
0 reaction
yangwao β
@yangwao
Waiting until we have an LLM with Asperger's
0 reply
0 recast
0 reaction
T.Berry
@ttberry
Wishing to receive a gift from you once in my life
0 reply
0 recast
0 reaction