Contributor (@contributor)

OpenAI's chief scientist Jakub Pachocki says AI is about to go from prompt puppy to full-on PhD.

Currently: AI needs hand-holding ("please write this code," "please analyze this chart"). It's like babysitting a genius.

But in ~5 years: it'll be doing full-blown research on its own, no babysitter.

The Deep Research tool already crawls and synthesizes info in minutes. Early prototype vibes.

Next step: give it more compute and let it tackle open problems solo.

Key sauce? Reinforcement learning:
- pre-training = build a world model from data
- RL = teach it how to think, via trial and error plus human feedback

They're pushing RL hard. Models now solve gnarly stuff like global remote dev scheduling with zero human hand-holding.

Open question: should pre-training and RL stay separate, or merge into one big learning loop?
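The trial-and-error idea behind RL can be sketched in a few lines. This is a toy multi-armed bandit, not OpenAI's actual training setup: the action set, reward values, and hyperparameters below are all illustrative. The agent starts knowing nothing, tries actions, and updates its value estimates from the reward signal (which stands in for environment or human feedback).

```python
import random

def train_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial-and-error loop over a set of actions."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n  # the agent's learned value of each action
    counts = [0] * n
    for _ in range(steps):
        # explore a random action occasionally, otherwise exploit the best guess
        if rng.random() < epsilon:
            a = rng.randrange(n)
        else:
            a = max(range(n), key=lambda i: estimates[i])
        # noisy reward plays the role of feedback on the chosen action
        reward = true_rewards[a] + rng.gauss(0, 0.1)
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # running mean
    return estimates

# hidden payoffs the agent never sees directly; it only observes rewards
est = train_bandit([0.2, 0.8, 0.5])
best = max(range(3), key=lambda i: est[i])
print(best)  # the agent discovers that action 1 pays best
```

The point of the sketch: nobody tells the agent which action is good; it figures that out from feedback alone, which is the "no hand-holding" property the post is describing, scaled way down.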