Connor McCormick ☀️ on Farcaster

https://warpcast.com/~/channel/worldcoin

ted (not lasso)

sharing a few speculative thoughts / questions on what happens if the @worldcoin app becomes a social app, after reading this @bankless piece by @robinson.

half-baked hypothesis:
1. it would shift World from identity-focused to content-focused, which is *hard*
2. user behavior, engagement, and content become highly valuable training data for OpenAI (if users consent to it; opt-in should be imperative here)
3. then, in theory, $WLD could be the mechanism to compensate users for providing training data (in contrast to Reddit selling user data while users get no cut whatsoever)

open questions i can't stop thinking about:
1. how do you measure which data is most valuable? not all data is equal
2. who gets to choose who the data is sold to, and at what price?
3. does incentivizing data actually warp the data? a la Goodhart's law: "when a measure becomes a target, it ceases to be a good measure"

https://www.bankless.com/read/world-openai-social-network

Connor McCormick ☀️

Re. 2) You can do dropout on data to find its information-contribution value. What I would do is train a secondary model to estimate the dropout value of data, then allow data holders to pay to dispute their value attribution. When a funding threshold is reached, the model is ablation-tested to find the true value of their data. Set up the funding mechanism so that accurate error detection is incentivized (i.e. if the ablation test returns a large delta from the secondary model's estimate, most of the funding is returned).

With this approach, a huge amount of compute would be allocated to learning which categories of data are informative. That could not only reward contributors accurately but also direct data-generation efforts. It would be largely robust to your concerns about Goodharting (at least as robust as the rest of the world is; today we'd just call the reflexive parts "trends").

In some sense you'd be learning the derivative of data value, which gives much richer data plus credible neutrality.
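A minimal sketch of the mechanism in the reply above, under loud assumptions: "dropout on data" is read here as leave-one-contributor-out ablation, the "model" is a toy least-squares line, the secondary model is stubbed out as a fixed `estimate` rather than trained, and the names `ablation_value`, `Dispute`, `resolve`, plus the refund rule `delta / (delta + scale)`, are all hypothetical. Nothing here is something Worldcoin or OpenAI has specified.

```python
# Hypothetical sketch: ablation-based data valuation plus a funded dispute rule.
# The toy model, all names, and the refund formula are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


def fit_line(points):
    """Toy model: ordinary least squares fit of y = a*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def mse(model, points):
    """Mean squared error of the fitted line on held-out points."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


def ablation_value(contributors, name, validation):
    """Validation loss without `name`'s data minus loss with it.

    Positive means the contributor's data helped; negative means it hurt.
    """
    everything = [p for pts in contributors.values() for p in pts]
    without = [p for who, pts in contributors.items() if who != name for p in pts]
    return mse(fit_line(without), validation) - mse(fit_line(everything), validation)


@dataclass
class Dispute:
    contributor: str
    estimate: float      # value predicted by the (stubbed, hypothetical) secondary model
    funded: float = 0.0  # money pooled to pay for the ablation test


def resolve(dispute, threshold, contributors, validation, scale=1.0) -> Optional[tuple]:
    """Run the expensive ablation only once funding reaches `threshold`.

    Returns (true_value, refund), or None if still underfunded. The assumed
    refund fraction delta / (delta + scale) approaches 1 as the estimate's
    error grows, so disputes that expose a large misestimate get most of
    their funding back.
    """
    if dispute.funded < threshold:
        return None
    true_value = ablation_value(contributors, dispute.contributor, validation)
    delta = abs(true_value - dispute.estimate)
    refund = dispute.funded * delta / (delta + scale)
    return true_value, refund


contributors = {
    "alice": [(0, 1), (1, 3), (2, 5), (3, 7)],  # lies exactly on y = 2x + 1
    "bob": [(0, 5), (4, 1)],                    # lies on y = -x + 5
}
validation = [(1, 3), (2, 5), (4, 9)]           # held-out points on y = 2x + 1

for name in contributors:
    print(name, round(ablation_value(contributors, name, validation), 3))

dispute = Dispute("alice", estimate=2.0, funded=4.0)
print(resolve(dispute, 10.0, contributors, validation))  # None: below threshold
dispute.funded = 12.0
print(resolve(dispute, 10.0, contributors, validation))
```

Running it should show a positive value for alice's data and a negative one for bob's, since bob's two points pull the fit away from the held-out line. The saturating refund was picked only so the payback grows with the size of the exposed error without ever exceeding the amount funded; any monotone rule would match the description in the reply.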
