Dan Romero
@dwr
Wonder if ChatGPT will be the last major model trained on the open web? Could robots.txt files start specifically disallowing crawling by LLMs unless the site gets paid for the data?
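A minimal sketch of what such an opt-out might look like: a robots.txt that blocks one crawler by user agent while leaving all others free, checked with Python's standard-library `urllib.robotparser`. `ExampleLLMBot` and `SomeSearchBot` are hypothetical user-agent names used only for illustration, not real crawlers. Keep in mind that robots.txt is advisory: it only stops crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one made-up LLM crawler ("ExampleLLMBot")
# from the whole site, and allow every other crawler everywhere.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return whether a robots.txt-compliant crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/article"
    print(can_crawl("ExampleLLMBot", url))  # False: blocked by its own entry
    print(can_crawl("SomeSearchBot", url))  # True: falls through to "*"
```

Because the rules are matched per user agent, a site could block specific AI crawlers this way without affecting ordinary search indexing.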
phil
@phil
I don’t think so. If model sizes keep increasing, I would expect GPT-4 and GPT-5 to also be trained on a similar corpus, with better results. What ~might~ happen is that new webpages get protection against this kind of scraping. That's hard to do retroactively, since the existing data has probably already been cached.