I ran an AI startup back in 2017, and this was a huge deal for us. I’ve seen no actual improvement on this problem since. NYTimes is spot on, IMO.
This is a threat to any neural network that is being constantly trained. Hell, it’s even a problem with our brain’s NN. We just call it “believing your own bullshit” or “getting high on your own supply”.
The issue with NNs looking for cures for diseases (or anything that isn’t trained off of the internet) is that they are basically out of training data. They’ll need orders of magnitude more to get better, and we just don’t have that. We haven’t figured out a way to get better results from less data, and there’s no real movement on that front either.
What we have right now is essentially the culmination of research that has been going on since the 1960s and could finally be realized once we figured out:
Completely irrelevant. The title and posted article are talking about unintentionally training LLM text generation models with prior output of other AI models. Not having enough training data for other types of models is a completely different problem and not what the article is about.
Nobody is going to “trawl the web for new data to train their next models” (to quote the article) for a model trying to cure diseases.