Given that Facebook and Meta’s other platforms are among the largest distributors of such material, it’s unfortunately not exactly a surprise that it turns up if they scrape Facebook for data…
How are you supposed to train the damn thing to detect something without using that thing, though?
There are various organizations with clearances to handle child abuse imagery, where the data is handled like the plutonium it is and everybody is vetted. I’m sure they’ve already experimented with developing a bot to detect such images.
They even make their traditional hash databases available to server admins who want to check uploaded images against them.
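For anyone unfamiliar with how those hash checks work, here’s a minimal sketch. Real services use proprietary perceptual hashes (e.g. PhotoDNA) that survive resizing and re-encoding; plain SHA-256 and the sample entry below are stand-ins purely to illustrate the lookup workflow:

```python
import hashlib

# Hypothetical hash list: in practice these entries come from the
# organization's database, and the hash function is perceptual,
# not a cryptographic digest like SHA-256.
known_hashes = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def is_flagged(data: bytes) -> bool:
    """Return True if the file's hash appears in the known-hash set."""
    return hashlib.sha256(data).hexdigest() in known_hashes

# An admin would run every uploaded file through this check.
print(is_flagged(b""))       # sample entry above is the empty-input digest
print(is_flagged(b"hello"))  # unknown content is not flagged
```

The point is that matching against known hashes is cheap and doesn’t require anyone outside the vetted organization to ever see the material itself.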
The issue, as the report states, is that nobody will willingly link said bot/database to the training data, because they either don’t want a copyright fight or don’t want to acknowledge the problem.
It’s about generators, which certainly should not be trained on such material.