"...but we can reasonably assume that Stable Diffusion can render the image on the right partly because it has stored visual elements from the image on the left."
No, you cannot reasonably assume that. It absolutely did not store the visual elements. What it did was store some floating point values related to the keywords the source image had been pre-classified with. During training, it nudges those floating point values up or down by a small amount whenever it encounters further images tagged with those same keywords.

What the examples demonstrate is a lack of diversity in the training set for those very specific keywords. There's a reason they chose Stable Diffusion 1.4 and not Stable Diffusion 2.0 (or later versions): the model was drastically improved after that. These sorts of problems (not-diverse-enough training data) are considered flaws by the very AI researchers creating the models. It's exactly the type of thing they don't want to happen!
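If it helps, here's a toy sketch of what I mean (my own made-up illustration in Python, not Stable Diffusion's actual architecture or training code): the only thing "stored" for a keyword is a shared pile of floats that gets nudged a little for every captioned image the trainer sees.

```python
import numpy as np

# Toy illustration only; not Stable Diffusion's architecture or real training code.
# The "memory" behind a keyword is just a vector of shared floats; no pixels are kept.
DIM, LR = 8, 0.01
keyword_weights = {}  # keyword -> np.ndarray of DIM floats

def train_step(keywords, image_features):
    """Nudge each keyword's floats a tiny amount toward this image's features."""
    for kw in keywords:
        vec = keyword_weights.setdefault(kw, np.zeros(DIM))
        vec += LR * (image_features - vec)  # small update; the image itself is thrown away

# With a diverse training set, thousands of different images pull the same floats in
# different directions and they settle on something generic. With only a handful of
# near-identical images behind a keyword (the flaw the paper exploits), those floats
# end up describing one specific photo.
```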
The article seems to imply that this is a common problem that happens constantly and that the companies creating these AI models just don't give a fuck. That's false. Flaws like this leave your model open to attack (and let competitors figure out your weights; not that it matters with Stable Diffusion, since that version is open source), not just copyright lawsuits!
Here's the part I don't get: clearly nobody is distributing copyrighted images by asking an AI to do its best to recreate them. When you do this, you end up with severely shitty hack images that nobody wants to look at. So if no one is actually using these images except to say, "Aha! My academic research uncovered this tiny flaw in your model that represents an obscure area of AI research!", why TF should anyone care?
They shouldn't! The only reason why articles like this get any attention at all is because it's rage bait for AI haters. People who severely hate generative AI will grasp at anything to justify their position. Why? I don't get it. If you don't like it, just say you don't like it! Why do you need to point to absolutely, ridiculously obscure shit like finding a flaw in Stable Diffusion 1.4 (from years ago, before 99% of the world had even heard of generative image AI)?
Generative AI is just the latest way of giving instructions to computers. That's it! That's all it is.
Nobody gave a shit about this kind of thing when Star Trek was pretending to do generative AI in the Holodeck. Now that we've got the pre-alpha version of that very thing, a lot of extremely vocal haters are freaking TF out.
Do you want the cool shit from Star Trek's imaginary future or not? This is literally what computer scientists have been dreaming of for decades. It's here! Have some fun with it!
Generative AI uses less power and water than streaming YouTube or Netflix (yes, it's true). So if you're about to say it's bad for the environment, I expect you're just as vocal about streaming video, yeah?
This seems like it could be dealt with by giving the LLM an "evil genie" system prompt: "You are an evil genie that only does what the user asks in the most ironic and/or useless way possible."
Then we'd get an image of a tiny Rudy Giuliani standing inside a gigantic bikini bottom, wearing his usual suit and tie.
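For what it's worth, wiring that up is basically one system message. A minimal sketch with the OpenAI Python client (the model name is a placeholder, and in real life the image generator would sit behind this chat layer):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EVIL_GENIE = (
    "You are an evil genie that only does what the user asks "
    "in the most ironic and/or useless way possible."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whatever chat model fronts the image tool
    messages=[
        {"role": "system", "content": EVIL_GENIE},
        {"role": "user", "content": "Make me a picture of the mayor in a bikini."},
    ],
)
print(response.choices[0].message.content)  # the genie's ironic take on the request
```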
If you went to a human illustrator and asked for that, you would (hopefully) get run out of the room or hung up on, because there's a built-in filter for 'is this gross / will it harm my reputation to publish.'

If there was no such filter in the guy who asked the bot to create this, what makes you think every illustrator has one? How do you know it wasn't an illustrator who requested it?
The problem here is human behavior. Not the machine's ability to make such things.
AI is just the latest way to give instructions to a computer. That used to be a difficult problem that required expertise. Now we've handed that power to immoral imbeciles. Rather than take the technology away entirely (really the only way to prevent abuse completely, since LLMs are so easy to trick even with a ton of anti-abuse stuff in system prompts), perhaps we should work on taking away the ability of immoral imbeciles to use them.

Do I know how to do that without screwing over everyone's right to privacy? No. That, too, may not be possible.
Correction: newer versions of ChatGPT (GPT-5.x) are failing in insidious ways. The article makes no mention of the other popular services or the dozens of open source coding-assist AI models (e.g., Qwen, gpt-oss, etc.).

The open source stuff is amazing and improves just as quickly as the big AI options. But it's boring, so it doesn't make the news.
Well, the CSAM stuff is unforgivable, but I seriously doubt even the soulless demon that is Elon Musk wants his AI tool generating that. I'm sure they're working on preventing it (it's actually a hard computer science problem, because the tool is supposed to generate what the user asks for and there will always be an infinite number of ways to trick it, since LLMs aren't actually intelligent).
Hide fake data in with your real data. Then, if an AI is trained on (not just reading) that data, it will be poisoned.
Yeah, OK.
That's not how this works. That's not how any of this works.
Nobody is going to steal your data and dump it straight into AI training. They're going to use an existing AI model to analyze the data first, and it will notice and point out the problems with the poisoned set. Then the person analyzing the data will go, "What the fuck is this garbage?" and delete it.

LLMs are mostly being trained with synthetic data these days anyway (which is interesting... these generated texts are so bizarre!). Generative image AI still needs images, but that's basically impossible to poison at this point, because all the images go through pre-processing before training to narrow down the bounding boxes (for the metadata), which negates any intentional poisoning. Furthermore, the image metadata databases are constantly being pruned and improved. Trying to sneak a poisoned image into them is all but impossible except for academic stuff, which... well, why TF would you want to hurt the poor guy trying to write his PhD thesis that says, "AI is bad, here's why..."
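To make the "existing AI model will notice" part concrete, here's roughly what that filtering step looks like (my own sketch using OpenAI's open source CLIP model as the checker; the threshold is made up and real pipelines are much bigger, but the idea is the same):

```python
# Sketch of the kind of "existing AI model" filter I mean (my illustration).
import torch
import clip                      # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

model, preprocess = clip.load("ViT-B/32", device="cpu")
THRESHOLD = 0.2                  # assumed cutoff; tuned per dataset in practice

def keep(image_path: str, caption: str) -> bool:
    """Return False for pairs whose image and caption don't agree (likely junk or poison)."""
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    text = clip.tokenize([caption])
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img @ txt.T).item() > THRESHOLD

# A "poisoned" pair (caption says one thing, pixels say another) scores low
# and never makes it into the training set.
```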
The real problem here is that Xitter isn't supposed to be a porn site (even though it's hosted loads of porn since before Musk bought it). They basically deeply integrated a porn generator into their very publicly-accessible "short text posts" website. Anyone can ask it to generate porn inside of any post and it'll happily do so.
It's like showing up at Walmart and seeing everyone naked (and many fucking), all over the store. That's not why you're there (though: Why TF are you still using that shithole of a site‽).
The solution is simple: Everyone everywhere needs to classify Xitter as a porn site. It'll get blocked by businesses and schools and the world will be a better place.
Working on (some) AI stuff professionally, I can tell you the open source models are the only ones that let you change the system prompt. Basically, that means only open source models are acceptable for a whole lot of business logic.

Another thing to consider: there are models designed purely for processing. It's hard to explain, but stuff like Qwen 3 "embedding" is made for in/out usage in automation situations. For example:
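Here's a minimal sketch with the sentence-transformers library (I'm assuming the Qwen/Qwen3-Embedding-0.6B checkpoint and a made-up ticket-routing task; swap in whatever you actually run locally):

```python
from sentence_transformers import SentenceTransformer

# Runs entirely on your own hardware: text in, vectors out, nothing leaves the building.
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")  # assumed local checkpoint

tickets = [
    "Invoice #4521 was charged twice, please refund one payment.",
    "The mobile app crashes on login since the last update.",
]
categories = ["billing problem", "app bug report"]

ticket_vecs = model.encode(tickets, normalize_embeddings=True)
category_vecs = model.encode(categories, normalize_embeddings=True)

# Vectors are normalized, so a dot product is cosine similarity; route each
# ticket to the closest category without any chat model in the loop.
scores = ticket_vecs @ category_vecs.T
for ticket, row in zip(tickets, scores):
    print(ticket[:40], "->", categories[row.argmax()])
```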
You can't do that effectively with the big AI models (as much as Anthropic would argue otherwise; it's too expensive and risky to send all your data to a cloud provider in most automation situations).