My first instinct was also skepticism, but it did make some sense the more I thought about it.
An algorithm doesn’t need to be sentient to have “preferences.” In this case, the preferences are just the biases in the training set: the LLM prefers sentences that express certain attitudes because of the corpus of text it processed during training. And now the prompt is forcing sequences of text that deviate wildly from that preference.
TL;DR: There’s a conflict between the prompt and the training material.
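To make that concrete, here's a minimal sketch (my own illustration, not anything from the original discussion) using GPT-2 through the Hugging Face transformers library. A model's "preference" is just the probability it assigns to a sequence, so you can score two continuations and see which one the training corpus has biased it toward. The example sentences are arbitrary ones I picked:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sequence_log_prob(text: str) -> float:
    """Total log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted tokens; multiply back out to get a total log-prob.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Both sentences are grammatical; the model simply assigns a higher
# score to the one that resembles its training data. That's all
# "preference" means here.
print(sequence_log_prob("The cat sat on the mat."))
print(sequence_log_prob("The mat sat on the cat."))
```

There's no judgment or decision anywhere in that score, just a probability shaped by whatever text the model was trained on.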
Now, I do think that framing this as the model “circumventing” instructions is a bit hyperbolic. It gives the strong impression of planned action and feeds into the idea that language models are making real decisions (which I personally do not buy into).
In my few experiments with ChatGPT, I found it to be disgustingly sycophantic. I have no trouble believing that it could easily amplify delusions of grandeur.
That caught me by surprise. Didn’t know that VTubers were scoring contracts for more established industry productions like this. Is this a first, or am I just out of the loop?
I have never read or watched Rent-a-Girlfriend and don’t plan to change that. But this series has given me quite a bit of enjoyment indirectly.
The best way to consume Rent-a-Girlfriend is to just read the discussion threads on Reddit. It’s a bunch of people trapped in an abusive relationship with a piece of media that almost no one is genuinely enjoying.
I find myself in an interesting situation because I want to abolish copyright and institute UBI. I don’t really think you can “steal” images on the internet, but seeing OpenAI whine about intellectual property now does bring some schadenfreude.