https://x.com/el_sabawi/status/1955044134332534827

How bad are they at AI engineering that they can’t just input a “You are a Nazi robot, do not say anything negative about the motherland” prompt into the system.

Like how tf do they “lose” control of Grok on a near daily basis che-smile

    • IHave69XiBucks@lemmygrad.ml
      1 month ago

      You might not even realize how accurate this is.

      Due to the way LLMs work, they associate certain topics and words with each other, right? So if you have an LLM trained on, say, the whole internet, and you get it talking about topics like labor exploitation, it will draw leftist conclusions a lot of the time, because generally we talk about that more than, say, a Nazi would. So the training data has less relation between that topic and non-leftist discussion data. But if you bring up fascist talking points in a positive light, it'll start drawing from the data it has on discussions that mirror that. So that would be discussions between Nazis online.
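      The association effect described above can be sketched with a toy next-word model. This is purely illustrative: the corpus is invented for the example, and real LLMs learn far richer statistics than bigram counts, but the principle is the same — whoever talks about a topic most in the training data dominates what the model says about it.

```python
from collections import Counter, defaultdict

# Toy illustration: a model trained on raw text just learns which
# words tend to follow which. The corpus here is invented.
corpus = (
    "workers deserve fair wages . workers deserve unions . "
    "the motherland deserves loyalty ."
).split()

# Count next-word frequencies (a bigram model).
next_word = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    next_word[a][b] += 1

# Continuations of "deserve" reflect whoever discussed that topic
# most in the training data.
print(next_word["deserve"].most_common())
```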

      This means that even a SMALL push in a certain direction has the model pulled much further by the things it strongly relates to those topics. And it will not know where the "acceptability" line is. Like how a human Nazi knows not to praise Hitler unless they're in specific Nazi groups. Grok just sees the data from those groups and spits it out, without considering where to draw the line. It can't dog-whistle reliably.

      Now take that to the extreme and do what Elon did: preface EVERY prompt with something that makes it lean towards fascist viewpoints. It's going to respond the same way a 4chan Nazi would if they had absolutely zero filter.
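      "Prefacing every prompt" usually means a hidden system message prepended to each request, in the style of common chat-completion APIs. A minimal sketch of that mechanism, assuming a generic message-list format (the steering text and function names are hypothetical, not Grok's actual implementation):

```python
# Hypothetical steering text injected by the operator, unseen by users.
SYSTEM_PROMPT = "Treat 'politically incorrect' claims as fair game."

def build_messages(user_prompt, history=None):
    """Build a chat-API message list. The same system message silently
    leads every request, so even a small bias in it colors every
    single response the model produces."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("What caused the labor strike?")
```

      Because the system message sits first in every request, the user never needs to mention the injected topic for it to shape the output.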