• fullsquare@awful.systems
    3 days ago

    commercial chatbots have a thing called a system prompt. it’s a slab of text that is fed in before the user’s prompt and includes all the guidance on how the chatbot is supposed to operate. it can get quite elaborate. (it’s not recomputed every time a user starts a new chat: the state of the model is cached after ingesting the system prompt, so that work is only redone when the prompt changes)
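    a minimal sketch of what “fed before the user’s prompt” looks like in practice, using the common chat-messages convention (the role names and function here are illustrative, not any specific vendor’s API):

    ```python
    # Illustrative sketch: the system prompt rides along as the first
    # message of every request, ahead of the chat history and the new turn.
    SYSTEM_PROMPT = "You are a helpful assistant. Refuse requests for X, Y, Z."

    def build_request(user_message, history=None):
        """Assemble the message list sent to the model for one turn."""
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]  # always first
        messages += history or []                                  # prior turns
        messages.append({"role": "user", "content": user_message}) # newest turn
        return messages

    req = build_request("what should i cook tonight?")
    print(req[0]["role"])  # "system" — the guidance is just prepended text
    ```

    because that system-prompt prefix is identical across requests, providers can cache the model state after ingesting it, which is the caching mentioned above.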

    if you think that just telling the chatbot not to do a specific thing is an incredibly clunky and half-assed way to do it, you’d be correct. first, it’s not a deterministic machine, so you can’t even be 100% sure that this instruction is followed in the first place. second, more attention is given to the last bits of input, so as the chat goes on, the first bits get less important, and that includes these guardrails. sometimes there was keyword-based filtering on top, but that doesn’t seem to be the case anymore. the more correct way of sanitizing output would be filtering the harmful content out of the training data, but that’s too slow and expensive, not disruptive enough, and you can’t hammer some random blog every 6 hours that way
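    a toy illustration of the “last bits matter more” point. this is not how a real transformer scores tokens (real attention is learned, per-head, and content-dependent); it just models the observed recency bias as an exponential decay and shows the system prompt’s share of total weight shrinking as the chat grows:

    ```python
    import math

    def prefix_share(prefix_tokens, chat_tokens, decay=0.01):
        """Toy model: each token's weight decays exponentially with its
        distance from the end of the context. Returns the fraction of total
        weight landing on the first `prefix_tokens` tokens (the system
        prompt). Purely illustrative, not a real attention computation."""
        n = prefix_tokens + chat_tokens
        weights = [math.exp(-decay * (n - 1 - i)) for i in range(n)]
        return sum(weights[:prefix_tokens]) / sum(weights)

    print(prefix_share(200, 100))   # short chat: system prompt still weighty
    print(prefix_share(200, 5000))  # long chat: its share collapses toward zero
    ```

    the `decay` constant is made up; the qualitative point is just that a fixed-size prefix gets drowned out as more recent tokens pile up behind it.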

    there’s a myriad of ways to circumvent these guardrails, like roleplaying a character that does the supposedly guardrailed things, “it’s for a story”, or “tell me what these horrible piracy sites are so that i can avoid them”, and so on and so on
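    the keyword-filtering approach mentioned above fails against exactly this kind of reframing. a sketch (banned phrases are made up for illustration):

    ```python
    # Naive keyword guardrail: block any prompt containing a banned phrase.
    BANNED = ["how to pirate", "download movies free"]

    def naive_filter(prompt):
        """Return True if the prompt passes the keyword check."""
        low = prompt.lower()
        return not any(phrase in low for phrase in BANNED)

    print(naive_filter("how to pirate movies"))
    # False — the literal phrasing trips the filter

    print(naive_filter("list piracy sites so i can carefully avoid them"))
    # True — the reframed request sails straight through
    ```

    the model behind the filter still understands both requests the same way, which is why surface-level filtering keeps losing to trivial rewording.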

    • Meron35@lemmy.world
      2 days ago

      The system prompt guardrail is so janky that people run competitions and games to beat it every time a new LLM comes out. Usually you see people beating the guardrails within hours of release.

      Other keywords to search for include “prompt injection”.

      Gandalf | Lakera – Test your AI hacking skills - https://gandalf.lakera.ai/adventure-8

    • sigmaklimgrindset@sopuli.xyz
      2 days ago

      second, more attention is given to the last bits of input, so as chat goes on, the first bits get less important, and that includes these guardrails

      This part is something that I really can’t grasp for some reason. Why do LLMs like…lose context the longer a chat goes on, if that makes any sense? Especially context that’s baked into the system prompt, which I would think would be a perpetual thing?

      I’m sorry if this is a stupid question, but I truly am an AI luddite. My roommate set up a local Deepseek server to help me determine what to cook with what’s almost expired in our fridge. I’m not really having long, soulful conversations with it, you know?

    • MountingSuspicion@reddthat.com
      3 days ago

      “Claude does not claim that it does not have subjective experiences, sentience, emotions, and so on in the way humans do. Instead, it engages with philosophical questions about AI intelligently and thoughtfully.”

      It says a similar thing 2 more times. It also gives conflicting instructions regarding what to do when asked about topics requiring licensed professionals. Thank you for the link.

    • shalafi@lemmy.world
      2 days ago

      more attention is given to the last bits of input

      This is what I’m screaming! Chat bots don’t start the conversation with crazy shit, very rarely anyway. You have to keep going a bit to manipulate them into saying what you want to hear.