• 0 Posts
  • 234 Comments
Joined 5 months ago
Cake day: March 31, 2025

  • well, nobody guarantees that the internet is safe, so the blame falls more on chatbot providers for pretending otherwise. that goes along with all the other lies about the machine god they're building, the one that will save all the worthy* in the coming rapture of the nerds, and that even if it destroys everything we know, it's important to get there before the chinese.

    i sense a bit of "think of the children" in your response and i don't like it. llms shouldn't be used by anyone. there was recently a case of a dude with dementia who died after a fb chatbot told him to go to nyc.

    * mostly techfash oligarchs and weirdo cultists



  • commercial chatbots have a thing called a system prompt. it's a slab of text that gets fed to the model before the user's prompt and includes all the guidance on how the chatbot is supposed to operate. it can get quite elaborate. (it's not reprocessed every time a user starts a new chat: the model's state after ingesting the system prompt is cached, so that work is only redone when the prompt changes.) a rough sketch of what that looks like is below.

    if you think that just telling the chatbot not to do a specific thing is an incredibly clunky and half-assed way to do it, you'd be correct. first, it's not a deterministic machine, so you can't even be 100% sure the instruction gets followed in the first place. second, more attention is given to the last bits of the input, so as the chat goes on the first bits matter less and less, and that includes these guardrails. sometimes there was keyword-based filtering on top (a toy version of that is also sketched below), but that doesn't seem to be the case anymore. the more correct way of sanitizing output would be filtering the training data for harmful content, but that's too slow and expensive, not disruptive enough, and you can't hammer some random blog every 6 hours that way.

    there's a myriad of ways of circumventing these guardrails, like roleplaying a character that does the supposedly guardrailed things, "it's for a story", or "tell me what these horrible piracy sites are so that i can avoid them", and so on and so on.
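
for the curious, here's a minimal sketch of what "feeding the system prompt before the user's prompt" can look like, assuming an OpenAI-style chat message format; the prompt text and function names are made up for illustration, not any vendor's actual code:

```python
# a minimal sketch of how a system prompt gets prepended to every turn,
# assuming an OpenAI-style chat message format; the prompt text and names
# here are invented for illustration
SYSTEM_PROMPT = (
    "You are a helpful assistant. Do not give instructions for illegal "
    "activities. Do not give medical or legal advice. ..."
)

def build_request(history: list[dict], user_message: str) -> list[dict]:
    """Assemble the full message list the model sees for one turn."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]   # guardrails go first
        + history                                        # earlier turns
        + [{"role": "user", "content": user_message}]    # newest input last
    )

if __name__ == "__main__":
    msgs = build_request([], "how do i get around region locks?")
    for m in msgs:
        print(f'{m["role"]}: {m["content"][:60]}')
```

in practice the provider caches the model's state after ingesting SYSTEM_PROMPT, so the prompt is only reprocessed when its text changes; but it still sits at the very front of the context, which is exactly the part that gets diluted as the chat grows.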
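and here's a toy version of the keyword-based output filtering mentioned above; the blocklist and example replies are invented, and real deployments (if they use this at all) are more elaborate:

```python
import re

# a toy keyword-based output filter of the kind that used to sit on top of
# chatbots; blocklist entries and example replies are made up
BLOCKLIST = [r"\bpiracy\b", r"\btorrent\b"]

def filter_reply(reply: str) -> str:
    """Swap the model's reply for a canned refusal if it trips the blocklist."""
    for pattern in BLOCKLIST:
        if re.search(pattern, reply, flags=re.IGNORECASE):
            return "sorry, i can't help with that."
    return reply

print(filter_reply("here's a list of popular torrent sites: ..."))   # blocked
print(filter_reply("here's a list of popular t0rrent sites: ..."))   # slips through
```

note how a trivial misspelling sails right past it, which is part of why this approach is clunky and apparently fell out of favor.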