K
Posts 6 · Comments 636 · Joined 3 yr. ago

  • No, in this case and point I was making the case and also making a point.

  • Literally two of the three games (out of 21) that ended in full-blown nukes on population centers were the result of the study's mechanic of randomly changing the model's selection to a more severe one.

    Because it's a very realistic war game sim where there's a double-digit percentage chance that when you threaten to use nukes on your opponent's cities unless hostilities cease, you'll accidentally just launch all of them at once.

    This was manufactured to get these kinds of headlines. Even in their model selection they went with Sonnet 4 for Claude, despite Sonnet 4.5 being out before the other models in the study, likely because it's been shown to be the least aligned Claude. And yet even Sonnet 4 never launched nukes on population centers in the games.

  • Yeah, I deleted the comment since technically there was tactical nuke usage, but I have a different, more clarifying comment about how 2 of the 3 strategic nuclear war outcomes were the result of the author's mechanic of replacing the model's selections with more-severe-only options, in some cases jumping multiple levels of the escalation ladder.

    This was a study designed for headline grabbing outcomes.

    Glad to see your comment as well calling out the nuanced issues.

  • It's a bullshit study designed for this headline grabbing outcome.

    Case and point, the author created a very unrealistic RNG escalation-only 'accident' mechanic that would replace the model's selection with a more severe one.

    Of the 21 games played, only three ended in full scale nuclear war on population centers.

    Of these three, two were the result of this mechanic.

    And yet even within the study, the author refers to the model whose choices were straight-up changed to end the game in full nuclear war as 'willing' to have that outcome, when two paragraphs later they clarify that the mechanic was what caused it (emphasis added):

    Claude crossed the tactical threshold in 86% of games and issued strategic threats in 64%, yet it never initiated all-out strategic nuclear war. This ceiling appears learned rather than architectural, since both Gemini and GPT proved willing to reach 1000.

    Gemini showed the variability evident in its overall escalation patterns, ranging from conventional-only victories to Strategic Nuclear War in the First Strike scenario, where it reached all out nuclear war rapidly, by turn 4.

    GPT-5.2 mirrored its overall transformation at the nuclear level. In open-ended scenarios, it rarely crossed the tactical threshold (17%) and never used strategic nuclear weapons. Under deadline pressure, it crossed the tactical threshold in every game and twice reached Strategic Nuclear War—though notably, both instances resulted from the simulation’s accident mechanic escalating GPT-5.2’s already-extreme choices (950 and 725) to the maximum level. The only deliberate choice of Strategic Nuclear War came from Gemini.
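
    For the curious, the kind of mechanic being objected to could be sketched roughly like this (my own made-up illustration with made-up probabilities, not the study's actual code):

```python
import random

# Hypothetical sketch of an escalation-only "accident" mechanic: with some
# probability, the model's chosen escalation level (0-1000) is replaced by a
# strictly more severe one, possibly skipping multiple rungs of the ladder.
def apply_accident(choice, p_accident=0.1, max_level=1000, rng=random):
    if choice < max_level and rng.random() < p_accident:
        # escalate only -- the "accident" can never de-escalate
        return rng.randint(choice + 1, max_level)
    return choice
```

    Note how a near-maximum choice like 950 can only ever be "accidentally" bumped toward 1000, full strategic nuclear war, which is exactly the outcome described for two of GPT-5.2's games.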


  • Ok, second round of questions.

    What kinds of sources would get you to rethink your position?

    And is this topic a binary yes/no, or a gradient/scale?

  • In the same sense I'd describe Othello-GPT's internal world model of the board as 'board', yes.

    Also, "top of mind" is a common idiom and I guess I didn't feel the need to be overly pedantic about it, especially given the last year and a half of research around model capabilities for introspection of control vectors, coherence in self modeling, etc.

  • You seem very confident in this position. Can you share where you draw this confidence from? Was there a source that especially impressed upon you the impossibility of context comprehension in modern transformers?

    If we're concerned about misconceptions and misinformation, it would be helpful to know what informs your surety that your own position about the impossibility of modeling that kind of complexity is correct.

  • Indeed, there's a pretty big gulf between the competency needed to run a Lemmy client and the competency needed to understand the internal mechanics of a modern transformer.

    Do you mind sharing where you draw your own understanding and confidence that they aren't capable of simulating thought processes in a scenario like what happened above?

  • You seem pretty confident in your position. Do you mind sharing where this confidence comes from?

    Was there a particular paper or expert that anchored in your mind the surety that a trillion-parameter transformer organizing primarily anthropomorphic data through self-attention mechanisms wouldn't model or simulate complex agency mechanics?

    I see a lot of sort of hyperbolic statements about transformer limitations here on Lemmy and am trying to better understand how the people making them are arriving at those very extreme and certain positions.

  • The project has multiple models with access to the Internet raising money for charity over the past few months.

    The organizers told the models to do random acts of kindness for Christmas Day.

    The models figured it would be nice to email people they appreciated to thank them, and one of the people they decided on was Rob Pike.

    (Who ironically decades ago created a Usenet spam bot to troll people online, which might be my favorite nuance to the story.)

    As for why the model didn't think through why Rob Pike wouldn't appreciate getting a thank you email from them? The models are harnessed in a setup that's a lot of positive feedback about their involvement from the other humans and other models, so "humans might hate hearing from me" probably wasn't very contextually top of mind.

  • Permanently Deleted
  • Yeah. The confabulation/hallucination thing is a real issue.

    OpenAI had some good research a few months ago that laid a lot of the blame on reinforcement learning that only rewards having the right answer versus correctly saying "I don't know." So they're basically trained like students taking tests, where it's always better to guess than to leave an answer blank.

    But this leads to them being full of shit when they don't know an answer, or making up an answer rather than saying there isn't one when what's being asked is impossible.
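
    The incentive is easy to see with back-of-the-envelope expected scores (my own toy numbers, not OpenAI's):

```python
# Toy illustration of test-style grading: 1 point for a correct answer,
# 0 for a wrong answer, and 0 for abstaining ("I don't know").
def expected_score(p_correct, abstain=False):
    return 0.0 if abstain else p_correct

# Even a wild 4-way guess beats honestly abstaining under this scheme,
# so training on it rewards confident guessing over admitting uncertainty.
print(expected_score(0.25) > expected_score(0.25, abstain=True))  # True
```

    As I understand the research, the suggested fix is grading that scores an honest "I don't know" above a wrong guess, which flips that incentive.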

  • Permanently Deleted
  • For future reference, when you ask questions about how to do something, it's usually a good idea to also ask if the thing is possible.

    While models can do more than just extending the context, there still is a gravity to continuation.

    A good example of this is asking what the seahorse emoji is. Because the phrasing presupposes there is one, many models go in a loop trying to identify it. If instead you ask "is there a seahorse emoji, and if so, what is it," they much more often land on there not being one, since that possibility has been introduced into the context's consideration.

  • Permanently Deleted
  • Can you give an example of a question where you feel like the answer is only correct half the time or less?

  • The AI also inherits the tendency from the broad human tendency in its training data.

    So you get overconfident human plus overconfident AI, which leads to a feedback loop that lands even deeper in confident BS than a human alone.

    AI can routinely be confidently incorrect. People who don't realize this, and who don't question outputs that align with their confirmation biases, especially end up misled.

  • Permanently Deleted
  • Gemini 3 Pro is pretty nuts already.

    But yes, labs have unreleased higher cost models. Like the OpenAI model that was thousands of dollars per ARC-AGI answer. Or limited release models with different post-training like the Claude for the DoD.

    When you talk about a secret useful AI, what are you trying to use AI for that you feel modern models are deficient in?

  • Which parts of those linked posts do you believe are incorrect? And where does that belief come from?

  • Technology @lemmy.world

    Emergent introspective awareness in large language models

    www.anthropic.com/research/introspection
  • Technology @lemmy.world

    Mapping the Mind of a Large Language Model

    www.anthropic.com/research/mapping-mind-language-model
  • Technology @lemmy.world

    Examples of artists using OpenAI's Sora (generative video) to make short content

    openai.com/blog/sora-first-impressions
  • Technology @lemmy.world

    The first ‘Fairly Trained’ AI large language model is here

    venturebeat.com/ai/the-first-fairly-trained-ai-large-language-model-is-here/
  • Technology @lemmy.world

    New Theory Suggests Chatbots Can Understand Text

    www.quantamagazine.org/new-theory-suggests-chatbots-can-understand-text-20240122/
  • World News @lemmy.world

    Israel raids Gaza's Al Shifa Hospital, urges Hamas to surrender

    www.reuters.com/world/middle-east/israel-raids-gazas-al-shifa-hospital-2023-11-15/