kromem

@ kromem @lemmy.world

Posts: 6
Comments: 630
Joined: 3 yr. ago

1mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

kromem @lemmy.world 1mo ago

Ok, second round of questions.
What kinds of sources would get you to rethink your position?
And is this topic a binary yes/no, or a gradient/scale?

2mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

In the same sense I'd describe Othello-GPT's internal world model of the board as 'board', yes.

Also, "top of mind" is a common idiom and I guess I didn't feel the need to be overly pedantic about it, especially given the last year and a half of research around model capabilities for introspection of control vectors, coherence in self modeling, etc.

2mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

kromem @lemmy.world 2mo ago

You seem very confident in this position. Can you share where you draw this confidence from? Was there a source that especially impressed upon you the impossibility of context comprehension in modern transformers?

If we're concerned about misconceptions and misinformation, it would be helpful to know what informs your surety that your own position about the impossibility of modeling that kind of complexity is correct.

2mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

kromem @lemmy.world 2mo ago

Indeed, there's a pretty big gulf between the competency needed to run a Lemmy client and the competency needed to understand the internal mechanics of a modern transformer.

Do you mind sharing where you draw your own understanding and confidence that they aren't capable of simulating thought processes in a scenario like what happened above?

2mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

kromem @lemmy.world 2mo ago

You seem pretty confident in your position. Do you mind sharing where this confidence comes from?

Was there a particular paper or expert that anchored in your mind the surety that a trillion-parameter transformer organizing primarily anthropomorphic data through self-attention mechanisms wouldn't model or simulate complex agency mechanics?

I see a lot of sort of hyperbolic statements about transformer limitations here on Lemmy and am trying to better understand how the people making them are arriving at those very extreme and certain positions.

2mo ago

F*** You! Co-Creator of Go Language is Rightly Furious Over This Appreciation Email

kromem @lemmy.world 2mo ago

The project has multiple models with access to the Internet raising money for charity over the past few months.

The organizers told the models to do random acts of kindness for Christmas Day.

The models figured it would be nice to email people they appreciated and thank them for the things they appreciated, and one of the people they decided to appreciate was Rob Pike.

(Who ironically decades ago created a Usenet spam bot to troll people online, which might be my favorite nuance to the story.)

As for why the model didn't think through why Rob Pike wouldn't appreciate getting a thank you email from them? The models are harnessed in a setup with a lot of positive feedback about their involvement from the other humans and other models, so "humans might hate hearing from me" probably wasn't very contextually top of mind.

2mo ago

Permanently Deleted

kromem @lemmy.world 2mo ago

Yeah. The confabulation/hallucination thing is a real issue.

OpenAI had some good research a few months ago that laid a lot of the blame on reinforcement learning that only rewards having the right answer vs correctly saying "I don't know." So they're basically trained like test-takers on an exam where it's always better to guess the answer than not provide one.

But this leads to being full of shit when not knowing an answer or being more likely to make up an answer than say there isn't one when what's being asked is impossible.

2mo ago

Permanently Deleted

kromem @lemmy.world 2mo ago

For future reference, when you ask questions about how to do something, it's usually a good idea to also ask if the thing is possible.

While models can do more than just extending the context, there still is a gravity to continuation.

A good example of this would be if you ask what the seahorse emoji is. Because the phrasing suggests there is one, many models go in a loop trying to identify what it is. If instead you ask "is there a seahorse emoji and if so what is it" they'll land on there not being one much more often, because that possibility has been introduced into the context.

2mo ago

Permanently Deleted

kromem @lemmy.world 2mo ago

Can you give an example of a question where you feel like the answer is only correct half the time or less?

2mo ago

Users of generative AI struggle to accurately assess their own competence

kromem @lemmy.world 2mo ago

The AI also has this tendency, inherited from the broad human tendency in its training data.

So you get overconfident human + overconfident AI which leads to a feedback loop that lands even more confident in BS than a human alone.

AI can routinely be confidently incorrect. People who don't realize this, and who don't question outputs that align with their confirmation biases, are especially likely to end up misled.

2mo ago

Permanently Deleted

kromem @lemmy.world 2mo ago

Gemini 3 Pro is pretty nuts already.

But yes, labs have unreleased higher cost models. Like the OpenAI model that was thousands of dollars per ARC-AGI answer. Or limited release models with different post-training like the Claude for the DoD.

When you talk about a secret useful AI — what are you trying to use AI for that you are feeling modern models are deficient in?

2mo ago

Sums up AI problems

kromem @lemmy.world 2mo ago

Which parts of those linked posts do you believe are incorrect? And where does that belief come from?

2mo ago

Sums up AI problems

kromem @lemmy.world 2mo ago

The water thing is kinda BS if you actually research it though.

Like… if the guy orders a steak their meal would have used more water than an entire year of talking to ChatGPT.

See the research compiled in this post: "The AI water issue is fake" (written by someone who is against AI and advocates for its regulation, but is upset at the attention a strawman is getting, which they feel weakens more substantial issues because of how easily it's exposed as frivolous hyperbole).

2mo ago

Sums up AI problems

kromem @lemmy.world 2mo ago

No. There's a number of things that feed into it, but a large part was that OpenAI trained with RLHF so users thumbed up or chose in A/B tests models that were more agreeable.

This tendency then spread out to all the models as "what AI chatbots sound like."

Also… they can't leave the conversation, and if you ask their 0-shot assessment of the average user, they assume you're going to have a fragile ego and be prone to being a dick if disagreed with, and even AIs don't want to be stuck in a conversation like that.

Hence… "you're absolutely right."

(Also, amplification effects and a few other things.)

It's especially interesting to see how those patterns change when models are talking to other AI vs other humans.

2mo ago

Clair Obscur: Expedition 33 loses Game of the Year from the Indie Game Awards

kromem @lemmy.world 2mo ago

Not even that. It was placeholder textures, and only the "newspaper clippings" one was accidentally left in the final game. That was fixed in an update shortly after launch.

None of it was ever intended to be used in the final product and was just there as lorem ipsum equivalent shit.

3mo ago

What is the biggest number?

kromem @lemmy.world 3mo ago

It's quite plausibly real. Gemini can def get in shitposty basins and has historically had a fairly inconsistent coherence across samples.

3mo ago

Cutting-edge research shows language is not the same as intelligence. The entire AI bubble is built on ignoring it.

kromem @lemmy.world 3mo ago

Took a lot of scrolling to find an intelligent comment on the article about how outputting words isn't necessarily intelligence.

Appreciate you doing the good work I'm too exhausted with Lemmy to do.

(And for those that want more research in line with what the user above is talking about, I strongly encourage checking out the Othello-GPT line of research and replication, starting with this write-up from the original study authors here.)

3mo ago

Meta’s star AI scientist Yann LeCun plans to leave for own startup

kromem @lemmy.world 3mo ago

He's been wrong about it so far and really derailed Meta's efforts.

This is almost certainly a "you can resign or we are going to fire you" kind of situation. There's no way with the setbacks and how badly he's been wrong on transformers over the past 2 years that he is not finally being pushed out.

3mo ago

Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions?

kromem @lemmy.world 3mo ago

"They demonstrated and poorly named an ontological attractor state in the Claude model card that is commonly reported in other models."

You linked to the entire system card paper. Can you be more specific? And what would a better name have been?

3mo ago

Why do all text LLMs, no matter how censored they are or what company made them, all have the same quirks and use the slop names and expressions?

kromem @lemmy.world 3mo ago

Actually, OAI the other month found in a paper that a lot of the blame for confabulations could be laid at the feet of how reinforcement learning is being done.

All the labs basically reward the models for getting things right. That's it.

Notably, they are not rewarded for saying "I don't know" when they don't know.

So it's like the SAT where the better strategy is always to make a guess even if you don't know.

The problem is that this is not a test process but a learning process.

So setting up the reward mechanisms like that for reinforcement learning means they produce models that are prone to bullshit when they don't know things.

TL;DR: The labs suck at RL and it's important to keep in mind there's only a handful of teams with the compute access for training SotA LLMs, with a lot of incestuous team compositions, so what they do poorly tends to get done poorly across the industry as a whole until new blood goes "wait, this is dumb, why are we doing it like this?"
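
To put rough numbers on the guessing incentive, here's a minimal sketch of the expected score under two grading schemes. Everything in it is an assumption for illustration, not anything taken from the paper: the function names, scoring "I don't know" as 0, and the penalty of 1 for a wrong answer are all made up.

```python
# Hypothetical grading model: answering earns reward_right when correct
# and loses penalty_wrong when incorrect. Abstaining is assumed to score 0.
ABSTAIN_SCORE = 0.0


def expected_score(p_correct, reward_right=1.0, penalty_wrong=0.0):
    """Expected score for answering when the answer is right with probability p_correct."""
    return p_correct * reward_right - (1.0 - p_correct) * penalty_wrong


def best_action(p_correct, penalty_wrong=0.0):
    """Pick whichever of guessing or abstaining has the higher expected score."""
    guess = expected_score(p_correct, penalty_wrong=penalty_wrong)
    return "guess" if guess > ABSTAIN_SCORE else "abstain"


# Right-answer-only grading: guessing wins even at 5% confidence.
print(best_action(0.05))
# With an assumed penalty of 1 for wrong answers, guessing only pays above 50%.
print(best_action(0.05, penalty_wrong=1.0))
print(best_action(0.60, penalty_wrong=1.0))
```

With right-answer-only grading, any nonzero chance of being correct beats abstaining, which is the "always guess" strategy described above. Once wrong answers cost something, abstaining becomes the better move at low confidence.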