Caruso was impressed by Gemini’s ability to recognize its limitations.
But it didn’t recognize those limitations on its own; it boasted about being on par with modern chess engines. It only did a 180 after Caruso warned it, which is just the LLM agreeing with whatever the human says.
LLM tools are quickly going to become the propaganda-spewing things that conservative lawmakers have often claimed search engines like Google and Bing are. Only in these cases it’s hidden in the code used to train these models rather than just in the websites that are scraped. Just look at the examples we’re already seeing, like Grok relying on Elon and the Missouri AG insisting that AIs praise Trump.
I feel like the Atari 2600 is quickly becoming for so-called AI what the “how much is a gallon of milk?” gotcha question became for politicians running for office. A rather pointless bit of news.
As Scotty said: the right tool for the right job. An LLM is maybe not a chess engine, and that’s fine too. Why would we expect these models to be Magnus effing Carlsen if they cannot reliably summarize an email and will recommend eating pebbles?
Your question is probably rhetorical, but I feel the need to put it out there: it’s because it’s been advertised as such. LLMs are not advertised as language-based AI but as something “intelligent” with “reasoning” abilities, which they inherently do not have.
But that’s not what most people were told. For a large number of them, LLMs can “think” and should be able to solve problems, such as chess…
Because people keep acting like LLMs are a magical solution to every problem. This is an effective way to show that that’s false.
This confirms Google Gemini is Joshua AI.
~~Canceling the match~~ Not playing Chess with a probability word guesser is likely the most time-efficient and sensible decision
Google’s Gemini ~~refuses~~ returns null to play Chess against the Atari 2600