oh yeah I 100% agree that their methodology is flawed, and that blog does a pretty good job of outlining the issues. I just thought the absolutely huge gap was both interesting and funny. Their absolutely huge error bars are not a good sign. Between that and the gap, it really feels like someone screwed up
the METR graph has gotten weird: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ The 50% success rate graph went from 6 hours to 14 hours, but the 80% success rate graph only went from 55 minutes to 1 hour and 3 minutes. I have a hunch that it's a fluke or outlier, but it's also very possible that LLM coding's just weird like that
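for scale, here's a quick back-of-envelope on how lopsided that is. The numbers are just the ones quoted above (not re-checked against METR's data), and `growth_factor` is a made-up helper name for illustration:

```python
# Time horizons in minutes, as quoted in the post above.
# Assumption: these figures are taken at face value, not verified against METR.
horizons = {
    "50% success": (6 * 60, 14 * 60),  # 6 hours -> 14 hours
    "80% success": (55, 63),           # 55 minutes -> 1 hour 3 minutes
}


def growth_factor(before: float, after: float) -> float:
    """Return how many times longer the new horizon is than the old one."""
    return after / before


for label, (before, after) in horizons.items():
    factor = growth_factor(before, after)
    print(f"{label}: {before} -> {after} min, x{factor:.2f}")
```

so on those numbers the 50% horizon more than doubled while the 80% horizon grew by roughly 15%, which is the mismatch that looks off.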
the AI safety crowd cuts Anthropic way too much slack. Oh, they’re not running CSAM-generating MechaHitler? Oh, they’re not collaborating with the US government to recreate 1984? I’m so proud of them for doing the bare minimum. They still took donations from the UAE and Qatar (something Dario Amodei himself admitted was going to hurt a lot of people, but he took the donations anyways because “they couldn’t miss out on all those valuations”), and they still downloaded hundreds of pirated works to train their chatbot. They’re still doing shady shit, so don’t let them off the hook just because they’re slightly less evil than the competition
New “AI is not a bubble” video just dropped: https://youtu.be/wDBy2bUICQY Lots of skeptical comments pointing out the flaws in the argument, while the creator tries to defend themselves with mostly mediocre replies
new interview with Dario Amodei dropped: https://youtu.be/n1E9IZfvGMA Basically "exponential curve real soon". Nice skepticism from both the interviewer and the comment section
On a related note, I really gotta stop browsing r/singularity man, some of the AI hype in there is just painful. Though it is funny to see people with "AGI 2024/2025/2026" flairs
EDIT: this is also the same podcast where Dario said we could have AGI in 2-3 years back in 2023. So lol
sharing this channel’s posts is the equivalent of shooting fish in a barrel, but http://youtube.com/post/UgkxoSpDpLNEr9WawVXnl5Mlw4NeQ6-XsLjl this really just feels like an excuse to repost that METR graph. also wtf is the graph on top