IBM just released Granite 4.1, a family of open source language models built specifically for enterprise use. Three sizes, Apache 2.0 licensed and trained on 15 trillion tokens with a level of pipeline obsession that's worth understanding. But there's one result in the benchmarks I keep coming back to. The 8B model. Dense architecture, no MoE tricks, no extended reasoning chains. It matches or beats Granite 4.0-H-Small across basically every benchmark they ran. That older model has 32B parameters with 9B active. This one has 8 billion. Full stop. That result is either very impressive or it means the old model was underbuilt. Probably both. Here's how they built it, what the numbers actually say, and whether any of it matters for your use case.
Deepseek v4 pro! Top up your credit as you go and they’re having a sale until May 31st, but even without the sale 1M output tokens is “only” 3.48. Flash is only 0.28 per 1M output.
Not sure if I could swing Deepseek at my job tho. Surprisingly, Cursor still comes with Kimi2 as model option, so there’s that.