
So Deepseek just quietly released an open-source beast-at-math model (details inside)

cross-posted from: https://lemmygrad.ml/post/9899994

wake up

open twitter to catch up

see deepseek did it again

(and as a reminder, DeepSeek-R1 only came out in January, so it's been less than a year since their last bombshell)

One more graph:

What this all means

Traditional AI models are trained with rewards for a correct final answer. Produce the expected answer, win points, get incentivized to produce it more often. This has a major flaw: a correct answer does not guarantee correct reasoning. A model can guess, use a shortcut, or follow flawed logic and still output the right answer. That approach completely fails for tasks like theorem proving, where the process is the product. DeepSeekMath-V2 tackles this with a novel self-verifying reasoning framework:

  • The Generator: one part of the model generates mathematical proofs and solutions.
  • The Verifier: another part acts as the critic, checking every step of the reasoning for logical rigor and correctness.
  • The Loop: if the verifier finds a flaw, it provides feedback and the generator revises the proof. This creates a co-evolution cycle where both components push each other to become smarter.

This new approach lets the model deliver record-breaking performance. As you can see from the charts above, it scores second place on ProofBench-Advanced, just behind Gemini. But Gemini isn't open-source; DeepSeekMath-V2 is.

The model weights are available on Hugging Face under an Apache 2.0 license: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2.

This means researchers, developers, and enthusiasts around the world can download, study, and build upon this model right now. They can fine-tune or modify it to fit their own needs and research, which should lead to some exciting math results soon - I predict (on no basis, mind you) that it will first help crack computing problems, whether practical or theoretical.

Beyond the math itself, the self-verification mechanism is a crucial step toward building AI systems whose reasoning we can trust, which is vital for applications like scientific research, formal verification, and safety-critical systems. It also shows that 'verification-driven' training is a viable and powerful alternative to the 'answer-driven' training that has dominated until now.
