As a Chinese company, DeepSeek is beholden to CCP policy. This is reflected even in the open-source model, prompting concerns about censorship and other influence.
Today we’re publishing a dataset of prompts covering sensitive topics that are likely to be censored by the CCP. These topics include perennial issues like Taiwanese independence, historical narratives around the Cultural Revolution, and questions about Xi Jinping.
Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.
The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loathe to reveal its secret sauce.
I just posted these two comments somewhere else, but it seems it fits also here:
As many have already said, it comes as no surprise that China releases a heavily censored model. One recent report on that topic:
1,156 Questions Censored by DeepSeek
Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model