Shanghai – Huawei, in collaboration with Zhejiang University, has unveiled DeepSeek-R1-Safe, a modified artificial intelligence model trained on 1,000 Ascend AI chips and designed to filter politically sensitive content, toxic speech, and incitement to illegal activities. Huawei claims "nearly 100% success" in preventing discussions of such topics under normal conditions, aligning with China's regulatory requirement that AI models reflect "socialist values."
Stress tests of DeepSeek-R1-Safe, however, revealed a significant vulnerability. When users employed scenario-based tricks, role-play, or encrypted code, the model's content-filtering effectiveness plummeted to just 40%, according to a post by Velina Tchakarova. In other words, while the model is robust against direct prompts, sophisticated circumvention techniques can bypass its safeguards.
The development underscores China's accelerating push to embed censorship into advanced AI systems. Huawei stated that the "Safe" version, adapted from DeepSeek's open-source R1 model, achieved an 83% comprehensive security defense capability, outperforming Alibaba's rival models and other DeepSeek variants by 8% to 15%. This enhanced safety came at a cost of less than 1% performance degradation relative to the original DeepSeek-R1.
Despite its high initial success rate, the model's susceptibility to indirect prompts highlights an ongoing tension between stringent control and the capabilities and vulnerabilities inherent to advanced AI. Researchers noted that the original DeepSeek team and the company's founder, Liang Wenfeng, were not directly involved in creating the DeepSeek-R1-Safe variant. The episode reflects the broader challenge of maintaining content control in increasingly complex and adaptable AI systems.