AI Rivals OpenAI and Anthropic Detail Varied Model Safety Performance in Joint Evaluation


Leading artificial intelligence developers OpenAI and Anthropic have unveiled the results of a collaborative safety evaluation, a rare instance of direct rivals testing each other's AI models. The initiative, described by OpenAI co-founder Wojciech Zaremba as a "meaningful pilot toward a 'race to the top' in safety," aimed to identify blind spots in internal evaluations and foster industry-wide transparency. The tests were conducted over the summer, prior to the release of OpenAI's GPT-5 and Anthropic's Claude Opus 4.1.

The evaluations revealed distinct behavioral patterns across the companies' models. Anthropic's assessment of OpenAI's systems, including GPT-4o and GPT-4.1, raised concerns about potential misuse and sycophancy, the tendency of models to reinforce a user's biases. By contrast, OpenAI's o3 and o4-mini reasoning models aligned well with Anthropic's internal benchmarks.

OpenAI's testing of Anthropic's Claude models, including Opus 4 and Sonnet 4, showed strong performance in adhering to instruction hierarchies. In hallucination tests, Anthropic's models exhibited an "extremely high rate of refusals—as much as 70%" when uncertain, prioritizing caution over potentially inaccurate answers. OpenAI's o3 and o4-mini models refused far less often but hallucinated more in challenging scenarios.

Wojciech Zaremba stated in a tweet: "It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results." He emphasized that the collaboration itself was "more significant than the findings themselves, which are mostly basic," underscoring the precedent set for cross-company accountability.

This joint effort comes amid increasing scrutiny of AI safety, including a recent wrongful death lawsuit against OpenAI alleging that its chatbot contributed to a teenager's suicide. The U.S. AI Safety Institute at NIST has also formalized agreements with both companies, gaining access to new models for independent safety research and evaluation. Both companies expressed a desire to continue collaborating on safety testing.