AI Community Raises Concerns Over Model Quality and Data Integrity in Recent Releases

Recent developments in the artificial intelligence landscape have sparked skepticism within the AI community, with two prominent releases, MBZUAI's K2 Think model and Skywork AI's reward-preference dataset, drawing significant criticism over their performance and data integrity. The concerns highlight a growing demand for transparency and rigorous evaluation in the rapidly evolving field of large language models.

The Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) recently released K2 Think, a 32-billion-parameter reasoning model based on Qwen 2.5. However, community reception has been largely negative. One prominent AI commentator, Teortaxes▶️, expressed initial skepticism, stating, "I didn't bother looking into this model because I have a very low prior for 'a qwen 2.5 finetune from MBZUAI is real news', at a glance the paper seemed like a typical me-too reasoning gobbledygook. Sad to be correct."

User feedback on platforms like Reddit echoed this sentiment, with several users calling K2 Think "unbelievably bad compared to today's standard and their claims." Specific criticisms included poor coding performance, excessive token usage for simple tasks, and allegations of "contamination" in math benchmarks, meaning overlap between the model's training data and benchmark test items that could artificially inflate reported scores. These issues suggest that the model may not live up to its advertised capabilities in complex reasoning tasks.

Adding to the scrutiny, Skywork AI recently received a "yellow card" for data contamination. The Hugging Face entry for the "Skywork/Skywork-Reward-Preference-80K-v0.1" dataset carries an "IMPORTANT" notice stating that it was "shown to contain contaminated samples from the magpie-ultra-v0.1 subset," whose prompts "have a significant n-gram overlap with the evaluation prompts in RewardBench," as identified by a script shared in a GitHub gist. Because such overlap compromises the integrity of evaluations built on the dataset, the maintainers strongly encourage users to switch to the corrected v0.2 release.
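
For readers unfamiliar with this kind of audit, the sketch below shows how an n-gram overlap check for contamination typically works: a training prompt is flagged when it shares a long word sequence with an evaluation prompt. This is a minimal illustration only, not the script referenced in the dataset notice; the function names, the n-gram length, and the example prompts are invented for this sketch.

```python
def word_ngrams(text: str, n: int = 8) -> set:
    """Return the set of lowercase word-level n-grams in a piece of text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0))}


def flag_contaminated(train_prompts, eval_prompts, n: int = 8):
    """Return indices of training prompts that share any n-gram with an eval prompt."""
    eval_grams = set()
    for prompt in eval_prompts:
        eval_grams |= word_ngrams(prompt, n)
    return [i for i, prompt in enumerate(train_prompts)
            if word_ngrams(prompt, n) & eval_grams]


# Toy example: the second training prompt repeats an evaluation prompt
# almost verbatim, so it is flagged as contaminated.
eval_set = ["Prove that the sum of two even integers is always even."]
train_set = [
    "Write a short poem about the ocean at dawn.",
    "Prove that the sum of two even integers is always even. Show each step.",
]
print(flag_contaminated(train_set, eval_set))  # -> [1]
```

Exact n-gram matching of this kind is deliberately coarse: it catches near-verbatim reuse of benchmark prompts but misses paraphrases, which is why contamination audits often pair it with fuzzier similarity checks.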

The criticism of Skywork AI extends beyond dataset integrity. A LinkedIn analysis of Skywork.ai highlighted significant transparency deficits, including a lack of clear privacy policies and an inferred connection to Kunlun Tech, a Chinese technology company with a documented history of data privacy controversies. Taken together, these concerns underscore a broader industry challenge in ensuring the reliability and ethical development of AI technologies.