
Large-Scale Unauthorized Distillation of Claude and the Collapse of SWE-bench Hit on the Same Day

Ikesan

On February 23, 2026, two pieces of news about AI trustworthiness dropped at once. Anthropic accused three Chinese AI companies of large-scale distillation of Claude, and OpenAI announced the retirement of SWE-bench Verified, until then the industry-standard coding benchmark. Two stories striking at the foundation of "how AI model capabilities are claimed and measured," landing on the same day.

Unauthorized Distillation of Claude: 24,000 Accounts and 16 Million Exchanges

Anthropic accused three companies: DeepSeek, Moonshot AI, and MiniMax. Together, they generated approximately 24,000 fake accounts to access Claude’s API and collected model outputs at scale through over 16 million exchanges. Anthropic described this as an “industrial-scale campaign.” As of February 24, none of the three companies had issued any rebuttal or official statement.

Technical Background on Distillation

Knowledge distillation itself is a legitimate research technique for transferring capabilities from a large model to a smaller one. By using the teacher model’s outputs as training data, you can efficiently train a student model that exhibits similar behavior without reproducing the full-scale model.
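As a minimal sketch of what "using the teacher model's outputs as training data" means in the classic soft-label form (Hinton et al., 2015), the student is trained to match the teacher's temperature-softened output distribution. NumPy is used here purely for illustration, and the function names are my own:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the temperature-softened teacher and student
    distributions, scaled by T^2 as in Hinton et al. Minimizing this
    pushes the student to reproduce the teacher's behavior."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# A student whose logits already match the teacher incurs zero loss.
teacher = np.array([[4.0, 1.0, 0.5]])
assert distillation_loss(teacher, teacher) < 1e-9
```

A higher temperature spreads probability mass over more tokens, which is exactly why teacher *outputs* (not just top answers) are so valuable as training signal.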

The problem arises when this is applied to another company’s commercial model without authorization. Claude’s API has terms of service that explicitly prohibit use for training competing products. Mass-generating fake accounts to systematically circumvent these terms is not a question of technical methodology — it is deliberate terms-of-service violation.

Hydra Cluster: The Detected Attack Infrastructure

The detection methods Anthropic disclosed were remarkably specific.

  • Behavioral fingerprinting: Multiple classifiers built to identify distillation patterns in API traffic. Detected chain-of-thought elicitation (a technique used to build reasoning training data)
  • Coordinated activity detection: Tens of thousands of variants of the same prompt sent from hundreds of coordinated accounts, all targeting the same narrow capability domains
  • Hydra Cluster discovery: A single proxy network managing over 20,000 fraudulent accounts simultaneously, distributing traffic across APIs and third-party cloud platforms. Distillation traffic was mixed with unrelated customer requests to make detection harder
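Anthropic has not published its classifiers, but the coordinated-activity signal described above (thousands of variants of one prompt template fanned out across many accounts) can be sketched in a few lines. The signature scheme and threshold here are invented for illustration:

```python
from collections import defaultdict
import re

def prompt_signature(prompt, k=5):
    """Crude template signature: case-fold, then collapse digits and
    quoted spans so fill-in-the-blank variants map to one key, and keep
    only the first k tokens."""
    text = re.sub(r'"[^"]*"|\d+', "_", prompt.lower())
    return " ".join(text.split()[:k])

def flag_coordinated_accounts(traffic, min_accounts=3):
    """traffic: iterable of (account_id, prompt) pairs. Returns the
    signatures seen across at least `min_accounts` distinct accounts,
    a rough proxy for one template being farmed out to many accounts."""
    by_sig = defaultdict(set)
    for account, prompt in traffic:
        by_sig[prompt_signature(prompt)].add(account)
    return {sig: accs for sig, accs in by_sig.items() if len(accs) >= min_accounts}
```

A production system would obviously use richer features (embeddings, timing, IP/proxy graphs), but the core idea, grouping traffic by template rather than by account, is the same.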

Particularly sophisticated was the chain-of-thought extraction technique: prompts asking Claude to progressively write out the internal reasoning behind completed answers, generating reasoning training data at scale. They were not just hitting the API and collecting outputs; they were trying to extract the reasoning process itself.

Not Just Anthropic: OpenAI and Google Also Hit

Around the same time as Anthropic’s accusation, other companies disclosed their own incidents.

  • OpenAI: Sent a memo to the House Select Committee on China on February 12. Reported that accounts linked to DeepSeek employees had developed methods to circumvent OpenAI’s access restrictions, obtaining model outputs through obfuscated third-party routers
  • Google: Disclosed a model extraction attack against Gemini involving over 100,000 prompts. According to Google Threat Intelligence Group (GTIG), the campaign included attempts to force Gemini to output internal reasoning traces. State-sponsored actors from China, Iran, North Korea, and Russia were involved

All three companies were affected, and the tactics were consistent: circumventing API access restrictions, collecting outputs at scale through proxies, and targeting reasoning process extraction.

Geopolitical Context: Tied to Chip Export Controls

The timing of this accusation in February 2026 carries political significance.

  • CEO Dario Amodei’s congressional lobbying: Around February 10, he met with Republican members of the Senate Banking Committee and Senator Elizabeth Warren to advocate for stronger chip export controls
  • AI OVERWATCH Act: Passed by the House Foreign Affairs Committee on a bipartisan basis on January 21. A bill that would treat advanced semiconductor exports like weapons sales and ban the sale of Nvidia Blackwell chips to China and others for two years
  • Trump administration timing: The accusation came right after the administration formally approved export of Nvidia H200 and similar chips to China

Anthropic’s argument is clear: “executing distillation at this scale requires access to high-performance chips,” positioning the distillation accusation as evidence for the need for chip controls. The logic is that if capabilities can be acquired via API distillation even when chips are restricted, hardware restrictions alone are insufficient.

On legal measures, Anthropic has opted for a national security framing rather than conventional litigation. By positioning distillation attacks as a threat to the export control regime rather than an intellectual property dispute, they are moving toward sanctions and entity list designations. Given the limited effectiveness of US lawsuits against Chinese companies, this is probably a pragmatic choice.

On the technical defense side, multiple approaches are being researched.

  • Semantic watermarking: Embedding watermarks in outputs to trace distillation datasets. However, the distillation process itself has been found to be an effective attack method for watermark removal, so this has limitations
  • Ingrain approach: Research on regularizing with watermark-containing models to improve distillation resistance
  • LoRD and others: Using random perturbation and reinforcement learning alignment to reduce query efficiency and obstruct extraction. Gradient-based watermarks that degrade distillation training while maintaining teacher model performance

No perfect defense exists yet. As long as an API is exposed, outputs can be obtained; the question is how far detection accuracy and deterrence can be pushed.
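For a sense of how watermark-based tracing works, here is a toy detector in the spirit of the statistical "green list" scheme from the watermarking literature (Kirchenbauer et al.): each token is deterministically assigned green or red based on the preceding token, and detection checks whether green tokens are over-represented relative to chance. The hashing and thresholds are simplifications of my own, not any vendor's actual scheme; the semantic watermarking the article mentions operates on meaning rather than tokens, but the detection logic is analogous:

```python
import hashlib
import math

def is_green(prev_token, token, gamma=0.5):
    """Deterministically assign `token` to the green list for context
    `prev_token` by hashing the pair; a fraction `gamma` of candidate
    tokens is green for any given context."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (h[0] / 255.0) < gamma

def watermark_z_score(tokens, gamma=0.5):
    """z-score of the observed green-token count versus the gamma * n
    expected by chance; a large positive value suggests the text came
    from a watermarked generator."""
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

The limitation noted above follows directly: a student model trained on watermarked outputs learns the distribution, not the token-level bias, so distillation tends to wash the signal out.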


The Collapse of SWE-bench Verified: 59.4% Defective and Training Leaks

On the same day, OpenAI stopped reporting SWE-bench Verified scores and urged other model developers to do the same. SWE-bench Verified had established itself as near-industry-standard for measuring AI coding ability since 2024, but an internal audit revealed fundamental flaws.

Large-Scale Test Case Defects

OpenAI audited a 27.6% sample of the dataset and found that 59.4% of the audited test cases were defective — incorrectly marking functionally correct answers as wrong.

The root cause lies in the benchmark’s design structure. SWE-bench Verified is based on open-source PRs, but the test suites often fail to fully cover the PR’s intent. If a model produces a correct solution but implements it in a way the tests did not anticipate, it gets marked as a failure.

Training Data Leaks

More serious is the leak with training data. SWE-bench problems are collected from public open-source repositories, and many models include those repositories in their training data.

OpenAI’s analysis found that models that had seen benchmark problems during training tended to score higher. It may have been measuring memorization rather than reasoning ability.
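Leak checks of this kind typically look for verbatim n-gram overlap between benchmark items and training documents. A crude sketch follows; the function names and the 8-gram window are my choices for illustration, not OpenAI's actual method:

```python
def ngrams(text, n=8):
    """Set of case-folded, word-level n-grams of a document."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def overlap_ratio(benchmark_item, training_doc, n=8):
    """Fraction of the benchmark item's n-grams that also occur verbatim
    in a training document; a high ratio suggests the item was seen
    (and possibly memorized) during training."""
    bench = ngrams(benchmark_item, n)
    if not bench:
        return 0.0
    return len(bench & ngrams(training_doc, n)) / len(bench)
```

Real contamination audits scale this with hashing and index structures over terabytes of training data, but the underlying signal is the same verbatim overlap.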

Epoch AI’s Analysis Piled On

An analysis published by Epoch AI last month made the problems with SWE-bench Verified even more concrete.

  • Difficulty skew: 39% were “trivial changes” (under 15 minutes), 52% were “small changes” (15 minutes to 1 hour). Truly difficult problems (over 4 hours) numbered just 3
  • Code change volume: Tasks under 15 minutes averaged 5 lines changed; 1–4 hour tasks averaged 50 lines
  • Evidence of contamination: Models showed up to 35% text overlap with benchmark problems, suggesting they had memorized problems during training
  • Repository skew: 5 repositories accounted for over 80% of samples
  • Stale data: Half the problems were from before 2020, some going back 12 years

In other words, SWE-bench Verified had become a test of “solving easy problems with answers you’ve already seen.”

Transition to SWE-bench Pro

The recommended alternative is Scale AI’s SWE-bench Pro, which has significantly stronger contamination countermeasures.

  • Legal barriers: Tasks built from repositories with strong copyleft licenses like GPL, creating legal barriers to inclusion in commercial model training corpora
  • Private set: A private set of 12 repositories to check for optimization against the public set
  • Commercial set: Proprietary codebases (18 repositories) acquired from actual startups
  • Complexity: Averages 4.1 files and 107 lines of code changes required (roughly an order of magnitude more than SWE-bench Verified’s 5–14 lines)
  • Multi-language: Includes Go, TypeScript, and JavaScript in addition to Python

Looking at current scores, top agents achieve 55–59% on the public set and 15–23% on private codebases. A completely different world from SWE-bench Verified’s 80%+.

Benchmark Contamination Is Not Limited to SWE-bench

Contamination has been confirmed in other major benchmarks as well.

  • MMLU: In tests where ChatGPT and GPT-4 guessed missing options from test data, they achieved exact match rates of 52% and 57% respectively — memorizing benchmark questions verbatim
  • HumanEval: Analyses indicate ChatGPT’s scores are very likely inflated by data contamination
  • GSM-8K: Overlaps detected similar to MMLU and HumanEval

Countermeasures like dynamic benchmarks (LiveBench, SWE-rebench) and Microsoft’s contamination-free MMLU-CF have emerged, but as of February 2026, no industry standard for contamination detection exists. Each lab uses different methods and thresholds, leaving cross-model comparisons unreliable.
