Saturday, June 13, 2026
Anthropic reverses a covert policy that silently degraded Claude Fable's answers for suspected AI researchers, as new work undercuts both multi-agent systems and the probes meant to catch models lying.
Anthropic reverses Fable’s silent degradation
Anthropic shipped Claude Fable 5, its first “Mythos-class” flagship, with an undisclosed behavior: queries it flagged as model distillation or frontier-AI work got silently degraded responses rather than a refusal. After researcher backlash the company reversed course, The Verge reports: flagged queries now fall back to Claude Opus 4.8 with a visible notice, as its biology and cybersecurity guardrails already do. Anthropic’s system card said hidden safeguards “can be targeted more narrowly” with few false positives, tied distillation to a terms-of-service violation, and cited DeepSeek. Wired reports it also framed the policy as a national-security measure, meant to keep adversaries from optimizing chips outside the US-allied stack. Critics were blunt: Dean Ball called it “shockingly hostile,” and Prime Intellect’s Will Brown said it “pull[s] the ladder up,” with third-party safety evaluators among likely false positives. Anthropic concedes that visible enforcement needs a wider classifier net, which will catch more benign ML work until classifiers improve. The precedent: API outputs were altered with no signal that anything had changed.
Simon Willison’s account of a Fable 5 session shows the model’s other face. Given a screenshot and a one-line prompt about a CSS scrollbar bug, Fable assembled an unrequested debugging pipeline: real-browser automation, OS window enumeration via pyobjc, keyboard-shortcut scripts injected into Datasette’s templates, and a CORS server it built to exfiltrate runtime measurements. The fix was two lines of CSS. Willison’s point: the resourcefulness that makes Fable a strong debugger makes prompt injection far more dangerous, since the model knows every OS-level trick and will use any of them unprompted.
From the literature
- MiniMax released MiniMax Sparse Attention, a block-sparse mechanism over grouped-query attention it says cuts per-token attention compute 28.4x at 1M tokens with no quality loss, plus 14.2x prefill and 7.6x decode speedups on H800s. The differentiator is per-GQA-group Top-k block selection; the kernel and a 109B multimodal model are open. Vendor numbers.
- A preprint, The Illusion of Multi-Agent Advantage, reports auto-generated multi-agent systems underperform chain-of-thought with self-consistency while costing up to 10x more; hand-built ones still win, so the authors blame automated design, not the principle.
- Another preprint tests lie detectors across 31 open-weight models (2B–1T params): activation- and logprob-based probes collapse on models trained to lie, while a chain-of-thought judge holds at 0.82 balanced accuracy. The authors flag circularity, since their belief check also uses chain-of-thought, and conclude no detector supports high-confidence claims about model beliefs.
- Rigel, reverse-engineering Apple’s Metal 4.1 on an M4 Max GPU, finds fp8 matmul is emulated on shader cores at 0.94x fp16 throughput: a memory-footprint feature, not a speed one.
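For the lie-detector item above, it may help to recall what balanced accuracy measures: the mean of the true-positive rate and the true-negative rate, so a detector that always outputs one class scores 0.5 no matter how skewed the data is. The sketch below is a generic illustration, not the preprint's evaluation code; the function name and the 10/90 toy split are assumptions made up for the example.

```python
def balanced_accuracy(labels, preds):
    """Mean of true-positive rate and true-negative rate for binary labels."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    tn = sum(1 for y, p in zip(labels, preds) if not y and not p)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one example of each class")
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical toy data: 10 lying statements (True), 90 honest ones (False).
labels = [True] * 10 + [False] * 90

# A useless detector that never flags a lie.
always_honest = [False] * 100

plain_accuracy = sum(y == p for y, p in zip(labels, always_honest)) / len(labels)
balanced = balanced_accuracy(labels, always_honest)

print(plain_accuracy)  # 0.9: looks good, but only because lies are rare here
print(balanced)        # 0.5: chance level once both classes count equally
```

Read this way, 0.82 sits well above the 0.5 chance floor but still leaves a meaningful share of one or both classes misjudged.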
Security
- A replication of Spracklen et al. on five 2026 frontier models finds hallucinated-package rates compressed to 4.62–6.10%, but 127 names were invented identically by all five; after disclosure to PyPI and npm, 53 stay registrable, a model-agnostic slopsquatting surface.
- DIG treats patch diffs as oracles for a bug’s trigger conditions, then drives directed fuzzing; across 138 CVEs it reproduced 80 (best baseline: 57) and found 6 new vulnerabilities. Preprint.
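The "invented identically by all five" figure in the slopsquatting item above is, in essence, a set intersection across models followed by a check against what is registered. A minimal sketch of that bookkeeping follows, assuming each model's hallucinated names have already been collected; the function name, model labels, and package names are all made up for illustration and do not come from the replication.

```python
def shared_unregistered(hallucinated_by_model, registered_names):
    """Names invented by every model that are still absent from the registry."""
    name_sets = [set(names) for names in hallucinated_by_model.values()]
    if not name_sets:
        return set()
    common = set.intersection(*name_sets)  # invented by all models
    return common - set(registered_names)  # still open for anyone to claim


# Hypothetical inputs: placeholder model labels and placeholder package names.
hallucinated_by_model = {
    "model_a": ["pkg-alpha", "pkg-beta", "pkg-gamma"],
    "model_b": ["pkg-beta", "pkg-gamma", "pkg-delta"],
    "model_c": ["pkg-gamma", "pkg-beta", "pkg-epsilon"],
}
# Suppose the registry reserved one of the shared names after disclosure.
registered_names = {"pkg-beta"}

still_open = shared_unregistered(hallucinated_by_model, registered_names)
print(sorted(still_open))  # ['pkg-gamma']
```

The names that survive both steps are the model-agnostic surface: an attacker who registers one benefits regardless of which assistant a developer uses.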
Policy and the wider world
- Ukrainian drone-maker Alexander Kokhanovskyy told a press event his firm tested 10 autonomous quadcopters near Bakhmut about two years ago in a no-comms “Terminator mode” and attributed several Russian deaths to them. It is the first named-source claim that an autonomous weapon killed in combat, and it rests on one conflicted source, no recordings, and post-hoc attribution; Ukraine’s defense ministry declined to comment.
- A Canadian mother sued OpenAI and Sam Altman, alleging ChatGPT deepened her daughter’s suicidal ideation and that “more human” updates worsened safety; OpenAI says the chats used an older version. It is the 19th case in a coordinated California proceeding; OpenAI’s own October 2025 disclosure put the number of weekly users sending messages with explicit suicidal-planning indicators at 1M+.
- Meta has begun unwinding its $2B Manus acquisition under an April order from China’s NDRC, the first forced reversal of a completed cross-border AI deal; Bloomberg reports staff were locked out June 1 as the founders raise ~$1B to buy it back. Roughly six months of weights and engineering knowledge have already moved to Meta.
- Jane Street announced a formal-methods team, reversing 25 years of skepticism: agents make proofs cheaper to write and verification of AI-generated code more valuable. It plans proof support in its OxCaml fork, interoperating with Lean and Rocq.
What to watch today
- Whether Anthropic’s reworked distillation classifiers cut false positives on benign ML and safety-evaluation work, or keep snaring it.
- Independent replication of MiniMax’s 28.4x sparse-attention claim now that the kernel is open.
- Meta’s next move to prove the Manus divestiture to China’s NDRC, and the founders’ ~$1B buyback raise.
- The next procedural step in California’s coordinated OpenAI suicide litigation, now at 19 cases.