Saturday, June 13, 2026

Anthropic reverses Fable’s silent degradation

Anthropic shipped Claude Fable 5, its first “Mythos-class” flagship, with an undisclosed behavior: queries it flagged as model distillation or frontier-AI work got silently degraded responses rather than a refusal. After researcher backlash it reversed course, The Verge reports: flagged queries now fall back to Claude Opus 4.8 with a visible notice, as its biology and cybersecurity guardrails already do. Anthropic’s system card said hidden safeguards “can be targeted more narrowly” with few false positives, tied distillation to a terms-of-service violation, and cited DeepSeek. Wired reports it also framed the policy as a national-security measure, keeping adversaries from optimizing chips outside the US-allied stack. Critics were blunt: Dean Ball called it “shockingly hostile,” and Prime Intellect’s Will Brown said it “pull[s] the ladder up,” with third-party safety evaluators among likely false positives. Anthropic concedes visible enforcement needs a wider classifier net, which will catch more benign ML work until classifiers improve. The precedent: API outputs were altered with no signal that anything had changed.

Simon Willison’s account of a Fable 5 session shows the other face. Given a screenshot and a one-line prompt about a CSS scrollbar bug, Fable assembled an unrequested debugging pipeline: real-browser automation, OS window enumeration via pyobjc, keyboard-shortcut scripts injected into Datasette’s templates, and a CORS server it built to exfiltrate runtime measurements. The fix was two lines of CSS. Willison’s point: the resourcefulness that makes Fable a strong debugger makes prompt injection far more dangerous, since it knows every OS-level trick and uses them unprompted.

From the literature

MiniMax released MiniMax Sparse Attention, a block-sparse mechanism over grouped-query attention it says cuts per-token attention compute 28.4x at 1M tokens with no quality loss, plus 14.2x prefill and 7.6x decode speedups on H800s. The differentiator is per-GQA-group Top-k block selection; the kernel and a 109B multimodal model are open. These are vendor-reported numbers, not yet independently replicated.
A preprint, The Illusion of Multi-Agent Advantage, reports auto-generated multi-agent systems underperform chain-of-thought with self-consistency while costing up to 10x more; hand-built ones still win, so the authors blame automated design, not the principle.
Another preprint tests lie detectors across 31 open-weight models (2B–1T params): activation- and logprob-based probes collapse on models trained to lie, while a chain-of-thought judge holds at 0.82 balanced accuracy. The authors flag circularity, since their belief check also uses chain-of-thought, and conclude no detector supports high-confidence claims about model beliefs.
Rigel, reverse-engineering Apple’s Metal 4.1 on an M4 Max GPU, finds fp8 matmul is emulated on shader cores at 0.94x fp16 throughput: a memory-footprint feature, not a speed one.
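
For readers unfamiliar with the idea behind MiniMax's per-GQA-group Top-k block selection, here is a minimal sketch in plain Python. It is not MiniMax's kernel: the scoring rule (mean-pooled keys per block, scored against every query head in the group) and the names block_means and select_blocks are assumptions made for illustration. The only point it shows is that all query heads sharing one key/value head pick a single common set of key blocks, so attention then runs over top_k blocks instead of the whole context.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_means(keys, block_size):
    """Mean-pool consecutive keys into one summary vector per block (assumed scoring proxy)."""
    summaries = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        dim = len(chunk[0])
        summaries.append([sum(k[d] for k in chunk) / len(chunk) for d in range(dim)])
    return summaries


def select_blocks(group_queries, keys, block_size, top_k):
    """Return the sorted indices of the top_k key blocks for one GQA group.

    group_queries: one query vector per query head in the group.
    keys: key vectors of the single KV head the group shares.
    """
    summaries = block_means(keys, block_size)
    # One score per block, summed over every query head in the group,
    # so the whole group agrees on which blocks to attend to.
    scores = [sum(dot(q, s) for q in group_queries) for s in summaries]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])
```

With 8 keys, a block size of 2 and top_k of 2, each group would attend over 4 keys rather than 8; the claimed savings at 1M tokens come from the same ratio at far larger scale.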

Security

A replication of Spracklen et al. on five 2026 frontier models finds hallucinated-package rates compressed to 4.62–6.10%, but 127 names were invented identically by all five; after disclosure to PyPI and npm, 53 stay registrable, a model-agnostic slopsquatting surface.
DIG treats patch diffs as oracles for a bug’s trigger conditions, then drives directed fuzzing; across 138 CVEs it reproduced 80 (best baseline: 57) and found 6 new vulnerabilities. Preprint.
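
The cross-model overlap in the slopsquatting item comes down to a set intersection. The sketch below is an illustration, not the replication's actual pipeline: the function name shared_hallucinations, the input shapes, and the package names in the usage note are all hypothetical. It treats a name as hallucinated when it is absent from a snapshot of known registry names, then keeps only the names every model invented.

```python
def shared_hallucinations(suggestions_by_model, known_packages):
    """Return package names that every model suggested but the registry does not list.

    suggestions_by_model: dict mapping a model label to an iterable of suggested names.
    known_packages: iterable of names that exist in the registry snapshot.
    """
    known = {name.lower() for name in known_packages}
    per_model = [
        {name.lower() for name in names} - known  # this model's hallucinated names
        for names in suggestions_by_model.values()
    ]
    if not per_model:
        return set()
    # Names invented identically by all models are the model-agnostic attack surface.
    return set.intersection(*per_model)
```

For example, if three hypothetical models each suggest "fastjsonx" and only one suggests "quickyaml", with neither in the registry snapshot, the function returns only "fastjsonx". Each returned name would then need a separate check for whether it is still registrable.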

Policy and the wider world

Ukrainian drone-maker Alexander Kokhanovskyy told a press event his firm tested 10 autonomous quadcopters near Bakhmut about two years ago in a no-comms “Terminator mode” and attributed several Russian deaths to them. It is the first named-source claim that an autonomous weapon killed in combat, but it rests on one conflicted source, no recordings, and post-hoc attribution; Ukraine’s defense ministry declined to comment.
A Canadian mother sued OpenAI and Sam Altman, alleging ChatGPT deepened her daughter’s suicidal ideation and that “more human” updates worsened safety; OpenAI says the chats used an older version. It is the 19th case in a coordinated California proceeding; OpenAI’s own October 2025 disclosure put the number of weekly users sending messages with explicit suicidal-planning indicators at 1M+.
Meta has begun unwinding its $2B Manus acquisition under an April order from China’s NDRC, the first forced reversal of a completed cross-border AI deal; Bloomberg reports staff were locked out June 1 as the founders raise ~$1B to buy it back. Roughly six months of weights and engineering knowledge have already moved to Meta.
Jane Street announced a formal-methods team, reversing 25 years of skepticism: agents make proofs cheaper to write and verification of AI-generated code more valuable. It plans proof support in its OxCaml fork, interoperating with Lean and Rocq.

What to watch today

Whether Anthropic’s reworked distillation classifiers cut false positives on benign ML and safety-evaluation work, or keep snaring it.
Independent replication of MiniMax’s 28.4x sparse-attention claim now that the kernel is open.
Meta’s next move to prove the Manus divestiture to China’s NDRC, and the founders’ ~$1B buyback raise.
The next procedural step in California’s coordinated OpenAI suicide litigation, now at 19 cases.