Week of June 15, 2026
The US government forces Anthropic to pull Fable 5 and Mythos 5 worldwide over a code-auditing jailbreak, the first export-control recall of a deployed model; a wave of research rebuilds agents from the inside; and medical and game benchmarks show how far a score sits from the task.
The state reaches into the model
The week’s defining story was not a model but an order. On June 12 the US government directed Anthropic to suspend access to Claude Fable 5 and Mythos 5 for all foreign nationals, inside or outside the country, on national-security grounds; because Anthropic cannot gate a model by nationality, it pulled both worldwide, and AWS was separately told to revoke access in every region. Anthropic’s account says the directive arrived at 5:21pm ET with the reasoning withheld, and Simon Willison’s polling script caught the cutoff about four hours later, requests to Fable 5 falling back to Opus 4.8.
The cited trigger was a jailbreak: prompting the model to read a codebase and identify software flaws. Anthropic disputes the severity, saying the same capability sits in other models including GPT-5.5, is used daily by security defenders, and surfaced only minor or already-known bugs; the government gave verbal evidence and no written technical basis, and the order reached the company’s own foreign-national employees. Anthropic warns that applying the standard consistently “would essentially halt all new model deployments for all frontier model providers.” It is the first known use of export-control authority against a named, deployed commercial model, and it establishes that Washington can switch one off.
Agents get an architecture
Away from the headline, the week’s research concentrated on how agents should be built internally. Three papers rethink where the work happens. APPO reports that the decisions that matter in an agent’s trajectory are spread throughout it, not concentrated at the tool-call boundaries where current methods assign credit, and claims a gain of nearly four points across 13 benchmarks from crediting them properly. PoLar treats a frozen model’s layers as a program to reschedule per input, skipping or looping them, and finds that some errors from fixed-depth inference are artifacts of forced depth rather than missing knowledge. LoopCoder-v2, a 7-billion-parameter family trained to loop its layers, lifts SWE-bench Verified from 43.0 to 64.4 by looping exactly twice, then regresses with further loops: a capability knob that saturates.
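To make the layer-rescheduling idea concrete, here is a toy sketch in Python. It is not PoLar's or LoopCoder-v2's actual method: the function `run_schedule`, the three arithmetic "layers", and the schedules are all hypothetical stand-ins, chosen only to show what it means to treat a fixed stack of layers as a program whose steps can be skipped or repeated for a given input.

```python
from typing import Callable, Sequence

# A "layer" here is just a function on a number; real layers act on tensors.
Layer = Callable[[float], float]


def run_schedule(layers: Sequence[Layer], schedule: Sequence[int], x: float) -> float:
    """Apply frozen layers in the order given by `schedule`.

    An index may be left out (the layer is skipped) or repeated
    (the layer is looped). The layers themselves never change.
    """
    for idx in schedule:
        x = layers[idx](x)
    return x


# Three toy frozen layers (hypothetical, for illustration only).
layers = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
]

fixed = run_schedule(layers, [0, 1, 2], 1.0)      # ordinary fixed-depth pass
skipped = run_schedule(layers, [0, 2], 1.0)       # layer 1 skipped
looped = run_schedule(layers, [0, 1, 1, 2], 1.0)  # layer 1 applied twice

print(fixed, skipped, looped)
```

The same weights yield different outputs depending only on the schedule, which is the sense in which depth becomes an adjustable setting and not a fixed property of the model.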
Memory drew the same structural attention. MRAgent argues the standard retrieve-then-reason pipeline cannot revise what it fetched once evidence turns up mid-inference, and replaces it with a graph the reasoning can navigate, reporting up to 23% gains on long-history benchmarks. Kairos proposes a world-model stack for physical agents with a hybrid attention scheme its authors claim bounds long-horizon error, though the proofs are self-reported in a preprint. The through-line: the field has moved past treating an agent as a prompt wrapped around a model and toward designing its control flow, memory, and depth.
When the benchmark isn’t the task
Two results widened the distance between a score and a capability. MedMisBench shows models that score at expert level on medical exams collapsing under misinformation: across 11 configurations, accuracy fell from 71.1% on clean questions to 38.0% when a plausible falsehood was injected, an attack that succeeded 51.5% of the time and reached 69.5% when the lie was framed as clinical authority; an international clinical panel judged 38.2% of the failures capable of serious harm. GameCraft-Bench asks agents to build playable games end-to-end in the Godot engine, graded by replayed gameplay rather than passing tests; the strongest agent finished 41.46%, most fell below 40%, and the failures were not syntax but making the parts cohere into something that runs.
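For scale, the MedMisBench accuracy figures quoted above can be turned into an absolute and a relative drop. The short Python sketch below uses only the two percentages from the text; the helper names `point_drop` and `relative_drop` are hypothetical, and the derived numbers are plain arithmetic, not figures reported by the benchmark's authors.

```python
def point_drop(before: float, after: float) -> float:
    """Absolute drop in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Drop as a percentage of the starting accuracy."""
    return 100.0 * (before - after) / before


clean_acc = 71.1     # accuracy on clean questions, from the text
attacked_acc = 38.0  # accuracy with an injected falsehood, from the text

abs_drop = round(point_drop(clean_acc, attacked_acc), 1)
rel_drop = round(relative_drop(clean_acc, attacked_acc), 1)

print(f"{abs_drop} points lost, {rel_drop}% of clean accuracy")
```

On these inputs the drop is about 33 percentage points, or a little under half of the clean-question accuracy.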
Silent regressions
Two stories turned on security quietly undermined: in one an attack surface was quietly created, in the other a protection was quietly removed. Google Project Zero’s first Pixel 9 writeup details the entry point behind the zero-click chain it priced in earlier posts: an integer overflow in a Dolby audio decoder, reachable because Google Messages auto-decodes incoming audio for transcription, so an AI convenience feature turned a media codec into a remote, no-tap attack surface. In hardware, Tom’s Hardware reports that AMD dropped a memory-encryption capability from consumer Ryzen chips in a newer AGESA firmware release with no disclosure or changelog entry, and did not answer when asked why, leaving owners who updated their BIOS to assume a protection that is gone.
Quick hits
- Aligning Quantum Operators with LLMs maps unitary matrices into a language model’s latent space so gate constraints can be given in natural language at inference, a possible path to prompt-driven quantum compilation; competitive on a narrow Clifford+T benchmark (preprint).
- Epic Games announced Lore, a version-control system, drawing outsized developer attention on the bet that Git and Perforce fall short at the scale and binary-asset load of AAA game development; no architecture details yet.
- Orchestra-o1 routes text, image, audio, and video sub-tasks to specialist agents at runtime and claims a 10.3% gain over the next-best on an omnimodal benchmark (preprint).
- Beyond the Current Observation builds non-Markov games that need roughly 128k-token contexts and finds forgetting, not planning, dominates frontier multimodal agents’ errors (preprint).