Friday, July 3, 2026
Anthropic redeploys Fable 5 and Mythos 5 after the US lifts the first export-control recall of a deployed model, now with a jailbreak-severity framework attached; a single trained transformer layer matches full RL; and a benchmark grading agents as senior engineers fails them three times in four.
Fable 5 comes back
Anthropic redeployed Claude Fable 5 and Mythos 5 on July 1, after the US export controls that pulled them worldwide on June 12 were lifted on June 30. The three-week recall, the first use of export-control authority against a named, deployed model, ends with new machinery rather than a simple reinstatement. Anthropic says fresh safety classifiers block the jailbreak techniques at issue with better than 99% accuracy, at the cost of flagging more benign requests. It also describes an industry effort to agree on how to score a jailbreak’s severity along four axes: how much capability it adds, how broadly it applies, how easily it is weaponized, and how discoverable it is. The government’s role now runs through a standing channel: pre-release access for evaluation, faster information sharing, and joint research. The precedent set in June, that Washington can switch off a model, now comes with a process.
Smaller knobs, same gains
Two results argue that capability need not cost more compute. A preprint reports that training a single transformer layer can match or beat full-parameter reinforcement learning, and that the layer that matters most sits in the middle of the stack, a pattern that held across seven models, two Qwen families, and three RL algorithms. Further out, another group scaled up thermodynamic AI models built on Ising-machine hardware meant for low-power inference, turning a correspondence between high-temperature Gibbs sampling and feed-forward networks into a backpropagation algorithm that reaches 94.9% on CIFAR-10 and 76.0% on CIFAR-100 under binary sampling. Both are early, and both point the same way: fewer trained parameters, or a different substrate, for comparable accuracy.
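To make the single-layer idea concrete, here is a minimal sketch of choosing which parameters to train when only one block near the middle of the stack is updated. It is not the preprint's code: the `layers.<i>.` naming scheme, the helper names, and the choice of `num_layers // 2` as "the middle" are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Hypothetical naming scheme: block parameters look like "layers.<i>.<part>.weight".
LAYER_PATTERN = re.compile(r"\blayers\.(\d+)\.")


def middle_layer_index(num_layers: int) -> int:
    """Pick the block at the middle of the stack (an assumed reading of 'middle')."""
    return num_layers // 2


def trainable_flags(param_names: List[str], layer_index: int) -> Dict[str, bool]:
    """Mark a parameter trainable only if it belongs to the chosen block."""
    flags: Dict[str, bool] = {}
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        flags[name] = bool(match) and int(match.group(1)) == layer_index
    return flags


# Toy six-block model: embeddings, six blocks with two weights each, output head.
names = (
    ["embed.weight"]
    + [f"layers.{i}.{part}.weight" for i in range(6) for part in ("attn", "mlp")]
    + ["lm_head.weight"]
)
flags = trainable_flags(names, middle_layer_index(6))
trainable = [name for name, on in flags.items() if on]
print(trainable)
```

In a framework such as PyTorch, flags like these would typically be applied by setting each parameter's `requires_grad` attribute, so the optimizer only updates the chosen block while the embeddings, the other blocks, and the output head stay frozen.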
Grading the senior engineer
Senior SWE-Bench, from Snorkel, raises the bar on coding-agent evaluation by scoring agents as senior engineers: tasks span multiple services, take hundreds of steps, and are graded by a validation agent that writes behavioral tests and weighs runtime correctness against the codebase’s own conventions. Top models fail its senior-level tasks more than 75% of the time. A companion caution comes from theory: a preprint on multi-agent belief shows that when verifier agents lag, false claims propagate and the system oscillates, and derives a closed-form instability threshold that, for a two-step delay, lands on the inverse golden ratio, about 0.618. Grounding every answer in fact removes the effect.
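The preprint's closed-form threshold is not reproduced here, but the constant it lands on is easy to check. The short sketch below only verifies standard identities of the inverse golden ratio; it does not model the multi-agent system, and the function name is illustrative.

```python
import math


def inverse_golden_ratio() -> float:
    """Return 1/phi = (sqrt(5) - 1) / 2."""
    return (math.sqrt(5.0) - 1.0) / 2.0


x = inverse_golden_ratio()
phi = (1.0 + math.sqrt(5.0)) / 2.0  # the golden ratio itself

# Standard identities: 1/phi equals phi - 1, and it is the positive
# root of x^2 + x - 1 = 0.
residual = x * x + x - 1.0
print(round(x, 3))  # 0.618
```

The value rounds to 0.618, matching the figure quoted for the two-step delay.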
What to watch today
- HARC couples harmfulness and refusal directions during fine-tuning and reports the best robustness-capability-usability trade-off among six methods across five model families, a cleaner handle on the jailbreak problem the Fable recall was about.
- Whether the jailbreak-severity framework Anthropic describes gets adopted by other labs, or stays one company’s rubric.
- For the physics-minded, a claimed counterexample to Ostrogradsky’s theorem builds a UV-complete four-derivative field theory on a Krein space through a hidden “ghost parity,” keeping causality and unitarity where higher-derivative theories usually break.