Tuesday, June 23, 2026
A new theory recasts prompt injection as role confusion the model can be tricked out of; the top open-weight model ships text-only; and a Codex logging default writes terabytes to local SSDs.
The US pulls two Anthropic models worldwide over a disputed code-review jailbreak, the first export-control takedown of a deployed model, as AI-found zero-days pile up in FFmpeg and the Pixel 9.
Anthropic reverses a covert policy that silently degraded Claude Fable's answers for suspected AI researchers, as new work undercuts both multi-agent systems and the probes meant to catch models lying.
Project Zero prices a full Pixel root chain at roughly eleven person-weeks and documents months of patch lag, as fresh benchmarks measure how far AI agents still fall short on real work.
A German court strips AI summaries of search's legal shield; Anthropic ships its most capable model behind heavy filters while its CEO asks to be regulated; and new research shows alignment passing benchmarks it quietly fails underneath.
Anthropic ships a frontier model that reroutes dual-use queries instead of refusing them, Amazon deploys random-graph datacenter networks at scale, and error messages emerge as a privileged prompt-injection surface.
A researcher reads two decades of encrypted military traffic hidden in the public GPS signal, OpenAI and Simon Willison both move to contain untrusted input to LLMs, and a $280 soundbar becomes a remote keyboard.
Hugging Face rebuilds its CLI for coding agents and benchmarks the token cost of hand-rolled alternatives; a preprint caps eval scores to expose agents that game the test; NVIDIA releases an open multimodal guardrail.
Cloudflare finds about half of Tier 1 networks accept forged BGP paths; Microsoft fields a from-scratch model family at Build; Uber caps coding agents at $1,500 a month.
Microsoft announces a seven-model MAI family backed by a rare, transparent training report; Alphabet raises about $80 billion, including Berkshire's first big Google stake, to fund the compute race.
An interpretability preprint says diffusion image models read only word meaning and order from prompts, a Lean4 framework brings formal verification to agent workflows, and attackers seized Instagram accounts by asking Meta's support bot.
Two frontier labs detail how they measure and contain their agents; a Zapier exploit chain and Vercel's "inference theft" show what weak containment costs; and reverse-engineers read microcode and hidden memory off the silicon.
Anthropic's $65 billion raise and an incremental Opus 4.8 lead a quiet day, with new research showing coding agents leaking secrets and firing real attacks at live sites.
OpenCode's founder picks apart the pitch that AI lifts team output, Stratechery sizes up satellites as server racks, and Cisco Talos open-sources synthetic security logs that stay consistent across 20-plus formats.
Huawei pitches an architecture-first scaling law to skirt EUV denial, the memory supercycle prices sub-$100 phones out of emerging markets, and Google's AI search box draws a reported migration to rivals.
A maintainer puts hard numbers to open source's agent-traffic problem, an AI disproves an 80-year-old Erdős conjecture, SPEC's new CPU benchmark gets its first independent teardown, and a CISA contractor publishes the agency's own cloud keys.
Microsoft Research releases a codesigned small-model agent stack and claims it leads computer-use benchmarks it ran itself.
An OpenAI reasoning model produces an externally verified disproof of a 1946 Erdős conjecture; a GitHub employee's poisoned IDE extension exposes about 3,800 internal repos; and an essay rereads China's AI optimism as fear of falling behind.
Google sends its agentic science assistant to Nature and into Gemini for Science, Anthropic splits agent brains from hands on Cloudflare, and a new lattice-QCD result quietly closes the muon g−2 anomaly.
Two vendor field reports put a security-tuned Anthropic model preview to work on real codebases and credit the scaffolding over the model, as Marc Brooker reframes where coding agents win.
Model internals own a quiet day: how 2026's open-weight LLMs cut long-context cost, two sober takes on RL and steering, and Gemini 3.5 Flash ships.
A hidden lock in ClickHouse query planning stalled Cloudflare's billing; AI turns up on both sides of the CVE curve; and new open releases put their gains down to better data, not bigger models.
OpenAI hand-builds a Windows sandbox for its Codex agent and discloses an npm worm that forced a code-signing certificate rotation, while Microsoft Research opens up the mimalloc allocator.
Google Project Zero prices a full Pixel zero-click near eleven person-weeks and shows memory safety blocks it; Anthropic ships a frontier model that refuses basic biology and can silently degrade rivals' code; and AWS makes flat random-graph networks its datacenter default.
Cloudflare finds half the internet's Tier 1 backbones accept forged BGP routes; Microsoft fields a from-scratch model family with a rare 109-page training report; and Alphabet raises $80 billion as AI's compute bill comes due.
Machine-generated code, issues, vulnerability reports, and even an Erdős counterexample surged this week; the humans who verify them did not, even as Anthropic raised toward a trillion dollars to automate more of the work.
OpenAI says a general-purpose model overturned a decades-old result on Erdős's unit-distance problem; Cloudflare ran a preview security model through a 50-agent exploit-hunting harness; and Marc Brooker reframed where coding agents win as a question of feedback, not model size.
VulnCheck says AI-assisted bug-hunting is bending the CVE disclosure curve, an npm worm reached OpenAI's code-signing certificates, and three systems teardowns show how much is still built by hand.
A first-of-its-kind US export-control order pulled Anthropic's most capable models offline worldwide over a code-auditing jailbreak, the same month Project Zero and a startup's agent showed how cheap that capability has become.
A general-purpose model disproved a 1946 conjecture and preview security models chained working exploits, but across math, security, and open source the month's scarce resource was verification, not generation.