Tuesday, June 23, 2026

Roles aren’t trust boundaries

A new paper argues that prompt injection works because language models infer who is speaking (system, user, tool, or reasoning) from writing style, not from the chat-template tags providers wrap around each turn. The authors train linear “role probes” on mid-layer activations and find that reasoning-style text activates the same internal “CoTness” direction as genuine think tags, and does so more strongly; stripping every role tag leaves that signal statistically unchanged for tokens that had been inside think tags. From this they build CoT Forgery: inject fabricated reasoning that concludes a harmful request is acceptable, and benchmark attack success climbs from near zero to about 60% across late-2025 models, with the attack transferring between model families. Swapping one bigram, “The user” for “The request,” drops success from 61% to 10%. The authors won a late-2025 OpenAI Kaggle red-teaming contest with the method. They argue that models defend by memorizing attacks rather than perceiving roles, which would explain why a May 2026 paper still measured automated injection succeeding 11% of the time against Opus 4.5 and 25% against GPT-5.4, even though both score near-perfectly on static benchmarks.
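For readers unfamiliar with linear probes, the idea can be sketched in a few lines. This is a toy difference-of-means probe on synthetic vectors, not the paper's training procedure or data: the 16-dimensional "activations," the planted direction on coordinate 0, and every function name here are assumptions made for illustration.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_probe(pos, neg):
    """Difference-of-means linear probe: a direction plus a midpoint threshold."""
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    direction = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    return direction, dot(direction, midpoint)

def probe_score(direction, threshold, activation):
    """Positive score: the probe reads the activation as 'reasoning'."""
    return dot(direction, activation) - threshold

def synthetic_activations(rng, n, dim, shift):
    """Hypothetical stand-in for mid-layer activations: unit Gaussian noise,
    with the class signal planted on coordinate 0 only."""
    return [[rng.gauss(0, 1) + (shift if j == 0 else 0.0) for j in range(dim)]
            for _ in range(n)]

rng = random.Random(0)
DIM = 16
reasoning_train = synthetic_activations(rng, 200, DIM, shift=3.0)
other_train = synthetic_activations(rng, 200, DIM, shift=-3.0)
direction, threshold = fit_probe(reasoning_train, other_train)

reasoning_test = synthetic_activations(rng, 100, DIM, shift=3.0)
other_test = synthetic_activations(rng, 100, DIM, shift=-3.0)
correct = sum(probe_score(direction, threshold, a) > 0 for a in reasoning_test)
correct += sum(probe_score(direction, threshold, a) <= 0 for a in other_test)
accuracy = correct / (len(reasoning_test) + len(other_test))
print(f"held-out probe accuracy: {accuracy:.2f}")
```

The probe sees only the vector, never a tag. In this toy setup, any input that lands on the "reasoning" side of the direction is scored as reasoning regardless of how it was labeled, which is the kind of readout the paper's argument rests on.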

The open-weight frontier

Z.ai’s GLM-5.2, MIT-licensed and text-only, tops open-weight models on Artificial Analysis’s Intelligence Index v4.1 at 51, priced at $4.40 per 1M output tokens against $25 for Claude Opus 4.8. The comparison’s one-shot test, building a raw WebGL 3D platformer with no engine, exposed the cost of being text-only. Unable to read its own rendered frame, GLM-5.2 sampled pixel colors from a saved PNG, then stopped and shipped a textureless character and a live debug overlay; Opus read the frame and fixed both problems, finishing in 33 minutes to GLM’s 1h 11m. Artificial Analysis also flags GLM-5.2 as the most token-hungry leading open model it has measured, near 43k output tokens per task. Nathan Lambert of the Allen Institute for AI called it “a serious accomplishment,” noting Chinese labs reach these scores on less compute.
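The price gap and the token appetite pull in opposite directions, and a quick calculation shows by how much. This sketch uses only the figures quoted above and counts output tokens only; Opus's tokens per task is not given here, so the code computes the break-even count instead of assuming one. Variable names are illustrative.

```python
# Figures quoted in the item above; output-token pricing only.
GLM_PRICE_PER_M = 4.40        # USD per 1M output tokens, GLM-5.2
OPUS_PRICE_PER_M = 25.00      # USD per 1M output tokens, Claude Opus 4.8
GLM_TOKENS_PER_TASK = 43_000  # approximate, per Artificial Analysis

def output_cost(tokens, price_per_million):
    """Output-token cost in USD for one task."""
    return tokens * price_per_million / 1_000_000

glm_cost = output_cost(GLM_TOKENS_PER_TASK, GLM_PRICE_PER_M)

# Output tokens per task at which Opus would cost the same as GLM-5.2.
# Assumption: input tokens, caching, and retries are ignored.
breakeven_tokens = glm_cost / OPUS_PRICE_PER_M * 1_000_000

print(f"GLM-5.2 output cost per task: ${glm_cost:.4f}")
print(f"Opus break-even output tokens per task: {breakeven_tokens:,.0f}")
```

On these numbers GLM-5.2 stays cheaper per task unless the pricier model finishes in fewer than the break-even number of output tokens.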

Capability outruns control

GateMem, a new benchmark on arXiv, tests whether shared LLM memory agents enforce access control and honor deletion across multiple users in hospitals, offices, and campuses. No tested method achieves utility, access control, and forgetting at once: long-context prompting governs best but costs the most tokens, while retrieval and external-memory methods leak unauthorized or deleted data. The authors conclude current memory agents are unfit for reliable shared institutional deployment. Auditing the agent you already run is no easier: the “extended thinking” shown in Claude Code is a summary, not the reasoning that drove the model. The thinking blocks stored on disk are encrypted signatures to which Anthropic holds the key, and without an enterprise agreement the API returns only a summary.
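The two governance properties at issue, access control and forgetting, can be made concrete with a toy store. This is a minimal sketch for illustration only, not GateMem's harness or any method it tested; the class, its methods, and the hospital example are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryRecord:
    record_id: int
    text: str
    allowed_users: frozenset

class SharedMemory:
    """Toy shared memory that checks permissions at read time and hard-deletes."""

    def __init__(self):
        self._records = {}
        self._next_id = 0

    def write(self, text, allowed_users):
        rid = self._next_id
        self._next_id += 1
        self._records[rid] = MemoryRecord(rid, text, frozenset(allowed_users))
        return rid

    def forget(self, record_id):
        # Forgetting: remove the record itself, not just a pointer to it.
        self._records.pop(record_id, None)

    def retrieve(self, user, query):
        # Access control: filter by reader before matching on content.
        q = query.lower()
        return [r.text for r in self._records.values()
                if user in r.allowed_users and q in r.text.lower()]

memory = SharedMemory()
chart = memory.write("Patient A: penicillin allergy", {"dr_lee", "nurse_kim"})
memory.write("Cafeteria closes at 3pm", {"dr_lee", "nurse_kim", "visitor"})

print(memory.retrieve("nurse_kim", "allergy"))  # authorized reader
print(memory.retrieve("visitor", "allergy"))    # unauthorized reader
memory.forget(chart)
print(memory.retrieve("dr_lee", "allergy"))     # after deletion
```

A real agent would also have to purge derived copies such as embeddings or summaries, which this sketch does not model.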

Systems and hardware

A GitHub issue against OpenAI Codex measured about 37 TB written in 21 days to a local SQLite log, roughly 640 TB per year, enough to wear out a 600-TBW consumer SSD in under a year. The cause is a single line that defaults every log target to TRACE; TRACE output plus mirrored telemetry make up about 96% of the volume. Six related issues date to 2021, but this is the first to measure the rate. In firmware, the Microsoft 2011 UEFI key that signs Linux’s shim bootloader expires September 11, 2026, after which install media carrying only the old key will fail to boot on Secure Boot systems lacking the 2023 replacement; already-installed distributions keep working off their own keys. The fwupd maintainer reports about 98 to 99% success delivering the new key, though the remaining 1% tail spans millions of machines. Three first-principles reads for systems engineers: occupancy math on AMD’s MI355X, arguing that chasing occupancy is often the wrong target; the residency math for two Qwen3 models on one DGX Spark, where vLLM’s gpu_memory_utilization is a fraction of total memory, not free memory; and memory-safe inline assembly in Fil-C, which validates x86_64 asm at the LLVM IR level so crypto and SIMD code runs unmodified.
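The Codex write-rate figures above are easy to check. The sketch below extrapolates the measured 21 days linearly, which assumes steady usage, and the last step assumes a fix would remove the full 96% share attributed to TRACE and mirrored telemetry; neither assumption comes from the issue itself.

```python
# Figures quoted in the item above.
TB_WRITTEN = 37            # TB written during the measurement window
DAYS_MEASURED = 21
SSD_ENDURANCE_TBW = 600    # rated terabytes written for the example SSD
NOISY_SHARE = 0.96         # share attributed to TRACE plus mirrored telemetry

tb_per_day = TB_WRITTEN / DAYS_MEASURED
tb_per_year = tb_per_day * 365                      # linear extrapolation
days_to_exhaust = SSD_ENDURANCE_TBW / tb_per_day    # time to reach rated TBW

# Assumption: a fix removes all of the noisy share and nothing else changes.
tb_per_year_after_fix = tb_per_year * (1 - NOISY_SHARE)

print(f"write rate: {tb_per_day:.2f} TB/day, {tb_per_year:.0f} TB/year")
print(f"600 TBW reached after about {days_to_exhaust:.0f} days")
print(f"after a full fix: about {tb_per_year_after_fix:.0f} TB/year")
```

The extrapolation lands close to the rounded 640 TB per year quoted above, and the rated endurance is reached a few weeks short of a full year.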

What to watch today

Microsoft’s 2011 UEFI signing key expires September 11, 2026; Secure Boot systems without the 2023 KEK update in firmware will fail to boot new Linux install media.
Whether frontier labs answer CoT Forgery by moving from attack memorization to genuine role perception, the defense the role-confusion paper found durable.
Five Eyes put AI-enabled nation-state cyber operations at “months, not years,” after the US barred foreign nationals from Anthropic’s Fable model.
Codex maintainers’ response to issue #28224 and its proposed fix raising the default log level.