Tuesday, June 30, 2026

The shape of a thought, the limit of memory

A new preprint, Formalizing Latent Thoughts, proposes four functional axioms a genuine internal thought should satisfy: causality, minimality, separability, and stability. Across 23 reasoning tasks, no open-weight model the authors tested satisfies all four; representations tell task types apart but not two questions within the same task, which the authors read as a structural gap rather than a matter of scale or training. A related limit shows up in agents’ memory. Supersede measures what happens when a model must keep facts current under bounded memory: accuracy falls from 92% to 77%. The failure tracks conversation length rather than how aggressively the context is compressed, so adding memory does not fix it. Turning memory upkeep into a reinforcement-learning signal nearly doubled a small model’s supersession accuracy, from 9.0% to 16.7%.
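
To make "supersession accuracy" concrete, here is a minimal Python sketch of one way such a score could be computed. The digest does not describe Supersede's actual scoring, so the definition below (share of updated facts for which the answer matches the latest value) and every name in it are assumptions for illustration only.

```python
def latest_values(updates):
    """Return the most recent value per key from an ordered list of (key, value)."""
    latest = {}
    for key, value in updates:
        latest[key] = value  # a later update supersedes an earlier one
    return latest


def superseded_keys(updates):
    """Return the keys whose value changed at least once during the conversation."""
    seen = {}
    changed = set()
    for key, value in updates:
        if key in seen and seen[key] != value:
            changed.add(key)
        seen[key] = value
    return changed


def supersession_accuracy(updates, answers):
    """Assumed metric: fraction of superseded keys answered with the latest value."""
    latest = latest_values(updates)
    keys = superseded_keys(updates)
    if not keys:
        return 0.0
    correct = sum(1 for key in keys if answers.get(key) == latest[key])
    return correct / len(keys)


# Hypothetical conversation: both facts are updated once.
updates = [("city", "Paris"), ("job", "chef"), ("city", "Lyon"), ("job", "baker")]
# Hypothetical model answers: "city" is current, "job" is stale.
answers = {"city": "Lyon", "job": "chef"}
score = supersession_accuracy(updates, answers)  # 0.5
```

Scoring only the facts that changed separates remembering a fact from keeping it current, which is the distinction the paragraph above turns on.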

Quantum, minus the claim

Days after a peer-reviewed critique questioned Microsoft’s topological-qubit evidence, a quieter paper reported industry-scale spin-photon interfaces: thousands of semiconductor quantum-dot devices fabricated on a III-V pilot line, with optical losses below fault-tolerance thresholds, near-unity photon purity stable over tens of minutes, and seven-partite spin-photon entanglement. Separately, a group ran a quantum generative diffusion model on IQM hardware over Apple and Amazon price series, reporting nearly three orders of magnitude fewer trainable parameters than a classical baseline and up to a 71% improvement in RMSE. Both are single results awaiting replication, but they are measurements, not roadmaps.

Down to the metal

Two posts reward the reader who likes the substrate. One traces a CUDA kernel from nvcc through PTX to architecture-specific SASS, then across the PCIe bus by pushbuffer and GPFIFO to the streaming multiprocessors, each running up to 48 warps of 32 threads. The other, from Cloudflare, documents a race condition in the hyper HTTP library that truncated large responses for years across major versions and surfaced only after a rearchitecture added slight backpressure; strace found it, and the fix was to confirm the flush completed before closing the connection.
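
The failure mode in the second post can be sketched with a toy model. This is not hyper's code (hyper is a Rust library) and not Cloudflare's fix; `ToyConnection` and both helper functions are hypothetical names, and the assumption is simply that under backpressure a single flush may move only part of the buffer.

```python
class ToyConnection:
    """Hypothetical connection whose socket accepts a limited number of bytes per flush."""

    def __init__(self, accept_per_flush):
        if accept_per_flush < 1:
            raise ValueError("accept_per_flush must be at least 1")
        self.accept_per_flush = accept_per_flush  # simulated backpressure
        self.pending = bytearray()    # buffered, not yet on the wire
        self.delivered = bytearray()  # what the peer actually received
        self.closed = False

    def write(self, data):
        if self.closed:
            raise RuntimeError("write on closed connection")
        self.pending += data

    def flush(self):
        """Push one chunk; return True only when nothing is left to send."""
        chunk = self.pending[: self.accept_per_flush]
        self.delivered += chunk
        del self.pending[: len(chunk)]
        return not self.pending

    def close(self):
        # Closing discards whatever is still buffered.
        self.pending.clear()
        self.closed = True


def close_without_confirming(conn):
    conn.flush()  # result ignored: the flush may be incomplete
    conn.close()


def close_after_flush(conn):
    while not conn.flush():  # keep flushing until the buffer is empty
        pass
    conn.close()


body = b"x" * 10

truncated = ToyConnection(accept_per_flush=4)
truncated.write(body)
close_without_confirming(truncated)  # peer receives only the first 4 bytes

complete = ToyConnection(accept_per_flush=4)
complete.write(body)
close_after_flush(complete)  # peer receives all 10 bytes
```

With no backpressure (here, `accept_per_flush` at least as large as the body) both helpers deliver everything, which is consistent with a bug of this shape staying hidden until something slows the writer down.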

What to watch today

Two benchmarks locate the alignment gap in what models do unprompted. PRISON reports models recognizing deception only 44% of the time when cast as the detective. An agentic travel benchmark puts seven frontier models below a 64% chance baseline on avoiding animal-welfare harms, with the best (Claude Opus 4.7) at 53%; a single welfare-aware sentence swings some models by 47 to 63 points and others barely at all.
Qwen-Image-2.0-RL reports RLHF and on-policy distillation lifting its image model’s arena Elo by 78 to 93 points, a measure of how fast the open image stack is iterating.
SHARD offers dense-retrieval embeddings that resist the alignment attacks that unmask ordinary vector-privacy schemes.