Friday, May 15, 2026
A hidden lock in ClickHouse query planning stalled Cloudflare's billing; AI turns up on both sides of the CVE curve; and new open releases put their gains down to better data, not bigger models.
One mutex, hundreds of serialized queries
Cloudflare’s billing aggregation on ClickHouse slowed after a January 2025 change, and the cause was a lock no standard metric showed. To enable per-namespace retention, engineers changed the shared “Ready-Analytics” table’s partition key from (day) to (namespace, day), which grew parts per replica from about 30,000 toward 160,000 a year. Standard metrics looked normal and no single query read more data, yet latency tracked the total part count. CPU flame graphs showed nothing, because they sample only active threads. “Real” traces, which include waiting threads, found over half of each query’s duration spent contending for the MergeTreeData mutex during query planning: every thread took an exclusive lock, copied the entire parts vector, released it, then filtered, serializing hundreds of concurrent queries. Cloudflare’s fixes, a shared read lock and a deferred copy of only the filtered subset, are upstreamed as ClickHouse PR #85535 in v25.11; a binary search over the partition-ID prefix prunes parts further. The team says ZooKeeper metadata bloat remains unsolved, and questions whether the partition scheme was the right call.
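The prefix pruning mentioned above can be sketched in a few lines. This is a minimal illustration, not ClickHouse's implementation: it assumes part identifiers are plain strings kept in sorted order and that a namespace shows up as a leading prefix. The function name `prune_by_prefix` and the ID format `namespace-day` are hypothetical.

```python
from bisect import bisect_left


def prune_by_prefix(sorted_part_ids, prefix):
    """Return the parts whose ID starts with `prefix`.

    Assumes `sorted_part_ids` is a sorted list of strings (hypothetical
    ID format). Two binary searches locate the matching range, so only
    that subset is copied instead of the whole list.
    """
    if not prefix:
        return list(sorted_part_ids)
    # First ID that is >= the prefix.
    lo = bisect_left(sorted_part_ids, prefix)
    # Smallest string that sorts after every ID carrying the prefix:
    # the prefix with its last character bumped by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_part_ids, upper)
    # Copy only the filtered slice.
    return sorted_part_ids[lo:hi]


# Hypothetical part IDs: "<namespace>-<day>".
parts = sorted([
    "acme-20250101",
    "acme-20250102",
    "beta-20250101",
    "zeta-20250103",
])
acme_parts = prune_by_prefix(parts, "acme-")
```

With a sorted list, the lookup costs two logarithmic searches plus a copy of the matching slice, rather than a scan and copy of every part.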
AI on both sides of the vulnerability lifecycle
AI is appearing on both sides of the vulnerability lifecycle, on uneven evidence. VulnCheck argues that sharp year-to-date jumps in CVE disclosures signal AI-assisted bug-finding at scale: Chrome up 563%, GitHub issuance up 476%, VMware up 181%. It credits Anthropic’s claimed April 2026 system, Claude Mythos, and its reported thousands of zero-days. The causation is inferred, the percentages lack baselines, and VulnCheck sells vulnerability intelligence. The post is candid about false positives: curl’s Daniel Stenberg found only one of five Mythos reports he reviewed was a valid CVE. One confirmed AI-assisted find, an ActiveMQ remote-code-execution flaw (CVE-2026-34197), sits on CISA’s exploited list. On offense, Cisco Talos reports active exploitation of a Catalyst SD-WAN authentication-bypass bug (CVE-2026-20182) and a wider campaign against vManage systems left unpatched since February’s fixes; one of ten intrusion clusters ran a Nim implant hosted on replit.dev that Talos judges was likely built with AI.
Models and research
IBM released Granite Embedding Multilingual R2, two Apache-2.0 embedding models built on ModernBERT, both extending context to 32,000 tokens, 64 times R1’s window. IBM reports the 97M-parameter version scores 60.3 on MTEB Multilingual Retrieval, its claimed best open result under 100M parameters, against 50.9 for multilingual-e5-small; the largest gains are on long-document and code retrieval, the payoff of the longer window. The numbers are self-reported, and by IBM’s own table rival open models lead on retrieval and code. The models drop into sentence-transformers, LangChain, and LlamaIndex, with ONNX weights for CPU.
A CVPR 2026 paper from CMU argues that vision-language models fail at cinematic prompts like rack focus because their training captions lack the vocabulary, not because the models are too small: scaling on the same captions barely moved the metrics. Their pipeline, CHAI, has the model draft a caption, a professional cinematographer critique it, and the model revise, then trains on the resulting triples; an ablation shows the critique’s accuracy, completeness, and constructiveness each change downstream results. The authors say a post-trained 8B Qwen3-VL matches or beats GPT-5 and Gemini 3.1 Pro on their own metrics, with no independent test yet. The spec, data, and code are released.
What agents pay for messy code
SonarSource put a number on whether code quality matters when the reader is an agent. Across six matched repository pairs and about 540 runs on Claude Code with Sonnet 4.6, the cleaner version of each pair used roughly 7% fewer input tokens, 8.5% fewer output tokens, and about a third fewer file revisits, with task completion unchanged. The reading: messy code leads agents to re-read and revisit files more before editing. The caveats: Sonar defined “quality” with its own SonarQube and ran the study itself; the effects are averages with wide per-task spread; and it tested only a single-shot setting.
What to watch today
- Whether ClickHouse 25.11 adopters confirm Cloudflare’s planning fix (PR #85535) on other high-part-count workloads.
- Whether VulnCheck’s CVE surge holds, or fades once frontier models finish their first sweep of major codebases.
- Independent tests of CHAI’s 8B Qwen3-VL claim against GPT-5 and Gemini 3.1 Pro, since the numbers are the authors’ own.
- Whether Cisco’s CVE-2026-20182 exploitation, “limited so far” per Talos, spreads past the UAT-8616 actor.