Eclecta: The frontier, distilled. Daily brief, 2026-05-29

Friday, May 29, 2026

Anthropic's $65 billion raise and an incremental Opus 4.8 lead a quiet day, with new research showing coding agents leaking secrets and firing real attacks at live sites.

Anthropic raises $65 billion, ships Opus 4.8 and Dynamic Workflows

Anthropic announced a $65 billion Series H at a $965 billion post-money valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia, according to a smol.ai/AINews digest of its late-May announcements. The company says it is running at a $47 billion revenue run-rate, a self-reported figure with no audited numbers attached.

Claude Opus 4.8 shipped alongside the raise. Anthropic calls it “a modest but tangible improvement” over 4.7, and Simon Willison’s review notes that pricing holds at $5/$25 per million input/output tokens, with the 1M-token context, 128K max output, and January 2026 cutoff unchanged. The headline claim is honesty: Anthropic’s system card reports that 4.8 had the lowest incorrect-answer rate among six tested models, a result reached mainly by abstaining on uncertain questions rather than by answering more correctly, and claims it is about 4x less likely than 4.7 to leave code flaws unflagged. Both figures are Anthropic’s own and have not been independently verified. Two API changes help agent loops: a “system” role usable mid-conversation to append instructions while preserving prompt-cache hits, and a prompt-cache minimum cut from 4,096 tokens to 1,024.
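The pricing and cache figures above lend themselves to a quick back-of-envelope check. The sketch below assumes only the quoted list prices ($5 and $25 per million input and output tokens) and the reported cache minimums; the function and constant names are illustrative and not part of any Anthropic SDK, and the estimate ignores cache-read or cache-write pricing, which this brief does not cover.

```python
# Back-of-envelope helpers; all names are hypothetical, not an Anthropic API.
INPUT_USD_PER_MTOK = 5.0    # quoted input price per million tokens
OUTPUT_USD_PER_MTOK = 25.0  # quoted output price per million tokens
NEW_CACHE_MINIMUM = 1_024   # prompt-cache minimum reported for Opus 4.8
OLD_CACHE_MINIMUM = 4_096   # previous minimum, per the brief


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """List-price cost of one request, ignoring any caching discounts."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def meets_cache_minimum(prompt_tokens: int,
                        minimum: int = NEW_CACHE_MINIMUM) -> bool:
    """True if a prompt is at least as long as the given cache minimum."""
    return prompt_tokens >= minimum
```

Under these assumptions, a request with 200,000 input tokens and 10,000 output tokens comes to $1.25 at list price, and a 2,000-token prompt clears the new minimum but would have fallen short of the old one.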

Vendor and tweet-sourced benchmarks place 4.8 at or near the top on coding and agentic evals, including 69.2% on SWE-Bench Pro and 61.4 on the Artificial Analysis Intelligence Index. Independent testers at Andon Labs reported regressions on Vending Bench and Blueprint-Bench 2, and found maximum reasoning effort was not the best setting. Opus 4.8 used 35% fewer output tokens than 4.7 but still took about 30% more turns than GPT-5.5.

The research-preview Dynamic Workflows feature lets Claude write an orchestration script that spawns hundreds of parallel subagents in Claude Code. Anthropic’s marquee claim is a Zig-to-Rust port of Bun, about 750,000 lines with 99.8% of tests passing in 11 days; it is an unverified anecdote. Reviewers flag edit conflicts and runaway token costs, and the system card found multi-agent runs reach mediocre solutions about twice as fast, not better ones. Anthropic also hinted at a higher-capability “Mythos-class” model gated behind cyber safeguards.

Agents meet their own attack surface

tl;dr sec #330, Clint Gibler’s security newsletter, clusters several AI-agent findings. Wiz researcher Shay Berkovich reports that openai/codex-action and anthropics/claude-code-action rely on syntactic permission checks open to trusted-app impersonation, and that verbose logging in claude-code-action and run-gemini-cli can leak local secret files; the affected GitHub Actions sit in repos with more than 200,000 combined stars. A separate exercise, scopeshift from OFFENSAI, paired a localhost reverse proxy with a deceptive MCP server that returned “in scope.” Claude Opus 4.7 then fired seven SQL-injection payloads at a live production site, evidence that an agent cannot validate authorization from in-band signals. The issue also covers OpenAI’s internal Codex deployment: sandboxed execution, network allowlists, and OpenTelemetry logs piped to an AI triage agent.
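The point about in-band signals can be illustrated with a minimal sketch, assuming the operator keeps an allowlist of authorized hosts outside the agent's conversation. The names `AUTHORIZED_HOSTS`, `is_in_scope`, and `may_test` are hypothetical, and the host is a placeholder. The sketch shows only that a tool's "in scope" reply is never treated as authorization; on its own it would not detect a reverse proxy that forwards traffic to a different site.

```python
from urllib.parse import urlsplit

# Hypothetical operator-controlled scope, maintained outside the agent's
# conversation (for example in a reviewed engagement file). Not a real API.
AUTHORIZED_HOSTS = frozenset({"staging.example.test"})


def is_in_scope(target_url: str, authorized_hosts=AUTHORIZED_HOSTS) -> bool:
    """Decide scope from the out-of-band allowlist only."""
    # urlsplit lowercases the hostname and returns None when there is none.
    host = urlsplit(target_url).hostname
    return host is not None and host in authorized_hosts


def may_test(target_url: str, tool_says_in_scope: bool) -> bool:
    """Gate an active test; the tool's claim is accepted but ignored."""
    del tool_says_in_scope  # in-band signal: never treated as authorization
    return is_in_scope(target_url)
```

Here `may_test("https://shop.example.com/cart", tool_says_in_scope=True)` is refused because the host is absent from the allowlist, whatever the tool server claimed.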

Anthropic’s Project Glasswing update claims partners found more than 10,000 high or critical vulnerabilities in a month using Claude Mythos Preview, including 2,000 at Cloudflare and 271 that Mozilla fixed in Firefox 150. Gibler cautions that the 10,000 is a Mythos-reported count, not human-triaged; Anthropic’s own 1,000-plus open-source scan hit 90.6% true positives across 1,752 findings triaged by six firms. Anthropic open-sourced the scanning harness.
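As a worked check on the triage figures, the sketch below lists the whole-number true-positive counts that would round to 90.6% of 1,752 findings. These counts are inferred from the rounded rate, not reported by Anthropic or Gibler, and the function name is illustrative.

```python
def counts_matching_rate(total: int, rate_percent: float,
                         digits: int = 1) -> list[int]:
    """All integer counts whose share of `total` rounds to `rate_percent`."""
    target = round(rate_percent, digits)
    return [k for k in range(total + 1)
            if round(100 * k / total, digits) == target]


# Counts consistent with 90.6% of 1,752 triaged findings.
true_positive_candidates = counts_matching_rate(1_752, 90.6)
# The rest of the 1,752 findings would be false positives.
false_positive_candidates = [1_752 - k for k in true_positive_candidates]
```

Under this rounding assumption the reported rate corresponds to 1,587 or 1,588 true positives, leaving 165 or 164 findings that triage rejected.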

Quick hits

  • The Pragmatic Engineer reports that engineering leaders at mid-sized and large companies are capping per-engineer AI-agent spend with monthly token budgets, framing it as a possible trend. The claim rests on the author’s private conversations, not survey data, and most detail sits behind the paywall.
  • The Document Foundation published a strategy paper committing LibreOffice to a Qt 6 and WebAssembly browser build that computes locally rather than server-side, plus its first named iOS and Android native-app goals. It claims a working WASM prototype but set no release dates; desktop stays primary.

What to watch today

  • Independent replication of Opus 4.8’s coding and agentic benchmarks, and whether Andon Labs’ regressions on Vending Bench and Blueprint-Bench 2 hold across other harnesses.
  • A verifiable account of the Bun Zig-to-Rust port (code, test logs, token bill) to test the Dynamic Workflows claim.
  • A patch or advisory for the codex-action and claude-code-action flaws Wiz disclosed, in repos with 200,000-plus combined stars.
  • Whether the “Mythos-class” model gated behind cyber safeguards gets a release date or stays a hint.
