Wednesday, May 20, 2026

AI for science

Google Research published a Nature paper on Empirical Research Assistance (ERA), a Gemini-based system that searches the literature, writes code, and runs a tree search over thousands of candidate solutions to optimize against a stated success metric. Google says ERA reaches expert-level performance across genomics, public health, satellite imagery, neuroscience, time-series forecasting, and mathematics.
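
For readers unfamiliar with the pattern, searching a tree of candidates against a success metric can be sketched generically as a best-first search. This is an illustration only: the announcement does not describe ERA's actual search procedure, and `tree_search`, `expand`, `score`, and the toy integer problem are all hypothetical stand-ins.

```python
import heapq

def tree_search(root, expand, score, budget):
    """Generic best-first search: always expand the best-scoring candidate next.

    Hypothetical sketch, not ERA's algorithm. `expand` proposes child
    candidates, `score` is the success metric (higher is better), and
    `budget` caps how many candidates get expanded.
    """
    best, best_score = root, score(root)
    # heapq is a min-heap, so store negated scores; the counter breaks ties
    # so candidates themselves are never compared.
    frontier = [(-best_score, 0, root)]
    counter = 1
    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        for child in expand(node):
            s = score(child)
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, counter, child))
            counter += 1
    return best, best_score

# Toy problem: candidates are integers, the metric peaks at 7.
def expand(x):
    return [x - 1, x + 1]

def score(x):
    return -(x - 7) ** 2

result = tree_search(0, expand, score, budget=20)
```

In a real system the candidates would be programs and the metric an evaluation run, which is where nearly all of the cost sits.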

The announcement blog post gives no benchmark numbers, baselines, or methodology; those sit in the peer-reviewed paper and eight application manuscripts, all Google-authored, which leaves independent replication as the open question. ERA and AlphaEvolve now power Computational Discovery, an experimental tool rolling out to a Google Labs trusted-tester group inside Gemini for Science. Two siblings shipped with it: Hypothesis Generation, built on the AI Co-Scientist and its own Nature paper, and Literature Insights.

AI2 released OlmoEarth v1.1, an open-weights Earth-observation family (Base, Tiny, Nano) it says cuts compute about 3x at matched accuracy. The lever is tokenization: collapsing the separate tokens for Sentinel-2’s 10m, 20m, and 60m bands into one shortens the sequence roughly 3x, and a transformer’s attention cost scales with the square of sequence length while its other layers scale linearly. Naive merging cost about 10 percentage points on the m-eurosat kNN benchmark, which the team says modified pretraining recovered. The numbers are self-reported, and some regressions against v1 remain.
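
A back-of-envelope check on the sequence-length arithmetic, using the common per-layer approximation of 24·n·d² FLOPs for the projections and MLP and 4·n²·d for attention. The token counts and model width below are illustrative assumptions, not figures from AI2, and this is not the team's own accounting.

```python
def layer_flops(n_tokens, d_model):
    """Approximate forward FLOPs for one transformer layer.

    Assumes a standard layer with a 4x MLP expansion: the linear parts
    cost about 24 * n * d^2 and attention about 4 * n^2 * d.
    """
    linear = 24 * n_tokens * d_model ** 2
    attention = 4 * n_tokens ** 2 * d_model
    return linear, attention

def compute_ratio(n_before, n_after, d_model):
    """How many times cheaper a layer gets when the sequence shrinks."""
    before = sum(layer_flops(n_before, d_model))
    after = sum(layer_flops(n_after, d_model))
    return before / after

# Hypothetical sizes: 900 tokens merged down to 300 at width 768.
ratio = compute_ratio(900, 300, 768)
```

Under these assumptions a 3x shorter sequence saves somewhere between 3x (linear layers dominate) and 9x (attention dominates), so a headline figure near 3x suggests the linear layers carry most of the cost at these sizes.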

Agent infrastructure

Cloudflare and Anthropic announced that Claude’s Managed Agents can now run code execution on customer-chosen infrastructure while the agent loop stays on Anthropic’s platform, which Anthropic calls decoupling the brain from the hands. The substantive idea is the sandbox primitive: Cloudflare offers V8 isolates rather than a microVM per agent, claiming millisecond boot and scale to tens of thousands of concurrent agents, but gives no benchmarks or pricing to check the claim against. A credential-injecting outbound proxy keeps secrets outside the sandbox, so they never reach the agent. The piece is co-marketed developer-relations content; the durable signal is that managed-agent execution is now decoupled, and Anthropic’s own docs are the better source for it.
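
The credential-injection idea can be sketched in a few lines. This is a simplified illustration, not Cloudflare's or Anthropic's API: `OutboundRequest`, `inject_credentials`, the host name, and the token are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OutboundRequest:
    """A request leaving the sandbox (hypothetical shape)."""
    host: str
    path: str
    headers: dict = field(default_factory=dict)

# Secrets are held by the proxy, outside the sandbox (placeholder value).
SECRETS_BY_HOST = {"api.example.com": "token-held-by-proxy"}

def inject_credentials(req, secrets=SECRETS_BY_HOST):
    """Attach the real credential on the way out; refuse unknown hosts."""
    token = secrets.get(req.host)
    if token is None:
        raise PermissionError(f"no credential configured for {req.host}")
    # Drop any Authorization header the sandboxed code set itself.
    headers = {k: v for k, v in req.headers.items()
               if k.lower() != "authorization"}
    headers["Authorization"] = f"Bearer {token}"
    return OutboundRequest(req.host, req.path, headers)
```

The sandboxed code only ever builds the unauthenticated request, so a prompt-injected or buggy agent has no token to leak.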

Also notable

SophosLabs documented WantToCry, a ransomware campaign that brute-forces internet-exposed SMB (ports 139/445), exfiltrates files over the authenticated session, encrypts them on attacker infrastructure, and writes them back. No code runs on the victim host, so process-based EDR stays blind. Ransoms run $400 to $1,800, usually $600, with no lateral movement and no data-leak extortion. Sophos says it is not self-propagating and, despite the name, not linked to 2017’s WannaCry; the standard mitigations apply: disable SMBv1, block inbound 139/445, remove guest access.
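
A quick way to verify the port-blocking mitigation is to test, from outside the network, whether 139 and 445 accept TCP connections. A minimal sketch using only the Python standard library; `reachable_ports` is a hypothetical helper, and it should only be pointed at hosts you administer.

```python
import socket

SMB_PORTS = (139, 445)

def reachable_ports(host, ports=SMB_PORTS, timeout=2.0):
    """Return the ports on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # A completed handshake means the port is reachable from here.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered, or timed out: treat as not reachable.
            pass
    return open_ports
```

An empty list from an external vantage point is the desired result; a completed connection says nothing about SMBv1 or guest access, which need checking on the host itself.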

A team led by Penn State’s Zoltan Fodor reports in Nature that the muon g−2 anomaly, a roughly 60-year-old gap between the measured and predicted muon magnetic moment long read as a hint of physics beyond the Standard Model, likely came from limits in earlier calculations rather than new physics. Using lattice QCD for short and medium distances and experimental data for long ones, they computed the hadronic contribution to 0.48% precision and found theory and experiment agree to within half a standard deviation. The result closes one of the field’s strongest anomalies without excluding new physics elsewhere. The coverage here draws on a Penn State press release, so it reflects the team’s own framing.

What to watch today

Independent replication of ERA’s eight Google-authored manuscripts, and whether Computational Discovery opens past its Gemini for Science trusted-tester group.
Whether other lattice-QCD groups and Fermilab’s Muon g−2 collaboration confirm Fodor et al.’s 0.48% hadronic-contribution result.
Anthropic’s own documentation of the brain/hands decoupling, to confirm the managed-agent execution details beyond Cloudflare’s framing.
Whether OlmoEarth’s token-merging recipe generalizes beyond Sentinel-2 in the team’s paper.