Wednesday, June 24, 2026

The auditor costs three hundred dollars

Revelio, an agentic vulnerability finder in a new preprint, reports 19 previously unknown memory-safety bugs across seven production projects that had been fuzzed continuously for five to eight years, at about $300 total and roughly an hour per project. Its discipline is what makes the number worth citing: every finding ships with an executable proof that triggers the bug and a deterministic sanitizer that confirms it, so the system reports nothing it cannot reproduce. A cheap static pass ranks hypotheses before any model runs, and the pipeline uses inexpensive models rather than frontier ones. That the bugs survived years of coverage-guided fuzzing suggests the agent reaches semantically complex paths mutation misses, not that it fuzzes faster.
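The prove-it-or-drop-it discipline described above can be pictured with a minimal sketch. It is not Revelio's code: the names `Hypothesis`, `Finding`, `audit`, `build_proof`, `sanitizer_confirms`, and `budget` are all hypothetical, and the proof generator and sanitizer are passed in as plain callables so the control flow stays visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    location: str        # hypothetical label, e.g. "parser.c:120"
    static_score: float  # score from the cheap static pass (assumed scale)


@dataclass
class Finding:
    hypothesis: Hypothesis
    proof: bytes         # input that triggers the bug


def audit(
    hypotheses: List[Hypothesis],
    build_proof: Callable[[Hypothesis], Optional[bytes]],
    sanitizer_confirms: Callable[[bytes], bool],
    budget: int,
) -> List[Finding]:
    """Rank by static score, then keep only findings a sanitizer reproduces."""
    # Cheap step first: order hypotheses before any expensive model call.
    ranked = sorted(hypotheses, key=lambda h: h.static_score, reverse=True)
    findings = []
    for hyp in ranked[:budget]:
        proof = build_proof(hyp)  # stands in for the agent's proof attempt
        if proof is None:
            continue              # no proof produced: drop silently
        if sanitizer_confirms(proof):
            findings.append(Finding(hyp, proof))
        # An unconfirmed proof is dropped too: nothing unreproduced is reported.
    return findings
```

The point of the shape is that the report list is built only inside the sanitizer check, so an unreproduced hypothesis has no path into the output.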

The same day’s arXiv brought the counterweight: work to bound what such agents may do. PORTICO names the gap it targets “lingering authority”: a capability granted for one subgoal that stays live in the agent’s interface afterward, letting it replay file writes, git mutations, or network calls it no longer needs. Its reference monitor issues epoch-bound handles that are revoked when a subgoal closes and checked before any side effect; in the authors’ tests it blocks all 10 stale-reuse attempts that a non-revoking baseline permits. VeriPort points the same autonomy at supply-chain hygiene, reporting more than 5,000 verified backported patches across 169 high- and critical-severity CVEs, each carrying separate proofs that it blocks the exploit and preserves function.
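To make the idea of epoch-bound handles concrete, here is a minimal sketch of a reference monitor that revokes handles when a subgoal closes and checks them before any side effect. It is an illustration under assumptions, not PORTICO's implementation: `ReferenceMonitor`, `Handle`, `StaleHandleError`, `grant`, `close_subgoal`, and `perform` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


class StaleHandleError(PermissionError):
    """Raised when a handle from a closed subgoal is presented."""


@dataclass(frozen=True)
class Handle:
    capability: str  # e.g. "file_write" (hypothetical label)
    epoch: int       # epoch in which the handle was issued


class ReferenceMonitor:
    def __init__(self):
        self._epoch = 0
        self._live = set()  # handles valid in the current epoch

    def grant(self, capability: str) -> Handle:
        """Issue a handle bound to the current subgoal's epoch."""
        handle = Handle(capability, self._epoch)
        self._live.add(handle)
        return handle

    def close_subgoal(self) -> None:
        """Revoke every live handle and advance the epoch."""
        self._live.clear()
        self._epoch += 1

    def perform(self, handle: Handle, action: Callable[[], Any]) -> Any:
        """Check the handle first; run the side effect only if it is live."""
        if handle.epoch != self._epoch or handle not in self._live:
            raise StaleHandleError(f"{handle.capability} revoked")
        return action()


monitor = ReferenceMonitor()
write = monitor.grant("file_write")
log = []
monitor.perform(write, lambda: log.append("wrote notes.txt"))
monitor.close_subgoal()
try:
    monitor.perform(write, lambda: log.append("replayed write"))
except StaleHandleError:
    log.append("blocked")
```

A non-revoking baseline would correspond to skipping the check in `perform`, in which case the replayed write would go through.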

Skills become a surface

As agents accumulate reusable skills, the skill library itself becomes something to attack and audit. SkillHarness treats skills learned from trajectories as an injection surface, where a poisoned trajectory teaches a harmful skill that later replays, and reports a 57% cut in unsafe-skill rate against unnamed baselines. Skill Coverage asks a blunter question: are documented skills even exercised? On SkillsBench, successful runs touched only 40 to 44% of a skill’s stated behaviors, and whether a task passed was largely independent of whether its skill was tested. HDSO proposes validating each candidate skill against a falsifiable hypothesis before keeping it, and holds its ALFWorld gains even when a fifth of its training feedback is flipped. All three are preprints.

Screening and surveillance

Two stories concern AI turned on people. A Stanford-led audit of four million job applications, covering 1,700 postings across 150 employers and 11 sectors, found that the AI screening tools now used by about 90% of US employers rejected 26% of Black applicants and 15% of Asian applicants along racial lines; because many employers buy from the same vendor, a rejected candidate can be rejected everywhere at once. Separately, 404 Media reports that Madison Square Garden kept a dossier on activists who campaigned against its facial-recognition entry system, built from their own posts; the document surfaced in a 45GB cache from a breach of the company, and named the EFF’s Adam Schwartz among its subjects.

What to watch today

Whether Revelio-style prove-it-or-drop-it auditing replicates independently, and whether maintainers can triage a flood of machine-found, machine-proved bugs.
CivBench measured frontier agents executing fewer than two-thirds of their own stated next moves (48% for Claude Opus 4.6) and skipping the checks that would have caught a loss. That is a reliability floor to watch as long-horizon agent products ship.
A preprint on emergent misalignment argues the broad misbehavior induced by narrow fine-tuning is structural, predictable from a model’s pre-tuning activations rather than fixable by changing the optimizer. Peer review will test that.