Act2Answer offers a new way to measure how much commonsense and world knowledge Vision-Language-Action (VLA) models retain, which is key to understanding and addressing their limitations.
Act2Answer adapts VLM knowledge benchmarks to VLA evaluation by requiring agents to answer questions through object placement actions
A large-scale study was conducted on 7 VLA models and 9 VLM baselines
VQA co-training is associated with better knowledge retention in VLA models
The paper introduces Act2Answer, a protocol for evaluating how well Vision-Language-Action (VLA) models retain commonsense and world knowledge. The protocol adapts VLM knowledge benchmarks to VLA evaluation by requiring agents to answer questions through object placement actions. A large-scale study of 7 VLA models and 9 VLM baselines finds that VQA co-training is associated with better knowledge retention in VLA models. Layerwise intent probing indicates that intent-relevant signals peak in the middle layers but attenuate in the upper layers.
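The placement-as-answer mechanism can be made concrete with a small scoring sketch in TypeScript. It assumes, without confirmation from the paper, that each answer option is marked by a circular target region on the table and that the agent's answer is whichever region the object ends up in; `AnswerRegion`, `chosenOption`, and `act2AnswerAccuracy` are hypothetical names.

```typescript
// Hypothetical scoring for an Act2Answer-style episode: each answer option
// is a circular region on the table, and the option whose region contains
// the object's final position counts as the agent's answer.
interface AnswerRegion {
  option: string; // e.g. "A", "B", "C"
  x: number;      // region centre (assumed metres, table frame)
  y: number;
  radius: number; // placements farther than this count as no answer
}

interface Episode {
  finalPos: [number, number]; // where the object came to rest
  regions: AnswerRegion[];
  correct: string;            // ground-truth option from the source VLM benchmark
}

// Returns the option whose region contains the point (nearest centre wins
// if regions overlap), or null when the object landed outside every region.
function chosenOption(pos: [number, number], regions: AnswerRegion[]): string | null {
  const [px, py] = pos;
  let best: string | null = null;
  let bestDist = Infinity;
  for (const r of regions) {
    const d = Math.hypot(px - r.x, py - r.y);
    if (d <= r.radius && d < bestDist) {
      best = r.option;
      bestDist = d;
    }
  }
  return best;
}

// Accuracy over episodes; placements outside every region are scored as wrong.
function act2AnswerAccuracy(episodes: Episode[]): number {
  if (episodes.length === 0) return 0;
  const hits = episodes.filter((e) => chosenOption(e.finalPos, e.regions) === e.correct).length;
  return hits / episodes.length;
}

const regions: AnswerRegion[] = [
  { option: "A", x: 0.0, y: 0.0, radius: 0.1 },
  { option: "B", x: 0.3, y: 0.0, radius: 0.1 },
];
console.log(act2AnswerAccuracy([
  { finalPos: [0.31, 0.02], regions, correct: "B" }, // placed in B: correct
  { finalPos: [0.15, 0.0], regions, correct: "A" },  // between regions: no answer
])); // 0.5
```

Scoring out-of-region placements as wrong folds manipulation failures into knowledge errors; a real protocol may want to report the two separately.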
Cloudflare's new Monetization Gateway enables micropayments for web assets, addressing a gap in how sites can charge for AI-driven usage.
Details
Cloudflare's Monetization Gateway allows charging for any asset protected by Cloudflare, paid in stablecoins over the x402 protocol
x402 settles payments as small as fractions of a cent in under a second, with negligible fees
Monetization Gateway scales across 330+ cities through Cloudflare’s global network
Cloudflare introduces the Monetization Gateway, enabling customers to charge for any digital resource protected by Cloudflare using stablecoins via the x402 protocol. The system simplifies usage-based billing by handling payment verification at the edge, reducing overhead and latency. The gateway supports micropayments down to fractions of a cent with sub-second settlement times, making it well suited to AI-driven transactions. It scales across 330+ cities through Cloudflare’s global network and offers features such as variable pricing based on task complexity.
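To make the edge-gating flow concrete, here is a minimal TypeScript sketch of a pay-per-request gate with variable pricing. It is not Cloudflare's API or the x402 wire format: the `Task` tiers, prices, `paymentProof` field, and `verify` callback are all hypothetical stand-ins, and real settlement happens on the payment network rather than in a local function. Prices are kept as integer micro-dollars so sub-cent amounts stay exact.

```typescript
// Hypothetical pricing tiers: more complex tasks cost more per request.
type Task = "lookup" | "summarize" | "generate";

// Integer micro-dollars (1 cent = 10_000) avoid floating-point rounding
// when prices are fractions of a cent.
const PRICE_MICRO_USD: Record<Task, number> = {
  lookup: 500,      // $0.0005
  summarize: 2_000, // $0.002
  generate: 8_000,  // $0.008
};

interface GateRequest {
  path: string;
  task: Task;
  paymentProof?: string; // hypothetical: proof attached when the client retries after paying
}

interface GateResponse {
  status: number;
  body: string;
}

// Stand-in for edge-side verification of a payment against the quoted price.
type Verifier = (proof: string, amountMicroUsd: number) => boolean;

function gate(req: GateRequest, verify: Verifier): GateResponse {
  const price = PRICE_MICRO_USD[req.task];
  if (req.paymentProof !== undefined && verify(req.paymentProof, price)) {
    return { status: 200, body: `content for ${req.path}` };
  }
  // No valid payment: answer HTTP 402 Payment Required with the price,
  // so the client knows how much to pay before retrying.
  return {
    status: 402,
    body: JSON.stringify({ resource: req.path, amountMicroUsd: price }),
  };
}

// Toy verifier that accepts one hard-coded proof for the exact amount.
const demoVerify: Verifier = (proof, amount) => proof === "paid-500" && amount === 500;

console.log(gate({ path: "/report", task: "lookup" }, demoVerify).status); // 402
console.log(gate({ path: "/report", task: "lookup", paymentProof: "paid-500" }, demoVerify).status); // 200
```

Integer units matter here because floating-point sums drift (in JavaScript, 0.1 + 0.2 !== 0.3), which is not acceptable when each charge is a fraction of a cent.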
This research shows that vision-language-action models can shed a large share of their language backbone without losing performance, challenging conventional wisdom about how much model capacity they need.
Details
Introduces the Drop-Then-Recovery (DTR) protocol to analyze redundancy in VLA models
Proposes the GateProbe metric for ranking transformer blocks by their contribution to action loss
Removing half of the LLM blocks improves OpenVLA-OFT's performance on the LIBERO benchmark from 95.0% to 98.3%
The paper presents Drop-Then-Recovery (DTR), a method for assessing redundancy in vision-language-action (VLA) models by removing transformer blocks and measuring how much performance recovers. It introduces GateProbe, a sensitivity metric that ranks blocks by their contribution to downstream action loss. Across various VLA architectures and benchmarks, including real-world industrial scenarios, the study finds high redundancy in language backbones, while vision and action pathways are more critical. Removing half of the large language model (LLM) blocks improves OpenVLA-OFT's performance on LIBERO from 95.0% to 98.3%, suggesting that current VLA benchmarks may not adequately test deep language grounding and compositional instruction understanding.
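The summary does not define GateProbe precisely. One plausible reading, and it is only an assumption here, is a gate-sensitivity score: put a scalar gate g_i = 1 on each block's residual branch and measure how strongly the action loss reacts when that gate is nudged. The TypeScript sketch below estimates |∂L/∂g_i| with a finite difference on a toy loss, then picks the lowest-scoring half of the blocks to drop; the recovery (fine-tuning) step of DTR is not shown, and `gateSensitivity`, `blocksToDrop`, and the toy loss are hypothetical.

```typescript
// A loss function over per-block gates: gates[i] = 1 keeps block i intact,
// gates[i] = 0 removes it (its residual branch contributes nothing).
type GatedLoss = (gates: number[]) => number;

// Backward-difference estimate of |dL/dg_i| at g = 1 for every block.
function gateSensitivity(loss: GatedLoss, numBlocks: number, h = 1e-3): number[] {
  const ones: number[] = new Array(numBlocks).fill(1);
  const base = loss(ones);
  return ones.map((_, i) => {
    const gates = ones.slice();
    gates[i] = 1 - h;
    return Math.abs((base - loss(gates)) / h);
  });
}

// Indices of the least sensitive blocks (a `fraction` of all blocks), ascending.
function blocksToDrop(scores: number[], fraction: number): number[] {
  const k = Math.floor(scores.length * fraction);
  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => a.score - b.score)
    .slice(0, k)
    .map((entry) => entry.index)
    .sort((a, b) => a - b);
}

// Toy stand-in for a VLA action loss: each block's weight says how much the
// loss rises as its gate closes. Real values would come from the model.
const blockWeights = [0.9, 0.05, 0.6, 0.01, 0.3, 0.02];
const toyLoss: GatedLoss = (gates) =>
  1 + gates.reduce((acc, g, i) => acc + blockWeights[i] * (1 - g), 0);

const scores = gateSensitivity(toyLoss, blockWeights.length);
console.log(blocksToDrop(scores, 0.5)); // [ 1, 3, 5 ]
```

Finite differences cost one extra forward pass per block; with automatic differentiation, a single backward pass yields the gradient for every gate at once.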
This reverse-engineered account of the Apple Neural Engine gives an unusually detailed technical picture that could inform hardware design, AI performance optimization, and security research.
Details
The ANE is a fixed-function matrix accelerator that has shipped in Apple's iPhone/iPad chips since the A11 and in its Mac chips since the M1
The guide documents the engine’s datapath, roofline performance bounds, dispatch route below the Core ML framework, compiler, on-disk program format, weight-compression scheme, kernel driver, firmware, and command protocol
Covers the A11 through A18 and M1 through M5 families, with per-chip target tables and an operation-by-device matrix
The article presents a reverse-engineered account of the Apple Neural Engine (ANE), detailing its architecture, programming interfaces, and performance characteristics. It covers the ANE across Apple silicon families from A11 to M5, including direct measurements on M1 and M5 chips. The guide documents the engine’s datapath, roofline performance bounds, dispatch route below the Core ML framework, compiler, on-disk program format, weight-compression scheme, kernel driver, firmware, and command protocol. Each claim is labeled as measured, derived from decompilation, or predicted, so readers can judge its provenance.
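A roofline bound caps attainable throughput at the lower of peak compute and memory bandwidth times arithmetic intensity (ops per byte moved). A minimal TypeScript sketch follows; the peak and bandwidth figures are placeholders, not measured ANE numbers, and the intensity formula assumes a dense matmul that reads each operand and writes the output exactly once.

```typescript
// Roofline: attainable throughput = min(peak compute, bandwidth * intensity).
function attainableOpsPerSec(peakOpsPerSec: number, bandwidthBytesPerSec: number, opsPerByte: number): number {
  return Math.min(peakOpsPerSec, bandwidthBytesPerSec * opsPerByte);
}

// Intensity at which a kernel stops being memory-bound (the "ridge point").
function ridgePoint(peakOpsPerSec: number, bandwidthBytesPerSec: number): number {
  return peakOpsPerSec / bandwidthBytesPerSec;
}

// Arithmetic intensity of C[m x n] = A[m x k] * B[k x n], counting a
// multiply-add as 2 ops and assuming each matrix moves through memory once.
function matmulIntensity(m: number, n: number, k: number, bytesPerElement: number): number {
  const ops = 2 * m * n * k;
  const bytes = bytesPerElement * (m * k + k * n + m * n);
  return ops / bytes;
}

// Placeholder hardware figures for illustration only (not ANE measurements).
const PEAK = 1e13; // ops/s
const BW = 1e11;   // bytes/s

const square = matmulIntensity(1024, 1024, 1024, 2); // fp16 square matmul
const gemv = matmulIntensity(1, 1024, 1024, 2);      // batch-1, matrix-vector shape
console.log(ridgePoint(PEAK, BW));                          // 100
console.log(attainableOpsPerSec(PEAK, BW, square) === PEAK); // true: compute-bound
console.log(attainableOpsPerSec(PEAK, BW, gemv) < PEAK);     // true: memory-bound
```

With these placeholder numbers the square fp16 matmul sits well above the ridge point (about 341 ops/byte versus 100) and is compute-bound, while the batch-1 matrix-vector shape, at roughly 1 op/byte, is limited by bandwidth.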
This project shows how large language models (LLMs), paired with extensive manual review and testing, can generate a complex software system, pushing the boundaries of automated code generation.
Details
Webernetes is a partial port of Kubernetes to TypeScript for running clusters in the browser
Generated roughly 100,000 lines of code across 629 files in 2 months with LLMs
Supports key Kubernetes features like pod lifecycles, DNS, networking, and Deployment tracking
The author released webernetes, a TypeScript port of Kubernetes that runs entirely in the browser. Over two months, LLMs generated roughly 100,000 lines of code across 629 files, with extensive manual review and testing. Webernetes supports core Kubernetes features such as pod lifecycles, DNS, networking, and Deployment tracking. The project includes over 1,855 unit tests and 204 integration tests to check that the ported code behaves correctly in both Go and JavaScript environments.
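Deployment tracking in Kubernetes rests on reconciliation: a controller compares the desired replica count with the pods that are actually alive and emits actions to close the gap. The TypeScript sketch below is a deliberately simplified, hypothetical version of that loop, not webernetes code: pods in the Succeeded or Failed phase no longer count toward the replica total, and surplus Pending pods are removed before Running ones.

```typescript
// Pod phases as defined by Kubernetes.
type PodPhase = "Pending" | "Running" | "Succeeded" | "Failed" | "Unknown";

interface Pod {
  name: string;
  phase: PodPhase;
}

type Action = { kind: "create" } | { kind: "delete"; name: string };

// One reconciliation pass: compare desired replicas with active pods and
// return the actions needed to converge. Simplification: only Pending and
// Running pods count as active; Unknown pods are ignored here.
function reconcile(desiredReplicas: number, pods: Pod[]): Action[] {
  const active = pods.filter((p) => p.phase === "Pending" || p.phase === "Running");
  const diff = desiredReplicas - active.length;
  if (diff > 0) {
    return Array.from({ length: diff }, (): Action => ({ kind: "create" }));
  }
  // Scale down: remove pods that have not started yet before running ones.
  const rank = (p: Pod) => (p.phase === "Pending" ? 0 : 1);
  return [...active]
    .sort((a, b) => rank(a) - rank(b))
    .slice(0, -diff)
    .map((p): Action => ({ kind: "delete", name: p.name }));
}

// Scale up: one Running pod plus one Failed pod, three wanted -> two creates.
console.log(reconcile(3, [
  { name: "web-a", phase: "Running" },
  { name: "web-b", phase: "Failed" },
]));
// Scale down: three active pods, one wanted -> delete web-b, then web-a.
console.log(reconcile(1, [
  { name: "web-a", phase: "Running" },
  { name: "web-b", phase: "Pending" },
  { name: "web-c", phase: "Running" },
]));
```

A real controller runs this pass repeatedly on every observed change, so a single pass only has to move the cluster toward the desired state, not reach it.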
Applying the 'parse, don't validate' principle in TypeScript can improve type safety and reduce bugs.
Details
Alexis King's "Parse, Don't Validate" essay was published in 2019
TypeScript supports but does not enforce parsing over validation
Branded types use unique symbols to create distinct types (e.g., EmailBrand)
The article discusses implementing the 'parse, don't validate' principle in TypeScript using branded types and discriminated unions. It explains that while TypeScript allows this approach, it does not enforce it the way Haskell or Elm do. The author shows how to use unique symbols to create distinct (branded) types and demonstrates parsing functions that report errors through discriminated unions. Zod and similar libraries are mentioned as tools that simplify the process but still require discipline from developers.
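A minimal TypeScript sketch of the pattern: a branded `Email` type built on a `unique symbol`, and a parser that returns a discriminated union instead of throwing. The brand name follows the `EmailBrand` example; the deliberately loose email check and the `ParseResult` shape are assumptions for illustration, not the article's code.

```typescript
// The brand exists only at the type level; `declare` emits no runtime value.
declare const EmailBrand: unique symbol;
type Email = string & { readonly [EmailBrand]: true };

// Discriminated union: callers must check `ok` before touching `value`.
type ParseResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Parse once at the boundary. The check is intentionally loose: exactly one
// "@" with a non-empty part on each side.
function parseEmail(input: string): ParseResult<Email> {
  const trimmed = input.trim();
  const at = trimmed.indexOf("@");
  if (at <= 0 || at !== trimmed.lastIndexOf("@") || at === trimmed.length - 1) {
    return { ok: false, error: `not an email address: "${input}"` };
  }
  return { ok: true, value: trimmed as Email };
}

// Downstream code demands the parsed type, so it never re-validates.
function sendWelcome(to: Email): string {
  return `welcome mail queued for ${to}`;
}

const parsed = parseEmail("  ada@example.com ");
if (parsed.ok) {
  console.log(sendWelcome(parsed.value)); // welcome mail queued for ada@example.com
} else {
  console.log(parsed.error);
}
// sendWelcome("ada@example.com"); // compile error: string is not assignable to Email
```

Because the brand is purely a compile-time marker, the `as Email` cast inside `parseEmail` is the single place where a plain string becomes an `Email`; everywhere else the compiler rejects raw strings.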
MultiHashFormer offers a novel approach to reducing the computational overhead of large language models while maintaining or improving performance, which is crucial for scaling AI applications.
Details
Proposes MultiHashFormer, a hash-based generative language model
Represents each token as a unique hash signature built from multiple independent hash functions
Evaluated at 100M, 1B, and 3B parameter scales
The paper introduces MultiHashFormer, a new framework for hash-based autoregression in causal language models. Each token is uniquely represented by a hash signature generated from multiple independent hash functions. A Hash Encoder compresses these signatures into latent vectors processed by a Transformer decoder, while a Hash Decoder generates the next token's hash signature. The paper reports stronger benchmark results than its baselines at several parameter scales and shows that the model handles multilingual vocabulary expansion without increasing computational requirements.
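The summary gives the idea but not the hash family. A minimal TypeScript sketch, assuming k seeded 32-bit integer hashes (MurmurHash3's finalizer applied to token id XOR seed) reduced modulo a bucket count m: it computes each token's signature, builds a codebook that flags any signature collisions (the scheme needs signatures to be unique), and maps a predicted signature back to a token id. The seeds, bucket count, and function names are hypothetical; the learned Hash Encoder and Hash Decoder are not shown.

```typescript
// Hypothetical hash family: k seeds, each used with MurmurHash3's 32-bit
// finalizer to scramble the token id.
function mix32(tokenId: number, seed: number): number {
  let h = (tokenId ^ seed) >>> 0;
  h = Math.imul(h ^ (h >>> 16), 0x85ebca6b);
  h = Math.imul(h ^ (h >>> 13), 0xc2b2ae35);
  return (h ^ (h >>> 16)) >>> 0;
}

// Signature of a token: k bucket indices in [0, buckets).
function signature(tokenId: number, seeds: number[], buckets: number): number[] {
  return seeds.map((seed) => mix32(tokenId, seed) % buckets);
}

interface Codebook {
  lookup: Map<string, number>; // signature key -> token id
  collisions: number[];        // token ids whose signature was already taken
}

function buildCodebook(vocabSize: number, seeds: number[], buckets: number): Codebook {
  const lookup = new Map<string, number>();
  const collisions: number[] = [];
  for (let t = 0; t < vocabSize; t++) {
    const key = signature(t, seeds, buckets).join(",");
    if (lookup.has(key)) collisions.push(t);
    else lookup.set(key, t);
  }
  return { lookup, collisions };
}

// Map a predicted signature (e.g. from a hash decoder) back to a token id,
// or undefined if it matches no vocabulary entry.
function decode(predicted: number[], book: Codebook): number | undefined {
  return book.lookup.get(predicted.join(","));
}

const SEEDS = [0x9e3779b9, 0x7f4a7c15, 0x85ebca77, 0xc2b2ae3d]; // k = 4
const BUCKETS = 256; // m = 256, so 256^4 possible signatures
const book = buildCodebook(1000, SEEDS, BUCKETS);
console.log(book.collisions.length); // tokens that would need re-hashing
console.log(decode(signature(42, SEEDS, BUCKETS), book));
```

If `collisions` is non-empty, those tokens would need a different seed set or more buckets before every signature is unique.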