Eclecta — research

Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination

Thu, 02 Jul 2026 17:48:25 GMT

Why it matters: Graph-native reinforcement learning offers a pathway to more interpretable AI systems capable of generating scientifically valid hypotheses through structured reasoning.

Notes

Graph-PRefLexOR is a family of models fine-tuned with Group Relative Policy Optimization (GRPO)
Achieves 40-65% improvements over base models on materials science questions
Shows approximately 2-3 times greater semantic diversity than baselines
Test-time graph expansion primarily increases long-range conceptual recombination within a bounded semantic space

The paper introduces Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) for scientific hypothesis generation. The models organize reasoning into explicit phases: mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. On materials science questions, Graph-PRefLexOR improves on base models by 40-65%, with gains in traceability and approximately 2-3 times greater semantic diversity than baselines. Test-time graph expansion primarily increases long-range conceptual recombination within a bounded semantic space.
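For readers unfamiliar with GRPO, the core idea is that each sampled completion is scored relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization step. The reward values and the function name are illustrative assumptions, not taken from the paper, and the use of the population standard deviation is one common choice rather than a confirmed detail of this work.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. `eps` (an assumed small constant) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

# Completions above the group mean get a positive advantage,
# those below it get a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the baseline is the group mean, no separate critic model is needed; the cost is that several completions must be sampled per prompt.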

Read · Primary source

Surfaced on Hugging Face Daily Papers (3) · arXiv cs.AI · arXiv cond-mat

Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training

Thu, 02 Jul 2026 17:16:22 GMT

Why it matters: This research challenges the conventional approach to reinforcement learning adaptation in transformers by demonstrating that training a single layer can achieve similar results to full-parameter training.

Notes

Training a single transformer layer can match or exceed the gains of full-parameter RL training
Layer contribution measures quantify how much improvement a single layer provides when trained in isolation
High-contribution layers are consistently found in the middle of the transformer stack across different models and tasks
Observed patterns hold true for seven models, two model families (Qwen3, Qwen2.5), and three RL algorithms

This study investigates how reinforcement learning gains are distributed across transformer layers during post-training adaptation. It finds that training a single layer can recover most of, or even surpass, the benefits of full-parameter RL training. The research introduces 'layer contribution' to measure the improvement obtained when an individual layer is trained in isolation, revealing a consistent pattern: high-contribution layers are concentrated in the middle of the stack, while layers near the input and output contribute less. The pattern holds across seven models from two model families (Qwen3, Qwen2.5), multiple tasks, and three RL algorithms.
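Training one layer in isolation amounts to freezing every parameter except those belonging to the chosen layer. The sketch below shows one way to build that selection from parameter names. The `model.layers.<index>.` naming pattern and the example names are assumptions modeled on a common convention, not something the summary confirms, and the helper is hypothetical.

```python
import re

# Assumed naming convention: per-layer parameters contain ".layers.<index>.".
LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")


def trainable_mask(param_names, layer_index):
    """Return {name: bool}, True only for parameters of `layer_index`.

    Parameters outside any numbered layer (embeddings, final norm, output
    head) never match the pattern, so they stay frozen.
    """
    mask = {}
    for name in param_names:
        match = LAYER_PATTERN.search(name)
        mask[name] = bool(match) and int(match.group(1)) == layer_index
    return mask


# Hypothetical parameter names for a tiny three-layer model.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.1.mlp.up_proj.weight",
    "model.layers.2.mlp.up_proj.weight",
    "lm_head.weight",
]

mask = trainable_mask(names, layer_index=1)
for name, train in mask.items():
    print(f"{'train ' if train else 'freeze'} {name}")
```

In a real training loop the mask would be applied to each parameter's gradient flag before the optimizer is built; repeating the run for each layer index gives the per-layer comparison the study describes.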

Read · Primary source

Surfaced on Hacker News (113) · 28c · arXiv cs.LG

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Thu, 02 Jul 2026 00:03:33 GMT

Why it matters: This research introduces a novel method for improving large language model (LLM) self-assessment and uncertainty expression, which is crucial for enhancing trustworthiness and reliability in AI systems.

Notes

Reinforcement Learning with Metacognitive Feedback (RLMF) uses self-judgments to refine completion rankings during preference optimization
Metacognitive data selection identifies high-value training examples using similar self-judgments, outperforming naive active learning methods
The approach achieves state-of-the-art faithful calibration on diverse tasks while preserving accuracy
RLMF enhances models' ability to assess and express their own capability limits by up to 63% compared to standard RL

This paper presents Reinforcement Learning with Metacognitive Feedback (RLMF), a method for improving large language model (LLM) self-assessment and uncertainty expression. The approach uses a model's self-judgments in two ways: to refine completion rankings during preference optimization, and to identify high-value training examples through metacognitive data selection. Experiments show that RLMF achieves state-of-the-art faithful calibration on diverse tasks while preserving accuracy, improving models' ability to assess and express their own capability limits by up to 63% over standard RL. The authors present it as a promising paradigm for enhancing LLM metacognition and alignment.

Read · Primary source

Surfaced on Hugging Face Daily Papers (17) · arXiv cs.CL

Multi-Block Diffusion Language Models

Wed, 01 Jul 2026 20:05:20 GMT

Why it matters: MBD-LMs offer significant improvements in text generation efficiency and accuracy through Multi-block Teacher Forcing and optimized decoding algorithms.

Notes

Proposes Multi-Block Diffusion Language Models (MBD-LMs) to extend Block Diffusion Language Models
Introduces Multi-block Teacher Forcing (MultiTF) for training MBD-LMs, improving alignment between training and inference states
Employs an optimized decoding algorithm with the Block Buffer mechanism to preserve prefix-cache reuse and maintain static input shapes
MBD-LLaDA2-Mini increases average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and accuracy from 79.95% to 81.03%
Combining MBD-LLaDA2-Mini with DMax achieves an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks

The paper introduces Multi-Block Diffusion Language Models (MBD-LMs) as an extension of Block Diffusion Language Models, using Multi-block Teacher Forcing (MultiTF) to better align training and inference states. The method includes an optimized decoding algorithm whose Block Buffer mechanism preserves prefix-cache reuse and keeps input shapes static. Empirically, MBD-LLaDA2-Mini raises average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and accuracy from 79.95% to 81.03%. Combined with DMax, it reaches an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks.
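The reported TPF figures translate directly into how many forward passes a generation needs. The short calculation below works through the numbers quoted above; the 1,000-token output length is an arbitrary illustration, and the ratio describes forward passes only, not wall-clock speedup, which also depends on per-pass cost.

```python
# Figures quoted in the summary above.
baseline_tpf = 3.47   # average tokens per forward pass before MBD
mbd_tpf = 6.19        # average tokens per forward pass with MBD-LLaDA2-Mini
baseline_acc = 79.95  # accuracy in percent
mbd_acc = 81.03       # accuracy in percent


def forward_passes(num_tokens, tpf):
    """Approximate forward passes needed to emit `num_tokens` tokens."""
    return num_tokens / tpf


tpf_ratio = mbd_tpf / baseline_tpf       # about 1.78x more tokens per pass
accuracy_delta = mbd_acc - baseline_acc  # about +1.08 percentage points

# Illustrative output length (an assumption, not from the paper).
num_tokens = 1000
saved = forward_passes(num_tokens, baseline_tpf) - forward_passes(num_tokens, mbd_tpf)

print(f"TPF ratio: {tpf_ratio:.2f}x")
print(f"Accuracy change: {accuracy_delta:+.2f} points")
print(f"Forward passes saved per {num_tokens} tokens: {saved:.0f}")
```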

Read · Primary source

Surfaced on Hugging Face Daily Papers (25) · arXiv cs.LG

BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

Wed, 01 Jul 2026 19:18:25 GMT

Why it matters: BlockPilot introduces an adaptive policy for speculative decoding that significantly improves inference speed without compromising accuracy, making it a valuable tool for optimizing large language models.

Notes

Proposes BlockPilot, which predicts the optimal block size adaptively based on input representation
Achieves a 4.20x speedup on Qwen3-4B under temperature T=1
Reduces the decision space to a low-dimensional, structured one for efficient policy learning
Introduces minimal overhead while improving efficiency in speculative decoding

BlockPilot is an instance-adaptive policy that predicts the optimal block size for diffusion-based speculative decoding from the prefilling representation. This approach reduces the problem to a simpler decision space, enabling significant speedups with minimal overhead. Experiments show BlockPilot achieves a 4.20x speedup on Qwen3-4B under temperature T=1 without affecting accuracy.
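One way to picture an instance-adaptive block-size policy is as a small classifier over a handful of candidate block sizes, scored from a feature vector summarizing the prefill. The sketch below is purely illustrative: the candidate sizes, the linear scoring head, the feature vector, and all weights are assumptions made for this example and are not BlockPilot's actual architecture.

```python
# Hypothetical discrete decision space; the real candidate set is not
# given in the summary.
CANDIDATE_BLOCK_SIZES = [4, 8, 16, 32]


def choose_block_size(prefill_features, weights, biases):
    """Pick the candidate block size with the highest linear score.

    `prefill_features` stands in for a pooled prefill representation.
    `weights` has one row per candidate size; `biases` has one entry per
    candidate size. Both would be learned; here they are made up.
    """
    scores = []
    for w_row, b in zip(weights, biases):
        scores.append(sum(w * x for w, x in zip(w_row, prefill_features)) + b)
    best = max(range(len(scores)), key=scores.__getitem__)
    return CANDIDATE_BLOCK_SIZES[best]


# Made-up two-dimensional features and weights for demonstration only.
features = [1.0, 0.0]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
biases = [0.0, 0.0, 0.0, 0.0]

print(choose_block_size(features, weights, biases))
```

Restricting the choice to a few discrete sizes is what keeps the decision space small; the policy runs once per input after prefill, so its cost is a single small matrix-vector product in this sketch.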

Read · Primary source

Surfaced on Hugging Face Daily Papers (67) · arXiv cs.CL

LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing

Tue, 30 Jun 2026 07:25:32 GMT

Why it matters: This research advances real-time video editing by enabling stable, high-fidelity edits suitable for AR and other interactive applications.

Notes

Three-stage distillation pipeline transfers editing capability from a bidirectional foundation model to a unidirectional streaming editor
AR-oriented mask cache reuses region-related computation across frames, reducing redundant processing and accelerating inference
Achieves state-of-the-art visual quality among streaming baselines
Inference speed boosted to 12.66 FPS
Suitable for interactive and augmented reality applications

The paper introduces LiveEdit, a framework for real-time video editing that addresses stability and latency through a three-stage distillation pipeline. The pipeline transfers editing capability from a bidirectional foundation model to an efficient unidirectional streaming editor, keeping long edits stable without compromising visual fidelity. An AR-oriented mask cache reuses region-related computation across frames, cutting redundant processing and raising inference speed to 12.66 FPS. In evaluation, the framework achieves state-of-the-art visual quality among streaming baselines, making it suitable for interactive and augmented reality applications.

Read · Primary source

Surfaced on Hugging Face Daily Papers (72) · arXiv cs.CV

Quantum Generative Diffusion Model for Real-World Time Series

Mon, 29 Jun 2026 14:54:15 GMT

Why it matters: This work introduces the first quantum generative diffusion model for time series, demonstrating significant improvements in efficiency and performance compared to classical models.

Notes

QDiffusion-TS is the first quantum generative diffusion model for real-world time series synthesis
Validated on IQM quantum processor with financial time series data from Apple and Amazon
Reduces number of trainable parameters by nearly three orders of magnitude compared to classical models
Improves predictive performance by up to 71% in RMSE over a baseline trained solely on real data
Reduces Wasserstein distance by approximately 44% relative to its classical counterpart

The paper presents QDiffusion-TS, the first quantum generative diffusion model for real-world time series synthesis. This hybrid quantum transformer replaces the feed-forward components of a denoising transformer with quantum neural networks, reducing the number of trainable parameters by nearly three orders of magnitude compared to classical models. Validated on an IQM quantum processor with financial time series data from Apple and Amazon, QDiffusion-TS generates synthetic data that more closely reproduces the real distributions, reducing Wasserstein distance by approximately 44% relative to its classical counterpart. It also improves predictive performance by up to 71% in RMSE over a baseline trained solely on real data.
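The two metrics quoted here can be made concrete with a small example. The sketch below computes RMSE and an empirical one-dimensional Wasserstein-1 distance between equal-sized samples, then expresses an improvement as a relative reduction. Whether the paper uses this exact one-dimensional form is an assumption, and the sample values are made up for illustration.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("sequences must have the same length")
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def wasserstein_1d(sample_a, sample_b):
    """Empirical 1-D Wasserstein-1 distance for equal-sized samples.

    For equal-sized samples this is the mean absolute difference between
    the sorted values.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must have the same size")
    a, b = sorted(sample_a), sorted(sample_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def relative_reduction(new, old):
    """Fraction by which `new` is lower than `old`."""
    return (old - new) / old


# Made-up samples: the second synthetic set sits closer to the real one.
real = [0.0, 1.0, 2.0, 3.0]
synthetic_far = [1.0, 2.0, 3.0, 4.0]
synthetic_near = [0.5, 1.5, 2.5, 3.5]

d_far = wasserstein_1d(real, synthetic_far)
d_near = wasserstein_1d(real, synthetic_near)
print(f"reduction: {relative_reduction(d_near, d_far):.0%}")
```

A lower Wasserstein distance means the synthetic distribution is closer to the real one, while a lower RMSE means a downstream predictor makes smaller errors; the two figures in the summary measure different things.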

Read · Primary source

Surfaced on arXiv cs.LG · arXiv quant-ph