<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Eclecta — research</title><description>Papers and results that change practice — machine learning, systems, and the sciences.</description><link>https://eclecta.co/</link><language>en-us</language><docs>https://eclecta.co/research/</docs><item><title>Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination</title><link>https://arxiv.org/abs/2607.00924</link><guid isPermaLink="true">https://arxiv.org/abs/2607.00924</guid><description>Graph-native reinforcement learning offers a pathway to more interpretable AI systems capable of generating scientifically valid hypotheses through structured reasoning.</description><pubDate>Thu, 02 Jul 2026 17:48:25 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Graph-native reinforcement learning offers a pathway to more interpretable AI systems capable of generating scientifically valid hypotheses through structured reasoning.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Graph-PRefLexOR is a family of models fine-tuned with Group Relative Policy Optimization (GRPO)&lt;/li&gt;&lt;li&gt;Achieves 40-65% improvements over base models on materials science questions&lt;/li&gt;&lt;li&gt;Shows approximately 2-3 times greater semantic diversity than baselines&lt;/li&gt;&lt;li&gt;Test-time graph expansion primarily increases long-range conceptual recombination within a bounded semantic space&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The paper introduces Graph-PRefLexOR, a family of graph-native reasoning models fine-tuned with Group Relative Policy Optimization (GRPO) to enhance scientific hypothesis generation. These models organize reasoning into explicit phases for mechanism exploration, graph construction, pattern extraction, and hypothesis synthesis. 
On materials science questions, Graph-PRefLexOR improves on its base models by 40-65% and shows roughly 2-3 times greater semantic diversity than baselines, while keeping its reasoning traceable through explicit graphs. The model&apos;s test-time graph expansion primarily increases long-range conceptual recombination within a bounded semantic space.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2607.00924&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://huggingface.co/papers/2607.00924&quot;&gt;Hugging Face Daily Papers (3)&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2607.00924&quot;&gt;arXiv cs.AI&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2607.00924&quot;&gt;arXiv cond-mat&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training</title><link>https://arxiv.org/abs/2607.01232</link><guid isPermaLink="true">https://arxiv.org/abs/2607.01232</guid><description>This research challenges the convention of updating all parameters during reinforcement learning (RL) post-training of transformers, showing that training a single layer can match full-parameter RL training.</description><pubDate>Thu, 02 Jul 2026 17:16:22 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This research challenges the convention of updating all parameters during reinforcement learning (RL) post-training of transformers, showing that training a single layer can match full-parameter RL training.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Training a single transformer layer can match or exceed the gains of full-parameter RL training&lt;/li&gt;&lt;li&gt;A &apos;layer contribution&apos; measure quantifies how much improvement a single layer provides when trained in isolation&lt;/li&gt;&lt;li&gt;High-contribution layers are consistently found in the middle
of the transformer stack across different models and tasks&lt;/li&gt;&lt;li&gt;The pattern holds across seven models from two model families (Qwen3, Qwen2.5) and three RL algorithms&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This study examines how reinforcement learning gains are distributed across transformer layers during post-training. It finds that training a single layer can recover most of, and sometimes surpass, the gains of full-parameter RL training. The authors introduce &apos;layer contribution&apos;, the improvement obtained when a single layer is trained in isolation, and find that high-contribution layers are concentrated in the middle of the stack, while layers near the input and output contribute less. The pattern holds across models, tasks, and reinforcement learning algorithms.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2607.01232&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://news.ycombinator.com/item?id=48760201&quot;&gt;Hacker News (113) · 28c&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2607.01232&quot;&gt;arXiv cs.LG&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs</title><link>https://arxiv.org/abs/2606.32032</link><guid isPermaLink="true">https://arxiv.org/abs/2606.32032</guid><description>This research introduces a novel method for improving large language model (LLM) self-assessment and uncertainty expression, which is crucial for enhancing trustworthiness and reliability in AI systems.</description><pubDate>Thu, 02 Jul 2026 00:03:33 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This research introduces a novel method for improving large language model (LLM) self-assessment and uncertainty expression, which is crucial for enhancing trustworthiness
and reliability in AI systems.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Reinforcement Learning with Metacognitive Feedback (RLMF) uses the model&apos;s own self-judgments to refine completion rankings during preference optimization&lt;/li&gt;&lt;li&gt;Metacognitive data selection uses the same kind of self-judgments to identify high-value training examples, outperforming naive active learning&lt;/li&gt;&lt;li&gt;The approach achieves state-of-the-art faithful calibration on diverse tasks while preserving accuracy&lt;/li&gt;&lt;li&gt;RLMF improves models&apos; ability to assess and express their own capability limits by up to 63% relative to standard RL&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;This paper presents Reinforcement Learning with Metacognitive Feedback (RLMF), a method for improving how large language models (LLMs) assess themselves and express uncertainty. RLMF uses the model&apos;s self-judgments (metacognitive feedback) to refine completion rankings during preference optimization, and applies metacognitive data selection to pick high-value training examples. Across extensive experiments, RLMF achieves state-of-the-art faithful calibration on diverse tasks while maintaining accuracy, and improves on standard reinforcement learning by up to 63%.
The authors present RLMF as a promising paradigm for improving LLM metacognition and alignment.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.32032&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://huggingface.co/papers/2606.32032&quot;&gt;Hugging Face Daily Papers (17)&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.32032&quot;&gt;arXiv cs.CL&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Multi-Block Diffusion Language Models</title><link>https://arxiv.org/abs/2606.29215</link><guid isPermaLink="true">https://arxiv.org/abs/2606.29215</guid><description>Multi-Block Diffusion Language Models (MBD-LMs) improve both text generation efficiency and accuracy through Multi-block Teacher Forcing and an optimized decoding algorithm.</description><pubDate>Wed, 01 Jul 2026 20:05:20 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Multi-Block Diffusion Language Models (MBD-LMs) improve both text generation efficiency and accuracy through Multi-block Teacher Forcing and an optimized decoding algorithm.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Proposes Multi-Block Diffusion Language Models (MBD-LMs) to extend Block Diffusion Language Models&lt;/li&gt;&lt;li&gt;Introduces Multi-block Teacher Forcing (MultiTF) for training MBD-LMs, aligning training states with the states seen at inference&lt;/li&gt;&lt;li&gt;Employs an optimized decoding algorithm with a Block Buffer mechanism to preserve prefix-cache reuse and keep input shapes static&lt;/li&gt;&lt;li&gt;MBD-LLaDA2-Mini increases average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and accuracy from 79.95% to 81.03%&lt;/li&gt;&lt;li&gt;Combining MBD-LLaDA2-Mini with DMax achieves an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The paper introduces Multi-Block Diffusion Language Models (MBD-LMs) as an extension of
Block Diffusion Language Models. MBD-LMs are trained with Multi-block Teacher Forcing (MultiTF), which better aligns training states with inference states. An optimized decoding algorithm with a Block Buffer mechanism improves efficiency by preserving prefix-cache reuse and keeping input shapes static. In experiments, MBD-LLaDA2-Mini raises average TPF from 3.47 to 6.19 and accuracy from 79.95% to 81.03%. Combined with DMax, it reaches an average TPF of 9.34 with an accuracy drop of only 1.02% on math and code benchmarks.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.29215&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://huggingface.co/papers/2606.29215&quot;&gt;Hugging Face Daily Papers (25)&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.29215&quot;&gt;arXiv cs.LG&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding</title><link>https://arxiv.org/abs/2606.31315</link><guid isPermaLink="true">https://arxiv.org/abs/2606.31315</guid><description>BlockPilot learns an instance-adaptive policy for diffusion-based speculative decoding that substantially speeds up large language model inference without compromising accuracy.</description><pubDate>Wed, 01 Jul 2026 19:18:25 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; BlockPilot learns an instance-adaptive policy for diffusion-based speculative decoding that substantially speeds up large language model inference without compromising accuracy.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Proposes BlockPilot, which adaptively predicts the optimal block size for each input from its prefilling representation&lt;/li&gt;&lt;li&gt;Achieves a
4.20x speedup on Qwen3-4B at temperature T=1&lt;/li&gt;&lt;li&gt;Reduces the choice to a low-dimensional, structured decision space, which makes policy learning efficient&lt;/li&gt;&lt;li&gt;Adds minimal overhead while improving speculative decoding efficiency&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;BlockPilot is an instance-adaptive policy that predicts the optimal block size for diffusion-based speculative decoding from each input&apos;s prefilling representation. Casting block-size selection as a low-dimensional, structured decision enables large speedups with minimal overhead. In experiments, BlockPilot achieves a 4.20x speedup on Qwen3-4B at temperature T=1 without loss of accuracy.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.31315&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://huggingface.co/papers/2606.31315&quot;&gt;Hugging Face Daily Papers (67)&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.31315&quot;&gt;arXiv cs.CL&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>LiveEdit: Towards Real-Time Diffusion-Based Streaming Video Editing</title><link>https://arxiv.org/abs/2606.26740</link><guid isPermaLink="true">https://arxiv.org/abs/2606.26740</guid><description>This research advances real-time video editing by enabling stable, high-fidelity edits suitable for AR and other interactive applications.</description><pubDate>Tue, 30 Jun 2026 07:25:32 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This research advances real-time video editing by enabling stable, high-fidelity edits suitable for AR and other interactive applications.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Three-stage distillation pipeline transfers editing capability from a bidirectional foundation model to a unidirectional streaming editor&lt;/li&gt;&lt;li&gt;AR-oriented mask cache reuses region-related
computation across frames, reducing redundant processing and accelerating inference&lt;/li&gt;&lt;li&gt;Achieves state-of-the-art visual quality among streaming baselines&lt;/li&gt;&lt;li&gt;Raises inference speed to 12.66 FPS&lt;/li&gt;&lt;li&gt;Suitable for interactive and augmented reality applications&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The paper introduces LiveEdit, a framework for real-time streaming video editing that addresses stability and latency through a three-stage distillation pipeline. The pipeline transfers editing capability from a bidirectional foundation model to an efficient unidirectional streaming editor, keeping edits stable over long sequences without sacrificing visual fidelity. An AR-oriented mask cache further reduces redundant computation across frames, raising inference speed to 12.66 FPS. In evaluation, LiveEdit achieves state-of-the-art visual quality among streaming baselines and is suited to interactive and augmented reality applications.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.26740&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://huggingface.co/papers/2606.26740&quot;&gt;Hugging Face Daily Papers (72)&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.26740&quot;&gt;arXiv cs.CV&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Quantum Generative Diffusion Model for Real-World Time Series</title><link>https://arxiv.org/abs/2606.27561</link><guid isPermaLink="true">https://arxiv.org/abs/2606.27561</guid><description>This work introduces the first quantum generative diffusion model for time series, demonstrating significant improvements in efficiency and performance compared to classical models.</description><pubDate>Mon, 29 Jun 2026 14:54:15 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; This work introduces the first quantum generative
diffusion model for time series, demonstrating significant improvements in efficiency and performance compared to classical models.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Notes&lt;/strong&gt;&lt;/p&gt;&lt;ul&gt;&lt;li&gt;QDiffusion-TS is the first quantum generative diffusion model for real-world time series synthesis&lt;/li&gt;&lt;li&gt;Validated on an IQM quantum processor using financial time series from Apple and Amazon&lt;/li&gt;&lt;li&gt;Reduces the number of trainable parameters by nearly three orders of magnitude relative to classical models&lt;/li&gt;&lt;li&gt;Improves predictive performance by up to 71% in RMSE over a baseline trained solely on real data&lt;/li&gt;&lt;li&gt;Reduces Wasserstein distance by approximately 44% relative to its classical counterpart&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;The paper presents QDiffusion-TS, the first quantum generative diffusion model for time series synthesis. Its hybrid quantum transformer replaces the feed-forward components of a denoising transformer with quantum neural networks, cutting the number of trainable parameters by nearly three orders of magnitude. On financial data from Apple and Amazon, QDiffusion-TS generates synthetic data that reproduces the real distribution more closely, reducing Wasserstein distance by about 44% relative to its classical counterpart. It also improves predictive performance by up to 71% in RMSE over a baseline trained solely on real data.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Read&lt;/strong&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.27561&quot;&gt;Primary source&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Surfaced on&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/2606.27561&quot;&gt;arXiv cs.LG&lt;/a&gt; · &lt;a href=&quot;https://arxiv.org/abs/2606.27561&quot;&gt;arXiv quant-ph&lt;/a&gt;&lt;/p&gt;</content:encoded></item></channel></rss>