MBD-LMs improve text generation efficiency and accuracy through Multi-block Teacher Forcing and an optimized decoding algorithm: MBD-LLaDA2-Mini raises TPF from 3.47 to 6.19 and accuracy from 79.95% to 81.03%.
Proposes Multi-Block Diffusion Language Models (MBD-LMs) to extend Block Diffusion Language Models
Introduces Multi-block Teacher Forcing (MultiTF) for training MBD-LMs, better aligning training states with inference states
Employs an optimized decoding algorithm with the Block Buffer mechanism to preserve prefix-cache reuse and maintain static input shapes
Full summary
The article introduces Multi-Block Diffusion Language Models (MBD-LMs) as an extension to Block Diffusion Language Models, utilizing Multi-block Teacher Forcing (MultiTF) for better alignment between training and inference states. The proposed method includes an optimized decoding algorithm with the Block Buffer mechanism that enhances efficiency by preserving prefix-cache reuse and maintaining static input shapes. Empirical results show significant improvements in text generation performance: MBD-LLaDA2-Mini increases TPF from 3.47 to 6.19 and accuracy from 79.95% to 81.03%. When combined with DMax, the model achieves a TPF of 9.34 while maintaining near-zero accuracy loss.
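To put the reported figures in relative terms, the short sketch below computes the TPF ratio and the accuracy change from the numbers quoted above. The helper names are illustrative and not from the paper, and the arithmetic uses only the values stated in this summary.

```python
def tpf_ratio(baseline_tpf: float, new_tpf: float) -> float:
    """Return how many times higher the new TPF is than the baseline."""
    return new_tpf / baseline_tpf


def accuracy_delta(baseline_pct: float, new_pct: float) -> float:
    """Return the accuracy change in percentage points."""
    return new_pct - baseline_pct


# Figures quoted in the summary for MBD-LLaDA2-Mini.
ratio = tpf_ratio(3.47, 6.19)
delta = accuracy_delta(79.95, 81.03)

print(f"TPF ratio: {ratio:.2f}x")               # about 1.78x
print(f"Accuracy change: {delta:+.2f} points")  # about +1.08 points
```

The 9.34 TPF reported with DMax is left out of the ratio because the summary does not state which baseline it should be compared against.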
BlockPilot is an instance-adaptive block-size policy for diffusion-based speculative decoding that reaches a 4.20x speedup on Qwen3-4B at temperature T=1 without compromising accuracy.
Details
Proposes BlockPilot, which predicts the optimal block size adaptively based on input representation
Achieves a 4.20x speedup on Qwen3-4B under temperature T=1
Reduces the decision space to a low-dimensional, structured one for efficient policy learning
BlockPilot is an instance-adaptive policy that predicts the optimal block size for diffusion-based speculative decoding from the prefilling representation. This approach reduces the problem to a simpler decision space, enabling significant speedups with minimal overhead. Experiments show BlockPilot achieves a 4.20x speedup on Qwen3-4B under temperature T=1 without affecting accuracy.
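One way to picture a block-size policy of this kind is as a small classifier over a fixed set of candidate block sizes. The sketch below is a minimal illustration under that assumption: the candidate sizes, the linear scoring layer, and the toy feature vector are all hypothetical stand-ins, and the paper's actual policy architecture and features may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate block sizes; the summary does not list the real ones.
CANDIDATE_BLOCK_SIZES = (4, 8, 16, 32)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_block_size(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[int, List[float]]:
    """Score each candidate block size with a linear layer and pick the best.

    `features` stands in for a pooled prefilling representation (assumed).
    `weights` has one row per candidate block size.
    """
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CANDIDATE_BLOCK_SIZES[best], probs


# Toy inputs chosen only to make the example deterministic.
toy_features = [0.5, -1.0, 2.0]
toy_weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
toy_biases = [0.0, 0.0, 0.0, 0.0]

block_size, probs = predict_block_size(toy_features, toy_weights, toy_biases)
print(block_size)
```

Because the policy chooses among a handful of discrete options once per input, its cost is a single small matrix-vector product, which is consistent with the minimal overhead described above.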
This research advances real-time video editing by enabling stable, high-fidelity edits suitable for AR and other interactive applications.
Details
Three-stage distillation pipeline transfers editing capability from a bidirectional foundation model to a unidirectional streaming editor
AR-oriented mask cache reuses region-related computation across frames, reducing redundant processing and accelerating inference
Achieves state-of-the-art visual quality among streaming baselines
The paper introduces LiveEdit, a framework for real-time video editing that addresses stability and latency issues through a three-stage distillation pipeline. The pipeline transfers editing capability from a bidirectional foundation model to an efficient unidirectional streaming editor, keeping long-running edits stable without compromising visual fidelity. In addition, an AR-oriented mask cache reuses region-related computation across frames, cutting redundant processing and bringing inference to 12.66 FPS. In evaluation, the framework achieves state-of-the-art visual quality among streaming baselines and is suited to interactive and augmented reality applications.
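The idea of reusing region-related computation across frames can be illustrated with plain memoization keyed on the edit mask. The sketch below is a generic illustration, not LiveEdit's implementation: the `MaskCache` class, the region encoding, and the stand-in compute function are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# A region is modeled as the set of (row, col) cells covered by the edit mask.
Region = FrozenSet[Tuple[int, int]]


class MaskCache:
    """Memoize a region-dependent computation so repeated masks are not recomputed."""

    def __init__(self, compute_fn: Callable[[Region], int]) -> None:
        self._compute = compute_fn
        self._store: Dict[Region, int] = {}  # unbounded; a real cache would evict
        self.hits = 0
        self.misses = 0

    def get(self, region: Region) -> int:
        """Return the cached result for this region, computing it on first use."""
        if region in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[region] = self._compute(region)
        return self._store[region]


def toy_region_features(region: Region) -> int:
    """Stand-in for an expensive per-region computation."""
    return sum(r + c for r, c in region)


cache = MaskCache(toy_region_features)
mask_a = frozenset({(0, 0), (0, 1)})
mask_b = frozenset({(2, 2)})

# Five frames share mask_a, then the mask moves for two frames.
frames = [mask_a] * 5 + [mask_b] * 2
results = [cache.get(mask) for mask in frames]
print(cache.hits, cache.misses)
```

In this toy stream the computation runs only twice for seven frames, since the mask changes once; the saving in a real editor would depend on how often the edited region moves.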
This work introduces the first quantum generative diffusion model for time series, cutting trainable parameters by nearly three orders of magnitude relative to classical models while reproducing real data distributions more accurately.
Details
QDiffusion-TS is the first quantum generative diffusion model for real-world time series synthesis
Validated on an IQM quantum processor with financial time series data from Apple and Amazon
Reduces the number of trainable parameters by nearly three orders of magnitude compared to classical models
The paper presents QDiffusion-TS, the first quantum generative diffusion model for time series synthesis. This hybrid quantum transformer replaces feed-forward components in a denoising transformer with quantum neural networks, significantly reducing the number of trainable parameters. When evaluated on financial data from Apple and Amazon, QDiffusion-TS generates synthetic data that more accurately reproduces real distributions, as measured by a 44% reduction in Wasserstein distance compared to classical models. It also improves predictive performance by up to 71% in RMSE over baselines trained solely on real data.
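For readers unfamiliar with the two metrics, the sketch below shows how a Wasserstein distance between one-dimensional samples and a relative reduction can be computed. It assumes equal-size samples, for which the 1-Wasserstein distance equals the mean absolute difference of the sorted values. The summary does not say which Wasserstein variant the paper uses, and the toy numbers are invented for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size samples")
    sorted_a, sorted_b = sorted(a), sorted(b)
    return sum(abs(x - y) for x, y in zip(sorted_a, sorted_b)) / len(sorted_a)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline` (0.44 means 44%)."""
    return (baseline - new) / baseline


# Toy data, invented for illustration only.
real = [0.0, 1.0, 2.0, 3.0]
synthetic_classical = [0.5, 1.5, 2.5, 3.5]
synthetic_quantum = [0.0, 1.0, 2.0, 4.0]

w_classical = wasserstein_1d(real, synthetic_classical)
w_quantum = wasserstein_1d(real, synthetic_quantum)
print(f"Reduction: {relative_reduction(w_classical, w_quantum):.0%}")
```

The same `relative_reduction` helper applies to RMSE: a 71% improvement means the new RMSE is 29% of the baseline value.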