Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs

Reinforcement learning (RL) is widely used to improve large language models on reasoning tasks, and asynchronous RL training is attractive because it increases end-to-end throughput.

arXiv cs.LG · Feb 19, 2026 18:40 UTC · Paper: ~15 min
