viable/strict/1772468940: [inductor] Add FMA lowering for add-with-alpha on CUDA
PyTorch Releases · README · ~3 min read
2-Minute Brief
According to PyTorch Releases: Eager CUDA computes a + alpha * b as fma(b, alpha, a). Without this lowering, Triton computes b * alpha and then adds it to a as two separate operations, losing the FMA single-rounding precision guarantee. This affects optimizer weight_decay paths, which use grad.add(param, alpha=weight_decay) and _foreach_add with alpha. Authored with Claude. Pull Request resolved: #175838. Approved by: https://github.com/v0i0. ghstack dependencies: #174912, #175309, #175310.