Speculative Speculative Decoding

arXiv cs.LG · Mar 03, 2026 18:41 UTC · Paper: ~15 min

2-Minute Brief

According to arXiv cs.LG: Autoregressive decoding is bottlenecked by its sequential nature. Speculative decoding has become a standard way to accelerate inference by using a fast draft model to predict upcoming tokens from a slower target model, and then verifying them in parallel with a single target model forward pass. However, speculative decoding itself relies on a sequential dependence between speculation and verification. We introduce speculative speculative decoding (SSD) to parallelize these operations. While a v
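The draft-then-verify loop of ordinary speculative decoding, as described in the brief, can be illustrated with a minimal Python sketch. This is not the paper's SSD method, and it makes several assumptions that are not in the source: both models are toy deterministic functions (`draft_next`, `target_next`), decoding is greedy (exact-match acceptance rather than the rejection sampling used with sampled decoding), and the verification loop stands in for what would be a single batched target forward pass in a real system. All names here are hypothetical.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (assumed interface)


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the slow target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the fast draft model: agrees with the target,
    except that it guesses wrong whenever the last token is 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(ctx: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Run one speculate-then-verify round and return the tokens to append."""
    # Speculation: the draft model proposes k tokens, one after another.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(ctx + proposal))

    # Verification: a real system scores all positions in one target
    # forward pass; this loop computes the same per-position results.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(ctx + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target(ctx + accepted))
    return accepted


def generate(prompt: List[Token], n_new: int, draft: NextToken,
             target: NextToken, k: int = 4) -> List[Token]:
    """Generate n_new tokens using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prompt) + n_new]


def greedy_generate(prompt: List[Token], n_new: int,
                    target: NextToken) -> List[Token]:
    """Baseline: plain autoregressive decoding with the target model only."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out
```

Under these assumptions the speculative output matches plain greedy decoding with the target model, while each round can emit up to `k + 1` tokens. Note that in this loop the next round of drafting cannot start until verification returns, which is the sequential dependence between speculation and verification that the brief says SSD aims to parallelize.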
