Primary Source

Official announcement from NVIDIA. These are the company's own claims, and it has marketing incentives.

Unlock Massive Token Throughput with GPU Fractioning in NVIDIA Run:ai

As AI workloads scale, achieving high throughput, efficient resource usage, and predictable latency becomes essential.

NVIDIA Developer · Feb 18, 2026 18:00 UTC · ~2 min read
