
Olmo Hybrid and future LLM architectures

So-called hybrid architectures are far from new in open-weight models these days.

Interconnects (Nathan Lambert) · ~4 min read

  • Major industry investment.
  • We now have the recent Qwen 3.5 (previewed by Qwen3-Next), Kimi Linear last fall (a smaller release than their flagship Kimi K2 models), Nvidia’s Nemotron 3 Nano (with…
  • This is one of those times when a research trend looks like it’s getting adopted everywhere at once (maybe the Muon optimizer too, soon?).

Context

We now have the recent Qwen 3.5 (previewed by Qwen3-Next), Kimi Linear last fall (a smaller release than their flagship Kimi K2 models), Nvidia’s Nemotron 3 Nano (with the bigger models expected to drop soon), IBM Granite 4, and other less notable models. This is one of those times when a research trend looks like it’s getting adopted everywhere at once (maybe the Muon optimizer too, soon?).

To tell this story, we need to go back a few years to December 2023, when Mamba and Striped Hyena were taking the world by storm and asking the question: do we need full attention in our models? These early models fizzled out, partially for the same reasons they’re hard today (tricky implementations, open-source tool problems, more headaches in training), but also because the models fell over a bit when scaled up. The hybrid models of the day weren’t quite good enough yet.

These models are called hybrid because they mix these new recurrent neural network (RNN) modules with the traditional attention that made the transformer famous. They all work best with this mix of modules.…
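To make the "mix of modules" idea concrete, here is a minimal toy sketch in plain Python. It is not how any of the models named above are implemented: the function names, the scalar "tokens", the decay value, and the one-attention-layer-in-four ratio are all illustrative assumptions. It only shows the general shape of a hybrid stack, with cheap recurrent mixers for most layers and full causal attention interleaved at a fixed interval.

```python
import math


def linear_recurrence(xs, decay=0.9):
    """Toy recurrent mixer: h_t = decay * h_{t-1} + x_t (constant-size state)."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def causal_attention(xs):
    """Toy single-head causal attention over scalars (query = key = value = x)."""
    out = []
    for t, q in enumerate(xs):
        past = xs[: t + 1]                      # each token sees itself and earlier tokens
        scores = [q * k for k in past]
        m = max(scores)                         # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, past)) / z)
    return out


def layer_pattern(n_layers, attention_every):
    """Hypothetical layout: every `attention_every`-th layer is attention, the rest recurrent."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "recurrent"
        for i in range(n_layers)
    ]


def hybrid_forward(xs, pattern):
    """Run the sequence through each mixer in turn, with a residual connection."""
    mixers = {"recurrent": linear_recurrence, "attention": causal_attention}
    for kind in pattern:
        mixed = mixers[kind](xs)
        xs = [x + m for x, m in zip(xs, mixed)]
    return xs


# Assumed ratio for illustration only: 3 recurrent layers per attention layer.
pattern = layer_pattern(n_layers=8, attention_every=4)
print(pattern)
print(hybrid_forward([0.1, -0.2, 0.3], pattern))
```

The recurrent mixer carries a fixed-size state from token to token, while the attention mixer looks back over every earlier token, which is the trade-off a hybrid stack tries to balance.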

For builders

We now have the recent Qwen 3.5 (previewed by Qwen3-Next), Kimi Linear last fall (a smaller release than their flagship Kimi K2 models), Nvidia’s Nemotron 3 Nano (with…

