Black Forest Labs' new Self-Flow technique makes training multimodal AI models 2.8x more efficient


VentureBeat · Mar 04, 2026 20:18 UTC · ~3 min read

Tech Press

General tech coverage by VentureBeat. May simplify or sensationalize—check their sources.

Key Takeaways

According to VentureBeat: To create coherent images or videos, generative AI diffusion models such as Stable Diffusion or FLUX have typically relied on external "teachers" (frozen encoders such as CLIP or DINOv2) to provide the semantic understanding they could not learn on their own. This reliance has come at a cost: a "bottleneck" in which scaling up the model no longer yields better results because the external teacher has hit its limit. Today, German AI startup Black Forest Labs (maker of the FLUX series of AI image models) …
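The article does not describe how Self-Flow works internally. For context on the "external teacher" setup it contrasts against, one published approach (REPA, representation alignment) adds an auxiliary loss that pulls a diffusion transformer's intermediate activations toward a frozen encoder's features. The NumPy sketch below illustrates that general idea under stated assumptions: the function names, the single linear projection (REPA itself uses a small MLP projector), and the weight `lam` are illustrative choices, and none of this is Black Forest Labs' method.

```python
import numpy as np

def alignment_loss(student_hidden, teacher_feats, proj, eps=1e-8):
    """Negative mean cosine similarity between projected student tokens and teacher tokens.

    student_hidden: (num_tokens, d_student) intermediate activations of the diffusion model.
    teacher_feats:  (num_tokens, d_teacher) features from a frozen encoder (e.g. DINOv2);
                    they are fixed targets and are never updated.
    proj:           (d_student, d_teacher) trainable projection; a single linear map here
                    is a simplifying assumption.
    """
    pred = student_hidden @ proj
    # L2-normalize each token so the dot product below is a cosine similarity.
    pred = pred / np.maximum(np.linalg.norm(pred, axis=-1, keepdims=True), eps)
    tgt = teacher_feats / np.maximum(np.linalg.norm(teacher_feats, axis=-1, keepdims=True), eps)
    cos = np.sum(pred * tgt, axis=-1)  # one similarity per token, in [-1, 1]
    return -float(np.mean(cos))        # lower loss means better alignment with the teacher

def total_loss(diffusion_loss, align_loss, lam=0.5):
    """Combine the usual denoising loss with the alignment term (lam is illustrative)."""
    return diffusion_loss + lam * align_loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.standard_normal((16, 64))  # hypothetical student activations
    t = rng.standard_normal((16, 32))  # hypothetical frozen-teacher features
    W = rng.standard_normal((64, 32))  # hypothetical projection weights
    print(alignment_loss(h, t, W))
```

Because the teacher's features are fixed targets, the student can only become as "semantic" as the teacher allows, which is one way to read the scaling bottleneck the article describes.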

