
Black Forest Labs' new Self-Flow technique makes training multimodal AI models 2.8x more efficient

To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external "teachers"—frozen encoders like CLIP or DINOv2—to provide the semantic understanding…

VentureBeat · ~3 min read
Tech Press

General tech coverage by VentureBeat. May simplify or sensationalize—check their sources.

  • Major industry investment.
  • To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external "teachers"—frozen encoders like CLIP or DINOv2—to provide the semantic understanding they couldn't learn on their own.
  • But this reliance has come at a cost: a "bottleneck" where scaling up the model no longer yields better results because the external teacher has hit its limit.

Context

To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external "teachers"—frozen encoders like CLIP or DINOv2—to provide the semantic understanding they couldn't learn on their own. But this reliance has come at a cost: a "bottleneck" where scaling up the model no longer yields better results because the external teacher has hit its limit.

Today, German AI startup Black Forest Labs (maker of the FLUX series of AI image models) has announced a potential end to this era of academic borrowing with the release of Self-Flow, a self-supervised flow matching framework that allows models to learn representation and generation simultaneously. By integrating a novel Dual-Timestep Scheduling mechanism, Black Forest Labs has demonstrated that a single model can achieve state-of-the-art results across images, video, and audio without any external supervision.

The technology: breaking the "semantic gap"

The fundamental problem with traditional generative training is that it's a "denoising" task. The model is shown noise and asked to find an image; it has very little incentive to understand what the image is, only what it…
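The article describes Self-Flow only at a high level, so the following is a minimal sketch of what "self-supervised flow matching with dual-timestep scheduling" could look like in principle: a standard flow-matching velocity loss at one sampled timestep, plus a representation loss that aligns the model's own features across a second, independently sampled timestep, with no frozen teacher encoder. All function names, the loss decomposition, and the weighting are assumptions for illustration; Black Forest Labs has not published Self-Flow's actual implementation.

```python
# Hypothetical sketch of self-supervised flow matching with two
# independently sampled timesteps ("dual-timestep scheduling").
# Everything here is illustrative, not Black Forest Labs' code.
import numpy as np

rng = np.random.default_rng(0)

def toy_model(x_t, t):
    """Stand-in network: predicts a velocity and a feature vector."""
    velocity = x_t * (1.0 - t)      # placeholder computation, not a real net
    features = np.tanh(x_t + t)     # placeholder "representation"
    return velocity, features

def self_flow_step(x0, x1):
    # Generation branch: standard flow matching at timestep t_gen.
    t_gen = rng.uniform()
    x_t = (1 - t_gen) * x0 + t_gen * x1     # linear interpolation path
    target_v = x1 - x0                      # flow-matching velocity target
    pred_v, feat_a = toy_model(x_t, t_gen)
    gen_loss = np.mean((pred_v - target_v) ** 2)

    # Representation branch: the same sample at a second timestep t_rep;
    # align the model's own features across the two noise levels
    # (self-supervision, no external teacher like CLIP or DINOv2).
    t_rep = rng.uniform()
    x_r = (1 - t_rep) * x0 + t_rep * x1
    _, feat_b = toy_model(x_r, t_rep)
    rep_loss = np.mean((feat_a - feat_b) ** 2)

    return gen_loss + 0.5 * rep_loss        # 0.5: assumed loss weighting

x1 = rng.standard_normal(8)   # "clean" data sample
x0 = rng.standard_normal(8)   # noise sample
loss = self_flow_step(x0, x1)
print(float(loss))
```

The key property this sketch tries to capture is that the representation signal comes from the model's own features rather than a frozen encoder, which is what would remove the "teacher bottleneck" the article describes.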

