FLUX has become significantly faster on Replicate, and we’ve open-sourced the optimizations so you can explore, understand, and build on them.
Here are the end-to-end performance times:
- FLUX.1 [schnell] at 512×512 with 4 steps: 0.29 seconds (90th percentile: 0.49 seconds)
- FLUX.1 [schnell] at 1024×1024 with 4 steps: 0.72 seconds (90th percentile: 0.95 seconds)
- FLUX.1 [dev] at 1024×1024 with 28 steps: 3.03 seconds (90th percentile: 3.90 seconds)
These measurements were taken from the west coast of the US using the Python client. Check out a demo of FLUX.1 [schnell] on Replicate below.
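For reference, a rough end-to-end timing can be reproduced with the official `replicate` Python client. The sketch below is minimal and illustrative; the model slug and input names are assumptions that you should check against the model page:

```python
import time
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN

start = time.perf_counter()
# Model slug and inputs are assumptions; see the model page for the
# exact input schema.
output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "a red fox in the snow", "num_inference_steps": 4},
)
print(f"end-to-end: {time.perf_counter() - start:.2f}s")
```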
How Did We Achieve This?
While most models on Replicate come from the community, we maintain the FLUX models directly, in collaboration with Black Forest Labs.
To speed up FLUX, our team focused on two key improvements:
- Model Optimization: We started with Alex Redden’s flux-fp8-api and further enhanced it using `torch.compile` alongside the fast CuDNN attention kernels available in the nightly PyTorch builds (see the sketch after this list).
- New Synchronous HTTP API: This new API significantly boosts the speed of all image models on Replicate.
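Concretely, the `torch.compile` side of this looks roughly like the following sketch. It is illustrative rather than the actual cog-flux code; `load_flux_transformer` is a hypothetical stand-in for however the fp8-quantized transformer gets constructed:

```python
import torch

# Prefer the CuDNN scaled-dot-product-attention backend, available in
# recent nightly PyTorch builds.
torch.backends.cuda.enable_cudnn_sdp(True)

# Hypothetical loader for the (fp8-quantized) FLUX transformer.
model = load_flux_transformer().eval().to("cuda")

# Compile once up front; the first call pays the compilation cost,
# subsequent calls run the optimized kernels.
model = torch.compile(model, mode="max-autotune", fullgraph=True)
```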
The quantization from flux-fp8-api does introduce slight variations in the model’s output, but we’ve found that it has minimal impact on the overall quality.
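The effect of fp8 rounding is easy to see in isolation. The toy example below is a generic illustration of 8-bit float casting, not how flux-fp8-api itself is implemented:

```python
import torch

w = torch.randn(4, 4)
w_fp8 = w.to(torch.float8_e4m3fn)  # quantize to 8-bit float (PyTorch 2.1+)
w_back = w_fp8.to(torch.float32)   # dequantize for comparison
print((w - w_back).abs().max())    # small but nonzero rounding error
```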
See the Difference for Yourself
We’ve built a tool that compares the output of thousands of prompts between FLUX.1 [schnell] and FLUX.1 [dev]. No cherry-picking—just transparent comparisons. You can explore the results yourself.
If you prefer, you can disable these optimizations by setting the `go_fast` input on the model to `false`.
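With the Python client, that looks like the following (model slug again assumed):

```python
import replicate

# go_fast defaults to true; set it to false to run without the
# optimized path.
output = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "a red fox in the snow", "go_fast": False},
)
```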
Transparency in Model Optimization
We believe in being transparent about our optimization process. Comparing outputs between different models and providers is challenging, and it can be hard to know if optimizations are affecting model quality. That’s why we’re sharing our process openly and allowing you to turn off any optimizations. This way, you can be confident that you’re getting the best possible output.
And since our code is open-source, you can dive into it yourself: github.com/replicate/cog-flux.
Open-Source Should Be Fast, Too
Open-source models often come with performance challenges, while proprietary services keep their speed improvements behind closed doors. We aim to change this by making all our optimizations for FLUX open-source.
We’re collaborating with the AI Compiler Study Group and other researchers to create a fast, open-source version of FLUX. By doing this, we’re not just doing the right thing—we’re enabling experts from around the world to join forces and make it even faster. Contributions are always welcome.
Speeding Up Together
With new speed-enhancing techniques emerging constantly, our collaboration with the community ensures that the latest advancements reach Replicate as soon as possible. Stay tuned—there’s more to come!