jasoncorkill
posted an update 14 days ago
🚀 First Benchmark of @OpenAI's 4o Image Generation Model!

We've just completed the first-ever (to our knowledge) benchmarking of the new OpenAI 4o image generation model, and the results are impressive!

In our tests, OpenAI 4o image generation clearly outperformed leading competitors, including @black-forest-labs, @google, @xai-org, Ideogram, Recraft, and @deepseek-ai, in prompt alignment and coherence! It leads the nearest competitor by more than 20% in Bradley-Terry score, the largest gap we have seen since the benchmark began!
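For readers unfamiliar with Bradley-Terry scores, here is a minimal sketch of how such scores can be fitted from pairwise "which image is better?" votes. It uses the classic iterative update for the Bradley-Terry model; the post does not say how the leaderboard actually computes its scores, so treat this as an illustration only. The function name `fit_bradley_terry` and the vote counts are hypothetical.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1 (higher means stronger).
    """
    n = len(wins)
    p = [1.0 / n] * n  # start from equal strengths
    for _ in range(n_iter):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pairing contributes (comparisons between i and j) / (p_i + p_j)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new_p.append(total_wins / denom if denom > 0 else p[i])
        total = sum(new_p)
        if total == 0:  # no wins recorded at all; keep the previous estimate
            return p
        p = [x / total for x in new_p]
    return p


# Made-up vote counts for three hypothetical models (not benchmark data).
example_wins = [
    [0, 80, 90],  # model A vs. B, C
    [20, 0, 60],  # model B vs. A, C
    [10, 40, 0],  # model C vs. A, B
]
scores = fit_bradley_terry(example_wins)
for name, score in zip(["A", "B", "C"], scores):
    print(f"model {name}: {score:.3f}")
```

Under this model, the probability that model i is preferred over model j is `p[i] / (p[i] + p[j])`, so a large gap in score corresponds to one model winning most of its head-to-head comparisons.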

The benchmarks are based on 200k human responses collected through our API. However, the most challenging part wasn't the benchmarking itself, but generating and downloading the images:

- 5 hours to generate 1000 images (no API available yet)
- Just 10 minutes to set up and launch the benchmark
- Over 200,000 responses rapidly collected

While generating the images, we ran into some hurdles that forced us to leave out parts of our prompt set. In particular, we observed that the OpenAI 4o model proactively refused to generate certain images:

🚫 Styles of living artists: completely blocked
🚫 Copyrighted characters (e.g., Darth Vader, Pokémon): initially generated but subsequently blocked

Overall, OpenAI 4o stands out significantly in alignment and coherence, and it excels on unusual prompts that have historically caused issues, such as 'A chair on a cat.' See the images for more examples!

The dataset will soon be published. In the meantime, check out the detailed benchmark here: 👉 rapidata.ai/leaderboard/image-models
