Ten months after announcing GPT-4o's omni-model capabilities, OpenAI publicly released native image generation for GPT-4o, to exceptional, albeit Ghibli-memed, fanfare.
What's different
Overall, GPT-4o image generation is less creative than Midjourney's models. However, four breakthroughs set GPT-4o apart from all others:
- Natural language prompting
- Prompt coherence
- Style transfer
- Text generation
The first two improvements likely reduce creativity by eliminating the randomness and unexpected results that make generations feel novel.
OpenAI replaced comma-separated, word-smashed prompts with faithful adherence to coherent, plain-language descriptions. Coherence is the theme that ties this release together: image-to-image generation preserves key elements despite style changes, and text requested in a prompt renders correctly in a variety of fonts.
Job replacement
GPT-4o is far from replacing graphic designers at American companies, especially where brand consistency matters. For small businesses that would never hire a designer, however, GPT-4o is already handling many of those design tasks. In Southeast Asia, family restaurants use GPT-4o image generation ("imgen") for menus and other collateral.
Generated examples
Here are some examples that highlight the range of imgen's capabilities.