GPT-4o Image Gen | Stephen M. Walker II

Ten months after announcing GPT-4o's omni-model capabilities, OpenAI publicly released native image generation for GPT-4o, to exceptional, albeit Ghibli-memed, fanfare.

What's different

Overall, GPT-4o image generation is less creative than Midjourney's models. However, four breakthroughs differentiate GPT-4o from all others:

Natural language prompting
Prompt coherence
Style transfer
Text generation

The first two improvements likely reduce creativity: by eliminating randomness, they also eliminate the unexpected results that make generations feel novel.

OpenAI replaced comma-separated, keyword-smashed prompts with faithful adherence to coherent descriptions. Coherence is the theme that ties this release together: image-to-image generation preserves key elements across style changes, and text requested in a prompt renders correctly in a variety of fonts.

Job replacement

GPT-4o is far from replacing graphic designers at American companies, especially where brand consistency matters. For small businesses that would never hire a designer, however, GPT-4o is already handling many of those design tasks. In Southeast Asia, for example, family restaurants use GPT-4o image generation (imgen) for menus and other collateral.

Generated Examples

Here are some examples that highlight the range of GPT-4o's image generation capabilities.

Futuristic cityscape with two galaxies

Top Gun-inspired artwork, upscaled

T-800 terminator born from chat interface

Moo Deng-inspired Apple-style advertisement

JP v Godzilla

Stylish cat in business suit

Protest sign holder artistic remix

Pierce Brosnan nostalgic gaming moment