Prompt: Can you generate a realistic colorful image of dog wearing a suit on the street in 16:9 ratio
OpenAI may have kicked off the text-to-image generation craze with its DALL-E model, but since those early glory days, the AI company's offering has been lapped by much more capable image models. As a result, I was skeptical when OpenAI released its latest and greatest GPT-4o image generation model. After testing it, I have changed my mind entirely.
When DALL-E first launched, it lived on a standalone website; it has since moved into ChatGPT. The move brought many benefits, including the ability to ask the AI chatbot for an image in the same interface where you are already chatting about something else, which eliminates the need for constant context switching.
With the release of GPT-4o image generation, OpenAI kept this convenient format and switched the default image generator for paid subscribers from DALL-E to GPT-4o. That made it super easy to start creating new images from my ChatGPT Plus account. All I had to do was enter a prompt describing what I wanted to see, and ChatGPT generated the image. Users can also access the model from the Sora interface.
You can also generate images as a free user. At launch, OpenAI announced that the model was coming to all users, including free ones. A day later, however, OpenAI CEO Sam Altman said the rollout to the free tier would be "delayed for awhile." About a week after that, the model became available to free users.
If you are unimpressed when you try it in the free version, the reason may be that the only way to trigger GPT-4o is to type the shortcut "/create image." If you simply type a request such as "Create an image of XYZ," ChatGPT defaults to the DALL-E model, which renders significantly lower-quality images. OpenAI does not state explicit limits, but I hit my daily limit after generating three images from my free account. ChatGPT Plus therefore remains a good option if you want more access to image generation.
Now for the moment you have been waiting for: the images. After you enter a prompt, the model returns an image in under a minute. The process takes a bit longer than it used to, but the results are worth the wait, delivering rich detail, texture, realism, and even accurate text. Rather than describe them, I have included examples below so you can see for yourself.
Prompt: Can you generate a realistic image of a chameleon, up close, shot as if it were in National Geographic in 16:9 ratio?
Prompt: Can you generate an image of a laptop open on a desk that says, "This model is so good that it can even get text and hands right, which are usually major challenges for AI models," with hands typing on a keyboard in 16:9 ratio?
Prompt: Can you generate a realistic photo of a close-up of a woman in a crowd in Times Square looking at the camera and smiling, with the quality of one taken on a DSLR?
As seen above, the image generator does a great job of adhering to the prompt and delivering high-quality, realistic images. However, one of the truest measures of an AI model's performance is how it compares with competitors on the market. To give you a good indicator, I gave it the same prompt I have used to test all of the major AI image generators, including Midjourney, Google's Imagen 3, Adobe Firefly, and more.
GPT-4o's rendition is below. You can see how it fares against the other AI image generators in this article, including DALL-E, whose rendition is clearly far behind what the new model can do.
Prompt: Can you generate an image of a vibrant, realistic hummingbird perched on a tree?
Image quality may be the model's biggest win, but it has other benefits as well. Chief among them is that it lives in the chatbot's interface, which makes it easy to tweak the generations with simple natural-language prompts. And because the chatbot retains the context of your conversation, it can draw on that context when building the image.
For example, if you are chatting with it about throwing a birthday party, you may be able to say, "Can you now create an invite that has the information above on it?" instead of retyping the details. When I chatted with ChatGPT about throwing a housewarming party and then asked it to create an invite, I did not have to repeat the information I had already provided.
You can also upload reference images and ask ChatGPT to create a different version or use them as elements of a new one. For example, you can upload a selfie and have it re-created in anime style, as seen in Altman's new X post.
Sam Altman on X: "changed my pfp but maybe someone will make me a better one"