
I've tried lots of AI image generators, and Nvidia and MIT's is the one to beat for speed

Apr 01, 2025 | Hi-network.com
Dog on a colorful background
Sabrina Ortiz/ via HART

Since the 2021 release of DALL-E, the first AI image-generating model to popularize the tech, the text-to-image space has made major progress in quality, speed, and prompt adherence. However, even the fastest image generators typically take a couple of seconds to create an image -- except this one.

Also: Apple's AI doctor will be ready to see you next spring

HART, short for Hybrid Autoregressive Transformer, is an AI text-to-image generator developed by MIT, Nvidia, and Tsinghua University. It offers unprecedented speed, generating images with 3.1 to 5.9 times lower latency than state-of-the-art diffusion models. The key difference? How HART was trained.

Without getting too technical: most popular AI image generators, including OpenAI's DALL-E and Google's Imagen 3, are diffusion models. HART, by contrast, is an autoregressive (AR) visual generation model, the same approach behind OpenAI's recently released GPT-4o image generator.

AR models offer more control over the final image by generating it step by step. However, training these models is costly, and image quality can suffer at higher resolutions. To address this, the researchers introduced a hybrid tokenizer that processes different parts of the image more efficiently. The result: HART is faster and has higher throughput than diffusion models.
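To make the step-by-step idea concrete, here is a minimal, purely illustrative Python sketch of autoregressive image generation: a toy "model" predicts one discrete visual token at a time, conditioned on everything generated so far. None of the names or numbers below come from HART's codebase; a real system would use a trained transformer and then decode the tokens (plus, in HART's hybrid scheme, continuous residual details) back into pixels.

```python
# Conceptual sketch (not HART's actual code): an autoregressive image
# generator predicts discrete image tokens one step at a time, each
# conditioned on the tokens produced so far, then a decoder turns the
# token grid back into pixels.
import random

VOCAB_SIZE = 256      # size of the visual token codebook (toy value)
GRID = 8              # an 8x8 grid of tokens stands in for a latent image

def predict_next_token(context: list[int]) -> int:
    """Stand-in for a transformer step: pick the next visual token
    given all previously generated tokens."""
    random.seed(len(context))          # deterministic toy "model"
    return random.randrange(VOCAB_SIZE)

def generate_image_tokens() -> list[int]:
    tokens: list[int] = []
    for _ in range(GRID * GRID):       # step by step, left to right
        tokens.append(predict_next_token(tokens))
    return tokens

tokens = generate_image_tokens()
print(f"Generated {len(tokens)} visual tokens, e.g. {tokens[:8]} ...")
# A real system would now decode these discrete tokens -- and, in HART's
# hybrid scheme, continuous residual details -- back into an image.
```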

Also: Gartner to CIOs: Prepare to spend more money on generative AI

Since most AI models take at least a few seconds to generate images -- which is already impressively quick -- I didn't expect HART's speed to leave me very impressed. I was wrong. The demo is accompanied by a stopwatch that times each generation, and after a few runs I noticed it consistently took about 1.8 seconds to generate an image. For context, that's about how long it takes to say 'Mississippi.'

The same prompt I used to render the images at the top of the article took OpenAI's GPT-4o image generator one minute and 45 seconds, and Google's Imagen 3 about 10 seconds. The quality of all three generators was comparable, with Google's taking the lead for the best combination of speed and quality.


Prompt: A dog wearing a clown hat on a colorful background. (Left to right: ChatGPT's 4o image model, Gemini's Imagen 3, HART.)

Sabrina Ortiz/ via ChatGPT/Gemini/HART

Even with that speed, Imagen 3 still took roughly five times longer than HART to generate the picture, which underscores just how fast HART is. I have tested most of the text-to-image models on the market, and HART is the quickest.

Also: AI agents aren't just assistants: How they're changing the future of work today

If you want to try HART, you can access it for free here. The inference code is also open-sourced and available in a public GitHub repository, which developers, academics, and AI aficionados can use for further research on image generation.
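If you do pull down the code, a simple way to reproduce the stopwatch test from this article is to wrap whatever inference call the repository exposes in a timer. The sketch below is hypothetical: `generate_image` is a stand-in placeholder, not the repository's actual API, and the 1.8-second sleep merely simulates the latency I measured.

```python
# Hypothetical sketch: timing a local text-to-image call the way the
# article's stopwatch does. `generate_image` is a placeholder for
# whatever entry point the open-sourced inference code exposes; it is
# NOT the repository's actual API.
import time

def generate_image(prompt: str) -> bytes:
    # Placeholder: swap in the real inference call from the repo.
    time.sleep(1.8)                    # simulate the ~1.8 s measured above
    return b"<image bytes>"

start = time.perf_counter()
image = generate_image("A dog wearing a clown hat on a colorful background")
elapsed = time.perf_counter() - start
print(f"Generated image in {elapsed:.2f} seconds")
```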
