OpenAI Unveils Better Image Generator, DALL-E 3, as AI Arms Race Deepens

The tool will be incorporated into ChatGPT, expanding the reach of the controversial technology.

The Washington Post

September 22, 2023

OpenAI on Wednesday began previewing a new version of its DALL-E tool, which creates images from written prompts, and announced plans to integrate it into its popular ChatGPT chatbot, increasing the reach of a controversial technology at a time when lawmakers are calling for more restraint.

The new tool, called DALL-E 3, offers improved understanding of users' commands and is better at rendering legible, coherent text within images, a well-known weakness of AI image generators. Language advances allow DALL-E 3 to parse complex instructions rather than jumbling up elements of a detailed request, researchers said Tuesday during a short demo.

"Casual users can log in to [the] chatbot and ask for something pretty vague," said Aditya Ramesh, head of the DALL-E 3 team, who shared a demo of a business owner testing out different logos for a business called Mountain Ramen.

While the new tool is available to a small group of users for early testing, it will be released to ChatGPT subscribers in October, potentially multiplying the number of people who interact with the technology.

The release comes amid challenges for the San Francisco start-up as competitive pressure builds. Traffic to, and monthly use of, both DALL-E and OpenAI's flagship chatbot have declined as Google rushes a fleet of AI-driven products to users. But by integrating its new image generator into ChatGPT, OpenAI is expanding its market and offering the technology as a feature to turbocharge its chatbot rather than as a stand-alone product.

Reporters were unable to test the function during a news briefing because DALL-E 3 was "a little glitchy," said OpenAI's head of PR, Lindsey Head Bolton, but the company later wrote that it would be stable by Wednesday's launch.

Text-to-image generators such as DALL-E 2, Midjourney and Stable Diffusion entranced early adopters when they debuted last year — offering the public the ability to command advanced software, no technical skills required. Advertisers, marketers, politicians and video game makers have used the tools to build buzzy campaigns.

But monthly online visits to the DALL-E tool on desktop and mobile have fallen from a spike of 32 million in March 2023, when OpenAI upgraded ChatGPT's underlying technology, to about 13 million in August, according to the analytics firm SimilarWeb.

Despite uncertainty over the future of text-to-image AI, the technology has proliferated with few guardrails — igniting concerns that the widespread ability to make realistic-looking images could have social and political repercussions.

The garbled street signs and muddled text produced by older versions of the tool were an easy tell for AI-generated images. DALL-E 3's improvements make it more difficult for a layperson to distinguish AI-generated images from real photos.

"You're not going to be able to trust your eyes," said University of California at Berkeley Professor Hany Farid, who specializes in digital forensics and works with Adobe on its Content Authenticity Initiative.

But Farid emphasized that DALL-E 3's improvements are not by themselves cause for alarm, because AI gets better at mimicking the real world every six months or so. He called for advanced technology that can separate human creations from AI-generated ones.

OpenAI's competitors, including Stability AI and Midjourney, are facing lawsuits from artists and Getty Images alleging that the vast web scrapes of internet data required to teach generative AI models constitute copyright theft.

Law enforcement, regulators and advocacy groups have recently zeroed in on the way these tools are being used to create deepfake nonconsensual pornography, child sexual abuse material and AI-generated ads for the upcoming presidential election.

The DALL-E 3 team said it prioritized these risks by inviting a "red team" of outside experts to test worst-case scenarios and then integrating what they learned into the company's mitigation strategies.

For DALL-E 2, OpenAI published a detailed synopsis of this process in a system card, a public account of how an AI model was developed, fine-tuned and safety-tested that functions as both a warning label and a nutrition label. Sandhini Agarwal, a policy researcher, said OpenAI plans to publish one for DALL-E 3 before the tool is open to the public.

As part of a voluntary White House pledge in June, OpenAI agreed to develop and deploy mechanisms to identify when visual or audio content is AI-generated, using methods such as watermarking an image or encoding provenance data that indicates which service or model created the content. The DALL-E 3 team is experimenting with a classifier that examines where an image came from, or its "provenance," said Ramesh, a method mentioned in the White House commitments.
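
At its simplest, encoding provenance data means writing a machine-readable record into an image's metadata. The short Python sketch below illustrates that general approach using a PNG text chunk; it is a minimal illustration of what the pledge describes, not OpenAI's implementation, and the "provenance" key, the record fields and the "dall-e-3" label are all hypothetical. Production schemes, such as the C2PA standard behind Adobe's Content Authenticity Initiative, go further and cryptographically sign such records.

import json
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_provenance(src: str, dst: str, model: str) -> None:
    # Attach a provenance record to a copy of the image as a PNG text chunk.
    record = {"generator": model, "ai_generated": True}  # hypothetical schema
    image = Image.open(src)
    info = PngInfo()
    info.add_text("provenance", json.dumps(record))
    image.save(dst, pnginfo=info)

def read_provenance(path: str):
    # Return the embedded record, or None if the image carries no tag.
    raw = Image.open(path).text.get("provenance")
    return json.loads(raw) if raw else None

# Example: tag_provenance("render.png", "render_tagged.png", "dall-e-3")

A metadata tag like this is also trivial to strip, which is one reason the commitments mention watermarking, where the signal lives in the pixels themselves, and one reason a classifier that can recognize a model's output even without a tag is attractive.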

Mechanisms like these help identify deepfakes, but they can also help artists track whether their work was used without consent or compensation to train models, said Margaret Mitchell, a research scientist at Hugging Face and former co-lead of ethical AI at Google.

"That's not necessarily in the company's interests, but I think it's in the interest of the greater good," she said.

— Nitasha Tiku, The Washington Post
