AI illustrator draws imaginative pictures to go with text captions


AI images generated from the text prompts "a baby daikon radish in a tutu walking a dog" and "an armchair in the shape of an avocado" (Image: OpenAI)

A neural network uses text captions to create outlandish images – such as armchairs in the shape of avocados – demonstrating it understands how language shapes visual culture.

OpenAI, an artificial intelligence company that recently partnered with Microsoft, developed the neural network, which it calls DALL-E. It is a version of the company's GPT-3 language model, which can create expansive written works from short text prompts; DALL-E produces images instead.

"The world isn't just text," says Ilya Sutskever, co-founder of OpenAI. "Humans don't just talk: we also see. A lot of important context comes from looking."



DALL-E is trained using a set of images already associated with text prompts, and then uses what it learns to try to build an appropriate image when given a new text prompt.

It builds the image element by element, based on what it has understood from the text. If it has been presented with parts of a pre-existing image alongside the text, it also takes the visual elements of that image into account.

"We can give the model a prompt, like 'a pentagonal green clock', and given the preceding [elements], the model is trying to predict the next one," says Aditya Ramesh of OpenAI.

For instance, if given an image of the head of a T. rex, and the text prompt тАЬa T. rex wearing a tuxedoтАЭ, DALL-E can draw the body of the T. rex underneath the head and add appropriate clothing.
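The next-element prediction Ramesh describes can be sketched in miniature. The toy below is purely illustrative and entirely hypothetical – a trivial bigram frequency table stands in for DALL-E's transformer – but it shows the same loop: text and image are flattened into one token sequence, and the model repeatedly appends whichever token it predicts should come next.

```python
# Toy sketch of autoregressive generation (hypothetical, greatly simplified).
# A bigram count table stands in for the real model; "px" tokens stand in
# for image elements.
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count next-token frequencies for each token (stand-in for training)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def generate(counts, prompt, length):
    """Greedily extend the prompt one token at a time, the way DALL-E
    extends a text prompt with image tokens."""
    seq = list(prompt)
    for _ in range(length):
        options = counts.get(seq[-1])
        if not options:
            break
        seq.append(options.most_common(1)[0][0])  # likeliest next token
    return seq

# Tiny made-up corpus: captions followed by "image tokens".
corpus = [
    ["green", "clock", "<img>", "px1", "px2", "px3"],
    ["green", "clock", "<img>", "px1", "px2", "px4"],
]
model = train_bigram(corpus)
print(generate(model, ["green", "clock"], 4))
```

Given the prompt `["green", "clock"]`, the sketch appends the image-start token and then pixel tokens one by one – the same element-by-element construction, minus roughly 12 billion parameters.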

The neural network, which is described today on the OpenAI website, can trip up on poorly worded prompts and struggles to position objects relative to each other тАУ or to count.

тАЬThe more concepts that a system is able to sensibly blend together, the more likely the AI system both understands the semantics of the request and can demonstrate that understanding creatively,тАЭ says Mark Riedl at the Georgia Institute of Technology in the US.

тАЬIтАЩm not really sure how to define what creativity is,тАЭ says Ramesh, who admits he was impressed with the range of images DALL-E produced.

The model produces 512 images for each prompt, which are then filtered using a separate computer model developed by OpenAI, called CLIP, into what CLIP believes are the 32 "best" results.

CLIP is trained on 400 million images available online. "We find image-text pairs across the internet and train a system to predict which pieces of text will be paired with which images," says Alec Radford of OpenAI, who developed CLIP.
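The filtering step amounts to reranking: score every candidate against the prompt, keep the top few. The sketch below is a hypothetical stand-in – word overlap between the prompt and a made-up candidate description replaces CLIP's learned text-image embeddings – but the generate-many, score, keep-the-best shape is the same.

```python
# Hypothetical sketch of CLIP-style reranking. Real CLIP scores images
# against text with learned embeddings; here a crude word-overlap score
# over made-up candidate descriptions stands in for it.

def similarity(prompt, description):
    """Stand-in for CLIP's text-image score: fraction of prompt words
    that also appear in the candidate's description."""
    prompt_words = set(prompt.lower().split())
    desc_words = set(description.lower().split())
    return len(prompt_words & desc_words) / len(prompt_words)

def rerank(prompt, candidates, keep):
    """Sort candidates by similarity to the prompt, keep the best few."""
    ranked = sorted(candidates, key=lambda c: similarity(prompt, c),
                    reverse=True)
    return ranked[:keep]

prompt = "an armchair in the shape of an avocado"
candidates = [  # hypothetical candidate descriptions
    "a green armchair shaped like an avocado",
    "a wooden chair",
    "an avocado on a table",
    "an armchair in the shape of an avocado fruit",
]
print(rerank(prompt, candidates, keep=2))
```

In DALL-E's pipeline the candidate pool is 512 images and `keep` is 32, with CLIP supplying the similarity scores.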

"This is really impressive work," says Serge Belongie at Cornell University, New York. He says further work is needed on the ethical implications of such a model, such as the risk of creating completely fake images, for example ones involving real people.

Effie Le Moignan at Newcastle University, UK, also calls the work impressive. "But the thing with natural language is although it's clever, it's very cultural and context-appropriate," she says.

For instance, Le Moignan wonders whether DALL-E, confronted by a request to produce an image of Admiral Nelson wearing gold lamé pants, would put the military hero in leggings or underpants – potential evidence of the gap between British and American English.

