Blog

How AI took over non fungible Art
Have you ever thought that behind the image you see below there is nothing but a neural network? To be precise a multimodal neural network, i.e. trained on texts and images. As a result, when it is pushed to create an image and a text suggestion is given, it creates exactly what we wrote. There are many text to image models created with artificial intelligence: from the American DALL-E and GLIDE by OpenAI, to the Russian ruDALL-E and the Chinese Cogview...

An overview of DALL-E four months later
It's been four months since the publication of the second version of the popular OpenAI DALL-E text-image model, and although a demo has not yet been released, a not inconsiderable number of people, including the researchers who signed the project itself, have started to share their results on Twitter (obviously generated with an internal demo), inviting the public to suggest new prompts to test the "together" capabilities of the new text-image flagship...

Deepmind's Transframer generates videos up to 30 seconds
In the last years many models for next frame prediction were introduced. These models are able to predict the next frame in a video, thereby creating short videos, such as fireworks and other animations.
A new model published by DeepMind, Google's AI society, promises to bring this field back to life with the introduction of Transframer, which combines the architecture of transformers and that of U-Net convolutional networks, achieving excellent results for several tasks, including image segmentation, image classification and video generation...

Behind DALL-E, StyleGAN Human generates people up to 1024 pixel
Researchers at the Chinese university of Nanyang have released a High-Resolution (512×1024 pixel) StyleGAN2 model that generates full-body humans. It is called StyleGAN human, and it has been trained on a dataset of 230 thousand images of male and female models in different poses and with different outfits. Actually back in 2019, Zalando Research had published an article about the generation of whole bodies trained with a large dataset (> 100 thousand images), but the checkpoints of the model had not been made public...