Open-source rival for OpenAI’s DALL-E runs on your graphics card

Image: Stable Diffusion

OpenAI’s DALL-E 2 is getting free competition, backed by an open source AI movement and the startup Stability AI.

Artificial intelligence capable of generating images from text descriptions has made rapid progress since the beginning of 2021. At that time, OpenAI showed impressive results with DALL-E 1 and CLIP. The open source community used CLIP for numerous alternative projects throughout the year. Then, in 2022, OpenAI released the impressive DALL-E 2, Google showed Imagen and Parti, Midjourney reached millions of users, and Craiyon flooded social media with AI imagery.

Startup Stability AI has now announced the release of Stable Diffusion, another DALL-E 2-like system, which will initially be made available gradually to researchers and other groups via a Discord server.

After a test phase, Stable Diffusion will then be released for free: the code and a trained model will be published as open source. There will also be a hosted version with a web interface for users to test the system.

Stability AI funds free DALL-E 2 competitor

Stable Diffusion is the result of a collaboration between Stability AI researchers, RunwayML, LMU Munich, EleutherAI and LAION. The EleutherAI research collective is known for its open source language models GPT-J-6B and GPT-NeoX-20B, among others, and is also conducting research on multimodal models.

The nonprofit organization LAION (Large-scale Artificial Intelligence Open Network) provided the training data with the open source LAION 5B dataset, which the team filtered with human feedback in an initial testing phase to create the final LAION-Aesthetics training dataset.

Patrick Esser of Runway and Robin Rombach of LMU Munich led the project, building on their earlier work in the CompVis group at the University of Heidelberg. There, they created the widely used VQGAN and Latent Diffusion. The latter served as the basis for Stable Diffusion, combined with research from OpenAI and Google Brain.

Stability AI, founded in 2020, is backed by mathematician and computer scientist Emad Mostaque. He worked as an analyst for various hedge funds for several years before shifting to more public-facing work. In 2019, he helped found Symmitree, a project that aims to lower the cost of smartphones and internet access for disadvantaged populations.

With Stability AI and his private fortune, Mostaque aims to foster the open source AI research community. His startup previously supported the creation of the LAION 5B dataset, for example. To train the Stable Diffusion model, Stability AI provided servers with 4,000 Nvidia A100 GPUs.

“No one has any voting rights except our 75 employees – no billionaire, big fund, government or anyone else in control of the company or communities we support. We are completely independent,” Mostaque told TechCrunch. “We plan to use our compute to accelerate open source, foundational AI.”

Stable Diffusion is a cornerstone of open source

A beta test of Stable Diffusion is currently underway, with access being granted in waves. The results, which can be seen on Twitter, for example, show that a real DALL-E 2 competitor is emerging here.

Stable Diffusion is more versatile than Midjourney, but has a lower resolution than DALL-E 2. | Image: GitHub

Unlike DALL-E 2, Stable Diffusion can generate images of prominent people and other subjects that OpenAI prohibits in DALL-E 2. Other systems like Midjourney or Pixelz.ai can do this too, but they don’t match Stable Diffusion’s combination of quality and variety – and none of the other systems are open source.

Stable Diffusion is said to run on a single graphics card with 5.1 gigabytes of VRAM, bringing AI technology to local machines that until now was only available through cloud services. Stable Diffusion thus offers researchers and other interested parties who do not have access to GPU servers the opportunity to experiment with a modern generative AI model. The model is also expected to work on MacBooks with Apple’s M1 chip, although image generation there takes several minutes rather than seconds.
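For readers who want to try this on their own hardware once the weights are published, a minimal sketch using Hugging Face’s diffusers library might look like the following. The model ID, the half-precision setting, and the prompt are assumptions for illustration, not the official release workflow described above:

    # Minimal sketch (assumption: the released checkpoint is available on the
    # Hugging Face Hub under an ID such as "CompVis/stable-diffusion-v1-4").
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # half precision to stay within roughly 5 GB of VRAM
    )
    pipe = pipe.to("cuda")  # or "mps" on an Apple M1 MacBook, where generation is much slower

    prompt = "an astronaut riding a horse, digital art"
    image = pipe(prompt).images[0]  # runs the text-to-image diffusion process
    image.save("astronaut.png")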

OpenAI’s DALL-E 2 gets open source competition, driven by an open source community and the startup Stability AI. | Image: GitHub

Stability AI itself also wants to allow companies to train their own Stable Diffusion variants. Multimodal models are thus following the path previously taken by large language models: away from a single provider and towards the wide availability of numerous open source alternatives.

Runway is already researching Stable Diffusion-enabled text-to-video editing.

Stable Diffusion: Pandora’s box and net benefits

Of course, with open access and the ability to run the model on a widely available GPU, the opportunities for abuse increase dramatically.

“A percentage of people are just plain nasty and weird, but that’s humanity,” Mostaque said. “Indeed, we believe this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI enthusiasts is misguided in not trusting society.”

Mostaque points out, however, that free availability allows the community to develop countermeasures.

“We are taking significant safety measures, including formulating state-of-the-art tools to help mitigate potential harm across the release and our own services. With hundreds of thousands of people developing on this model, we are confident that the net benefit will be immensely positive, and as billions of people use this technology, harms will be negated.”

More information is available on the Stable Diffusion GitHub page. You can find many examples of Stable Diffusion’s image generation capabilities on the Stable Diffusion subreddit. Beta registration for Stable Diffusion is available here.
