Image: Stable Diffusion
OpenAI’s DALL-E 2 is gaining free competition. Behind it is an open source AI movement and the startup Stability AI.
Artificial intelligence capable of generating images from text descriptions has made rapid progress since the beginning of 2021. At that time, OpenAI showed impressive results with DALL-E 1 and CLIP. The open source community used CLIP for numerous alternative projects throughout the year. Then, in 2022, OpenAI released the impressive DALL-E 2, Google showed Imagen and Parti, Midjourney reached millions of users, and Craiyon flooded social media with AI imagery.
Startup Stability AI has now announced the release of Stable Diffusion, another DALL-E 2-like system, which will initially be made available gradually to researchers and other groups via a Discord server.
After a test phase, Stable Diffusion will then be released for free: the code and a trained model will be published as open source. There will also be a hosted version with a web interface for users to test the system.
Stability AI funds free DALL-E 2 competitor
Stable Diffusion is the result of a collaboration between Stability AI researchers, RunwayML, LMU Munich, EleutherAI and LAION. The EleutherAI research collective is known for its open source language models GPT-J-6B and GPT-NeoX-20B, among others, and is also conducting research on multimodal models.
The nonprofit organization LAION (Large-scale Artificial Intelligence Open Network) provided the training data with the open source LAION 5B dataset, which the team filtered with human feedback in an initial testing phase to create the final LAION-Aesthetics training dataset.
Patrick Esser of Runway and Robin Rombach of LMU Munich led the project, building on their earlier work in the CompVis group at the University of Heidelberg. There, they created the widely used VQGAN and Latent Diffusion. The latter, combined with research from OpenAI and Google Brain, served as the basis for Stable Diffusion.
“Robot jazz” by TheRealBissy #StableDiffusion #AIArt #AIArtwork @StableDiffusion pic.twitter.com/V6hBWZUuM9
– Stable Diffusion Pics (@DiffusionPics) August 14, 2022
Stability AI, founded in 2020, is backed by mathematician and computer scientist Emad Mostaque. He worked as an analyst for various hedge funds for several years before turning to public-interest work. In 2019, he helped found Symmitree, a project that aims to lower the cost of smartphones and internet access for disadvantaged populations.
With Stability AI and his private fortune, Mostaque aims to foster the open source AI research community. His startup previously supported the creation of the LAION 5B dataset, for example. To train the Stable Diffusion model, Stability AI provided a server cluster with 4,000 Nvidia A100 GPUs.
“Nobody has any voting rights except our 75 employees – no billionaires, big funds, governments or anyone else with control of the company or the communities we support. We’re completely independent,” Mostaque told TechCrunch. “We plan to use our compute to accelerate open source, foundational AI.”
Stable Diffusion as an open source milestone
Stable Diffusion is currently in a test phase, with access being granted in waves. The results, which can be seen on Twitter, for example, show that a real DALL-E 2 competitor is emerging here.
Unlike DALL-E 2, Stable Diffusion can generate images of prominent people and other subjects that OpenAI prohibits in DALL-E 2. Other systems like Midjourney or Pixelz.ai can do this too, but they do not match Stable Diffusion’s combination of quality and diversity – and none of the other systems are open source.
Turns out #stablediffusion can do some really cool tweens between text prompts if you fix the initialization noise and slerp between the prompts’ conditioning vectors: pic.twitter.com/lWOoETYVZ3
– Xander Steenbrugge (@xsteenbrugge) August 7, 2022
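The interpolation trick described in the tweet above can be sketched in a few lines: spherical linear interpolation (slerp) blends two embedding vectors along the arc between their directions rather than along a straight line, which tends to keep intermediate vectors in a plausible region of the embedding space. The following is a minimal, generic NumPy sketch of slerp; the vector names and the tiny 2D example are purely illustrative and not taken from Stable Diffusion’s codebase:

```python
import numpy as np

def slerp(v0: np.ndarray, v1: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation: returns v0 at t=0 and v1 at t=1,
    moving along the arc between the two directions in between."""
    v0_n = v0 / np.linalg.norm(v0)
    v1_n = v1 / np.linalg.norm(v1)
    dot = np.clip(np.dot(v0_n, v1_n), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two vectors
    if np.isclose(theta, 0.0):
        # Nearly parallel vectors: fall back to plain linear interpolation
        return (1.0 - t) * v0 + t * v1
    sin_theta = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / sin_theta) * v0 + \
           (np.sin(t * theta) / sin_theta) * v1

# Illustrative use: interpolate between two (hypothetical) prompt
# embeddings while the initialization noise stays fixed, as in the tweet.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
frames = [slerp(a, b, t) for t in np.linspace(0.0, 1.0, 5)]
```

In a text-to-image pipeline, each interpolated conditioning vector would be fed to the model with the same starting noise, producing the smooth “tween” frames shown in the video.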
Stable Diffusion is expected to run on a single graphics card with 5.1 gigabytes of VRAM, bringing AI technology that until now was only available through cloud services onto local hardware. Stable Diffusion thus offers researchers and other interested parties without access to GPU servers the opportunity to experiment with a modern generative AI model. The model is also expected to work on MacBooks with Apple’s M1 chip, although image generation there takes minutes rather than seconds.
Stability AI also wants to allow companies to train their own Stable Diffusion variants. Multimodal models are thus following the path previously taken by large language models: away from a single provider and toward the wide availability of numerous open source alternatives.
Runway is already researching Stable Diffusion-enabled text-to-video editing.
#stablediffusion text-to-image checkpoints are now available for research purposes upon request at https://t.co/7SFUVKoUdl
Working on a more permissive release and inpainting checkpoints.
Coming soon™ to @runwayml for text-to-video editing pic.twitter.com/7XVKydxTeD
– Patrick Esser (@pess_r) 11 August 2022
Stable Diffusion: Pandora’s box and net benefit
Of course, with open access and the ability to run the model on a widely available GPU, the opportunities for abuse increase dramatically.
“A percentage of people are simply unpleasant and weird, but that’s humanity,” Mostaque said. “Indeed, we believe this technology will be prevalent, and the paternalistic and somewhat condescending attitude of many AI aficionados is misguided in not trusting society.”
Mostaque points out, however, that free availability allows the community to develop countermeasures.
“We are taking significant security measures, including formulating cutting-edge tools to help mitigate potential harm across release and our own services. With hundreds of thousands of people developing on this model, we are confident the net benefit will be immensely positive, and as billions use this technology, harms will be mitigated.”
More information is available on Stable Diffusion’s GitHub page. You can find many examples of Stable Diffusion’s image generation capabilities on the Stable Diffusion subreddit. Beta registration for Stable Diffusion is available here.