Source: Qubit
The strongest version of Stable Diffusion is here!
Just now, SDXL 1.0 is officially released, and you can try it online for free.
In terms of results, whether it's a photorealistic shot:
A surreal panda drinking beer:
Or a cyberpunk comic, both look great~
According to Stability AI, SDXL 1.0 produces more vivid and accurate colors, with enhancements in contrast, light and shadows, and can produce 1 megapixel images (1024×1024).
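A quick sanity check on that "megapixel" figure (a back-of-the-envelope sketch; the 512×512 figure refers to the training resolution of earlier Stable Diffusion 1.x versions):

```python
# SDXL's native output is 1024 x 1024, i.e. just over one million pixels.
# Earlier Stable Diffusion versions were trained around 512 x 512.
sdxl_pixels = 1024 * 1024   # 1,048,576 pixels ~= 1 megapixel
sd15_pixels = 512 * 512     # 262,144 pixels

print(sdxl_pixels)                 # 1048576
print(sdxl_pixels // sd15_pixels)  # 4 -- four times the pixel area
```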
It also supports post-editing of generated images directly on the web page.
(It smells like a showdown with Midjourney and Adobe Firefly)
Netizens can’t help but try it out~
Compared with the basic Stable Diffusion, SDXL 1.0 produces more accurate and realistic results.
And officially, prompts can now be simpler than before.
That's because SDXL 1.0's base model has reached 3.5 billion parameters, giving it stronger comprehension.
By comparison, the base version of Stable Diffusion has only about 1 billion parameters.
As a result, SDXL 1.0 has become one of the largest open image models available today; Stability AI went so far as to call it the best open image model in the world.
Not much to say, let’s get started~
SDXL 1.0 is easy to operate: just type your prompt directly, and you can also select style and size parameters at the bottom.
The default is to generate 4 pictures at a time. If you are not satisfied, you can click the “plus sign” below to let it continue to draw.
According to the official introduction, now that SDXL 1.0 can generate masterpieces, there is no need to add the prompt word “masterpiece”.
We had it generate a Japanese-style image; the art style fits well, and the handling of light and shadow is quite natural.
1990s anime low resolution screengrab couple walking away in street at night
Or a landscape photo? It can pass for the real thing.
Even let Musk stand in a Chinese courtyard, looking up at the sky…
Elon Musk in an ancient Chinese palace
Or what if he had acquired Apple?
At WWDC, he holds up the new iPhone for the crowd, surrounded by reporters and fans (doge).
Elon Musk releasing new iPhone at WWDC
Besides these spoof-news shots of Musk, stylized artwork also comes out well.
The one on the left is Qi Baishi’s ink style, while the one on the right is a caricature.
Left: Elon Musk delivering a speech, ink painting, Qi Baishi style. Right: Elon Musk comic
Beyond paintings, you can also render Musk as Yuan Dynasty blue-and-white porcelain.
Elon musk in the shape of Yuan Dynasty Blue and White Porcelain
The Musk examples above all used fairly simple prompts.
Yet judging from the results, the simple prompts didn't make the output fall short, which bears out the official claim.
But we still wanted to see: what would it look like with a more complex, refined prompt?
We found a Midjourney-created "rare photo" of Musk working as an auto mechanic in the Soviet Union, and fed it back to MJ to describe.
MJ generated the following prompt, which we used to test SDXL:
Elon Musk (MJ's original prompt said "a man" here) standing in a workroom, in the style of industrial machinery aesthetics, deutscher werkbund, uniformly staged images, soviet, light indigo and dark bronze, new american color photography, detailed facial features
The style is exactly as we expected, and the details are not bad.
One more complex image to close, also generated by MJ from earlier works and then fed to SDXL.
elon musk eating food with chopsticks, in the style of peter coulson, cross-processing/processed, pinhole photography, herb trimpe, james tissot, transavanguardia, spot metering
On the left is SDXL's work, on the right the original MJ version; compare them for yourself.
Prompts in Chinese are also supported, though the results seem to skew toward traditional Chinese aesthetics, and accuracy may suffer.
For example, after we input "a tiger is at the seaside" in Chinese, it unexpectedly returned a girl in traditional Chinese style instead.
Now that you've seen the results, how fast does SDXL generate images?
For free users, most of the time is spent queuing, but the wait isn't long.
In 5.5 seconds, the queue was reduced from 160 to 99.
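From those two numbers you can back out a rough throughput estimate (a crude calculation, assuming the queue drained at a steady rate over that window):

```python
# Queue position dropped from 160 to 99 in 5.5 seconds.
start_pos, end_pos = 160, 99
elapsed_s = 5.5

drained = start_pos - end_pos     # 61 requests served in the window
throughput = drained / elapsed_s  # requests per second
eta_s = end_pos / throughput      # rough remaining wait from position 99

print(f"~{throughput:.1f} req/s, ~{eta_s:.0f} s left in queue")
# ~11.1 req/s, ~9 s left in queue
```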
In addition to generating images, SDXL also provides many post-editing functions.
Specifically, it includes background removal, detail processing, frame enlargement, etc.
These functions were already available on Clipdrop, the platform hosting SDXL, and SDXL can hand a generated image off to the corresponding module with one click.
Here we show the background-removal feature; as you can see, the edge details are nearly flawless.
At present, SDXL's free quota is still fairly generous: after logging in, each account can generate 400 images per day (with queuing).
Paying monthly costs $9 per month; paying annually works out to $7 (about 50 RMB) per month. Paid plans include 1,500 SDXL generations per day, with no queuing.
However, prices seem to vary by region: in Argentina, for example, the annual plan works out to 742 pesos (about 19.4 RMB, or 2.7 US dollars) per month.
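The plan math works out as follows (a quick sketch using the prices quoted above; the peso-to-dollar conversion is the article's figure and will drift with exchange rates):

```python
monthly_plan = 9.0           # USD per month, paid monthly
annual_plan_per_month = 7.0  # USD per month when paid annually

yearly_monthly = monthly_plan * 12          # 108.0 USD/year
yearly_annual = annual_plan_per_month * 12  # 84.0 USD/year
savings = yearly_monthly - yearly_annual    # 24.0 USD/year saved
discount = savings / yearly_monthly         # ~22% off

# Regional pricing: Argentina's annual plan at 742 pesos/month was quoted
# at roughly 2.7 USD -- well under half the US annual-plan price.
argentina_usd = 2.7
print(f"save ${savings:.0f}/yr ({discount:.0%}); AR plan is "
      f"{argentina_usd / annual_plan_per_month:.0%} of the US price")
# save $24/yr (22%); AR plan is 39% of the US price
```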
Also, since the paid version is actually a Pro subscription to the Clipdrop platform, other features of the platform are also included.
Besides Pro accounts, API access is available for developers (through Stability AI, Amazon, and others).
In the latest blog, Stability AI introduces more technical details of SDXL 1.0.
First, the model breaks new ground in both scale and architecture.
It pairs a base model with a separate refiner model.
The base model has 3.5 billion parameters, and the full base-plus-refiner ensemble pipeline totals 6.6 billion.
This also makes SDXL 1.0 one of the largest open image models available today.
Emad Mostaque, founder of Stability AI, said a larger parameter count lets the model understand more concepts and learn them more deeply.
RLHF-based enhancement was also introduced starting with SDXL 0.9.
This is why SDXL 1.0 now handles short prompts well and can tell "the Red Square" (the landmark) from "a red square" (the shape).
In the generation process itself, the base model first produces a noisy latent, and the refiner model then performs the final denoising.
The base model can also be used on its own as a standalone module.
The combination of these two models can generate better quality images without consuming more computing resources.
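The base-plus-refiner flow can be sketched in miniature like this (a purely illustrative toy, not the real diffusers API; the function names and the "latent" dict are made up for demonstration):

```python
# Toy model of SDXL's two-stage ensemble: the base model turns a prompt
# into a still-noisy latent, and the refiner handles the final low-noise
# denoising steps. All numbers here are placeholders.
def base_model(prompt: str, steps: int = 40) -> dict:
    # Stands in for the 3.5B-parameter base model.
    return {"prompt": prompt, "noise": 0.2, "steps": steps}

def refiner_model(latent: dict, steps: int = 10) -> dict:
    # Stands in for the refiner: polishes detail, removes residual noise.
    return {**latent, "noise": 0.0, "steps": latent["steps"] + steps}

def generate(prompt: str, use_refiner: bool = True) -> dict:
    latent = base_model(prompt)         # stage 1: prompt -> noisy latent
    if use_refiner:                     # base can also run standalone
        latent = refiner_model(latent)  # stage 2: refine / denoise
    return latent

img = generate("a panda drinking beer, photorealistic")
print(img["noise"], img["steps"])  # 0.0 50
```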
According to the official introduction, SDXL 1.0 can run on a consumer-grade GPU with 8GB VRAM, or on the cloud.
In addition, fine-tuning has been improved in SDXL 1.0: you can generate custom LoRAs or checkpoints.
The Stability AI team is also building a next generation of task-specific structure, style, and composition controls, with T2I/ControlNet specialized for SDXL.
However, these functions are still in the beta testing stage, and you can pay attention to the official updates in the future.
In summary, text-to-image generation is a gradual, iterative process, and SDXL 1.0's goal is to make that process easier.
Mostaque said that fine-tuning the model now takes as few as 5-10 images.
It can also be seen from user feedback that SDXL 1.0 is more satisfying than Stable Diffusion.
In fact, Stability AI released the earliest beta version of SDXL back in April this year.
Internal testing began in June, and version 0.9 came out not long ago; at that point the company announced that an open version, 1.0, would arrive in July. The code and weights are now published on GitHub.
And Stability AI's head of machine learning said that, compared with SDXL 0.9, version 1.0 reduces the compute requirements.
If you're interested, go try it out~
Trial entry:
GitHub: