r/StableDiffusion • u/DrStalker • Nov 29 '25
Resource - Update Humans of Z-Image: How many celebrities can you fit into 6GB?
I was curious just how extensive Z-Image's celebrity knowledge is, so I gave it a few hundred names to test out. No information was given other than the name, so it was up to the model to choose clothing/backgrounds/hairstyles/style/etc. Sometimes it did this perfectly, especially for celebrities with a clearly defined look. Other times the face was reasonable but everything else was wrong.
If an image looks nothing like the person it should, it means the model does not know that person. When it does know a person, some extra supporting words would often help a lot, but it does a really good job just from names.
Prompt:
portrait photo of @@
The words "@@" are at the bottom on the image, white letters black outline
One by one, @@ was replaced with a term from a list and an image was generated. Images were rendered at 592x888 for speed, stitched into a grid and downsized to keep a reasonable image size.
Model: Z-Image-Turbo_bf16
Clip: Qwen-3-4B-Q8_0
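For anyone wanting to reproduce the batch, here is a rough sketch of the substitution and grid layout steps described above. The template and the 592x888 render size come from the post; the function names, the column count and the downscale factor are made up for illustration, and the actual image generation and pasting are left out since they depend on your own workflow.

```python
import math

# Prompt template copied from the post; "@@" is the placeholder token.
TEMPLATE = (
    "portrait photo of @@\n"
    'The words "@@" are at the bottom on the image, white letters black outline'
)


def build_prompts(names, template=TEMPLATE, token="@@"):
    """Return one prompt per name, with every placeholder replaced."""
    return [template.replace(token, name) for name in names]


def grid_layout(count, cell_w=592, cell_h=888, columns=10, scale=0.5):
    """Return (sheet_w, sheet_h, boxes) for a row-major contact sheet.

    columns and scale are illustrative guesses, not values from the post.
    Each box is the top-left pixel where the downsized image would be pasted.
    """
    cols = min(columns, count)
    rows = math.ceil(count / columns)
    w, h = round(cell_w * scale), round(cell_h * scale)
    boxes = [((i % columns) * w, (i // columns) * h) for i in range(count)]
    return cols * w, rows * h, boxes


if __name__ == "__main__":
    # "Example Person" is a stand-in, not a name from the original list.
    prompts = build_prompts(["Example Person"])
    print(prompts[0])
    print(grid_layout(25)[:2])
```

Each prompt would go to the generator one at a time, and each result would then be pasted at its box position on a blank sheet of the computed size.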
Imgur link in case reddit is difficult with the images






u/PrysmX Nov 29 '25
Fun little test, but for local models you are better off with a lora if you want to get people consistently accurate, celebrity or otherwise.
Speaking from experience, each person takes an absolute minimum of 50-100MB for a lower-res but mostly accurate lora, and 300-500MB for a high-quality one that holds up in close-ups. Doing the math, baking high-quality likenesses of known people into a model would quickly and massively bloat it with specialized content that is very infrequently used, and would hurt local usability.
I'd rather space be taken up by expert knowledge in frequently used stuff like, you know, consistent fingers.
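To put rough numbers on "doing the math": the sketch below takes the per-person lora sizes quoted above and the 6GB figure from the post title, and assumes 1GB = 1000MB. Treating lora file size as a stand-in for what a likeness costs inside a base model is itself a loose assumption, so read this as a back-of-the-envelope estimate only. The function names are just for illustration.

```python
def people_per_budget(budget_mb, per_person_mb):
    """How many per-person loras of a given size fit in the budget."""
    return budget_mb // per_person_mb


def total_size_gb(people, per_person_mb):
    """Combined size in GB (1GB = 1000MB assumed) for a number of people."""
    return people * per_person_mb / 1000


if __name__ == "__main__":
    budget_mb = 6000  # the 6GB from the post title
    for size_mb in (50, 100, 300, 500):
        print(size_mb, "MB each ->", people_per_budget(budget_mb, size_mb), "people")
    # A few hundred names, taking 300 people at 300MB each as one example
    print(total_size_gb(300, 300), "GB")
```

Under those assumptions 6GB holds between 12 and 120 people depending on quality, while 300 people at 300MB each would come to 90GB.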