
@duckythescientist
Last active February 22, 2024 03:38
Stable Diffusion Image Generation Notes

Getting Started

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#running-inside-docker

docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/Projects/stable_diffusion/dockerx:/dockerx rocm/pytorch

Edited two files as per AUTOMATIC1111/stable-diffusion-webui#11458 (comment)

pip install -r requirements_versions.txt

Lol crash.

Subsequent runs only require restarting the container and attaching to it again. Find the container name from the listing docker container ls --all, select the one matching the rocm/pytorch image, restart it with docker container restart <container-id>, then attach to it with docker exec -it <container-id> bash.

https://rentry.org/voldy <- super important guides

Furry Models? https://rentry.org/5exa3

https://civitai.com/ <- The BEST place for models, loras, images, example prompts.

Booru tags: https://danbooru.donmai.us/wiki_pages/tag_groups

In the web UI, go to Settings->User Interface->Quicksettings list and add sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers, face_restoration, and face_restoration_model.

Workflow to get started: find an image you like on civitai.com, throw it into PNG Info, send the parameters to txt2img, install any missing embeddings or loras, generate an image to make sure you are at least close, and then play around with the prompt. Great artists steal.

https://old.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/

https://stable-diffusion-art.com/samplers/#Evaluating_samplers

https://promptpedia.co/blog-detail/interactive-comparison-of-stable-diffusion-upscalers

Duck's profile: https://civitai.com/user/duckythescientist

RUNME

# container=$(docker container ls --all | grep rocm/pytorch | cut -f 1 -d ' ')

container="adoring_black"
docker container restart $container
docker exec -it $container bash

cd /dockerx/stable-diffusion-webui
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512"
REQS_FILE='requirements.txt' python launch.py --precision full --no-half --medvram --opt-split-attention --opt-sub-quad-attention 

The PYTORCH_HIP_ALLOC_CONF settings decrease memory fragmentation and trigger garbage collection earlier, which has been good for me since AMD's drivers are bad.

Knobs, Dials, and Settings to Twiddle

Prompt

The positive prompt is things you want in your image; the negative prompt is things you don't want. You can use e.g. (some phrase or keywords, optionally more words:1.3) to increase the weight of some phrases from the default of 1. Weights below 1, like 0.8, de-emphasize. E.g. blue eyes sometimes turns the whites blue, so using (blue eyes:0.6) can tame that.
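As a made-up illustration of the weighting syntax (the tags here are arbitrary; only the (text:weight) form is real A1111 syntax):

```
Positive: masterpiece, best quality, 1girl, short hair, (blue eyes:0.6), (detailed background:1.2)
Negative: (worst quality, low quality:1.4), blurry, bad anatomy, extra fingers
```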

Ordering does matter. Put the most important things first.

https://danbooru.donmai.us/wiki_pages/tag_groups has a tag hierarchy that works well with some models (especially for anime NSFW things).

Sampling method

DPM++ SDE Karras is good.

DPM++ 2M Karras is good (and faster). Idk how it's different.

Euler a can make good images, but it doesn't converge with higher sampling steps.

Clip Skip

Clip skip stops the CLIP text encoder that many layers from the end (1 = no skip, 2 = use the second-to-last layer), which changes how the prompt is interpreted. Only matters for some models (NAI-derived anime models usually want 2). I don't fully understand it yet. Set to either 1 or 2.

Go to Settings->User Interface->Quicksettings list and make sure CLIP_stop_at_last_layers is in the list.

Sampling Steps

Sampling steps is the number of "denoising" passes. Some samplers need more than others. Some samplers converge, and some (mostly the "a" ancestral versions) don't. 20-30 is good for most things. Some SDXL models only use 5.

CFG Scale

CFG scale is how strictly to follow the prompt. Small numbers are hallucinations, and large numbers are overly strict. 6-15 seems like a reasonable-ish range to play in. I keep finding myself around 7.

Batch Count and Batch Size

Batch size is how many to run at once (higher VRAM cost). Batch count is how many to do in a row (only time cost). I keep size at 1 and only tweak count.

Refiner

Refiner allows you to switch models part of the way through. I haven't had great luck. Mismatches make weird color garbage. I keep it off.

VAE

VAEs are somewhat of a refinement step that seems to mostly help with colors, vibrance, contrast. Certain VAEs play well with certain models. Mismatches make weird color garbage. Some models don't need a VAE.

Hires. fix

For making larger images than a model was natively trained for. This can also help fix some details, e.g. weirdly directed eyes. It's slow and VRAM-intensive, so I usually have it off. But once I find an image I like, I recycle the seed and rerun with Hires. fix enabled.

Upscaler

4x_foolhardy_Remacri is what I usually use, but I had to install it myself.

Hires steps

Can set to 0 to match the sampling steps. I think I remember reading that you want this to be around half of the sampling steps.

Hires. fix Denoising strength

TODO: play around with this. 0.5-0.7 is reasonable. Bigger numbers cause more changes to the base image.

Face Restoration

Can use CodeFormer or GFPGAN to "improve" faces. I've had mixed results. GFPGAN works better for me. I usually keep it off.

LORAs

Loras are ways to teach a model new concepts, characters, styles, poses, themes, etc. They get added to the positive prompt (sometimes negative) and have an adjustable weight. Good Loras can have weight 1. Overtrained Loras tend to make weird artifacts at high weights, so 0.7 can be helpful. A couple Loras have much larger ranges. Some require keywords in the prompt to activate.
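For reference, the web UI activates a lora with an inline tag in the prompt; a hypothetical example using lora names from the notes below (the surrounding tags and weights are just illustrative):

```
1girl, poofy hair, emo makeup, <lora:add_detail:1.5>, <lora:scene:0.7>
```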

Browse CivitAI for these.

Noodling and quick notes:

  • BacheEqualDTST2 (character): bache, tnsfit
  • age_slider_anime_v2: use from -4 (young) to 4 (old)
  • fluffyrock-quality-tags: ???
  • badhandv4: neg embedding
  • add_detail: -2 to +2
  • LED_Glasses-000012: (CYBERPUNK GLASSES, FUTURISTIC LED GLASSES)
  • punk_v0.2: (punk), punk rock goth aesthetics, maybe overtrained?
  • Hide_da_painV2: harold,
  • imperfect: (imperfect, acne), for realistic skin blemishes,
  • Gary_Larson_Style: (a black and white far side comic strip illustration of XXXXXXX by Gary Larson), (a color far side comic strip illustration of XXXXXXX by Gary Larson)
  • scene: (poofy hair, eyeliner, emo, emo makeup, goth) but should work without tags,

Embeddings

Textual inversion embeddings: trained tokens that somehow pack a whole concept into a single keyword you drop into a prompt. Great for combined positive or negative ideas. E.g. bad-picture-chill-75v in your negative prompt magically makes your images better.
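A sketch of typical usage, assuming the embedding files are installed — an embedding is invoked just by writing its filename in the prompt:

```
Positive: masterpiece, best quality, 1girl, looking at viewer
Negative: bad-picture-chill-75v, badhandv4, (worst quality, low quality:1.3)
```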

Check the model details for which embeddings work best with it.

Models

(If I note a pair of numbers, it's probably sampling steps, CFG scale.)

  • DreamshaperXL:

    • Steps 5 works well (4-7 suggested). Steps 15 gets weird skin and eye textures. Somewhere in between may be more realistic; 11 and 13 seemed fine for more realistic skin.
    • CFG Scale 2 works well. 4 is getting a tad wonky, and more than that is bad.
    • (5, 2)
    • In general, it seems to ignore a lot of prompt tags and just does whatever it wants to do. Beautiful but unconfigurable.
    • Very realistic humans.
    • Totally just ignores half my prompt.
    • Wants higher resolutions.
    • No VAE (none I have so far work).
    • I'm not struggling with skin textures and faces.
  • DreamShaper_8_pruned:

    • Working nicely. (20 steps, 7 CFG)
    • Doesn't seem to respect tags that well.
    • Realistic-ish humans.
    • Respects tags better than DreamshaperXL
  • EasyFluffV11.2

    • Good if you like furries.
    • Works well with a Hires.fix upscale by 1.2 and 4x_foolhardy_Remacri
    • Using nai.vae mutes the colors but that's maybe a good thing.
    • Bad to use photo/camera style related prompts.
    • 512x512 may be too small??
    • Needs lora:fluffyrock-quality-tags with "best quality, high quality"
    • Booru tags.
    • Wants lora:hll6.3-fluff-a5b-style and then artist names. Just go here and find ones you like: https://rentry.org/HLL_LCM#a5aa5b
  • bluePencilXL

    • Good for stylized anime/manga looks.
    • Don't use with nai.vae.
    • Makes pretty images when using PicX/CyberRealistic-style prompts.
    • Can be close to photorealistic.
    • Doesn't use CLIP.
    • 28 and 6 working fine.
    • unaestheticXL or NegativeXL
  • abyssorangemix3AOM3

    • Good textures but definitely some weird artifacts.
    • More "realistic" anime images.
    • Evocative but falls apart when you look closely.
    • Needs a VAE (nai.vae.pt is good).
    • OOMs easily with Hires.fix.
  • NovelAI

    • I need to go back and look at tags because the last batch of images was garbage.
  • WaifuDiffusion

    • Also need to work on tags...
    • Some interesting results
  • PicX_real

    • 29 and 5 seem reasonable.
    • I had to drop the rez and upscale. Try again after a reboot?
    • These are gorgeous....
    • Does well to be upscaled slightly (by about 1.5)
  • CyberRealistic_v4.1BackToBasics

    • Use vae-ft-mse
    • Does well to be upscaled slightly (by about 1.5)
    • use bad-picture-chill-75v in neg prompt
    • Or use CyberRealistic_Negative-neg in neg prompt.
    • 28, 7
    • Likes to be 768 tall? 512 made wonky eyes.
    • Really needs the upscale to avoid bad eyes.
    • (analog photo:1.1), RAW Photograph, dslr, high quality, film grain, Fujifilm XT3, insane details, masterpiece, 8k, 35mm photograph, dslr, kodachrome, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage,
    • Can't quite handle a 768x768 with 2x Remacri upscale in 20GB.
  • CyberRealistic 2.5D Style

    • Suggested 20-30, 7
    • Clip 2,
    • vae-ft-mse
    • I'm really liking this one
  • HassakuHentaiModel_v13

    • Clip 1 or 2
    • Danbooru tags
    • No VAE,
    • (masterpiece, best quality), (worst quality, low quality) as prompts
    • badhandv4, bad-picture-chill-75v work well

Models by-the-numbers

A count of how many images I've kept for each model. It's not a perfect representation of how much I like each model as it disfavors models that I've only recently found.

grep -iRaPoh 'Model: .*?,' * | sort | uniq -c | sort -rn

    277 Model: cyberrealistic_v41BackToBasics,
    215 Model: picxReal_10,
    166 Model: EasyFluffV11.2,
    158 Model: bluePencilXL_v050,
    111 Model: abyssorangemix3AOM3_aom3a1b,
     53 Model: cyberrealistic25D_v10,
     47 Model: dreamshaperXL_turboDpmppSDE,
     43 Model: WaifuDiffusion-1-4-anime_e2,
     41 Model: DreamShaper_8_pruned,
     36 Model: hassakuHentaiModel_v13,
     22 Model: nai,
     14 Model: bluePencilXL_v310,
     12 Model: realisticStockPhoto_v10,
      8 Model: lazymixRealAmateur_v40,
      2 Model: v1-5-pruned-emaonly,
      1 Model: NovelAI_Leak,

Image-to-Image

Can be good for upscaling or for taking an example image and iterating on a theme.

A possible workflow: take an image, run "Interrogate CLIP" to get a basic prompt, fix up the prompt, add quality tags, add a good negative prompt (steal from an image you like for that model), and then run img2img generation.

Denoising (sampling steps 20, but IDK if that matters):

  • 0.65: mostly just vibes
  • 0.60: still vibes
  • 0.55: matches pose and most major details, artistic liberties
  • 0.50: matches pose and most major details, artistic liberties
  • 0.45: very close match to pose and some minor details, some artistic liberties, especially with the face
  • 0.40: very close match to pose and most minor details, some artistic liberties, especially with the face
  • 0.35: very close match to pose and most minor details, some artistic liberties, especially with the face
  • 0.30: does smoothing and refinements, still changes facial details, keeps most minor details (so much so that shitty photoshops still look like shitty photoshops)
  • 0.25: minor smoothing and refinements but not much change.

PNG Info

Lets you grab the generation parameters from an image and optionally send those parameters to other tabs. Amazingly useful. Lets you learn from other people's images. Lets you pick back up on something you were working on hours/days/weeks ago.
