docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $HOME/Projects/stable_diffusion/dockerx:/dockerx rocm/pytorch
Edited two files as per AUTOMATIC1111/stable-diffusion-webui#11458 (comment)
pip install -r requirements_versions.txt
Lol crash.
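A rough sketch of the first-time setup inside the container (the clone step is my reconstruction; the /dockerx path and the pip install come from the notes here, and the two file edits from that GitHub comment still have to be applied by hand):
cd /dockerx
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
pip install -r requirements_versions.txt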
Following runs only require you to restart the container, attach to it again, and run the launch commands inside it. Find the container name from this listing: docker container ls --all, select the one matching the rocm/pytorch image, restart it: docker container restart <container-id>, then attach to it: docker exec -it <container-id> bash
https://rentry.org/voldy <- super important guides
Furry Models? https://rentry.org/5exa3
https://civitai.com/ <- The BEST place for models, loras, images, example prompts.
Booru tags: https://danbooru.donmai.us/wiki_pages/tag_groups
In the web UI, go to Settings->User Interface->Quicksettings list and add sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers, face_restoration, and face_restoration_model.
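The field itself is just a comma-separated list of option names, so what you end up pasting is:
sd_model_checkpoint, sd_vae, CLIP_stop_at_last_layers, face_restoration, face_restoration_model
(The setting also gets saved into the webui's config.json, so it can be edited there instead; the exact key name varies by webui version, so check your own file.)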
Workflow to get started: find an image you like on civitai.com, throw it into PNG Info, send the parameters to txt2img, install any missing embeddings or loras, generate an image to make sure you are at least close, and then play around with the prompt. Great artists steal.
https://old.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
https://stable-diffusion-art.com/samplers/#Evaluating_samplers
https://promptpedia.co/blog-detail/interactive-comparison-of-stable-diffusion-upscalers
Duck's profile: https://civitai.com/user/duckythescientist
# container=$(docker container ls --all | grep rocm/pytorch | cut -f 1 -d ' ')
container="adoring_black"
docker container restart $container
docker exec -it $container bash
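# Inside the container: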
cd /dockerx/stable-diffusion-webui
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512"
REQS_FILE='requirements.txt' python launch.py --precision full --no-half --medvram --opt-split-attention --opt-sub-quad-attention
The PYTORCH_HIP_ALLOC_CONF settings decrease fragmentation and increase garbage collection, which has been good for me since AMD's drivers are bad.
The positive prompt is things you want in your image. The negative prompt is things you don't want in your image. You can use e.g. (some phrase or keywords, optionally more words:1.3) to increase the weight of some phrases from the default 1. You can also use something like 0.8 to deemphasize. E.g. blue eyes sometimes turns the whites blue, so using (blue eyes:0.6) can tame that.
Ordering does matter. Put the most important things first.
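For example, a weighted prompt pair might look like this (an illustrative sketch using the syntax above, not a recipe from any particular model):
Positive: masterpiece, best quality, 1girl, portrait, detailed face, (soft lighting:1.2), (blue eyes:0.6)
Negative: (worst quality, low quality:1.3), blurry, extra fingers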
https://danbooru.donmai.us/wiki_pages/tag_groups has a tag hierarchy that works well with some models (especially for anime NSFW things).
DPM++ SDE Karras is good.
DPM++ 2M Karras is good (and faster). Idk how it's different.
Euler a can make good images, but it doesn't converge with higher sampling steps.
Clip skip stops the CLIP text encoder a layer or two before its last layer, which changes how the prompt keywords get interpreted. Only applies to some models. I don't fully understand it yet. Set it to either 1 or 2. Go to Settings->User Interface->Quicksettings list and make sure CLIP_stop_at_last_layers is in the list.
Sampling steps is the number of "denoising" passes. Some samplers need more than others. Some samplers converge, and some (mostly the "a" versions) don't. 20-30 is good for most things. Some SDXL models only use 5.
CFG scale is how strictly to follow the prompt. Small numbers are hallucinations, and large numbers are overly strict. 6-15 seem like a reasonableish range to play in. I keep finding myself around 7.
Batch size is how many to run at once (higher VRAM cost). Batch count is how many to do in a row (only time cost). I keep size at 1 and only tweak count.
Refiner allows you to switch models part of the way through. I haven't had great luck. Mismatches make weird color garbage. I keep it off.
VAEs are somewhat of a refinement step that seems to mostly help with colors, vibrance, contrast. Certain VAEs play well with certain models. Mismatches make weird color garbage. Some models don't need a VAE.
Hires.fix is for making larger images than a model was natively trained for. This can also help fix some details, e.g. weirdly directed eyes. It's slow and RAM intensive. I usually have it off, but once I find an image I like, I recycle the seed and rerun with Hires.fix enabled.
4x_foolhardy_Remacri is what I usually use, but I had to install it myself (a sketch of where it goes is below).
Hires steps can be set to 0 to match the sampling steps. I think I remember reading that you want this to be around half of the sampling steps.
Denoising strength: todo, play around with this. 0.5-0.7 are reasonable. Bigger numbers cause more changes to the base image.
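Installing a custom upscaler like 4x_foolhardy_Remacri just means dropping its .pth file into the webui's upscaler folder. A minimal sketch, assuming the stock A1111 layout (ESRGAN-style upscalers go in models/ESRGAN) and that you've already downloaded the file from somewhere you trust:
# the source path below is a placeholder for wherever you saved the file
cp ~/Downloads/4x_foolhardy_Remacri.pth /dockerx/stable-diffusion-webui/models/ESRGAN/
Restart the web UI afterwards so it shows up in the upscaler dropdown.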
Can use CodeFormer or GFPGAN to "improve" faces. I've had mixed results. GFPGAN works better for me. I usually keep it off.
Loras are ways to teach a model new concepts, characters, styles, poses, themes, etc. They get added to the positive prompt (sometimes negative) and have an adjustable weight. Good Loras can have weight 1. Overtrained Loras tend to make weird artifacts at high weights, so 0.7 can be helpful. A couple Loras have much larger ranges. Some require keywords in the prompt to activate.
Browse CivitAI for these.
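In the prompt itself, a lora is invoked with angle brackets and a weight, e.g. <lora:name:0.7>. An illustrative sketch combining two of the loras noted below (the weights are guesses, not tested values):
masterpiece, best quality, 1girl, portrait, <lora:add_detail:1>, <lora:LED_Glasses-000012:0.8>, CYBERPUNK GLASSES, FUTURISTIC LED GLASSES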
Noodling and quick notes:
- BacheEqualDTST2 (character): bache, tnsfit
- age_slider_anime_v2: use from -4 (young) to 4 (old)
- fluffyrock-quality-tags: ???
- badhandv4: neg embedding
- add_detail: -2 to +2
- LED_Glasses-000012: (CYBERPUNK GLASSES, FUTURISTIC LED GLASSES)
- punk_v0.2: (punk), punk rock goth aesthetics, maybe overtrained?
- Hide_da_painV2: harold,
- imperfect: (imperfect, acne), for realistic skin blemishes,
- Gary_Larson_Style: (a black and white far side comic strip illustration of XXXXXXX by Gary Larson), (a color far side comic strip illustration of XXXXXXX by Gary Larson)
- scene: (poofy hair, eyeliner, emo, emo makeup, goth) but should work without tags,
Embeddings: somehow??? they pack joined text or whole ideas into a single token you drop into a prompt. Great for combined positive or negative prompts. E.g. bad-picture-chill-75v in your negative prompt magically makes your images better.
Check the model details for which embeddings work best with it.
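As a concrete example, a negative prompt leaning on the embeddings mentioned in these notes might read (illustrative only; check the model card for what it actually likes):
worst quality, low quality, badhandv4, bad-picture-chill-75v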
(If I note a pair of numbers, it's probably sampling steps, CFG scale.)
DreamshaperXL:
- Steps 5 works well (4-7 suggested). Steps 15 gets weird skin and eye textures. Somewhere in the middle may be more realistic; 11 and 13 seemed fine for more realistic skin.
- CFG Scale 2 works well. 4 is getting a tad wonky, and more than that is bad.
- (5, 2)
- In general, it seems to ignore a lot of prompt tags and just does whatever it wants to do. Beautiful but unconfigurable.
- Very realistic humans.
- Totally just ignores half my prompt.
- Wants higher resolutions.
- No VAE (none I have so far work).
- I'm not struggling with skin textures and faces.
DreamShaper_8_pruned:
- Working nicely. (20 steps, 7 CFG)
- Doesn't seem to respect tags that well.
- Realistic-ish humans.
- Respects tags better than DreamshaperXL
EasyFluffV11.2
- Good if you like furries.
- Works well with a Hires.fix upscale by 1.2 and 4x_foolhardy_Remacri
- Using nai.vae mutes the colors but that's maybe a good thing.
- Bad to use photo/camera style related prompts.
- 512x512 may be too small??
- Needs lora:fluffyrock-quality-tags with "best quality, high quality"
- Booru tags.
- Wants lora:hll6.3-fluff-a5b-style and then artist names. Just go here and find ones you like: https://rentry.org/HLL_LCM#a5aa5b
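Putting those EasyFluff notes together, a positive prompt skeleton might look like this (the weights and the artist slot are placeholders, not tested values):
best quality, high quality, <lora:fluffyrock-quality-tags:1>, <lora:hll6.3-fluff-a5b-style:1>, by artist-name-from-the-HLL-list, booru tags describing the subject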
bluePencilXL
- Good for stylized anime/manga looks.
- Don't use with nai.vae.
- Makes pretty images when using PicX/CyberRealistic-style prompts.
- Can be close to photorealistic.
- Doesn't use CLIP.
- 28 and 6 working fine.
- unaestheticXL or NegativeXL
abyssorangemix3AOM3
- Good textures but definitely some weird artifacts.
- More "realistic" anime images.
- Evocative but falls apart when you look closely.
- Needs a VAE (nai.vae.pt is good).
- OOMs easily with Hires.fix.
NovelAI
- I need to go back and look at tags because the last batch of images was garbage.
WaifuDiffusion
- Also need to work on tags...
- Some interesting results
PicX_real
- 29 and 5 seem reasonable.
- I had to drop the rez and upscale. Try again after a reboot?
- These are gorgeous....
- Does well to be upscaled slightly (by about 1.5)
CyberRealistic_v4.1BackToBasics
- Use vae-ft-mse
- Does well to be upscaled slightly (by about 1.5)
- use bad-picture-chill-75v in neg prompt
- Or use CyberRealistic_Negative-neg in neg prompt.
- 28, 7
- Likes to be 768 tall? 512 made wonky eyes.
- Really needs the upscale to avoid bad eyes.
- (analog photo:1.1), RAW Photograph, dslr, high quality, film grain, Fujifilm XT3, insane details, masterpiece, 8k, 35mm photograph, dslr, kodachrome, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage,
- Can't quite handle a 768x768 with 2x Remacri upscale in 20GB.
CyberRealistic 2.5D Style
- Suggested 20-30, 7
- Clip 2,
- vae-ft-mse
- I'm really liking this one
HassakuHentaiModel_v13
- Clip 1 or 2
- Danbooru tags
- No VAE,
- (masterpiece, best quality) in the positive and (worst quality, low quality) in the negative prompt
- badhandv4, bad-picture-chill-75v work well
A count of how many images I've kept for each model. It's not a perfect representation of how much I like each model as it disfavors models that I've only recently found.
grep -iRaPoh 'Model: .*?,' * | sort | uniq -c | sort -rn
277 Model: cyberrealistic_v41BackToBasics,
215 Model: picxReal_10,
166 Model: EasyFluffV11.2,
158 Model: bluePencilXL_v050,
111 Model: abyssorangemix3AOM3_aom3a1b,
53 Model: cyberrealistic25D_v10,
47 Model: dreamshaperXL_turboDpmppSDE,
43 Model: WaifuDiffusion-1-4-anime_e2,
41 Model: DreamShaper_8_pruned,
36 Model: hassakuHentaiModel_v13,
22 Model: nai,
14 Model: bluePencilXL_v310,
12 Model: realisticStockPhoto_v10,
8 Model: lazymixRealAmateur_v40,
2 Model: v1-5-pruned-emaonly,
1 Model: NovelAI_Leak,
img2img can be good for upscaling or for taking an example image and iterating on a theme.
A possible workflow: take an image, run "Interrogate CLIP" to get a basic prompt, fix up the prompt, add quality tags, add a good negative prompt (steal from an image you like for that model), and then run img2img generation.
Denoising (sampling steps 20, but IDK if that matters):
- 0.65: mostly just vibes
- 0.60: still vibes
- 0.55: matches pose and most major details, artistic liberties
- 0.50: matches pose and most major details, artistic liberties
- 0.45: very close match to pose and some minor details, some artistic liberties, especially with the face
- 0.40: very close match to pose and most minor details, some artistic liberties, especially with the face
- 0.35: very close match to pose and most minor details, some artistic liberties, especially with the face
- 0.30: does smoothing and refinements, still changes facial details, keeps most minor details (so much so that shitty photoshops still look like shitty photoshops)
- 0.25: minor smoothing and refinements but not much change.
PNG Info lets you grab the generation parameters from an image and optionally send those parameters to other tabs. Amazingly useful. Lets you learn from other people's images. Lets you pick back up on something you were working on hours/days/weeks ago.