@Sunderbraze
Forked from city96/ComfyBootlegOffload.py
Last active March 5, 2025 02:03
Add node to offload model as well
# Force model to always use specified device
# Place in `ComfyUI\custom_nodes` to use
# City96 [Apache2]
#
import types
import torch
import comfy.model_management


class OverrideDevice:
    @classmethod
    def INPUT_TYPES(s):
        devices = ["cpu",]
        for k in range(0, torch.cuda.device_count()):
            devices.append(f"cuda:{k}")

        return {
            "required": {
                "device": (devices, {"default": "cpu"}),
            }
        }

    FUNCTION = "patch"
    CATEGORY = "other"

    def override(self, model, model_attr, device):
        # set model/patcher attributes
        model.device = device
        patcher = getattr(model, "patcher", model)  # .clone()
        for name in ["device", "load_device", "offload_device", "current_device", "output_device"]:
            setattr(patcher, name, device)

        # move model to device
        py_model = getattr(model, model_attr)
        py_model.to = types.MethodType(torch.nn.Module.to, py_model)
        py_model.to(device)

        # remove ability to move model
        def to(*args, **kwargs):
            pass
        py_model.to = types.MethodType(to, py_model)

        return (model,)

    def patch(self, *args, **kwargs):
        raise NotImplementedError


class OverrideCLIPDevice(OverrideDevice):
    @classmethod
    def INPUT_TYPES(s):
        k = super().INPUT_TYPES()
        k["required"]["clip"] = ("CLIP",)
        return k

    RETURN_TYPES = ("CLIP",)
    TITLE = "Force/Set CLIP Device"

    def patch(self, clip, device):
        return self.override(clip, "cond_stage_model", torch.device(device))


class OverrideVAEDevice(OverrideDevice):
    @classmethod
    def INPUT_TYPES(s):
        k = super().INPUT_TYPES()
        k["required"]["vae"] = ("VAE",)
        return k

    RETURN_TYPES = ("VAE",)
    TITLE = "Force/Set VAE Device"

    def patch(self, vae, device):
        return self.override(vae, "first_stage_model", torch.device(device))


class OverrideMODELDevice(OverrideDevice):
    @classmethod
    def INPUT_TYPES(s):
        k = super().INPUT_TYPES()
        k["required"]["model"] = ("MODEL",)
        return k

    RETURN_TYPES = ("MODEL",)
    TITLE = "Force/Set MODEL Device"

    def patch(self, model, device):
        return self.override(model, "model", torch.device(device))


NODE_CLASS_MAPPINGS = {
    "OverrideCLIPDevice": OverrideCLIPDevice,
    "OverrideVAEDevice": OverrideVAEDevice,
    "OverrideMODELDevice": OverrideMODELDevice,
}
NODE_DISPLAY_NAME_MAPPINGS = {k: v.TITLE for k, v in NODE_CLASS_MAPPINGS.items()}
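
For readers puzzled by the `types.MethodType` dance in `override` above: the idea is to rebind the genuine `torch.nn.Module.to` on the instance (in case it was already patched elsewhere), move the weights once to the forced device, then bind a no-op `to` so any later `.to()` calls from the framework can't move them back. Below is a minimal standalone sketch of that trick, outside ComfyUI and using a made-up `TinyModel` module; the no-op here returns `self` to keep `Module.to`'s usual contract, whereas the gist's version simply passes.

import types
import torch

class TinyModel(torch.nn.Module):  # hypothetical stand-in for cond_stage_model etc.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

model = TinyModel()

# 1. rebind the real Module.to on the instance and move the weights once
model.to = types.MethodType(torch.nn.Module.to, model)
model.to("cpu")  # the node would pass the forced device here, e.g. "cuda:1"

# 2. replace .to with a no-op so later moves are silently ignored
def _noop_to(self, *args, **kwargs):
    return self

model.to = types.MethodType(_noop_to, model)
model.to("cuda:0")  # ignored; weights stay pinned to the forced device
print(next(model.parameters()).device)  # still cpu in this sketch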
@city96

city96 commented Aug 11, 2024

I remember trying to get this to work, but it failed on regular unet models like sd1.x/sdxl, which is why I didn't end up adding it (the compute was running on the main card while the model was stored on the second card, and it was agonizingly slow reading each weight like that over my PCIe x1 mining adapter).
Does it work correctly for you? Might've been just my driver that's fucked lol.

@MrNeon

MrNeon commented Aug 13, 2024

Doesn't work correctly for me: the unet gets moved to the GPU, but LoRAs are not applied to the model. No errors, no crashes, it just has no LoRAs loaded.

@desu-anon

This custom script now appears to work fully for Flux, following the LoRA loading fixes in ComfyUI-GGUF (which were required after Comfy's update yesterday).
It works well with a multi-GPU setup: one GPU for the SD model + VAE, and one for the text encoders.

(SD Model Loader, CLIP Loader) -> [LoRA Loader] -> (Force/Set Model Device, Force/Set CLIP Device) -> [Other Nodes]

@Sunderbraze
Author

Sunderbraze commented Aug 23, 2024

I remember trying to get this to work, but it failed on regular unet models like sd1.x/sdxl, which is why I didn't end up adding it (the compute was running on the main card while the model was stored on the second card, and it was agonizingly slow reading each weight like that over my PCIe x1 mining adapter). Does it work correctly for you? Might've been just my driver that's fucked lol.

Hey, sorry I missed this before. What you're describing is accurate: the unet is getting loaded onto my second GPU while inference is still running on my primary GPU. It hasn't caused any latency issues on my rig (specs here if interested), likely because both of my 4090s are in x16 slots running at x16. Inference takes about 13.5 seconds for a 1024x1024 image with the 23GB Flux model on my second GPU and CLIP/VAE/etc. on my primary.

I haven't messed with Comfy nodes enough to know if it's even possible to change which GPU is selected for primary processing. As far as I know, the only way to change that is via a switch at startup, so it's possible it might require deeper modification to Comfy than can be done by a node; not sure though.
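
For reference, the startup switch mentioned above is presumably ComfyUI's `--cuda-device` argument (assuming your build has it), which just restricts which GPUs torch can see via `CUDA_VISIBLE_DEVICES` before CUDA is initialized; the same mechanism can be sketched in plain Python:

import os

# Must be set before torch initializes CUDA: physical GPU 1 becomes the only
# device torch can see, and therefore the default compute device.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
if torch.cuda.is_available():
    print(torch.cuda.current_device())    # 0 -- index within the visible set
    print(torch.cuda.get_device_name(0))  # reports the name of physical GPU 1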

@Zibbibbo

Hi, I don't understand why I only have the OverrideCLIP and OverrideVAE nodes but not the MODEL node; it doesn't show up in the ComfyUI search. Any advice to fix that?

@alex-mitov

Doesn't seem to work for me when using it with a fine-tuned Flux DEV model. :( Both CLIP and VAE are still moved to VRAM instead of system RAM. Anybody else?
