The purpose of this ComfyUI workflow (Image/Frame(s)/Video To Video process) is to manage batch generation over multiple items found in a folder of mixed image/video file formats (including the PSD file format, where layers define frames) through a unified VACE workflow. (Note: this can be swapped out for other types of workflows, but VACE was used because it is the most versatile and fits all use cases under the V2V/I2V/FLF2V umbrella.)
By adopting file-system/config conventions, it allows you to mix and match all of VACE's capabilities (single/multiple I2V/FLF2V/V2V inpainting, outpainting, restyling/repainting, extension, looping) without overcomplicating the workflow nodes, since most of the intention is implied by the accompanying file assets and their respective JSON configurations. The complexity in the workflow nodes comes only from managing multiple varied image/video/txt sources and handling the batch series of asset/configuration variations, which is an added feature beyond what VACE provides.
This document covers all the conventions for configuring your assets for VACE (or any WAN-based asset) for batch processing, particularly for this workflow and other variations of it.
This sets up VACE's control video. Various file formats are supported (see last section below). These files are all located in the main source folder as specified in the workflow, so you can switch it to any other folder as and when needed.
`.src.` (filename search identifier) for image/video/folder of key source assets. Any file with this filename indicator will be detected as a batch item to be processed as `{stem}.src.{file-extension}` (ignoring everything else).
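As a rough illustration of the `.src.` naming convention above, the following sketch splits a filename into its `{stem}` and file extension. The function name `parse_source_name` is hypothetical and is not part of the workflow; how folder names are matched is not covered here.

```python
def parse_source_name(filename):
    """Split '{stem}.src.{file-extension}' into (stem, extension).

    Returns None when the '.src.' marker is absent, i.e. the file is
    not treated as a batch item. Hypothetical helper for illustration.
    """
    marker = ".src."
    if marker not in filename:
        return None
    # Split on the first occurrence of the marker only.
    stem, extension = filename.split(marker, 1)
    return stem, extension


if __name__ == "__main__":
    print(parse_source_name("clip01.src.mp4"))   # a batch item
    print(parse_source_name("clip01.mask.png"))  # not a source asset
```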
Define a path to an `index.json` file as specified in the workflow. This is simply used as a convention to define any global JSON settings for each task item. It is a JSON object containing key-value pairs, where each key is a `{stem}` pointing to a value object with all the respective JSON parameters: `"{stem}": {...}`.
A default fallback `{stem}` can be defined in the workflow in the event that no JSON spec is found for the source file stem, allowing you to set a global fallback default for specific tasks.
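To make the `index.json` shape and the fallback stem concrete, here is a minimal sketch. The stems (`clip01`, `default_i2v`), the parameter values, and the helper `load_item_settings` are all made up for illustration; a numeric `fps` value is an assumption, and the actual workflow nodes may do the lookup differently.

```python
import json

# Hypothetical index.json content: each key is a {stem}.
INDEX_JSON_TEXT = """
{
  "clip01": {"positive": "a cat walking", "cnet": 0},
  "default_i2v": {"frames": ["80"], "fps": 16}
}
"""


def load_item_settings(index_text, stem, fallback_stem=None):
    """Return the settings object for a stem, or the fallback stem's."""
    index = json.loads(index_text)
    if stem in index:
        return index[stem]
    if fallback_stem is not None:
        # Assumed behaviour: the fallback is just another key in the file.
        return index.get(fallback_stem, {})
    return {}


if __name__ == "__main__":
    print(load_item_settings(INDEX_JSON_TEXT, "clip01"))
    print(load_item_settings(INDEX_JSON_TEXT, "unknown", "default_i2v"))
```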
Any `{stem}` assets are located in the same place as their respective source assets, which is the main source folder as specified in the workflow.
`{stem}.txt` For defining JSON parameters in a `.txt` file alongside the source asset file for the respective item, rather than using the global `index.json` stem definitions.
- NOTE: You can also turn on "Enable Global TXT search" in the workflow (and specify a root folder path) to avoid the restriction of always requiring the `{stem}.txt` file to be in the same location as the source asset file. This allows you to define TXT JSON files in separate folder locations. With this option enabled, it will recursively search that root folder for .txt files as a final global fallback, matching `\{stem}.txt` or `/{stem}.txt` (you can customise the slash depending on your OS). If multiple matches are found, it will pick the first matching one, but the list order is undetermined (i.e. based on your OS settings) unless you customise the workflow yourself to sort the list. Since it picks the first match from the OS file-system walker, it will always pick the closest depth-first search match, which can be considered predictable enough for most users.
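The recursive global TXT search described above could look roughly like the sketch below. It is not the workflow's actual implementation: `find_stem_txt` is a hypothetical name, and sorting the sub-folders is an addition made here purely so the first match is deterministic, which the note says the stock workflow does not guarantee.

```python
import os


def find_stem_txt(root_folder, stem):
    """Return the first '{stem}.txt' found under root_folder, or None."""
    target = f"{stem}.txt"
    for dirpath, dirnames, filenames in os.walk(root_folder):
        # Assumption: sort in place so os.walk visits sub-folders in a
        # predictable (alphabetical) depth-first order.
        dirnames.sort()
        if target in filenames:
            return os.path.join(dirpath, target)
    return None
```

Without the `dirnames.sort()` line, the traversal order falls back to whatever the operating system returns.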
You can define the preferred default pseudo-mask flat color (a single grayscale value) for the control video within the workflow itself. White is used, but some might prefer or need a mid-gray tone (like 128 or 127). I'm not too sure which is best, or whether it really makes a difference in some cases. (TODO: per-item JSON parameter setting for this)
Any transparent regions in the alpha channel (if any) of source video/image frames are always treated as an enforced foreground mask, painted with the default pseudo-mask flat colour, that is always on top for the control video. Such alpha-transparent areas are merged into VACE's control mask for inpainting.
(Optional) Masked areas under listed assets will potentially apply controlnet layer coloring for control video depending on controlnet settings. Resulting mask will be merged into VACE's control mask for inpainting.
`{stem}.mask.` (filename suffix) for image/video/folder of images masking asset.
(Optional) Extra masking that lies behind the Control Masking (if any) and is painted with the default pseudo-mask colour for the control video. This is good for erasing areas (especially backgrounds) or defining blanked-out external masks without editing the source image files. It ignores any controlnet settings by default. The resulting mask will be merged into VACE's control mask for inpainting.
`"extramask":` (Optional) (JSON parameter) Absolute file/folder path of the extra masking asset.
`{stem}.mask` / `"extramask":`
Masking conventions for image file formats always assume the presence of an alpha channel and lossless storage. Zero-value alpha (i.e. transparent regions) indicates areas to be inpainted. The RGB color channels are there for personal visual reference only and are ignored by the system. Thus, image formats like JPG that lack an alpha channel are not supported. (They are lossy anyway and thus impractical to use as a mask reference.)
Masking conventions for video file formats assume no alpha channel is used, so the frames themselves are read as the mask image (assumed black and white): white indicates inpainting regions and black indicates preserved regions. Ensure video files used for masking are saved as lossless as well, to avoid artifacts/inaccuracies.
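The two masking conventions above can be sketched with plain 8-bit pixel values as follows. Both function names are hypothetical, frames are represented as nested lists rather than real image tensors, and the threshold of 128 for video masks is an assumption rather than something the workflow specifies.

```python
def mask_from_alpha(alpha_rows):
    """Image convention: transparent (alpha 0) becomes white (inpaint).

    The mask is simply the inverted 8-bit alpha channel.
    """
    return [[255 - alpha for alpha in row] for row in alpha_rows]


def mask_from_luma(luma_rows, threshold=128):
    """Video convention: white means inpaint, black means preserve.

    The threshold is an assumed cut-off for near-white pixels.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in luma_rows]


if __name__ == "__main__":
    print(mask_from_alpha([[0, 255], [128, 64]]))
    print(mask_from_luma([[0, 255, 200, 10]]))
```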
Any source/masking frame specified, whether for masking or for the image RGB source, is always repeated against any JSON-defined parameter for keyframe/frame length under I2V/V2V usage. This allows 1 or more masking/source frame images to be repeated across the next keyframe or across the entire stretch of frames for the generated video. With this convention, you can easily set up looping videos, or persistently repeated color/masking frames across the entire required duration of the video, with 1 (or the needed N) single source image(s).
- `"positive":` The positive prompt to use/add to the existing workflow.
- `"negative":` The negative prompt to prepend to the existing workflow. (Typically the WAN defaults and any custom defaults of your own can be added on your end in the workflow itself.)
- `"reference_image":` (Optional) Absolute file path to an image file asset for the VACE reference image. Leave it as a blank string `""` to automatically use the first frame of the source asset as the VACE reference image. Leave the field out to ensure no reference image is used.
- Cnet (controlnet) settings: The workflow has a default cnet setting for region-based masking with mask assets, but you can override this with a `cnet` parameter.
  - `"cnet": 0` Do not use any controlnet presets. Always replace any masked areas with the pseudo-mask flat color instead.
  - `"cnet": >= 1` Use controlnet preset N for masked areas or entire masked control frames. See the workflow's control masking group node switch for options, or customise it accordingly. If this value is explicitly specified in the JSON parameters but no extramask or mask assets are found/specified, it is assumed that controlnet coloring presets are to be applied across the entire set of source frames, and the entire set of source frames will be masked for restyling replacement.
  - `"cnet": -1` Do not replace any masked areas with controlnet presets or process them for output. The source video/frames are assumed to be already baked with the intended controlnet colorings. If this value is explicitly specified in the JSON parameters but no extramask or mask assets are found/specified, it is also assumed that the entire set of source frames will be masked for restyling replacement.
  - For V2V extension of video, make sure you leave the `"cnet"` JSON setting undefined (i.e. unspecified) or zero, to ensure the original source video remains untouched even when no mask assets are specified/found for the VACE control mask.
- `"fps":` Forced fps setting for loaded videos, whether source/mask/control videos. If left undefined, the workflow default is used.
- IF2V frame length control: The total frame length must be of the form (multiple of 4) + 1.
  - `"frames":` (Optional) Keyframe spec as an array of string-literal numbers, defining how many extra blank frames to add after each given source keyframe. Including this spec indicates that some form of I2V/FLF2V (IF2V) process is intended, potentially with multiple keyframes.
    - e.g. I2V: `["80"]` for 80 extra blank frames after the first keyframe, for a total of 81 frames.
    - e.g. FLF2V: `["79", "0"]` for 79 extra blank frames after the first keyframe and zero extra frames after the last keyframe, for a total of 81 frames.
    - e.g. IF2V: `["63", "32"]` for 63 extra blank frames after the first keyframe and 32 extra blank frames after the last keyframe, for a total of 97 frames.
    - To ensure valid counts, assuming `frames` is the only extra set of frames defined, with no extra appended control video, each `frames` count must be a multiple of `4` for the last keyframe and a multiple of `4 minus 1` for any non-last keyframe.
- V2V frame length control: The total resulting video frame length after these adjustments must be of the form (multiple of 4) + 1.
  - `"load_cap":` If no mask/extra mask assets are found, this allows capping the frame count of the source video by a non-zero amount. Negative values are supported, truncating relative to the end of the image frame sequence. Useful for V2V cutting of undesirable video portions, alongside video extension.
  - `"skip_index":` If no mask/extra mask assets are found, this allows skipping (shifting away) an initial length of source video frames from the start. Negative values are supported, placing the starting source index relative to the end of the image frame sequence. Useful for V2V video extension by omitting initial starting frames.
  - `"extra_control_video":` Absolute path to an extra control video file asset or animated WebP file (it is assumed that all controlnet colors are already baked into this video and no further processing is required). Use this to append extra control video frames after the initial set of source control frames. By default, all frames of the extra control video will be fully restyled unless `"extra_control_video_outpainting"` is specified.
  - `"extra_control_video_outpainting":` If specified, `"extra_control_video"` is instead treated as an animated sequence with outpainted border settings for top, right, bottom and left to extend from accordingly. Use an array of exactly 4 numeric strings, e.g. `["32", "64", "0", "0"]`, to represent your outpainting border settings. The non-border regions are kept as non-masked areas in the respective control mask frames, preserving the non-border areas of the extra video. You can supply an all-zero array, e.g. `["0", "0", "0", "0"]`, to simply treat the entire video as extra context frames appended at the end with no outpainting borders, so that no modification or restyling is applied to those frames.
  - `"extra_frames":` Use this in V2V cases to extend the entire video sequence with an extra amount of blank frames after the generated source control frames (e.g. `"extra_frames": 64`). This is ignored if `"frames":` is being used, but will still be used as extra blank frames to add after `"extra_control_video"` (if found).
    - This can also be used to set up T2V video generation cases of varying lengths, by using a single blank transparent webp/png image as the starting blank source image frame along with an `extra_frames` count that is a multiple of 4.
  - `"extra_load_cap":` If loading an extra control video via `"extra_control_video"`, this allows capping the frame count of the extra control video by a non-zero amount. Negative values are supported, truncating relative to the end of the video frame sequence.
  - `"extra_skip_index":` If loading an extra control video via `"extra_control_video"`, this allows skipping (shifting away) an initial length of frames from the start of the extra control video. Negative values are supported, placing the starting source index relative to the end of the video frame sequence.
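The frame-length rule stated above (total length must be a multiple of 4, plus 1) and the way a `"frames"` array adds up can be checked with a small sketch. The helper names are hypothetical; the totals follow the documented examples, counting one source keyframe per array entry plus that entry's extra blank frames, with no appended control video or `extra_frames`.

```python
def is_valid_length(total_frames):
    """True when total_frames has the form (multiple of 4) + 1."""
    return total_frames >= 1 and (total_frames - 1) % 4 == 0


def total_if2v_frames(frames_spec):
    """Total frames for a "frames" array of string-literal numbers.

    Each entry contributes its own keyframe plus its extra blank frames.
    """
    return sum(1 + int(extra) for extra in frames_spec)


if __name__ == "__main__":
    for spec in (["80"], ["79", "0"], ["63", "32"], ["78", "0"]):
        total = total_if2v_frames(spec)
        print(spec, total, is_valid_length(total))
```

The last spec in the loop, `["78", "0"]`, totals 80 frames and is therefore rejected by the check.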
Use the optional `"outpaint_top":`, `"outpaint_right":`, `"outpaint_bottom":` and `"outpaint_left":` numeric values to specify how many additional border pixels to outpaint for the supplied source and mask assets. The total resulting width and height after outpainting must each be divisible by 16 for processing to work.
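A quick way to sanity-check outpaint border values against the divisible-by-16 requirement is sketched below. The function names and the 832x480 source size are illustrative assumptions, not values taken from the workflow.

```python
def outpainted_size(width, height, top=0, right=0, bottom=0, left=0):
    """Return (width, height) after adding the outpaint borders in pixels."""
    return width + left + right, height + top + bottom


def is_valid_outpaint(width, height, **borders):
    """True when both resulting dimensions are divisible by 16."""
    new_width, new_height = outpainted_size(width, height, **borders)
    return new_width % 16 == 0 and new_height % 16 == 0


if __name__ == "__main__":
    print(outpainted_size(832, 480, top=32, right=64))
    print(is_valid_outpaint(832, 480, top=32, right=64))
    print(is_valid_outpaint(832, 480, top=10))
```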
`psd`: Photoshop file format
- Each visible layer is treated as a single source frame, arranged in bottom-up order.
- (Optional) Each layer named 'mask', which is hidden versus the visible layers, will be treated as a respective Control Masking/`.mask.` image frame, arranged in bottom-up order.
- (Optional) Each layer named 'extramask', which is hidden versus the visible layers, will be treated as a respective Extra masking/`"extramask":` image frame, arranged in bottom-up order.
- If the first visible layer has a numeric layer name (e.g. `80`), then it is assumed all visible layers adopt numeric layer names to define the respective `"frames":` keyframes setting. This allows you to define IF2V/FLF2V/I2V frame durations completely within the single PSD file itself, which can be quite convenient for keeping track instead of using external JSON. Note that if you still have a specific external JSON definition (excluding the default fallback JSON node), that definition will always override whatever is declared in the PSD file itself.
`webp|png|jpg|jpeg`: Image file formats
- To define image-based frames. The alpha transparency channel in image frames is converted to a mask directly (i.e. transparent regions imply masking areas for inpainting, so the mask is the inverted alpha channel). When used as a mask asset via a `.mask.` file or an `extramask:` JSON-specified file path, the color channels are for personal reference only and are ignored by the batching engine. Masking with jpg/jpeg is not supported here, because the image-based frame conventions require an alpha channel.
- Folder of images (ie. no file extension) : Folder containing image-based frames, will be arranged in alphabetical order
`webm|mp4|avi|mov|gif|apng`: Video file formats
- To define frames in a video container. When used as a mask asset via a `.mask.` file or an `extramask:` JSON-specified file path, always use a black and white lossless video (since most video formats typically have no alpha channel), so that the black and white colors in the video frames can be converted to a mask directly (i.e. white regions imply masking areas for inpainting, following standard image mask conventions).
Depending on which source a configuration parameter is derived from, the following order is used to resolve the configuration value (highest specificity first):
- From the `{stem}.txt` file (assumes the txt file contains the JSON configuration for the specific item)
- From the `"{stem}":` key in `index.json`
- Implied from source asset(s)
- From the workflow-specific default `index.json` fallback node key.
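The resolution order above can be read as a simple "first available source wins" chain, sketched below. Treating each source as a whole settings object (rather than merging key by key) is an assumption made here for brevity, and `resolve_config` with its argument names is hypothetical.

```python
def resolve_config(txt_config=None, index_entry=None,
                   implied=None, workflow_default=None):
    """Return the most specific available settings object.

    Sources are listed from most to least specific, matching the
    documented order. Whole-object precedence is an assumption.
    """
    for candidate in (txt_config, index_entry, implied, workflow_default):
        if candidate is not None:
            return candidate
    return {}


if __name__ == "__main__":
    # An index.json entry overrides settings implied by the asset itself.
    print(resolve_config(index_entry={"cnet": 0},
                         implied={"frames": ["80"]}))
```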
examples: advanced mixed images/video controlnet hybrid composition examples
Reference image with pre-baked Controlnet motion control (single image source)
- `"reference_image": ""`, `"extra_control_video": "..."`, `"skip_index": 1`
- `"extra_frames"` to freely extend video
Reference image with source Controlnet motion control (single video source)
- `"reference_image": "...specify actual path to image...."`. Use `"cnet": -1` if the source video is already pre-baked with controlnet colors; otherwise, explicitly specify a positive `"cnet"` value to ensure one of the controlnet coloring presets is used for the source video.
- `"extra_frames"` to freely extend video
Starting image frame with pre-baked Controlnet motion control over other frames (single image/video source)
- `"extra_control_video": "..."`, `"extra_skip_index"` of 1 or more if needed depending on the situation, and a respective `"extra_load_cap": -1` if the loaded video was actually generated from the initial starting image frame and you wish to regenerate over the latter frames, recycling previously generated motion.
- `"skip_index": 1` and `"load_cap": 1`, if loading the first frame from a video source.
- `"reference_image": ""` if you want to re-emphasise the first frame as the reference image
- `"extra_frames"` to freely extend video
Starting video with pre-baked Controlnet motion control over other frames (single video source)
- `"extra_control_video": "..."`, `"reference_image": ""` if you want to re-emphasise the first frame as the reference image
- `"extra_frames"` to freely extend video
IF2V keyframes with pre-baked Controlnet motion control video extension (single set of images source)
- `"frames": ["31", "0"]`, `"extra_control_video": "..."`
- `"extra_frames"` to freely extend video
IF2V keyframes with source Controlnet restyling for whole keyframes (single set of images source)
- `"frames": ["79", "0"]`, `"cnet": 2` (i.e. your preferred `>=1` cnet preset)
examples: looping video
IF2V looping video with single source image
- `"frames": ["79", "0"]`
- `"positive": "give a description of the video and specify it as a looping video animation in the prompt and hope for the best"`
IF2V looping video with two source images (first frame and in-between frame to guide motion)
- `"frames": ["39", "39", "0"]`
- `"positive": "give a description of the video and specify it as a looping video animation in the prompt and hope for the best"`