On this page
- One Button Press, N Parallel Executions, One Logical Result — How Do You Track All of It?
- What You Must Understand Before This Makes Sense
- The Air Traffic Control Tower: One Controller, Many Independent Flights
- Building the State Machine, Block by Block
- Chunk 1: The Double-Checked Locking Singleton
- Chunk 2: The BatchState Data Class
- Chunk 3: The Prompt Interceptor — handle_prompt()
- Chunk 4: Generating Combinations and Queueing Clone Prompts
- Chunk 5: The Deep Clone and Queue Injection
- Chunk 6: The Callback-Driven Result Aggregation
- Memory: deepcopy Is O(n), But It Is the Right Tool
- Race Conditions, Timeout Orphans, and the Cancellation Gap
- You Now Know How to Build a Thread-Safe Batch Coordinator in Python
One Button Press, N Parallel Executions, One Logical Result — How Do You Track All of It?
The Problem
ComfyUI’s execution model is simple and elegant: one prompt = one execution run. Press queue, one set of nodes runs, images are produced, done. This works perfectly for single-shot generation.
Now you need to run 90 variations of a workflow. You could ask the user to press queue 90 times. But then you have no logical grouping: you cannot know when “all 90 of this batch” are done. You cannot build a comparison grid because you don’t know which 90 images belong together. You cannot provide a progress bar. You cannot cancel the batch.
The problem is fundamentally a distributed coordination problem: you are orchestrating N independent processes (ComfyUI prompt executions) that must be treated as a single logical unit, sharing a lifecycle they are individually unaware of.
The Solution
core/iteration_state.py implements an IterationStateManager — a thread-safe singleton that owns the full batch lifecycle: interception, cloning, state tracking, result aggregation, timeout cleanup, and HTTP-based status reporting. It is the architectural centerpiece of the entire extension.
You’ll understand how to implement the double-checked locking singleton pattern in Python, how to safely manage shared mutable state across async and threaded contexts, how to clone and inject modified prompt graphs into ComfyUI’s queue, and how to design a callback-driven result aggregation system that fires a completion event exactly once.
What You Must Understand Before This Makes Sense
Knowledge Prerequisites
- Python threading.Lock — what it does and when to use it
- The concept of a singleton — one instance shared across the entire application
- ComfyUI's prompt format — a JSON dict mapping node IDs to node configurations
- copy.deepcopy — why shallow copy is insufficient for nested dicts
- Python async/await basics — the HTTP route handlers use it
Environment
Python >= 3.9
threading (standard library)
copy (standard library)
uuid (standard library)
time (standard library)
aiohttp (provided by ComfyUI's embedded server)
ComfyUI's PromptServer and prompt_queue internal APIs
🔴 Danger: This module directly calls PromptServer.instance and server.prompt_queue.put() — internal ComfyUI APIs. These are not officially documented and could change between ComfyUI releases. The extension wraps all such calls in try/except blocks to fail gracefully.
The Air Traffic Control Tower: One Controller, Many Independent Flights
State Machine Diagram
stateDiagram-v2
[*] --> Pending : onprompt fires, batch created
Pending --> Active : combinations generated,\nprompts queued
Active --> Collecting : first result registered\nvia register_result()
Collecting --> Collecting : more results arrive\n(completed_count < total)
Collecting --> Complete : completed_count == total\n(register_result returns True)
Active --> Cancelled : /wi/cancel_batch POST
Collecting --> Cancelled : /wi/cancel_batch POST
Active --> Expired : age > 3600s\n(_cleanup_expired fires)
Complete --> [*] : cleanup_batch() called\nby WIGridCompositor
Cancelled --> [*] : cleanup_batch() called
Expired --> [*] : _cleanup_expired removes it
The Analogy
An air traffic control tower doesn’t fly any planes. It doesn’t build them. Its sole job is to know where every plane is, coordinate their sequencing, and declare when all planes in a flight group have landed safely. IterationStateManager is that tower. The individual ComfyUI execution workers are the planes. They take off independently, fly independently, but all report to the tower on landing.
Building the State Machine, Block by Block
Chunk 1: The Double-Checked Locking Singleton
The singleton pattern ensures exactly one IterationStateManager exists for the entire ComfyUI process lifetime. This matters because the batch registry (_active_batches) must be shared: the onprompt handler adds to it, and node execution callbacks read from it — from different threads.
import threading

# Module-level variables — not instance variables.
# This is the singleton storage.
_instance = None
_instance_lock = threading.Lock()

class IterationStateManager:
    @classmethod
    def instance(cls):
        """Get or create the singleton instance."""
        global _instance
        # First check (no lock): fast path for the common case where
        # the instance already exists. 99.9% of calls take this path.
        if _instance is None:
            # Second check (with lock): only one thread creates the instance.
            # Without the lock, two threads could both pass the first check
            # and both call cls() — creating two instances.
            with _instance_lock:
                if _instance is None:
                    _instance = cls()
        return _instance
This is the double-checked locking pattern. The outer if _instance is None check avoids acquiring the lock on every call (which would be a performance bottleneck). The inner check inside the lock prevents the race condition where two threads both see None and both try to create the instance.
🔵 Deep Dive: In Python, the GIL (Global Interpreter Lock) prevents true parallel execution of Python bytecode. Does that make locks unnecessary? No — because if _instance is None followed by _instance = cls() is two bytecode operations. The GIL can be released between them. With I/O, time.sleep, or C extension calls in between, the race is real.
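The pattern is easiest to trust once you have seen it survive a stampede. Here is a minimal, standalone sketch of the same double-checked locking idea (the `Manager` class and `get_instance` function are illustrative stand-ins, not the extension's real names):

```python
import threading

_instance = None
_instance_lock = threading.Lock()

class Manager:
    pass

def get_instance():
    """Double-checked locking: the fast path skips the lock entirely."""
    global _instance
    if _instance is None:           # first check, no lock
        with _instance_lock:
            if _instance is None:   # second check, under lock
                _instance = Manager()
    return _instance

# Hammer it from many threads; every caller must see the same object.
results = []
threads = [threading.Thread(target=lambda: results.append(get_instance()))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len({id(obj) for obj in results}) == 1  # exactly one instance ever created
```

Deleting the inner `if` (or the lock) reintroduces the race: two threads can both observe `None` before either assigns, and each constructs its own instance.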
Chunk 2: The BatchState Data Class
Every batch gets its own BatchState object. This is not a Python dataclass — it is a plain class, but it serves the same purpose: a named container for correlated values that travel together.
class BatchState:
    """Tracks state for one batch iteration run."""
    def __init__(self, batch_id, total, combinations, batch_name, mode):
        self.batch_id = batch_id
        self.total = total                  # How many iterations to expect
        self.combinations = combinations    # Full list of planned combos
        self.batch_name = batch_name
        self.mode = mode                    # "matrix" or "linear"

        # Mutable state — modified by register_result() under lock
        self.results = {}                   # index → {images, metadata}
        self.completed_count = 0
        self.cancelled = False
        self.created_at = time.time()       # For timeout cleanup

        # Tracks all prompt_ids queued for this batch.
        # Used for cancellation — we can tell ComfyUI to dequeue them.
        self.prompt_ids = []
Notice what is not stored: image tensors. The results dict stores a path or lightweight reference. Holding full GPU tensors here would cause memory leaks for large batches. The WISaveImage and WIGridCompositor nodes manage the actual image data.
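The shape of a stored result can be sketched as a plain dict of lightweight references (the file paths and metadata keys below are illustrative, not the extension's actual output layout):

```python
# One entry in batch.results per completed iteration, keyed by index.
# Values hold a saved-file path and the parameter metadata -- never a
# decoded image tensor, which would pin memory for the batch's lifetime.
results = {}
results[0] = {
    "images": ["output/batch_ab12cd34_000.png"],  # path string (hypothetical)
    "metadata": {"cfg": 7.0, "steps": 20},
}
results[4] = {
    "images": ["output/batch_ab12cd34_004.png"],
    "metadata": {"cfg": 9.0, "steps": 20},
}
# Memory cost: a few hundred bytes per iteration, regardless of image size.
```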
Chunk 3: The Prompt Interceptor — handle_prompt()
This is the most important method. It is called synchronously by ComfyUI’s onprompt hook before any execution begins. It must be fast and must not raise exceptions (the wrapper in __init__.py catches them, but failures would silently drop the batch).
def handle_prompt(self, json_data):
    # Step 1: Housekeeping — remove stale batches before adding new ones.
    self._cleanup_expired()

    prompt = json_data.get("prompt", {})
    if not prompt:
        return json_data  # Nothing to do; return unchanged

    # Step 2: Find WorkflowIterator nodes. A workflow might have none.
    iterator_nodes = self._find_iterator_nodes(prompt)
    if not iterator_nodes:
        return json_data  # Not an iteration workflow; pass through

    for node_id, node_data in iterator_nodes.items():
        inputs = node_data.get("inputs", {})
        if not inputs.get("enabled", True):
            continue

        mode = inputs.get("mode", "matrix")
        batch_name = inputs.get("batch_name", "batch")

        # Step 3: Trace the param_stack chain to collect all parameter defs
        param_defs = self._collect_param_definitions(prompt, node_id)
        if not param_defs:
            continue  # No parameters defined → run workflow once

        # Step 4: Parse raw value strings into typed Python lists
        parsed_params = []
        for pdef in param_defs:
            try:
                values = parameter_parser.parse(pdef["values_raw"], pdef["type"])
                parsed_params.append({**pdef, "parsed_values": values})
            except (ValueError, FileNotFoundError) as e:
                logger.error(f"Error parsing parameter '{pdef['name']}': {e}")
                continue

        if not parsed_params:
            continue
The error handling philosophy here is important: a single bad parameter definition is logged and skipped, but it does not abort the entire batch. The remaining parameters still run.
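The skip-on-error pattern can be demonstrated in isolation. `parse_values` below is a hypothetical stand-in for the extension's internal parameter_parser.parse, kept deliberately simple:

```python
import logging

logger = logging.getLogger("wi.sketch")

def parse_values(raw, type_name):
    """Stand-in parser (hypothetical): comma-separated string → typed list."""
    caster = {"int": int, "float": float, "string": str}[type_name]
    return [caster(v.strip()) for v in raw.split(",")]

param_defs = [
    {"name": "steps", "type": "int", "values_raw": "10, 20, 30"},
    {"name": "cfg", "type": "float", "values_raw": "7.0, oops"},  # bad entry
]

parsed_params = []
for pdef in param_defs:
    try:
        values = parse_values(pdef["values_raw"], pdef["type"])
        parsed_params.append({**pdef, "parsed_values": values})
    except ValueError as e:
        # One bad definition is logged and skipped; the rest still run.
        logger.error("Error parsing parameter '%s': %s", pdef["name"], e)

# Only 'steps' survives; 'cfg' was dropped without aborting the batch.
```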
Chunk 4: Generating Combinations and Queueing Clone Prompts
This is where the multi-queue cloning strategy executes. The first combination modifies the original prompt in-place. Every subsequent combination is a deep-copied clone, independently queued.
# Step 5: Generate all combinations
if mode == "matrix":
    combinations = combination_engine.cartesian(parsed_params)
else:
    combinations = combination_engine.linear_zip(parsed_params)

total = len(combinations)
batch_id = f"{batch_name}_{uuid.uuid4().hex[:8]}"
batch_state = BatchState(batch_id, total, combinations, batch_name, mode)

# Step 6: Apply combination[0] to the original prompt (in-place mutation).
# This is safe because `json_data` was just created for this submission.
self._apply_combination(
    json_data, prompt, workflow,
    combinations[0], 0, total, batch_id, batch_name, mode
)

# Step 7: Queue combinations 1..N-1 as independent prompts.
self._queue_remaining(
    json_data, workflow, combinations,
    batch_id, batch_name, mode, total, batch_state
)

# Step 8: Register the batch in the shared registry (under lock).
with self._lock:
    self._active_batches[batch_id] = batch_state
Chunk 5: The Deep Clone and Queue Injection
def _queue_remaining(
    self, json_data, workflow, combinations, batch_id,
    batch_name, mode, total, batch_state
):
    from server import PromptServer
    server = PromptServer.instance

    for i in range(1, total):  # Skip index 0 — already handled
        # copy.deepcopy creates a completely independent copy of the
        # entire prompt graph. Shallow copy would share nested dicts,
        # meaning _apply_combination on a clone would mutate the originals.
        cloned = copy.deepcopy(json_data)
        cloned_prompt = cloned.get("prompt", {})

        # Apply this iteration's parameter values to the clone
        self._apply_combination(
            cloned, cloned_prompt, workflow,
            combinations[i], i, total, batch_id, batch_name, mode
        )

        new_prompt_id = str(uuid.uuid4())

        # ComfyUI's internal queue API — put() validates and enqueues
        valid = server.prompt_queue.put(
            server.number,      # Queue sequence number
            new_prompt_id,      # Unique ID for this execution
            cloned_prompt,      # The modified node graph
            cloned.get("extra_data", {}),
            self._get_output_nodes(cloned_prompt),  # Which nodes produce output
        )
        if valid:
            server.number += 1  # Advance the sequence counter

            # Track this prompt ID in the batch for cancellation support
            batch_state.prompt_ids.append(new_prompt_id)

            # Map prompt_id → (batch_id, index) for result routing
            with self._lock:
                self._prompt_to_batch[new_prompt_id] = (batch_id, i)
Chunk 6: The Callback-Driven Result Aggregation
register_result is the method node executions call when they complete. It returns True exactly once — when the final iteration lands.
def register_result(self, batch_id, index, images, metadata):
    """
    Called by WISaveImage or WIGridCompositor per completed iteration.
    Returns True if this was the final iteration — signals grid build.
    """
    with self._lock:
        batch = self._active_batches.get(batch_id)
        if batch:
            # Store the result for this index
            batch.results[index] = {"images": images, "metadata": metadata}

            # Atomic increment and comparison — both under the same lock.
            # Without the lock, two threads could both read completed_count=8,
            # both increment to 9, and both return True for a 10-item batch.
            batch.completed_count += 1
            logger.info(
                f"Batch '{batch.batch_name}': "
                f"{batch.completed_count}/{batch.total} complete"
            )
            return batch.completed_count >= batch.total

        # Batch not found (already cleaned up or cancelled)
        return False
The critical insight: both the increment and the >= comparison happen inside the same lock acquisition. This makes “are we done?” an atomic question. Without the lock, the classic TOCTOU (time-of-check-time-of-use) race would allow two threads to both return True — triggering grid assembly twice.
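The "exactly once" guarantee can be verified with a stripped-down aggregator (a sketch of the same locking discipline, not the extension's actual class):

```python
import threading

class Aggregator:
    """Minimal stand-in for the batch's completion detection."""
    def __init__(self, total):
        self.total = total
        self.count = 0
        self._lock = threading.Lock()

    def register(self):
        with self._lock:
            # Increment and compare under the SAME lock acquisition:
            # "did I just finish the batch?" is answered atomically.
            self.count += 1
            return self.count >= self.total

agg = Aggregator(total=100)
finals = []
threads = [threading.Thread(target=lambda: finals.append(agg.register()))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert finals.count(True) == 1  # exactly one caller observes completion
```

Move the comparison outside the `with` block and two threads can both increment before either compares, letting both see `count >= total` and both return True.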
Memory: deepcopy Is O(n), But It Is the Right Tool
Deep Copy Cost
copy.deepcopy recursively copies every object in the graph. A ComfyUI prompt JSON is typically 5–50 KB. Cloning 90 of them: 450 KB–4.5 MB allocated in a tight loop. At Python’s allocation rate, this takes 1–50 ms. For GPU workloads that take seconds to minutes, this is insignificant.
Could you optimize by only cloning the nodes that change? Yes — but the implementation complexity is substantial, the fragility is high (you must know exactly which nodes share mutable state), and the benefit is measured in milliseconds. This is a case where the simple, correct solution (deepcopy) is the right one.
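You can check the cost claim on your own machine with a quick micro-benchmark. The dict below is a rough stand-in for a mid-sized prompt (~100 nodes); the node shape is illustrative:

```python
import copy
import time

# A prompt-shaped dict: ~100 nodes, each with a handful of inputs.
prompt = {
    str(i): {"class_type": "KSampler",
             "inputs": {"seed": i, "steps": 20, "cfg": 7.0,
                        "model": ["4", 0], "positive": ["6", 0]}}
    for i in range(100)
}

start = time.perf_counter()
clones = [copy.deepcopy(prompt) for _ in range(90)]  # one clone per iteration
elapsed = time.perf_counter() - start

# Elapsed time varies by machine; for graphs this size it is
# milliseconds, not seconds -- noise next to a GPU sampling run.
print(f"90 deep copies: {elapsed * 1000:.1f} ms")

# Each clone is equal in content but fully independent in identity.
assert clones[0] == prompt
assert clones[0] is not prompt
assert clones[0]["0"]["inputs"] is not prompt["0"]["inputs"]
```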
Lock Granularity
The _lock is a single instance-level lock. Every method that touches _active_batches or _prompt_to_batch acquires it. This is coarse-grained locking — simple but potentially a bottleneck if hundreds of threads compete for it.
In practice, the lock is held for microseconds (dictionary reads/writes). The only remotely expensive operation under lock is the _cleanup_expired scan, which iterates _active_batches. With at most dozens of active batches at a time, this is O(batches) with a tiny constant. A finer-grained lock (per-batch) would be premature optimization.
🔵 Deep Dive: Python’s threading.Lock is a binary mutex implemented in C. Acquisition is O(1) when uncontested (a single atomic compare-and-swap). Contested acquisition parks the thread in a kernel wait queue — this is orders of magnitude more expensive. For this workload, uncontested access is the overwhelmingly common case.
Race Conditions, Timeout Orphans, and the Cancellation Gap
Race Condition: Late Arriving Results
If a batch is cancelled and cleanup_batch removes it from _active_batches, but a result arrives a moment later via register_result, the lookup returns None and the method returns False silently. The orphaned result is simply discarded. This is acceptable: the user explicitly cancelled the batch.
The Timeout Orphan Problem
What if ComfyUI crashes mid-batch? The batch state persists in _active_batches forever — a memory leak. The _cleanup_expired method runs at the start of every handle_prompt call and removes batches older than 3,600 seconds (1 hour). This is a lazy cleanup strategy: cleanup happens when the system is already active, not on a background timer. The trade-off is that orphaned batches aren’t cleaned up until the next user interaction. For a tool running on a local machine, this is acceptable.
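The lazy-cleanup pattern reduces to a small function. This sketch uses plain dicts in place of BatchState objects and takes an injectable clock so the expiry logic is testable; the real `_cleanup_expired` signature is an assumption:

```python
import time

TIMEOUT_S = 3600  # batches older than one hour are considered orphaned

def cleanup_expired(active_batches, now=None, timeout=TIMEOUT_S):
    """Lazy cleanup: invoked at the start of each new prompt submission,
    not from a background timer thread."""
    now = time.time() if now is None else now
    expired = [bid for bid, st in active_batches.items()
               if now - st["created_at"] > timeout]
    for bid in expired:
        del active_batches[bid]
    return expired

batches = {
    "old_batch": {"created_at": time.time() - 4000},  # > 1 hour old
    "new_batch": {"created_at": time.time()},
}
removed = cleanup_expired(batches)
# "old_batch" is removed; "new_batch" survives until it too ages out.
```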
🔴 Danger: The cancellation route (/wi/cancel_batch/{batch_id}) calls server.prompt_queue.delete_queue_item(pid) for each queued prompt. This is a ComfyUI internal API. If ComfyUI changes the queue’s interface in a future release, cancellation will silently fail (the try/except catches it). The batch will be marked cancelled in the state manager, but prompts may still execute. This is a known fragility point.
The _prompt_to_batch Map Leak
Every queued prompt ID is added to _prompt_to_batch. This map is only cleaned up by cleanup_batch. If cleanup is never called (e.g., WIGridCompositor never executes because the workflow has no compositor node), the map grows unboundedly. The timeout cleanup calls cleanup_batch for expired batches, which handles this transitively. But a non-expired batch with a missing compositor would leak entries until the 1-hour timeout fires.
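The transitive cleanup is worth seeing in miniature. This sketch assumes a `cleanup_batch` that removes both the batch entry and every routing-map entry pointing at it; the signature and dict shapes are illustrative, not the module's real API:

```python
def cleanup_batch(batch_id, active_batches, prompt_to_batch):
    """Remove a batch AND every prompt-id mapping that routes to it,
    so the routing map cannot accumulate dead entries."""
    active_batches.pop(batch_id, None)
    stale = [pid for pid, (bid, _idx) in prompt_to_batch.items()
             if bid == batch_id]
    for pid in stale:
        del prompt_to_batch[pid]

active = {"b1": object(), "b2": object()}
routing = {"p1": ("b1", 1), "p2": ("b1", 2), "p3": ("b2", 1)}

cleanup_batch("b1", active, routing)
# Only entries for other batches remain in the routing map.
```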
You Now Know How to Build a Thread-Safe Batch Coordinator in Python
This is a non-trivial piece of concurrent software engineering. The specific skills you’ve acquired:
- Double-checked locking singleton: How to create a truly thread-safe singleton in Python without making every call pay lock acquisition cost.
- Atomic result aggregation: How to design a callback system where "all work is done" is detected exactly once, even under concurrent callbacks.
- Prompt graph mutation and cloning: How to intercept, modify, and multiply a structured JSON workflow before the execution engine ever sees it.
- Lazy timeout cleanup: A pragmatic alternative to background timer threads for orphan resource management.
- Graceful degradation under external API instability: Every call to ComfyUI internals is wrapped in try/except, ensuring the extension degrades gracefully rather than crashing the host process.