Describe the bug
ComfyUI crashes on startup with a Windows fatal exception: access violation originating from _probe_bfloat16_support() in src/optimization/compatibility.py (line 688). The crash is a hard segfault that Python's try/except cannot catch.
Environment
- OS: Windows 11
- GPU: NVIDIA GeForce RTX 4090
- NVIDIA Driver: 595.79 (recent update)
- PyTorch: 2.9.1+cu130
- cuDNN: 91200
- Python: 3.13.11
- ComfyUI: 0.16.4
- SeedVR2: v2.5.23 (commit 4490bd1)
Stack trace
Windows fatal exception: access violation
Stack (most recent call first):
File "...\seedvr2_videoupscaler\src\optimization\compatibility.py", line 688 in _probe_bfloat16_support
File "...\seedvr2_videoupscaler\src\optimization\compatibility.py", line 697 in <module>
File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 1023 in exec_module
...
File "...\ComfyUI\nodes.py", line 2225 in load_custom_node
Root cause
The function _probe_bfloat16_support() performs a raw CUDA allocation (torch.randn(..., dtype=torch.bfloat16, device='cuda:0')) at module import time. With recent NVIDIA drivers (595.xx series), this triggers a fatal access violation during the CUDA/cuDNN initialization phase. Since it's a segfault, the try/except RuntimeError block cannot catch it, and the entire ComfyUI process terminates.
The GPU (RTX 4090, sm_89) fully supports bfloat16 — the crash is specifically about when and how the probe runs, not about actual bfloat16 capability.
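Why try/except cannot help here can be illustrated without CUDA: a native access violation is not a Python exception, so it terminates the interpreter even inside a try block, while a parent process merely observes a nonzero exit code. A minimal sketch (forcing a null-pointer dereference via ctypes, purely for illustration):

```python
# Illustration only: a native access violation (forced here via ctypes)
# is not a Python exception, so the child's try/except never fires.
# The parent process survives and sees a nonzero return code.
import subprocess
import sys

crash_script = (
    "import ctypes\n"
    "try:\n"
    "    ctypes.string_at(0)  # NULL dereference -> hard crash\n"
    "except Exception:\n"
    "    print('caught')  # never reached\n"
)

result = subprocess.run(
    [sys.executable, "-c", crash_script],
    capture_output=True,
    text=True,
)
print(result.returncode != 0)     # True: child died abnormally
print("caught" in result.stdout)  # False: the except clause never ran
```

This is exactly the failure mode in _probe_bfloat16_support(): the except RuntimeError clause is bypassed because the crash happens below the Python level.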
Proposed fix
Run the bfloat16 probe in a subprocess so that if it crashes, the main process is unaffected:
def _probe_bfloat16_support() -> bool:
    if not torch.cuda.is_available():
        return True
    # Subprocess-based probe (safe from access violations)
    try:
        import os
        import subprocess
        import sys

        probe_script = (
            "import torch; "
            "a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0'); "
            "_ = torch.matmul(a, a); "
            "print('OK')"
        )
        result = subprocess.run(
            [sys.executable, "-c", probe_script],
            capture_output=True,
            text=True,
            timeout=30,
            env={**os.environ, "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0")},
        )
        if result.returncode == 0 and "OK" in result.stdout:
            return True
        else:
            # Subprocess crashed or returned an error - bf16 not safe
            return False
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        pass
    # Fallback: check GPU compute capability (sm_80+ supports bfloat16)
    try:
        major, _ = torch.cuda.get_device_capability(0)
        return major >= 8
    except Exception:
        return True
This adds ~2 seconds to startup but prevents fatal crashes with any driver version. The actual bfloat16 result is identical — no performance impact at runtime.
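The same pattern generalizes to any crash-prone capability check. A stdlib-only sketch of the probe shape (the probe helper below is hypothetical, not part of the patch):

```python
# Hypothetical helper sketching the probe pattern used in the fix:
# run a tiny script in a child process; a crash or missing "OK" means
# "feature unsafe", while a failure to launch falls back to a default.
import subprocess
import sys

def probe(script: str, timeout: float = 30.0, default: bool = True) -> bool:
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.returncode == 0 and "OK" in result.stdout
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return default  # the probe itself could not run

print(probe("print('OK')"))              # True: child succeeded
print(probe("import sys; sys.exit(1)"))  # False: child reported failure
```

The key design choice is that only the child process touches the potentially crashing code path; the parent reduces every outcome (success, exception, segfault, timeout) to a boolean.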
Likely impact
Anyone on Windows with PyTorch 2.9+ and recent NVIDIA drivers (595.xx series, March 2026) will hit this crash. It likely affects all GPU models, not just RTX 4090.
Full Diff
diff --git a/src/optimization/compatibility.py b/src/optimization/compatibility.py
index c462022..ab60146 100644
--- a/src/optimization/compatibility.py
+++ b/src/optimization/compatibility.py
@@ -682,17 +682,51 @@ if not os.environ.get("SEEDVR2_OPTIMIZATIONS_LOGGED"):
 # Bfloat16 CUBLAS support
 def _probe_bfloat16_support() -> bool:
+    """
+    Probe bfloat16 CUBLAS support using a subprocess to prevent fatal access
+    violations from crashing the main ComfyUI process.
+
+    On PyTorch 2.9+ with cuDNN >= 91200, calling torch.randn(..., dtype=bfloat16, device='cuda')
+    during module import can trigger a Windows fatal exception: access violation.
+    Running the probe in a subprocess isolates this crash.
+    """
     if not torch.cuda.is_available():
         return True
+
+    # First try: subprocess-based probe (safe from access violations)
     try:
-        a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0')
-        _ = torch.matmul(a, a)
-        del a
-        return True
-    except RuntimeError as e:
-        if "CUBLAS_STATUS_NOT_SUPPORTED" in str(e):
+        import subprocess
+        import sys
+
+        probe_script = (
+            "import torch; "
+            "a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0'); "
+            "_ = torch.matmul(a, a); "
+            "print('OK')"
+        )
+
+        result = subprocess.run(
+            [sys.executable, "-c", probe_script],
+            capture_output=True,
+            text=True,
+            timeout=30,
+            env={**os.environ, "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0")},
+        )
+
+        if result.returncode == 0 and "OK" in result.stdout:
+            return True
+        else:
+            # Subprocess crashed or returned error - bf16 not safe
             return False
-        raise
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+        pass
+
+    # Fallback: check GPU compute capability (sm_80+ supports bfloat16)
+    try:
+        major, _ = torch.cuda.get_device_capability(0)
+        return major >= 8
+    except Exception:
+        return True
 BFLOAT16_SUPPORTED = _probe_bfloat16_support()
 COMPUTE_DTYPE = torch.bfloat16 if BFLOAT16_SUPPORTED else torch.float16