[BUG] Fatal access violation crash on startup with recent NVIDIA drivers (bfloat16 probe) #552

@Wontfallo

Description

Describe the bug

ComfyUI crashes on startup with a Windows fatal exception: access violation originating from _probe_bfloat16_support() in src/optimization/compatibility.py (line 688). The crash is a hard segfault that Python's try/except cannot catch.

Environment

  • OS: Windows 11
  • GPU: NVIDIA GeForce RTX 4090
  • NVIDIA Driver: 595.79 (recent update)
  • PyTorch: 2.9.1+cu130
  • cuDNN: 91200
  • Python: 3.13.11
  • ComfyUI: 0.16.4
  • SeedVR2: v2.5.23 (commit 4490bd1)

Stack trace

Windows fatal exception: access violation

Stack (most recent call first):
  File "...\seedvr2_videoupscaler\src\optimization\compatibility.py", line 688 in _probe_bfloat16_support
  File "...\seedvr2_videoupscaler\src\optimization\compatibility.py", line 697 in <module>
  File "<frozen importlib._bootstrap>", line 488 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1023 in exec_module
  ...
  File "...\ComfyUI\nodes.py", line 2225 in load_custom_node

Root cause

The function _probe_bfloat16_support() performs a raw CUDA allocation (torch.randn(..., dtype=torch.bfloat16, device='cuda:0')) at module import time. With recent NVIDIA drivers (595.xx series), this triggers a fatal access violation during the CUDA/cuDNN initialization phase. Since it's a segfault, the try/except RuntimeError block cannot catch it, and the entire ComfyUI process terminates.

The GPU (RTX 4090, sm_89) fully supports bfloat16 — the crash is specifically about when and how the probe runs, not about actual bfloat16 capability.
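The compute-capability rule that the proposed fallback relies on can be stated in isolation. A minimal sketch (the function name is hypothetical; the `major >= 8` threshold mirrors the sm_80 rule used in the fix below, and the tuples are in the format returned by `torch.cuda.get_device_capability()`):

```python
def bf16_supported_by_capability(major: int, minor: int) -> bool:
    """Heuristic: Ampere (sm_80) and newer GPUs support bfloat16 natively."""
    return major >= 8

# Capability tuples as torch.cuda.get_device_capability() would report them
print(bf16_supported_by_capability(8, 9))  # RTX 4090 (sm_89) -> True
print(bf16_supported_by_capability(7, 5))  # Turing (sm_75)   -> False
```

This is why the crash cannot be a genuine capability problem on an sm_89 card: the hardware check passes; only the import-time CUDA allocation fails.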

Proposed fix

Run the bfloat16 probe in a subprocess so that if it crashes, the main process is unaffected:

def _probe_bfloat16_support() -> bool:
    if not torch.cuda.is_available():
        return True

    import os
    import subprocess
    import sys

    # Subprocess-based probe (isolated from access violations in the child)
    try:
        probe_script = (
            "import torch; "
            "a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0'); "
            "_ = torch.matmul(a, a); "
            "print('OK')"
        )

        result = subprocess.run(
            [sys.executable, "-c", probe_script],
            capture_output=True,
            text=True,
            timeout=30,
            env={**os.environ, "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0")},
        )

        # A crashed or failing child (non-zero exit, no "OK") means bf16 is not safe here
        return result.returncode == 0 and "OK" in result.stdout
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        pass

    # Fallback: check GPU compute capability (sm_80+ supports bfloat16)
    try:
        major, _ = torch.cuda.get_device_capability(0)
        return major >= 8
    except Exception:
        return True

The subprocess probe adds roughly two seconds to startup but prevents the fatal crash regardless of driver version. The probe reaches the same bfloat16 verdict as the in-process version, so there is no performance impact at runtime.
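The isolation principle behind the fix can be demonstrated with plain stdlib code, independent of CUDA: when a child process dies from a native-level fault, the parent sees only a non-zero exit status instead of crashing. A minimal sketch (using `os.abort()` to stand in for the access violation):

```python
import subprocess
import sys

# The child aborts at the native level, similar to an access violation:
# no Python exception propagates to the parent, only a failed exit status.
result = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    capture_output=True,
    timeout=30,
)

print(result.returncode != 0)  # True: the crash shows up as a failed exit status
print("parent still alive")    # the parent process keeps running
```

This is exactly why `result.returncode != 0` in the patch is a reliable "bf16 not safe" signal even though `try/except` around the original in-process probe was not.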

Likely impact

Anyone on Windows with PyTorch 2.9+ and recent NVIDIA drivers (595.xx series, March 2026) will hit this crash. It likely affects all GPU models, not just RTX 4090.


Full Diff

diff --git a/src/optimization/compatibility.py b/src/optimization/compatibility.py
index c462022..ab60146 100644
--- a/src/optimization/compatibility.py
+++ b/src/optimization/compatibility.py
@@ -682,17 +682,51 @@ if not os.environ.get("SEEDVR2_OPTIMIZATIONS_LOGGED"):
 
 # Bfloat16 CUBLAS support
 def _probe_bfloat16_support() -> bool:
+    """
+    Probe bfloat16 CUBLAS support using a subprocess to prevent fatal access
+    violations from crashing the main ComfyUI process.
+    
+    On PyTorch 2.9+ with cuDNN >= 91200, calling torch.randn(..., dtype=bfloat16, device='cuda')
+    during module import can trigger a Windows fatal exception: access violation.
+    Running the probe in a subprocess isolates this crash.
+    """
     if not torch.cuda.is_available():
         return True
+    
+    # First try: subprocess-based probe (safe from access violations)
     try:
-        a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0')
-        _ = torch.matmul(a, a)
-        del a
-        return True
-    except RuntimeError as e:
-        if "CUBLAS_STATUS_NOT_SUPPORTED" in str(e):
+        import subprocess
+        import sys
+        
+        probe_script = (
+            "import torch; "
+            "a = torch.randn(8, 8, dtype=torch.bfloat16, device='cuda:0'); "
+            "_ = torch.matmul(a, a); "
+            "print('OK')"
+        )
+        
+        result = subprocess.run(
+            [sys.executable, "-c", probe_script],
+            capture_output=True,
+            text=True,
+            timeout=30,
+            env={**os.environ, "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0")},
+        )
+        
+        if result.returncode == 0 and "OK" in result.stdout:
+            return True
+        else:
+            # Subprocess crashed or returned error - bf16 not safe
             return False
-        raise
+    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
+        pass
+    
+    # Fallback: check GPU compute capability (sm_80+ supports bfloat16)
+    try:
+        major, _ = torch.cuda.get_device_capability(0)
+        return major >= 8
+    except Exception:
+        return True
 
 BFLOAT16_SUPPORTED = _probe_bfloat16_support()
 COMPUTE_DTYPE = torch.bfloat16 if BFLOAT16_SUPPORTED else torch.float16
