
[BUG] [CuTeDSL] cutlass_torch.matrix throws torch._dynamo.exc.Unsupported error with torch.compile(fullgraph=True) #3134

@vishnoianil

Description


Which component has the problem?

CuTe DSL

Bug Report

Describe the bug
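I am trying to integrate a CuTe DSL based kernel into vLLM, and I explicitly convert the tensor before passing it to the JIT function. TorchDynamo throws torch._dynamo.exc.Unsupported: Tensor.random_ op when cutlass_torch.matrix is called under @torch.compile(fullgraph=True). As the trace below shows, the call comes from vLLM's _compile_kernel, which runs on a kernel-cache miss while the model's forward pass is being captured.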

(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     super().__init__(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 124, in __init__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 247, in _initialize_kv_caches
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/executor/uniproc_executor.py", line 80, in collective_rpc
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     self.model_runner.profile_run()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_model_runner.py", line 5773, in profile_run
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                                         ^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_model_runner.py", line 5466, in _dummy_run
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     outputs = self.model(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]               ^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self.runnable(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 577, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     model_output = self.model(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                    ^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/compilation/decorators.py", line 596, in __call__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/compilation/wrapper.py", line 176, in aot_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self._compiled_callable.aot_compile((args, kwargs))
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in aot_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return aot_compile_fullgraph(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 195, in aot_compile_fullgraph
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     capture_output = convert_frame.fullgraph_capture(model, args, kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1208, in fullgraph_capture
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return _fullgraph_capture_frame(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1273, in _fullgraph_capture_frame
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     raise e.with_traceback(None) from e.__cause__  # User compiler error
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] torch._dynamo.exc.Unsupported: Tensor.random_ op
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   Explanation: This is currently not supported.
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   Hint: Use the out-of-place version of this op
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   Developer debug context: Tensor.random_(args=[ConstantVariable(int: -2), ConstantVariable(int: 2)], kwargs={})
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]  For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0107.html
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] from user code:
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]    File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 423, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     hidden_states, residual = layer(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 328, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     hidden_states = self.self_attn(positions=positions, hidden_states=hidden_states)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 228, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     qkv, _ = self.qkv_proj(hidden_states)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 921, in apply
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return scheme.apply_weights(layer, x, bias=bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py", line 209, in apply_weights
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self.fp8_linear.apply_weights(layer, x, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/kernels/linear/scaled_mm/ScaledMMLinearKernel.py", line 148, in apply_weights
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     return self.apply_scaled_mm(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/kernels/linear/scaled_mm/cutlass.py", line 170, in apply_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     output = ops.cutlass_scaled_mm(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/_custom_ops.py", line 848, in cutlass_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     out = cutedsl_scaled_mm(a, b, scale_a, scale_b, out_dtype, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 326, in cutedsl_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     compiled = _kernel_cache.get_or_compile(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 65, in get_or_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     self._cache[key] = compile_fn()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 328, in <lambda>
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     lambda: _compile_kernel(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 150, in _compile_kernel
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     a_cpu = cutlass_torch.matrix(l, m, k, False, ab_dtype)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/torch.py", line 273, in matrix
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     torch_tensor = create_and_permute_torch_tensor(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]   File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/torch.py", line 146, in create_and_permute_torch_tensor
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]     f32_torch_tensor = init_torch_tensor.random_(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
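The "from user code" section shows get_or_compile running compile_fn() on a cache miss, so cutlass_torch.matrix executes inside the region Dynamo is capturing. The sketch below illustrates one possible mitigation under stated assumptions: it uses a hypothetical KernelCache class that only mirrors the get_or_compile shape visible in the trace (the real key and compile_fn in scaled_mm_dispatch.py are not shown), and it adds a hypothetical warmup step that fills the cache before torch.compile captures the model. This only keeps compile_fn out of the traced path. Whether the call to the compiled kernel itself traces cleanly is a separate question.

```python
from typing import Callable, Dict, Hashable, Iterable


class KernelCache:
    """Hypothetical stand-in for the kernel cache seen in the trace."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, object] = {}
        self.compile_calls = 0  # counts how often compile_fn actually ran

    def get_or_compile(self, key: Hashable, compile_fn: Callable[[], object]) -> object:
        # On a miss, compile_fn runs right here. In the reported trace this is
        # where _compile_kernel calls cutlass_torch.matrix (and Tensor.random_).
        if key not in self._cache:
            self.compile_calls += 1
            self._cache[key] = compile_fn()
        return self._cache[key]

    def warmup(
        self,
        keys: Iterable[Hashable],
        make_compile_fn: Callable[[Hashable], Callable[[], object]],
    ) -> None:
        # Hypothetical: compile every expected key eagerly, before torch.compile
        # captures the model, so lookups made during capture are cache hits only.
        for key in keys:
            self.get_or_compile(key, make_compile_fn(key))
```

The design choice is to move the miss, not to change how the kernel is compiled: after warmup, a lookup never evaluates its compile_fn, so the example-tensor creation (and its Tensor.random_) never runs while Dynamo is tracing. The catch is that every key (shapes, dtypes) must be known ahead of time.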

Steps/Code to reproduce bug
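No reproduction steps were included. Below is a minimal standalone sketch that we expect to hit the same graph break, assuming a PyTorch version in which Dynamo rejects Tensor.random_ as in the log above. The function names are hypothetical, the in-place pattern is taken from the debug context (Tensor.random_(-2, 2)), and the sketch has not been checked against the reporter's exact PyTorch or CuTe DSL versions.

```python
import torch
import torch._dynamo


def init_like_cutlass_matrix(m: int, k: int) -> torch.Tensor:
    # Same pattern as create_and_permute_torch_tensor in the trace:
    # allocate a tensor, then fill it in place with Tensor.random_(-2, 2).
    return torch.empty(m, k, dtype=torch.float32).random_(-2, 2)


def forward(x: torch.Tensor) -> torch.Tensor:
    # Stands in for a model forward that builds example tensors on a
    # kernel-cache miss while Dynamo is capturing the graph.
    return x @ init_like_cutlass_matrix(x.shape[1], x.shape[1])


def try_fullgraph(x: torch.Tensor) -> str:
    # backend="eager" skips Inductor, so only Dynamo's graph capture is exercised.
    compiled = torch.compile(forward, fullgraph=True, backend="eager")
    try:
        compiled(x)
        return "traced"
    except torch._dynamo.exc.Unsupported as exc:
        return f"graph break: {exc}".splitlines()[0]
```

Calling forward(torch.ones(2, 3)) eagerly works. On affected versions, try_fullgraph(torch.ones(2, 3)) is expected to report the Tensor.random_ graph break instead of "traced".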

Expected behavior
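cutlass_torch.matrix calls the in-place Tensor.random_, which breaks the @torch.compile graph (a hard error with fullgraph=True). It would be great if it could be replaced with an out-of-place equivalent, as the Dynamo hint suggests.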
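A minimal sketch of the out-of-place replacement the Dynamo hint points to. It assumes the generated values only need to match the distribution of random_(-2, 2) from the debug context. The helper name init_operand_out_of_place is hypothetical, and it does not reproduce the permutation or other handling that create_and_permute_torch_tensor performs.

```python
import torch


def init_operand_out_of_place(shape, dtype=torch.float32) -> torch.Tensor:
    # Tensor.random_(-2, 2) fills in place with integers drawn uniformly from [-2, 1].
    # torch.randint(low, high, size) draws from the same range [low, high - 1] but
    # returns a new tensor, so there is no in-place mutation for Dynamo to reject.
    t = torch.randint(-2, 2, shape, dtype=torch.float32)
    # Cast only when a different element type is requested (e.g. half precision).
    return t if dtype == torch.float32 else t.to(dtype)
```

torch.randint is generally traceable by Dynamo, so a helper along these lines is expected to stay inside a fullgraph capture, though this depends on the PyTorch version.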

Environment details (please complete the following information):
NVIDIA H100 / Fedora

