(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] super().__init__(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 124, in __init__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/engine/core.py", line 247, in _initialize_kv_caches
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self.collective_rpc("determine_available_memory")
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/executor/uniproc_executor.py", line 80, in collective_rpc
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] self.model_runner.profile_run()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_model_runner.py", line 5773, in profile_run
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] hidden_states, last_hidden_states = self._dummy_run(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/v1/worker/gpu_model_runner.py", line 5466, in _dummy_run
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] outputs = self.model(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self.runnable(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self._call_impl(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return forward_call(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 577, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] model_output = self.model(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/compilation/decorators.py", line 596, in __call__
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/compilation/wrapper.py", line 176, in aot_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self._compiled_callable.aot_compile((args, kwargs))
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 832, in aot_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return aot_compile_fullgraph(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 195, in aot_compile_fullgraph
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] capture_output = convert_frame.fullgraph_capture(model, args, kwargs)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1208, in fullgraph_capture
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return _fullgraph_capture_frame(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/convert_frame.py", line 1273, in _fullgraph_capture_frame
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] raise e.with_traceback(None) from e.__cause__ # User compiler error
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] torch._dynamo.exc.Unsupported: Tensor.random_ op
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Explanation: This is currently not supported.
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Hint: Use the out-of-place version of this op
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Developer debug context: Tensor.random_(args=[ConstantVariable(int: -2), ConstantVariable(int: 2)], kwargs={})
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0107.html
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] from user code:
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 423, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] hidden_states, residual = layer(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 328, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] hidden_states = self.self_attn(positions=positions, hidden_states=hidden_states)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/models/llama.py", line 228, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] qkv, _ = self.qkv_proj(hidden_states)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/linear.py", line 582, in forward
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] output_parallel = self.quant_method.apply(self, input_, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 921, in apply
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return scheme.apply_weights(layer, x, bias=bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py", line 209, in apply_weights
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self.fp8_linear.apply_weights(layer, x, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/kernels/linear/scaled_mm/ScaledMMLinearKernel.py", line 148, in apply_weights
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] return self.apply_scaled_mm(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/model_executor/kernels/linear/scaled_mm/cutlass.py", line 170, in apply_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] output = ops.cutlass_scaled_mm(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/_custom_ops.py", line 848, in cutlass_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] out = cutedsl_scaled_mm(a, b, scale_a, scale_b, out_dtype, bias)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 326, in cutedsl_scaled_mm
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] compiled = _kernel_cache.get_or_compile(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 65, in get_or_compile
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] self._cache[key] = compile_fn()
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 328, in <lambda>
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] lambda: _compile_kernel(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/vllm/kernels/cutedsl/scaled_mm_dispatch.py", line 150, in _compile_kernel
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] a_cpu = cutlass_torch.matrix(l, m, k, False, ab_dtype)
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/torch.py", line 273, in matrix
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] torch_tensor = create_and_permute_torch_tensor(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] File "/home/cloud-user/avishnoi/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/torch.py", line 146, in create_and_permute_torch_tensor
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] f32_torch_tensor = init_torch_tensor.random_(
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108] Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
(EngineCore pid=538975) ERROR 03-29 22:28:48 [core.py:1108]
Which component has the problem?
CuTe DSL

Bug Report

Describe the bug
I am trying to integrate a CuTe DSL based kernel into vLLM, explicitly converting the tensor before passing it to the JIT function. Torch Dynamo throws `torch._dynamo.exc.Unsupported: Tensor.random_ op` when using `cutlass_torch.matrix` with `@torch.compile(fullgraph=True)`; the full traceback is above.

Steps/Code to reproduce bug
Expected behavior
The in-place `Tensor.random_` call breaks the `@torch.compile(fullgraph=True)` graph, so it would be great if it could be replaced with an out-of-place version of this op.
Environment details
NVIDIA H100 / Fedora