[Bug]: Inconsistent tensor shape when running Qwen3.5 on NPU #35209

@GH-Jo

Description

OpenVINO Version

2026.2.0.dev20260324

Operating System

Windows System

Device used for inference

NPU

Framework

None

Model used

https://huggingface.co/Qwen/Qwen3.5-2B

Issue description

I exported Qwen3.5 to OpenVINO using optimum-intel PR #1634, but the model fails to run on NPU (it runs normally on CPU and GPU).

The error occurs in a Loop node, which I believe corresponds to the Gated Delta Rule block, based on the surrounding layers found in the openvino_language_model.xml file: the preceding layers are in_proj_q/k/v/a/b, which are the inputs to the Gated Delta Rule block, while the subsequent layers include norm, in_proj_z, and out_proj.

The figure below shows the structure of the Loop node in Qwen3.5-2B. In this diagram, Multiply_1206 (outlined with a red dotted line) outputs a tensor with an inconsistent shape on NPU, causing compilation to fail.

(figure: Loop node structure in Qwen3.5-2B, with Multiply_1206 highlighted)
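For context, the recurrence that such a Loop body typically implements (the Gated Delta Rule linear-attention update) looks roughly like the sketch below. The shapes, gate layout, and exact update form here are my assumptions for illustration, not a readout of the exported graph:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive per-timestep Gated Delta Rule recurrence (illustrative only).

    q, k: (T, H, Dk); v: (T, H, Dv); alpha, beta: (T, H) per-step gate scalars.
    Returns per-step outputs of shape (T, H, Dv).
    """
    T, H, Dk = q.shape
    Dv = v.shape[-1]
    S = np.zeros((H, Dk, Dv))          # recurrent state, one matrix per head
    out = np.zeros((T, H, Dv))
    for t in range(T):
        for h in range(H):
            kt = k[t, h][:, None]      # (Dk, 1)
            vt = v[t, h][None, :]      # (1, Dv)
            # decay the state, then apply the delta-rule (error-correcting) update
            S[h] = alpha[t, h] * S[h] + beta[t, h] * (kt @ (vt - kt.T @ S[h]))
            out[t, h] = q[t, h] @ S[h]
    return out
```

Because the state update is an explicit per-step loop over the sequence, it is natural for the exporter to emit it as a Loop node, with the elementwise Multiply ops inside the body corresponding to the gating terms.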

Step-by-step reproduction

I followed the steps in optimum-intel PR #1634, with some deviations (quantization arguments and device selection), in a venv using Python 3.13.7.

Installation instructions:

pip install git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Exporting cmd-line:

optimum-cli export openvino -m Qwen/Qwen3.5-2B --weight-format int8 --sym Qwen3.5-2B-int8
optimum-cli export openvino -m Qwen/Qwen3.5-4B --weight-format int4 --ratio 1 --sym Qwen3.5-4B-ov-int4

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, required=True)
parser.add_argument('--device', type=str, default="NPU")
args = parser.parse_args()

processor = AutoProcessor.from_pretrained(args.model_dir)

ov_config = {
    "NPU_USE_NPUW": "YES",
    "NPUW_LLM": "YES",
}

if args.device == "CPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir)  # default device: CPU
elif args.device == "GPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir, device="GPU")
elif args.device == "NPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir, ov_config=ov_config, device="NPU")

# Prepare video input
video_path = hf_hub_download(
                repo_id="raushan-testing-hf/videos-test",
                filename="sample_demo_1.mp4",
                repo_type="dataset",
            )
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)

Usage

python infer.py --model-dir Qwen3.5-2B-int8 --device NPU
python infer.py --model-dir Qwen3.5-4B-ov-int4 --device NPU

Short error log

RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'TRShape::broadcast_merge_into(output_shape, input_shapes[1], autob)' failed at src/core/shape_inference/include\eltwise_shape_inference.hpp:28:
While validating node 'opset1::Multiply Multiply_1210 (opset1::Multiply Multiply_1206[0]:f32[1,16,0,128], opset1::Unsqueeze Unsqueeze_1214[0]:f32[1,16,128,1]) -> (f32[?,16,128,128])' with friendly_name 'Multiply_1210':
Argument shapes are inconsistent.
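The shapes in the message explain the rejection directly: Multiply_1206 produces f32[1,16,0,128] (note the zero-sized third axis), which cannot broadcast against f32[1,16,128,1]. A minimal NumPy check with the shapes copied from the log reproduces the same incompatibility:

```python
import numpy as np

# Shapes copied from the failing Multiply_1210 node in the log above.
lhs = (1, 16, 0, 128)   # Multiply_1206 output -- note the zero-sized axis
rhs = (1, 16, 128, 1)   # Unsqueeze_1214 output

try:
    np.broadcast_shapes(lhs, rhs)
except ValueError as err:
    print("broadcast rejected:", err)  # 0 vs 128 on axis 2: neither equal nor 1
```

The zero-sized dimension suggests the upstream subgraph produced an empty tensor on NPU for that iteration, which the elementwise shape inference then correctly refuses to broadcast.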
Full log
Traceback (most recent call last):
  File "C:\intel_260325\infer.py", line 21, in <module>
    model = OVModelForVisualCausalLM.from_pretrained(model_dir, ov_config=properties, device="NPU")
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 617, in from_pretrained
    return super().from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~^
        model_id,
        ^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\modeling_base.py", line 407, in from_pretrained
    return from_pretrained_method(
        model_id=model_id,
    ...<9 lines>...
        **kwargs,
    )
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 582, in _from_pretrained
    model = model_cls(
        language_model=language_model,
    ...<6 lines>...
        **kwargs,
    )
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 4865, in __init__
    super().__init__(
    ~~~~~~~~~~~~~~~~^
        language_model=language_model,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 418, in __init__
    self.language_model = OVModelWithEmbedForCausalLM(
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        language_model,
        ^^^^^^^^^^^^^^^
    ...<7 lines>...
        compile_only=self._compile_only,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 96, in __init__
    super().__init__(
    ~~~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 208, in __init__
    self.compile()
    ~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 114, in compile
    super().compile()
    ~~~~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 428, in compile
    super().compile()
    ~~~~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 914, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 420, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
  File "C:\intel_260325\.venv\Lib\site-packages\openvino\_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'TRShape::broadcast_merge_into(output_shape, input_shapes[1], autob)' failed at src/core/shape_inference/include\eltwise_shape_inference.hpp:28:
While validating node 'opset1::Multiply Multiply_1210 (opset1::Multiply Multiply_1206[0]:f32[1,16,0,128], opset1::Unsqueeze Unsqueeze_1214[0]:f32[1,16,128,1]) -> (f32[?,16,128,128])' with friendly_name 'Multiply_1210':
Argument shapes are inconsistent.

The error log above was captured when running Qwen3.5-2B-int8; the log for Qwen3.5-4B-int4 differs only in the numeric node suffixes.

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
