Traceback (most recent call last):
File "C:\intel_260325\infer.py", line 21, in <module>
model = OVModelForVisualCausalLM.from_pretrained(model_dir, ov_config=properties, device="NPU")
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 617, in from_pretrained
return super().from_pretrained(
~~~~~~~~~~~~~~~~~~~~~~~^
model_id,
^^^^^^^^^
...<9 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\modeling_base.py", line 407, in from_pretrained
return from_pretrained_method(
model_id=model_id,
...<9 lines>...
**kwargs,
)
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 582, in _from_pretrained
model = model_cls(
language_model=language_model,
...<6 lines>...
**kwargs,
)
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 4865, in __init__
super().__init__(
~~~~~~~~~~~~~~~~^
language_model=language_model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...<8 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 418, in __init__
self.language_model = OVModelWithEmbedForCausalLM(
~~~~~~~~~~~~~~~~~~~~~~~~~~~^
language_model,
^^^^^^^^^^^^^^^
...<7 lines>...
compile_only=self._compile_only,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 96, in __init__
super().__init__(
~~~~~~~~~~~~~~~~^
model=model,
^^^^^^^^^^^^
...<6 lines>...
**kwargs,
^^^^^^^^^
)
^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 208, in __init__
self.compile()
~~~~~~~~~~~~^^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 114, in compile
super().compile()
~~~~~~~~~~~~~~~^^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 428, in compile
super().compile()
~~~~~~~~~~~~~~~^^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 914, in compile
self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 420, in _compile_model
compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
File "C:\intel_260325\.venv\Lib\site-packages\openvino\_ov_api.py", line 646, in compile_model
super().compile_model(model, device_name, {} if config is None else config),
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'TRShape::broadcast_merge_into(output_shape, input_shapes[1], autob)' failed at src/core/shape_inference/include\eltwise_shape_inference.hpp:28:
While validating node 'opset1::Multiply Multiply_1210 (opset1::Multiply Multiply_1206[0]:f32[1,16,0,128], opset1::Unsqueeze Unsqueeze_1214[0]:f32[1,16,128,1]) -> (f32[?,16,128,128])' with friendly_name 'Multiply_1210':
Argument shapes are inconsistent.
### OpenVINO Version

2026.2.0.dev20260324

### Operating System

Windows System

### Device used for inference

NPU

### Framework

None

### Model used

https://huggingface.co/Qwen/Qwen3.5-2B
### Issue description

I exported Qwen3.5 to OpenVINO using optimum-intel PR #1634, but it fails to run on NPU (it runs normally on CPU and GPU).
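The failing check in the traceback is an elementwise broadcast: the first input to `Multiply_1210` arrives with a zero-sized dimension (`f32[1,16,0,128]`), which cannot broadcast against `f32[1,16,128,1]`. A NumPy analogue makes this concrete (shapes taken from the traceback; NumPy's rules match the numpy-style auto-broadcast that OpenVINO eltwise ops use by default):

```python
import numpy as np

# Second input of the failing Multiply node (Unsqueeze_1214 output).
b = np.ones((1, 16, 128, 1), dtype=np.float32)

# With a non-degenerate third dimension the multiply broadcasts fine:
a_ok = np.ones((1, 16, 128, 128), dtype=np.float32)
print((a_ok * b).shape)  # (1, 16, 128, 128), consistent with the node's
                         # declared output shape f32[?,16,128,128]

# On NPU the first input arrives as [1, 16, 0, 128]: dim -2 is 0 vs 128,
# and neither side is 1, so broadcasting fails -- the same inconsistency
# that OpenVINO's broadcast_merge_into check reports.
a_bad = np.ones((1, 16, 0, 128), dtype=np.float32)
try:
    a_bad * b
except ValueError as e:
    print("broadcast failed:", e)
```

The zero-sized dimension itself is the anomaly: on CPU/GPU that axis is presumably resolved dynamically, while the NPU compile path appears to materialize it as 0.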
The error occurs in a Loop node, which I believe corresponds to the Gated Delta Rule block, based on the layers before and after it in the `openvino_language_model.xml` file. The preceding layers are `in_proj_q/k/v/a/b`, the inputs to the Gated Delta Rule block, while the subsequent layers include `norm`, `in_proj_z`, and `out_proj`. The figure below shows the structure of the Loop node in Qwen3.5-2B. In this diagram, `Multiply_1206`, outlined by a red dotted line, outputs a tensor with an inconsistent shape on NPU, causing the model to fail.

### Step-by-step reproduction
I followed the steps in optimum-intel PR #1634, with some deviations (quantization arguments and device selection), in a venv using Python 3.13.7.
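For orientation, the export step follows the standard `optimum-cli export openvino` flow; the exact arguments below (output directory name, `--weight-format` choice) are illustrative assumptions, not the precise command from PR #1634:

```shell
# Export the model to OpenVINO IR with int8 weight compression.
# Assumes optimum-intel with PR #1634 is installed in the active venv;
# model id and output directory are illustrative.
optimum-cli export openvino \
    --model Qwen/Qwen3.5-2B \
    --weight-format int8 \
    Qwen3.5-2B-int8
```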
Installation instructions:
Exporting cmd-line:
Inference script:
Usage
Short error log
Full log
The error log above was acquired when running `Qwen3.5-2B-int8`. The only difference in the error log for `Qwen3.5-4B-int4` is the numeric suffixes.

### Relevant log output
### Issue submission checklist