[Bug]: Inconsistent tensor shape when running Qwen3.5 on NPU #35209

@GH-Jo

Description

OpenVINO Version

2026.2.0.dev20260324

Operating System

Windows System

Device used for inference

NPU

Framework

None

Model used

https://huggingface.co/Qwen/Qwen3.5-2B

Issue description

I exported Qwen3.5 to OpenVINO using optimum-intel PR #1634, but the model fails to run on NPU (it runs normally on CPU and GPU).

The error occurs in a Loop node, which I believe corresponds to the Gated Delta Rule block, based on the surrounding layers found in the openvino_language_model.xml file: the preceding layers are in_proj_q/k/v/a/b, which are the inputs to the Gated Delta Rule block, while the subsequent layers include norm, in_proj_z, and out_proj.

The figure below shows the structure of the Loop node in Qwen3.5-2B. In this diagram, Multiply_1206 (outlined with a red dotted line) outputs a tensor with an inconsistent shape on NPU, causing compilation to fail.

(figure: Loop node structure in Qwen3.5-2B, with Multiply_1206 highlighted)
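For context, the recurrence that such a Loop body typically implements (the Gated Delta Rule linear-attention update) looks roughly like the sketch below. The shapes, gate layout, and exact update form here are my assumptions for illustration, not a readout of the exported graph:

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive per-timestep Gated Delta Rule recurrence (illustrative only).

    q, k: (T, H, Dk); v: (T, H, Dv); alpha, beta: (T, H) per-step gate scalars.
    Returns per-step outputs of shape (T, H, Dv).
    """
    T, H, Dk = q.shape
    Dv = v.shape[-1]
    S = np.zeros((H, Dk, Dv))          # recurrent state, one matrix per head
    out = np.zeros((T, H, Dv))
    for t in range(T):
        for h in range(H):
            kt = k[t, h][:, None]      # (Dk, 1)
            vt = v[t, h][None, :]      # (1, Dv)
            # decay the state, then apply the delta-rule (error-correcting) update
            S[h] = alpha[t, h] * S[h] + beta[t, h] * (kt @ (vt - kt.T @ S[h]))
            out[t, h] = q[t, h] @ S[h]
    return out
```

Because the state update is an explicit per-step loop over the sequence, it is natural for the exporter to emit it as a Loop node, with the elementwise Multiply ops inside the body corresponding to the gating terms.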

Step-by-step reproduction

I followed the steps in optimum-intel PR #1634, with some deviations (quantization arguments and device selection), in a venv using Python 3.13.7.

Installation instructions:

pip install git+https://github.com/rkazants/optimum-intel.git@support_qwen3_5
pip install --pre -U openvino openvino-tokenizers nncf --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
pip install transformers==5.2.0
pip install requests torchvision opencv-python

Exporting cmd-line:

optimum-cli export openvino -m Qwen/Qwen3.5-2B --weight-format int8 --sym Qwen3.5-2B-int8
optimum-cli export openvino -m Qwen/Qwen3.5-4B --weight-format int4 --ratio 1 --sym Qwen3.5-4B-ov-int4

Inference script:

from transformers import AutoProcessor
from transformers.video_utils import load_video
from huggingface_hub import hf_hub_download
from optimum.intel.openvino import OVModelForVisualCausalLM
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, required=True)
parser.add_argument('--device', type=str, default="NPU")
args = parser.parse_args()

processor = AutoProcessor.from_pretrained(args.model_dir)

ov_config = {
    "NPU_USE_NPUW": "YES",
    "NPUW_LLM": "YES",
}

if args.device == "CPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir)  # default device: CPU
elif args.device == "GPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir, device="GPU")
elif args.device == "NPU":
    model = OVModelForVisualCausalLM.from_pretrained(args.model_dir, ov_config=ov_config, device="NPU")

# Prepare video input
video_path = hf_hub_download(
                repo_id="raushan-testing-hf/videos-test",
                filename="sample_demo_1.mp4",
                repo_type="dataset",
            )
input_video, _ = load_video(video_path, num_frames=10, backend="opencv")

messages = [
    {"role": "user", "content": [
        {"type": "video"},
        {"type": "text", "text": "Why is this video funny?"},
    ]}
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], videos=[input_video], return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=100)
output_text = processor.decode(output_ids[0], skip_special_tokens=True)

print(output_text)

Usage

python infer.py --model-dir Qwen3.5-2B-int8 --device NPU
python infer.py --model-dir Qwen3.5-4B-ov-int4 --device NPU

Short error log

RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'TRShape::broadcast_merge_into(output_shape, input_shapes[1], autob)' failed at src/core/shape_inference/include\eltwise_shape_inference.hpp:28:
While validating node 'opset1::Multiply Multiply_1210 (opset1::Multiply Multiply_1206[0]:f32[1,16,0,128], opset1::Unsqueeze Unsqueeze_1214[0]:f32[1,16,128,1]) -> (f32[?,16,128,128])' with friendly_name 'Multiply_1210':
Argument shapes are inconsistent.
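The shapes in the message explain the rejection directly: Multiply_1206 produces f32[1,16,0,128] (note the zero-sized third axis), which cannot broadcast against f32[1,16,128,1]. A minimal NumPy check with the shapes copied from the log reproduces the same incompatibility:

```python
import numpy as np

# Shapes copied from the failing Multiply_1210 node in the log above.
lhs = (1, 16, 0, 128)   # Multiply_1206 output -- note the zero-sized axis
rhs = (1, 16, 128, 1)   # Unsqueeze_1214 output

try:
    np.broadcast_shapes(lhs, rhs)
except ValueError as err:
    print("broadcast rejected:", err)  # 0 vs 128 on axis 2: neither equal nor 1
```

The zero-sized dimension suggests the upstream subgraph produced an empty tensor on NPU for that iteration, which the elementwise shape inference then correctly refuses to broadcast.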
Full log
Traceback (most recent call last):
  File "C:\intel_260325\infer.py", line 21, in <module>
    model = OVModelForVisualCausalLM.from_pretrained(model_dir, ov_config=properties, device="NPU")
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 617, in from_pretrained
    return super().from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~^
        model_id,
        ^^^^^^^^^
    ...<9 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\modeling_base.py", line 407, in from_pretrained
    return from_pretrained_method(
        model_id=model_id,
    ...<9 lines>...
        **kwargs,
    )
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 582, in _from_pretrained
    model = model_cls(
        language_model=language_model,
    ...<6 lines>...
        **kwargs,
    )
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 4865, in __init__
    super().__init__(
    ~~~~~~~~~~~~~~~~^
        language_model=language_model,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 418, in __init__
    self.language_model = OVModelWithEmbedForCausalLM(
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        language_model,
        ^^^^^^^^^^^^^^^
    ...<7 lines>...
        compile_only=self._compile_only,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 96, in __init__
    super().__init__(
    ~~~~~~~~~~~~~~~~^
        model=model,
        ^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 208, in __init__
    self.compile()
    ~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_visual_language.py", line 114, in compile
    super().compile()
    ~~~~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_decoder.py", line 428, in compile
    super().compile()
    ~~~~~~~~~~~~~~~^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 914, in compile
    self.request = self._compile_model(self.model, self._device, ov_config, self.model_save_dir)
                   ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\intel_260325\.venv\Lib\site-packages\optimum\intel\openvino\modeling_base.py", line 420, in _compile_model
    compiled_model = core.compile_model(model, device.upper() if device is not None else device, config=ov_config)
  File "C:\intel_260325\.venv\Lib\site-packages\openvino\_ov_api.py", line 646, in compile_model
    super().compile_model(model, device_name, {} if config is None else config),
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/core.cpp:121:
Exception from src/inference/src/dev/plugin.cpp:54:
Check 'TRShape::broadcast_merge_into(output_shape, input_shapes[1], autob)' failed at src/core/shape_inference/include\eltwise_shape_inference.hpp:28:
While validating node 'opset1::Multiply Multiply_1210 (opset1::Multiply Multiply_1206[0]:f32[1,16,0,128], opset1::Unsqueeze Unsqueeze_1214[0]:f32[1,16,128,1]) -> (f32[?,16,128,128])' with friendly_name 'Multiply_1210':
Argument shapes are inconsistent.

The error log above was captured when running Qwen3.5-2B-int8; the log for Qwen3.5-4B-int4 differs only in the numeric node suffixes.

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
