
SPEED-Q

This is the official implementation of "SPEED-Q: Staged Processing with Enhanced Distillation Towards Efficient Low-Bit On-Device VLM Quantization" (AAAI 2026), a novel framework for low-bit, weight-only quantization of on-device VLMs.

📣 Updates

  • [2026.01.16] 🔥 Our code is now public on GitHub. Models will be released later.
  • [2025.11.12] 🔥 Our paper is available on arXiv.

🌅 Gallery

🎥 Demo

These demos show on-device inference in a fully offline environment: all computation runs locally on the edge device, with no network connectivity.

📊 Qualitative Results

🚀 Quick Start

🛠️ Environment Setup

Python >= 3.10, PyTorch >= 2.0

  • Tested GPUs: A100 (80G)
# Install libraries
$ pip install -r requirements.txt

🧱 Model and Data Preparation

| Models | Download Link |
| --- | --- |
| InternVL3-1B | 🤗 Huggingface |
| InternVL3-2B | 🤗 Huggingface |

The data format follows https://huggingface.co/datasets/Ahren09/llava_zh; details of the datasets used are given in the paper's appendix, and the final list of training datasets is in data/training_dataset.json.
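
For orientation, here is a minimal LLaVA-style sample in the spirit of the referenced llava_zh format. The field names and values are illustrative assumptions, so verify them against data/training_dataset.json:

```python
# A hypothetical LLaVA-style training sample (field names assumed from the
# referenced llava_zh format; not guaranteed to match this repo exactly).
sample = {
    "id": "000001",
    "image": "images/000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nDescribe the image."},
        {"from": "gpt", "value": "A cat is sitting on a windowsill."},
    ],
}
```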

📝 Training

The following walks through the quantization process for InternVL3-1B.

Stage 1: the ViT is quantized using an image-only calibration set.

For ViT quantization we use block-wise AdaRound; the code is based on https://github.com/yhhhli/BRECQ. The quantized ViT weights will be uploaded later.
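
To make the idea concrete, here is a minimal, self-contained sketch of AdaRound-style rounding for a single block. This is not the BRECQ/SPEED-Q code: the per-tensor scale, rectified-sigmoid constants, regularizer weight, and iteration count are all simplifying assumptions.

```python
import torch

def adaround_block(layer: torch.nn.Linear, calib_x: torch.Tensor,
                   n_bits: int = 2, iters: int = 1000) -> None:
    """Learn per-weight rounding so the quantized block matches its FP output."""
    w = layer.weight.detach()
    qmax = 2 ** n_bits - 1
    scale = (w.max() - w.min()).clamp(min=1e-8) / qmax   # per-tensor (assumption)
    zero = torch.round(-w.min() / scale)
    w_floor = torch.floor(w / scale)
    alpha = torch.nn.Parameter(torch.zeros_like(w))      # soft rounding variable
    opt = torch.optim.Adam([alpha], lr=1e-2)
    target = layer(calib_x).detach()                     # full-precision block output
    for _ in range(iters):
        h = torch.clamp(torch.sigmoid(alpha) * 1.2 - 0.1, 0, 1)  # rectified sigmoid
        w_dq = (torch.clamp(w_floor + h + zero, 0, qmax) - zero) * scale
        out = torch.nn.functional.linear(calib_x, w_dq, layer.bias)
        loss = torch.nn.functional.mse_loss(out, target)
        loss = loss + 0.01 * (1 - (2 * h - 1).abs().pow(2)).sum()  # push h to {0, 1}
        opt.zero_grad()
        loss.backward()
        opt.step()
    # commit the learned hard rounding back into the layer
    w_q = torch.clamp(w_floor + (alpha >= 0).float() + zero, 0, qmax)
    layer.weight.data = (w_q - zero) * scale
```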

Stage 2: the projector is trained to better align the quantized ViT (qViT).

$ bash stage2_internvl3_1b_2bit_proj.sh
  • SAVE_DIR: Path to save logs and weights
  • MODEL_PATH: Path to the VLM
  • TEACHER_MODEL_PATH: Path to the bf16 teacher VLM
  • QUANT_VIT_PATH: Path to the quantized ViT weights
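
Conceptually, this stage updates the projector so that the visual features handed to the LLM match the bf16 teacher's. A minimal sketch of a feature-alignment distillation loss for this stage (the module names, the frozen-qViT assumption, and the choice of MSE are ours, not necessarily the script's exact objective):

```python
import torch

def projector_alignment_loss(qvit, projector, teacher_vit, teacher_projector,
                             pixel_values: torch.Tensor) -> torch.Tensor:
    """Align student (qViT -> projector) features with the bf16 teacher's."""
    with torch.no_grad():                     # teacher and qViT stay frozen
        t_feat = teacher_projector(teacher_vit(pixel_values))
        s_vis = qvit(pixel_values)
    s_feat = projector(s_vis)                 # only the projector receives gradients
    return torch.nn.functional.mse_loss(s_feat, t_feat)
```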

Stage 3: with the qViT frozen, the projector and LLM undergo quantization-aware training (QAT).

$ bash stage3_internvl3_1b_2bit_qat.sh
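
During QAT, the forward pass sees fake-quantized weights while gradients flow to the underlying float weights through a straight-through estimator. A minimal sketch of that mechanism (the per-tensor scale and the wrapper design are assumptions, not this repo's implementation):

```python
import torch

class FakeQuantLinear(torch.nn.Module):
    """Linear layer whose weights are fake-quantized on every forward pass."""
    def __init__(self, linear: torch.nn.Linear, n_bits: int = 2):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** n_bits - 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.linear.weight
        scale = (w.max() - w.min()).clamp(min=1e-8) / self.qmax
        zero = torch.round(-w.min() / scale)
        w_q = torch.clamp(torch.round(w / scale) + zero, 0, self.qmax)
        w_dq = (w_q - zero) * scale
        # straight-through estimator: forward uses w_dq, backward sees identity
        w_ste = w + (w_dq - w).detach()
        return torch.nn.functional.linear(x, w_ste, self.linear.bias)
```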

🤗 Evaluation

The quantized SPEED-Q weights will be uploaded later.

Dequantize the quantized weights back to a float ("fake-quant") model.

$ bash save_fake_quant.sh
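
Conceptually, this export maps each integer weight code back to float through its scale and zero point, so standard tooling can load the checkpoint. A sketch (the function and argument names are illustrative, not the script's API):

```python
import torch

def dequantize(w_q: torch.Tensor, scale: torch.Tensor, zero: torch.Tensor) -> torch.Tensor:
    """Map integer weight codes back to float: w = (q - z) * s."""
    return (w_q.float() - zero) * scale
```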

Evaluate with VLMEvalKit.

We evaluate the quantized VLMs using VLMEvalKit (https://github.com/open-compass/VLMEvalKit).

model_name="InternVL3-1B-SPEED-Q-2bit"
python run.py --data HallusionBench --model ${model_name} --verbose
python run.py --data AI2D_TEST --model ${model_name} --verbose
python run.py --data OCRBench --model ${model_name} --verbose
python run.py --data MMBench_DEV_EN_V11 --model ${model_name} --verbose
python run.py --data MMBench_DEV_CN_V11 --model ${model_name} --verbose
python run.py --data MMStar --model ${model_name} --verbose
python run.py --data MMMU_DEV_VAL --model ${model_name} --verbose
python run.py --data ScienceQA_VAL --model ${model_name} --verbose
python run.py --data SEEDBench_IMG --model ${model_name} --verbose

📝 TODO List

| Status | Milestone |
| --- | --- |
| ✅ | Open-source release of SPEED-Q code on GitHub |
| 🚀 | Release the InternVL3-1B-2bit/4bit-SPEED-Q models on Hugging Face, including both ViT and VLM components with quantized weights and corresponding dequantized floating-point weights |
| 🚀 | Provide comprehensive documentation and code for quantization parameters |

📒 Citation

If you find our work useful for your research, please consider citing the paper:

@misc{guo2025speedq,
  title={SPEED-Q: Staged Processing with Enhanced Distillation Towards Efficient Low-Bit On-Device VLM Quantization},
  author={Tianyu Guo and Shanwei Zhao and Shiai Zhu and Chenguang Ma},
  year={2025},
  eprint={2511.08914},
  archivePrefix={arXiv}
}

🔑 License

The models in this repository are licensed under the Apache 2.0 License.
