Quantization fixes #32

Merged
spectralflight merged 4 commits into nvidia-cosmos:main from milesial:patch-1 on Jan 15, 2026

Conversation

@milesial
Contributor

  • enables FP8 KV-cache quantization (k and v scales; no q or attention-probability scales)
  • fixes the default saved max model length (256k)
  • uses static FP8 activation scales instead of dynamic ones (tensor-wise, min-max) for better performance
  • copies the tokenizer and preprocessing configs directly, to avoid llmcompressor introducing silent truncation
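The static-vs-dynamic distinction in the third bullet can be sketched as follows. This is an illustrative, simplified example, not the PR's actual `scripts/quantize.py`: it uses a uniform grid rather than a true FP8 encoding, and the helper names are hypothetical. A static scheme computes one tensor-wise scale from calibration data via min-max and reuses it at inference, whereas a dynamic scheme would recompute the scale on every forward pass.

```python
E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def static_fp8_scale(calibration_tensors):
    """Tensor-wise min-max scale: one scalar, fixed after calibration."""
    amax = max(max(abs(v) for v in t) for t in calibration_tensors)
    return amax / E4M3_MAX

def quantize_fp8(values, scale):
    """Quantize with a fixed scale (simplified uniform grid, not real FP8)."""
    return [max(-E4M3_MAX, min(E4M3_MAX, round(v / scale))) * scale
            for v in values]

# Calibration over sample activations fixes the scale once...
calib = [[0.1, -2.24, 0.5], [1.12, -0.3, 0.05]]
scale = static_fp8_scale(calib)  # 2.24 / 448 = 0.005
# ...so inference needs no per-batch max reduction; values beyond the
# calibrated range simply saturate at +/- E4M3_MAX * scale.
print(scale)
```

The performance win comes from skipping the per-batch reduction over activations, at the cost of clipping values that exceed the calibrated range.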

@spectralflight spectralflight self-requested a review January 5, 2026 17:37
@spectralflight (Contributor) left a comment


Thanks! A few minor comments, but otherwise LGTM. Please run `just lint` per https://github.com/nvidia-cosmos/cosmos-reason2/blob/main/CONTRIBUTING.md#test

3 comment threads on scripts/quantize.py
@spectralflight spectralflight merged commit 4caf947 into nvidia-cosmos:main Jan 15, 2026
1 check passed

2 participants