feat: implement UE8M0 scale format support for FP8 inference#1023

Open
Libres-coder wants to merge 1 commit into deepseek-ai:main from Libres-coder:main

Conversation


@Libres-coder Libres-coder commented Oct 26, 2025

Summary

Implements the UE8M0 (uint8 exponent) scale format for FP8 quantization, as specified in config_v3.1.json. This reduces activation scale memory by 75% (4 bytes → 1 byte) and replaces float scale multiplications with integer exponent additions.

Changes

Core Implementation (inference/kernel.py)

New functions:

  • convert_scale_to_ue8m0() - Convert float32 scales to uint8 format
  • convert_scale_from_ue8m0() - Convert uint8 back to float32

Updated functions:

  • act_quant() / act_quant_kernel() - Support UE8M0 output (uint8 scales)
  • fp8_gemm() / fp8_gemm_kernel() - Use exponent addition for UE8M0
  • weight_dequant() / weight_dequant_kernel() - Support UE8M0 input

Format specification:

# Encoding: uint8 = ceil(log2(scale)) + 127
# Decoding: scale = 2^(uint8 - 127)
# Optimization: exp_a + exp_b instead of scale_a * scale_b
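The encode/decode round-trip described above can be sketched in NumPy as follows. The function names match the ones listed in this PR, but the body is an illustrative sketch, not the repo's Triton implementation:

```python
import numpy as np

def convert_scale_to_ue8m0(scale: np.ndarray) -> np.ndarray:
    """Encode positive float32 scales as uint8 biased exponents:
    uint8 = ceil(log2(scale)) + 127."""
    exp = np.ceil(np.log2(scale)).astype(np.int32) + 127
    return np.clip(exp, 0, 255).astype(np.uint8)

def convert_scale_from_ue8m0(encoded: np.ndarray) -> np.ndarray:
    """Decode uint8 biased exponents back to float32:
    scale = 2^(uint8 - 127)."""
    return np.exp2(encoded.astype(np.int32) - 127).astype(np.float32)
```

Power-of-two scales round-trip exactly; other values round up to the next power of two because of the ceil, so the decoded scale is always ≥ the original, which is the conservative direction for avoiding FP8 overflow.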

Integration (inference/model.py)

  • Added scale_fmt parameter to linear() function
  • Pass scale_fmt from config through model to kernels
  • Configuration flow: config.json → ModelArgs.scale_fmt → Linear.scale_fmt → kernels
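The first hop of that flow (config.json → ModelArgs.scale_fmt) could look like the sketch below. The field names mirror this PR's description; the loader itself is hypothetical, not the repo's actual code:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    dtype: str = "bf16"
    # None selects the original float32 scale path; "ue8m0" selects
    # uint8 exponent scales (backward compatible by default).
    scale_fmt: Optional[str] = None

def load_args(config_text: str) -> ModelArgs:
    """Parse a config.json payload into ModelArgs (illustrative)."""
    cfg = json.loads(config_text)
    return ModelArgs(dtype=cfg.get("dtype", "bf16"),
                     scale_fmt=cfg.get("scale_fmt"))
```

From there, ModelArgs.scale_fmt is threaded through Linear.scale_fmt into the kernels, so a single config key switches the whole scale path.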

Benefits

| Aspect | Improvement |
| --- | --- |
| Activation scale memory | -75% (4 bytes → 1 byte) |
| Memory bandwidth | -75% for scale transfers |
| Computation | Exponent addition (faster than float multiplication) |
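The computation win comes from the identity 2^(ea-127) · 2^(eb-127) = 2^(ea+eb-254): multiplying two UE8M0-encoded scales reduces to one integer addition. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def mul_scales_ue8m0(exp_a: int, exp_b: int) -> float:
    """Product of two UE8M0-encoded scales via exponent addition:
    scale_a * scale_b = 2^(exp_a + exp_b - 254)."""
    return float(np.exp2(int(exp_a) + int(exp_b) - 254))
```

For example, 0.5 encodes to 126 and 4.0 encodes to 129, and 126 + 129 - 254 = 1 gives 2.0, which matches 0.5 × 4.0.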

Backward Compatibility

Zero breaking changes:

  • Default scale_fmt=None uses original float32 path
  • Weight scales remain float32 (safetensors compatibility)
  • Automatic runtime conversion for mixed formats
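The mixed-format case (uint8 activation scales meeting float32 weight scales) can be handled by decoding on the fly. A sketch of such a dispatch helper, assuming the name `as_float_scale` (not from the PR):

```python
import numpy as np

def as_float_scale(scale: np.ndarray) -> np.ndarray:
    """Normalize a scale tensor to float32 (illustrative):
    uint8 UE8M0 scales are decoded, float32 scales pass through."""
    if scale.dtype == np.uint8:
        return np.exp2(scale.astype(np.int32) - 127).astype(np.float32)
    return scale
```

This keeps weight scales as float32 for safetensors compatibility while still letting UE8M0 activation scales flow through the same GEMM path.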

Usage

Already configured in config_v3.1.json:

{
    "dtype": "fp8",
    "scale_fmt": "ue8m0"
}

Run inference:

python inference/generate.py --config configs/config_v3.1.json ...

Files Changed

  • inference/kernel.py - Core implementation (~80 lines)
  • inference/model.py - Integration (~15 lines)

Resolves: #994

@Libres-coder
Author

PTAL, thx @GeeeekExplorer @haswelliris @mowentian
