feat(engine): sparse delta compression for disk-based weight updates

## Summary

Add sparse delta encoding and compression for the `type="disk"` weight update path, with an estimated ~50-100× reduction in checkpoint transfer volume for RL training.

## Motivation

Between consecutive RL training steps, typically >98% of bf16 parameter elements remain bit-identical. The xccl path already skips unchanged parameters using hash-based detection, but the disk path (`_update_weights_from_disk`) still writes and reads the full model checkpoint on every update.

For cross-region and decentralized RL setups where weight sync happens through shared object storage (S3/GCS), the disk path is the primary transfer mechanism. Sending only the changed elements instead of the full checkpoint would drastically reduce transfer time and storage cost.

## Proposed Approach

1. **Detect changed elements**: After each optimizer step, compare the current weights with the previous weights element-wise in bf16. Only ~1-2% of elements typically change.
2. **Sparse encode**: For each parameter, store only `(indices, values)` of changed elements instead of the full tensor.
3. **Compress**: Apply lossless compression (e.g., zstd) to the sparse representation. Sorting the indices and delta-encoding them (storing the gap between consecutive indices instead of absolute positions) makes the index stream highly compressible.
4. **Checkpoint chain**: Periodically write full "anchor" checkpoints (every N steps). Between anchors, write only sparse deltas. This bounds reconstruction cost.
5. **Reconstruct**: Inference workers download the base checkpoint and the chain of deltas, apply the deltas in order, and verify per-tensor checksums to confirm bit-identical reconstruction.
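Steps 1-3 can be sketched as a minimal, standard-library-only Python example. Assumptions: each bf16 tensor is handled as a flat array of its raw 16-bit patterns (in PyTorch one could get these with `tensor.view(torch.int16)`), zlib stands in for zstd because zstd is not in the standard library on most Python versions, and the function names (`sparse_delta`, `encode`, `decode`, `apply_delta`) and byte layout are hypothetical, not the planned API of `areal/utils/sparse_checkpoint.py`.

```python
import struct
import zlib
from array import array

# Hypothetical sketch: a bf16 tensor is modeled as a flat array of raw 16-bit patterns ("H").


def sparse_delta(prev: array, curr: array) -> tuple[array, array]:
    """Steps 1-2: indices and new bit patterns of the elements that changed."""
    if len(prev) != len(curr):
        raise ValueError("tensor sizes differ")
    # Compare raw bits, not float values, so -0.0 vs 0.0 and NaN payloads count as changes.
    indices = array("I", (i for i, (a, b) in enumerate(zip(prev, curr)) if a != b))
    values = array("H", (curr[i] for i in indices))
    return indices, values


def encode(indices: array, values: array, level: int = 6) -> bytes:
    """Step 3: delta-encode the sorted indices, then compress (zlib stands in for zstd)."""
    gaps = array("I")
    last = 0
    for idx in indices:  # indices are ascending, so every gap is >= 0
        gaps.append(idx - last)
        last = idx
    # Layout: element count, gaps, values. Arrays use native byte order for brevity,
    # and "I" is assumed to be 4 bytes (true on mainstream platforms).
    payload = struct.pack("<I", len(indices)) + gaps.tobytes() + values.tobytes()
    return zlib.compress(payload, level)


def decode(blob: bytes) -> tuple[array, array]:
    payload = zlib.decompress(blob)
    (n,) = struct.unpack_from("<I", payload, 0)
    gaps = array("I")
    split = 4 + n * gaps.itemsize
    gaps.frombytes(payload[4:split])
    values = array("H")
    values.frombytes(payload[split:])
    indices = array("I")
    total = 0
    for gap in gaps:  # prefix sum undoes the delta encoding
        total += gap
        indices.append(total)
    return indices, values


def apply_delta(base: array, indices: array, values: array) -> array:
    out = array("H", base)  # copy, so the base tensor is left untouched
    for i, v in zip(indices, values):
        out[i] = v
    return out


if __name__ == "__main__":
    prev = array("H", [0x3F80] * 10_000)  # 0x3F80 is 1.0 in bf16
    curr = array("H", prev)
    for i in range(0, 10_000, 100):  # change 1% of the elements
        curr[i] = 0x3F81
    blob = encode(*sparse_delta(prev, curr))
    assert apply_delta(prev, *decode(blob)) == curr  # bit-identical round trip
```

Comparing bit patterns rather than float values is what keeps the scheme lossless: a float comparison would treat `-0.0 == 0.0` as unchanged and `NaN != NaN` as always changed. After delta encoding, the gaps between changed indices are typically small integers, which is why the index stream tends to compress better than absolute indices.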

### Key properties

- **Lossless**: Bit-identical reconstruction guaranteed (no floating-point drift)
- **Bounded chain**: Full anchor every N steps prevents unbounded delta accumulation
- **Integrity**: Per-tensor checksum verification after reconstruction
- **Independent of xccl path**: This is a separate optimization for the disk-based weight update flow
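The anchor chain (step 4) and checksum-verified reconstruction (step 5) can be illustrated with a second hypothetical sketch. `chain_for`, `Delta`, `make_delta` and `reconstruct` are invented names, tensors are again flat arrays of bf16 bit patterns, SHA-256 is an assumed choice of per-tensor checksum, and anchors are assumed to be written at step 0 and every `anchor_every` steps after that.

```python
import hashlib
from array import array
from dataclasses import dataclass

Tensors = dict[str, array]  # name -> flat array of bf16 bit patterns ("H")


def chain_for(step: int, anchor_every: int) -> tuple[int, list[int]]:
    """Step 4: which anchor and which deltas are needed to rebuild `step`."""
    anchor = step - step % anchor_every  # assumes anchors at steps 0, N, 2N, ...
    return anchor, list(range(anchor + 1, step + 1))  # at most N - 1 deltas


def tensor_checksum(t: array) -> str:
    return hashlib.sha256(t.tobytes()).hexdigest()


@dataclass
class Delta:
    # name -> (changed indices, new bit patterns); unchanged tensors are omitted
    changes: dict[str, tuple[list[int], list[int]]]
    # name -> checksum each changed tensor must have after this delta is applied
    checksums: dict[str, str]


def make_delta(prev: Tensors, curr: Tensors) -> Delta:
    changes, checksums = {}, {}
    for name, new in curr.items():
        idx = [i for i, (a, b) in enumerate(zip(prev[name], new)) if a != b]
        if idx:
            changes[name] = (idx, [new[i] for i in idx])
            checksums[name] = tensor_checksum(new)
    return Delta(changes, checksums)


def reconstruct(anchor: Tensors, deltas: list[Delta]) -> Tensors:
    """Step 5: apply deltas in order, verifying per-tensor checksums."""
    state = {name: array("H", t) for name, t in anchor.items()}  # never mutate the anchor
    for step, delta in enumerate(deltas, start=1):
        for name, (indices, values) in delta.changes.items():
            t = state[name]
            for i, v in zip(indices, values):
                t[i] = v
            if tensor_checksum(t) != delta.checksums[name]:
                raise ValueError(f"checksum mismatch in {name!r} after delta {step}")
    return state


if __name__ == "__main__":
    anchor = {"w": array("H", [0x3F80] * 8), "b": array("H", [0] * 4)}
    s1 = {k: array("H", v) for k, v in anchor.items()}
    s1["w"][2] = 0x3F81
    s2 = {k: array("H", v) for k, v in s1.items()}
    s2["b"][0] = 0x8000  # -0.0 is bit-different from +0.0
    assert reconstruct(anchor, [make_delta(anchor, s1), make_delta(s1, s2)]) == s2
    assert chain_for(23, 10) == (20, [21, 22, 23])
```

With an anchor every N steps, a worker never applies more than N - 1 deltas, which is the "bounded chain" property. Verifying each touched tensor right after its delta is applied also pinpoints which delta in the chain is corrupt, rather than only reporting a mismatch at the end.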

## Files to modify

- `areal/engine/fsdp_engine.py` — `_update_weights_from_disk()`, `_save_model_to_hf()`
- `areal/experimental/engine/archon_weight_sync.py` — `update_weights_from_disk()`
- `areal/experimental/engine/archon_checkpoint.py` — `save_model_to_hf()`
- New: `areal/utils/sparse_checkpoint.py` — Sparse encoding/decoding/compression utilities
