docs/en/notes/guide/mixer/doremi.md
15 additions & 25 deletions
@@ -1,6 +1,6 @@
 ---
 title: DoReMi Data Mixer
-createTime: 2025/01/30 10:00:00
+createTime: 2025/11/27 10:00:00
 icon: material-symbols:balance
 permalink: /en/guide/mixer/doremi/
 ---
@@ -33,9 +33,6 @@ component_name: static # Use static mixer
 mixture_sample_rule: mixture
 init_mixture_proportions: [0.5, 0.5] # Initial weights, uniform distribution
 static_mix: true
-warmup_step: 100
-update_step: 200
-update_times: 3
 ```
 
 **Key Parameters**:
@@ -56,7 +53,7 @@ mixers:
 
 ### Step 2: Proxy Model Weight Optimization
 
-Use the DoReMi algorithm to dynamically optimize domain weights on a small proxy model. The algorithm adjusts weights by computing excess loss for each domain.
+Use the DoReMi algorithm to dynamically optimize domain weights on a small proxy model. The algorithm adjusts weights by computing excess loss for each domain. During training, the algorithm uses uniform sampling for data selection, but the optimized domain weights are recorded and used for loss reweighting in the training step.
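
The behavior described in the added sentence can be illustrated with a minimal sketch. It assumes the exponentiated-gradient update from the DoReMi paper (excess loss clipped at zero, renormalization, uniform smoothing). The function names `update_domain_weights` and `reweighted_loss`, the step size, the smoothing constant, and the loss values are all hypothetical and are not taken from this repository's implementation.

```python
import math


def update_domain_weights(weights, proxy_losses, reference_losses,
                          step_size=1.0, smoothing=1e-3):
    """One exponentiated-gradient update of domain weights (illustrative only)."""
    # Excess loss: how much worse the proxy model is than the reference, clipped at zero.
    excess = [max(p - r, 0.0) for p, r in zip(proxy_losses, reference_losses)]
    # Multiplicative update: domains with larger excess loss gain weight.
    scaled = [w * math.exp(step_size * e) for w, e in zip(weights, excess)]
    total = sum(scaled)
    uniform = 1.0 / len(weights)
    # Renormalize, then mix with the uniform distribution so no weight reaches zero.
    return [(1.0 - smoothing) * s / total + smoothing * uniform for s in scaled]


def reweighted_loss(weights, domain_losses):
    """Loss for a uniformly sampled batch, reweighted by the current domain weights."""
    return sum(w * l for w, l in zip(weights, domain_losses))


# Two domains, starting from the uniform weights used in the config above.
weights = [0.5, 0.5]
proxy = [2.0, 3.0]       # hypothetical per-domain proxy losses
reference = [2.0, 2.5]   # hypothetical per-domain reference losses

new_weights = update_domain_weights(weights, proxy, reference)
loss = reweighted_loss(new_weights, proxy)
```

In this sketch the data is still drawn uniformly; the weights only change how much each domain's loss contributes to the training objective, which matches the description in the added line.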