Commit 300e4c8

committed
add nice selector
1 parent e7e86da commit 300e4c8

6 files changed

Lines changed: 451 additions & 0 deletions

docs/.vuepress/notes/en/guide.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -25,6 +25,7 @@ export const Guide: ThemeNote = defineNoteConfig({
 'quickstart',
 'tutorial',
 'selector_less',
+'selector_nice',
 'selector_offline_tsds',
 'selector_offline_near',
 'selector_zeroth'
```

docs/.vuepress/notes/zh/guide.ts

Lines changed: 1 addition & 0 deletions
```diff
@@ -25,6 +25,7 @@ export const Guide: ThemeNote = defineNoteConfig({
 'quickstart',
 'tutorial',
 'selector_less',
+'selector_nice',
 'selector_offline_tsds',
 'selector_offline_near',
 'selector_zeroth',
```

docs/en/notes/guide/selector/selector_less.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -70,6 +70,7 @@ less:
 gradient_type: adam
 proj_dim: 4096
 seed: 123
+save_interval: 16
 ```

 **Parameter Description:**
````
Lines changed: 229 additions & 0 deletions

---
title: NICE Data Selector
createTime: 2025/12/17 12:00:08
permalink: /en/guide/nice/
icon: carbon:select-02
---

# NICE Selector Usage Guide

This document describes how to use the **NICE Selector** for dynamic data selection in the **DataFlex** framework. The method measures gradient similarity between the training set and the validation set: training-set gradients come from the SFT loss, while validation-set gradients are policy gradients derived from a reward model. Both are randomly projected, and the similarities between the projections are used to select the training samples most aligned with the target samples. This method is based on
[**NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric** (ICML 2025)](https://icml.cc/virtual/2025/poster/46560).

---
## 1. Method Overview

The core workflow of the **NICE Selector**:

1. **Data normalization**: Automatically supports formats such as Alpaca and ShareGPT.
2. **Training-set gradients**: Compute gradients for each training sample and project them using TRAK.
3. **Reward-set gradients**: Perform Monte Carlo sampling on validation data, generate responses, score them with a reward model (local vLLM or remote API), compute policy gradients toward the reward direction, and project them.
4. **Similarity-based selection**: Align and normalize the projected gradients, rank training samples by their average similarity to validation samples, and select the top-k samples for the current training round.

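The selection step (random projection followed by similarity ranking) can be sketched as follows. This is an illustrative NumPy sketch of the idea, not the DataFlex implementation; the function name and signature are made up for this example.

```python
import numpy as np

def select_top_k(train_grads, val_grads, proj_dim=4096, k=100, seed=123):
    """Illustrative sketch of NICE-style selection: randomly project
    gradients, cosine-normalize, and rank training samples by their mean
    similarity to validation samples.
    train_grads: (n_train, d); val_grads: (n_val, d)."""
    rng = np.random.default_rng(seed)
    d = train_grads.shape[1]
    # Shared random projection, scaled to approximately preserve inner products.
    proj = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
    pt = train_grads @ proj
    pv = val_grads @ proj
    # Normalize projected gradients so similarities are cosine similarities.
    pt /= np.linalg.norm(pt, axis=1, keepdims=True) + 1e-8
    pv /= np.linalg.norm(pv, axis=1, keepdims=True) + 1e-8
    # Score each training sample by its average similarity to the validation set.
    scores = (pt @ pv.T).mean(axis=1)
    return np.argsort(scores)[::-1][:k]
```

A training sample whose gradient points in the same direction as the validation gradients receives a high score and is kept for the next round.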
## 2. Implementation Steps

### Step 1: Environment Setup

```bash
git clone https://github.com/OpenDCAI/DataFlex.git
cd DataFlex
pip install -e .
pip install llamafactory
```

---

### Step 2: NICE Selector Configuration

**Configuration file path:**

```
DataFlex/src/dataflex/configs/components.yaml
```

**Example configuration:**

```yaml
nice:
  name: nice
  params:
    cache_dir: ../dataflex_saves/nice_output
    gradient_type: adam
    proj_dim: 4096
    seed: 123
    save_interval: 16
    reward_model_backend: local_vllm # choices: [local_vllm, api]
    reward_backend_params:
      local_vllm:
        hf_model_name_or_path: meta-llama/Llama-3.1-8B
        vllm_tensor_parallel_size: 1
        vllm_temperature: 0.0
        vllm_top_p: 0.9
        vllm_max_tokens: 512
        vllm_top_k: 40
        vllm_seed: 42
        vllm_max_model_len: null
        vllm_gpu_memory_utilization: 0.9
      api:
        api_url: https://api.openai.com/v1/chat/completions
        api_key: DF_API_KEY
        model_name: gpt-4o
        temperature: 0.0
    mc_samples: 4
    max_new_tokens: 512
    generation_temperature: 0.7
    max_prompt_length: 4096
```

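The backend dispatch implied by this config can be sketched as below. This is an assumed reading of the structure, not the DataFlex loader; the function name is hypothetical.

```python
def resolve_reward_backend(components_cfg):
    """Illustrative sketch: given the parsed components.yaml mapping,
    return the active reward backend name and its backend-specific params."""
    params = components_cfg["nice"]["params"]
    backend = params["reward_model_backend"]
    if backend not in ("local_vllm", "api"):
        raise ValueError(f"unknown reward_model_backend: {backend}")
    # Only the block matching reward_model_backend is used at runtime.
    return backend, params["reward_backend_params"][backend]
```

With `reward_model_backend: local_vllm`, only the `local_vllm` block is consulted; the `api` block is ignored until the backend is switched.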
**Parameter description:**

* `cache_dir`: Path to cache gradient projections and selection results; supports resuming from checkpoints.
* `gradient_type`: `adam` (with first- and second-moment normalization) or `sgd`.
* `proj_dim`: Random projection dimension, controlling the cost/accuracy trade-off of the similarity computation.
* `reward_model_backend`: Reward model backend; `local_vllm` uses local vLLM inference, `api` uses an HTTP service.
* `reward_backend_params`: Backend-specific parameters.
* `mc_samples`: Number of Monte Carlo generations per reward sample, used to stabilize the policy gradient estimate.
* `max_new_tokens` / `generation_temperature` / `max_prompt_length`: Generation length and sampling strategy for the policy model.

---

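How `mc_samples` stabilizes the reward-set step can be sketched as below. This is a generic illustration of Monte Carlo policy-gradient weighting under a mean baseline, with hypothetical `generate_fn` / `reward_fn` callables; it is not the DataFlex code path.

```python
import statistics

def estimate_advantages(prompt, generate_fn, reward_fn, mc_samples=4):
    """Illustrative sketch: draw mc_samples responses for one validation
    prompt, score each with the reward model, and return baseline-subtracted
    weights. These weights scale the per-response log-prob gradients in the
    policy-gradient estimate; more samples give a lower-variance baseline."""
    responses = [generate_fn(prompt) for _ in range(mc_samples)]
    rewards = [reward_fn(prompt, resp) for resp in responses]
    baseline = statistics.mean(rewards)  # simple variance-reduction baseline
    return [(resp, r - baseline) for resp, r in zip(responses, rewards)]
```

Responses scoring above the group mean push the gradient toward the reward direction; those below push away from it.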
### Step 3: Dynamic Training Configuration

**Configuration file path:**

```
DataFlex/examples/train_lora/selectors/nice.yaml
```

**Example configuration:**

```yaml
### model
model_name_or_path: meta-llama/Llama-3.1-8B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 8

### dataset
dataset: alpaca_en_demo
template: llama3
cutoff_len: 4096
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 0
seed: 42

### output
output_dir: ../dataflex_saves/nice_output
logging_steps: 10
save_steps: 100
plot_loss: true
save_only_model: false
overwrite_output_dir: true

### swanlab
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
# use_swanlab: true
# swanlab_project: dynamic_nice_sft
# swanlab_run_name: name
# swanlab_workspace: your_workspace
# swanlab_api_key: xxxxxxx

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### dynamic_train
train_type: dynamic_select
components_cfg_file: src/dataflex/configs/components.yaml
component_name: nice
warmup_step: 10
update_step: 10
update_times: 2

eval_dataset: alpaca_zh_demo
per_device_eval_batch_size: 1
metric_for_best_model: eval_loss
greater_is_better: false
load_best_model_at_end: true
eval_strategy: steps # choices: [no, steps, epoch]
eval_steps: 10
early_stopping_steps: 3
early_stopping_min_delta: 0.01
```

**Parameter description:**

* `component_name`: Must match the `nice` component in `components.yaml`, which determines the reward backend and projection dimension.
* `warmup_step` / `update_step` / `update_times`: Control the dynamic selection schedule; total steps = `warmup_step + update_step × update_times`.
* `eval_dataset`: Validation set (Alpaca/ShareGPT style); the reward model scores the responses generated for it.
* `output_dir`: Path for saving LoRA adapters and caches.

---

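The schedule arithmetic above can be made concrete with a short sketch. The total-step formula comes from the parameter description; the assumption that re-selection triggers at the end of the warmup and then every `update_step` steps is an illustration, not a statement about DataFlex internals.

```python
def selection_schedule(warmup_step, update_step, update_times):
    """Sketch of the dynamic-selection schedule: train warmup_step steps
    on the full data, then re-select every update_step steps, update_times
    times. Returns (assumed re-selection steps, total training steps)."""
    triggers = [warmup_step + i * update_step for i in range(update_times)]
    total = warmup_step + update_step * update_times
    return triggers, total

# With the example config (warmup_step=10, update_step=10, update_times=2):
# re-selection at steps 10 and 20, total = 30 training steps.
```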
### Step 4: Run Training

```bash
FORCE_TORCHRUN=1 DISABLE_VERSION_CHECK=1 dataflex-cli train examples/train_lora/selectors/nice.yaml
```

---

### Step 5: Model Merge and Export

**Configuration file path:**

```
DataFlex/examples/merge_lora/llama3_lora_sft.yaml
```

**Example configuration:**

```yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: ../dataflex_saves/nice_output
template: llama3
trust_remote_code: true

export_dir: ../dataflex_saves/Llama-3.1-8B_nice_lora_sft
export_size: 5
export_device: cpu # choices: [cpu, auto]
export_legacy_format: false
```

**Parameter description:**

* `adapter_name_or_path`: Path to the LoRA adapters produced by NICE dynamic-selection training.
* `export_dir`: Output directory for the merged full model.

Run the merge and export command:

```bash
llamafactory-cli export llama3_lora_sft.yaml
```

The merged model will be saved to:

```
../dataflex_saves/Llama-3.1-8B_nice_lora_sft
```

## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/en/guide/2k5wjgls/) to systematically evaluate the merged model, and to inspect the scoring logs in `cache_dir` to analyze the reward model's sensitivity to different samples.

docs/zh/notes/guide/selector/selector_less.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -67,6 +67,7 @@ less:
 gradient_type: adam
 proj_dim: 4096
 seed: 123
+save_interval: 16
 ```

 **Parameter Description:**
````
