---
title: NICE Data Selector
createTime: 2025/12/17 12:00:08
permalink: /en/guide/nice/
icon: carbon:select-02
---

# NICE Selector Usage Guide

This document explains how to use the **NICE Selector** for dynamic data selection in the **DataFlex** framework. The method measures gradient similarity between the training set and the validation set: training samples contribute SFT loss gradients, while validation samples contribute policy gradients derived from a reward model. Both sets of gradients are randomly projected, and their similarities are used to select the training samples most aligned with the target samples. The method is based on
[**NICE Data Selection for Instruction Tuning in LLMs with Non-differentiable Evaluation Metric** (ICML 2025)](https://icml.cc/virtual/2025/poster/46560).

---

## 1. Method Overview

The core workflow of the **NICE Selector**:

1. **Data normalization**: Automatically supports formats such as Alpaca and ShareGPT.
2. **Training-set gradients**: Compute gradients for each training sample and project them using TRAK.
3. **Reward-set gradients**: Perform Monte Carlo sampling on validation data, generate responses, score them with a reward model (local vLLM or a remote API), compute policy gradients toward the reward direction, and project them.
4. **Similarity-based selection**: Align and normalize the projected gradients, rank training samples by their average similarity to validation samples, and select the top-k samples for the current training round.
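
The selection stage (step 4) can be sketched as follows. This is an illustrative toy reimplementation with random stand-in gradients, not DataFlex's actual code; the function name and shapes are made up for the example:

```python
import numpy as np

def nice_select(train_projs, val_projs, k):
    """Rank training samples by their mean cosine similarity to the
    projected validation gradients and keep the top-k. Illustrative only."""
    # L2-normalize each projected gradient row
    t = train_projs / np.linalg.norm(train_projs, axis=1, keepdims=True)
    v = val_projs / np.linalg.norm(val_projs, axis=1, keepdims=True)
    # scores[i] = average cosine similarity of train sample i to all val samples
    scores = (t @ v.T).mean(axis=1)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
val = rng.normal(size=(8, 64))       # stand-in projected validation gradients
train = rng.normal(size=(100, 64))   # stand-in projected training gradients
train[7] = val.mean(axis=0)          # plant a sample aligned with the val set
print(nice_select(train, val, 3))    # the planted sample 7 should rank first
```

A sample whose gradient points in the same direction as the validation gradients gets a high score, which is exactly the "aligned with the target samples" criterion described above.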

## 2. Implementation Steps

### Step 1: Environment Setup

```bash
git clone https://github.com/OpenDCAI/DataFlex.git
cd DataFlex
pip install -e .
pip install llamafactory
```

---

### Step 2: NICE Selector Configuration

**Configuration file path:**

```
DataFlex/src/dataflex/configs/components.yaml
```

**Example configuration:**

```yaml
nice:
  name: nice
  params:
    cache_dir: ../dataflex_saves/nice_output
    gradient_type: adam
    proj_dim: 4096
    seed: 123
    save_interval: 16
    reward_model_backend: local_vllm   # choices: [local_vllm, api]
    reward_backend_params:
      local_vllm:
        hf_model_name_or_path: meta-llama/Llama-3.1-8B
        vllm_tensor_parallel_size: 1
        vllm_temperature: 0.0
        vllm_top_p: 0.9
        vllm_max_tokens: 512
        vllm_top_k: 40
        vllm_seed: 42
        vllm_max_model_len: null
        vllm_gpu_memory_utilization: 0.9
      api:
        api_url: https://api.openai.com/v1/chat/completions
        api_key: DF_API_KEY
        model_name: gpt-4o
        temperature: 0.0
    mc_samples: 4
    max_new_tokens: 512
    generation_temperature: 0.7
    max_prompt_length: 4096
```

**Parameter description:**

* `cache_dir`: Path for caching gradient projections and selection results; supports resuming from checkpoints.
* `gradient_type`: `adam` (with first- and second-moment normalization) or `sgd`.
* `proj_dim`: Random projection dimension, controlling the cost/accuracy trade-off of the similarity computation.
* `reward_model_backend`: Reward model backend; `local_vllm` uses local vLLM inference, `api` uses an HTTP service.
* `reward_backend_params`: Backend-specific parameters.
* `mc_samples`: Number of Monte Carlo generations per reward sample, used to stabilize the policy-gradient estimate.
* `max_new_tokens` / `generation_temperature` / `max_prompt_length`: Generation length and sampling settings for the policy model.
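
The role of `proj_dim` can be illustrated with a toy Gaussian random projection in the spirit of TRAK. The dimensions here are shrunk so the sketch runs quickly, and DataFlex's actual projector may differ; the point is that inner products survive projection, with fidelity improving as `proj_dim` grows:

```python
import numpy as np

full_dim, proj_dim = 10_000, 1024          # real runs use e.g. proj_dim: 4096
rng = np.random.default_rng(123)           # mirrors the `seed` parameter
P = rng.normal(size=(full_dim, proj_dim)) / np.sqrt(proj_dim)

g1 = rng.normal(size=full_dim)             # a stand-in per-sample gradient
g2 = g1 + 0.1 * rng.normal(size=full_dim)  # a nearby gradient

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Cosine similarity is approximately preserved by the projection,
# with error on the order of 1/sqrt(proj_dim) (Johnson-Lindenstrauss)
print(round(cos(g1, g2), 2), round(cos(g1 @ P, g2 @ P), 2))
```

This is why a larger `proj_dim` gives more accurate similarity rankings at the cost of more memory and compute per sample.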

---

### Step 3: Dynamic Training Configuration

**Configuration file path:**

```
DataFlex/examples/train_lora/selectors/nice.yaml
```

**Example configuration:**

```yaml
### model
model_name_or_path: meta-llama/Llama-3.1-8B
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 16
lora_alpha: 8

### dataset
dataset: alpaca_en_demo
template: llama3
cutoff_len: 4096
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 0
seed: 42

### output
output_dir: ../dataflex_saves/nice_output
logging_steps: 10
save_steps: 100
plot_loss: true
save_only_model: false
overwrite_output_dir: true

### swanlab
report_to: none   # choices: [none, wandb, tensorboard, swanlab, mlflow]
# use_swanlab: true
# swanlab_project: dynamic_nice_sft
# swanlab_run_name: name
# swanlab_workspace: your_workspace
# swanlab_api_key: xxxxxxx

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### dynamic_train
train_type: dynamic_select
components_cfg_file: src/dataflex/configs/components.yaml
component_name: nice
warmup_step: 10
update_step: 10
update_times: 2

eval_dataset: alpaca_zh_demo
per_device_eval_batch_size: 1
metric_for_best_model: eval_loss
greater_is_better: false
load_best_model_at_end: true
eval_strategy: steps   # choices: [no, steps, epoch]
eval_steps: 10
early_stopping_steps: 3
early_stopping_min_delta: 0.01
```

**Parameter description:**

* `component_name`: Must match the `nice` component in `components.yaml`, which determines the reward backend and projection dimension.
* `warmup_step` / `update_step` / `update_times`: Control the dynamic selection schedule; total steps = `warmup_step + update_step × update_times`.
* `eval_dataset`: Validation set (Alpaca/ShareGPT style); the reward model scores the responses generated from it.
* `output_dir`: Path for saving LoRA adapters and caches.

---

### Step 4: Run Training

```bash
FORCE_TORCHRUN=1 DISABLE_VERSION_CHECK=1 dataflex-cli train examples/train_lora/selectors/nice.yaml
```

---

### Step 5: Model Merge and Export

**Configuration file path:**

```
DataFlex/examples/merge_lora/llama3_lora_sft.yaml
```

**Example configuration:**

```yaml
model_name_or_path: meta-llama/Llama-3.1-8B
adapter_name_or_path: ../dataflex_saves/nice_output
template: llama3
trust_remote_code: true

export_dir: ../dataflex_saves/Llama-3.1-8B_nice_lora_sft
export_size: 5
export_device: cpu   # choices: [cpu, auto]
export_legacy_format: false
```

**Parameter description:**

* `adapter_name_or_path`: Path to the LoRA adapters produced by NICE dynamic selection training.
* `export_dir`: Output directory for the merged full model.

Run the merge and export command:

```bash
llamafactory-cli export llama3_lora_sft.yaml
```

The merged model will be saved to:

```
../dataflex_saves/Llama-3.1-8B_nice_lora_sft
```

## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/en/guide/2k5wjgls/) to systematically evaluate the merged model, and to inspect the scoring logs in `cache_dir` to analyze the reward model's sensitivity to different samples.