```bash
# Navigate to the dynamics model directory before running the following commands
cd RISE/dynamics/dynamics_model
```
The framework expects data in the LeRobot format. For optimal training performance, we strongly recommend pre-resizing videos to 256x192 resolution.
All tasks should be organized in the dataset directory with the following structure:
```bash
# copy your dataset under the dataset directory
cp -r path/to/your/dataset dataset/
```

Each dataset is organized as follows:
```
task_A/
├── data/
│   └── chunk-000/
│       ├── episode_000000.parquet
│       ├── episode_000001.parquet
│       ├── episode_000002.parquet
│       └── ...
├── meta/
│   ├── info.json
│   ├── episodes.jsonl
│   ├── episodes_stats.jsonl
│   └── tasks.jsonl
└── videos/
    └── chunk-000/
        └── [video files]
```
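Before preprocessing, it can be useful to verify that a task directory matches the layout above. A minimal sketch (the `check_lerobot_layout` helper is hypothetical, not part of the repo; it only checks the directories and metadata files shown above):

```python
from pathlib import Path


def check_lerobot_layout(task_dir: str) -> list[str]:
    """Return the list of expected paths that are missing from task_dir.

    An empty result means the directory matches the LeRobot layout
    described above (data/chunk-000, the four meta files, videos/chunk-000).
    """
    root = Path(task_dir)
    expected = [
        root / "data" / "chunk-000",
        root / "meta" / "info.json",
        root / "meta" / "episodes.jsonl",
        root / "meta" / "episodes_stats.jsonl",
        root / "meta" / "tasks.jsonl",
        root / "videos" / "chunk-000",
    ]
    return [str(p) for p in expected if not p.exists()]
```

A non-empty return value pinpoints exactly which pieces of the layout are missing, which is easier to act on than a generic loader error later in training.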
The `preprocess.sh` script resizes all videos in the dataset to 256x192 resolution using ffmpeg, preserving aspect ratio with center padding. Processed videos are saved in `videos_small/` while maintaining the original directory structure.
Usage:

```bash
# Process specific datasets (the second argument is optional)
./preprocess.sh dataset1 [dataset2]
```

The output would be as follows, with `videos_small/` added:
```
task_A/
├── data/
├── meta/
├── videos/
└── videos_small/
    └── chunk-000/
        └── [video files]
```
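The aspect-preserving resize reduces to a scale-then-center-pad computation. A minimal sketch of that geometry (the helper name and exact rounding are assumptions; `preprocess.sh` itself drives ffmpeg):

```python
def resize_with_padding(src_w: int, src_h: int, dst_w: int = 256, dst_h: int = 192):
    """Compute the scaled frame size and centered padding offsets used when
    fitting a src_w x src_h frame onto a dst_w x dst_h canvas while
    preserving aspect ratio (letterbox/pillarbox)."""
    # Scale so the frame fits entirely inside the target canvas.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Split the leftover space evenly to center the frame.
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y
```

For example, a 1280x720 video scales to 256x144 and receives 24 pixels of padding above and below; a 640x480 video scales exactly to 256x192 with no padding.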
Download the LTX-Video backbone components (Text Encoder, Tokenizer, and VAE) using the provided script:
```bash
./download.sh
```

This script automatically downloads all required components from the LTX-Video HuggingFace repository to the `checkpoints` directory.
Alternatively, you can manually download the following components:
- Text Encoder: `text_encoder`
- Tokenizer: `tokenizer`
- VAE: `vae`
- Pre-trained dynamics model: `dynamics_model`, pretrained jointly on Galaxea Open World and AgiBot World Alpha.
Place all downloaded weights in the same directory and update the `pretrained_model_name_or_path` field in your configuration file.
Pre-training is performed on large-scale robotic datasets to learn general dynamics priors. We utilize the following datasets:
- Galaxea Open World Dataset: Galaxea-Open-World-Dataset
- AgiBot World Alpha: AgiBotWorld-Alpha
- Prepare Data: Convert your datasets to the LeRobot format as described above.

- Configure Training: Edit `configs/ltx_model/pretrain.yaml` according to the comments:
  - Set `pretrained_model_name_or_path` to your LTX backbone checkpoint directory
  - Set `diffusion_model.model_path` to your pre-trained diffusion checkpoint
  - Configure `data.train.data_roots` and `data.val.data_roots` to point to your dataset directories

- Launch Training:

  ```bash
  bash train_task_centric.sh
  ```
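Under those settings, the relevant portion of `pretrain.yaml` might look roughly like the following. This is an illustrative sketch only: the paths are placeholders and the surrounding YAML structure is an assumption; follow the comments inside the file itself.

```yaml
pretrained_model_name_or_path: checkpoints        # LTX backbone directory
diffusion_model:
  model_path: checkpoints/dynamics_model          # pre-trained diffusion checkpoint
data:
  train:
    data_roots: dataset/task_A
  val:
    data_roots: dataset/task_A
```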
Fine-tuning adapts the pre-trained model to specific task domains using domain-specific datasets.
- Prepare Task-Specific Data: Organize your fine-tuning dataset in the LeRobot format.

- Compute Action Normalization Statistics: Use `norm.py` to compute and save normalization statistics:

  ```bash
  python norm.py --datasets <your_finetune_dataset> --save-config data/utils/action_norm.json
  ```

  This automatically computes min and max values for each dataset and saves them to a JSON configuration file.

- Configure Fine-tuning: Edit `configs/ltx_model/finetune.yaml`:
  - Set `pretrained_model_name_or_path` to your LTX backbone checkpoint directory
  - Set `diffusion_model.model_path` to your diffusion checkpoint
  - Configure `data.train.data_roots` and `data.val.data_roots` for your fine-tuning dataset
  - Add `norm_config_path: data/utils/action_norm.json` to both `data.train` and `data.val` sections; the data loader will automatically use the normalization values from the config file based on dataset names

- Launch Fine-tuning:

  ```bash
  bash task_finetune.sh
  ```
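Conceptually, the normalization statistics are just per-dataset, per-dimension min/max values. A minimal sketch of that computation (function name and JSON layout are assumptions; `norm.py` is the authoritative implementation):

```python
import json


def compute_action_norm(actions_by_dataset: dict, save_path: str) -> dict:
    """Compute per-dimension min/max action statistics for each dataset
    and save them as a JSON config.

    actions_by_dataset maps a dataset name to a list of action vectors
    (each a list of floats of equal length).
    """
    stats = {}
    for name, actions in actions_by_dataset.items():
        # zip(*actions) iterates over action dimensions (columns).
        stats[name] = {
            "min": [min(col) for col in zip(*actions)],
            "max": [max(col) for col in zip(*actions)],
        }
    with open(save_path, "w") as f:
        json.dump(stats, f, indent=2)
    return stats
```

Keying the statistics by dataset name is what lets the data loader look up the right min/max values automatically during fine-tuning.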
The inference pipeline generates future video sequences conditioned on initial observations and action sequences.
- Configure Inference: Edit `configs/ltx_model/infer.yaml`:
  - Set `pretrained_model_name_or_path` to your LTX backbone checkpoint directory
  - Set `diffusion_model.model_path` to your diffusion checkpoint

- Update Inference Script: Edit `infer.sh` with appropriate paths.

- Run Inference:

  ```bash
  bash infer.sh
  ```
- `--config_file`: Path to inference configuration file
- `--image_root`: Directory containing input observation images
- `--output_path`: Directory to save generated videos
- `--act_tokens_path`: Path to action token file (`.pt` format)
- `--norm_constant`: Normalization constant for action tokens (e.g., `FINETUNE_TASK`)
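The flags above correspond to a command-line interface that could be sketched with `argparse` as follows. This is a hypothetical reconstruction for illustration, not the script's actual parser:

```python
import argparse


def build_infer_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the inference flags listed above."""
    p = argparse.ArgumentParser(description="Dynamics model inference (sketch)")
    p.add_argument("--config_file", required=True,
                   help="Path to inference configuration file")
    p.add_argument("--image_root", required=True,
                   help="Directory containing input observation images")
    p.add_argument("--output_path", required=True,
                   help="Directory to save generated videos")
    p.add_argument("--act_tokens_path", required=True,
                   help="Path to action token file (.pt format)")
    p.add_argument("--norm_constant", required=True,
                   help="Normalization constant key for action tokens")
    return p
```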

