RL Post-Training for Dexbotic-π0 using RLinf

Overview

We are pleased to announce a strategic collaboration with RLinf. This document describes how to apply RL post-training to Dexbotic-π0 on the LIBERO benchmark using RLinf.


Environment Setup

Set up the RLinf environment first.

git clone https://github.com/RLinf/RLinf.git
cd RLinf
bash requirements/install.sh embodied --venv dexbotic --model dexbotic --env maniskill_libero
source .venv/bin/activate
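
A quick optional sanity check that the environment is active (generic shell, nothing RLinf-specific):

# Confirm the shell now uses the Python from the activated virtual environment
which python       # should point into the venv created by install.sh
python --version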

Step 1. Apply Supervised Fine-tuning (SFT) with Dexbotic-π0

You can directly download our released checkpoints from Hugging Face, which were trained jointly on the four LIBERO suites, or fine-tune yourself with the command below:

python playground/benchmarks/libero/libero_pi0.py --task train
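
If you use the released checkpoints instead, here is a minimal download sketch; the repo id below is a placeholder assumption, so substitute the actual Dexbotic release on Hugging Face:

# Placeholder repo id -- replace with the actual Dexbotic release on Hugging Face
HF_REPO="Dexbotic/pi0-libero-sft"
huggingface-cli download "$HF_REPO" --local-dir checkpoints/dexbotic_pi0_sft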

Step 2. Apply Post-Training with RLinf

Follow the RLinf training workflow and use the fine-tuned checkpoint from Step 1 in the RLinf config files.

Configuration Files

  • libero_10_ppo_dexbotic_pi0.yaml
  • libero_goal_ppo_dexbotic_pi0.yaml
  • libero_spatial_ppo_dexbotic_pi0.yaml
  • libero_object_ppo_dexbotic_pi0.yaml
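
Each of these configs must point at the checkpoint from Step 1 before launch. A minimal sketch of the edit, where the field name and nesting are assumptions (check the chosen YAML for the real key):

# Sketch only -- the YAML key below is hypothetical; open the chosen config
# and set whichever field holds the policy checkpoint path, e.g.:
#
#   actor:
#     checkpoint_path: /abs/path/to/dexbotic_pi0_sft   # Step 1 output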

Running the Training

Before running, set the checkpoint path in the chosen config to your fine-tuned checkpoint from Step 1, then launch training:

bash examples/embodiment/run_embodiment.sh CHOSEN_CONFIG

Replace CHOSEN_CONFIG with one of the four configs above.
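
For example, to post-train on the LIBERO-10 suite (assuming the script takes the config name without the .yaml suffix; check run_embodiment.sh if it differs):

bash examples/embodiment/run_embodiment.sh libero_10_ppo_dexbotic_pi0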


Step 3. Evaluate with RLinf

Evaluation follows the RLinf OpenPI evaluation guide:

https://github.com/RLinf/RLinf/blob/main/toolkits/eval_scripts_openpi/README.md

Use your trained checkpoint and run the corresponding evaluation command from that guide.


Results

Success rates (%) on the four LIBERO suites:

Model Setting    LIBERO-Spatial    LIBERO-Object    LIBERO-Goal    LIBERO-10    Average
DB-π0 (SFT)      97.6              97.6              94.8           85.0         93.8
+ RLinf-PPO      99.2              99.8              97.2           95.6         97.95
Δ Improvement    +1.6              +2.2              +2.4           +10.6        +4.15
