
DM0 Tutorial

DM0 is a vision-language-action model built on a dual-expert architecture with merged attention and Flow Matching for continuous action generation. Unlike the CogACT/OFT models, DM0 generates action trajectories with a diffusion-style, flow-matching sampler, producing a whole chunk of future actions at once rather than autoregressively.
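As a rough sketch of what flow-matching action generation looks like: starting from Gaussian noise, a learned velocity field is integrated over a unit time interval to produce the action chunk. Everything below (the `velocity_model` callable, shapes, step count) is illustrative, not DM0's actual API:

```python
import torch

def sample_action_chunk(velocity_model, obs_embedding,
                        chunk_len=16, action_dim=32, num_steps=10):
    """Generate an action chunk by Euler-integrating a learned velocity field.

    `velocity_model(actions, obs_embedding, t)` is a stand-in for DM0's action
    expert: it predicts d(actions)/dt at flow time t. Names and shapes are
    illustrative assumptions, not DM0's real interface.
    """
    actions = torch.randn(1, chunk_len, action_dim)  # start from pure noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)
        v = velocity_model(actions, obs_embedding, t)  # predicted velocity
        actions = actions + dt * v                     # one Euler step toward the data
    return actions  # (1, chunk_len, action_dim): a chunk of future actions
```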

*Figure: DM0 architecture.*

This tutorial follows the same workflow as the main Tutorial but focuses on DM0-specific configurations. Please ensure you have completed the Installation steps before proceeding.

Pretrained Model

| Model | Description | Input Images | Action Dim | Model Size | Link |
| --- | --- | --- | --- | --- | --- |
| DM0-base | DM0 base model with Flow Matching action generation | Up to 3 views | 32D | 2.4B | 🤗 Hugging Face |

Download the pretrained DM0 model into the checkpoints folder:

```bash
mkdir -p checkpoints
cd checkpoints
git clone https://huggingface.co/Dexmal/DM0-base DM0-base
```
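Cloning a Hugging Face repo requires git-lfs to fetch the weight files; if you prefer, the same snapshot can be downloaded with the huggingface_hub Python API (repo id taken from the clone URL above):

```python
from huggingface_hub import snapshot_download

# Fetch the DM0-base weights into the checkpoints folder.
snapshot_download(repo_id="Dexmal/DM0-base", local_dir="checkpoints/DM0-base")
```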

Training

Before starting training, please follow the instructions in ModelZoo.md to download the pretrained DM0 model, and download the Libero dataset as described in Data.md.

Training a Model with Provided Data

We use Libero as an example to demonstrate how to train a DM0 model. The experiment configuration file for this example is located at: playground/benchmarks/libero/libero_dm0.py

  1. Launch Training

```bash
torchrun --nproc_per_node=8 playground/benchmarks/libero/libero_dm0.py
```

We recommend training on 8 × NVIDIA A100/H100 GPUs. If you are using 8 × RTX 4090 GPUs, use the DeepSpeed configuration file scripts/deepspeed/zero3_offload.json to reduce GPU memory usage. Normalization statistics are computed automatically before the first training run if they are not already cached.
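Those statistics are typically per-dimension summaries of the training actions, used to normalize model inputs and outputs. A minimal sketch of the usual recipe (illustrative, not Dexbotic's exact implementation):

```python
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict:
    """Per-dimension statistics over an (N, action_dim) array of training actions."""
    return {
        "mean": actions.mean(axis=0),
        "std": actions.std(axis=0) + 1e-6,  # epsilon avoids division by zero
        "min": actions.min(axis=0),
        "max": actions.max(axis=0),
    }

def normalize(actions: np.ndarray, stats: dict) -> np.ndarray:
    # z-score normalization; some pipelines use min/max scaling instead
    return (actions - stats["mean"]) / stats["std"]
```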

Training a Model with Your Own Data

  1. Prepare Your Own Data

Refer to Data.md for detailed instructions on data preparation. Once created, register your dataset under dexbotic/data/data_source.
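The registration pattern below is a hypothetical sketch: the decorator name, base class, and sample format are assumptions, so mirror an existing entry in dexbotic/data/data_source rather than copying this verbatim:

```python
# Hypothetical sketch -- check existing entries in dexbotic/data/data_source
# for the real registration hook; `register_dataset` is an assumed helper.
from dexbotic.data.data_source import register_dataset

@register_dataset("my_robot_dataset")
class MyRobotDataset:
    """Exposes your converted episodes to the trainer under a dataset name."""

    def __init__(self, data_root: str):
        self.data_root = data_root

    def __len__(self):
        ...  # number of samples

    def __getitem__(self, idx):
        ...  # return one sample: camera images, instruction text, action chunk
```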

  2. Experiment Configuration

Create a new experiment configuration file based on playground/benchmarks/libero/libero_dm0.py and customize the following:

```python
# DM0TrainerConfig
output_dir = [Path to save checkpoints]

# DM0DataConfig
dataset_name = [Name of your registered dataset]
num_images = [Number of camera views in your dataset]

# DM0InferenceConfig
model_name_or_path = [Path to your trained checkpoint]
action_dim = [Your action dimension]
non_delta_mask = [Indices of non-delta dimensions, e.g., gripper]
```
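For concreteness, a filled-in configuration might look like the sketch below. All values are illustrative; non_delta_mask lists the action dimensions that are predicted as absolute values rather than deltas (typically the gripper channel):

```python
# Illustrative values only -- adapt to your robot and dataset.

# DM0TrainerConfig
output_dir = "checkpoints/my_dm0_run"

# DM0DataConfig
dataset_name = "my_robot_dataset"  # name registered in dexbotic/data/data_source
num_images = 2                     # e.g. one wrist camera + one overhead camera

# DM0InferenceConfig
model_name_or_path = "checkpoints/my_dm0_run"
action_dim = 7                     # e.g. 6-DoF end-effector delta + 1 gripper
non_delta_mask = [6]               # gripper is commanded absolutely, not as a delta
```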
  3. Launch Training

```bash
torchrun --nproc_per_node=8 path/to/your_dm0_exp.py
```

Evaluation

We provide pre-trained models for the Libero simulation benchmark. Here we use the Libero pre-trained DM0 model as an example.

First, download the pre-trained model and place it in the checkpoints folder:

```bash
mkdir -p checkpoints/libero
cd checkpoints/libero
git clone https://huggingface.co/Dexmal/DM0-libero DM0-libero
```

Deploy Mode

  1. Start Inference Server

```bash
CUDA_VISIBLE_DEVICES=0 python playground/benchmarks/libero/libero_dm0.py --task inference
```
  2. Test Model Inference Results

```bash
curl -X POST \
  -F "text=What action should the robot take to put both moka pots on the stove?" \
  -F "image=@test_data/libero_test.png" \
  http://localhost:7891/process_frame
```
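The same request can be issued from Python, which is handy when wiring the server into an evaluation loop; the shape of the reply is an assumption here, so inspect what your server actually returns:

```python
import requests

# Multipart POST equivalent to the curl command above.
with open("test_data/libero_test.png", "rb") as f:
    resp = requests.post(
        "http://localhost:7891/process_frame",
        data={"text": "What action should the robot take to put both moka pots on the stove?"},
        files={"image": f},
    )
resp.raise_for_status()
print(resp.text)  # e.g. a JSON-encoded action chunk (format is an assumption)
```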
  3. Test Libero Benchmark with Dexbotic-Benchmark

Set up the dexbotic-benchmark following its instructions and test the deployed model in the LIBERO-GOAL environment.

```bash
cd dexbotic-benchmark
docker run --gpus all --network host -v $(pwd):/workspace \
  dexmal/dexbotic_benchmark \
  bash /workspace/scripts/env_sh/libero.sh /workspace/evaluation/configs/libero/example_libero.yaml
```

dexbotic-benchmark also works without Docker; see its documentation for details.

Real-Robot Evaluation with RoboChallenge

You can evaluate DM0 models on real robots through the RoboChallenge platform using the Dexbotic-RoboChallengeInference framework.

  1. Installation: Install this project (dexbotic) first, then clone and install the inference framework:

```bash
git clone https://github.com/dexmal/Dexbotic-RoboChallengeInference.git
cd Dexbotic-RoboChallengeInference
pip install -r requirements.txt
```
  2. Download Checkpoints: Download task-specific DM0 checkpoints from the DM0-table30-specialist collection:

```bash
huggingface-cli download Dexmal/DM0-table30_put_cup_on_coaster --local-dir ./checkpoints/DM0-table30_put_cup_on_coaster
```
  3. Submit Evaluation: Log in to RoboChallenge, submit an evaluation request, and wait for task assignment.

  4. Run Inference:

```bash
# Online mode (with robot, during assigned evaluation period)
python execute.py --config-name=specialist/put_cup_on_coaster user_id=YOUR_USER_ID
```

For full details on configuration and advanced usage, see the Dexbotic-RoboChallengeInference README.

To evaluate your own trained model, follow the Evaluation section above: update model_name_or_path in the inference config to point at your trained checkpoint, then run inference or start the inference server as described.

Benchmark Results

Libero

| Model | Spatial | Object | Goal | Long | Average |
| --- | --- | --- | --- | --- | --- |
| DM0 | 98.2 | 98.8 | 96.6 | 82.6 | 94.1 |

RoboChallenge

| # | Task Name | DM0 SR/Score | DM0_gen SR/Score | pi0 SR/Score | pi0.5 SR/Score |
| --- | --- | --- | --- | --- | --- |
| 1 | arrange_flowers | 70% / 82.50 | 20% / 49.00 | 50% / 67.50 | 50% / 69.50 |
| 2 | arrange_fruits_in_basket | 100% / 99.50 | 70% / 87.00 | 20% / 22.50 | 40% / 70.50 |
| 3 | arrange_paper_cups | 30% / 73.00 | 10% / 54.00 | 0% / 41.50 | 0% / 48.00 |
| 4 | clean_dining_table | 0% / 20.50 | 0% / 12.00 | 0% / 33.50 | 10% / 58.50 |
| 5 | fold_dishcloth | 20% / 44.00 | 10% / 10.50 | 0% / 32.00 | 20% / 24.00 |
| 6 | hang_toothbrush_cup | 80% / 84.00 | 90% / 95.00 | 50% / 70.00 | 50% / 71.00 |
| 7 | make_vegetarian_sandwich | 0% / 7.00 | 0% / 15.00 | 0% / 17.50 | 0% / 29.50 |
| 8 | move_objects_into_box | 100% / 97.00 | 50% / 64.50 | 50% / 66.00 | 50% / 63.50 |
| 9 | open_the_drawer | 100% / 98.00 | 90% / 95.00 | 0% / 50.00 | 40% / 60.50 |
| 10 | place_shoes_on_rack | 100% / 100.00 | 100% / 98.50 | 80% / 77.00 | 90% / 90.50 |
| 11 | plug_in_network_cable | 80% / 84.00 | 20% / 45.50 | 20% / 45.00 | 20% / 65.00 |
| 12 | pour_fries_into_plate | 40% / 51.00 | 0% / 6.00 | 40% / 56.00 | 30% / 38.00 |
| 13 | put_cup_on_coaster | 100% / 97.50 | 100% / 100.00 | 60% / 71.00 | 90% / 96.00 |
| 14 | put_opener_in_drawer | 30% / 28.00 | 10% / 10.00 | 50% / 71.50 | 80% / 77.50 |
| 15 | press_three_buttons | 90% / 96.00 | 0% / 0.00 | 0% / 0.00 | 0% / 0.00 |
| 16 | put_pen_into_pencil_case | 90% / 96.00 | 20% / 40.00 | 70% / 88.00 | 80% / 89.50 |
| 17 | scan_QR_code | 0% / 7.00 | 0% / 0.00 | 30% / 30.50 | 50% / 55.00 |
| 18 | search_green_boxes | 100% / 98.50 | 100% / 95.50 | 70% / 74.00 | 80% / 80.00 |
| 19 | set_the_plates | 100% / 99.50 | 60% / 62.00 | 10% / 34.50 | 80% / 88.00 |
| 20 | shred_scrap_paper | 30% / 39.00 | 30% / 45.00 | 30% / 59.00 | 0% / 36.00 |
| 21 | sort_books | 20% / 44.50 | 0% / 8.50 | 0% / 24.50 | 0% / 60.00 |
| 22 | sort_electronic_products | 0% / 20.88 | 0% / 18.38 | 0% / 31.12 | 50% / 68.62 |
| 23 | stack_bowls | 100% / 100.00 | 70% / 71.00 | 100% / 98.50 | 100% / 99.50 |
| 24 | stack_color_blocks | 100% / 100.00 | 100% / 100.00 | 70% / 72.25 | 100% / 99.00 |
| 25 | stick_tape_to_box | 40% / 68.00 | 0% / 14.00 | 10% / 28.00 | 10% / 29.00 |
| 26 | sweep_the_rubbish | 80% / 82.00 | 30% / 40.00 | 10% / 27.00 | 20% / 46.00 |
| 27 | turn_on_faucet | 100% / 100.00 | 70% / 84.50 | 20% / 23.00 | 100% / 99.00 |
| 28 | turn_on_light_switch | 80% / 84.00 | 70% / 70.50 | 10% / 40.00 | 40% / 61.00 |
| 29 | water_potted_plant | 80% / 94.00 | 0% / 33.50 | 0% / 6.00 | 0% / 36.50 |
| 30 | wipe_the_table | 0% / 72.00 | 0% / 47.50 | 0% / 35.00 | 0% / 46.00 |
| | Average | 62% / 72.25 | 37% / 49.08 | 28% / 46.41 | 43% / 61.84 |

ObjectNav

| Method | HM3D SR ↑ | HM3D SPL ↑ | MP3D SR ↑ | MP3D SPL ↑ |
| --- | --- | --- | --- | --- |
| VLFM | 52.5 | 30.4 | 36.4 | 17.5 |
| L3MVN | 54.2 | 25.5 | - | - |
| UniGoal | 54.5 | 25.1 | 41.0 | 16.4 |
| OVRL | 62.0 | 26.8 | 28.6 | 7.4 |
| PirlNav | 70.4 | 34.1 | - | - |
| Uni-NaVid | 73.7 | 37.1 | - | - |
| DM0 | 73.5 | 25.7 | 45.3 | 12.9 |