Dexbotic Docs

One-Stop VLA Development Toolbox for Embodied Intelligence

Paper · Hugging Face · Documentation · License · Chinese

Pretraining · Fine-tuning · Inference · Evaluation
Supporting mainstream policies such as π0, CogACT, OFT, MemVLA, and more

Introduction

Dexbotic is a VLA (Vision-Language-Action) development toolbox built on PyTorch, designed to provide a unified and efficient solution for embodied intelligence research. It ships with built-in environment configurations for mainstream VLA models, allowing users to reproduce, fine-tune, and run inference with cutting-edge VLA algorithms after minimal setup.

  • Ready-to-Use VLA Framework: Centered around VLA models, integrating embodied manipulation and navigation capabilities, supporting multiple cutting-edge algorithms.
  • High-Performance Pre-trained Foundation Models: For mainstream VLA algorithms such as π0 and CogACT, Dexbotic provides multiple optimized pre-trained models.
  • Modular Development Architecture: Built on a "layered configuration + factory registration + entry dispatch" architecture, so users can change configurations, swap models, or add tasks by editing a single experiment script.
  • Unified Cloud and Local Training: Covers both cloud and local training, including cloud platforms such as Alibaba Cloud and Volcano Engine as well as consumer-grade GPUs for local runs.
  • Extensive Robot Compatibility: For mainstream robots such as UR5, Franka, and ALOHA, Dexbotic provides a unified training data format and deployment scripts.
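The "layered configuration + factory registration + entry dispatch" pattern mentioned above can be sketched as follows. This is an illustrative sketch only, not Dexbotic's actual API: every name here (`register_policy`, `ExpConfig`, `build`) is hypothetical.

```python
from dataclasses import dataclass

# Factory registration: a global registry mapping string keys to policy classes.
POLICY_REGISTRY = {}

def register_policy(name):
    """Decorator that registers a policy class under a string key."""
    def wrap(cls):
        POLICY_REGISTRY[name] = cls
        return cls
    return wrap

# Layered configuration: an experiment script only overrides the fields it needs.
@dataclass
class ExpConfig:
    policy: str = "cogact"
    lr: float = 1e-4

@register_policy("cogact")
class CogACTPolicy:
    def __init__(self, cfg):
        self.cfg = cfg

# Entry dispatch: the entry point looks up the requested policy and builds it.
def build(cfg: ExpConfig):
    return POLICY_REGISTRY[cfg.policy](cfg)

policy = build(ExpConfig(lr=3e-4))
```

The point of the pattern is that adding a new algorithm only requires registering a new class and referencing its key in a config, without touching the dispatch code.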

Quick Start

We strongly recommend using Docker for development or deployment to get the best experience.

1. Installation and Environment Setup

# 1. Clone the repository
git clone https://github.com/dexmal/dexbotic.git

# 2. Start Docker container
docker run -it --rm --gpus all --network host \
  -v $(pwd)/dexbotic:/dexbotic \
  dexmal/dexbotic \
  bash

# 3. Activate environment and install dependencies
cd /dexbotic
conda activate dexbotic
pip install -e .

System Requirements: Ubuntu 20.04/22.04, recommended GPUs: RTX 4090, A100, or H100 (8 GPUs recommended for training, 1 GPU for deployment).
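Once inside the container, you can sanity-check that the GPU driver is visible before starting a long training run. The helper below is a minimal, hypothetical check; it only assumes `nvidia-smi` is available, which `docker run --gpus all` normally provides.

```python
import shutil
import subprocess

def gpu_driver_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and exits cleanly,
    i.e. the container can see the host's NVIDIA driver."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    return subprocess.run([exe], capture_output=True).returncode == 0

print("GPU driver visible:", gpu_driver_visible())
```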

Using on Blackwell GPUs

For users with Blackwell architecture GPUs (e.g., B100, RTX 5090), please use the specialized Docker image dexmal/dexbotic:c130t28.

# 1. Start Docker with Blackwell image
docker run -it --rm --gpus all --network host \
  -v /path/to/dexbotic:/dexbotic \
  dexmal/dexbotic:c130t28 \
  bash

# 2. Install Dexbotic inside the container
cd /dexbotic
pip install -e .

2. Usage Guide

Benchmark Results

The tables below compare models trained with Dexbotic (prefixed DB-) against the original models on mainstream simulation benchmarks. More detailed evaluation results are available in Benchmark Results.

Libero

| Model | Average | Libero-Spatial | Libero-Object | Libero-Goal | Libero-10 |
|---|---|---|---|---|---|
| CogACT | 93.6 | 97.2 | 98.0 | 90.2 | 88.8 |
| DB-CogACT | 94.9 | 93.8 | 97.8 | 96.2 | 91.8 |
| π0 | 94.2 | 96.8 | 98.8 | 95.8 | 85.2 |
| DB-π0 | 93.9 | 97.0 | 98.2 | 94.0 | 86.4 |
| MemVLA | 96.7 | 98.4 | 98.4 | 96.4 | 93.4 |
| DB-MemVLA | 97.0 | 97.2 | 99.2 | 98.4 | 93.2 |
| DB-GR00TN1 | 94.8 | 93.0 | 99.6 | 95.2 | 91.4 |
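The Average column appears to be the arithmetic mean of the per-suite success rates. As a worked check, using the DB-MemVLA row of the Libero results above:

```python
# Reproduce the Average column from the per-suite success rates (%)
# of the DB-MemVLA row in the Libero table.
scores = {"Spatial": 97.2, "Object": 99.2, "Goal": 98.4, "Libero-10": 93.2}
average = sum(scores.values()) / len(scores)
print(round(average, 1))  # 97.0
```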

CALVIN

| Model | Average Length | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| CogACT | 3.246 | 83.8 | 72.9 | 64.0 | 55.9 | 48.0 |
| DB-CogACT | 4.063 | 93.5 | 86.7 | 80.3 | 76.0 | 69.8 |
| OFT | 3.472 | 89.1 | 79.4 | 67.4 | 59.8 | 51.5 |
| DB-OFT | 3.540 | 92.8 | 80.7 | 69.2 | 60.2 | 51.1 |
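In the standard CALVIN protocol, Average Length is the expected number of consecutively completed subtasks in a 5-task chain: the sum of the success rates (%) at chain positions 1 through 5, divided by 100. Using the DB-OFT row above:

```python
# CALVIN Average Length: expected number of consecutive subtasks solved,
# computed from the DB-OFT success rates (%) at chain positions 1..5.
success = [92.8, 80.7, 69.2, 60.2, 51.1]
avg_length = sum(success) / 100
print(round(avg_length, 3))  # 3.54
```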

SimplerEnv

| Model | Average | Spoon | Carrot | Stack Blocks | Eggplant |
|---|---|---|---|---|---|
| CogACT | 51.25 | 71.7 | 50.8 | 15 | 67.5 |
| DB-CogACT | 69.45 | 87.5 | 65.28 | 29.17 | 95.83 |
| OFT | 30.23 | 12.5 | 4.2 | 4.2 | 100 |
| DB-OFT | 76.39 | 91.67 | 76.39 | 43.06 | 94.44 |
| MemVLA | 71.9 | 75.0 | 75.0 | 37.5 | 100.0 |
| DB-MemVLA | 84.4 | 100.0 | 66.7 | 70.8 | 100.0 |

ManiSkill2

| Model | Average | PickCube | StackCube | PickSingleYCB | PickSingleEGAD | PickClutterYCB |
|---|---|---|---|---|---|---|
| CogACT | 40 | 55 | 70 | 30 | 25 | 20 |
| DB-CogACT | 58 | 90 | 65 | 65 | 40 | 30 |
| OFT | 21 | 40 | 4 | 55 | 5 | 0 |
| DB-OFT | 63 | 90 | 75 | 55 | 65 | 30 |
| π0 | 66 | 95 | 85 | 55 | 85 | 10 |
| DB-π0 | 65 | 95 | 85 | 65 | 50 | 30 |

RoboTwin2.0

| Model | Average | Adjust Bottle | Grab Roller | Place Empty Cup | Place Phone Stand |
|---|---|---|---|---|---|
| CogACT | 43.8 | 87 | 72 | 11 | 5 |
| DB-CogACT | 58.5 | 99 | 89 | 28 | 18 |

FAQ

Q: Failed to install Flash-Attention

A: For detailed installation instructions and troubleshooting, please refer to the official documentation at https://github.com/Dao-AILab/flash-attention.

Q: Converting RLDS/LeRobot to Dexdata

A: We provide a general data conversion guide in data conversion. An example of LeRobot data conversion can be found in convert_lerobot_to_dexdata, and an example for RLDS data conversion is available in convert_rlds_to_dexdata.
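The actual Dexdata schema is defined in the conversion guides linked above; the sketch below only illustrates the general shape of such a converter (iterate over an episode's steps, remap keys, write one record per timestep). Every field name here is hypothetical, not Dexdata's real format.

```python
import json

def convert_episode(episode, out_path):
    """Hypothetical converter: write one JSON line per timestep
    containing an image reference, the instruction, and the action."""
    with open(out_path, "w") as f:
        for step in episode["steps"]:
            record = {
                "image": step["observation"]["image_path"],
                "instruction": episode["language_instruction"],
                "action": step["action"],
            }
            f.write(json.dumps(record) + "\n")

# Usage with a dummy two-step episode:
episode = {
    "language_instruction": "pick up the cube",
    "steps": [
        {"observation": {"image_path": "ep0/000.jpg"}, "action": [0.1, 0.0]},
        {"observation": {"image_path": "ep0/001.jpg"}, "action": [0.0, 0.2]},
    ],
}
convert_episode(episode, "episode0.jsonl")
```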

Q: Is the RTX 5090 supported?

A: Yes, please refer to Using on Blackwell GPUs.

Support Us

We are continuously improving, with more features coming soon. If you like this project, please give us a star on GitHub. Your support is our motivation to keep moving forward!

If Dexbotic has been helpful in your research work, please consider citing our technical report:

@article{dexbotic,
  title={Dexbotic: Open-Source Vision-Language-Action Toolbox},
  author={Dexbotic Contributors},
  journal={arXiv preprint arXiv:2510.23511},
  year={2025}
}

License

This project is licensed under the MIT License.