Training and Testing#
This tutorial shows how to train and evaluate a policy on FurnitureBench and FurnitureSim.
Prerequisites#
Run the following commands to install the dependencies for training and testing.
cd <path/to/furniture-bench>
./install_model_deps.sh
Now, install JAX with GPU support based on your CUDA version (see the official JAX installation documentation).
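For example, on a machine with CUDA 12 the GPU install is typically the following; check the official guide for the command matching your CUDA version.

pip install --upgrade "jax[cuda12]"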
Make sure your PyTorch and JAX are installed properly. You can run the following commands to check the installation.
python -c "import torch; print(f'PyTorch installed successfully, with GPU support = {torch.cuda.is_available()}')" python -c "import jax.numpy as jnp; jnp.ones((3,)); print('JAX installed successfully')"
Prepare training data. You can download the FurnitureBench dataset (Dataset) or generate one in FurnitureSim (Automated Assembly Script).
Evaluating Pre-trained Policies#
This section shows how to evaluate pre-trained BC and Implicit Q-Learning (IQL) policies.
Evaluating Pre-trained IQL#
You can run our pre-trained IQL policies in FurnitureSim using implicit_q_learning/test_offline.py.
cd <path/to/furniture-bench>
python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --ckpt_step=1000000 --run_name one_leg_full_iql_r3m_low_sim_1000 --randomness low
If you use a pair of run_name and seed that we provide, the pre-trained checkpoint will be automatically downloaded from Google Drive. The checkpoint will be saved in checkpoint/ckpt/<run_name>.<seed> (e.g., one_leg_full_iql_r3m_low_sim_1000.42 for run name one_leg_full_iql_r3m_low_sim_1000 and seed 42). The table below lists the pre-trained run_name and seed pairs:
| Run name / seed | Note |
|---|---|
| one_leg_full_iql_r3m_low_sim_1000 / 42 | IQL trained with 1000 scripted demos in simulation, low randomness. |
| | IQL trained with 1000 real-world demos, low randomness. |
| | IQL trained with 1000 real-world demos, medium randomness. |
| | IQL trained with 2000 real-world demos, low and medium randomness. |
To evaluate the real-world policies, you must change --env_name to the real-world environment: FurnitureBenchImageFeature-v0.
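For example, a real-world evaluation command mirrors the simulator command above with only the environment changed; the run name below is a placeholder for one of the real-world checkpoints in the table.

python implicit_q_learning/test_offline.py --env_name=FurnitureBenchImageFeature-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --ckpt_step=1000000 --run_name <run_name> --randomness low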
Evaluating Pre-trained BC#
BC policies are evaluated using run.py.
python -m run env.id=FurnitureSim-v0 env.furniture=one_leg run_prefix=<run_prefix> init_ckpt_path=<path/to/checkpoint> rolf.encoder_type=<encoder_type> is_train=False gpu=<gpu_id> env.randomness=<randomness>
# E.g., evaluate a pre-trained BC policy with ResNet18 encoder
python -m run env.id=FurnitureSim-v0 env.furniture=one_leg run_prefix=one_leg_full_bc_resnet18_low_sim_1000 init_ckpt_path=checkpoints/ckpt/one_leg_full_bc_resnet18_low_sim_1000/ckpt_00000000050.pt rolf.encoder_type=resnet18 is_train=False gpu=0 env.randomness=low
To evaluate the real-world policies, set env.id=FurnitureBenchImage-v0.
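For example, following the structure of the simulator command above (run prefix, checkpoint path, and encoder type are placeholders):

python -m run env.id=FurnitureBenchImage-v0 env.furniture=one_leg run_prefix=<run_prefix> init_ckpt_path=<path/to/checkpoint> rolf.encoder_type=<encoder_type> is_train=False gpu=<gpu_id>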
Training a Policy from Scratch#
We provide a tutorial on how to train a policy from scratch using our codebase.
Preprocess Data for Training#
For both BC and IQL training, you need to convert a raw dataset as follows:
python furniture_bench/scripts/preprocess_data.py --in-data-path <path/to/demos> --out-data-path <path/to/processed/demo>
# E.g., convert data in `scripted_sim_demo/one_leg` and store in `scripted_sim_demo/one_leg_processed`
python furniture_bench/scripts/preprocess_data.py --in-data-path scripted_sim_demo/one_leg --out-data-path scripted_sim_demo/one_leg_processed
To extract skill-specific segmented trajectories, use --from-skill and --to-skill:
python furniture_bench/scripts/preprocess_data.py --in-data-path <path/to/demos> --out-data-path <path/to/processed/demo> --from-skill <skill_index> --to-skill <skill_index>
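For instance, a sketch of extracting a single segment; the skill indices and output path below are illustrative.

# E.g., extract the segment from skill 1 to skill 2
python furniture_bench/scripts/preprocess_data.py --in-data-path scripted_sim_demo/one_leg --out-data-path scripted_sim_demo/one_leg_skill1_2 --from-skill 1 --to-skill 2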
Training BC#
The following command trains a BC policy. You can change rolf.encoder_type to resnet18, resnet32, resnet50, r3m, or vip. If you want to log using wandb, use these arguments: wandb=True wandb_entity=<entity_name> wandb_project=<project_name>.
python -m run run_prefix=<run_prefix> rolf.demo_path=<path/to/processed/demo> env.furniture=<furniture> rolf.encoder_type=<encoder_type> gpu=<gpu_id>
# E.g., train BC with ResNet18 encoder
python -m run run_prefix=one_leg_full_bc_resnet18_low_sim rolf.demo_path=scripted_sim_demo/one_leg_processed/ env.furniture=one_leg rolf.encoder_type=resnet18 rolf.finetune_encoder=True gpu=0
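For instance, the same training run with wandb logging enabled (entity and project names are placeholders):

python -m run run_prefix=one_leg_full_bc_resnet18_low_sim rolf.demo_path=scripted_sim_demo/one_leg_processed/ env.furniture=one_leg rolf.encoder_type=resnet18 rolf.finetune_encoder=True gpu=0 wandb=True wandb_entity=<entity_name> wandb_project=<project_name>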
The setup for BC training is specified in the file rolf/rolf/config/algo/bc.yaml. This configuration is merged with the default training settings, and the merged configuration is stored in the config directory, following the naming convention FurnitureDummy-v0.bc.<run_prefix>.<seed>.yaml.
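For instance, the BC training example above with seed 123 (the seed used in the evaluation example below) would produce:

config/FurnitureDummy-v0.bc.one_leg_full_bc_resnet18_low_sim.123.yaml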
Evaluating BC#
To evaluate a BC policy, add is_train=False and the path of the checkpoint to evaluate: init_ckpt_path=log/FurnitureDummy-v0.bc.<run_prefix>.<seed>/ckpt/<checkpoint name>.
python -m run env.id=FurnitureSim-v0 run_prefix=<run_prefix> env.furniture=<furniture> rolf.encoder_type=<encoder_type> gpu=<gpu_id> is_train=False init_ckpt_path=<path/to/checkpoint>
# E.g., evaluate BC with ResNet18 encoder
python -m run env.id=FurnitureSim-v0 run_prefix=one_leg_full_bc_resnet18_low_sim env.furniture=one_leg rolf.encoder_type=resnet18 gpu=0 is_train=False init_ckpt_path=log/FurnitureDummy-v0.bc.one_leg_full_bc_resnet18_low_sim.123/ckpt/ckpt_00000000050.pt
Training IQL#
Extract R3M or VIP features from the demonstrations:
python implicit_q_learning/extract_feature.py --furniture <furniture> --demo_dir <path/to/data> --out_file_path <path/to/converted_data> [--use_r3m | --use_vip]
# E.g., extract R3M features from the dataset
python implicit_q_learning/extract_feature.py --furniture one_leg --demo_dir scripted_sim_demo/one_leg_processed/ --out_file_path scripted_sim_demo/one_leg_sim.pkl --use_r3m
You can train an IQL policy using the following script. If you want to log using wandb, use these arguments: --wandb --wandb_entity <entity_name> --wandb_project <project_name>.
python implicit_q_learning/train_offline.py --env_name=FurnitureImageFeatureDummy-v0/<furniture> --config=implicit_q_learning/configs/furniture_config.py --run_name <run_name> --data_path=<path/to/pkl> --encoder_type=[vip | r3m]
# E.g., train IQL with R3M features
python implicit_q_learning/train_offline.py --env_name=FurnitureImageFeatureDummy-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --run_name one_leg_sim --data_path=scripted_sim_demo/one_leg_sim.pkl --encoder_type=r3m
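For instance, the same run with wandb logging enabled (entity and project names are placeholders):

python implicit_q_learning/train_offline.py --env_name=FurnitureImageFeatureDummy-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --run_name one_leg_sim --data_path=scripted_sim_demo/one_leg_sim.pkl --encoder_type=r3m --wandb --wandb_entity <entity_name> --wandb_project <project_name>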
Evaluating IQL#
To evaluate an IQL policy, run implicit_q_learning/test_offline.py as follows:
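# Disable JAX's default GPU memory preallocation so the policy and the simulator can share the GPU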
export XLA_PYTHON_CLIENT_PREALLOCATE=false
python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/<furniture> --config=implicit_q_learning/configs/furniture_config.py --run_name <run_name> --encoder_type=[vip | r3m] --ckpt_step <ckpt_step>
# E.g., evaluate IQL with R3M features
python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --run_name one_leg_sim --encoder_type=r3m --ckpt_step 1000000