Training and Testing
====================

This tutorial shows how to train and evaluate a policy on FurnitureBench and FurnitureSim.

Prerequisites
~~~~~~~~~~~~~

* Run the following commands to install the dependencies for training and testing.

  .. code::

     cd <path/to/furniture-bench>
     ./install_model_deps.sh

* Now, install JAX with GPU support based on your CUDA version (see the official JAX installation documentation).

* Make sure PyTorch and JAX are installed properly. You can run the following commands to check the installation.

  .. code::

     python -c "import torch; print(f'PyTorch installed successfully, with GPU support = {torch.cuda.is_available()}')"
     python -c "import jax.numpy as jnp; jnp.ones((3,)); print('JAX installed successfully')"

* Prepare training data. You can download the FurnitureBench dataset (:ref:`Dataset`) or generate one in FurnitureSim (:ref:`Automated Assembly Script`).

Evaluating Pre-trained Policies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section shows how to evaluate pre-trained policies for the BC and Implicit Q-Learning (IQL) algorithms.

Evaluating Pre-trained IQL
--------------------------

You can run our pre-trained IQL policies in FurnitureSim using ``implicit_q_learning/test_offline.py``.

.. code::

   cd <path/to/furniture-bench>
   python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --ckpt_step=1000000 --run_name one_leg_full_iql_r3m_low_sim_1000 --randomness low

* If you use one of the ``run_name`` and ``seed`` pairs that we provide, the pre-trained checkpoint will be automatically downloaded from Google Drive. The checkpoint will be saved in ``checkpoint/ckpt/<run_name>.<seed>`` (e.g., ``one_leg_full_iql_r3m_low_sim_1000.42`` for run name ``one_leg_full_iql_r3m_low_sim_1000`` and seed ``42``).

* The table below lists the available pre-trained ``run_name`` and ``seed`` pairs:

  ============================================== ======================================================================
  Run name / seed                                Note
  ============================================== ======================================================================
  ``one_leg_full_iql_r3m_low_sim_1000`` / ``42`` IQL trained with 1000 scripted demos in simulation, low randomness.
  ``one_leg_full_iql_r3m_low_1000`` / ``42``     IQL trained with 1000 real-world demos, low randomness.
  ``one_leg_full_iql_r3m_med_1000`` / ``42``     IQL trained with 1000 real-world demos, medium randomness.
  ``one_leg_full_iql_r3m_mixed_2000`` / ``42``   IQL trained with 2000 real-world demos, low and medium randomness.
  ============================================== ======================================================================

* To evaluate the real-world policies, change ``--env_name`` to the real-world environment ``FurnitureBenchImageFeature-v0``.

Evaluating Pre-trained BC
-------------------------

BC policies are evaluated using ``run.py``.

.. code::

   python -m run env.id=FurnitureSim-v0 env.furniture=one_leg run_prefix=<run prefix> init_ckpt_path=<ckpt path> rolf.encoder_type=<encoder type> is_train=False gpu=<gpu id> env.randomness=<randomness>

   # E.g., evaluate a pre-trained BC policy with a ResNet-18 encoder
   python -m run env.id=FurnitureSim-v0 env.furniture=one_leg run_prefix=one_leg_full_bc_resnet18_low_sim_1000 init_ckpt_path=checkpoints/ckpt/one_leg_full_bc_resnet18_low_sim_1000/ckpt_00000000050.pt rolf.encoder_type=resnet18 is_train=False gpu=0 env.randomness=low

* To evaluate the real-world policies, set ``env.id=FurnitureBenchImage-v0``.
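Under the hood, both evaluation scripts follow the same pattern: create an environment, query the policy for an action, and step the environment until the episode ends. The minimal sketch below illustrates this loop in FurnitureSim with a random policy standing in for a trained one; the ``gym.make`` keyword arguments and the ``step`` return signature are assumptions based on the standard Gym interface and may differ from the exact FurnitureSim API (for instance, FurnitureSim can return batched tensors when simulating multiple environments).

.. code:: python

   # Minimal rollout sketch. The keyword arguments and return values below follow the
   # standard Gym convention and are assumptions, not the exact FurnitureSim signature.
   import gym
   import furniture_bench  # noqa: F401  (registers the FurnitureSim environments)

   env = gym.make("FurnitureSim-v0", furniture="one_leg")

   ob = env.reset()
   done = False
   while not done:
       action = env.action_space.sample()  # replace with your trained policy's output
       ob, reward, done, info = env.step(action)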
Training a Policy from Scratch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We provide a tutorial on how to train a policy from scratch using our codebase.

Preprocess Data for Training
----------------------------

For both BC and IQL training, you need to convert a raw dataset as follows:

.. code::

   python furniture_bench/scripts/preprocess_data.py --in-data-path <input path> --out-data-path <output path>

   # E.g., convert data in `scripted_sim_demo/one_leg` and store it in `scripted_sim_demo/one_leg_processed`
   python furniture_bench/scripts/preprocess_data.py --in-data-path scripted_sim_demo/one_leg --out-data-path scripted_sim_demo/one_leg_processed

To extract skill-specific segmented trajectories, use ``--from-skill`` and ``--to-skill``:

.. code::

   python furniture_bench/scripts/preprocess_data.py --in-data-path <input path> --out-data-path <output path> --from-skill <start skill index> --to-skill <end skill index>

Training BC
-----------

The following command trains a BC policy. You can change ``rolf.encoder_type`` to ``resnet18``, ``resnet32``, ``resnet50``, ``r3m``, or ``vip``. If you want to log using ``wandb``, add these arguments: ``wandb=True wandb_entity=<entity> wandb_project=<project>``.

.. code::

   python -m run run_prefix=<run prefix> rolf.demo_path=<demo path> env.furniture=<furniture> rolf.encoder_type=<encoder type> gpu=<gpu id>

   # E.g., train BC with a ResNet-18 encoder
   python -m run run_prefix=one_leg_full_bc_resnet18_low_sim rolf.demo_path=scripted_sim_demo/one_leg_processed/ env.furniture=one_leg rolf.encoder_type=resnet18 rolf.finetune_encoder=True gpu=0

The setup for BC training is specified in ``rolf/rolf/config/algo/bc.yaml``. This configuration is merged with the default training settings, and the merged configuration is stored in the ``config`` directory, following the naming convention ``FurnitureDummy-v0.bc.<run prefix>.<seed>.yaml``.

Evaluating BC
-------------

To evaluate a BC policy, add ``is_train=False`` and the path to the checkpoint to evaluate, ``init_ckpt_path=log/FurnitureDummy-v0.bc.<run prefix>.<seed>/ckpt/<ckpt file>``.

.. code::

   python -m run env.id=FurnitureSim-v0 run_prefix=<run prefix> env.furniture=<furniture> rolf.encoder_type=<encoder type> gpu=<gpu id> is_train=False init_ckpt_path=<ckpt path>

   # E.g., evaluate BC with a ResNet-18 encoder
   python -m run env.id=FurnitureSim-v0 run_prefix=one_leg_full_bc_resnet18_low_sim env.furniture=one_leg rolf.encoder_type=resnet18 gpu=0 is_train=False init_ckpt_path=log/FurnitureDummy-v0.bc.one_leg_full_bc_resnet18_low_sim.123/ckpt/ckpt_00000000050.pt

Training IQL
------------

1) Extract R3M or VIP features from the demonstrations:

   .. code::

      python implicit_q_learning/extract_feature.py --furniture <furniture> --demo_dir <demo dir> --out_file_path <output file path> [--use_r3m | --use_vip]

      # E.g., extract R3M features from the dataset
      python implicit_q_learning/extract_feature.py --furniture one_leg --demo_dir scripted_sim_demo/one_leg_processed/ --out_file_path scripted_sim_demo/one_leg_sim.pkl --use_r3m

2) Train an IQL policy using the following script. If you want to log using ``wandb``, add these arguments: ``--wandb --wandb_entity <entity> --wandb_project <project>``.

   .. code::

      python implicit_q_learning/train_offline.py --env_name=FurnitureImageFeatureDummy-v0/<furniture> --config=implicit_q_learning/configs/furniture_config.py --run_name <run name> --data_path=<data path> --encoder_type=[vip | r3m]

      # E.g., train IQL with R3M features
      python implicit_q_learning/train_offline.py --env_name=FurnitureImageFeatureDummy-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --run_name one_leg_sim --data_path=scripted_sim_demo/one_leg_sim.pkl --encoder_type=r3m
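Step 1 above replaces raw camera images with features from a frozen visual encoder, so IQL trains on compact vectors instead of pixels. The sketch below illustrates the core of such a feature-extraction pass with the ``r3m`` package; the image batch and shapes are made-up examples, and the snippet does not reproduce the exact behavior or output format of ``extract_feature.py``.

.. code:: python

   # Illustrative R3M feature extraction (uses the public r3m package API; the input
   # batch here is random data, not the FurnitureBench demo format).
   import numpy as np
   import torch
   from r3m import load_r3m

   device = "cuda" if torch.cuda.is_available() else "cpu"
   encoder = load_r3m("resnet50")  # pre-trained, frozen R3M encoder
   encoder.eval()
   encoder.to(device)

   # Hypothetical batch of 224x224 RGB frames from one demonstration.
   images = np.random.randint(0, 255, size=(8, 224, 224, 3), dtype=np.uint8)
   batch = torch.from_numpy(images).permute(0, 3, 1, 2).float().to(device)

   with torch.no_grad():
       features = encoder(batch)  # R3M expects pixel values in [0, 255]

   print(features.shape)  # e.g., (8, 2048) with the ResNet-50 backbone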
Evaluating IQL
--------------

To evaluate an IQL policy, run ``implicit_q_learning/test_offline.py`` as follows:

.. code::

   export XLA_PYTHON_CLIENT_PREALLOCATE=false
   python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/<furniture> --config=implicit_q_learning/configs/furniture_config.py --run_name <run name> --encoder_type=[vip | r3m] --ckpt_step <checkpoint step>

   # E.g., evaluate IQL with R3M features
   python implicit_q_learning/test_offline.py --env_name=FurnitureSimImageFeature-v0/one_leg --config=implicit_q_learning/configs/furniture_config.py --run_name one_leg_sim --encoder_type=r3m --ckpt_step 1000000
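The ``XLA_PYTHON_CLIENT_PREALLOCATE=false`` export keeps JAX from preallocating most of the GPU memory at startup, which is helpful here since the JAX policy and the simulator presumably share the same GPU. If you prefer to set it from Python, it must be set before JAX is first imported; the short sketch below shows that pattern.

.. code:: python

   import os

   # Disable JAX's default GPU memory preallocation so memory is allocated on demand.
   # This must be set before the first `import jax`.
   os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

   import jax
   import jax.numpy as jnp

   print(jax.devices())   # confirm the GPU backend is visible
   print(jnp.ones((3,)))  # allocation now happens lazily instead of up front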