Troubleshooting

[Q&A] Polymetis

Q: What should I do if I get a communication_constraints_violation error on the server while using Polymetis (a library for interfacing with Franka robots)?

A: Consider disabling CPU frequency scaling. Refer to this page for instructions.
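Before changing anything, you can check whether frequency scaling is active by reading each CPU's cpufreq governor. This is a minimal bash sketch. non_performance_cpus is a hypothetical helper, and it assumes the standard Linux sysfs layout /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor.

```shell
# List CPUs whose cpufreq governor is not "performance".
# Assumes the Linux cpufreq sysfs layout <root>/cpuN/cpufreq/scaling_governor;
# the optional first argument overrides the root (e.g. for testing).
non_performance_cpus() {
  local root="${1:-/sys/devices/system/cpu}"
  local f cpu gov
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue            # no match, or cpufreq not available
    cpu="${f#"$root"/}"                # e.g. "cpu3/cpufreq/scaling_governor"
    cpu="${cpu%%/*}"                   # keep only "cpu3"
    gov="$(cat "$f")"
    if [ "$gov" != "performance" ]; then
      printf '%s %s\n' "$cpu" "$gov"
    fi
  done
}
```

Empty output means every CPU already uses the performance governor. If cpupower (from the linux-tools packages) is installed, sudo cpupower frequency-set -g performance switches the governor, but the change typically does not survive a reboot.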

Q: What should I do if I encounter the OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory error?

A: This error, like a Segmentation fault (core dumped) error on the client PC while using the robot, is likely caused by a linking error after PyTorch was reinstalled. Rebuild fairo by running the commands below inside the Docker container.

$ cd /fairo/polymetis/polymetis/build/
$ make -j
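To check whether a library built by Polymetis has the problem, run ldd on it. ldd lists the shared libraries that could not be resolved. This is a minimal bash sketch. missing_libs is a hypothetical helper that filters ldd output, and the library path in the usage line is a placeholder.

```shell
# Print the names of shared libraries that ldd could not resolve.
# Reads ldd output on stdin, where unresolved entries look like:
#   libtorch_cuda.so => not found
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
```

Usage: ldd <path/to/built/library.so> | missing_libs. If libtorch_cuda.so is printed, rebuild with the commands above.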

[Q&A] Docker Build

Q: I'm encountering the error git@github.com: Permission denied (publickey). fatal: Could not read from remote repository. when cloning fairo while building the Docker image. How can I fix this?

A: Ensure your public key is added to your GitHub account. Then, run the following commands:

eval "$(ssh-agent -s)"
# Add your SSH private key to the agent
ssh-add ~/.ssh/id_rsa

# Verify the connection to GitHub
ssh -T git@github.com
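ssh -T git@github.com exits with a non-zero status even when authentication succeeds, because GitHub does not provide shell access. A script therefore has to inspect the message rather than the exit code. This is a minimal bash sketch. github_ssh_status is a hypothetical helper. The strings it matches are GitHub's usual success greeting and the error quoted above.

```shell
# Classify the output of `ssh -T git@github.com` (read from stdin).
# Prints "ok", "denied", or "unknown".
github_ssh_status() {
  local out
  out="$(cat)"
  case "$out" in
    *"successfully authenticated"*)    echo ok ;;
    *"Permission denied (publickey)"*) echo denied ;;
    *)                                 echo unknown ;;
  esac
}
```

Usage: ssh -T git@github.com 2>&1 | github_ssh_status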

Q: I'm encountering issues while building Polymetis from source. How can I resolve them?

A: Here are some common errors and solutions.

  • For /home/linuxbrew/.linuxbrew/Cellar/openssl@1.1/1.1.1t/lib/libcrypto.so: undefined reference to `dlsym@GLIBC_2.34' error:

    Follow the steps below.

    # 1. Unlink the brew openssl (https://github.com/Homebrew/homebrew-core/issues/118825)
    brew unlink openssl@1.1
    
    # 2. Build Polymetis again.
    cd <path/to/fairo>/polymetis/polymetis/
    
    # Remove the build directory if it exists.
    rm -rf build
    mkdir build && cd build
    # Rebuild.
    cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_FRANKA=OFF -DBUILD_TESTS=OFF -DBUILD_DOCS=OFF
    make -j
    
  • For Failed to detect a default CUDA architecture. error:

    Make sure the CUDA path is set correctly.

    # E.g.,
    export PATH=/usr/local/cuda-11.7/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
    

    Ensure you can run the following command.

    nvcc -V
    

    You should see something like this.

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Jun__8_16:49:14_PDT_2022
    Cuda compilation tools, release 11.7, V11.7.99
    Build cuda_11.7.r11.7/compiler.31442593_0
    
  • For /home/user/fairo/polymetis/polymetis/torch_isolation/include/torch_server_ops.hpp:56:39: error: ‘size_t’ has not been declared error:

    Add #include <stddef.h> at the top of the torch_server_ops.hpp file, and rebuild.

    1. Open the file.

    vim <path/to/fairo>/polymetis/polymetis/torch_isolation/include/torch_server_ops.hpp
    
    2. Add the following line at the top of the file.

    #include <stddef.h>
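You can check the three fixes above from the shell. This is a minimal bash sketch, and path_entries_matching, cuda_release, check_cuda_release, and add_stddef_include are hypothetical helpers, not part of Polymetis. path_entries_matching lists Linuxbrew directories that are still on PATH or LD_LIBRARY_PATH. We assume these can let the build pick up Homebrew's libcrypto, because CMake also searches <prefix>/lib for <prefix>/bin entries on PATH. cuda_release and check_cuda_release compare the release that nvcc -V reports with the one you expect. add_stddef_include applies the size_t fix without an editor and does nothing if the include is already present.

```shell
# Print entries of a colon-separated search path (e.g. "$PATH") that contain a substring.
path_entries_matching() {
  local list="$1" pattern="$2" entry
  local IFS=':'
  for entry in $list; do
    case "$entry" in
      *"$pattern"*) printf '%s\n' "$entry" ;;
    esac
  done
}

# Extract the CUDA release number (e.g. "11.7") from `nvcc -V` output on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Succeed only if the nvcc found on PATH reports the expected release.
check_cuda_release() {
  local expected="$1" actual
  actual="$(nvcc -V 2>/dev/null | cuda_release)"
  [ "$actual" = "$expected" ]
}

# Prepend `#include <stddef.h>` to a file unless that exact line is already present.
add_stddef_include() {
  local file="$1" tmp
  grep -qxF '#include <stddef.h>' "$file" && return 0   # already patched
  tmp="$(mktemp)" || return 1
  if ! { printf '#include <stddef.h>\n'; cat "$file"; } > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  cat "$tmp" > "$file"    # rewrite in place so the file keeps its permissions
  rm -f "$tmp"
}
```

Usage: path_entries_matching "$PATH" linuxbrew (and the same for "$LD_LIBRARY_PATH"). Next, run check_cuda_release 11.7 && echo "nvcc OK". Here 11.7 is only the version from the example output, so substitute your own. Finally, run add_stddef_include <path/to/fairo>/polymetis/polymetis/torch_isolation/include/torch_server_ops.hpp before rebuilding.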
    

Q: What should I do about warnings like Warning: Failed to load 'libtorchrot.so' from CONDA_PREFIX or Warning: Failed to load 'libtorchscript_pinocchio.so' from CONDA_PREFIX?

A: These warnings do not affect the functionality of the system, so you can safely ignore them.

[Q&A] Device Connections

Note

Make sure all devices (the cameras, and the Oculus if you are collecting data) are connected via USB 3.x.

Q: How can I check if my devices (cameras, Oculus) are using USB 3.x?

A: Run lsusb and lsusb -t. In the lsusb -t output, the speed at the end of each line (in Mbit/s) must be 5000M (USB 3.0) or higher for the cameras and the Oculus. A value of 480M means the device is running at USB 2.0 speed.

For example,

$ lsusb
Bus 002 Device 006: ID 8086:0b07 Intel Corp. Intel(R) RealSense(TM) Depth Camera 435
Bus 002 Device 007: ID 8086:0b07 Intel Corp. Intel(R) RealSense(TM) Depth Camera 435

Bus 004 Device 008: ID 2833:0183 GenesysLogic USB3.2 Hub
Bus 004 Device 002: ID 05e3:0625 Genesys Logic, Inc. USB3.2 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

$ lsusb -t
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 2: Dev 8, If 0, Class=Imaging, Driver=usbfs, 5000M
        |__ Port 2: Dev 8, If 1, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 2: Dev 8, If 2, Class=Vendor Specific Class, Driver=usbfs, 5000M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    |__ Port 2: Dev 7, If 0, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 1, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 2, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 3, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 4, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 4, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 2, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 0, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 3, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 1, Class=Video, Driver=uvcvideo, 5000M
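To find devices that fell back to USB 2.0 without reading the whole tree, you can filter the lsusb -t output by its trailing speed. This is a minimal bash sketch. slow_usb_lines is a hypothetical helper, and it assumes that the speed is the last field of each line, as in the example above.

```shell
# Print `lsusb -t` lines whose speed is below 5000M (i.e. not running at USB 3.x).
# Assumes the speed (e.g. "480M", "5000M") is the last field of each line.
slow_usb_lines() {
  awk '$NF ~ /^[0-9.]+M$/ { s = $NF; sub(/M$/, "", s); if (s + 0 < 5000) print }'
}
```

Usage: lsusb -t | slow_usb_lines. Empty output means every listed interface runs at 5000M or faster. Root hubs of USB 2.0 buses also appear with 480M, but they only matter if a camera or the Oculus is attached below them.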

Q: The robot does not follow the Oculus Quest 2 even after the connection is established. What should I do?

A: Please check the following:

  • Make sure the Oculus device is listed when you run adb devices on the client PC.

  • Double-check that you followed the instructions in the Setup Oculus Quest 2 section.

  • If the problem persists, restart the Oculus.
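adb devices also shows whether the headset has authorized the client PC. This is a minimal bash sketch. adb_status is a hypothetical helper that reads adb devices output, where each device line is a serial number followed by a state such as device or unauthorized.

```shell
# Summarise `adb devices` output (stdin):
#   "ok"           at least one device is in the "device" state
#   "unauthorized" a device is connected but has not authorized this PC
#   "none"         no device is listed
adb_status() {
  awk 'NF >= 2 && $2 == "device"       { ok = 1 }
       NF >= 2 && $2 == "unauthorized" { unauth = 1 }
       END { print (ok ? "ok" : (unauth ? "unauthorized" : "none")) }'
}
```

Usage: adb devices | adb_status. An unauthorized result usually means the USB debugging prompt inside the headset has not been accepted yet.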

[Q&A] Training and Testing

Q: How can I resolve the RuntimeError: GET was unable to find an engine to execute this computation error during evaluation of the IQL model?

A: This may be caused by a mismatch between the installed JAX version and the CUDA/cuDNN libraries. Try installing a different version of JAX, for example:

conda install -c anaconda cudnn=8.2.1
pip install -U "jax[cuda11_cudnn82]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Q: What should I do if I run into a CUDA out-of-memory (OOM) error while training implicit_q_learning?

A: This may be caused by JAX preallocating GPU memory. Disable preallocation by running export XLA_PYTHON_CLIENT_PREALLOCATE=false before starting training.
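If you would rather keep preallocation, JAX also reads XLA_PYTHON_CLIENT_MEM_FRACTION, which caps preallocation at a fraction of GPU memory. This is a minimal bash sketch. jax_gpu_memory is a hypothetical helper. Both variables must be set before the Python process imports JAX, so source the function into the shell you launch training from.

```shell
# Configure JAX GPU memory behaviour for processes started from this shell.
#   jax_gpu_memory        -> disable preallocation (allocate on demand)
#   jax_gpu_memory 0.5    -> keep preallocation, capped at 50% of GPU memory
jax_gpu_memory() {
  if [ -z "${1:-}" ]; then
    unset XLA_PYTHON_CLIENT_MEM_FRACTION
    export XLA_PYTHON_CLIENT_PREALLOCATE=false
  else
    unset XLA_PYTHON_CLIENT_PREALLOCATE
    export XLA_PYTHON_CLIENT_MEM_FRACTION="$1"
  fi
}
```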

Q: What should I do if I get Access denied with the following error: or FileNotFoundError: [Errno 2] No such file or directory: '/root/.r3m/r3m_50/model.pt' while downloading the r3m checkpoints?

A: This may be caused by a permission issue with the Google Drive download. Download the checkpoints manually from Google Drive and copy them into the Docker container.

The steps below use the r3m ResNet50 checkpoint as an example.

  • Download the checkpoint to your local machine.

  • Get the container ID by running docker ps.

  • Copy the checkpoint to the container by running docker cp <checkpoint_path> <container_id>:/root/.r3m/r3m_50/
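You can also script the copy step. This is a minimal bash sketch. copy_r3m_checkpoint is a hypothetical helper. It creates /root/.r3m/r3m_50 first because docker cp does not create missing destination directories. It also saves the file as model.pt, the name that appears in the error message, so adjust the name if your checkpoint layout differs.

```shell
# Copy a manually downloaded r3m ResNet50 checkpoint into a running container.
# Usage: copy_r3m_checkpoint <checkpoint_path> <container_id>
copy_r3m_checkpoint() {
  local ckpt="$1" container="$2"
  local dest=/root/.r3m/r3m_50
  # Create the target directory inside the container, then copy the file.
  docker exec "$container" mkdir -p "$dest" &&
    docker cp "$ckpt" "$container:$dest/model.pt"
}
```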

[Q&A] Oculus

Q: How can I prevent sudden robot movements caused by wrong signal readings when using the Oculus?

A: Make sure the cable connection is stable.

Q: What should I do if the robot does not move when I use the Oculus?

A: Make sure that you control the robot within the Oculus guidance area, that you allowed access when prompted on the Oculus, and that the device is visible and accessible when you run adb devices. Also, check that the Oculus is turned on (the white light on the front is lit).

[Q&A] Camera

Q: How can I check if my camera is connected stably?

A: Install the RealSense Viewer (realsense-viewer) and check whether the camera streams stably. The viewer also has other features for checking the camera status.

Q: What should I do if I encounter a RuntimeError: Frame didn't arrive within 5000 error when using a camera?

A: Please unplug and replug the USB cable.

Q: What does the error message "RuntimeError: xioctl(VIDIOC_S_FMT) failed Last Error: Device or resource busy" mean when working with a camera?

A: It means that another program, such as realsense-viewer or a Python script, is already using the camera. A camera can be used by only one program at a time, so find and close the other program before running the one you need.
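To find the program that holds the camera, lsof can list processes that have /dev/video* devices open. VIDIOC_S_FMT is a V4L2 call, and V4L2 goes through those device nodes. This is a minimal bash sketch. camera_holders is a hypothetical helper that reads lsof output.

```shell
# List unique "COMMAND PID" pairs of processes holding a device open.
# Reads `lsof` output on stdin and skips its header line.
camera_holders() {
  awk 'NR > 1 { print $1, $2 }' | sort -u
}
```

Usage: lsof /dev/video* 2>/dev/null | camera_holders. Then close the listed program, or stop it with kill <PID>.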

Note

  • Make sure recent firmware is installed (our setup used version 05.13.00.50).

  • Make sure the camera is connected via USB 3.x.

[Q&A] FurnitureSim

Q: What should I do if I encounter the error isaacgymenvs setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*' during local installation?

A: Run the following commands, and then rerun the installation.

pip install --upgrade pip wheel
pip install setuptools==58
pip install --upgrade pip==22.2.2

Q: I am encountering ImportError: libpython3.8m.so.1.0: cannot open shared object file: No such file or directory. What should I do?

A: Run the following commands.

sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.8-dev

Q: What should I do if I encounter the error [Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101?

A: Specify the Vulkan device explicitly. First, list the available devices and their IDs:

# Install vulkaninfo, then list the Vulkan devices and their IDs.
sudo apt install vulkan-tools
MESA_VK_DEVICE_SELECT=list vulkaninfo

Then pass the ID of the GPU you want to use through MESA_VK_DEVICE_SELECT:

# e.g.
MESA_VK_DEVICE_SELECT='10de:2204' python furniture_bench/scripts/run_sim_env.py --furniture square_table --no-action
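You can automate picking the ID from the list. This is a minimal bash sketch. discrete_vk_ids is a hypothetical helper. It assumes the device-select layer prints lines like GPU 0: 10de:2204 "NVIDIA GeForce RTX 3090" discrete GPU 0000:01:00.0. The exact format may vary with the Mesa version, and the usage line applies 2>&1 in case the list is written to stderr.

```shell
# Print the vendor:device IDs of discrete GPUs from the list printed by
# `MESA_VK_DEVICE_SELECT=list vulkaninfo` (read on stdin).
# Assumed line format:  GPU 0: 10de:2204 "NVIDIA ..." discrete GPU 0000:01:00.0
discrete_vk_ids() {
  awk '$1 == "GPU" && /discrete GPU/ { print $3 }'
}
```

Usage: MESA_VK_DEVICE_SELECT="$(MESA_VK_DEVICE_SELECT=list vulkaninfo 2>&1 | discrete_vk_ids | head -n 1)" python furniture_bench/scripts/run_sim_env.py --furniture square_table --no-action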

If the above method does not work, especially on a machine with multiple GPUs, explicitly specifying the graphics device with --graphics-device-id might resolve the problem (note that the device ID may vary depending on the machine).

# e.g.
python furniture_bench/scripts/run_sim_env.py --furniture square_table --no-action --graphics-device-id 2

Q: FurnitureSim crashes with a segmentation fault. What should I do?

A: This is likely an issue with the NVIDIA driver and Vulkan. Reinstall the NVIDIA driver, reboot, and try again.

Q: The simulator does not terminate even after I press Ctrl+C. What should I do?

A: This can happen when the input streams are blocked. As a workaround, press Ctrl+Z to suspend the simulator, and then run kill %1 to terminate job 1 (run jobs to check the job number if other jobs are running).

[Q&A] Gym

Q: What should I do if I encounter an observation space error while working with Gym (e.g., 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*')?

A: Install Gym version 0.21.0 by running pip install gym==0.21.0.

Q: I get an error (e.g., extras_require ..) while running pip install gym==0.21.0. What should I do?

A: Run the following commands and then rerun the installation (reference).

pip install setuptools==65.5.0 pip==21  # gym 0.21 installation is broken with more recent versions