Troubleshooting
[Q&A] Polymetis
Q: What should I do if I get the communication_constraints_violation
error on the server while using Polymetis (the library used to interface with the Franka robot)?
A: Consider disabling CPU frequency scaling. Refer to this page for instructions.
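Before changing any settings, it can help to see which governor each CPU core is currently using. The sketch below reads the standard Linux cpufreq files under /sys/devices/system/cpu. It assumes that "disabling CPU frequency scaling" corresponds to running every core with the performance governor; the linked instructions may prescribe something different. The function names are illustrative and not part of Polymetis.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        # gov_file is <root>/cpuN/cpufreq/scaling_governor, so parents[1] is cpuN.
        governors[gov_file.parents[1].name] = gov_file.read_text().strip()
    return governors


def cpus_not_in_performance(governors):
    """Return the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    current = read_governors()
    if not current:
        print("No cpufreq information found on this machine.")
    for cpu in cpus_not_in_performance(current):
        # Assumption: 'performance' is the governor you want on the server.
        print(f"{cpu}: governor is {current[cpu]!r}, expected 'performance'")
```

If the script prints nothing, every core that exposes cpufreq already reports the performance governor.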
Q: What should I do if I encounter the OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory
error?
A: If you encounter this error, or a Segmentation fault (core dumped) error, on the client PC while using the robot, rebuild fairo by running the commands below in the Docker image. The cause is likely a linking error introduced by reinstalling PyTorch.
$ cd /fairo/polymetis/polymetis/build/
$ make -j
[Q&A] Docker Build
Q: I'm encountering git@github.com: Permission denied (publickey). fatal: Could not read from remote repository.
when trying to clone fairo while building a Docker image. How can I fix this?
A: Ensure your public key is added to your GitHub account. Then, run the following commands:
eval "$(ssh-agent -s)"
# Add your SSH private key to the agent
ssh-add ~/.ssh/id_rsa
# Verify the connection to GitHub
ssh -T git@github.com
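When this check is scripted (for example, before starting a long Docker build), note that ssh -T git@github.com exits with a non-zero status even when authentication succeeds, because GitHub does not provide shell access. The sketch below therefore inspects the printed greeting instead of the return code. The function names are hypothetical helpers, not part of any project script.

```python
import subprocess


def github_ssh_authenticated(output):
    """Return True if the text printed by `ssh -T git@github.com` reports success."""
    return "successfully authenticated" in output


def check_github_ssh(timeout=20):
    """Run `ssh -T git@github.com` and interpret its output (needs network access)."""
    result = subprocess.run(
        ["ssh", "-T", "git@github.com"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # The exit status is non-zero even on success, so look at the text.
    # Both streams are checked so it does not matter which one carries the greeting.
    return github_ssh_authenticated(result.stdout + result.stderr)
```

Calling check_github_ssh() returns False when the output contains the Permission denied (publickey) message, which indicates that the key is not loaded in the agent or not registered with the GitHub account.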
Q: I'm encountering issues while building Polymetis from source. How can I resolve this?
A: Here are some common errors and solutions.
For
/home/linuxbrew/.linuxbrew/Cellar/openssl@1.1/1.1.1t/lib/libcrypto.so: undefined reference to `dlsym@GLIBC_2.34'
error: Follow the steps below.
# 1. Unlink the brew openssl (https://github.com/Homebrew/homebrew-core/issues/118825)
brew unlink openssl@1.1
# 2. Build Polymetis again.
cd <path/to/fairo>/polymetis/polymetis/
# Remove the build directory if it exists.
rm -rf build
mkdir build && cd build
# Rebuild.
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_FRANKA=OFF -DBUILD_TESTS=OFF -DBUILD_DOCS=OFF
make -j
For
Failed to detect a default CUDA architecture.
error: Make sure the CUDA path is set correctly.
# E.g.,
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
Ensure you can run the following command.
nvcc -V
You should see something like this.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
For
/home/user/fairo/polymetis/polymetis/torch_isolation/include/torch_server_ops.hpp:56:39: error: 'size_t' has not been declared
error: Add
#include <stddef.h>
at the top of the torch_server_ops.hpp
file, and build again.
Open the file.
vim <path/to/fairo>/polymetis/polymetis/torch_isolation/include/torch_server_ops.hpp
Then add the following line at the top of the file.
#include <stddef.h>
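If you prefer not to edit the header by hand (for example, inside a Docker build), the same change can be scripted. The following is a minimal sketch that prepends the include line only when it is not already present, so it is safe to run more than once. The helper name ensure_include is hypothetical; pass it the real location of torch_server_ops.hpp in your fairo checkout.

```python
from pathlib import Path

INCLUDE_LINE = "#include <stddef.h>"


def ensure_include(header_path, include_line=INCLUDE_LINE):
    """Prepend include_line to the file unless some line already matches it.

    Returns True if the file was modified, False if it was left untouched.
    """
    path = Path(header_path)
    text = path.read_text()
    if any(line.strip() == include_line for line in text.splitlines()):
        return False
    path.write_text(include_line + "\n" + text)
    return True
```

After the function returns True, rebuild Polymetis as described above.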
Q: What should I do with warnings like
Warning: Failed to load 'libtorchrot.so' from CONDA_PREFIX
or
Warning: Failed to load 'libtorchscript_pinocchio.so' from CONDA_PREFIX
A: These warnings do not affect the functionality of the system, so you can ignore them.
[Q&A] Device Connections
Note
Make sure all devices (the cameras, and the Oculus if you are collecting data) are connected over USB 3.x.
A: Run lsusb and lsusb -t. In the lsusb -t output, the communication speed (in Mbps) at the end of each line must be 5000M (USB 3.0) or higher. For example,
$ lsusb
Bus 002 Device 006: ID 8086:0b07 Intel Corp. Intel(R) RealSense(TM) Depth Camera 435
Bus 002 Device 007: ID 8086:0b07 Intel Corp. Intel(R) RealSense(TM) Depth Camera 435
Bus 004 Device 008: ID 2833:0183 GenesysLogic USB3.2 Hub
Bus 004 Device 002: ID 05e3:0625 Genesys Logic, Inc. USB3.2 Hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
$ lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
        |__ Port 2: Dev 8, If 0, Class=Imaging, Driver=usbfs, 5000M
        |__ Port 2: Dev 8, If 1, Class=Vendor Specific Class, Driver=, 5000M
        |__ Port 2: Dev 8, If 2, Class=Vendor Specific Class, Driver=usbfs, 5000M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    |__ Port 2: Dev 7, If 0, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 1, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 2, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 3, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 2: Dev 7, If 4, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 4, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 2, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 0, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 3, Class=Video, Driver=uvcvideo, 5000M
    |__ Port 5: Dev 6, If 1, Class=Video, Driver=uvcvideo, 5000M
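Scanning a long lsusb -t listing by eye is error-prone, so the check can be sketched in a few lines of Python. The function below takes the text printed by lsusb -t and returns every entry whose trailing speed is below a threshold (5000 Mbps by default). It assumes the speed appears as a number followed by M after the last comma, as in the output above; slow_usb_lines is a hypothetical helper name.

```python
import re

# Matches the speed field of an `lsusb -t` line, e.g. ", 5000M" or ", 480M".
SPEED_RE = re.compile(r",\s*(\d+(?:\.\d+)?)M\b")


def slow_usb_lines(lsusb_tree, min_mbps=5000):
    """Return (line, speed_in_mbps) pairs for entries slower than min_mbps."""
    slow = []
    for line in lsusb_tree.splitlines():
        match = SPEED_RE.search(line)
        if match is None:
            continue  # not a device line
        speed = float(match.group(1))
        if speed < min_mbps:
            slow.append((line.strip(), speed))
    return slow


# Example usage on a machine where lsusb is installed:
#   import subprocess
#   tree = subprocess.run(["lsusb", "-t"], capture_output=True, text=True).stdout
#   for line, speed in slow_usb_lines(tree):
#       print(f"{speed:g}M  {line}")
```

The result also includes unrelated low-speed devices such as keyboards and USB 2.0 root hubs, so only the lines belonging to the cameras and the Oculus matter here.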
A: Please check the following:
Make sure the Oculus device is listed when you run the adb devices command on the client.
Double-check that you followed the instructions in the Setup Oculus Quest 2 section.
If the problem persists, restart the Oculus.
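The adb devices check can also be automated. A device is only usable when its state is device; unauthorized means the USB debugging prompt inside the headset has not been accepted yet, and offline means the device is not responding. The sketch below parses the text printed by adb devices. The function names are illustrative, and the serial numbers in the usage comment are made up.

```python
def connected_adb_devices(adb_output):
    """Parse the text printed by `adb devices` into a {serial: state} dict."""
    devices = {}
    for line in adb_output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon start-up messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(adb_output):
    """Return the serials of devices that adb reports in the 'device' state."""
    parsed = connected_adb_devices(adb_output)
    return [serial for serial, state in parsed.items() if state == "device"]


# Example usage (requires adb on the PATH):
#   import subprocess
#   out = subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout
#   print(ready_devices(out) or "No authorized device found")
```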
[Q&A] Training and Testing
Q: What should I do if I get the
RuntimeError: GET was unable to find an engine to execute this computation
error during the evaluation of the IQL model?
A: This may be due to a JAX version mismatch. Try installing a different version of JAX. For example, run the following commands:
conda install -c anaconda cudnn=8.2.1
pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Q: What should I do if I get a
CUDA Out of memory (OOM)
issue while training implicit_q_learning?
A: It might be because JAX preallocates a large share of GPU memory at startup. You can disable preallocation by setting
export XLA_PYTHON_CLIENT_PREALLOCATE=false
to resolve this issue.
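The same setting can be applied from Python, provided it happens before JAX is imported, since JAX reads these variables when it initializes its GPU backend. The sketch below wraps this in a small helper. XLA_PYTHON_CLIENT_MEM_FRACTION is the related JAX variable for capping the preallocated fraction instead of disabling preallocation; the helper name configure_jax_gpu_memory is hypothetical.

```python
import os


def configure_jax_gpu_memory(preallocate=False, mem_fraction=None):
    """Set JAX GPU memory environment variables; call this before importing jax."""
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if mem_fraction is not None:
        if not 0.0 < mem_fraction <= 1.0:
            raise ValueError("mem_fraction must be in the interval (0, 1]")
        # Only relevant when preallocation is enabled.
        os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"


# Equivalent to `export XLA_PYTHON_CLIENT_PREALLOCATE=false`.
configure_jax_gpu_memory(preallocate=False)

# import jax  # import JAX only after the variables above are set
```

Disabling preallocation makes JAX allocate memory on demand, which lowers the footprint at startup but can increase fragmentation during long runs.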
Q: What should I do if I get
Access denied with the following error:
or FileNotFoundError: [Errno 2] No such file or directory: '/root/.r3m/r3m_50/model.pt'
while downloading r3m checkpoints?
A: This might be due to a permission issue. Please download the checkpoints manually from Google Drive and copy them to the Docker container.
(The steps below use the checkpoint for r3m ResNet50 as an example.)
Download the checkpoint to your local machine.
Get the container ID by running
docker ps
Copy the checkpoint to the container by running
docker cp <checkpoint_path> <container_id>:/root/.r3m/r3m_50/
[Q&A] Oculus
A: To prevent sudden actions from the robot due to wrong signal readings when using Oculus, ensure that the cable connection is stable.
A: Make sure you control the robot inside the guidance area of the Oculus, allow access to the Oculus, and verify that the device is visible and accessible by running adb devices. Also, check that the Oculus is turned on (the white light on the front is on).
[Q&A] Camera
A: Consider installing realsense-viewer and testing whether the camera connection is stable. The viewer also has other features that can be used to check the camera status.
A: Please unplug and replug the USB cable.
A: This error message indicates that another program, such as realsense-viewer or a Python script, is using the camera. The camera can only be used by one program at a time. To resolve this issue, find the other program that is using the camera and close it before running the desired program.
Note
Make sure recent firmware is installed. (We used version 05.13.00.50.)
Make sure the camera is connected using USB 3.x.
[Q&A] FurnitureSim
Q: What should I do if I get
isaacgymenvs setup command: 'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*'
during local installation?
A: Execute the following commands, and then rerun the installation.
pip install --upgrade pip wheel
pip install setuptools==58
pip install --upgrade pip==22.2.2
Q: I get the
ImportError: libpython3.8m.so.1.0: cannot open shared object file: No such file or directory
error.
A: Run the following commands.
sudo apt update
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.8-dev
Q: What should I do if I get
[Error] [carb.gym.plugin] cudaExternamMemoryGetMappedBuffer failed on rgbImage buffer with error 101
?
A: You should specify the Vulkan device explicitly.
# Get the device id.
apt install vulkan-tools
MESA_VK_DEVICE_SELECT=list vulkaninfo
Specify the device id.
# e.g.
MESA_VK_DEVICE_SELECT='10de:2204' python furniture_bench/scripts/run_sim_env.py --furniture square_table --no-action
If the above method does not work, especially on a machine with multiple GPUs, explicitly specifying the graphics device ID with
--graphics-device-id
might resolve the problem (note that the device ID may vary depending on the machine).
# e.g.
python furniture_bench/scripts/run_sim_env.py --furniture square_table --no-action --graphics-device-id 2
A: This is likely an issue with the NVIDIA driver and Vulkan. Reinstall the NVIDIA driver, reboot, and try again.
A: This can happen when the input streams are blocked. The workaround is to press Ctrl+Z and then run kill %1 to terminate the first job.
[Q&A] Gym
'python_requires' must be a string containing valid version specifiers; Invalid specifier: '>=3.6.*'
)
A: Install Gym version 0.21.0 by running pip install gym==0.21.0.
pip install gym==0.21.0
(e.g., extras_require ..
).
A: Run the following commands and then rerun the installation (reference).
pip install setuptools==65.5.0 pip==21 # gym 0.21 installation is broken with more recent versions