Furniture Assembly
Long-horizon complex manipulation tasks
Reproducible
Standardized environment setup
Easy to Use
Python-based robot control stack / Simulator
Large-Scale Dataset
200+ hours of 5000+ teleoperation trajectories
Furniture Assembly: Long-horizon Complex Manipulation Task
Reinforcement learning (RL), imitation learning (IL), and task and motion planning (TAMP) have demonstrated impressive performance across various robotic manipulation tasks. However, these approaches have been limited to learning simple behaviors in current real-world manipulation benchmarks, such as pushing or pick-and-place. To enable more complex, long-horizon behaviors of an autonomous robot, we propose to focus on real-world furniture assembly, a complex, long-horizon robot manipulation task that requires addressing many current robotic manipulation challenges to solve.
FurnitureBench
FurnitureBench is a real-world furniture assembly benchmark designed to be easy to reproduce and to offer a low barrier to entry, so that researchers across the world can reliably test their algorithms and compare them against prior work.
Our system uses vision-based control for assembly tasks. The robot observes images from the front and wrist cameras together with the proprioceptive robot state, and outputs delta end-effector pose and gripper commands at 10 Hz. Operational Space Control (OSC) then converts each action into joint torques.
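The last step of this pipeline can be illustrated with a minimal sketch of an operational-space torque computation. This is not the FurnitureBench controller: the function name, gains, and inputs are assumptions for illustration, and gravity/Coriolis compensation and null-space control are omitted.

```python
import numpy as np


def osc_torques(jacobian, mass_matrix, pose_error, joint_vel, kp=150.0, kd=None):
    """Map a 6-D end-effector pose error to joint torques (simplified OSC).

    jacobian:    (6, n) end-effector Jacobian
    mass_matrix: (n, n) joint-space inertia matrix
    pose_error:  (6,) desired minus current pose (position + orientation)
    joint_vel:   (n,) current joint velocities
    kp, kd:      hypothetical scalar gains; kd defaults to critical damping
    """
    if kd is None:
        kd = 2.0 * np.sqrt(kp)
    # End-effector twist implied by the current joint velocities.
    ee_vel = jacobian @ joint_vel
    # Desired task-space acceleration from a PD law on the pose error.
    accel_des = kp * pose_error - kd * ee_vel
    # Operational-space inertia: Lambda = (J M^-1 J^T)^-1.
    m_inv = np.linalg.inv(mass_matrix)
    lambda_op = np.linalg.inv(jacobian @ m_inv @ jacobian.T)
    # Task-space wrench, projected into joint space through J^T.
    return jacobian.T @ (lambda_op @ accel_des)
```

In a loop of this kind, the policy's delta pose would define `pose_error` at 10 Hz, while the torque computation itself would typically run at a much higher rate on the robot side.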

FurnitureBench provides a suite of 8 tasks, each of which introduces its own interactions and challenges. The furniture models are inspired by IKEA furniture and modified so that a single robotic arm can carry out the assembly.

Reproducible Environment
To make the real-robot environment easy to reproduce, we opt for products that are widely available across the world (e.g., Franka Panda, Intel RealSense cameras, IKEA table) and 3D-printed objects. Ten participants successfully reproduced the environment from scratch, with performance gaps smaller than 16%.
Our benchmark has three levels of randomness in the initial states: low, medium, and high. The higher the randomness in the initial state, the more generalization capability an agent requires. We provide a task initialization GUI tool to evaluate algorithms under each of these initial state distributions.
Easy-to-Use Environment
We provide a Python-based plug-and-play software stack, including scripts for environment setup, data collection, training, evaluation, and more. FurnitureBench's Python APIs are shared across real-world and simulated environments.
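The benefit of a shared API is that evaluation code can be written once and run against either backend. The sketch below illustrates the idea with a hypothetical interface and a stub environment. `AssemblyEnv`, `StubEnv`, the observation keys, and the 7-D action are all assumptions for illustration, not FurnitureBench's actual classes or signatures.

```python
from typing import Callable, Protocol, Tuple

import numpy as np


class AssemblyEnv(Protocol):
    """Hypothetical interface shared by a real and a simulated environment."""

    def reset(self) -> dict: ...

    def step(self, action: np.ndarray) -> Tuple[dict, float, bool, dict]: ...


class StubEnv:
    """Stand-in environment for illustration; ends after `horizon` steps."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def _obs(self) -> dict:
        # Placeholder keys for camera images and proprioceptive state.
        return {
            "front_image": np.zeros((4, 4, 3), dtype=np.uint8),
            "wrist_image": np.zeros((4, 4, 3), dtype=np.uint8),
            "robot_state": np.zeros(7),
        }

    def reset(self) -> dict:
        self.t = 0
        return self._obs()

    def step(self, action: np.ndarray) -> Tuple[dict, float, bool, dict]:
        self.t += 1
        done = self.t >= self.horizon
        # Reward of 1.0 only on the final step, 0.0 otherwise.
        return self._obs(), float(done), done, {}


def rollout(env: AssemblyEnv, policy: Callable[[dict], np.ndarray],
            max_steps: int = 100) -> float:
    """Run one episode and return the summed reward for any AssemblyEnv."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

Because `rollout` depends only on `reset` and `step`, the same function could in principle be pointed at a real-robot or a simulated environment without modification.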
FurnitureSim
FurnitureSim is a fast and realistic simulator of FurnitureBench, based on IsaacGym and Factory. FurnitureSim is designed to be a drop-in substitute for the real-world environment, which enables rapid prototyping of new algorithms. Check out FurnitureSim via PyPI!
pip install furniture-bench
Large-Scale Datasets
The complexity and long task horizon of FurnitureBench tasks make tabula rasa RL with real-world interactions challenging. To make our benchmark tractable, we provide 219.6 hours of 5100 teleoperation demonstrations. The table below summarizes the dataset statistics.
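To give a sense of the task horizon these figures imply, the following back-of-the-envelope calculation derives the average demonstration length. It assumes, as a simplification, that demonstrations are logged at the 10 Hz control rate; the actual recording rate may differ.

```python
total_hours = 219.6   # total demonstration time reported above
num_demos = 5100      # number of teleoperation demonstrations
control_hz = 10       # assumption: one logged step per 10 Hz control step

avg_seconds = total_hours * 3600 / num_demos
avg_steps = avg_seconds * control_hz

print(f"{avg_seconds:.0f} s per demo, about {avg_steps:.0f} steps at {control_hz} Hz")
```

Under this assumption, an average demonstration lasts roughly two and a half minutes and spans on the order of 1,500 control steps.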
Benchmark Results
Our experiments consist of two benchmarks:
- Single-Skill Benchmark evaluates each individual subtask (e.g., grasping, pushing, inserting, screwing)
- Full-Assembly Benchmark evaluates entire assembly tasks that require long-horizon planning and dexterous control
We evaluate our benchmark with an imitation learning method, behavioral cloning (BC), and a state-of-the-art offline RL method, implicit Q-learning (IQL).
Single-Skill Benchmark
We benchmark the first five skills of each furniture task. The results in the table below show that individual skill policies can learn grasping and placing, but mostly fail at inserting, which requires precise alignment.

Full-Assembly Benchmark
We present both quantitative and qualitative results on full furniture assembly tasks below. The number of completed phases is the number of subtasks (e.g., grasping, placing, inserting) that a policy completes successfully within a full-assembly task. Overall, neither BC nor IQL assembles even a single part on any task except one-leg.
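As an illustration of how a completed-phases metric can be aggregated over evaluation episodes, here is a minimal sketch. The function name, the episode counts, and the total of 5 phases are hypothetical values chosen for the example, not results from the benchmark.

```python
def summarize(completed_phases, total_phases):
    """Return (mean completed phases, full-assembly success rate).

    completed_phases: number of phases finished in each episode
    total_phases:     number of phases in the full task (hypothetical here)
    """
    n = len(completed_phases)
    mean_phases = sum(completed_phases) / n
    # An episode counts as a success only if every phase was completed.
    success_rate = sum(p == total_phases for p in completed_phases) / n
    return mean_phases, success_rate


# Hypothetical per-episode counts for a 5-phase task.
mean_phases, success_rate = summarize([2, 1, 0, 5, 3], total_phases=5)
```

Reporting the mean alongside the success rate distinguishes a policy that makes partial progress from one that fails at the very first subtask, even when both have a success rate of zero.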

Qualitative Results
Videos play at 2x speed.
[Videos: lamp]
Projects using FurnitureBench
Related Work
- IKEA Furniture Assembly Environment introduces a furniture assembly simulator as a testbed for long-horizon and complex manipulation tasks, but it does not support screwing.
- Factory and Orbit enable complex physics simulation and photo-realistic rendering, respectively.
- There are prior works on reproducible real-world benchmarking environments, namely RGB-Stacking, ROBEL, REPLAB, and Real Robot Challenge.
Citations
@inproceedings{heo2023furniturebench,
title={FurnitureBench: Reproducible Real-World Benchmark for Long-Horizon Complex Manipulation},
author={Minho Heo and Youngwoon Lee and Doohyun Lee and Joseph J. Lim},
booktitle={Robotics: Science and Systems},
year={2023}
}