Learning to Coordinate Manipulation Skills
via Skill Behavior Diversification
Autonomous agents with multiple end-effectors can perform complex tasks by coordinating sub-skills of each end-effector. To realize temporal and behavioral coordination of skills, we propose a modular framework that first individually trains sub-skills of each end-effector with skill behavior diversification, and learns to coordinate end-effectors using diverse behaviors of the skills. We demonstrate that our proposed framework is able to efficiently learn sub-skills with diverse behaviors and coordinate them to solve challenging collaborative control tasks.

Motivation

When mastering a complex manipulation task, humans often decompose the task into sub-skills of their body parts, practice the sub-skills independently, and then execute the sub-skills together. Similarly, a robot with multiple end-effectors can efficiently learn to perform complex tasks by coordinating sub-skills of each end-effector.

Motivation illustration

Problem


Our method and baselines

Baseline illustrations

For a cooperative task that requires N agents or end-effectors to work together, we propose a modular framework with skill behavior diversification (Modular-SBD), which first individually trains each agent's primitive skills with diverse behaviors conditioned on a behavior embedding z. Then, a meta policy takes as input the full observation and selects both a primitive skill and a behavior embedding for each agent.
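The two-stage structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the network stand-ins, dimensions, and the random meta policy are hypothetical placeholders that only show the flow of information (full observation → per-agent skill index and behavior embedding → low-level actions).

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2   # e.g., two Jaco arms
N_SKILLS = 3   # primitive skills per agent (e.g., pick, push, place)
Z_DIM = 4      # dimension of the behavior embedding z

def skill_policy(agent_obs, skill_id, z):
    """Low-level policy: one primitive skill per agent, conditioned on a
    behavior embedding z that modulates *how* the skill is executed.
    Stand-in for a trained network."""
    return float(np.tanh(agent_obs.mean() + skill_id + z.sum()))

def meta_policy(full_obs):
    """High-level policy: from the full observation, select a primitive
    skill and a behavior embedding for every agent. Stand-in that samples
    at random where a trained meta policy would choose deliberately."""
    skills = rng.integers(0, N_SKILLS, size=N_AGENTS)  # which skill to run
    zs = rng.normal(size=(N_AGENTS, Z_DIM))            # how to run it
    return skills, zs

# One step of coordinated control.
full_obs = rng.normal(size=(N_AGENTS, 6))
skills, zs = meta_policy(full_obs)
actions = [skill_policy(full_obs[i], skills[i], zs[i]) for i in range(N_AGENTS)]
```

The key point of the decomposition is that only the meta policy sees the full observation; each skill acts from its agent's local view plus the behavior embedding chosen for it.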


To compare our method with various single- and multi-agent RL methods, we designed five baselines, illustrated in the figure above. The RL and MARL baselines are vanilla RL frameworks widely used in the single- and multi-agent RL literature. The Modular baseline is a hierarchical framework composed of a meta policy and N sets of primitive skills; its meta policy selects a primitive skill to execute for each agent but not the behavior of the skill. We also consider the RL-SBD and MARL-SBD baselines, which augment the RL and MARL baselines with a meta policy that outputs skill behavior embeddings as an additional input to the low-level policies.
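The skill-pretraining stage relies on a diversity objective so that a single primitive skill exhibits distinguishable behaviors for different embeddings z. One common discriminator-based formulation (in the spirit of DIAYN-style diversification; this sketch is an assumption, not the paper's exact objective) rewards a skill for visiting states from which the behavior code can be inferred. The discriminator below is a fixed stand-in for a trained network, and z is treated as discrete for simplicity:

```python
import numpy as np

rng = np.random.default_rng(1)
Z_DIM = 4  # number of discrete behavior codes (a simplifying assumption)

def discriminator(state):
    """Stand-in for q(z|s): predicts a distribution over behavior codes
    from the state. In practice this is a network trained jointly with
    the skill; here it is a fixed softmax over the first Z_DIM features."""
    logits = state[:Z_DIM]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def diversity_reward(state, z_idx):
    """r = log q(z|s) - log p(z), with a uniform prior p(z).
    Positive when the state reveals the behavior code better than chance."""
    q = discriminator(state)
    return float(np.log(q[z_idx] + 1e-8) - np.log(1.0 / Z_DIM))

state = rng.normal(size=8)
r = diversity_reward(state, z_idx=0)
```

Adding this term to the task reward during skill pretraining pushes behaviors conditioned on different z apart, giving the meta policy a meaningful space of behaviors to coordinate over.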


Videos

Jaco Pick-Push-Place

  • Two Jaco arms start with a block on the left and a container on the right. To complete the task, the arms need to pick up the block, push the container to the center, and place the block inside the container.
RL
Modular
Modular with SBD (Ours)

Jaco Pick-Move-Place

  • Two Jaco arms need to pick up a long bar together, move the bar towards a target location while maintaining its rotation, and place it on the table.
RL
Modular
Modular with SBD (Ours)

Two Ant Push

  • Two ants need to push a large object toward a green target, collaborating with each other to keep the object's orientation as stable as possible.
RL
Modular
Modular with SBD (Ours)

Quantitative results

Learning curves
Success rates

Citation

@inproceedings{lee2020learning,
  title={Learning to Coordinate Manipulation Skills via Skill Behavior Diversification},
  author={Youngwoon Lee and Jingyun Yang and Joseph J. Lim},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=ryxB2lBtvH}
}