Policy Transfer across Visual and Dynamics
Domain Gaps via Iterative Grounding
The ability to transfer a policy from one environment to another is a promising avenue for efficient robot learning in realistic settings where task supervision is not available. To succeed, such policy transfer must overcome both the visual and dynamics domain gap between source and target environments. We propose IDAPT, a novel policy transfer method with iterative environment grounding that alternates between (1) directly minimizing both visual and dynamics domain gaps by grounding the source environment in the target environment, and (2) training a policy on the grounded source environment. The empirical results on locomotion and robotic manipulation tasks demonstrate that our method can effectively transfer a policy across large domain gaps with minimal interaction with the target environment.