GHIL-Glue can be applied to existing hierarchical imitation learning methods that use video prediction models. It filters out generated subgoal images that do not make task progress, and it trains low-level goal-reaching policies to be robust to hallucinated artifacts in generated subgoal images.
GHIL-Glue reduces dithering behavior in hierarchical policy methods and makes policies robust to hallucinated artifacts.
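A minimal sketch of what subgoal filtering could look like, assuming the high-level model proposes several candidate subgoals and some scoring function estimates task progress. The names `filter_subgoals`, `select_subgoal`, and `toy_progress` are hypothetical and do not come from the GHIL-Glue code. The scalar "subgoals" stand in for images, and the distance-based score stands in for whatever progress estimate the method actually uses. The robustness training of the low-level policy is not shown.

```python
from typing import Callable, List, Sequence, TypeVar

Subgoal = TypeVar("Subgoal")


def filter_subgoals(
    candidates: Sequence[Subgoal],
    progress_score: Callable[[Subgoal], float],
    threshold: float,
) -> List[Subgoal]:
    """Keep only candidates whose progress score exceeds the threshold."""
    return [c for c in candidates if progress_score(c) > threshold]


def select_subgoal(
    candidates: Sequence[Subgoal],
    progress_score: Callable[[Subgoal], float],
) -> Subgoal:
    """Return the candidate with the highest progress score."""
    if not candidates:
        raise ValueError("need at least one candidate subgoal")
    return max(candidates, key=progress_score)


# Toy setup (an assumption for illustration): states are scalars and
# progress is the negative distance to a scalar goal.
goal = 1.0
current = 0.2
candidates = [0.1, 0.5, 0.3]


def toy_progress(state: float) -> float:
    return -abs(goal - state)


# Discard candidates that are no closer to the goal than the current state,
# then hand the best remaining candidate to the low-level policy.
kept = filter_subgoals(candidates, toy_progress, threshold=toy_progress(current))
best = select_subgoal(kept, toy_progress)
```

In this toy case the candidate `0.1` moves away from the goal and is discarded, which mirrors the idea of rejecting subgoals that would cause the policy to dither rather than advance.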
Applying GHIL-Glue to existing hierarchical policy methods achieves state-of-the-art zero-shot generalization performance on both real and simulated robotics benchmarks.
GHIL-Glue achieves state-of-the-art performance on the CALVIN benchmark as well as on 5 physical environments based on the Bridge V2 robot platform.
For additional ablations and implementation details, please see the Appendix.
@inproceedings{hatch2024ghilglue,
  title     = {GHIL-Glue: Hierarchical Control with Filtered Subgoal Images},
  author    = {Kyle Beltran Hatch and Ashwin Balakrishna and Oier Mees and Suraj Nair and Seohong Park and Blake Wulfe and Masha Itkina and Benjamin Eysenbach and Sergey Levine and Thomas Kollar and Benjamin Burchfiel},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2025},
  address   = {Atlanta, USA}
}