New Robotics Research Enables Zero-Shot Generalization in Robots with Just a Few Demonstrations


A groundbreaking robotics paper, "From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models," is set to be the focus of an upcoming episode of the RoboPapers podcast. The research, co-authored by Nishanth Kumar, introduces a novel approach that allows robots to learn complex tasks and generalize to new environments with minimal training data. The podcast, co-hosted by Chris Paxton and Michael Cho, will feature a deep dive into the methodology and implications of this work.

The paper addresses the challenge of enabling robots to solve long-horizon decision-making problems in complex domains, given a set of low-level skills and only a handful of short-horizon demonstrations. The core innovation lies in learning abstract symbolic world models that enable "zero-shot generalization" to novel goals through planning. In other words, robots can adapt to entirely new situations without explicit prior training for those specific scenarios.
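To make the idea of "generalization through planning" concrete, the sketch below shows the general recipe in Python: once camera observations are abstracted into symbolic predicates and the world model is expressed as STRIPS-style operators, an ordinary search can reach goals that never appeared in training. The predicates, operators, and objects here are toy placeholders invented for illustration, not the ones the paper's system learns.

```python
from itertools import product
from collections import deque

# Symbolic states are frozensets of grounded predicates: ("Name", arg, ...).
# Operators are STRIPS-style: applicable when their preconditions hold;
# applying one removes the delete-effects and adds the add-effects.
OPERATORS = {
    "Pick": {
        "params": ("?obj",),
        "pre":    [("HandEmpty",), ("OnTable", "?obj")],
        "add":    [("Holding", "?obj")],
        "delete": [("HandEmpty",), ("OnTable", "?obj")],
    },
    "Place": {
        "params": ("?obj", "?region"),
        "pre":    [("Holding", "?obj")],
        "add":    [("HandEmpty",), ("In", "?obj", "?region")],
        "delete": [("Holding", "?obj")],
    },
}

def ground(atom, binding):
    """Substitute operator parameters (e.g. "?obj") with concrete objects."""
    return tuple(binding.get(term, term) for term in atom)

def successors(state, objects):
    """Yield (action, next_state) for every applicable grounded operator."""
    for name, op in OPERATORS.items():
        for combo in product(objects, repeat=len(op["params"])):
            binding = dict(zip(op["params"], combo))
            preconditions = {ground(a, binding) for a in op["pre"]}
            if preconditions <= state:
                next_state = (state - {ground(a, binding) for a in op["delete"]}) \
                             | {ground(a, binding) for a in op["add"]}
                yield (name, combo), frozenset(next_state)

def plan(init, goal, objects):
    """Breadth-first search to any symbolic state satisfying the goal."""
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal <= state:
            return actions
        for action, next_state in successors(state, objects):
            if next_state not in seen:
                seen.add(next_state)
                frontier.append((next_state, actions + [action]))
    return None

# A goal never seen during training can still be reached by planning.
init = {("HandEmpty",), ("OnTable", "cup"), ("OnTable", "block")}
goal = {("In", "cup", "bin"), ("In", "block", "bin")}
print(plan(init, goal, objects=["cup", "block", "bin"]))
```

Running the sketch prints a four-step plan that moves both objects into the bin, even though that exact goal was never demonstrated; the paper's contribution is learning the predicates and operators that make this kind of search possible directly from images and a few demonstrations.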

The method, dubbed "pix2pred," leverages pretrained Vision-Language Models (VLMs) to propose a wide array of visual predicates relevant for decision-making. These predicates are then evaluated directly from camera images. During training, an optimization-based model-learning algorithm uses these predicates and demonstrations to construct an abstract symbolic world model, defined by a compact subset of the proposed predicates. This approach has shown empirical success in both simulated and real-world experiments, demonstrating aggressive generalization across varied object types, arrangements, and visual backgrounds.
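Read as a pipeline, that training procedure has three stages: propose candidate predicates with a VLM, evaluate them on demonstration images to form abstract states, and select a compact subset that best explains the demonstrated transitions. The Python sketch below is a rough illustration of how such a pipeline could be wired together; the VLM interface, the greedy selection loop (a stand-in for the paper's optimization-based model-learning algorithm), and all names in it are assumptions rather than the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence

class VLM(Protocol):
    """Hypothetical interface to a pretrained vision-language model."""
    def propose_facts(self, images: Sequence[object]) -> List[str]: ...
    def holds(self, fact: str, image: object) -> bool: ...

@dataclass
class Predicate:
    name: str                             # e.g. "Inside(cup, bin)"
    classifier: Callable[[object], bool]  # truth value from one camera image

def propose_predicates(vlm: VLM, demo_images: Sequence[object]) -> List[Predicate]:
    # Stage 1: the VLM names candidate visual facts; each becomes a predicate
    # whose classifier simply asks the VLM about a new image.
    return [Predicate(name, classifier=lambda img, fact=name: vlm.holds(fact, img))
            for name in vlm.propose_facts(demo_images)]

def abstract_state(predicates: Sequence[Predicate], image: object) -> frozenset:
    # Stage 2: abstract a raw camera image into the set of predicates that hold.
    return frozenset(p.name for p in predicates if p.classifier(image))

def select_subset(candidates: Sequence[Predicate],
                  demos: Sequence[Sequence[object]],
                  score: Callable[[List[Predicate], Sequence[Sequence[object]]], float]
                  ) -> List[Predicate]:
    # Stage 3: greedy stand-in for the paper's optimization, keeping a candidate
    # only when it improves how well the abstract model explains the demos.
    chosen: List[Predicate] = []
    best = score(chosen, demos)
    for candidate in candidates:
        trial = score(chosen + [candidate], demos)
        if trial > best:
            chosen, best = chosen + [candidate], trial
    return chosen
```

In a full system, the selected predicates would then be paired with learned operators to form the symbolic world model used for planning at test time.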

Nishanth Kumar, a Ph.D. student in Electrical Engineering and Computer Science at MIT, based at the Computer Science and Artificial Intelligence Laboratory (CSAIL), is a key contributor to this research. His work, advised by professors Leslie Kaelbling and Tomás Lozano-Pérez, focuses on combining deep learning with automated planning and reasoning to create general-purpose AI systems for robotics. The RoboPapers podcast, known for "Geeking out with paper authors," provides a platform for such cutting-edge discussions, with hosts Chris Paxton and Michael Cho drawing out insights on the latest advancements in the field.

The implications of this research are significant for the future of robotics, promising more adaptable and efficient autonomous systems. By enabling robots to learn and generalize from extremely small amounts of data, "From Pixels to Predicates" paves the way for robots that can perform complex tasks in dynamic, unfamiliar environments, moving beyond the limitations of traditional model-free approaches.