New AI Technique Lets Robots "Think" Before They Perform Complex Tasks
Researchers from UC Berkeley, Stanford University and the University of Warsaw have created an AI method that enables robots to reason through tasks step by step using foundation models.
Robotics researchers have developed a novel way to improve how robots interact with their environment. The team, from UC Berkeley, Stanford University and the University of Warsaw, created a method that improves a robot's decision-making by building explicit reasoning into its control process.
The method, called Embodied Chain-of-Thought Reasoning (ECoT), enables robots to think through tasks step by step and consider their surroundings before taking action.
As detailed in a newly published paper, ECoT is designed to boost a robot’s ability to handle new tasks and environments effectively. It also provides human operators with a way to correct behaviors by modifying a robot’s reasoning through natural language feedback.
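As a concrete illustration, one annotated training example in this style might look like the following sketch (the field names and values here are illustrative, not the exact schema from the paper):

```python
# Hypothetical embodied chain-of-thought annotation for a single timestep.
# Field names and values are illustrative, not the paper's exact schema.
ecot_example = {
    "instruction": "put the carrot in the pot",
    "reasoning": {
        "plan": ["move to the carrot", "grasp the carrot",
                 "move above the pot", "release the carrot"],
        "subtask": "move to the carrot",
        "move": "move the gripper left and down toward the carrot",
        "visible_objects": {"carrot": (110, 85, 160, 120),
                            "pot": (200, 60, 280, 150)},
        "gripper_position": (132, 101),
    },
    "action": [0.02, -0.01, -0.03, 0.0, 0.0, 0.0, 1.0],  # low-level arm command
}
```

Because the reasoning is expressed in plain language, a human operator can edit the plan or the current subtask directly, which is the natural language correction mechanism the paper describes.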
Vision-language-action models (VLAs) have emerged as a powerful way to train robots to perform actions.
They are designed to give a robot a better understanding of the task it has been asked to perform. Google DeepMind researchers highlighted the potential of VLAs in a study published in June 2023.
However, according to the researchers, VLAs typically learn a direct mapping from observations to actions without any intermediate reasoning, which limits their ability to handle complex, novel situations that require more thoughtful planning and adaptation.
The researchers sought to improve robotic reasoning by bringing foundation models into the training process. They developed a scalable pipeline that generates synthetic training data for ECoT, leveraging various foundation models to extract features from robot demonstrations in the Bridge V2 dataset.
They used a suite of foundation models in their project: object detectors and vision-language models created descriptions of the robot's environment and annotated information such as the objects in each scene.
They then used Google's Gemini model to generate plans, subtasks and movement labels, combining the previously gathered scene and object data with details on the robot's gripper position.
Dividing the process into submodules allowed a staggered, more methodical approach, enabling the robot to think a task through thoroughly before performing it.
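The article does not include the authors' code, but a minimal sketch of such an annotation pipeline might look like the following. The three helpers are hypothetical placeholders standing in for an object detector, a vision-language scene describer and a call to Google's Gemini API; the actual pipeline, models and prompts differ.

```python
# Minimal sketch of a synthetic-annotation pipeline in the spirit of ECoT.
# detect_objects, describe_scene and gemini_generate are hypothetical
# placeholders, not the authors' real components.

def detect_objects(frame):
    """Placeholder: run an object detector, return (label, bbox) pairs."""
    return [("carrot", (110, 85, 160, 120)), ("pot", (200, 60, 280, 150))]

def describe_scene(frame, objects):
    """Placeholder: have a vision-language model describe the scene."""
    return "A carrot and a pot sit on the table in front of the robot."

def gemini_generate(prompt):
    """Placeholder: query a language model such as Gemini."""
    return {"plan": ["move to the carrot", "grasp it", "place it in the pot"],
            "subtask": "move to the carrot",
            "move": "move the gripper left and down toward the carrot"}

def annotate_demonstration(frames, instruction, gripper_positions):
    """Turn one robot demonstration into chain-of-thought training labels."""
    annotations = []
    for frame, gripper_xy in zip(frames, gripper_positions):
        objects = detect_objects(frame)
        scene = describe_scene(frame, objects)

        # Ask the language model for a plan, the current subtask and the
        # low-level move, grounded in the scene and the gripper position.
        prompt = (
            f"Instruction: {instruction}\n"
            f"Scene: {scene}\n"
            f"Gripper position: {gripper_xy}\n"
            "Give a step-by-step plan, the current subtask and the next move."
        )
        reasoning = gemini_generate(prompt)

        annotations.append({
            "observation": frame,
            "visible_objects": objects,
            "gripper_position": gripper_xy,
            "reasoning": reasoning,
        })
    return annotations
```

In a setup like this, each annotated step becomes a training target, so a policy fine-tuned on the data learns to produce the reasoning text before committing to an action.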