New AI Technique Lets Robots "Think" Before They Perform Complex Tasks

Researchers from UC Berkeley, Stanford and the University of Warsaw have created an AI method that enables robots to reason through tasks step by step using foundation models.

Robotics scientists have developed a novel way to improve how robots interact with their environment.

Researchers from UC Berkeley, Stanford University and the University of Warsaw have developed a method that enables robots to enhance their decision-making processes by incorporating reasoning.

The method, called Embodied Chain-of-Thought Reasoning (ECoT), enables robots to think through tasks step by step and consider their surroundings before taking action.

As detailed in a newly published paper, ECoT is designed to boost a robot’s ability to handle new tasks and environments effectively. It also provides human operators with a way to correct behaviors by modifying a robot’s reasoning through natural language feedback.
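
To make that concrete, here is a minimal sketch of the kind of step-by-step reasoning an ECoT-style policy produces before committing to an action. The instruction, object names, coordinates and action values below are invented for illustration and are not taken from the paper.

```python
# Hypothetical embodied chain-of-thought for the instruction
# "put the carrot in the bowl" -- all values are illustrative.
ecot_chain = {
    "task": "Put the carrot in the bowl.",
    "plan": [
        "Move the gripper above the carrot.",
        "Grasp the carrot.",
        "Move it over the bowl.",
        "Release the carrot.",
    ],
    "subtask": "Move the gripper above the carrot.",      # the step being executed now
    "move": "Move the arm left and down.",                # low-level motion description
    "gripper_position": (142, 87),                        # pixel coordinates in the camera frame
    "visible_objects": {                                  # detected objects in the scene
        "carrot": (120, 60, 170, 110),
        "bowl": (240, 150, 330, 230),
    },
    "action": [0.02, -0.01, -0.03, 0.0, 0.0, 0.0, 1.0],   # e.g., end-effector deltas plus gripper command
}

for step, value in ecot_chain.items():
    print(f"{step:>17}: {value}")
```

Because the reasoning is produced before the action, an operator can intervene simply by rewriting one of the intermediate steps in natural language.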

[Chart: an example of a generated embodied chain-of-thought]

Vision-language-action models (VLAs) have emerged as an increasingly powerful way to train robots to perform actions.

They are designed to give a robot a better understanding of the task it has been asked to perform. Google DeepMind researchers highlighted the potential of VLAs in a study published in June 2023.

However, according to the researchers, VLAs typically learn to map observations directly to actions without any intermediate reasoning, which limits their ability to handle complex, novel situations that require more thoughtful planning and adaptation.


The researchers sought to improve robotic reasoning by adding foundation models to the equation. They developed a scalable pipeline for generating synthetic training data for ECoT, leveraging various foundation models to extract features from robot demonstrations in the Bridge V2 dataset.

They used a suite of foundation models in the project: object detectors and vision-language models produced descriptions of the environment the robot was in, annotating details such as the objects present in the scene.
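
A rough, hypothetical sketch of that annotation stage is shown below. The function names are placeholders rather than the authors' code, and the detector and vision-language model are replaced with stubs that return mock values so the sketch runs on its own.

```python
# Sketch of the scene-annotation stage. detect_objects() and describe_scene() are
# stand-ins for off-the-shelf foundation models (an object detector and a
# vision-language model); they return mock values here so the example is runnable.

def detect_objects(frame):
    """Stand-in for an object detector: object names with bounding boxes in pixels."""
    return {"spoon": (10, 20, 60, 50), "towel": (100, 80, 200, 160)}

def describe_scene(frame, objects):
    """Stand-in for a vision-language model that writes a short scene description."""
    names = ", ".join(objects)
    return f"A tabletop scene containing: {names}."

def annotate_frame(frame, gripper_position):
    objects = detect_objects(frame)
    return {
        "scene_description": describe_scene(frame, objects),
        "visible_objects": objects,
        "gripper_position": gripper_position,
    }

annotation = annotate_frame(frame=None, gripper_position=(30, 25))
print(annotation["scene_description"])
```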

They then used Google's Gemini model to generate plans, subtasks and movement labels, combining that output with the previously gathered data on objects in the scene and details of the robot's gripper position.
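
The labeling step might be sketched as follows, assembling those annotations into a prompt for Gemini. The prompt wording and model name are assumptions, and the snippet uses the public google-generativeai Python client rather than the authors' actual pipeline.

```python
# Sketch of the plan / subtask / movement labeling step. Only the overall pattern
# (scene annotations in, structured reasoning labels out) follows the description
# above; the prompt text and model name are assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")            # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model choice

annotation = {
    "instruction": "put the spoon on the towel",
    "visible_objects": {"spoon": (10, 20, 60, 50), "towel": (100, 80, 200, 160)},
    "gripper_position": (30, 25),
}

prompt = (
    "You label robot demonstrations.\n"
    f"Instruction: {annotation['instruction']}\n"
    f"Visible objects (bounding boxes): {annotation['visible_objects']}\n"
    f"Gripper position (pixels): {annotation['gripper_position']}\n"
    "Write a high-level plan, the current subtask, and a short description of "
    "the arm movement, as three labeled lines: PLAN:, SUBTASK:, MOVE:."
)

response = model.generate_content(prompt)
print(response.text)  # e.g. "PLAN: ... SUBTASK: ... MOVE: ..."
```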

Dividing the process into submodules allowed a staggered, more methodical approach that enabled the robot to perform its task after thoroughly thinking it through.
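
At run time the result is a reason-then-act loop, which is also what makes the natural-language corrections described above possible: an operator can edit the generated reasoning before the action is decoded. The sketch below is hypothetical, with a stub standing in for the trained ECoT policy.

```python
# Hypothetical run-time loop for an ECoT-style policy. generate_reasoning_and_action()
# is a stub standing in for the trained vision-language-action model.

def generate_reasoning_and_action(image, instruction, reasoning_override=None):
    reasoning = reasoning_override or (
        "PLAN: reach the spoon, grasp it, place it on the towel. "
        "SUBTASK: reach the spoon. MOVE: move the arm forward."
    )
    # The action is decoded conditioned on the reasoning text (mock values here).
    action = [0.01, 0.0, -0.02, 0.0, 0.0, 0.0, 1.0]
    return reasoning, action

# Normal step: the robot reasons, then acts.
reasoning, action = generate_reasoning_and_action(
    image=None, instruction="put the spoon on the towel")
print("Robot reasoning:", reasoning)

# Correction step: a human rewrites part of the reasoning and the action is re-decoded.
corrected = reasoning.replace("move the arm forward", "move the arm forward slowly")
_, corrected_action = generate_reasoning_and_action(
    image=None, instruction="put the spoon on the towel", reasoning_override=corrected)
print("Action after correction:", corrected_action)
```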

