arstechnica.com

Robotic Dogs Gain Advanced Instrument Reading Abilities

By Jeremy Hsu


AI Summary

Boston Dynamics' quadruped robot, Spot, can now read analog thermometers and pressure gauges far more accurately, thanks to Google DeepMind's latest AI model, Gemini Robotics-ER 1.6. Reading such instruments is a crucial skill for robots that inspect industrial facilities. The collaboration between Google DeepMind and Boston Dynamics aims to improve 'embodied reasoning' in robots, so that they can understand and act on their physical surroundings more effectively.

Gemini Robotics-ER 1.6 acts as a high-level reasoning engine that lets robots plan and carry out complex tasks. It introduces 'agentic vision,' which combines visual reasoning with code execution to create a 'visual scratchpad' for analyzing images. With agentic vision, accuracy on instrument-reading tasks rises from 23% with the previous model to 98%.
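The article does not say what code the model actually runs on its scratchpad. As a purely illustrative sketch, assume the model has already estimated the needle's angle and read the scale endpoints printed on a dial; the last step of reading a linear analog gauge then reduces to interpolation. `GaugeSpec` and `read_linear_gauge` are hypothetical names, not part of any Gemini or Boston Dynamics API.

```python
from dataclasses import dataclass


@dataclass
class GaugeSpec:
    """Hypothetical description of a linear analog dial."""
    min_value: float      # reading at the start of the printed scale
    max_value: float      # reading at the end of the printed scale
    min_angle_deg: float  # needle angle at min_value (degrees clockwise from 12 o'clock)
    max_angle_deg: float  # needle angle at max_value (same convention)


def read_linear_gauge(needle_angle_deg: float, spec: GaugeSpec) -> float:
    """Map a needle angle to a reading, assuming the scale is linear and
    sweeps clockwise from min_angle_deg to max_angle_deg."""
    # Clockwise sweep of the printed scale, wrapped into [0, 360).
    span_deg = (spec.max_angle_deg - spec.min_angle_deg) % 360
    if span_deg == 0:
        raise ValueError("scale endpoints must be at different angles")
    # How far clockwise the needle sits past the start of the scale.
    offset_deg = (needle_angle_deg - spec.min_angle_deg) % 360
    if offset_deg > span_deg:
        raise ValueError("needle angle lies outside the printed scale")
    fraction = offset_deg / span_deg
    return spec.min_value + fraction * (spec.max_value - spec.min_value)
```

For a 0 to 10 bar gauge whose scale sweeps 270 degrees clockwise from the 7:30 position (225°) to the 4:30 position (135°), a needle pointing straight up gives `read_linear_gauge(0.0, GaugeSpec(0.0, 10.0, 225.0, 135.0))`, which returns 5.0. Real dials can be non-linear or viewed at an angle that distorts the needle, both of which this sketch ignores.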

Spot is being tested on inspection duties in several industrial settings, including Hyundai Motor Group's automotive factories. These tasks demand sophisticated visual reasoning to interpret instruments such as gauges and sight glasses, which are used to monitor liquid levels and other parameters.

Even without agentic vision, the model reaches 86% accuracy on instrument reading, well above the previous model's 23%. It can also point to specific elements in an image and use those points as intermediate steps when working through complex tasks. Its improved multi-view reasoning lets a robot combine several camera streams into a more complete picture of its environment, which makes it more effective in industrial applications.
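The article does not detail DeepMind's pointing output format or how multi-view reasoning works internally. A minimal sketch, assuming the model returns points as (y, x) pairs normalized to a 0 to 1000 grid and that each camera view yields its own gauge reading (or none when the gauge is not visible), shows how two pointed locations (dial center and needle tip) become a needle angle, and how readings from several views could be combined with a median. `to_pixels`, `needle_angle_deg`, and `fuse_readings` are hypothetical helpers written for illustration.

```python
import math
from statistics import median
from typing import Optional, Sequence, Tuple

# Assumption: pointed locations arrive as (y, x) normalized to 0..1000,
# independent of the camera image's real resolution.
NORMALIZED_MAX = 1000.0


def to_pixels(point_yx: Tuple[float, float], width: int, height: int) -> Tuple[float, float]:
    """Convert a normalized (y, x) point into (x, y) pixel coordinates."""
    y_norm, x_norm = point_yx
    return (x_norm / NORMALIZED_MAX * width, y_norm / NORMALIZED_MAX * height)


def needle_angle_deg(center_xy: Tuple[float, float], tip_xy: Tuple[float, float]) -> float:
    """Needle angle in degrees, clockwise from 12 o'clock.

    Image y grows downward, so "up" is the -y direction; atan2(dx, -dy)
    therefore measures the clockwise angle from straight up.
    """
    dx = tip_xy[0] - center_xy[0]
    dy = tip_xy[1] - center_xy[1]
    return math.degrees(math.atan2(dx, -dy)) % 360


def fuse_readings(per_view: Sequence[Optional[float]], min_views: int = 2) -> Optional[float]:
    """Median of the views that produced a reading, or None if too few did."""
    valid = [r for r in per_view if r is not None]
    if len(valid) < min_views:
        return None
    return median(valid)
```

Using the median rather than the mean means a single view spoiled by glare or occlusion cannot drag the fused value far; `min_views` is an arbitrary threshold chosen for illustration.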

Key Concepts

Embodied Reasoning

Embodied reasoning refers to the ability of robots to interact with and understand their physical environment through sensory inputs and cognitive processes. It involves integrating perception and action to perform tasks effectively.

Agentic Vision

Agentic vision combines visual reasoning with code execution: the model runs code on the images it is analyzing and uses the results as a 'visual scratchpad' for examining and manipulating visual data.

Category

Technology

Summarized by Mente
