Generative AI successfully applied to robotic grasping

#deep learning #generative AI #responsible AI #Robotics

**Demonstration of end-to-end robotic grasping at the OECD’s AI Working Day. ©Nikolas Schmidt**

CEA-List’s smart robotics demonstrator highlights generative AI’s potential to enable robotic tasks specified in natural language. Our researchers designed a robotic handling agent that combines computer vision and deep learning to accurately execute a grasping task from a high-level natural-language instruction.

The purpose of the research was to design a software module that would give robots the ability to understand and execute tasks based on instructions given in natural language or provided as images. The principle is to translate intuitive interactions into specific physical actions. We integrated a generic transformer-based foundation model that had been pretrained on a large dataset of robot trajectories. We then fine-tuned the model on our own data to improve performance on the target tasks.

The model we ultimately selected, Octo [1], adapts efficiently to various robotic configurations, requires relatively little data, and has modest computing requirements. What makes Octo so flexible is a modular attention structure that allows the model to adjust easily to the specifics of the target tasks. This in turn improves generalization and performance across a wide range of robotic tasks.
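One way to picture a modular attention structure is as a block-wise attention mask, in which tokens are grouped by role and each group is only allowed to attend to certain other groups. The sketch below is illustrative only: the group names, group sizes, and attention rules are assumptions made for this example and are not taken from Octo's implementation.

```python
import numpy as np

# Hypothetical token groups and sizes, chosen only for illustration.
GROUPS = {"task": 2, "observation": 3, "readout": 1}

# Assumed rules: which groups each group may attend to.
ATTENDS_TO = {
    "task": {"task"},
    "observation": {"task", "observation"},
    "readout": {"task", "observation", "readout"},
}

def build_block_mask(groups, attends_to):
    """Return token labels and a boolean mask; mask[i, j] is True
    when token i is allowed to attend to token j."""
    labels = [name for name, size in groups.items() for _ in range(size)]
    n = len(labels)
    mask = np.zeros((n, n), dtype=bool)
    for i, query_group in enumerate(labels):
        for j, key_group in enumerate(labels):
            mask[i, j] = key_group in attends_to[query_group]
    return labels, mask

labels, mask = build_block_mask(GROUPS, ATTENDS_TO)
print(labels)
print(mask.astype(int))
```

With rules like these, supporting a new input (an extra camera, for example) would only require a new entry in the two dictionaries; the rows of the mask for the existing groups keep the same pattern, which is one plausible reading of why such a structure adapts easily to new robot setups.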

We also developed a teleoperation mode to gather data specific to the robotic grasping task at hand. The system is built on a lightweight six-axis robot controlled remotely with a virtual reality joystick, which enables the precise, intuitive handling that is essential for quality data acquisition. To generate the actual data, volunteers performed robotic grasping tasks involving a dozen objects handled in four distinct spatial configurations.
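To give a sense of the scenario space this covers, the short sketch below enumerates object and configuration pairs. It assumes that "a dozen" means exactly 12 objects and that every object could be paired with every configuration, which the article does not state; the object and configuration names are placeholders.

```python
from itertools import product

# Placeholder names: the actual objects and configurations are not listed.
objects = [f"object_{i:02d}" for i in range(1, 13)]   # "a dozen" taken as 12
configurations = [f"config_{c}" for c in "ABCD"]      # four spatial configurations

# Every object paired with every configuration.
scenarios = list(product(objects, configurations))

print(len(scenarios))  # 48
print(scenarios[0])    # ('object_01', 'config_A')
```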

The diversity of objects and spatial configurations is important to ensure that the data is representative of real-world tasks and to give the robot an opportunity to learn across a wide variety of handling scenarios. CEA-List’s PIXANO software was used to “clean” the data gathered, correcting any annotation errors.

The Octo model was then fine-tuned on a cleaned training dataset of 678 trajectories and evaluated on a test dataset of 70 trajectories. Once trained, Octo successfully identified and grasped an object from the training dataset, whether placed alone or among distractor objects, without a dedicated 3D perception system.
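As a quick check on the proportions involved, the arithmetic below works out the overall dataset size and the share held out for testing, using only the two trajectory counts quoted above. The variable names are illustrative.

```python
TRAIN_TRAJECTORIES = 678
TEST_TRAJECTORIES = 70

total = TRAIN_TRAJECTORIES + TEST_TRAJECTORIES
test_share = TEST_TRAJECTORIES / total

print(total)                # 748
print(f"{test_share:.1%}")  # 9.4%
```

In other words, roughly one trajectory in eleven was held out for testing.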

Research on more complex tasks, including bimanual grasping, is currently underway.

[1] Octo Model Team, D. Ghosh, H. R. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, P. R. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn and S. Levine, Octo: An Open-Source Generalist Robot Policy, arXiv, 2024, https://api.semanticscholar.org/CorpusID:266379116

**Grasping a target object among distractor objects. ©CEA**

These advances came out of our research on intuitive programming, the purpose of which is to help make robotics more accessible to operators without specialist knowledge or training.

Caroline Vienne

Deputy department head — CEA-List

The goal of our research is to leverage artificial intelligence to develop robotic systems that are robust, accessible, and rapidly deployable in industrial settings.

Jaonary Rabarisoa

Research engineer — CEA-List

Contributors to this article:

Caroline Vienne, deputy department head at CEA-List
Jaonary Rabarisoa, research engineer at CEA-List

Find out more about the 2024 advances of the “Responsible artificial intelligence” program in the CEA-List 2024 activity report
