
Generative AI successfully applied to robotic grasping

Demonstration of end-to-end robotic grasping at the OECD’s AI Working Day. ©Nikolas Schmidt
CEA-List’s smart robotics demonstrator highlights generative AI’s potential as an enabler of robotic tasks whose instructions are given in natural language. Our researchers designed a robotic handling agent that leverages computer vision and deep learning to accurately execute a grasping task based on a high-level natural-language instruction.

The purpose of the research was to design a software module that would give robots the ability to understand and execute tasks based on instructions given in natural language or provided in images. The principle is to translate intuitive interactions into specific physical actions. We integrated a generic, or foundation, transformer AI model that had been pretrained on a large dataset of robot trajectories. We then refined the model on our own data to improve performance on the target tasks.

The model we ultimately selected, Octo, adapts efficiently to various robot configurations, requires relatively little data, and has modest computing requirements. Octo owes its flexibility to a modular attention structure that lets the model adjust easily to the specifics of the target tasks. This in turn improves both generalization and performance across a wide range of robotic tasks.
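
One way to picture a modular attention structure is as a block-wise mask over groups of tokens. The sketch below is an illustration under stated assumptions, not Octo's actual implementation: it assumes three token groups (task, observation, readout) and a simplified attention rule loosely based on the description in the Octo paper [1]. The names Token, can_attend, and build_mask are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """One input token; all names here are hypothetical, for illustration only."""
    group: str     # "task", "obs", or "readout"
    timestep: int  # -1 for task tokens, which are not tied to a timestep


def can_attend(query: Token, key: Token) -> bool:
    """Assumed block-wise rule: may `query` read information from `key`?"""
    if key.group == "task":
        return True   # every token may read the task (instruction) tokens
    if query.group == "task":
        return False  # task tokens read only other task tokens
    if key.group == "obs":
        return key.timestep <= query.timestep  # causal over observations
    # key is a readout token: visible only to readout tokens of the same timestep
    return query.group == "readout" and key.timestep == query.timestep


def build_mask(tokens: List[Token]) -> List[List[bool]]:
    """Row i, column j is True when token i may attend to token j."""
    return [[can_attend(q, k) for k in tokens] for q in tokens]


tokens = [
    Token("task", -1),
    Token("obs", 0), Token("readout", 0),
    Token("obs", 1), Token("readout", 1),
]
mask = build_mask(tokens)
for token, row in zip(tokens, mask):
    cells = "".join("x" if allowed else "." for allowed in row)
    print(f"{token.group:>7} t={token.timestep:>2} {cells}")
```

Under this assumed rule, no task or observation token ever attends to a readout token, so output heads attached to the readout positions could be added or replaced without changing how the pretrained tokens are processed.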

We also developed a remote operation mode to gather data specific to the robotic grasping task at hand. The system is built on a lightweight six-axis robot that is remotely controlled with a virtual reality joystick, enabling the precise, intuitive handling that is essential for quality data acquisition. To generate the actual data, volunteers performed robotic grasping tasks involving a dozen objects handled in four distinct spatial configurations.

 


Data acquisition session. ©CEA


 

The diversity of objects and spatial configurations is important to ensure that the data is representative of real-world tasks and to give the robot an opportunity to learn across a wide variety of handling scenarios. CEA-List’s PIXANO software was used to “clean” the data gathered, correcting any annotation errors.
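
A simple way to check that a dataset covers every object in every spatial configuration is to count the recorded episodes per pair. The sketch below is only an illustration: it assumes exactly twelve objects, and the object names, configuration labels, episode record layout, and the coverage_gaps helper are all hypothetical rather than taken from the actual acquisition pipeline or from PIXANO.

```python
from collections import Counter
from itertools import product


def coverage_gaps(episodes, objects, configurations, min_count=1):
    """Return the (object, configuration) pairs seen fewer than min_count times."""
    counts = Counter((ep["object"], ep["configuration"]) for ep in episodes)
    return [
        pair for pair in product(objects, configurations)
        if counts[pair] < min_count
    ]


# Hypothetical labels: twelve objects and four spatial configurations.
objects = [f"object_{i:02d}" for i in range(12)]
configurations = ["config_a", "config_b", "config_c", "config_d"]
all_pairs = list(product(objects, configurations))

# Toy episode list that covers every pair once, except the last one.
episodes = [
    {"object": obj, "configuration": cfg}
    for obj, cfg in all_pairs[:-1]
]

gaps = coverage_gaps(episodes, objects, configurations)
print(f"{len(all_pairs)} pairs expected, missing: {gaps}")
```

Running a check of this kind before training flags pairs that still need to be recorded, so that gaps in coverage are found at acquisition time rather than at evaluation time.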

The Octo model was then fine-tuned on a cleaned training dataset of 678 trajectories and evaluated on a test dataset of 70 trajectories. Once trained, Octo successfully identified and grasped an object from the training dataset, placed either alone or among distractor objects, without a dedicated 3D perception system.
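
The dataset sizes above can be reproduced with a seeded, disjoint split. The article does not say how trajectories were assigned to each set, so the sketch below is an assumption: it uses a simple random split, and the split_trajectories helper and the integer trajectory IDs are hypothetical.

```python
import random


def split_trajectories(trajectory_ids, n_test, seed=0):
    """Shuffle the IDs with a fixed seed and hold out n_test of them for testing."""
    ids = sorted(trajectory_ids)
    if not 0 <= n_test <= len(ids):
        raise ValueError("n_test must be between 0 and the number of trajectories")
    rng = random.Random(seed)  # local generator, so the split is reproducible
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]  # (train, test)


# 678 training + 70 test trajectories, as reported in the article.
all_ids = range(678 + 70)
train_ids, test_ids = split_trajectories(all_ids, n_test=70)
test_fraction = len(test_ids) / (len(train_ids) + len(test_ids))
print(len(train_ids), len(test_ids), f"{test_fraction:.1%}")
```

With these sizes, the test set holds a little under a tenth of all trajectories. In practice a split may also need to be stratified by object or configuration; that refinement is left out of this sketch.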

Research on more complex tasks, including bimanual grasping, is currently underway.

 

[1] Octo Model Team, D. Ghosh, H. R. Walke, K. Pertsch, K. Black, O. Mees, S. Dasari, J. Hejna, T. Kreiman, C. Xu, J. Luo, Y. L. Tan, P. R. Sanketi, Q. Vuong, T. Xiao, D. Sadigh, C. Finn and S. Levine, Octo: An Open-Source Generalist Robot Policy, arXiv, 2024, https://api.semanticscholar.org/CorpusID:266379116

 

Grasping a target object among distractor objects. ©CEA


 

These advances came out of our research on intuitive programming, the purpose of which is to help make robotics more accessible to operators without specialist knowledge or training.

Rebecca Cabean

Caroline Vienne

Deputy department head — CEA-List

The goal of our research is to leverage artificial intelligence to develop robotic systems that are robust, accessible, and rapidly deployable in industrial settings.

Rebecca Cabean

Jaonary Rabarisoa

Research engineer — CEA-List

Contributors to this article:

  • Caroline Vienne, deputy department head at CEA-List
  • Jaonary Rabarisoa, research engineer at CEA-List
