
A cognitive programming interface

Training the robotic system on a gesture for a medical use case: The operator manually performs the sequence of operations to be automated (gripping the vial with the gripper on the right and grasping the needle to be inserted into the vial cap with the Abilis dexterous gripper on the left).
Programming robots is a time-consuming endeavor, one that can be an obstacle to their deployment in industrial settings, especially for smaller industrial companies whose tasks change constantly. Robots can be programmed at the task level, but this approach does not provide adequate contextualization and requires specialized knowledge. We developed an ontology-based cognitive interface that enables context-sensitive programming without requiring any robotics expertise.

With our Cognitive Programming Interface (CPI), non-expert users can program robots by building sequences of skills that remain consistent with the current state of the scene. The CPI rests on three developments:

  1. An ontology describing a scene, the objects in the scene (such as tables, bottles, or manipulators, which are called actors), their interfaces with the other actors (a bottom face, a gripping area, the threading on a canister, etc.), and a set of known skills (gripping, placing, screwing, etc.).
  2. A context interpreter that updates the state of the world in real time, using state changes (a manipulator closing or opening, etc.) to determine the possible interactions between actors, based on the interfaces each actor hides or leaves available.
  3. A graphical interface showing the scene in 3D and the skills available at each stage.


In the CPI's cyclical workflow, the scene is initialized by the world model, which then uses a semantic reasoning engine to compute the currently possible skills. This set of possibilities is sent to the graphical interface and, based on the actions the user chooses, the model updates itself. Some steps, such as when a particular relationship or action needs to be specified, may require human intervention.
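A minimal Python sketch of this loop, assuming toy stand-ins for the world model, semantic reasoner, and graphical interface; all names and skill encodings below are illustrative, not the CPI's actual API:

```python
# Minimal sketch of the CPI's cyclical workflow with toy stand-ins for
# the world model, semantic reasoner, and graphical interface.

def compute_possible_skills(world_state):
    """Stand-in reasoner: a skill is possible when all of its
    precondition triplets are present in the world state."""
    skills = {
        "grasp(bottle)": {("gripper", "isEmpty", "true")},
        "place(bottle)": {("gripper", "holds", "bottle")},
    }
    return [name for name, pre in skills.items() if pre <= world_state]

def ask_user(options):
    """Stand-in for the 3D interface: the user picks one of the skills."""
    print("Possible skills:", options)
    return options[0]  # auto-pick the first option for this sketch

def apply_effects(world_state, skill):
    """Update the world model with the chosen skill's effects."""
    if skill == "grasp(bottle)":
        return (world_state - {("gripper", "isEmpty", "true")}) | {("gripper", "holds", "bottle")}
    if skill == "place(bottle)":
        return (world_state - {("gripper", "holds", "bottle")}) | {("gripper", "isEmpty", "true")}
    return world_state

# Initialize the scene, then cycle: reason -> present -> choose -> update.
state = {("gripper", "isEmpty", "true")}
for _ in range(2):
    choice = ask_user(compute_possible_skills(state))
    state = apply_effects(state, choice)
```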

In the ontological model, actors are described by a geometric primitive, their properties, and, in particular, their interfaces, whose relative positions in the object's frame of reference are known. These interfaces express interaction capabilities (grippable, insertable, placeable, etc.), enabling reasoning based not on the objects themselves, but rather on their most likely interactions in the current context. Relationships between ontology instances are represented as <subject, predicate, object> triplets (e.g., <robot, grasps, object>).
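As an illustration, the triplet representation can be sketched with plain Python tuples; the helper names and predicates below are hypothetical, chosen only to mirror the <subject, predicate, object> form described above:

```python
# Illustrative triplet store built from plain Python tuples.

knowledge_base = set()

def add_triple(subject, predicate, obj):
    knowledge_base.add((subject, predicate, obj))

# An actor exposes interfaces whose capabilities drive the reasoning.
add_triple("bottle", "hasInterface", "bottle_neck")
add_triple("bottle_neck", "capability", "grippable")
add_triple("robot", "grasps", "bottle")  # the example relation from the text

def query(subject=None, predicate=None, obj=None):
    """Pattern query over the triplet store; None acts as a wildcard."""
    return [t for t in knowledge_base
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

print(query(predicate="grasps"))  # -> [('robot', 'grasps', 'bottle')]
```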

Interaction capabilities are defined by a set of parameters, preconditions, and effects, which allow relationships to be added, removed, or updated in the ontology. The CPI also supports hybrid queries, which, when information is ambiguous or missing, can call upon either the knowledge base or the user.
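A hedged sketch of what such a definition might look like, together with a hybrid query that falls back to the user when the knowledge base is ambiguous; the `Skill` fields and `hybrid_query` helper are assumptions, not the CPI's actual API:

```python
# Sketch of a skill defined by parameters, preconditions, and effects,
# plus a hybrid query with a user fallback. All names are assumptions.

from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    parameters: list                                  # e.g. ["?robot", "?object"]
    preconditions: set = field(default_factory=set)
    add_effects: set = field(default_factory=set)     # triplets to add
    del_effects: set = field(default_factory=set)     # triplets to remove

grasp = Skill(
    name="grasp",
    parameters=["?robot", "?object"],
    preconditions={("?robot", "isEmpty", "true"),
                   ("?object", "capability", "grippable")},
    add_effects={("?robot", "grasps", "?object")},
    del_effects={("?robot", "isEmpty", "true")},
)

def hybrid_query(kb, pattern):
    """Answer from the knowledge base when possible; if the answer is
    ambiguous or missing, fall back to asking the user."""
    matches = [t for t in kb
               if all(p is None or p == v for p, v in zip(pattern, t))]
    if len(matches) == 1:
        return matches[0]
    return input(f"Cannot resolve {pattern}, please specify: ")

kb = {("robot", "grasps", "bottle")}
print(hybrid_query(kb, (None, "grasps", None)))  # unambiguous: answered from the KB
```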

Figure 1: The state of the system before and after input.


A demonstration-based interaction recognition module can automatically deduce the sequence of skills performed during a kinesthetic demonstration carried out manually by the operator. The module assumes that each skill begins or ends with a change in gripper state. Two main functions, TriggerGrasp and TriggerRelease, interpret movements, contacts, and relationships between objects to update the semantic state of the world. An algorithm then identifies the type of skill from the modified predicates and, finally, deduces its exact parameters by comparing the observed changes with the expected effects of the possible skills.
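This idea can be illustrated with a short sketch: a gripper-state change triggers a semantic update, and the skill type and parameters are then deduced from the modified predicates. The trigger names mirror the text's TriggerGrasp and TriggerRelease, but the data structures are assumptions:

```python
# Illustrative sketch of demonstration-based skill recognition: gripper
# events update the semantic world state, and the skill is identified by
# matching the observed predicate changes against expected effects.

def trigger_grasp(world, obj, robot="robot"):
    """Called when the gripper closes around an object."""
    added = {(robot, "grasps", obj)}
    removed = {t for t in world if t[0] == obj and t[1] == "on"}
    return (world - removed) | added, added, removed

def trigger_release(world, obj, location, robot="robot"):
    """Called when the gripper opens and releases an object."""
    added = {(obj, "on", location)}
    removed = {(robot, "grasps", obj)}
    return (world - removed) | added, added, removed

def identify_skill(added, removed):
    """Deduce the skill type and its parameters from the modified predicates."""
    for subj, pred, obj in added:
        if pred == "grasps":
            return "grasp", {"robot": subj, "object": obj}
        if pred == "on":
            return "place", {"object": subj, "location": obj}
    return None, {}

# Replay a demonstration: the gripper closes on a vial, then opens over a rack.
world = {("vial", "on", "table")}
world, add, rem = trigger_grasp(world, "vial")
print(identify_skill(add, rem))   # -> ('grasp', {'robot': 'robot', 'object': 'vial'})
world, add, rem = trigger_release(world, "vial", "rack")
print(identify_skill(add, rem))   # -> ('place', {'object': 'vial', 'location': 'rack'})
```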

Figure 2: Gripping action.



Learn more

EU Tracebot project (https://www.tracebot.eu/)

  • This project addresses the traceability and automation of industrial sterility control processes in the medical and pharmaceutical industries. In the experiment, two containers placed in racks were handled by two UR10e robots, and skills were automatically recognized from a kinesthetic demonstration. The operator performed four actions: each of the two containers was picked up and then placed in the rack. Each time the state of the scene changed, the system correctly identified the type of skill and its associated parameters (object, robot, location) by comparing the observed semantic changes with those expected. The complete sequence was faithfully reconstructed and displayed via the interface, demonstrating the robustness of the approach. Based on the kinesthetic demonstration, the sequence can then be replayed automatically by the robot.


Flagship publication

  • Raphaël Gerin, Julie Dumora, Olivier David, Baptiste Gradoussoff, "Cognitive Programming Interface: From Task Level Programming to Coherent Task Level Programming," 2024 IEEE 20th International Conference on Automation Science and Engineering (CASE), Bari, Italy, pp. 1345-1352. https://doi.org/10.1109/CASE59546.2024.10711577

Contributor to this article

  • Raphaël Gerin, Research Engineer, CEA-List