Our research focuses on motion-guided object discovery, which presents several technical challenges. First, by definition, the motion information used as a source of supervision says nothing about static objects, which makes it difficult for a model to generalize to them. Second, camera movement generates noise that makes it hard to distinguish moving objects from background elements that only appear to be in motion.
We looked to self-distillation, a concept not yet explored for object discovery, to address these challenges. Self-distillation relies on a teacher model, which automatically labels unannotated images, and a student model, which learns to solve the main task using data annotated manually or by the teacher model. The teacher-student setup makes it possible to learn from new unannotated data, and the quality of the pseudo-labels initially generated by the teacher model gradually improves as training progresses.
DIOD is the first method to combine self-distillation with object discovery. With the teacher-student architecture, the teacher model can be updated dynamically based on what the student has learned. The student discovers objects from two sources: the teacher's attention maps, which are filtered by a confidence rating so that only high-confidence objects are retained; and the motion masks, from which noisy segments have been removed. The pseudo-labels gradually improve, increasing overall performance during training. This approach addresses both technical challenges mentioned above. Static objects (like parked cars), to which the teacher model is able to generalize, can now be learned by the student, and the noise caused by camera movement is significantly reduced by the filtering mechanisms applied.
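The two mechanisms described above, confidence filtering of the teacher's outputs and a teacher that follows the student, can be illustrated with a minimal sketch. It is not the DIOD implementation: the function names, the threshold value, and the use of an exponential moving average (EMA) for the teacher update are assumptions made for illustration only.

```python
from typing import List, Sequence


def filter_confident(masks: Sequence, scores: Sequence[float],
                     threshold: float = 0.5) -> List:
    """Keep only the teacher masks whose confidence reaches the threshold.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    return [mask for mask, score in zip(masks, scores) if score >= threshold]


def ema_update(teacher: Sequence[float], student: Sequence[float],
               momentum: float = 0.99) -> List[float]:
    """Move each teacher weight a small step toward the student weight.

    An EMA is one common way to update a teacher from a student; it is
    assumed here, the article does not specify the update rule.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


# Toy usage: three candidate masks with their confidence scores.
pseudo_labels = filter_confident(["car", "shadow", "van"], [0.9, 0.2, 0.7])
new_teacher = ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9)
```

In such a setup, the retained masks would serve as pseudo-labels for the student alongside the cleaned motion masks, and the teacher update would be applied after each student training step.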
DIOD outperforms state-of-the-art methods by a significant margin (+18.8 points fg-ARI, +43.8 points all-ARI, +8.9 points F1@score on the KITTI dataset). It is more effective at discovering both moving and static objects, eliminating background noise, and distinguishing adjacent objects of the same semantic class.
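For readers unfamiliar with the ARI-based metrics quoted above, the sketch below computes the adjusted Rand index between a ground-truth and a predicted segmentation given as flat lists of per-pixel labels. It assumes the usual convention that all-ARI is computed over every pixel and fg-ARI only over pixels whose ground-truth label is not background; the function names and the choice of `0` as the background label are illustrative, not taken from the paper.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(true_labels: Sequence[Hashable],
                        pred_labels: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same pixels."""
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two pixels: agreement is trivially perfect
    # Pairs of pixels grouped together in both, in truth only, in prediction only.
    both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    in_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    in_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = in_true * in_pred / total_pairs
    maximum = (in_true + in_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (both - expected) / (maximum - expected)


def fg_ari(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable],
           background: Hashable = 0) -> float:
    """ARI restricted to pixels that are foreground in the ground truth."""
    keep = [i for i, t in enumerate(true_labels) if t != background]
    return adjusted_rand_index([true_labels[i] for i in keep],
                               [pred_labels[i] for i in keep])


# Toy example: two objects segmented perfectly, background split in two.
truth = [0, 0, 1, 1, 2, 2]
pred = [5, 6, 3, 3, 4, 4]
score_fg = fg_ari(truth, pred)
score_all = adjusted_rand_index(truth, pred)
```

In this toy example the foreground score is perfect while the all-pixel score is penalized for the over-segmented background, which is why the two metrics are reported separately.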
These capabilities make DIOD a high-performance object discovery method that requires no manual annotation. DIOD could be used to automate annotation, either reducing its cost or eliminating it entirely. It could also be applied to 3D point clouds from LIDAR data, which is potentially highly valuable for automated driving applications, and to the discovery of 2D or 3D objects using a multi-modal model that combines the strengths of 2D RGB images and 3D LIDAR data.
“DIOD: Self-Distillation Meets Object Discovery.”
Kara, S., Ammar, H., Denize, J., Chabot, F., and Pham, Q. C. (2024).
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (rank A*).