Self-Supervised Object Detection from Egocentric Videos

Published in International Conference on Computer Vision (ICCV), Paris, France, 2023

Recommended citation: Peri Akiva, Jing Huang, Kevin J. Liang, Rama Kovvuri, Xingyu Chen, Matt Feiszli, Kristin Dana, and Tal Hassner. Self-Supervised Object Detection from Egocentric Videos. International Conference on Computer Vision (ICCV), Paris, France, 2023.

Abstract

Understanding the visual world from the perspective of humans (egocentric) has been a long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity and irregular motion flows compared to typical video understanding tasks. With the egocentric domain in mind, we address the problem of self-supervised, class-agnostic object detection, which aims to locate all objects in a given view, regardless of category, without any annotations or pre-training weights. Our method, self-supervised object Detection from Egocentric VIdeos (DEVI), generalizes appearance-based methods to learn features that are category-specific and invariant to viewing angles and illumination conditions from highly ambiguous environments in an end-to-end manner. Our approach leverages typical human behavior and its egocentric perception to sample diverse views of the same objects for our multi-view and scale-regression loss functions. With our learned cluster residual module, we are able to effectively describe multi-category patches for better complex scene understanding. DEVI provides a boost in performance on recent egocentric datasets, with performance gains up to 4.11% AP50, 0.11% AR1, 1.32% AR10, and 5.03% AR100, while significantly reducing model complexity. We also demonstrate competitive performance on out-of-domain datasets without additional training or fine-tuning.

BibTex:

@InProceedings{Akiva_2023_ICCV,
    author    = {Akiva, Peri and Huang, Jing and Liang, Kevin J and Kovvuri, Rama and Chen, Xingyu and Feiszli, Matt and Dana, Kristin and Hassner, Tal},
    title     = {Self-Supervised Object Detection from Egocentric Videos},
    booktitle = {Proceedings of the {IEEE/CVF} International Conference on Computer Vision ({ICCV})},
    month     = {October},
    year      = {2023},
    pages     = {5225-5237},
    url       = {https://talhassner.github.io/home/publication/2023_ICCV_1}
}

Paper on CVF