GazeTarget360: Towards Gaze Target Estimation in 360-Degree for Robot Perception

Aston University & Aalborg University*
IROS 2025
arXiv | Code

Abstract

Enabling robots to understand human gaze targets is a crucial step toward downstream capabilities such as attention estimation and movement anticipation in real-world human-robot interactions. Prior works have addressed the in-frame target localization problem with data-driven approaches by carefully removing out-of-frame samples. Vision-based gaze estimation methods, such as OpenFace, do not effectively exploit background information in images and cannot predict gaze targets when subjects look away from the camera. In this work, we propose a system to address the problem of 360-degree gaze target estimation from an image in generalized visual scenes. The system, named GazeTarget360, integrates conditional inference engines: an eye-contact detector, a pre-trained vision encoder, and a multi-scale-fusion decoder. Cross-validation results show that GazeTarget360 produces accurate and reliable gaze target predictions in unseen scenarios. This makes it a first-of-its-kind system for predicting gaze targets from realistic camera footage that is highly efficient and deployable.
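To make the conditional inference idea concrete, the Python sketch below shows one way such a pipeline could be wired: an eye-contact check runs first on the head crop, and only when no eye contact is detected does the encoder-decoder produce a gaze heatmap and an in-/out-of-frame decision. The component interfaces (eye_contact_detector, encoder_decoder) and thresholds are hypothetical placeholders for illustration, not the released implementation.

from dataclasses import dataclass
from typing import Callable, Optional, Tuple
import numpy as np

@dataclass
class GazePrediction:
    label: str                                    # "EC", "OFT", or "IFT"
    target_xy: Optional[Tuple[int, int]] = None   # predicted in-frame target (pixels)
    heatmap: Optional[np.ndarray] = None          # decoder heatmap, H x W

def predict_gaze_target(
    image: np.ndarray,
    head_box: Tuple[int, int, int, int],
    eye_contact_detector: Callable[[np.ndarray], float],    # hypothetical: head crop -> P(eye contact)
    encoder_decoder: Callable[[np.ndarray, Tuple[int, int, int, int]],
                              Tuple[np.ndarray, float]],    # hypothetical: (image, head box) -> (heatmap, P(in-frame))
    ec_threshold: float = 0.5,
    ift_threshold: float = 0.5,
) -> GazePrediction:
    """Conditional inference: eye contact first, then in-/out-of-frame target."""
    x1, y1, x2, y2 = head_box
    head_crop = image[y1:y2, x1:x2]

    # Stage 1: eye-contact detector on the head crop.
    if eye_contact_detector(head_crop) >= ec_threshold:
        return GazePrediction(label="EC")

    # Stage 2: pre-trained vision encoder + multi-scale-fusion decoder
    # yield a gaze heatmap and an in-frame probability.
    heatmap, p_in_frame = encoder_decoder(image, head_box)
    if p_in_frame < ift_threshold:
        return GazePrediction(label="OFT", heatmap=heatmap)

    # In-frame target: take the heatmap peak as the predicted gaze point.
    ty, tx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return GazePrediction(label="IFT", target_xy=(int(tx), int(ty)), heatmap=heatmap)

One possible motivation for this ordering is efficiency: the lightweight eye-contact check can short-circuit the heavier encoder-decoder whenever the subject is looking at the camera.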

Video Demo


Green boxes indicate detected heads. A green rendering represents eye contact (EC); a red rendering represents gazing at an out-of-frame target (OFT); an arrow pointing toward a green dot represents the in-frame target (IFT) location with an overlaid heatmap. The system produces consistent performance across datasets.
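For reference, a minimal rendering sketch following these colour conventions, written with OpenCV. It assumes a prediction object like the one in the earlier sketch (with label, target_xy, and heatmap fields); the drawing parameters are illustrative and not the demo's actual code.

import cv2
import numpy as np

def draw_prediction(frame, head_box, pred):
    """Render one prediction: green head box, EC/OFT label, or IFT arrow + heatmap."""
    x1, y1, x2, y2 = head_box
    green, red = (0, 255, 0), (0, 0, 255)
    cv2.rectangle(frame, (x1, y1), (x2, y2), green, 2)          # detected head
    if pred.label == "EC":
        cv2.putText(frame, "EC", (x1, y1 - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, green, 2)
    elif pred.label == "OFT":
        cv2.putText(frame, "OFT", (x1, y1 - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, red, 2)
    else:  # IFT: arrow from head centre to a green dot, plus a heatmap overlay.
        head_centre = ((x1 + x2) // 2, (y1 + y2) // 2)
        cv2.arrowedLine(frame, head_centre, pred.target_xy, green, 2)
        cv2.circle(frame, pred.target_xy, 5, green, -1)
        if pred.heatmap is not None:
            hm = cv2.resize(pred.heatmap, (frame.shape[1], frame.shape[0]))
            hm = cv2.applyColorMap((255 * hm / (hm.max() + 1e-6)).astype(np.uint8),
                                   cv2.COLORMAP_JET)
            frame[:] = cv2.addWeighted(frame, 0.7, hm, 0.3, 0)
    return frame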

BibTeX


@INPROCEEDINGS{gazetarget360_iros2025,
  author    = {Dai, Zhuangzhuang and Zakka, Vincent Gbouna and Manso, Luis J. and Li, Chen},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title     = {GazeTarget360: Towards Gaze Target Estimation in 360-Degree for Robot Perception},
  year      = {2025}
}