Scene Parsing

[NeurIPS23] ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

The challenge of replicating research results has posed a significant impediment to the field of molecular biology. The advent of …

Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu

[ICCV21] YouRefIt: Embodied Reference Understanding with Language and Gesture

We study the machine’s understanding of embodied reference: One agent uses both language and gesture to refer to an object to …

Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, Siyuan Huang

[CVPR21] Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Humans possess a unique social cognition capability; nonverbal communication can convey rich social information among agents. In …

Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

[ECCV20] LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

Understanding and interpreting human actions is a long-standing challenge and a critical indicator of perception in artificial …

Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

[ICRA20] Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs

Aiming to understand how human (false-)belief—a core socio-cognitive ability—would affect human interactions with robots, …

Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu

[NeurIPS19] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points

Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate …

Siyuan Huang, Yixin Chen, Tao Yuan, Siyuan Qi, Yixin Zhu, Song-Chun Zhu

[ICCV19] Holistic++ Scene Understanding: Single-view 3D Holistic Scene Parsing and Human Pose Estimation with Human-Object Interaction and Physical Commonsense

We propose a new 3D holistic++ scene understanding problem, which jointly tackles two tasks from a single-view image: (i) holistic …

Yixin Chen, Siyuan Huang, Tao Yuan, Yixin Zhu, Siyuan Qi, Song-Chun Zhu

[NeurIPS18] Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation

Holistic 3D indoor scene understanding refers to jointly recovering the i) object bounding boxes, ii) room layout, and iii) camera …

Siyuan Huang, Siyuan Qi, Yinxue Xiao, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
