The emergence of the Internet of Things (IoT) has facilitated the proliferation of smart devices in daily life. These devices possess a notable characteristic that sets them apart from traditional ones: the ability to perceive their physical surroundings using wireless sensors such as RGBD cameras, WiFi, LiDAR, millimeter-wave (mmWave) radars, and others. The prevalent vision-based sensing approach is unsuitable for indoor environments that demand privacy protection, involve complex environmental conditions, or require low energy consumption. In this project, we propose to utilize 60-64 GHz mmWave radar as a low-cost, low-power, privacy-preserving solution with minimal environmental requirements for 2D human pose estimation, one of the most fundamental human sensing tasks.
In our proposed method, supervision for mmWave-based human sensing is generated from synchronized RGB frames, and human pose landmarks are extracted from 5D mmWave point clouds using a point transformer-based deep learning network. We gather a multi-modal dataset, perform feasibility studies across various application scenarios, and develop multiple experimental protocols to simulate obstacles likely to be encountered in real-world deployments. The results show that 60-64 GHz mmWave radar is viable for 2D human pose estimation and yields results comparable to vision-based solutions.
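To make the pipeline concrete, the sketch below illustrates one way such a cross-modal setup could be trained: a simplified attention-based network regresses 2D keypoints from a per-frame radar point cloud, supervised by pseudo-labels obtained from the synchronized RGB stream. This is a minimal illustration, not the actual network used in the project; the choice of 5 per-point features (position, Doppler velocity, intensity), the fixed point count, the 17-keypoint layout, and the single attention layer standing in for a full point transformer backbone are all assumptions.

import torch
import torch.nn as nn

class PointAttentionPoseNet(nn.Module):
    """Minimal sketch: frame-level 2D keypoint regression from a 5D mmWave point cloud.

    Assumed (not from the original text): per-point features are
    (x, y, z, Doppler velocity, intensity), each frame is padded/sampled to a
    fixed number of points, and one self-attention layer stands in for the
    full point transformer backbone.
    """

    def __init__(self, in_dim=5, embed_dim=64, num_keypoints=17):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU(),
                                   nn.Linear(embed_dim, embed_dim))
        # Self-attention across the points of one radar frame.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, 128), nn.ReLU(),
                                  nn.Linear(128, num_keypoints * 2))
        self.num_keypoints = num_keypoints

    def forward(self, points):                  # points: (B, N, 5)
        x = self.embed(points)                  # per-point embedding
        x, _ = self.attn(x, x, x)               # attend across points
        x = x.max(dim=1).values                 # permutation-invariant pooling
        return self.head(x).view(-1, self.num_keypoints, 2)  # (B, K, 2) image coords


# One training step against RGB-derived supervision, i.e. keypoints produced by
# an off-the-shelf image pose estimator on the synchronized camera frame.
model = PointAttentionPoseNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
radar_points = torch.randn(8, 256, 5)     # dummy batch: 8 frames, 256 points each
rgb_keypoints = torch.rand(8, 17, 2)      # pseudo-labels from the RGB pipeline
loss = nn.functional.mse_loss(model(radar_points), rgb_keypoints)
loss.backward()
optimizer.step()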
We demonstrate frame-level human pose estimation results for three actions.