Deep Multi-modal Learning for
Radar-Vision Human Sensing

Authors: Chen Xinyan
Supervisor: Xie Lihua, Co-Supervisor: Yang Jianfei, Examiner: Wang Dan Wei

School of Electrical and Electronic Engineering
Nanyang Technological University


A final year project presented to Nanyang Technological University
in partial fulfilment of the requirements for the degree of
Bachelor of Engineering

2023

Visualization of the data of both the vision and mmWave modalities
and their corresponding 2D human pose estimations.

Abstract

The emergence of the Internet of Things (IoT) has facilitated the proliferation of smart devices in daily life. These devices possess a notable characteristic that sets them apart from traditional ones: the ability to perceive their physical surroundings using wireless sensors such as RGB-D cameras, WiFi, LiDAR, millimeter-wave (mmWave) radars, and others. The prevalent vision-based sensing approach is unsuitable for indoor environments that demand privacy protection, exhibit environmental complexity, or require low energy consumption. In this project, we propose to utilize 60-64 GHz mmWave radar as a low-cost, low-power, privacy-preserving solution with minimal environmental requirements for 2D human pose estimation, one of the most fundamental human sensing tasks.

In our proposed method, supervision for mmWave-based human sensing is generated from synchronized RGB frames, and human pose landmarks are extracted from 5D mmWave point clouds by a point-transformer-based deep learning network. We gather a multi-modal dataset, perform feasibility studies across various application scenarios, and develop multiple experimental protocols that simulate obstacles likely to arise in real-world deployment. The results show that 60-64 GHz mmWave radar is a viable sensor for 2D human pose estimation and yields results comparable to those of vision-based solutions.
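As a concrete illustration of the supervision described above, the sketch below derives 2D pose pseudo-labels from a synchronized RGB frame using an off-the-shelf pose estimator and pairs them with a 5D mmWave point-cloud frame. MediaPipe Pose is used here purely as an example; the page does not specify which vision model the project relies on, and the assumed 5D layout (x, y, z, Doppler velocity, intensity) is a common mmWave convention rather than a confirmed detail.

```python
import mediapipe as mp
import numpy as np

mp_pose = mp.solutions.pose

def rgb_to_pseudo_label(rgb_frame: np.ndarray):
    """Extract normalized 2D pose landmarks from an RGB frame.

    rgb_frame: (H, W, 3) uint8 RGB image, time-synchronized with one
    mmWave point-cloud frame. Returns (K, 2) coordinates in [0, 1],
    or None if no person is detected.
    """
    with mp_pose.Pose(static_image_mode=True) as pose:
        result = pose.process(rgb_frame)
    if result.pose_landmarks is None:
        return None
    return np.array([(lm.x, lm.y) for lm in result.pose_landmarks.landmark],
                    dtype=np.float32)

# One training sample: a 5D point cloud (assumed layout: x, y, z,
# Doppler velocity, intensity) and its RGB-derived pseudo-label.
point_cloud = np.random.randn(64, 5).astype(np.float32)  # placeholder radar frame
label = rgb_to_pseudo_label(np.zeros((480, 640, 3), dtype=np.uint8))
```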

General Framework
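The framework figure itself is not reproduced here, but a minimal training sketch conveys the idea: the RGB branch supplies pseudo-labels, and a network operating on the mmWave point cloud is regressed toward them. The toy backbone below (a per-point MLP with global max-pooling) merely stands in for the project's actual point-transformer model; the keypoint count, loss, and optimizer settings are assumptions.

```python
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17  # assumption; depends on the pose format of the RGB branch

class MMWavePoseRegressor(nn.Module):
    """Toy stand-in for the mmWave backbone:
    per-point MLP -> global max-pool -> 2D keypoint regression head."""
    def __init__(self, in_dim: int = 5, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, NUM_KEYPOINTS * 2)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 5) point clouds -> (B, K, 2) keypoints
        feat = self.encoder(pts).max(dim=1).values
        return self.head(feat).view(-1, NUM_KEYPOINTS, 2)

model = MMWavePoseRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One step of cross-modal supervision with placeholder data:
points = torch.randn(8, 64, 5)                   # batch of 5D point clouds
pseudo_labels = torch.rand(8, NUM_KEYPOINTS, 2)  # normalized RGB-derived targets
loss = nn.functional.mse_loss(model(points), pseudo_labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```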

mmWave Point Transformer

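The page names the architecture but shows it only as a figure, so the following is a generic sketch of the vector self-attention layer from Zhao et al.'s Point Transformer (ICCV 2021), on which such a network is presumably built. The neighborhood size and MLP widths are illustrative choices, not the project's settings.

```python
import torch
import torch.nn as nn

def index_points(x: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
    """Gather neighbor features: x (B, N, C), idx (B, N, k) -> (B, N, k, C)."""
    batch = torch.arange(x.shape[0], device=x.device).view(-1, 1, 1)
    return x[batch, idx]

class PointTransformerLayer(nn.Module):
    """Vector self-attention over k-nearest point neighborhoods,
    in the spirit of Zhao et al., "Point Transformer"."""
    def __init__(self, dim: int = 64, k: int = 16):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # positional encoding over relative xyz offsets
        self.pos_mlp = nn.Sequential(
            nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        # attention MLP producing per-channel weights
        self.attn_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feats: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, C) per-point features; coords: (B, N, 3) positions
        dists = torch.cdist(coords, coords)                  # (B, N, N)
        knn_idx = dists.topk(self.k, largest=False).indices  # (B, N, k)

        q = self.to_q(feats)                                 # (B, N, C)
        k = index_points(self.to_k(feats), knn_idx)          # (B, N, k, C)
        v = index_points(self.to_v(feats), knn_idx)          # (B, N, k, C)
        rel = coords.unsqueeze(2) - index_points(coords, knn_idx)
        pos = self.pos_mlp(rel)                              # (B, N, k, C)

        attn = self.attn_mlp(q.unsqueeze(2) - k + pos).softmax(dim=2)
        return (attn * (v + pos)).sum(dim=2)                 # (B, N, C)

# Smoke test: embed a random 5D point cloud to 64 channels and attend.
layer = PointTransformerLayer(dim=64, k=8)
embed = nn.Linear(5, 64)
pts = torch.randn(2, 32, 5)
out = layer(embed(pts), pts[..., :3])  # -> (2, 32, 64)
```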

Human Pose Estimation (HPE) Results

Visualization of the human pose estimation results.

We demonstrate the human pose estimation results for three actions at the frame level, as sketched below.
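A frame-level visualization of this kind can be produced with a few lines of plotting code; the sketch below overlays predicted normalized keypoints on an image. The skeleton connectivity is hypothetical and depends on the pose format actually used in the project.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical skeleton edges (pairs of keypoint indices).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]

def draw_pose(frame: np.ndarray, keypoints: np.ndarray) -> None:
    """Overlay (K, 2) normalized keypoints and skeleton edges on a frame."""
    h, w = frame.shape[:2]
    pts = keypoints * np.array([w, h])  # scale to pixel coordinates
    plt.imshow(frame)
    plt.scatter(pts[:, 0], pts[:, 1], c="r", s=12)
    for i, j in EDGES:
        plt.plot([pts[i, 0], pts[j, 0]], [pts[i, 1], pts[j, 1]], "y-")
    plt.axis("off")
    plt.show()

# Example with a blank frame and random keypoints:
draw_pose(np.zeros((480, 640, 3), dtype=np.uint8), np.random.rand(7, 2))
```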


Related Links

BibTeX
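The entry itself is missing from this copy of the page; a placeholder assembled from the title page above would look like the following (the citation key and entry type are our assumptions).

```bibtex
@misc{chen2023radarvision,
  title  = {Deep Multi-modal Learning for Radar-Vision Human Sensing},
  author = {Chen, Xinyan},
  school = {Nanyang Technological University},
  year   = {2023},
  note   = {Final Year Project, School of Electrical and Electronic Engineering}
}
```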