Jingtai Liu

CV
h-index: 3
Papers: 4
Citations: 20
Novelty: 52%
AI Score: 37

4 Papers

CV · Sep 22, 2022 · Code
MGTR: End-to-End Mutual Gaze Detection with Transformer

Hang Guo, Zhengxi Hu, Jingtai Liu

People looking at each other, or mutual gaze, is ubiquitous in daily interactions, and detecting mutual gaze is of great significance for understanding human social scenes. Current mutual gaze detection methods are mostly two-stage: their inference speed is limited by the two-stage pipeline, and the performance of the second stage depends on that of the first. In this paper, we propose a novel one-stage mutual gaze detection framework called Mutual Gaze TRansformer (MGTR) to perform mutual gaze detection in an end-to-end manner. By designing mutual gaze instance triples, MGTR can detect each human head bounding box and simultaneously infer the mutual gaze relationship based on global image information, which streamlines the whole process. Experimental results on two mutual gaze datasets show that our method accelerates the mutual gaze detection process without losing performance. An ablation study shows that different components of MGTR capture different levels of semantic information in images. Code is available at https://github.com/Gmbition/MGTR

SY · Oct 30, 2016
Impedance control of a cable-driven series elastic actuator with the 2-DOF control structure

Wulin Zou, Zhuo Yang, Wen Tan et al.

Series elastic actuators (SEAs) are increasingly important in physical human-robot interaction (HRI) due to their inherent safety and compliance. Cable-driven SEAs also allow flexible installation and remote torque transmission. However, there are still challenges for the impedance control of cable-driven SEAs, such as the reduced bandwidth caused by the elastic component and the trade-off between reference tracking and robustness. In this paper, a velocity-sourced cable-driven SEA has been set up. Then, a stabilizing two-degree-of-freedom (2-DOF) control approach was designed to separately pursue the goals of robustness and torque tracking. Further, the impedance control structure for human-robot interaction was designed and implemented with a torque compensator. Both simulation and practical experiments have validated the efficacy of the 2-DOF method for the control of cable-driven SEAs.

CV · Jul 9, 2025 · Code
MK-Pose: Category-Level Object Pose Estimation via Multimodal-Based Keypoint Learning

Yifan Yang, Peili Song, Enfan Lan et al.

Category-level object pose estimation, which predicts the pose of objects within a known category without prior knowledge of individual instances, is essential in applications like warehouse automation and manufacturing. Existing methods relying on RGB images or point cloud data often struggle with object occlusion and generalization across different instances and categories. This paper proposes a multimodal-based keypoint learning framework (MK-Pose) that integrates RGB images, point clouds, and category-level textual descriptions. The model uses a self-supervised keypoint detection module enhanced with attention-based query generation, soft heatmap matching, and graph-based relational modeling. Additionally, a graph-enhanced feature fusion module is designed to integrate local geometric information and global context. MK-Pose is evaluated on the CAMERA25 and REAL275 datasets, and is further tested for cross-dataset capability on the HouseCat6D dataset. The results demonstrate that MK-Pose outperforms existing state-of-the-art methods in both IoU and average precision without shape priors. Code will be released at https://github.com/yangyifanYYF/MK-Pose.

CV · Mar 11, 2025
Simulating Automotive Radar with Lidar and Camera Inputs

Peili Song, Dezhen Song, Yifan Yang et al.

Low-cost millimeter-wave automotive radar has received increasing attention due to its ability to handle adverse weather and lighting conditions in autonomous driving. However, the lack of quality datasets hinders research and development. We report a new method that simulates 4D millimeter-wave radar signals, including pitch, yaw, range, and Doppler velocity along with radar signal strength (RSS), using a camera image, a light detection and ranging (lidar) point cloud, and ego-velocity. The method is based on two new neural networks: 1) DIS-Net, which estimates the spatial distribution and number of radar signals, and 2) RSS-Net, which predicts the RSS of the signal based on appearance and geometric information. We have implemented and tested our method using open datasets from three different models of commercial automotive radar. The experimental results show that our method can successfully generate high-fidelity radar signals. Moreover, we have trained a popular object detection neural network with data augmented by our synthesized radar. The network outperforms the counterpart trained only on raw radar data, a promising result to facilitate future radar-based research and development.