CVFeb 15, 2023
Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose EstimationHan Li, Bowen Shi, Wenrui Dai et al.
There has been a recent surge of interest in introducing transformers to 3D human pose estimation (HPE) due to their powerful capabilities in modeling long-term dependencies. However, existing transformer-based methods treat body joints as equally important inputs and ignore the prior knowledge of human skeleton topology in the self-attention mechanism. To tackle this issue, in this paper, we propose a Pose-Oriented Transformer (POT) with uncertainty guided refinement for 3D HPE. Specifically, we first develop novel pose-oriented self-attention mechanism and distance-related position embedding for POT to explicitly exploit the human skeleton topology. The pose-oriented self-attention mechanism explicitly models the topological interactions between body joints, whereas the distance-related position embedding encodes the distance of joints to the root joint to distinguish groups of joints with different difficulties in regression. Furthermore, we present an Uncertainty-Guided Refinement Network (UGRN) to refine pose predictions from POT, especially for the difficult joints, by considering the estimated uncertainty of each joint with uncertainty-guided sampling strategy and self-attention mechanism. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art methods with reduced model parameters on 3D HPE benchmarks such as Human3.6M and MPI-INF-3DHP
CVAug 23, 2024
CathAction: A Benchmark for Endovascular Intervention UnderstandingBaoru Huang, Tuan Vo, Chayun Kongtongvattana et al.
Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale dataset for catheterization understanding. Our CathAction dataset encompasses approximately 500,000 annotated frames for catheterization action understanding and collision detection, and 25,000 ground truth masks for catheter and guidewire segmentation. For each task, we benchmark recent related works in the field. We further discuss the challenges of endovascular intentions compared to traditional computer vision tasks and point out open research questions. We hope that CathAction will facilitate the development of endovascular intervention understanding methods that can be applied to real-world applications. The dataset is available at https://airvlab.github.io/cathaction/.
CVJul 18, 2023
ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose PromptingHongwei Zheng, Han Li, Bowen Shi et al.
Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistency across sequences to alleviate the depth ambiguity problem but ignore the action related prior knowledge hidden in the pose sequence. In this paper, we propose a plug-and-play module named Action Prompt Module (APM) that effectively mines different kinds of action clues for 3D HPE. The highlight is that, the mining scheme of APM can be widely adapted to different frameworks and bring consistent benefits. Specifically, we first present a novel Action-related Text Prompt module (ATP) that directly embeds action labels and transfers the rich language information in the label to the pose sequence. Besides, we further introduce Action-specific Pose Prompt module (APP) to mine the position-aware pose pattern of each action, and exploit the correlation between the mined patterns and input pose sequence for further pose refinement. Experiments show that APM can improve the performance of most video-based 2D-to-3D HPE frameworks by a large margin.
LGDec 8, 2022Code
MixBoost: Improving the Robustness of Deep Neural Networks by Boosting Data AugmentationZhendong Liu, Wenyu Jiang, Min guo et al.
As more and more artificial intelligence (AI) technologies move from the laboratory to real-world applications, the open-set and robustness challenges brought by data from the real world have received increasing attention. Data augmentation is a widely used method to improve model performance, and some recent works have also confirmed its positive effect on the robustness of AI models. However, most of the existing data augmentation methods are heuristic, lacking the exploration of their internal mechanisms. We apply the explainable artificial intelligence (XAI) method, explore the internal mechanisms of popular data augmentation methods, analyze the relationship between game interactions and some widely used robustness metrics, and propose a new proxy for model robustness in the open-set environment. Based on the analysis of the internal mechanisms, we develop a mask-based boosting method for data augmentation that comprehensively improves several robustness measures of AI models and beats state-of-the-art data augmentation approaches. Experiments show that our method can be widely applied to many popular data augmentation methods. Different from the adversarial training, our boosting method not only significantly improves the robustness of models, but also improves the accuracy of test sets. Our code is available at \url{https://github.com/Anonymous_for_submission}.
CVNov 23, 2021Code
Hierarchical Graph Networks for 3D Human Pose EstimationHan Li, Bowen Shi, Wenrui Dai et al.
Recent 2D-to-3D human pose estimation works tend to utilize the graph structure formed by the topology of the human skeleton. However, we argue that this skeletal topology is too sparse to reflect the body structure and suffer from serious 2D-to-3D ambiguity problem. To overcome these weaknesses, we propose a novel graph convolution network architecture, Hierarchical Graph Networks (HGN). It is based on denser graph topology generated by our multi-scale graph structure building strategy, thus providing more delicate geometric information. The proposed architecture contains three sparse-to-fine representation subnetworks organized in parallel, in which multi-scale graph-structured features are processed and exchange information through a novel feature fusion strategy, leading to rich hierarchical representations. We also introduce a 3D coarse mesh constraint to further boost detail-related feature learning. Extensive experiments demonstrate that our HGN achieves the state-of-the art performance with reduced network parameters. Code is released at https://github.com/qingshi9974/BMVC2021-Hierarchical-Graph-Networks-for-3D-Human-Pose-Estimation.
LGOct 10, 2021
A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform QuantizationYuyang Zhang, Dik Hin Leung, Min Guo et al.
Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. The implementation is running on Field-programmable Gate Array (FPGA) devices and tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method can achieve better performance with fewer power consumption.
CYSep 23, 2020
Accurate and Interpretable Machine Learning for Transparent Pricing of Health Insurance PlansRohun Kshirsagar, Li-Yen Hsu, Vatshank Chaturvedi et al.
Health insurance companies cover half of the United States population through commercial employer-sponsored health plans and pay 1.2 trillion US dollars every year to cover medical expenses for their members. The actuary and underwriter roles at a health insurance company serve to assess which risks to take on and how to price those risks to ensure profitability of the organization. While Bayesian hierarchical models are the current standard in the industry to estimate risk, interest in machine learning as a way to improve upon these existing methods is increasing. Lumiata, a healthcare analytics company, ran a study with a large health insurance company in the United States. We evaluated the ability of machine learning models to predict the per member per month cost of employer groups in their next renewal period, especially those groups who will cost less than 95\% of what an actuarial model predicts (groups with "concession opportunities"). We developed a sequence of two models, an individual patient-level and an employer-group-level model, to predict the annual per member per month allowed amount for employer groups, based on a population of 14 million patients. Our models performed 20\% better than the insurance carrier's existing pricing model, and identified 84\% of the concession opportunities. This study demonstrates the application of a machine learning system to compute an accurate and fair price for health insurance products and analyzes how explainable machine learning models can exceed actuarial models' predictive accuracy while maintaining interpretability.