GR · May 2
Investigating Anthropometric Fidelity in SAM 3D Body
Aizierjiang Aiersilan, Ruting Cheng, James Hahn
The release of SAM 3D Body is a recent development in human mesh recovery, demonstrating improved performance in producing clean, topologically coherent meshes from single images. By leveraging the Momentum Human Rig (MHR), it achieves robustness to occlusion and diverse poses. However, our evaluation reveals a specific and consistent limitation: the model struggles to reconstruct detailed anthropometric deviations, particularly in populations with distinctive morphological alterations such as geriatric muscle atrophy, scoliosis, or pregnancy, even when these features are prominent in the input image. In this paper, we investigate this phenomenon not as a failure of the model's capacity, but as a byproduct of the "perception-distortion trade-off". We posit that the architectural reliance on the low-dimensional parametric MHR representation, combined with semantic-invariant conditioning (DINOv3) and annotation-based alignment, creates a pervasive "regression to the mean" effect. We analyze these mechanisms to understand why individual biological details are smoothed out. Furthermore, we propose specific, constructive pathways for future work, such as implicit-explicit hybrid representations and Medical-in-the-Loop alignment, to extend the baseline performance of SAM 3D Body into the high-precision medical domain.
GR · Dec 17, 2025
Representations of 3D Rotations: Mathematical Foundations and Comparative Analysis
Aizierjiang Aiersilan, Haochen Liu, James Hahn
Rotation representations are foundational in fields such as computer graphics, robotics, and machine learning, where precise and efficient modeling of 3D orientations is critical. This paper comprehensively investigates representations of the special orthogonal group $SO(3)$, including Euler angles, axis-angle vectors, quaternions, rotation matrices, exponential maps, and emerging continuous and probabilistic methods. It evaluates their mathematical formulations, continuity, susceptibility to gimbal lock, computational efficiency, storage requirements, interpolation properties, and composition operations, and connects these algebraic insights to practical applications in animation, pose estimation, inertial navigation, 3D shape registration, and neural networks. Empirical evidence highlights the dominance of quaternions due to their compactness and computational efficiency, while alternatives such as 6D continuous representations and matrix Fisher distributions provide enhanced continuity and uncertainty modeling. Future research could explore hybrid methods and thorough large-scale evaluations to build a solid foundation for improving rotation representation techniques.
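To make two of the compared representations concrete, here is a minimal sketch that is not taken from the paper. It converts a unit quaternion (w, x, y, z) to a 3x3 rotation matrix using the standard formula, and maps a 6D representation (two 3-vectors) to a rotation matrix with Gram-Schmidt orthogonalization, the construction commonly used for continuous 6D representations. The function names `quat_to_matrix` and `six_d_to_matrix` are hypothetical, and matrices are plain row-major lists to keep the example dependency-free.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def quat_to_matrix(q):
    """Hypothetical helper: quaternion (w, x, y, z) -> 3x3 rotation matrix.

    The quaternion is normalized first, so slightly non-unit inputs are tolerated.
    """
    w, x, y, z = normalize(q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def six_d_to_matrix(a1, a2):
    """Hypothetical helper: 6D representation (a1, a2) -> rotation matrix.

    Gram-Schmidt: b1 = a1/|a1|, b2 = normalized component of a2 orthogonal
    to b1, b3 = b1 x b2. The result has b1, b2, b3 as its columns.
    """
    b1 = normalize(a1)
    proj = dot(b1, a2)
    b2 = normalize([a - proj * b for a, b in zip(a2, b1)])
    b3 = cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


if __name__ == "__main__":
    s = math.sqrt(0.5)
    # A 90-degree rotation about the z-axis, expressed two ways.
    print(quat_to_matrix([s, 0.0, 0.0, s]))
    print(six_d_to_matrix([0.0, 2.0, 0.0], [-1.0, 0.5, 0.0]))
```

Both calls describe the same rotation. The 6D input does not need to be orthonormal, because Gram-Schmidt always projects it onto a valid rotation. This is one reason that representation is described as continuous and convenient as a neural network output. The quaternion, by contrast, stores the same rotation in only four numbers.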
IV · May 18, 2024
Liver Fat Quantification Network with Body Shape
Qiyue Wang, Wu Xue, Xiaoke Zhang et al.
Detecting liver fat content is critically important because it is associated with cardiac complications and cardiovascular disease mortality. However, existing methods either involve high cost and/or medical complications (e.g., liver biopsy, imaging technology) or only roughly estimate the grade of steatosis. In this paper, we propose a deep neural network that estimates the percentage of liver fat using only body shape. The proposed network consists of a flexible baseline network and a lightweight attention module. The attention module is trained to generate discriminative and diverse features, which significantly improves performance. To validate the method, we perform extensive tests on a public medical dataset. The results show that our method achieves state-of-the-art performance, with a root mean squared error (RMSE) of 5.26% and an R-squared value above 0.8. It offers an accurate and more accessible assessment of hepatic steatosis.
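The two metrics reported above can be stated precisely in a few lines. This is a minimal sketch, not the paper's evaluation code: RMSE is the square root of the mean squared prediction error, and R-squared is $1 - SS_{res}/SS_{tot}$, the fraction of variance in the targets explained by the predictions. The liver fat percentages in the usage lines are invented for illustration and do not come from the study's dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted liver fat percentages.
    measured = [5.0, 10.0, 20.0, 30.0]
    predicted = [7.0, 9.0, 18.0, 31.0]
    print(f"RMSE = {rmse(measured, predicted):.3f}")
    print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

RMSE is expressed in the units of the target (here, fat percentage), while R-squared is unitless. Reporting both therefore conveys the absolute error size and how much of the between-subject variation the model captures.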
CV · Jan 15, 2019
Measuring Effectiveness of Video Advertisements
James Hahn, Adriana Kovashka
Advertisements are unavoidable in modern society. Times Square is notorious for its incessant display of advertisements. Its popularity is worldwide, and smaller cities have miniature versions of the display, such as the digital displays on Forbes Avenue in Pittsburgh's Oakland neighborhood. Tokyo's Ginza district recently rose to popularity due to its upscale shops and constant onslaught of advertisements aimed at pedestrians. Advertisements arise in other media as well. For example, they help popular streaming services such as Spotify, Hulu, and YouTube TV generate significant revenue, reducing the cost of monthly subscriptions for consumers. Ads provide an additional source of revenue that companies and entire industries can allocate toward other business goals. They are attractive to companies and nearly unavoidable for consumers. One challenge for advertisers is assessing an advertisement's effectiveness in conveying a message to its target demographics. Unlike a single, static image, a video advertisement contains hundreds of frames with varying scenes, actors, objects, and complexity. Measuring the effectiveness of video advertisements is therefore important to a billion-dollar industry. This paper explores combining human-annotated features with common video processing techniques to predict effectiveness ratings of advertisements collected from YouTube. The task is framed as binary (effective vs. non-effective), four-way, and five-way classification. We present the first accuracy and inference results on this small dataset, which also constitute some of the first research on advertisement effectiveness. Accuracies of 84%, 65%, and 55% are reached on the binary, four-way, and five-way tasks, respectively.