Anran Xu

CV
h-index14
7papers
203citations
Novelty57%
AI Score47

7 Papers

CVMar 29, 2022Code
OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction

Lixin Yang, Kailin Li, Xinyu Zhan et al.

Learning how humans manipulate objects requires machines to acquire knowledge from two perspectives: one for understanding object affordances and the other for learning human's interactions based on the affordances. Even though these two knowledge bases are crucial, we find that current databases lack a comprehensive awareness of them. In this work, we propose a multi-modal and rich-annotated knowledge repository, OakInk, for visual and cognitive understanding of hand-object interactions. We start to collect 1,800 common household objects and annotate their affordances to construct the first knowledge base: Oak. Given the affordance, we record rich human interactions with 100 selected objects in Oak. Finally, we transfer the interactions on the 100 recorded objects to their virtual counterparts through a novel method: Tink. The recorded and transferred hand-object interactions constitute the second knowledge base: Ink. As a result, OakInk contains 50,000 distinct affordance-aware and intent-oriented hand-object interactions. We benchmark OakInk on pose estimation and grasp generation tasks. Moreover, we propose two practical applications of OakInk: intent-based interaction generation and handover generation. Our datasets and source code are publicly available at https://github.com/lixiny/OakInk.

89.4HCMay 11
HapticLDM: A Diffusion Model for Text-to-Vibrotactile Generation

Jiahao Xiong, Fei Wang, Anran Xu et al.

Text-to-vibration generation converts natural language into haptic feedback, enabling vibration-effect designers to get scenarios-fitted vibrations more efficiently, which shows great potentials in application fields such as metaverse, games, and film to enrich the user experience in interactive scenarios. The core challenge in this field is how to generate accurate, consistent, and complete vibrations according to textual semantics. Very recent autoregressive (AR) approaches (e.g., HapticGen) exhibit limited capacity in fully capturing global dependencies, owing to the inherent sequential nature of their modeling and prevailing data constraints. In this paper, we proposed HapticLDM, the first text-to-vibration generative model built upon Latent Diffusion Models (LDMs). Firstly, with respect to the data, we introduced a text-processing strategy that emphasizes dynamic characteristics to curate high-quality data pairs for fine-grained dynamic modeling. Secondly, HapticLDM incorporates a global denoising mechanism that regulates coherent and stable variations in the temporal envelope. Furthermore, we conduct extensive evaluations, including A/B testing against the state-of-the-art baseline and a user study involving 30 participants. The results demonstrate that our model enhances realism and semantic alignment. Qualitative feedback further indicates that HapticLDM simplifies the haptic design workflow while generating diverse, subtle, and physically precise vibrations.

LGDec 7, 2023
A Test-Time Learning Approach to Reparameterize the Geophysical Inverse Problem with a Convolutional Neural Network

Anran Xu, Lindsey J. Heagy

Regularization is critical for solving ill-posed geophysical inverse problems. Explicit regularization is often used, but there are opportunities to explore the implicit regularization effects that are inherent in a Neural Network structure. Researchers have discovered that the Convolutional Neural Network (CNN) architecture inherently enforces a regularization that is advantageous for addressing diverse inverse problems in computer vision, including de-noising and in-painting. In this study, we examine the applicability of this implicit regularization to geophysical inversions. The CNN maps an arbitrary vector to the model space. The predicted subsurface model is then fed into a forward numerical simulation to generate corresponding predicted measurements. Subsequently, the objective function value is computed by comparing these predicted measurements with the observed measurements. The backpropagation algorithm is employed to update the trainable parameters of the CNN during the inversion. Note that the CNN in our proposed method does not require training before the inversion, rather, the CNN weights are estimated in the inversion process, hence this is a test-time learning (TTL) approach. In this study, we choose to focus on the Direct Current (DC) resistivity inverse problem, which is representative of typical Tikhonov-style geophysical inversions (e.g. gravity, electromagnetic, etc.), to test our hypothesis. The experimental results demonstrate that the implicit regularization can be useful in some DC resistivity inversions. We also provide a discussion of the potential sources of this implicit regularization introduced from the CNN architecture and discuss some practical guides for applying the proposed method to other geophysical methods.

LGMar 21, 2025
Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields

Anran Xu, Lindsey J. Heagy

In this work, we employ neural fields, which use neural networks to map a coordinate to the corresponding physical property value at that coordinate, in a test-time learning manner. For a test-time learning method, the weights are learned during the inversion, as compared to traditional approaches which require a network to be trained using a training dataset. Results for synthetic examples in seismic tomography and direct current resistivity inversions are shown first. We then perform a singular value decomposition analysis on the Jacobian of the weights of the neural network (SVD analysis) for both cases to explore the effects of neural networks on the recovered model. The results show that the test-time learning approach can eliminate unwanted artifacts in the recovered subsurface physical property model caused by the sensitivity of the survey and physics. Therefore, NFs-Inv improves the inversion results compared to the conventional inversion in some cases such as the recovery of the dip angle or the prediction of the boundaries of the main target. In the SVD analysis, we observe similar patterns in the left-singular vectors as were observed in some diffusion models, trained in a supervised manner, for generative tasks in computer vision. This observation provides evidence that there is an implicit bias, which is inherent in neural network structures, that is useful in supervised learning and test-time learning models. This implicit bias has the potential to be useful for recovering models in geophysical inversions.

AIAug 6, 2025
Deliberative Reasoning Network: An Uncertainty-Driven Paradigm for Belief-Tracked Inference with Pretrained Language Models

Anran Xu, Jincheng Wang, Baigen Cai et al.

Large language models often fail at logical reasoning when semantic heuristics conflict with decisive evidence - a phenomenon we term cognitive traps. To address this fundamental limitation, we introduce the Deliberative Reasoning Network (DRN), a novel paradigm that reframes logical reasoning from probability maximization to uncertainty minimization. Instead of asking "Which answer is most likely?", DRN asks "Which hypothesis has the most internally consistent evidence?". DRN achieves intrinsic interpretability by explicitly tracking belief states and quantifying epistemic uncertainty for competing hypotheses through an iterative evidence synthesis process. We validate our approach through two complementary architectures - a bespoke discriminative model that embodies the core uncertainty minimization principle, and a lightweight verification module that enhances existing generative LLMs. Evaluated on LCR-1000, our new adversarial reasoning benchmark designed to expose cognitive traps, the bespoke DRN achieves up to 15.2% improvement over standard baselines. When integrated as a parameter-efficient verifier with Mistral-7B, our hybrid system boosts accuracy from 20% to 80% on the most challenging problems. Critically, DRN demonstrates strong zero-shot generalization, improving TruthfulQA performance by 23.6% without additional training, indicating that uncertainty-driven deliberation learns transferable reasoning principles. We position DRN as a foundational, verifiable System 2 reasoning component for building more trustworthy AI systems.

HCJul 29, 2021
SyncUp: Vision-based Practice Support for Synchronized Dancing

Zhongyi Zhou, Anran Xu, Koji Yatani

The beauty of synchronized dancing lies in the synchronization of body movements among multiple dancers. While dancers utilize camera recordings for their practice, standard video interfaces do not efficiently support their activities of identifying segments where they are not well synchronized. This thus fails to close a tight loop of an iterative practice process (i.e., capturing a practice, reviewing the video, and practicing again). We present SyncUp, a system that provides multiple interactive visualizations to support the practice of synchronized dancing and liberate users from manual inspection of recorded practice videos. By analyzing videos uploaded by users, SyncUp quantifies two aspects of synchronization in dancing: pose similarity among multiple dancers and temporal alignment of their movements. The system then highlights which body parts and which portions of the dance routine require further practice to achieve better synchronization. The results of our system evaluations show that our pose similarity estimation and temporal alignment predictions were correlated well with human ratings. Participants in our qualitative user evaluation expressed the benefits and its potential use of SyncUp, confirming that it would enable quick iterative practice.

CVMar 10, 2020
Off-Road Drivable Area Extraction Using 3D LiDAR Data

Biao Gao, Anran Xu, Yancheng Pan et al.

We propose a method for off-road drivable area extraction using 3D LiDAR data with the goal of autonomous driving application. A specific deep learning framework is designed to deal with the ambiguous area, which is one of the main challenges in the off-road environment. To reduce the considerable demand for human-annotated data for network training, we utilize the information from vast quantities of vehicle paths and auto-generated obstacle labels. Using these autogenerated annotations, the proposed network can be trained using weakly supervised or semi-supervised methods, which can achieve better performance with fewer human annotations. The experiments on our dataset illustrate the reasonability of our framework and the validity of our weakly and semi-supervised methods.