CRApr 15
TopFeaRe: Locating Critical State of Adversarial Resilience for Graphs Regarding Topology-Feature EntanglementXinxin Fan, Wenxiong Chen, Quanliang Jing et al.
Graph adversarial attacks are usually produced from the two perspectives of topology/structure and node feature, both of them represent the paramount characteristics learned by today's deep learning models. Although some defense countermeasures are proposed at present, they fails to disclose the intrinsic reasons why these two aspects necessitate and how they are adequately fused to co-learn the graph representation. Towards this question, we in this paper propose an adversarial defense approach through locating the graph's critical state of adversarial resilience, resorting to the equilibrium-point theory in the discipline of complex dynamic system (CDS). In brief, our work has three novelties: i) Adversarial-Attack Modeling, i.e. map a graph regime into CDS, and use the oscillation of dynamic system to model the behavior of adversarial perturbation; ii) 2D Topology-Feature-Entangled Function Design for Perturbed Graph, i.e. project graph topology and node feature as two characteristic spaces, and define two-dimensional entangled perturbation functions to represent the dynamic variance under adversarial attacks; and iii) Location of Critical State of Adversarial Resilience, i.e. utilize the equilibrium-point theory to locate the graph's critical state of attack resilience resorting to the perturbation-reflected 2D function. Finally, multi-facet experiments on five commonly-used realistic datasets validate the effectiveness of our proposed approach, and the results show our approach can significantly outperform the state-of-the-art baselines under four representative graph adversarial attacks.
GRMar 11, 2025Code
HessianForge: Scalable LiDAR reconstruction with Physics-Informed Neural Representation and Smoothness Energy ConstraintsHrishikesh Viswanath, Md Ashiqur Rahman, Chi Lin et al.
Accurate and efficient 3D mapping of large-scale outdoor environments from LiDAR measurements is a fundamental challenge in robotics, particularly towards ensuring smooth and artifact-free surface reconstructions. Although the state-of-the-art methods focus on memory-efficient neural representations for high-fidelity surface generation, they often fail to produce artifact-free manifolds, with artifacts arising due to noisy and sparse inputs. To address this issue, we frame surface mapping as a physics-informed energy optimization problem, enforcing surface smoothness by optimizing an energy functional that penalizes sharp surface ridges. Specifically, we propose a deep learning based approach that learns the signed distance field (SDF) of the surface manifold from raw LiDAR point clouds using a physics-informed loss function that optimizes the $L_2$-Hessian energy of the surface. Our learning framework includes a hierarchical octree based input feature encoding and a multi-scale neural network to iteratively refine the signed distance field at different scales of resolution. Lastly, we introduce a test-time refinement strategy to correct topological inconsistencies and edge distortions that can arise in the generated mesh. We propose a \texttt{CUDA}-accelerated least-squares optimization that locally adjusts vertex positions to enforce feature-preserving smoothing. We evaluate our approach on large-scale outdoor datasets and demonstrate that our approach outperforms current state-of-the-art methods in terms of improved accuracy and smoothness. Our code is available at \href{https://github.com/HrishikeshVish/HessianForge/}{https://github.com/HrishikeshVish/HessianForge/}
CRNov 27, 2025
FastFHE: Packing-Scalable and Depthwise-Separable CNN Inference Over FHEWenbo Song, Xinxin Fan, Quanliang Jing et al.
The deep learning (DL) has been penetrating daily life in many domains, how to keep the DL model inference secure and sample privacy in an encrypted environment has become an urgent and increasingly important issue for various security-critical applications. To date, several approaches have been proposed based on the Residue Number System variant of the Cheon-Kim-Kim-Song (RNS-CKKS) scheme. However, they all suffer from high latency, which severely limits the applications in real-world tasks. Currently, the research on encrypted inference in deep CNNs confronts three main bottlenecks: i) the time and storage costs of convolution calculation; ii) the time overhead of huge bootstrapping operations; and iii) the consumption of circuit multiplication depth. Towards these three challenges, we in this paper propose an efficient and effective mechanism FastFHE to accelerate the model inference while simultaneously retaining high inference accuracy over fully homomorphic encryption. Concretely, our work elaborates four unique novelties. First, we propose a new scalable ciphertext data-packing scheme to save the time and storage consumptions. Second, we work out a depthwise-separable convolution fashion to degrade the computation load of convolution calculation. Third, we figure out a BN dot-product fusion matrix to merge the ciphertext convolutional layer with the batch-normalization layer without incurring extra multiplicative depth. Last but not least, we adopt the low-degree Legendre polynomial to approximate the nonlinear smooth activation function SiLU under the guarantee of tiny accuracy error before and after encrypted inference. Finally, we execute multi-facet experiments to verify the efficiency and effectiveness of our proposed approach.
LGMay 20, 2025
SifterNet: A Generalized and Model-Agnostic Trigger Purification ApproachShaoye Luo, Xinxin Fan, Quanliang Jing et al.
Aiming at resisting backdoor attacks in convolution neural networks and vision Transformer-based large model, this paper proposes a generalized and model-agnostic trigger-purification approach resorting to the classic Ising model. To date, existing trigger detection/removal studies usually require to know the detailed knowledge of target model in advance, access to a large number of clean samples or even model-retraining authorization, which brings the huge inconvenience for practical applications, especially inaccessible to target model. An ideal countermeasure ought to eliminate the implanted trigger without regarding whatever the target models are. To this end, a lightweight and black-box defense approach SifterNet is proposed through leveraging the memorization-association functionality of Hopfield network, by which the triggers of input samples can be effectively purified in a proper manner. The main novelty of our proposed approach lies in the introduction of ideology of Ising model. Extensive experiments also validate the effectiveness of our approach in terms of proper trigger purification and high accuracy achievement, and compared to the state-of-the-art baselines under several commonly-used datasets, our SiferNet has a significant superior performance.
CVSep 29, 2020
CrowdMOT: Crowdsourcing Strategies for Tracking Multiple Objects in VideosSamreen Anjum, Chi Lin, Danna Gurari
Crowdsourcing is a valuable approach for tracking objects in videos in a more scalable manner than possible with domain experts. However, existing frameworks do not produce high quality results with non-expert crowdworkers, especially for scenarios where objects split. To address this shortcoming, we introduce a crowdsourcing platform called CrowdMOT, and investigate two micro-task design decisions: (1) whether to decompose the task so that each worker is in charge of annotating all objects in a sub-segment of the video versus annotating a single object across the entire video, and (2) whether to show annotations from previous workers to the next individuals working on the task. We conduct experiments on a diversity of videos which show both familiar objects (aka - people) and unfamiliar objects (aka - cells). Our results highlight strategies for efficiently collecting higher quality annotations than observed when using strategies employed by today's state-of-art crowdsourcing system.
CVJul 29, 2019
ChaLearn Looking at People: IsoGD and ConGD Large-scale RGB-D Gesture RecognitionJun Wan, Chi Lin, Longyin Wen et al.
The ChaLearn large-scale gesture recognition challenge has been run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than $200$ teams round the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This paper describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition, and provide a detailed analysis of the current state-of-the-art methods for large-scale isolated and continuous gesture recognition based on RGB-D video sequences. In addition to recognition rate and mean jaccard index (MJI) as evaluation metrics used in our previous challenges, we also introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) baseline method, determining the video division points based on the skeleton points extracted by convolutional pose machine (CPM). Experiments demonstrate that the proposed Bi-LSTM outperforms the state-of-the-art methods with an absolute improvement of $8.1\%$ (from $0.8917$ to $0.9639$) of CSR.
CVFeb 22, 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind PeopleDanna Gurari, Qing Li, Abigale J. Stangl et al.
The study of algorithms to automatically answer visual questions currently is motivated by visual question answering (VQA) datasets constructed in artificial VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising from a natural VQA setting. VizWiz consists of over 31,000 visual questions originating from blind people who each took a picture using a mobile phone and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. VizWiz differs from the many existing VQA datasets because (1) images are captured by blind photographers and so are often poor quality, (2) questions are spoken and so are more conversational, and (3) often visual questions cannot be answered. Evaluation of modern algorithms for answering visual questions and deciding if a visual question is answerable reveals that VizWiz is a challenging dataset. We introduce this dataset to encourage a larger community to develop more generalized algorithms that can assist blind people.