ROSep 19, 2022Code
LGC-Net: A Lightweight Gyroscope Calibration Network for Efficient Attitude EstimationYaohua Liu, Wei Liang, Jinqiang Cui
This paper presents a lightweight, efficient calibration neural network model for denoising low-cost microelectromechanical system (MEMS) gyroscope and estimating the attitude of a robot in real-time. The key idea is extracting local and global features from the time window of inertial measurement units (IMU) measurements to regress the output compensation components for the gyroscope dynamically. Following a carefully deduced mathematical calibration model, LGC-Net leverages the depthwise separable convolution to capture the sectional features and reduce the network model parameters. The Large kernel attention is designed to learn the long-range dependencies and feature representation better. The proposed algorithm is evaluated in the EuRoC and TUM-VI datasets and achieves state-of-the-art on the (unseen) test sequences with a more lightweight model structure. The estimated orientation with our LGC-Net is comparable with the top-ranked visual-inertial odometry systems, although it does not adopt vision sensors. We make our method open-source at: https://github.com/huazai665/LGC-Net
OCFeb 7, 2023
Averaged Method of Multipliers for Bi-Level Optimization without Lower-Level Strong ConvexityRisheng Liu, Yaohua Liu, Wei Yao et al.
Gradient methods have become mainstream techniques for Bi-Level Optimization (BLO) in learning fields. The validity of existing works heavily rely on either a restrictive Lower-Level Strong Convexity (LLSC) condition or on solving a series of approximation subproblems with high accuracy or both. In this work, by averaging the upper and lower level objectives, we propose a single loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and gets rid of the limited LLSC restriction. We further provide non-asymptotic convergence analysis of sl-BAMM towards KKT stationary points, and the comparative advantage of our analysis lies in the absence of strong gradient boundedness assumption, which is always required by others. Thus our theory safely captures a wider variety of applications in deep learning, especially where the upper-level objective is quadratic w.r.t. the lower-level variable. Experimental results demonstrate the superiority of our method.
LGOct 19, 2023Code
Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation RegularizationYaohua Liu, Jiaxin Gao, Xianghao Jiao et al.
Adversarial Training (AT), pivotal in fortifying the robustness of deep learning models, is extensively adopted in practical applications. However, prevailing AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting. In this context, our work illuminates the potential of leveraging the target model's historical states as a proxy to provide effective initialization and defense prior, which results in a general proxy guided defense framework, `LAST' ({\bf L}earn from the P{\bf ast}). Specifically, LAST derives response of the proxy model as dynamically learned fast weights, which continuously corrects the update direction of the target model. Besides, we introduce a self-distillation regularized defense objective, ingeniously designed to steer the proxy model's update trajectory without resorting to external teacher models, thereby ameliorating the impact of catastrophic overfitting on performance. Extensive experiments and ablation studies showcase the framework's efficacy in markedly improving model robustness (e.g., up to 9.2\% and 20.3\% enhancement in robust accuracy on CIFAR10 and CIFAR100 datasets, respectively) and training stability. These improvements are consistently observed across various model architectures, larger datasets, perturbation sizes, and attack modalities, affirming LAST's ability to consistently refine both single-step and multi-step AT strategies. The code will be available at~\url{https://github.com/callous-youth/LAST}.
CVSep 11, 2023
Diving into Darkness: A Dual-Modulated Framework for High-Fidelity Super-Resolution in Ultra-Dark EnvironmentsJiaxin Gao, Ziyu Yue, Yaohua Liu et al.
Super-resolution tasks oriented to images captured in ultra-dark environments is a practical yet challenging problem that has received little attention. Due to uneven illumination and low signal-to-noise ratio in dark environments, a multitude of problems such as lack of detail and color distortion may be magnified in the super-resolution process compared to normal-lighting environments. Consequently, conventional low-light enhancement or super-resolution methods, whether applied individually or in a cascaded manner for such problem, often encounter limitations in recovering luminance, color fidelity, and intricate details. To conquer these issues, this paper proposes a specialized dual-modulated learning framework that, for the first time, attempts to deeply dissect the nature of the low-light super-resolution task. Leveraging natural image color characteristics, we introduce a self-regularized luminance constraint as a prior for addressing uneven lighting. Expanding on this, we develop Illuminance-Semantic Dual Modulation (ISDM) components to enhance feature-level preservation of illumination and color details. Besides, instead of deploying naive up-sampling strategies, we design the Resolution-Sensitive Merging Up-sampler (RSMU) module that brings together different sampling modalities as substrates, effectively mitigating the presence of artifacts and halos. Comprehensive experiments showcases the applicability and generalizability of our approach to diverse and challenging ultra-low-light conditions, outperforming state-of-the-art methods with a notable improvement (i.e., $\uparrow$5\% in PSNR, and $\uparrow$43\% in LPIPS). Especially noteworthy is the 19-fold increase in the RMSE score, underscoring our method's exceptional generalization across different darkness levels. The code will be available online upon publication of the paper.
GNNov 5, 2022
Efficient Cavity Searching for Gene Network of Influenza A VirusJunjie Li, Jietong Zhao, Yanqing Su et al.
High order structures (cavities and cliques) of the gene network of influenza A virus reveal tight associations among viruses during evolution and are key signals that indicate viral cross-species infection and cause pandemics. As indicators for sensing the dynamic changes of viral genes, these higher order structures have been the focus of attention in the field of virology. However, the size of the viral gene network is usually huge, and searching these structures in the networks introduces unacceptable delay. To mitigate this issue, in this paper, we propose a simple-yet-effective model named HyperSearch based on deep learning to search cavities in a computable complex network for influenza virus genetics. Extensive experiments conducted on a public influenza virus dataset demonstrate the effectiveness of HyperSearch over other advanced deep-learning methods without any elaborated model crafting. Moreover, HyperSearch can finish the search works in minutes while 0-1 programming takes days. Since the proposed method is simple and easy to be transferred to other complex networks, HyperSearch has the potential to facilitate the monitoring of dynamic changes in viral genes and help humans keep up with the pace of virus mutations.
AIMay 12
Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex OptimizationPeipei Xu, SiYuan Ma, Yaohua Liu et al.
Generating feasible Pareto fronts for constrained bi-objective continuous optimization is central to multi-criteria decision-making. Existing methods usually rely on iterative scalarization, evolutionary search, or problem-specific solvers, requiring repeated optimization for each instance. We introduce DIPS, an end-to-end framework that fine-tunes large language models as amortized Pareto-front generators for constrained bi-objective convex optimization. Given a textual problem description, DIPS directly outputs an ordered set of feasible continuous decision vectors approximating the Pareto front. To make continuous optimization compatible with autoregressive language modeling, DIPS combines a compact discretization scheme, Numerically Grounded Token Initialization for new numerical tokens, and Three-Phase Curriculum Optimization, which progressively aligns structural validity, feasibility, and Pareto-front quality. Across five families of constrained bi-objective convex problems, a fine-tuned 7B-parameter model achieves normalized hypervolume ratios of 95.29% to 98.18% relative to reference fronts. With vLLM-accelerated inference, DIPS solves one instance in as little as 0.16 seconds and outperforms general-purpose and reasoning LLM baselines under the evaluated setting. These results suggest that LLMs can serve as effective amortized generators for continuous Pareto-front approximation.
ROMar 24Code
PHANTOM HandTeng Yan, Jiongxu Chen, Qixiang Hua et al.
Tendon-driven underactuated hands excel in adaptive grasping but often suffer from kinematic unpredictability and highly non-linear force transmission. This ambiguity limits their ability to perform precise free-motion shaping and deliver reliable payloads for complex manipulation tasks. To address this, we introduce the PHANTOM Hand (Hybrid Precision-Augmented Compliance): a modular, 1:1 human-scale system featuring 6 actuators and 15 degrees of freedom (DoFs). We propose a unified framework that bridges the gap between precise analytic shaping and robust compliant grasping. By deriving a sparse mapping from physical geometry and integrating a mechanics-based compensation model, we effectively suppress kinematic drift caused by spring counter-tension and tendon elasticity. This approach achieves sub-degree kinematic reproducibility for free-motion planning while retaining the inherent mechanical compliance required for stable physical interaction. Experimental validation confirms the system's capabilities through (1) kinematic analysis verifying sub-degree global accuracy across the workspace; (2) static expressibility tests demonstrating complex hand gestures; (3) diverse grasping experiments covering power, precision, and tool-use categories; and (4) quantitative fingertip force characterization. The results demonstrate that the PHANTOM hand successfully combines analytic kinematic precision with continuous, predictable force output, significantly expanding the payload and dexterity of underactuated hands. To drive the development of the underactuated manipulation ecosystem, all hardware designs and control scripts are fully open-sourced for community engagement.
LGJun 4, 2024Code
Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence TruncationYaohua Liu, Jiaxin Gao, Xuan Liu et al.
Transfer attacks generate significant interest for real-world black-box applications by crafting transferable adversarial examples through surrogate models. Whereas, existing works essentially directly optimize the single-level objective w.r.t. the surrogate model, which always leads to poor interpretability of attack mechanism and limited generalization performance over unknown victim models. In this work, we propose the \textbf{B}il\textbf{E}vel \textbf{T}ransfer \textbf{A}ttac\textbf{K} (BETAK) framework by establishing an initialization derived bilevel optimization paradigm, which explicitly reformulates the nested constraint relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker. Algorithmically, we introduce the Hyper Gradient Response (HGR) estimation as an effective feedback for the transferability over pseudo-victim attackers, and propose the Dynamic Sequence Truncation (DST) technique to dynamically adjust the back-propagation path for HGR and reduce computational overhead simultaneously. Meanwhile, we conduct detailed algorithmic analysis and provide convergence guarantee to support non-convexity of the LL surrogate attacker. Extensive evaluations demonstrate substantial improvement of BETAK (e.g., $\mathbf{53.41}$\% increase of attack success rates against IncRes-v$2_{ens}$) against different victims and defense methods in targeted and untargeted attack scenarios. The source code is available at https://github.com/callous-youth/BETAK.
IVNov 8, 2021Code
Triple-level Model Inferred Collaborative Network Architecture for Video DerainingPan Mu, Zhu Liu, Yaohua Liu et al.
Video deraining is an important issue for outdoor vision systems and has been investigated extensively. However, designing optimal architectures by the aggregating model formation and data distribution is a challenging task for video deraining. In this paper, we develop a model-guided triple-level optimization framework to deduce network architecture with cooperating optimization and auto-searching mechanism, named Triple-level Model Inferred Cooperating Searching (TMICS), for dealing with various video rain circumstances. In particular, to mitigate the problem that existing methods cannot cover various rain streaks distribution, we first design a hyper-parameter optimization model about task variable and hyper-parameter. Based on the proposed optimization model, we design a collaborative structure for video deraining. This structure includes Dominant Network Architecture (DNA) and Companionate Network Architecture (CNA) that is cooperated by introducing an Attention-based Averaging Scheme (AAS). To better explore inter-frame information from videos, we introduce a macroscopic structure searching scheme that searches from Optical Flow Module (OFM) and Temporal Grouping Module (TGM) to help restore latent frame. In addition, we apply the differentiable neural architecture searching from a compact candidate set of task-specific operations to discover desirable rain streaks removal architectures automatically. Extensive experiments on various datasets demonstrate that our model shows significant improvements in fidelity and temporal consistency over the state-of-the-art works. Source code is available at https://github.com/vis-opt-group/TMICS.
LGSep 28, 2020Code
BOML: A Modularized Bilevel Optimization Library in Python for Meta LearningYaohua Liu, Risheng Liu
Meta-learning (a.k.a. learning to learn) has recently emerged as a promising paradigm for a variety of applications. There are now many meta-learning methods, each focusing on different modeling aspects of base and meta learners, but all can be (re)formulated as specific bilevel optimization problems. This work presents BOML, a modularized optimization library that unifies several meta-learning algorithms into a common bilevel optimization framework. It provides a hierarchical optimization pipeline together with a variety of iteration modules, which can be used to solve the mainstream categories of meta-learning methods, such as meta-feature-based and meta-initialization-based formulations. The library is written in Python and is available at https://github.com/dut-media-lab/BOML.
CLFeb 17
NeuroSymActive: Differentiable Neural-Symbolic Reasoning with Active Exploration for Knowledge Graph Question AnsweringRong Fu, Yang Li, Zeyu Zhang et al.
Large pretrained language models and neural reasoning systems have advanced many natural language tasks, yet they remain challenged by knowledge-intensive queries that require precise, structured multi-hop inference. Knowledge graphs provide a compact symbolic substrate for factual grounding, but integrating graph structure with neural models is nontrivial: naively embedding graph facts into prompts leads to inefficiency and fragility, while purely symbolic or search-heavy approaches can be costly in retrievals and lack gradient-based refinement. We introduce NeuroSymActive, a modular framework that combines a differentiable neural-symbolic reasoning layer with an active, value-guided exploration controller for Knowledge Graph Question Answering. The method couples soft-unification style symbolic modules with a neural path evaluator and a Monte-Carlo style exploration policy that prioritizes high-value path expansions. Empirical results on standard KGQA benchmarks show that NeuroSymActive attains strong answer accuracy while reducing the number of expensive graph lookups and model calls compared to common retrieval-augmented baselines.
ROMay 14, 2024
A Prompt-driven Task Planning Method for Multi-drones based on Large Language ModelYaohua Liu
With the rapid development of drone technology, the application of multi-drones is becoming increasingly widespread in various fields. However, the task planning technology for multi-drones still faces challenges such as the complexity of remote operation and the convenience of human-machine interaction. To address these issues, this paper proposes a prompt-driven task planning method for multi-drones based on large language models. By introducing the Prompt technique, appropriate prompt information is provided for the multi-drone system.
CVMar 14, 2025
Observation-Graph Interaction and Key-Detail Guidance for Vision and Language NavigationYifan Xie, Binkai Ou, Fei Ma et al.
Vision and Language Navigation (VLN) requires an agent to navigate through environments following natural language instructions. However, existing methods often struggle with effectively integrating visual observations and instruction details during navigation, leading to suboptimal path planning and limited success rates. In this paper, we propose OIKG (Observation-graph Interaction and Key-detail Guidance), a novel framework that addresses these limitations through two key components: (1) an observation-graph interaction module that decouples angular and visual information while strengthening edge representations in the navigation space, and (2) a key-detail guidance module that dynamically extracts and utilizes fine-grained location and object information from instructions. By enabling more precise cross-modal alignment and dynamic instruction interpretation, our approach significantly improves the agent's ability to follow complex navigation instructions. Extensive experiments on the R2R and RxR datasets demonstrate that OIKG achieves state-of-the-art performance across multiple evaluation metrics, validating the effectiveness of our method in enhancing navigation precision through better observation-instruction alignment.
CVMay 25, 2023
PEARL: Preprocessing Enhanced Adversarial Robust Learning of Image Deraining for Semantic SegmentationXianghao Jiao, Yaohua Liu, Jiaxin Gao et al.
In light of the significant progress made in the development and application of semantic segmentation tasks, there has been increasing attention towards improving the robustness of segmentation models against natural degradation factors (e.g., rain streaks) or artificially attack factors (e.g., adversarial attack). Whereas, most existing methods are designed to address a single degradation factor and are tailored to specific application scenarios. In this work, we present the first attempt to improve the robustness of semantic segmentation tasks by simultaneously handling different types of degradation factors. Specifically, we introduce the Preprocessing Enhanced Adversarial Robust Learning (PEARL) framework based on the analysis of our proposed Naive Adversarial Training (NAT) framework. Our approach effectively handles both rain streaks and adversarial perturbation by transferring the robustness of the segmentation model to the image derain model. Furthermore, as opposed to the commonly used Negative Adversarial Attack (NAA), we design the Auxiliary Mirror Attack (AMA) to introduce positive information prior to the training of the PEARL framework, which improves defense capability and segmentation performance. Our extensive experiments and ablation studies based on different derain methods and segmentation models have demonstrated the significant performance improvement of PEARL with AMA in defense against various adversarial attacks and rain streaks while maintaining high generalization performance across different datasets.
CVMay 17, 2023
Motion-Scenario Decoupling for Rat-Aware Video Position Prediction: Strategy and BenchmarkXiaofeng Liu, Jiaxin Gao, Yaohua Liu et al.
Recently significant progress has been made in human action recognition and behavior prediction using deep learning techniques, leading to improved vision-based semantic understanding. However, there is still a lack of high-quality motion datasets for small bio-robotics, which presents more challenging scenarios for long-term movement prediction and behavior control based on third-person observation. In this study, we introduce RatPose, a bio-robot motion prediction dataset constructed by considering the influence factors of individuals and environments based on predefined annotation rules. To enhance the robustness of motion prediction against these factors, we propose a Dual-stream Motion-Scenario Decoupling (\textit{DMSD}) framework that effectively separates scenario-oriented and motion-oriented features and designs a scenario contrast loss and motion clustering loss for overall training. With such distinctive architecture, the dual-branch feature flow information is interacted and compensated in a decomposition-then-fusion manner. Moreover, we demonstrate significant performance improvements of the proposed \textit{DMSD} framework on different difficulty-level tasks. We also implement long-term discretized trajectory prediction tasks to verify the generalization ability of the proposed dataset.
LGOct 1, 2021
Towards Gradient-based Bilevel Optimization with Non-convex Followers and BeyondRisheng Liu, Yaohua Liu, Shangzhi Zeng et al.
In recent years, Bi-Level Optimization (BLO) techniques have received extensive attentions from both learning and vision communities. A variety of BLO models in complex and practical tasks are of non-convex follower structure in nature (a.k.a., without Lower-Level Convexity, LLC for short). However, this challenging class of BLOs is lack of developments on both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address the above issues. In particular, by introducing an auxiliary as initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximate version of the original BLO in the absence of LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of our IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm without LLC, and the theoretical findings under LLC.
OCSep 15, 2019
Communication-Censored Linearized ADMM for Decentralized Consensus OptimizationWeiyu Li, Yaohua Liu, Zhi Tian et al.
In this paper, we propose a communication- and computation-efficient algorithm to solve a convex consensus optimization problem defined over a decentralized network. A remarkable existing algorithm to solve this problem is the alternating direction method of multipliers (ADMM), in which at every iteration every node updates its local variable through combining neighboring variables and solving an optimization subproblem. The proposed algorithm, called as COmmunication-censored Linearized ADMM (COLA), leverages a linearization technique to reduce the iteration-wise computation cost of ADMM and uses a communication-censoring strategy to alleviate the communication cost. To be specific, COLA introduces successive linearization approximations to the local cost functions such that the resultant computation is first-order and light-weight. Since the linearization technique slows down the convergence speed, COLA further adopts the communication-censoring strategy to avoid transmissions of less informative messages. A node is allowed to transmit only if the distance between the current local variable and its previously transmitted one is larger than a censoring threshold. COLA is proven to be convergent when the local cost functions have Lipschitz continuous gradients and the censoring threshold is summable. When the local cost functions are further strongly convex, we establish the linear (sublinear) convergence rate of COLA, given that the censoring threshold linearly (sublinearly) decays to 0. Numerical experiments corroborate with the theoretical findings and demonstrate the satisfactory communication-computation tradeoff of COLA.
CVSep 13, 2018
Computer Vision-aided Atom Tracking in STEM ImagingYawei Hui, Yaohua Liu
To address the SMC'17 data challenge -- "Data mining atomically resolved images for material properties", we first used the classic "blob detection" algorithms developed in computer vision to identify all atom centers in each STEM image frame. With the help of nearest neighbor analysis, we then found and labeled every atom center common to all the STEM frames and tracked their movements through the given time interval for both Molybdenum or Selenium atoms.
CVSep 13, 2018
Discovering Features in Sr$_{14}$Cu$_{24}$O$_{41}$ Neutron Single Crystal Diffraction Data by Cluster AnalysisYawei Hui, Yaohua Liu, Byung-Hoon Park
To address the SMC'18 data challenge, "Discovering Features in Sr$_{14}$Cu$_{24}$O$_{41}$", we have used the clustering algorithm "DBSCAN" to separate the diffuse scattering features from the Bragg peaks, which takes into account both spatial and photometric information in the dataset during in the clustering process. We find that, in additional to highly localized Bragg peaks, there exists broad diffuse scattering patterns consisting of distinguishable geometries. Besides these two distinctive features, we also identify a third distinguishable feature submerged in the low signal-to-noise region in the reciprocal space, whose origin remains an open question.
CVOct 16, 2017
Volumetric Data Exploration with Machine Learning-Aided Visualization in Neutron ScienceYawei Hui, Yaohua Liu
Recent advancements in neutron and X-ray sources, instrumentation and data collection modes have significantly increased the experimental data size (which could easily contain 10$^{8}$ -- 10$^{10}$ data points), so that conventional volumetric visualization approaches become inefficient for both still imaging and interactive OpenGL rendition in a 3D setting. We introduce a new approach based on the unsupervised machine learning algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to efficiently analyze and visualize large volumetric datasets. Here we present two examples of analyzing and visualizing datasets from the diffuse scattering experiment of a single crystal sample and the tomographic reconstruction of a neutron scanning of a turbine blade. We found that by using the intensity as the weighting factor in the clustering process, DBSCAN becomes very effective in de-noising and feature/boundary detection, and thus enables better visualization of the hierarchical internal structures of the neutron scattering data.