Jinhu Lü

h-index89

13papers

997citations

Novelty42%

AI Score45

Ranked #42,592 of 194,257 authors (top 22%)#916 in CR (top 14%)

13 Papers

17.5CVApr 1, 2023Code

Q-DETR: An Efficient Low-Bit Quantized Detection Transformer

Sheng Xu, Yanjing Li, Mingbao Lin et al.

The recent detection transformer (DETR) has advanced object detection, but its application on resource-constrained devices requires massive computation and memory resources. Quantization stands out as a solution by representing the network in low-bit parameters and operations. However, there is a significant performance drop when performing low-bit quantized DETR (Q-DETR) with existing quantization methods. We find that the bottlenecks of Q-DETR come from the query information distortion through our empirical analyses. This paper addresses this problem based on a distribution rectification distillation (DRD). We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR. At the inner level, we conduct a distribution alignment for the queries to maximize the self-information entropy. At the upper level, we introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy. Extensive experimental results show that our method performs much better than prior arts. For example, the 4-bit Q-DETR can theoretically accelerate DETR with ResNet-50 backbone by 6.6x and achieve 39.4% AP, with only 2.6% performance gaps than its real-valued counterpart on the COCO dataset.

7.3CVMar 24, 2022

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Ge Kan, Jinhu Lü, Tian Wang et al.

Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models. However, its potential on visual tasks are limited by its training process based on maximum likelihood estimate that requires sampling from two intractable distributions. In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs. Particularly, we lead a decoupled EBLVM consisting of a marginal energy-based distribution and a structural posterior to handle the difficulties when learning deep EBLVMs on images. By choosing a symmetric KL divergence in the lower level of our framework, a compact BiDVL for visual tasks can be obtained. Our model achieves impressive image generation performance over related works. It also demonstrates the significant capacity of testing image reconstruction and out-of-distribution detection.

5.8ROApr 14

Relative Pose Estimation for Nonholonomic Robot Formation with UWB-IO Measurements (Extended version)

Kunrui Ze, Wei Wang, Shuoyu Yue et al.

This article studies the problem of distributed formation control for multiple robots by using onboard ultra wide band (UWB) distance and inertial odometer (IO) measurements. Although this problem has been widely studied, a fundamental limitation of most works is that they require each robot's pose and sensor measurements are expressed in a common reference frame. However, it is inapplicable for nonholonomic robot formations due to the practical difficulty of aligning IO measurements of individual robot in a common frame. To address this problem, firstly, a concurrent-learning based estimator is firstly proposed to achieve relative localization between neighboring robots in a local frame. Different from most relative localization methods in a global frame, both relative position and orientation in a local frame are estimated with only UWB ranging and IO measurements. Secondly, to deal with information loss caused by directed communication topology, a cooperative localization algorithm is introduced to estimate the relative pose to the leader robot. Thirdly, based on the theoretical results on relative pose estimation, a distributed formation tracking controller is proposed for nonholonomic robots. Both 3D and 2D real-world experiments conducted on aerial robots and grounded robots are provided to demonstrate the effectiveness of the proposed method.

12.4AISep 23, 2025

Memory in Large Language Models: Mechanisms, Evaluation and Evolution

Dianxing Zhang, Wendong Li, Kani Song et al.

Under a unified operational definition, we define LLM memory as a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness vs snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices) and uncertainty reporting via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov: coordinating DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specs for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and when retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.

3.2ROApr 22, 2025

Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation

Qizhen Wu, Lei Chen, Kexin Liu et al.

In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making that integrates discrete commands and continuous actions. Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers, limiting adaptability in dynamic environments. Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers. This method effectively maps commands to task allocation and actions to path planning, while leveraging cross-training techniques to enhance learning across the hierarchical framework. Furthermore, we introduce a trajectory prediction model that bridges abstract task representations with actionable planning goals. In our experiments, it achieves over 80% in confrontation win rate and under 0.01 seconds in decision time, outperforming existing approaches. Demonstrations through large-scale tests and real-world robot experiments further emphasize the generalization capabilities and practical applicability of our method.

7.1LGApr 16, 2025

HyperSAT: Unsupervised Hypergraph Neural Networks for Weighted MaxSAT Problems

Qiyue Chen, Shaolin Tan, Suixiang Gao et al.

Graph neural networks (GNNs) have shown promising performance in solving both Boolean satisfiability (SAT) and Maximum Satisfiability (MaxSAT) problems due to their ability to efficiently model and capture the structural dependencies between literals and clauses. However, GNN methods for solving Weighted MaxSAT problems remain underdeveloped. The challenges arise from the non-linear dependency and sensitive objective function, which are caused by the non-uniform distribution of weights across clauses. In this paper, we present HyperSAT, a novel neural approach that employs an unsupervised hypergraph neural network model to solve Weighted MaxSAT problems. We propose a hypergraph representation for Weighted MaxSAT instances and design a cross-attention mechanism along with a shared representation constraint loss function to capture the logical interactions between positive and negative literal nodes in the hypergraph. Extensive experiments on various Weighted MaxSAT datasets demonstrate that HyperSAT achieves better performance than state-of-the-art competitors.

5.7ROJun 12, 2024Code

Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty

Qizhen Wu, Kexin Liu, Lei Chen et al.

In swarm robotics, confrontation including the pursuit-evasion game is a key scenario. High uncertainty caused by unknown opponents' strategies, dynamic obstacles, and insufficient training complicates the action space into a hybrid decision process. Although the deep reinforcement learning method is significant for swarm confrontation since it can handle various sizes, as an end-to-end implementation, it cannot deal with the hybrid process. Here, we propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and the underlying dynamic interaction mechanism between the two layers, which indicates the quantified uncertainty. It decouples the hybrid process into discrete allocation and continuous planning layers, with a probabilistic ensemble model to quantify the uncertainty and regulate the interaction frequency adaptively. Furthermore, to overcome the unstable training process introduced by the two layers, we design an integration training method including pre-training and cross-training, which enhances the training efficiency and stability. Experiment results in both comparison, ablation, and real-robot studies validate the effectiveness and generalization performance of our proposed approach. In our defined experiments with twenty to forty agents, the win rate of the proposed method reaches around ninety percent, outperforming other traditional methods.

15.3AIMay 27, 2021

Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems

Xijun Li, Weilin Luo, Mingxuan Yuan et al.

The Dynamic Pickup and Delivery Problem (DPDP) is aimed at dynamically scheduling vehicles among multiple sites in order to minimize the cost when delivery orders are not known a priori. Although DPDP plays an important role in modern logistics and supply chain management, state-of-the-art DPDP algorithms are still limited on their solution quality and efficiency. In practice, they fail to provide a scalable solution as the numbers of vehicles and sites become large. In this paper, we propose a data-driven approach, Spatial-Temporal Aided Double Deep Graph Network (ST-DDGN), to solve industry-scale DPDP. In our method, the delivery demands are first forecast using spatial-temporal prediction method, which guides the neural network to perceive spatial-temporal distribution of delivery demand when dispatching vehicles. Besides, the relationships of individuals such as vehicles are modelled by establishing a graph-based value function. ST-DDGN incorporates attention-based graph embedding with Double DQN (DDQN). As such, it can make the inference across vehicles more efficiently compared with traditional methods. Our method is entirely data driven and thus adaptive, i.e., the relational representation of adjacent vehicles can be learned and corrected by ST-DDGN from data periodically. We have conducted extensive experiments over real-world data to evaluate our solution. The results show that ST-DDGN reduces 11.27% number of the used vehicles and decreases 13.12% total transportation cost on average over the strong baselines, including the heuristic algorithm deployed in our UAT (User Acceptance Test) environment and a variety of vanilla DRL methods. We are due to fully deploy our solution into our online logistics system and it is estimated that millions of USD logistics cost can be saved per year.

1.4CVMar 8, 2021

Interpretable Attention Guided Network for Fine-grained Visual Classification

Zhenhuan Huang, Xiaoyue Duan, Bo Zhao et al.

Fine-grained visual classification (FGVC) is challenging but more critical than traditional classification tasks. It requires distinguishing different subcategories with the inherently subtle intra-class object variations. Previous works focus on enhancing the feature representation ability using multiple granularities and discriminative regions based on the attention strategy or bounding boxes. However, these methods highly rely on deep neural networks which lack interpretability. We propose an Interpretable Attention Guided Network (IAGN) for fine-grained visual classification. The contributions of our method include: i) an attention guided framework which can guide the network to extract discriminitive regions in an interpretable way; ii) a progressive training mechanism obtained to distill knowledge stage by stage to fuse features of various granularities; iii) the first interpretable FGVC method with a competitive performance on several standard FGVC benchmark datasets.

9.6CRMar 27, 2018

Cryptanalysis of a Chaotic Image Encryption Algorithm Based on Information Entropy

Chengqing Li, Dongdong Lin, Bingbing Feng et al.

Recently, a chaotic image encryption algorithm based on information entropy (IEAIE) was proposed. This paper scrutinizes the security properties of the algorithm and evaluates the validity of the used quantifiable security metrics. When the round number is only one, the equivalent secret key of every basic operation of IEAIE can be recovered with a differential attack separately. Some common insecurity problems in the field of chaotic image encryption are found in IEAIE, e.g. the short orbits of the digital chaotic system and the invalid sensitivity mechanism built on information entropy of the plain image. Even worse, each security metric is questionable, which undermines the security credibility of IEAIE. Hence, IEAIE can only serve as a counterexample for illustrating common pitfalls in designing secure communication method for image data.

12.3CRNov 6, 2017

Cryptanalyzing an image encryption algorithm based on autoblocking and electrocardiography

Chengqing Li, Dongdong Lin, Jinhu Lü et al.

This paper analyzes the security of an image encryption algorithm proposed by Ye and Huang [\textit{IEEE MultiMedia}, vol. 23, pp. 64-71, 2016]. The Ye-Huang algorithm uses electrocardiography (ECG) signals to generate the initial key for a chaotic system and applies an autoblocking method to divide a plain image into blocks of certain sizes suitable for subsequent encryption. The designers claimed that the proposed algorithm is "strong and flexible enough for practical applications". In this paper, we perform a thorough analysis of their algorithm from the view point of modern cryptography. We find it is vulnerable to the known plaintext attack: based on one pair of a known plain-image and its corresponding cipher-image, an adversary is able to derive a mask image, which can be used as an equivalent secret key to successfully decrypt other cipher-images encrypted under the same key with a non-negligible probability of 1/256. Using this as a typical counterexample, we summarize security defects in the design of the Ye-Huang algorithm. The lessons are generally applicable to many other image encryption schemes.

15.9CRSep 17, 2016

On the cryptanalysis of Fridrich's chaotic image encryption scheme

Eric Yong Xie, Chengqing Li, Simin Yu et al.

Utilizing complex dynamics of chaotic maps and systems in encryption was studied comprehensively in the past two and a half decades. In 1989, Fridrich's chaotic image encryption scheme was designed by iterating chaotic position permutation and value substitution some rounds, which received intensive attention in the field of chaos-based cryptography. In 2010, Solak \textit{et al.} proposed a chosen-ciphertext attack on the Fridrich's scheme utilizing influence network between cipher-pixels and the corresponding plain-pixels. Based on their creative work, this paper scrutinized some properties of Fridrich's scheme with concise mathematical language. Then, some minor defects of the real performance of Solak's attack method were given. The work provides some bases for further optimizing attack on the Fridrich's scheme and its variants.

13.0CRJul 6, 2016

Cryptanalyzing an Image-Scrambling Encryption Algorithm of Pixel Bits

Chengqing Li, Dongdong Lin, Jinhu Lü

Position scrambling (permutation) is widely used in multimedia encryption schemes and some international encryption standards, such as the Data Encryption Standard and the Advanced Encryption Standard. In this article, the authors re-evaluate the security of a typical image-scrambling encryption algorithm (ISEA). Using the internal correlation remaining in the cipher image, they disclose important visual information of the corresponding plain image in a ciphertext-only attack scenario. Furthermore, they found that the real scrambling domain--the position-scrambling scope of ISEA's scrambled elements--can be used to support an efficient known or chosen-plaintext attack on it. Detailed experimental results have verified these points and demonstrate that some advanced multimedia processing techniques can facilitate the cryptanalysis of multimedia encryption algorithms.