Liusheng Huang

CR
h-index46
14papers
243citations
Novelty53%
AI Score43

14 Papers

20.8CVJul 3, 2020Code
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation

Zhenbo Xu, Wei Zhang, Xiao Tan et al.

Current multi-object tracking and segmentation (MOTS) methods follow the tracking-by-detection paradigm and adopt convolutions for feature extraction. However, as affected by the inherent receptive field, convolution based feature extraction inevitably mixes up the foreground features and the background features, resulting in ambiguities in the subsequent instance association. In this paper, we propose a highly effective method for learning instance embeddings based on segments by converting the compact image representation to un-ordered 2D point cloud representation. Our method generates a new tracking-by-points paradigm where discriminative instance embeddings are learned from randomly selected points rather than images. Furthermore, multiple informative data modalities are converted into point-wise representations to enrich point-wise features. The resulting online MOTS framework, named PointTrack, surpasses all the state-of-the-art methods including 3D tracking methods by large margins (5.4% higher MOTSA and 18 times faster over MOTSFusion) with the near real-time speed (22 FPS). Evaluations across three datasets demonstrate both the effectiveness and efficiency of our method. Moreover, based on the observation that current MOTS datasets lack crowded scenes, we build a more challenging MOTS dataset named APOLLO MOTS with higher instance density. Both APOLLO MOTS and our codes are publicly available at https://github.com/detectRecog/PointTrack.

16.6CVMar 1, 2020Code
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Zhenbo Xu, Wei Zhang, Xiaoqing Ye et al.

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images then construct dense point clouds for both nearby and distant objects. Moreover, we introduce to learn part locations as complementary features to improve the resistance against occlusion and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). Ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc.. Both the KFG dataset and our codes will be publicly available at https://github.com/detectRecog/ZoomNet.

15.7LGSep 10, 2025
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism

Jiaming Yan, Jianchun Liu, Hongli Xu et al.

Mixture-of-Experts (MoE) has emerged as a promising architecture for modern large language models (LLMs). However, massive parameters impose heavy GPU memory (i.e., VRAM) demands, hindering the widespread adoption of MoE LLMs. Offloading the expert parameters to CPU RAM offers an effective way to alleviate the VRAM requirements for MoE inference. Existing approaches typically cache a small subset of experts in VRAM and dynamically prefetch experts from RAM during inference, leading to significant degradation in inference speed due to the poor cache hit rate and substantial expert loading latency. In this work, we propose MoEpic, an efficient MoE inference system with a novel expert split mechanism. Specifically, each expert is vertically divided into two segments: top and bottom. MoEpic caches the top segment of hot experts, so that more experts will be stored under the limited VRAM budget, thereby improving the cache hit rate. During each layer's inference, MoEpic predicts and prefetches the activated experts for the next layer. Since the top segments of cached experts are exempt from fetching, the loading time is reduced, which allows efficient transfer-computation overlap. Nevertheless, the performance of MoEpic critically depends on the cache configuration (i.e., each layer's VRAM budget and expert split ratio). To this end, we propose a divide-and-conquer algorithm based on fixed-point iteration for adaptive cache configuration. Extensive experiments on popular MoE LLMs demonstrate that MoEpic can save about half of the GPU cost, while lowering the inference latency by about 37.51%-65.73% compared to the baselines.

4.3DCMar 13, 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving

Luyao Gao, Jianchun Liu, Hongli Xu et al.

Speculative inference is a promising paradigm employing small speculative models (SSMs) as drafters to generate draft tokens, which are subsequently verified in parallel by the target large language model (LLM). This approach enhances the efficiency of inference serving by reducing LLM inference latency and costs while preserving generation quality. However, existing speculative methods face critical challenges, including inefficient resource utilization and limited draft acceptance, which constrain their scalability and overall effectiveness. To overcome these obstacles, we present CoSine, a novel speculative inference system that decouples sequential speculative decoding from parallel verification, enabling efficient collaboration among multiple nodes. Specifically, CoSine routes inference requests to specialized drafters based on their expertise and incorporates a confidence-based token fusion mechanism to synthesize outputs from cooperating drafters, ensuring high-quality draft generation. Additionally, CoSine dynamically orchestrates the execution of speculative decoding and verification in a pipelined manner, employing batch scheduling to selectively group requests and adaptive speculation control to minimize idle periods. By optimizing parallel workflows through heterogeneous node collaboration, CoSine balances draft generation and verification throughput in real-time, thereby maximizing resource utilization. Experimental results demonstrate that CoSine achieves superior performance compared to state-of-the-art speculative approaches. Notably, with equivalent resource costs, CoSine achieves up to a 23.2% decrease in latency and a 32.5% increase in throughput compared to baseline methods.

2.3DBSep 3, 2025
Adaptive KV-Cache Compression without Manually Setting Budget

Chenxia Tang, Jianchun Liu, Hongli Xu et al.

Large language models (LLMs) inference relies heavily on KV-caches to accelerate autoregressive decoding, but the resulting memory footprint grows rapidly with sequence length, posing significant efficiency challenges. Current KV-cache compression methods suffer from a Procrustes' bed problem: they force diverse workloads into fixed compression ratios, leading to suboptimal resource allocation and inference performance. To this end, we present GVote, an adaptive KV-cache compression scheme that eliminates manual budget specification while achieving superior accuracy-efficiency trade-offs. GVote operates on the principle that the important keys are the aggregation of keys required by future queries. The method predicts future query attention demands by Monte-Carlo style sampling potential queries and aggregating selected keys to determine the optimal cache budget without manual specification. Experimental evaluation demonstrates GVote's effectiveness across multiple benchmarks, including GSM8K, RULER and Longbench. Compared to baselines, GVote exhibits 2$\times$ memory reduction while the accuracy maintains higher or comparable.

2.6LGDec 25, 2024
Enhancing Federated Graph Learning via Adaptive Fusion of Structural and Node Characteristics

Xianjun Gao, Jianchun Liu, Hongli Xu et al.

Federated Graph Learning (FGL) has demonstrated the advantage of training a global Graph Neural Network (GNN) model across distributed clients using their local graph data. Unlike Euclidean data (\eg, images), graph data is composed of nodes and edges, where the overall node-edge connections determine the topological structure, and individual nodes along with their neighbors capture local node features. However, existing studies tend to prioritize one aspect over the other, leading to an incomplete understanding of the data and the potential misidentification of key characteristics across varying graph scenarios. Additionally, the non-independent and identically distributed (non-IID) nature of graph data makes the extraction of these two data characteristics even more challenging. To address the above issues, we propose a novel FGL framework, named FedGCF, which aims to simultaneously extract and fuse structural properties and node features to effectively handle diverse graph scenarios. FedGCF first clusters clients by structural similarity, performing model aggregation within each cluster to form the shared structural model. Next, FedGCF selects the clients with common node features and aggregates their models to generate a common node model. This model is then propagated to all clients, allowing common node features to be shared. By combining these two models with a proper ratio, FedGCF can achieve a comprehensive understanding of the graph data and deliver better performance, even under non-IID distributions. Experimental results show that FedGCF improves accuracy by 4.94%-7.24% under different data distributions and reduces communication cost by 64.18%-81.25% to reach the same accuracy compared to baselines.

3.8CRJan 13, 2021Code
F3SNet: A Four-Step Strategy for QIM Steganalysis of Compressed Speech Based on Hierarchical Attention Network

Chuanpeng Guo, Wei Yang, Liusheng Huang

Traditional machine learning-based steganalysis methods on compressed speech have achieved great success in the field of communication security. However, previous studies lacked mathematical description and modeling of the correlation between codewords, and there is still room for improvement in steganalysis for small-sized and low embedding rates sample. To deal with the challenge, We use Bayesian networks to measure different types of correlations between codewords in linear prediction code and present F3SNet -- a four-step strategy: Embedding, Encoding, Attention and Classification for quantizaition index modulation steganalysis of compressed speech based on Hierarchical Attention Network. Among them, Embedding converts codewords into high-density numerical vectors, Encoding uses the memory characteristics of LSTM to retain more information by distributing it among all its vectors and Attention further determines which vectors have a greater impact on the final classification result. To evaluate the performance of F3SNet, we make comprehensive comparison of F3SNet with existing steganography methods. Experimental results show that F3SNet surpasses the state-of-the-art methods, particularly for small-sized and low embedding rate samples.

7.9CVJul 3, 2020
PointTrack++ for Effective Online Multi-Object Tracking and Segmentation

Zhenbo Xu, Wei Zhang, Xiao Tan et al.

Multiple-object tracking and segmentation (MOTS) is a novel computer vision task that aims to jointly perform multiple object tracking (MOT) and instance segmentation. In this work, we present PointTrack++, an effective on-line framework for MOTS, which remarkably extends our recently proposed PointTrack framework. To begin with, PointTrack adopts an efficient one-stage framework for instance segmentation, and learns instance embeddings by converting compact image representations to un-ordered 2D point cloud. Compared with PointTrack, our proposed PointTrack++ offers three major improvements. Firstly, in the instance segmentation stage, we adopt a semantic segmentation decoder trained with focal loss to improve the instance selection quality. Secondly, to further boost the segmentation performance, we propose a data augmentation strategy by copy-and-paste instances into training images. Finally, we introduce a better training strategy in the instance association stage to improve the distinguishability of learned instance embeddings. The resulting framework achieves the state-of-the-art performance on the 5th BMTT MOTChallenge.

9.7CRAug 14, 2019
Aggregating Votes with Local Differential Privacy: Usefulness, Soundness vs. Indistinguishability

Shaowei Wang, Jiachun Du, Wei Yang et al.

Voting plays a central role in bringing crowd wisdom to collective decision making, meanwhile data privacy has been a common ethical/legal issue in eliciting preferences from individuals. This work studies the problem of aggregating individual's voting data under the local differential privacy setting, where usefulness and soundness of the aggregated scores are of major concern. One naive approach to the problem is adding Laplace random noises, however, it makes aggregated scores extremely fragile to new types of strategic behaviors tailored to the local privacy setting: data amplification attack and view disguise attack. The data amplification attack means an attacker's manipulation power is amplified by the privacy-preserving procedure when contributing a fraud vote. The view disguise attack happens when an attacker could disguise malicious data as valid private views to manipulate the voting result. In this work, after theoretically quantifying the estimation error bound and the manipulating risk bound of the Laplace mechanism, we propose two mechanisms improving the usefulness and soundness simultaneously: the weighted sampling mechanism and the additive mechanism. The former one interprets the score vector as probabilistic data. Compared to the Laplace mechanism for Borda voting rule with $d$ candidates, it reduces the mean squared error bound by half and lowers the maximum magnitude risk bound from $+\infty$ to $O(\frac{d^3}{nε})$. The latter one randomly outputs a subset of candidates according to their total scores. Its mean squared error bound is optimized from $O(\frac{d^5}{nε^2})$ to $O(\frac{d^4}{nε^2})$, and its maximum magnitude risk bound is reduced to $O(\frac{d^2}{nε})$. Experimental results validate that our proposed approaches averagely reduce estimation error by $50\%$ and are more robust to adversarial attacks.

5.4HCNov 17, 2018
Attention-based Walking Gait and Direction Recognition in Wi-Fi Networks

Yang Xu, Min Chen, Wei Yang et al.

The study of human gait recognition has been becoming an active research field. In this paper, we propose to adopt the attention-based Recurrent Neural Network (RNN) encoder-decoder framework to implement a cycle-independent human gait and walking direction recognition system in Wi-Fi networks. For capturing more human walking dynamics, two receivers together with one transmitter are deployed in different spatial layouts. In the proposed system, the Channel State Information (CSI) measurements from different receivers are first gathered together and refined to form an integrated walking profile. Then, the RNN encoder reads and encodes the walking profile into primary feature vectors. Given a specific recognition task, the decoder computes a corresponding attention vector which is a weighted sum of the primary features assigned with different attentions, and is finally used to predict the target. The attention scheme motivates our system to learn to adaptively align with different critical clips of CSI data sequence for human walking gait and direction recognitions. We implement our system on commodity Wi-Fi devices in indoor environment, and the experimental results demonstrate that our system can achieve average F1 scores of 89.69% for gait recognition from a group of 8 subjects and 95.06% for direction recognition from 8 walking directions, in addition, the average accuracies of these two recognition tasks both exceed 97%.

1.2DCJan 25, 2017
Personalized Classifier Ensemble Pruning Framework for Mobile Crowdsourcing

Shaowei Wang, Liusheng Huang, Pengzhan Wang et al.

Ensemble learning has been widely employed by mobile applications, ranging from environmental sensing to activity recognitions. One of the fundamental issue in ensemble learning is the trade-off between classification accuracy and computational costs, which is the goal of ensemble pruning. During crowdsourcing, the centralized aggregator releases ensemble learning models to a large number of mobile participants for task evaluation or as the crowdsourcing learning results, while different participants may seek for different levels of the accuracy-cost trade-off. However, most of existing ensemble pruning approaches consider only one identical level of such trade-off. In this study, we present an efficient ensemble pruning framework for personalized accuracy-cost trade-offs via multi-objective optimization. Specifically, for the commonly used linear-combination style of the trade-off, we provide an objective-mixture optimization to further reduce the number of ensemble candidates. Experimental results show that our framework is highly efficient for personalized ensemble pruning, and achieves much better pruning performance with objective-mixture optimization when compared to state-of-art approaches.

14.7CRJul 29, 2013
PS-TRUST: Provably Secure Solution for Truthful Double Spectrum Auctions

Zhili Chen, Liusheng Huang, Lu Li et al.

Truthful spectrum auctions have been extensively studied in recent years. Truthfulness makes bidders bid their true valuations, simplifying greatly the analysis of auctions. However, revealing one's true valuation causes severe privacy disclosure to the auctioneer and other bidders. To make things worse, previous work on secure spectrum auctions does not provide adequate security. In this paper, based on TRUST, we propose PS-TRUST, a provably secure solution for truthful double spectrum auctions. Besides maintaining the properties of truthfulness and special spectrum reuse of TRUST, PS-TRUST achieves provable security against semi-honest adversaries in the sense of cryptography. Specifically, PS-TRUST reveals nothing about the bids to anyone in the auction, except the auction result. To the best of our knowledge, PS-TRUST is the first provably secure solution for spectrum auctions. Furthermore, experimental results show that the computation and communication overhead of PS-TRUST is modest, and its practical applications are feasible.

1.1CLAug 15, 2012
More than Word Frequencies: Authorship Attribution via Natural Frequency Zoned Word Distribution Analysis

Zhili Chen, Liusheng Huang, Wei Yang et al.

With such increasing popularity and availability of digital text data, authorships of digital texts can not be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic authorship attribution scheme and an open authorship attribution scheme for digital texts based on the analysis. NFZ-WDA is based on the observation that all authors leave distinct intrinsic word usage traces on texts written by them and these intrinsic styles can be identified and employed to analyze the authorship. The intrinsic word usage styles can be estimated through the analysis of word distribution within a text, which is more than normal word frequency analysis and can be expressed as: which groups of words are used in the text; how frequently does each group of words occur; how are the occurrences of each group of words distributed in the text. Next, the basic authorship attribution scheme and the open authorship attribution scheme provide solutions for both closed and open authorship attribution problems. Through analysis and extensive experimental studies, this paper demonstrates the efficiency of the proposed method for authorship attribution.

3.0CRJan 12, 2012
On the security of an enhanced short signature scheme

Miaomiao Tian, Liusheng Huang, Wei Yang

Currently, short signature is receiving significant attention since it is particularly useful in low-bandwidth communication environments. However, most of the short signature schemes are only based on one intractable assumption. Recently, Su presented an identity-based short signature scheme based on knapsack and bilinear pairing. He claimed that the signature scheme is secure in the random oracle model. Unfortunately, in this paper, we show that his scheme is insecure. Concretely, an adversary can forge a valid signature on any message with respect to any identity in Su's scheme.