Fei Wen

CV
h-index22
31papers
771citations
Novelty55%
AI Score55

31 Papers

CVJun 21, 2022Code
Optimally Controllable Perceptual Lossy Compression

Zeyu Yan, Fei Wen, Peilin Liu

Recent studies in lossy compression show that distortion and perceptual quality are at odds with each other, which put forward the tradeoff between distortion and perception (D-P). Intuitively, to attain different perceptual quality, different decoders have to be trained. In this paper, we present a nontrivial finding that only two decoders are sufficient for optimally achieving arbitrary (an infinite number of different) D-P tradeoff. We prove that arbitrary points of the D-P tradeoff bound can be achieved by a simple linear interpolation between the outputs of a minimum MSE decoder and a specifically constructed perfect perceptual decoder. Meanwhile, the perceptual quality (in terms of the squared Wasserstein-2 distance metric) can be quantitatively controlled by the interpolation factor. Furthermore, to construct a perfect perceptual decoder, we propose two theoretically optimal training frameworks. The new frameworks are different from the distortion-plus-adversarial loss based heuristic framework widely used in existing methods, which are not only theoretically optimal but also can yield state-of-the-art performance in practical perceptual decoding. Finally, we validate our theoretical finding and demonstrate the superiority of our frameworks via experiments. Code is available at: https://github.com/ZeyuYan/Controllable-Perceptual-Compression

LGSep 3, 2024
Brain-Inspired Online Adaptation for Remote Sensing with Spiking Neural Network

Dexin Duan, Peilin liu, Bingwei Hui et al.

On-device computing, or edge computing, is becoming increasingly important for remote sensing, particularly in applications like deep network-based perception on on-orbit satellites and unmanned aerial vehicles (UAVs). In these scenarios, two brain-like capabilities are crucial for remote sensing models: (1) high energy efficiency, allowing the model to operate on edge devices with limited computing resources, and (2) online adaptation, enabling the model to quickly adapt to environmental variations, weather changes, and sensor drift. This work addresses these needs by proposing an online adaptation framework based on spiking neural networks (SNNs) for remote sensing. Starting with a pretrained SNN model, we design an efficient, unsupervised online adaptation algorithm, which adopts an approximation of the BPTT algorithm and only involves forward-in-time computation that significantly reduces the computational complexity of SNN adaptation learning. Besides, we propose an adaptive activation scaling scheme to boost online SNN adaptation performance, particularly in low time-steps. Furthermore, for the more challenging remote sensing detection task, we propose a confidence-based instance weighting scheme, which substantially improves adaptation performance in the detection task. To our knowledge, this work is the first to address the online adaptation of SNNs. Extensive experiments on seven benchmark datasets across classification, segmentation, and detection tasks demonstrate that our proposed method significantly outperforms existing domain adaptation and domain generalization approaches under varying weather conditions. The proposed method enables energy-efficient and fast online adaptation on edge devices, and has much potential in applications such as remote perception on on-orbit satellites and UAV.

CVApr 11, 2023
Loop Closure Detection Based on Object-level Spatial Layout and Semantic Consistency

Xingwu Ji, Peilin Liu, Haochen Niu et al.

Visual simultaneous localization and mapping (SLAM) systems face challenges in detecting loop closure under the circumstance of large viewpoint changes. In this paper, we present an object-based loop closure detection method based on the spatial layout and semanic consistency of the 3D scene graph. Firstly, we propose an object-level data association approach based on the semantic information from semantic labels, intersection over union (IoU), object color, and object embedding. Subsequently, multi-view bundle adjustment with the associated objects is utilized to jointly optimize the poses of objects and cameras. We represent the refined objects as a 3D spatial graph with semantics and topology. Then, we propose a graph matching approach to select correspondence objects based on the structure layout and semantic property similarity of vertices' neighbors. Finally, we jointly optimize camera trajectories and object poses in an object-level pose graph optimization, which results in a globally consistent map. Experimental results demonstrate that our proposed data association approach can construct more accurate 3D semantic maps, and our loop closure method is more robust than point-based and object-based methods in circumstances with large viewpoint changes.

ITJun 18, 2011
Anti-measurement Matrix Uncertainty Sparse Signal Recovery for Compressive Sensing

Yipeng Liu, Qun Wan, Fei Wen et al.

Compressive sensing (CS) is a technique for estimating a sparse signal from the random measurements and the measurement matrix. Traditional sparse signal recovery methods have seriously degeneration with the measurement matrix uncertainty (MMU). Here the MMU is modeled as a bounded additive error. An anti-uncertainty constraint in the form of a mixed L2 and L1 norm is deduced from the sparse signal model with MMU. Then we combine the sparse constraint with the anti-uncertainty constraint to get an anti-uncertainty sparse signal recovery operator. Numerical simulations demonstrate that the proposed operator has a better reconstructing performance with the MMU than traditional methods.

ITJun 18, 2011
Sparse Support Recovery with Phase-Only Measurements

Yipeng Liu, Qun Wan, Fei Wen et al.

Sparse support recovery (SSR) is an important part of the compressive sensing (CS). Most of the current SSR methods are with the full information measurements. But in practice the amplitude part of the measurements may be seriously destroyed. The corrupted measurements mismatch the current SSR algorithms, which leads to serious performance degeneration. This paper considers the problem of SSR with only phase information. In the proposed method, the minimization of the l1 norm of the estimated sparse signal enforces sparse distribution, while a nonzero constraint of the uncorrupted random measurements' amplitudes with respect to the reconstructed sparse signal is introduced. Because it only requires the phase components of the measurements in the constraint, it can avoid the performance deterioration by corrupted amplitude components. Simulations demonstrate that the proposed phase-only SSR is superior in the support reconstruction accuracy when the amplitude components of the measurements are contaminated.

CVSep 7, 2024
Explicit Mutual Information Maximization for Self-Supervised Learning

Lele Chang, Peilin Liu, Qinghai Guo et al.

Recently, self-supervised learning (SSL) has been extensively studied. Theoretically, mutual information maximization (MIM) is an optimal criterion for SSL, with a strong theoretical foundation in information theory. However, it is difficult to directly apply MIM in SSL since the data distribution is not analytically available in applications. In practice, many existing methods can be viewed as approximate implementations of the MIM criterion. This work shows that, based on the invariance property of MI, explicit MI maximization can be applied to SSL under a generic distribution assumption, i.e., a relaxed condition of the data distribution. We further illustrate this by analyzing the generalized Gaussian distribution. Based on this result, we derive a loss function based on the MIM criterion using only second-order statistics. We implement the new loss for SSL and demonstrate its effectiveness via extensive experiments.

CVSep 13, 2024
VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation

Hanning Chen, Yang Ni, Wenjun Huang et al.

Vision Transformers (ViTs) have emerged as the backbone of many segmentation models, consistently achieving state-of-the-art (SOTA) performance. However, their success comes at a significant computational cost. Image token pruning is one of the most effective strategies to address this complexity. However, previous approaches fall short when applied to more complex task-oriented segmentation (TOS), where the class of each image patch is not predefined but dependent on the specific input task. This work introduces the Vision Language Guided Token Pruning (VLTP), a novel token pruning mechanism that can accelerate ViT-based segmentation models, particularly for TOS guided by multi-modal large language model (MLLM). We argue that ViT does not need to process every image token through all of its layers -- only the tokens related to reasoning tasks are necessary. We design a new pruning decoder to take both image tokens and vision-language guidance as input to predict the relevance of each image token to the task. Only image tokens with high relevance are passed to deeper layers of the ViT. Experiments show that the VLTP framework reduces the computational costs of ViT by approximately 25% without performance degradation and by around 40% with only a 1% performance drop. The code associated with this study can be found at this URL.

CVSep 1, 2024
Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach

Wenjun Huang, Yang Ni, Arghavan Rezvani et al.

Human pose estimation (HPE) is crucial for various applications. However, deploying HPE algorithms in surveillance contexts raises significant privacy concerns due to the potential leakage of sensitive personal information (SPI) such as facial features, and ethnicity. Existing privacy-enhancing methods often compromise either privacy or performance, or they require costly additional modalities. We propose a novel privacy-enhancing system that generates privacy-enhanced portraits while maintaining high HPE performance. Our key innovations include the reversible recovery of SPI for authorized personnel and the preservation of contextual information. By jointly optimizing a privacy-enhancing module, a privacy recovery module, and a pose estimator, our system ensures robust privacy protection, efficient SPI recovery, and high-performance HPE. Experimental results demonstrate the system's robust performance in privacy enhancement, SPI recovery, and HPE.

NENov 3, 2025
A High-Throughput Spiking Neural Network Processor Enabling Synaptic Delay Emulation

Faquan Chen, Qingyang Tian, Ziren Wu et al.

Synaptic delay has attracted significant attention in neural network dynamics for integrating and processing complex spatiotemporal information. This paper introduces a high-throughput Spiking Neural Network (SNN) processor that supports synaptic delay-based emulation for edge applications. The processor leverages a multicore pipelined architecture with parallel compute engines, capable of real-time processing of the computational load associated with synaptic delays. We develop a SoC prototype of the proposed processor on PYNQ Z2 FPGA platform and evaluate its performance using the Spiking Heidelberg Digits (SHD) benchmark for low-power keyword spotting tasks. The processor achieves 93.4% accuracy in deployment and an average throughput of 104 samples/sec at a typical operating frequency of 125 MHz and 282 mW power consumption.

ARMar 24
TRINE: A Token-Aware, Runtime-Adaptive FPGA Inference Engine for Multimodal AI

Hyunwoo Oh, Hanning Chen, Sanggeon Yun et al.

Multimodal stacks that mix ViTs, CNNs, GNNs, and transformer NLP strain embedded platforms because their compute/memory patterns diverge and hard real-time targets leave little slack. TRINE is a single-bitstream FPGA accelerator and compiler that executes end-to-end multimodal inference without reconfiguration. Layers are unified as DDMM/SDDMM/SpMM and mapped to a mode-switchable engine that toggles at runtime among weight/output-stationary systolic, 1xCS SIMD, and a routable adder tree (RADT) on a shared PE array. A width-matched, two-stage top-k unit enables in-stream token pruning, while dependency-aware layer offloading (DALO) overlaps independent kernels across reconfigurable processing units to sustain utilization. Evaluated on Alveo U50 and ZCU104, TRINE reduces latency by up to 22.57x vs. RTX 4090 and 6.86x vs. Jetson Orin Nano at 20-21 W; token pruning alone yields up to 7.8x on ViT-heavy pipelines, and DALO contributes up to 79% throughput improvement. With int8 quantization, accuracy drops remain <2.5% across representative tasks, delivering state-of-the-art latency and energy efficiency for unified vision, language, and graph workloads-in one bitstream.

ROApr 16, 2025Code
An Online Adaptation Method for Robust Depth Estimation and Visual Odometry in the Open World

Xingwu Ji, Haochen Niu, Dexin Duan et al.

Recently, learning-based robotic navigation systems have gained extensive research attention and made significant progress. However, the diversity of open-world scenarios poses a major challenge for the generalization of such systems to practical scenarios. Specifically, learned systems for scene measurement and state estimation tend to degrade when the application scenarios deviate from the training data, resulting to unreliable depth and pose estimation. Toward addressing this problem, this work aims to develop a visual odometry system that can fast adapt to diverse novel environments in an online manner. To this end, we construct a self-supervised online adaptation framework for monocular visual odometry aided by an online-updated depth estimation module. Firstly, we design a monocular depth estimation network with lightweight refiner modules, which enables efficient online adaptation. Then, we construct an objective for self-supervised learning of the depth estimation module based on the output of the visual odometry system and the contextual semantic information of the scene. Specifically, a sparse depth densification module and a dynamic consistency enhancement module are proposed to leverage camera poses and contextual semantics to generate pseudo-depths and valid masks for the online adaptation. Finally, we demonstrate the robustness and generalization capability of the proposed method in comparison with state-of-the-art learning-based approaches on urban, in-house datasets and a robot platform. Code is publicly available at: https://github.com/jixingwu/SOL-SLAM.

IVAug 4, 2021Code
Optimal Transport for Unsupervised Denoising Learning

Wei Wang, Fei Wen, Zeyu Yan et al.

Recently, much progress has been made in unsupervised denoising learning. However, existing methods more or less rely on some assumptions on the signal and/or degradation model, which limits their practical performance. How to construct an optimal criterion for unsupervised denoising learning without any prior knowledge on the degradation model is still an open question. Toward answering this question, this work proposes a criterion for unsupervised denoising learning based on the optimal transport theory. This criterion has favorable properties, e.g., approximately maximal preservation of the information of the signal, whilst achieving perceptual reconstruction. Furthermore, though a relaxed unconstrained formulation is used in practical implementation, we prove that the relaxed formulation in theory has the same solution as the original constrained formulation. Experiments on synthetic and real-world data, including realistic photographic, microscopy, depth, and raw depth images, demonstrate that the proposed method even compares favorably with supervised methods, e.g., approaching the PSNR of supervised methods while having better perceptual quality. Particularly, for spatially correlated noise and realistic microscopy images, the proposed method not only achieves better perceptual quality but also has higher PSNR than supervised methods. Besides, it shows remarkable superiority in harsh practical conditions with complex noise, e.g., raw depth images. Code is available at https://github.com/wangweiSJTU/OTUR.

CVAug 4, 2020Code
Revisiting Robust Model Fitting Using Truncated Loss

Fei Wen, Hewen Wei, Yipeng Liu et al.

Robust fitting is a fundamental problem in low-level vision, which is typically achieved by maximum consensus (MC) estimators to identify inliers first or by M-estimators directly. While these two methods are discriminately preferred in different applications, truncated loss based M-estimators are similar to MC as they can also identify inliers. This work revisits a formulation that achieves simultaneous inlier identification and model estimation (SIME) using truncated loss. It has a generalized form adapts to both linear and nonlinear residual models. We show that as SIME takes fitting residual into account in finding inliers, its lowest achievable residual in model fitting is lower than that of MC robust fitting. Then, an alternating minimization (AM) algorithm is employed to solve the SIME formulation. Meanwhile, a semidefinite relaxation (SDR) embedded AM algorithm is developed in order to ease the high nonconvexity of the SIME formulation. Furthermore, the new algorithms are applied to various 2D/3D registration problems. Experimental results show that the new algorithms significantly outperform RANSAC and deterministic approximate MC methods at high outlier ratios. Besides, in rotation and Euclidean registration problems, the new algorithms also compare favorably with state-of-the-art registration methods, especially in high noise and outliers. Code is available at \textit{https://github.com/FWen/mcme.git}.

LGMar 2, 2019Code
Matrix Completion via Nonconvex Regularization: Convergence of the Proximal Gradient Algorithm

Fei Wen, Rendong Ying, Peilin Liu et al.

Matrix completion has attracted much interest in the past decade in machine learning and computer vision. For low-rank promotion in matrix completion, the nuclear norm penalty is convenient due to its convexity but has a bias problem. Recently, various algorithms using nonconvex penalties have been proposed, among which the proximal gradient descent (PGD) algorithm is one of the most efficient and effective. For the nonconvex PGD algorithm, whether it converges to a local minimizer and its convergence rate are still unclear. This work provides a nontrivial analysis on the PGD algorithm in the nonconvex case. Besides the convergence to a stationary point for a generalized nonconvex penalty, we provide more deep analysis on a popular and important class of nonconvex penalties which have discontinuous thresholding functions. For such penalties, we establish the finite rank convergence, convergence to restricted strictly local minimizer and eventually linear convergence rate of the PGD algorithm. Meanwhile, convergence to a local minimizer has been proved for the hard-thresholding penalty. Our result is the first shows that, nonconvex regularized matrix completion only has restricted strictly local minimizers, and the PGD algorithm can converge to such minimizers with eventually linear rate under certain conditions. Illustration of the PGD algorithm via experiments has also been provided. Code is available at https://github.com/FWen/nmc.

ITAug 16, 2018Code
A Survey on Nonconvex Regularization Based Sparse and Low-Rank Recovery in Signal Processing, Statistics, and Machine Learning

Fei Wen, Lei Chu, Peilin Liu et al.

In the past decade, sparse and low-rank recovery have drawn much attention in many areas such as signal/image processing, statistics, bioinformatics and machine learning. To achieve sparsity and/or low-rankness inducing, the $\ell_1$ norm and nuclear norm are of the most popular regularization penalties due to their convexity. While the $\ell_1$ and nuclear norm are convenient as the related convex optimization problems are usually tractable, it has been shown in many applications that a nonconvex penalty can yield significantly better performance. In recent, nonconvex regularization based sparse and low-rank recovery is of considerable interest and it in fact is a main driver of the recent progress in nonconvex and nonsmooth optimization. This paper gives an overview of this topic in various fields in signal processing, statistics and machine learning, including compressive sensing (CS), sparse regression and variable selection, sparse signals separation, sparse principal component analysis (PCA), large covariance and inverse covariance matrices estimation, matrix completion, and robust PCA. We present recent developments of nonconvex regularization based sparse and low-rank recovery in these fields, addressing the issues of penalty selection, applications and the convergence of nonconvex algorithms. Code is available at https://github.com/FWen/ncreg.git.

CVAug 9, 2018Code
Efficient Outlier Removal in Large Scale Global Structure-from-Motion

Fei Wen, Danping Zou, Rendong Ying et al.

This work addresses the outlier removal problem in large-scale global structure-from-motion. In such applications, global outlier removal is very useful to mitigate the deterioration caused by mismatches in the feature point matching step. Unlike existing outlier removal methods, we exploit the structure in multiview geometry problems to propose a dimension reduced formulation, based on which two methods have been developed. The first method considers a convex relaxed $\ell_1$ minimization and is solved by a single linear programming (LP), whilst the second one approximately solves the ideal $\ell_0$ minimization by an iteratively reweighted method. The dimension reduction results in a significant speedup of the new algorithms. Further, the iteratively reweighted method can significantly reduce the possibility of removing true inliers. Realistic multiview reconstruction experiments demonstrated that, compared with state-of-the-art algorithms, the new algorithms are much more efficient and meanwhile can give improved solution. Matlab code for reproducing the results is available at \textit{https://github.com/FWen/OUTLR.git}.

ITApr 15, 2016Code
Positive Definite Estimation of Large Covariance Matrix Using Generalized Nonconvex Penalties

Fei Wen, Yuan Yang, Peilin Liu et al.

This work addresses the issue of large covariance matrix estimation in high-dimensional statistical analysis. Recently, improved iterative algorithms with positive-definite guarantee have been developed. However, these algorithms cannot be directly extended to use a nonconvex penalty for sparsity inducing. Generally, a nonconvex penalty has the capability of ameliorating the bias problem of the popular convex lasso penalty, and thus is more advantageous. In this work, we propose a class of positive-definite covariance estimators using generalized nonconvex penalties. We develop a first-order algorithm based on the alternating direction method framework to solve the nonconvex optimization problem efficiently. The convergence of this algorithm has been proved. Further, the statistical properties of the new estimators have been analyzed for generalized nonconvex penalties. Moreover, extension of this algorithm to covariance estimation from sketched measurements has been considered. The performances of the new estimators have been demonstrated by both a simulation study and a gene clustering example for tumor tissues. Code for the proposed estimators is available at https://github.com/FWen/Nonconvex-PDLCE.git.

CVApr 29, 2023
Optimal Transport Based Unsupervised Restoration Learning Exploiting Degradation Sparsity

Fei Wen, Wei Wang, Zeyu Yan et al.

Optimal transport (OT) has recently been shown as a promising criterion for unsupervised restoration when no explicit prior model is available. Despite its theoretical appeal, OT still significantly falls short of supervised methods on challenging tasks such as super-resolution, deraining, and dehazing. In this paper, we propose a \emph{sparsity-aware optimal transport} (SOT) framework to bridge this gap by leveraging a key observation: the degradations in these tasks exhibit distinct sparsity in the frequency domain. Incorporating this sparsity prior into OT can significantly reduce the ambiguity of the inverse mapping for restoration and substantially boost performance. We provide analysis to show exploiting degradation sparsity benefits unsupervised restoration learning. Extensive experiments on real-world super-resolution, deraining, and dehazing demonstrate that SOT offers notable performance gains over standard OT, while achieving superior perceptual quality compared to existing supervised and unsupervised methods. In particular, SOT consistently outperforms existing unsupervised methods across all three tasks and narrows the performance gap to supervised counterparts.

ROApr 2
Boosting Vision-Language-Action Finetuning with Feasible Action Neighborhood Prior

Haochen Niu, Kanyu Zhang, Shuyu Yin et al.

In real-world robotic manipulation, states typically admit a neighborhood of near-equivalent actions. That is, for each state, there exist a feasible action neighborhood (FAN) rather than a single correct action, within which motions yield indistinguishable progress. However, prevalent VLA training methodologies are directly inherited from linguistic settings and do not exploit the FAN property, thus leading to poor generalization and low sample efficiency. To address this limitation, we introduce a FAN-guided regularizer that shapes the model's output distribution to align with the geometry of FAN. Concretely, we introduce a Gaussian prior that promotes locally smooth and unimodal predictions around the preferred direction and magnitude. In extensive experiments across both reinforced finetuning (RFT) and supervised finetuning (SFT), our method achieves significant improvement in sample efficiency, and success rate in both in-distribution and out-of-distribution (OOD) scenarios. By aligning with the intrinsic action tolerance of physical manipulation, FAN-guided regularization provides a principled and practical method for sample-efficient, and generalizable VLA adaptation.

SDJan 28
Switchcodec: Adaptive residual-expert sparse quantization for high-fidelity neural audio coding

Xiangbo Wang, Wenbin Jiang, Jin Wang et al.

Recent neural audio compression models often rely on residual vector quantization for high-fidelity coding, but using a fixed number of per-frame codebooks is suboptimal for the wide variability of audio content-especially for signals that are either very simple or highly complex. To address this limitation, we propose SwitchCodec, a neural audio codec based on Residual Experts Vector Quantization (REVQ). REVQ combines a shared quantizer with dynamically routed expert quantizers that are activated according to the input audio, decoupling bitrate from codebook capacity and improving compression efficiency. This design ensures full training and utilization of each quantizer. In addition, a variable-bitrate mechanism adjusts the number of active expert quantizers at inference, enabling multi-bitrate operation without retraining. Experiments demonstrate that SwitchCodec surpasses existing baselines on both objective metrics and subjective listening tests.

LGJul 18, 2024
Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning

Shuyu Yin, Fei Wen, Peilin Liu et al.

The optimal objective is a fundamental aspect of reinforcement learning (RL), as it determines how policies are evaluated and optimized. While total return maximization is the ideal objective in RL, discounted return maximization is the practical objective due to its stability. This can lead to a misalignment of objectives. To better understand the problem, we theoretically analyze the performance gap between the policy maximizes the total return and the policy maximizes the discounted return. Our analysis reveals that increasing the discount factor can be ineffective at eliminating this gap when environment contains cyclic states,a frequent scenario. To address this issue, we propose two alternative approaches to align the objectives. The first approach achieves alignment by modifying the terminal state value, treating it as a tunable hyper-parameter with its suitable range defined through theoretical analysis. The second approach focuses on calibrating the reward data in trajectories, enabling alignment in practical Deep RL applications using off-policy algorithms. This method enhances robustness to the discount factor and improve performance when the trajectory length is large. Our proposed methods demonstrate that adjusting reward data can achieve alignment, providing an insight that can be leveraged to design new optimization objectives to fundamentally enhance the performance of RL algorithms.

CVMar 12, 2024
TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection

Hanning Chen, Wenjun Huang, Yang Ni et al.

Task-oriented object detection aims to find objects suitable for accomplishing specific tasks. As a challenging task, it requires simultaneous visual data processing and reasoning under ambiguous semantics. Recent solutions are mainly all-in-one models. However, the object detection backbones are pre-trained without text supervision. Thus, to incorporate task requirements, their intricate models undergo extensive learning on a highly imbalanced and scarce dataset, resulting in capped performance, laborious training, and poor generalizability. In contrast, we propose TaskCLIP, a more natural two-stage design composed of general object detection and task-guided object selection. Particularly for the latter, we resort to the recently successful large Vision-Language Models (VLMs) as our backbone, which provides rich semantic knowledge and a uniform embedding space for images and texts. Nevertheless, the naive application of VLMs leads to sub-optimal quality, due to the misalignment between embeddings of object images and their visual attributes, which are mainly adjective phrases. To this end, we design a transformer-based aligner after the pre-trained VLMs to re-calibrate both embeddings. Finally, we employ a trainable score function to post-process the VLM matching results for object selection. Experimental results demonstrate that our TaskCLIP outperforms the state-of-the-art DETR-based model TOIST by 3.5% and only requires a single NVIDIA RTX 4090 for both training and inference.

ARMar 9, 2024
HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning

Hanning Chen, Yang Ni, Ali Zakeri et al.

In recent times, a plethora of hardware accelerators have been put forth for graph learning applications such as vertex classification and graph classification. However, previous works have paid little attention to Knowledge Graph Completion (KGC), a task that is well-known for its significantly higher algorithm complexity. The state-of-the-art KGC solutions based on graph convolution neural network (GCN) involve extensive vertex/relation embedding updates and complicated score functions, which are inherently cumbersome for acceleration. As a result, existing accelerator designs are no longer optimal, and a novel algorithm-hardware co-design for KG reasoning is needed. Recently, brain-inspired HyperDimensional Computing (HDC) has been introduced as a promising solution for lightweight machine learning, particularly for graph learning applications. In this paper, we leverage HDC for an intrinsically more efficient and acceleration-friendly KGC algorithm. We also co-design an acceleration framework named HDReason targeting FPGA platforms. On the algorithm level, HDReason achieves a balance between high reasoning accuracy, strong model interpretability, and less computation complexity. In terms of architecture, HDReason offers reconfigurability, high training throughput, and low energy consumption. When compared with NVIDIA RTX 4090 GPU, the proposed accelerator achieves an average 10.6x speedup and 65x energy efficiency improvement. When conducting cross-models and cross-platforms comparison, HDReason yields an average 4.2x higher performance and 3.4x better energy efficiency with similar accuracy versus the state-of-the-art FPGA-based GCN training platform.

LGJun 12, 2024
Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation

Shuyu Yin, Fei Wen, Peilin Liu et al.

Semi-gradient Q-learning is applied in many fields, but due to the absence of an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper introduces the Fokker--Planck equation and employs partial data obtained through sampling to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualization reveals how the global minima in the loss landscape can transform into saddle points in the effective loss landscape, as well as the implicit bias of the semi-gradient method. Additionally, we demonstrate that saddle points, originating from the global minima in loss landscape, still exist in the effective loss landscape under high-dimensional parameter spaces and neural network settings. This paper develop a novel approach for probing implicit bias in semi-gradient Q-learning.

LGFeb 24, 2024
A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning

Shuyu Yin, Qixuan Zhou, Fei Wen et al.

Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignores the unique characteristics of continuous-time control problems, is unable to directly estimate the generalization error of the Bellman optimal loss and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method that is applicable to all such problems where the transition function satisfies semi-group and Lipschitz properties. Under this method, we can directly analyze the \emph{a priori} generalization error of the Bellman optimal loss. The core of this method lies in two transformations of the loss function. To complete the transformation, we propose a decomposition method for the maximum operator. Additionally, this analysis method does not require a boundedness assumption. Finally, we obtain an \emph{a priori} generalization error without the curse of dimensionality.

LGDec 13, 2021
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels

Youcai Zhang, Yuhao Cheng, Xinyu Huang et al.

Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Existing methods mainly focus on the design of network structures or training schemes, which increase the complexity of implementation. This work seeks to fulfill the potential of loss function in MLML without increasing the procedure and complexity. Toward this end, we propose two simple yet effective methods via robust loss design based on an observation that a model can identify missing labels during training with a high precision. The first is a novel robust loss for negatives, namely the Hill loss, which re-weights negatives in the shape of a hill to alleviate the effect of false negatives. The second is a self-paced loss correction (SPLC) method, which uses a loss derived from the maximum likelihood criterion under an approximate distribution of missing labels. Comprehensive experiments on a vast range of multi-label image classification datasets demonstrate that our methods can remarkably boost the performance of MLML and achieve new state-of-the-art loss functions in MLML.

ITJun 5, 2021
On Perceptual Lossy Compression: The Cost of Perceptual Reconstruction and An Optimal Training Framework

Zeyu Yan, Fei Wen, Rendong Ying et al.

Lossy compression algorithms are typically designed to achieve the lowest possible distortion at a given bit rate. However, recent studies show that pursuing high perceptual quality would lead to increase of the lowest achievable distortion (e.g., MSE). This paper provides nontrivial results theoretically revealing that, \textit{1}) the cost of achieving perfect perception quality is exactly a doubling of the lowest achievable MSE distortion, \textit{2}) an optimal encoder for the "classic" rate-distortion problem is also optimal for the perceptual compression problem, \textit{3}) distortion loss is unnecessary for training a perceptual decoder. Further, we propose a novel training framework to achieve the lowest MSE distortion under perfect perception constraint at a given bit rate. This framework uses a GAN with discriminator conditioned on an MSE-optimized encoder, which is superior over the traditional framework using distortion plus adversarial loss. Experiments are provided to verify the theoretical finding and demonstrate the superiority of the proposed training framework.

CVJan 20, 2021
Scalable Deep Compressive Sensing

Zhonghao Zhang, Yipeng Liu, Xingyu Cao et al.

Deep learning has been used to image compressive sensing (CS) for enhanced reconstruction performance. However, most existing deep learning methods train different models for different subsampling ratios, which brings additional hardware burden. In this paper, we develop a general framework named scalable deep compressive sensing (SDCS) for the scalable sampling and reconstruction (SSR) of all existing end-to-end-trained models. In the proposed way, images are measured and initialized linearly. Two sampling masks are introduced to flexibly control the subsampling ratios used in sampling and reconstruction, respectively. To make the reconstruction model adapt to any subsampling ratio, a training strategy dubbed scalable training is developed. In scalable training, the model is trained with the sampling matrix and the initialization matrix at various subsampling ratios by integrating different sampling matrix masks. Experimental results show that models with SDCS can achieve SSR without changing their structure while maintaining good performance, and SDCS outperforms other SSR methods.

CVSep 15, 2020
A Robust and Reliable Point Cloud Recognition Network Under Rigid Transformation

Dongrui Liu, Chuanchuan Chen, Changqing Xu et al.

Point cloud recognition is an essential task in industrial robotics and autonomous driving. Recently, several point cloud processing models have achieved state-of-the-art performances. However, these methods lack rotation robustness, and their performances degrade severely under random rotations, failing to extend to real-world scenarios with varying orientations. To this end, we propose a method named Self Contour-based Transformation (SCT), which can be flexibly integrated into various existing point cloud recognition models against arbitrary rotations. SCT provides efficient rotation and translation invariance by introducing Contour-Aware Transformation (CAT), which linearly transforms Cartesian coordinates of points to translation and rotation-invariant representations. We prove that CAT is a rotation and translation-invariant transformation based on the theoretical analysis. Furthermore, the Frame Alignment module is proposed to enhance discriminative feature extraction by capturing contours and transforming self contour-based frames into intra-class frames. Extensive experimental results show that SCT outperforms the state-of-the-art approaches under arbitrary rotations in effectiveness and efficiency on synthetic and real-world benchmarks. Furthermore, the robustness and generality evaluations indicate that SCT is robust and is applicable to various point cloud processing models, which highlights the superiority of SCT in industrial applications.

IVJun 29, 2020
Hyperspectral Image Denoising with Partially Orthogonal Matrix Vector Tensor Factorization

Zhen Long, Yipeng Liu, Sixing Zeng et al.

Hyperspectral image (HSI) has some advantages over natural image for various applications due to the extra spectral information. During the acquisition, it is often contaminated by severe noises including Gaussian noise, impulse noise, deadlines, and stripes. The image quality degeneration would badly effect some applications. In this paper, we present a HSI restoration method named smooth and robust low rank tensor recovery. Specifically, we propose a structural tensor decomposition in accordance with the linear spectral mixture model of HSI. It decomposes a tensor into sums of outer matrix vector products, where the vectors are orthogonal due to the independence of endmember spectrums. Based on it, the global low rank tensor structure can be well exposited for HSI denoising. In addition, the 3D anisotropic total variation is used for spatial spectral piecewise smoothness of HSI. Meanwhile, the sparse noise including impulse noise, deadlines and stripes, is detected by the l1 norm regularization. The Frobenius norm is used for the heavy Gaussian noise in some real world scenarios. The alternating direction method of multipliers is adopted to solve the proposed optimization model, which simultaneously exploits the global low rank property and the spatial spectral smoothness of the HSI. Numerical experiments on both simulated and real data illustrate the superiority of the proposed method in comparison with the existing ones.

IVApr 21, 2020
AMP-Net: Denoising based Deep Unfolding for Compressive Image Sensing

Zhonghao Zhang, Yipeng Liu, Jiani Liu et al.

Most compressive sensing (CS) reconstruction methods can be divided into two categories, i.e. model-based methods and classical deep network methods. By unfolding the iterative optimization algorithm for model-based methods onto networks, deep unfolding methods have the good interpretation of model-based methods and the high speed of classical deep network methods. In this paper, to solve the visual image CS problem, we propose a deep unfolding model dubbed AMP-Net. Rather than learning regularization terms, it is established by unfolding the iterative denoising process of the well-known approximate message passing algorithm. Furthermore, AMP-Net integrates deblocking modules in order to eliminate the blocking artifacts that usually appear in CS of visual images. In addition, the sampling matrix is jointly trained with other network parameters to enhance the reconstruction performance. Experimental results show that the proposed AMP-Net has better reconstruction accuracy than other state-of-the-art methods with high reconstruction speed and a small number of network parameters.