CV · Sep 21, 2022 · Code
KXNet: A Model-Driven Deep Neural Network for Blind Super-Resolution
Jiahong Fu, Hong Wang, Qi Xie et al.
Although current deep learning-based methods have achieved promising performance in the blind single image super-resolution (SISR) task, most of them focus on heuristically constructing diverse network architectures and put less emphasis on explicitly embedding the physical generation mechanism linking blur kernels and high-resolution (HR) images. To alleviate this issue, we propose a model-driven deep neural network, called KXNet, for blind SISR. Specifically, to solve the classical SISR model, we propose a simple-yet-effective iterative algorithm. Then, by unfolding the involved iterative steps into corresponding network modules, we naturally construct KXNet. The main feature of the proposed KXNet is that the entire learning process is fully and explicitly integrated with the inherent physical mechanism underlying this SISR task. Thus, the learned blur kernel has clear physical patterns, and the mutually iterative process between the blur kernel and the HR image soundly guides KXNet to evolve in the right direction. Extensive experiments on synthetic and real data demonstrate the superior accuracy and generality of our method over current representative state-of-the-art blind SISR methods. Code is available at: https://github.com/jiahong-fu/KXNet.
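The physical generation mechanism referred to here is commonly written as the classical SISR degradation model $y = (x \otimes k)\downarrow_s + n$, where $x$ is the HR image, $k$ the blur kernel, $\downarrow_s$ downsampling by scale factor $s$, and $n$ noise. The NumPy sketch below only simulates this forward model; it is not KXNet's unfolded network. Circular (FFT-based) convolution, direct $s$-fold decimation, the isotropic Gaussian kernel, and the helper names `gaussian_kernel` and `degrade` are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(size=11, sigma=2.0):
    """Isotropic Gaussian blur kernel normalized to sum to 1 (an illustrative choice)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def degrade(hr, kernel, scale=4, noise_std=0.0, seed=0):
    """y = (x conv k) downsampled by `scale`, plus Gaussian noise; circular boundary assumed."""
    kh, kw = kernel.shape
    # Zero-pad the kernel to image size and move its center to index (0, 0).
    padded = np.zeros_like(hr, dtype=float)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(hr) * np.fft.fft2(padded)))
    lr = blurred[::scale, ::scale]          # direct decimation
    rng = np.random.default_rng(seed)
    return lr + noise_std * rng.standard_normal(lr.shape)

rng = np.random.default_rng(1)
hr = rng.random((64, 64))
k = gaussian_kernel(size=11, sigma=2.0)
lr = degrade(hr, k, scale=4, noise_std=0.01)
print(hr.shape, "->", lr.shape)  # (64, 64) -> (16, 16)
```

A blind SISR method has to invert this map while estimating `k` as well as `x`, which is why alternating kernel and image updates arise naturally.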
CL · Oct 9, 2022 · Code
Controllable Dialogue Simulation with In-Context Learning
Zekun Li, Wenhu Chen, Shiyang Li et al.
Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose Dialogic, a novel dialogue simulation method based on in-context learning with large language models, to automate dataset creation. Seeded with a few annotated dialogues, Dialogic automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and annotations in a controllable way. Our method can rapidly expand a small set of dialogue data with minimal or zero human involvement and parameter updates, and is thus much more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues in challenging low-resource settings, with as few as 85 dialogues as a seed. When enough data is available, our method can still serve as an effective data augmentation method. Human evaluation results also show that our simulated dialogues have near-human fluency and annotation accuracy. The code and data are available at https://github.com/Leezekun/dialogic.
IV · May 16, 2022 · Code
Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction
Hong Wang, Yuexiang Li, Deyu Meng et al.
Inspired by the great success of deep neural networks, learning-based methods have achieved promising performance for metal artifact reduction (MAR) in computed tomography (CT) images. However, most existing approaches put less emphasis on modeling and embedding the intrinsic prior knowledge underlying this specific MAR task into their network designs. To address this issue, we propose an adaptive convolutional dictionary network (ACDNet), which leverages both model-based and learning-based methods. Specifically, we explore the prior structures of metal artifacts, e.g., non-local repetitive streaking patterns, and encode them as an explicit weighted convolutional dictionary model. Then, a simple-yet-effective algorithm is carefully designed to solve the model. By unfolding every iterative substep of the proposed algorithm into a network module, we explicitly embed the prior structure into a deep network, which gives the network clear interpretability for the MAR task. Furthermore, our ACDNet can automatically learn the prior for artifact-free CT images from training data and adaptively adjust the representation kernels for each input CT image based on its content. Hence, our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods. Comprehensive experiments on synthetic and clinical datasets show the superiority of our ACDNet in terms of effectiveness and model generalization. Code is available at https://github.com/hongwang01/ACDNet.
CV · Sep 6, 2022 · Code
SIND: A Drone Dataset at Signalized Intersection in China
Yanchao Xu, Wenbo Shao, Jun Li et al.
Intersections are among the most challenging scenarios for autonomous driving. Due to their complexity and stochasticity, essential applications at intersections (e.g., behavior modeling, motion prediction, and safety validation) rely heavily on data-driven techniques. Thus, there is strong demand for trajectory datasets of traffic participants (TPs) at intersections. Currently, most intersections in urban areas are equipped with traffic lights. However, there is not yet a large-scale, high-quality, publicly available trajectory dataset for signalized intersections. Therefore, in this paper, a typical two-phase signalized intersection in Tianjin, China, is selected, and a pipeline is designed to construct a Signalized INtersection Dataset (SIND), which contains 7 hours of recordings covering over 13,000 TPs of 7 types. The traffic light violations in SIND are then recorded, and SIND is compared with other similar works. The features of SIND can be summarized as follows: 1) SIND provides more comprehensive information, including traffic light states, motion parameters, a High Definition (HD) map, etc.; 2) the categories of TPs are diverse and characteristic, with vulnerable road users (VRUs) making up as much as 62.6% of TPs; 3) multiple traffic light violations by non-motor vehicles are captured. We believe that SIND will be an effective supplement to existing datasets and can promote related research on autonomous driving. The dataset is available online via: https://github.com/SOTIF-AVLab/SinD
CV · Nov 7, 2022 · Code
PeSOTIF: a Challenging Visual Dataset for Perception SOTIF Problems in Long-tail Traffic Scenarios
Liang Peng, Jun Li, Wenbo Shao et al.
Perception algorithms in autonomous driving systems face great challenges in long-tail traffic scenarios, where Safety of the Intended Functionality (SOTIF) problems can be triggered by performance insufficiencies of the algorithm and by the dynamic operational environment. However, such scenarios are not systematically included in current open-source datasets, and this paper fills this gap. Based on the analysis and enumeration of trigger conditions, a high-quality, diverse dataset is released, including various long-tail traffic scenarios collected from multiple sources. Considering the development of probabilistic object detection (POD), this dataset marks trigger sources that may cause perception SOTIF problems in the scenarios as key objects. In addition, an evaluation protocol is suggested to verify the effectiveness of POD algorithms in identifying the key objects via uncertainty. The dataset is continually expanding, and the first batch of open-source data includes 1126 frames with an average of 2.27 key objects and 2.47 normal objects per frame. To demonstrate how to use this dataset for SOTIF research, this paper further quantifies the perception SOTIF entropy to determine whether a scenario is unknown and unsafe for a perception system. The experimental results show that the quantified entropy can effectively and efficiently reflect failures of the perception algorithm.
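The abstract does not spell out its entropy formula, so the following is only one common construction, stated as an assumption: average the class probabilities from several stochastic forward passes of a probabilistic detector (e.g., MC dropout) and take the Shannon entropy of the mean. The functions `predictive_entropy` and `flag_key_object` and the 0.5-nat threshold are hypothetical.

```python
import numpy as np

def predictive_entropy(prob_samples, eps=1e-12):
    """prob_samples: (T, C) class probabilities from T stochastic passes for one detection.
    Returns the Shannon entropy (in nats) of the mean predictive distribution."""
    p_mean = np.mean(np.asarray(prob_samples, dtype=float), axis=0)
    return float(-np.sum(p_mean * np.log(p_mean + eps)))

def flag_key_object(prob_samples, threshold=0.5):
    """Hypothetical rule: treat a detection as a potential SOTIF trigger if its entropy is high."""
    return predictive_entropy(prob_samples) > threshold

confident = [[0.98, 0.01, 0.01]] * 5                                   # passes agree, peaked
ambiguous = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]        # passes disagree
print(predictive_entropy(confident), flag_key_object(confident))
print(predictive_entropy(ambiguous), flag_key_object(ambiguous))
```

Entropy is bounded by $\log C$ for $C$ classes, so a threshold can also be expressed as a fraction of that maximum.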
IV · Dec 26, 2022 · Code
Orientation-Shared Convolution Representation for CT Metal Artifact Learning
Hong Wang, Qi Xie, Yuexiang Li et al.
During X-ray computed tomography (CT) scanning, metallic implants carried by patients often lead to adverse artifacts in the captured CT images and thus impair clinical treatment. For this metal artifact reduction (MAR) task, existing deep-learning-based methods have achieved promising reconstruction performance. Nevertheless, there is still room to improve MAR performance and generalization ability, since some important prior knowledge underlying this specific task has not been fully exploited. Hence, in this paper, we carefully analyze the characteristics of metal artifacts and propose an orientation-shared convolution representation strategy that adapts to the physical prior structures of artifacts, i.e., rotationally symmetrical streaking patterns. The proposed method adopts Fourier-series-expansion-based filter parametrization in artifact modeling, which can better separate artifacts from anatomical tissues and boost model generalizability. Comprehensive experiments on synthesized and clinical datasets show the superiority of our method in detail preservation over current representative MAR methods. Code will be available at https://github.com/hongwang01/OSCNet.
IV · Jun 25, 2023 · Code
MEPNet: A Model-Driven Equivariant Proximal Network for Joint Sparse-View Reconstruction and Metal Artifact Reduction in CT Images
Hong Wang, Minghao Zhou, Dong Wei et al.
Sparse-view computed tomography (CT) has been adopted as an important technique for speeding up data acquisition and decreasing radiation dose. However, due to the lack of sufficient projection data, the reconstructed CT images often present severe artifacts, which are further amplified when patients carry metallic implants. For this joint sparse-view reconstruction and metal artifact reduction task, most existing methods face two main limitations: 1) they are mostly built on common network modules without fully embedding the physical imaging geometry constraint of this specific task into dual-domain learning; 2) some important prior knowledge is not deeply explored or sufficiently utilized. To address these issues, we construct a dual-domain reconstruction model and propose a model-driven equivariant proximal network, called MEPNet. The main characteristics of MEPNet are: 1) it is optimization-inspired and has a clear working mechanism; 2) the involved proximal operator is modeled via a rotation-equivariant convolutional neural network, which finely represents the inherent rotational prior underlying CT scanning, namely that the same organ can be imaged at different angles. Extensive experiments on several datasets substantiate that, compared with a conventional convolution-based proximal network, this rotation equivariance mechanism enables our method to achieve better reconstruction performance with fewer network parameters. We will release the code at https://github.com/hongwang01/MEPNet.
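To make "rotation equivariant" concrete: an operator $f$ is equivariant to a rotation $R$ if $f(Rx) = R f(x)$. The toy NumPy check below uses 90-degree rotations and a plain "valid" 2D correlation; it is not MEPNet's proximal network or its filter parametrization. A kernel that is itself invariant under 90-degree rotation (the discrete Laplacian) passes, while an orientation-selective one (Sobel) does not. The helper names are hypothetical.

```python
import numpy as np

def correlate_valid(x, k):
    """Plain 2D cross-correlation with 'valid' boundary handling."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def is_rot90_equivariant(k, x):
    """True if rotating the input by 90 degrees rotates the output in the same way."""
    return np.allclose(correlate_valid(np.rot90(x), k), np.rot90(correlate_valid(x, k)))

rng = np.random.default_rng(0)
image = rng.random((16, 16))
laplacian = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])  # rot90-invariant
sobel_x = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])   # orientation-selective
print(is_rot90_equivariant(laplacian, image))  # True
print(is_rot90_equivariant(sobel_x, image))    # False
```

Restricting every filter to be rotation-invariant would be too limiting; equivariant CNNs instead share filter weights across rotated copies, which is one reason they can use fewer parameters.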
CL · Oct 13, 2022
Explanations from Large Language Models Make Small Reasoners Better
Shiyang Li, Jianshu Chen, Yelong Shen et al.
Integrating free-text explanations into the in-context learning of large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the explanations generated by an LLM to improve the training of small reasoners, which are more favorable for real-world production deployment due to their low cost. We systematically explore three approaches for generating explanations from an LLM and use a multi-task learning framework to help small models acquire strong reasoning ability together with explanation generation capabilities. Experiments on multiple reasoning tasks show that our method consistently and significantly outperforms finetuning baselines across different settings, and even performs better than finetuning or prompting a 60x larger GPT-3 (175B) model, by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method can generate high-quality explanations to justify its predictions, moving towards the goal of explainable AI.
NA · Jun 20, 2018
Fractional Gray-Scott Model: Well-posedness, Discretization, and Simulations
Tingting Wang, Fangying Song, Hong Wang et al.
The Gray-Scott (GS) model describes the dynamics and steady-state pattern formation in reaction-diffusion systems and has been extensively studied. In this paper, we consider the effects of anomalous diffusion on pattern formation by introducing the fractional Laplacian into the GS model. First, we prove that the continuous solutions of the fractional GS model are unique. We then introduce the Crank-Nicolson (C-N) scheme for time discretization and the weighted shifted Grünwald difference operator for spatial discretization. We perform stability analysis for the time semi-discrete numerical scheme and numerically analyze the errors against benchmark solutions, which show second-order convergence in both time and space. We also employ the spectral collocation method in space and the C-N scheme in time to solve the GS model, in order to verify the accuracy of our numerical solutions. We observe the formation of different patterns at different values of the fractional order, which are quite different from the patterns of the corresponding integer-order GS model, and we quantify them using the radial distribution function (RDF). Finally, we discover a scaling law for the steady patterns of the RDFs in terms of the fractional order $1 < \alpha \leq 2$.
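For readers unfamiliar with the spatial discretization, the sketch below builds the Grünwald coefficients $g_k = (-1)^k \binom{\alpha}{k}$ and the weighted shifted Grünwald difference (WSGD) weights for shifts $(p, q) = (1, 0)$, namely $w_0 = \tfrac{\alpha}{2} g_0$ and $w_k = \tfrac{\alpha}{2} g_k + (1 - \tfrac{\alpha}{2}) g_{k-1}$, and applies them to a one-sided (left Riemann-Liouville) derivative in 1D. It is a simplified illustration only: the two-sided fractional Laplacian of the GS model and the Crank-Nicolson stepping are not reproduced, and the test function $u(x) = x^3$, with $D^\alpha x^3 = 6x^{3-\alpha}/\Gamma(4-\alpha)$, is an assumption chosen for its closed form.

```python
import math
import numpy as np

def grunwald_weights(alpha, n):
    """g_k = (-1)^k * binom(alpha, k) for k = 0..n, via g_0 = 1, g_k = (1 - (alpha + 1) / k) g_{k-1}."""
    g = np.empty(n + 1)
    g[0] = 1.0
    for k in range(1, n + 1):
        g[k] = (1.0 - (alpha + 1.0) / k) * g[k - 1]
    return g

def wsgd_weights(alpha, n):
    """WSGD weights for shifts (1, 0): w_0 = (alpha/2) g_0, w_k = (alpha/2) g_k + (1 - alpha/2) g_{k-1}."""
    g = grunwald_weights(alpha, n)
    w = (alpha / 2.0) * g
    w[1:] += (1.0 - alpha / 2.0) * g[:-1]
    return w

def left_rl_derivative(u, x, h, alpha, weights):
    """h^(-alpha) * sum_{k=0}^{n+1} w_k u(x - (k - 1) h), with n = x / h and u = 0 for t < 0."""
    n = int(round(x / h))
    k = np.arange(n + 2)
    w = weights(alpha, n + 1)
    return h ** (-alpha) * np.sum(w * u(x - (k - 1) * h))

def u(t):
    """Test function x^3, extended by zero for t < 0."""
    return np.where(t > 0.0, t ** 3, 0.0)

alpha, h = 1.5, 1e-3
exact = 6.0 / math.gamma(4.0 - alpha)   # D^alpha x^3 evaluated at x = 1
err_gl = abs(left_rl_derivative(u, 1.0, h, alpha, grunwald_weights) - exact) / exact
err_wsgd = abs(left_rl_derivative(u, 1.0, h, alpha, wsgd_weights) - exact) / exact
print(f"shifted Grunwald (1st order): {err_gl:.2e}   WSGD (2nd order): {err_wsgd:.2e}")
```

The weights $\alpha/2$ and $1 - \alpha/2$ are chosen so that the $O(h)$ error terms of the two shifted formulas cancel, which is what lifts the spatial accuracy to second order.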
NA · Mar 7, 2017
POD/DEIM Reduced-Order Modeling of Time-Fractional Partial Differential Equations with Applications in Parameter Identification
Hongfei Fu, Hong Wang, Zhu Wang
In this paper, a reduced-order model (ROM) based on proper orthogonal decomposition (POD) and the discrete empirical interpolation method (DEIM) is proposed for efficiently simulating time-fractional partial differential equations (TFPDEs). Both linear and nonlinear equations are considered. We demonstrate the effectiveness of the ROM with several numerical examples, in which the ROM achieves the same accuracy as the full-order model (FOM) over a long-term simulation while greatly reducing the computational cost. The proposed ROM is then used as a surrogate for the FOM and applied to an inverse problem of identifying the order of the time-fractional derivative in the TFPDE model. Based on the Levenberg-Marquardt regularization iterative method with the Armijo rule, we develop a ROM-based algorithm for solving the inverse problem. Whether the observation data are uncontaminated or contaminated by random noise, the proposed approach achieves accurate parameter estimation efficiently.
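The two ingredients named here can be sketched in a few lines of NumPy: POD extracts an orthonormal basis from solution snapshots via the SVD, and DEIM greedily picks interpolation indices so that a nonlinear term only has to be evaluated at a few grid points. This is a generic sketch of the standard POD truncation and greedy DEIM selection, not the paper's TFPDE solver; the snapshot family $f(x;\mu) = e^{-\mu x}\sin(\pi x)$ and the energy tolerance are illustrative assumptions.

```python
import numpy as np

def pod_basis(snapshots, energy_tol=1e-12):
    """Orthonormal POD basis of a snapshot matrix (n_dofs x n_snapshots) via the thin SVD.
    Keeps the fewest modes whose squared singular values capture 1 - energy_tol of the total."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(energy, 1.0 - energy_tol)) + 1
    return U[:, :r]

def deim_indices(U):
    """Greedy DEIM selection of interpolation indices for a basis U (n x m)."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, U.shape[1]):
        # Interpolate the j-th basis vector from the previous ones at the chosen indices...
        c = np.linalg.solve(U[idx, :j], U[idx, j])
        # ...and add the index where that interpolation is worst.
        residual = U[:, j] - U[:, :j] @ c
        idx.append(int(np.argmax(np.abs(residual))))
    return np.array(idx)

def deim_reconstruct(U, idx, values_at_idx):
    """Recover a full vector from its values at the DEIM indices: U (P^T U)^{-1} P^T f."""
    return U @ np.linalg.solve(U[idx, :], values_at_idx)

# Illustrative snapshot family f(x; mu) = exp(-mu x) sin(pi x), not taken from the paper.
x = np.linspace(0.0, 1.0, 200)
mus = np.linspace(1.0, 10.0, 40)
snapshots = np.column_stack([np.exp(-mu * x) * np.sin(np.pi * x) for mu in mus])
U = pod_basis(snapshots)
idx = deim_indices(U)
f_new = np.exp(-5.5 * x) * np.sin(np.pi * x)      # parameter value outside the snapshot set
rel_err = np.linalg.norm(deim_reconstruct(U, idx, f_new[idx]) - f_new) / np.linalg.norm(f_new)
print(f"{U.shape[1]} POD modes, DEIM points {sorted(idx.tolist())}, relative error {rel_err:.1e}")
```

In a ROM, the nonlinear term is evaluated only at the entries in `idx` at each time step, which is where the cost reduction over the full-order model comes from.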
CL · Aug 9, 2022
Limitations of Language Models in Arithmetic and Symbolic Induction
Jing Qian, Hong Wang, Zekun Li et al.
Recent work has shown that large pretrained language models (LMs) not only perform remarkably well on a range of natural language processing (NLP) tasks but also start improving on reasoning tasks such as arithmetic induction, symbolic manipulation, and commonsense reasoning as model size increases. However, it is still unclear what the underlying capabilities of these LMs are. Surprisingly, we find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition. When the total number of symbols or repeating symbols increases, model performance drops quickly. We investigate the potential causes behind this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs. Experimental results show that none of these techniques can completely solve the simplest addition induction problem. Finally, we introduce LMs with tutor, which demonstrates every single step of teaching. LMs with tutor achieves 100% accuracy in out-of-distribution (OOD) settings and with repeating symbols, shedding new light on the boundary of large LMs in induction.
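As a toy illustration of what "demonstrating every single step" can mean for addition, the sketch below generates a column-by-column trace with explicit carries. The trace format and the `tutor_addition_trace` helper are hypothetical; the paper's actual tutor format may differ.

```python
def tutor_addition_trace(a: str, b: str):
    """Return (steps, result) for column-wise addition of two non-negative decimal strings,
    emitting one explicit step per digit position, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    steps, digits, carry = [], [], 0
    for pos in range(width - 1, -1, -1):
        da, db = int(a[pos]), int(b[pos])
        total = da + db + carry
        steps.append(f"pos {width - 1 - pos}: {da} + {db} + carry {carry} = {total}, "
                     f"write {total % 10}, carry {total // 10}")
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        steps.append(f"final carry: write {carry}")
        digits.append(str(carry))
    return steps, "".join(reversed(digits))

steps, result = tutor_addition_trace("999", "1")
print("\n".join(steps))
print("answer:", result)  # answer: 1000
```

Because every step depends only on two digits and a carry, the trace length grows with the number of digits instead of requiring the model to resolve long-range position dependencies in one shot.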
LG · Feb 10, 2023 · Code
XFL: A High Performace, Lightweighted Federated Learning Framework
Hong Wang, Yuanzhi Zhou, Chi Zhang et al.
This paper introduces XFL, an industrial-grade federated learning project. XFL supports training AI models collaboratively on multiple devices, while using homomorphic encryption, differential privacy, secure multi-party computation, and other security technologies to ensure that no data is leaked. XFL provides a rich algorithm library, integrating a large number of pre-built, secure, and high-performing federated learning algorithms that cover both horizontal and vertical federated learning scenarios. Numerical experiments have shown the strong performance of these algorithms. XFL provides concise configuration interfaces with presets for all federated algorithms and supports rapid deployment via Docker containers. Therefore, we believe XFL is the most user-friendly and easy-to-develop federated learning framework. XFL is open-sourced, and both the code and documentation are available at https://github.com/paritybit-ai/XFL.
CV · Jan 11, 2023
How Does Traffic Environment Quantitatively Affect the Autonomous Driving Prediction?
Wenbo Shao, Yanchao Xu, Jun Li et al.
Accurate trajectory prediction is crucial for safe and efficient autonomous driving in complex traffic environments. In recent years, artificial intelligence has shown strong capabilities in improving prediction accuracy. However, its lack of explainability and its uncertainty make it difficult to explicitly determine how the traffic environment affects prediction, posing significant challenges to safety-critical decision-making. To address these challenges, this study proposes a trajectory prediction framework with epistemic uncertainty estimation that outputs high uncertainty when confronting unforeseeable or unknown scenarios. The proposed framework is used to analyze the environmental effect on the performance of the prediction algorithm. In the analysis, the traffic environment is considered in terms of scenario features and shifts, where features are divided into kinematic features of a target agent, features of its surrounding traffic participants, and other features. In addition, feature correlation and importance analyses are performed to study the influence of the above features on the prediction error and epistemic uncertainty. Further, a cross-dataset case study is conducted using multiple intersection datasets to investigate the impact of unavoidable real-world distributional shifts on trajectory prediction. The results indicate that the deep ensemble-based method has advantages in improving prediction robustness and estimating epistemic uncertainty. The feature correlation and importance analyses yield consistent conclusions, including that kinematic features of the target agent have relatively strong effects on the prediction error and epistemic uncertainty. Furthermore, the prediction failures caused by distributional shifts and the potential of the deep ensemble-based method are analyzed.
LG · Oct 12, 2022
Towards Theoretically Inspired Neural Initialization Optimization
Yibo Yang, Hong Wang, Haobo Yuan et al.
Automated machine learning has been widely explored to reduce human effort in designing neural architectures and searching for proper hyperparameters. In the domain of neural initialization, however, similar automated techniques have rarely been studied. Most existing initialization methods are handcrafted and highly dependent on specific architectures. In this paper, we propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network. Specifically, GradCosine is the cosine similarity of sample-wise gradients with respect to the initialized parameters. By analyzing the sample-wise optimization landscape, we show that both the training and test performance of a network can be improved by maximizing GradCosine under a gradient norm constraint. Based on this observation, we further propose the neural initialization optimization (NIO) algorithm. Generalized from the sample-wise analysis to the real batch setting, NIO can automatically find a better initialization at negligible cost compared with the training time. With NIO, we improve the classification performance of a variety of neural architectures on CIFAR-10, CIFAR-100, and ImageNet. Moreover, we find that our method can even help train large vision Transformer architectures without warmup.
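A minimal sketch of the quantity described above, under two stated assumptions: a linear least-squares model, so that sample-wise gradients have a closed form, and GradCosine taken as the mean pairwise cosine similarity over distinct samples. The paper's exact normalization, its gradient norm constraint, and the NIO optimization loop are not reproduced; the helper names are hypothetical.

```python
import numpy as np

def per_sample_grads(w, X, y):
    """Gradients of 0.5 * (x_i . w - y_i)^2 with respect to w, one row per sample."""
    residual = X @ w - y
    return residual[:, None] * X

def grad_cosine(grads, eps=1e-12):
    """Mean cosine similarity over all distinct pairs of sample-wise gradients."""
    unit = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + eps)
    sim = unit @ unit.T
    n = sim.shape[0]
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 5))
y = X @ rng.standard_normal(5)
for scale in (0.01, 1.0):                 # compare two candidate initializations
    w0 = scale * rng.standard_normal(5)
    print(scale, grad_cosine(per_sample_grads(w0, X, y)))
```

A value near 1 means the per-sample gradients point in similar directions, so a single step helps most samples at once; NIO searches for initial parameters that increase this agreement.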
NA · Aug 8, 2018
A fast solver for spectral element approximation applied to fractional differential equations using hierarchical matrix approximation
Xianjuan Li, Zhiping Mao, Fangying Song et al.
We develop a fast solver for the spectral element method (SEM) applied to the two-sided fractional diffusion equation on uniform, geometric, and graded meshes. By approximating the singular kernel with a degenerate kernel, we construct a hierarchical matrix (H-matrix) to represent the stiffness matrix of the SEM and provide error estimates that are verified numerically. We can efficiently solve the H-matrix approximation problem using a hierarchical LU decomposition method, which reduces the computational cost to $O(R^2 N_d \log^2 N) + O(R^3 N_d \log N)$, where $R$ is the rank of the submatrices of the H-matrix approximation, $N_d$ is the total number of degrees of freedom, and $N$ is the number of elements. However, this approximation loses the high accuracy of the SEM. Thus, we solve the corresponding preconditioned system using the H-matrix approximation problem as a preconditioner, which recovers the high-order accuracy of the SEM. The condition number of the preconditioned system is independent of the polynomial degree $P$ and grows with the number of elements, but for modest values of the rank $R$ it stays below order 10 in our experiments, a reduction of more than 11 orders of magnitude from the unpreconditioned system; this reduction is larger for the two-sided fractional derivative than for the one-sided fractional derivative. The corresponding cost is $O(R^2 N_d \log^2 N) + O(R^3 N_d \log N) + O(N_d^2)$. Moreover, by using a structured mesh (uniform or geometric), we can further reduce the computational cost of the preconditioned system to $O(R^2 N_d \log^2 N) + O(R^3 N_d \log N) + O(P^2 N \log N)$. We present several numerical tests to illustrate the proposed algorithm using $h$ and $p$ refinements.
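The key fact behind the H-matrix construction is that blocks of a singular kernel evaluated on well-separated point sets are numerically low rank, so they can be stored as rank-$R$ factors. The NumPy sketch below illustrates this with an assumed model kernel $|x-y|^{-\beta}$ on 1D point sets; it does not build the SEM stiffness matrix, the hierarchical block partition, or the hierarchical LU solver described above, and the helper names are hypothetical.

```python
import numpy as np

def kernel_block(x, y, beta=0.5):
    """Illustrative singular kernel |x - y|^(-beta) evaluated on point sets x and y."""
    return np.abs(x[:, None] - y[None, :]) ** (-beta)

def numerical_rank(A, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

x = np.linspace(0.0, 1.0, 200)
y_far = np.linspace(2.0, 3.0, 200)          # well separated from x (admissible block)
y_near = x + 0.5 * (x[1] - x[0])            # interleaved with x (near-diagonal block)
far_block, near_block = kernel_block(x, y_far), kernel_block(x, y_near)
print("far rank:", numerical_rank(far_block), "near rank:", numerical_rank(near_block))

# Rank-R factorization A ~ (U_R * s_R) @ V_R^T: the compressed form kept for admissible blocks.
R = numerical_rank(far_block)
U, s, Vt = np.linalg.svd(far_block, full_matrices=False)
approx = (U[:, :R] * s[:R]) @ Vt[:R]
rel_err = np.linalg.norm(far_block - approx) / np.linalg.norm(far_block)
print("rank", R, "relative error", rel_err)
```

Near-diagonal blocks stay dense, while admissible blocks cost $O(R n)$ storage instead of $O(n^2)$; recursively partitioning the matrix this way is what yields the logarithmic factors in the costs above.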
CV · Sep 6, 2022
CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion
Li Wang, Xinyu Zhang, Wenyuan Qin et al.
3D multi-object tracking (MOT) ensures consistency during continuous dynamic detection, which is conducive to subsequent motion planning and navigation tasks in autonomous driving. However, camera-based methods suffer under occlusion, and LiDAR-based methods can struggle to accurately track the irregular motion of objects. Some fusion methods work well but do not consider the untrustworthiness of appearance features under occlusion. At the same time, false detections also significantly affect tracking. We therefore propose a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization (CAMO-MOT), which uses both camera and LiDAR data and significantly reduces tracking failures caused by occlusion and false detection. For occlusion problems, we are the first to propose an occlusion head that effectively selects the best object appearance features multiple times, reducing the influence of occlusions. To decrease the impact of false detections on tracking, we design a motion cost matrix based on confidence scores, which improves positioning and object prediction accuracy in 3D space. As existing multi-object tracking methods only consider a single category, we also propose a multi-category loss to implement multi-object tracking in multi-category scenes. A series of validation experiments are conducted on the KITTI and nuScenes tracking benchmarks. Our method achieves state-of-the-art performance and the lowest identity switch (IDS) values (23 for Car and 137 for Pedestrian) among all multi-modal MOT methods on the KITTI test dataset. It also achieves state-of-the-art performance among all algorithms on the nuScenes test dataset, with 75.3% AMOTA.
RO · Jan 11, 2023
Failure Detection for Motion Prediction of Autonomous Driving: An Uncertainty Perspective
Wenbo Shao, Yanchao Xu, Liang Peng et al.
Motion prediction is essential for safe and efficient autonomous driving. However, the inexplicability and uncertainty of complex artificial intelligence models may lead to unpredictable failures of the motion prediction module, which may mislead the system into making unsafe decisions. Therefore, it is necessary to develop methods that ensure reliable autonomous driving, and failure detection is one potential direction. Uncertainty estimates can be used to quantify the degree of confidence a model has in its predictions and may be valuable for failure detection. We propose a failure detection framework for motion prediction from the uncertainty perspective, considering both motion uncertainty and model uncertainty, and formulate various uncertainty scores for different prediction stages. The proposed approach is evaluated with different motion prediction algorithms, uncertainty estimation methods, uncertainty scores, etc., and the results show that uncertainty is promising for detecting motion prediction failures but should be used with caution.
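As a concrete example of a model-uncertainty score, the sketch below measures disagreement among future trajectories predicted by the members of an ensemble, one common way to estimate model uncertainty. The specific score (mean over time of the positional variance across members), the threshold, and the function names are illustrative assumptions and not necessarily among the scores formulated in the paper.

```python
import numpy as np

def ensemble_uncertainty(pred_trajs):
    """pred_trajs: (M, T, 2) future (x, y) positions from M ensemble members.
    Returns the time-averaged total positional variance across members."""
    pred = np.asarray(pred_trajs, dtype=float)
    var = pred.var(axis=0)                 # (T, 2): variance across members per step and axis
    return float(var.sum(axis=-1).mean())

def flag_failure(pred_trajs, threshold):
    """Hypothetical rule: flag a possible prediction failure when uncertainty exceeds a threshold."""
    return ensemble_uncertainty(pred_trajs) > threshold

rng = np.random.default_rng(0)
base = np.cumsum(np.ones((12, 2)), axis=0)                 # straight-line future track
agree = base + 0.05 * rng.standard_normal((5, 12, 2))      # 5 members, small spread
disagree = base + 2.0 * rng.standard_normal((5, 12, 2))    # 5 members, large spread
print(ensemble_uncertainty(agree), flag_failure(agree, threshold=0.5))
print(ensemble_uncertainty(disagree), flag_failure(disagree, threshold=0.5))
```

In practice the threshold would be calibrated on held-out data, for example by trading off detected failures against false alarms.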
NA · Nov 18, 2017
A comparative study on nonlocal diffusion operators related to the fractional Laplacian
Siwei Duo, Hong Wang, Yanzhi Zhang
In this paper, we study four nonlocal diffusion operators: the fractional Laplacian, the spectral fractional Laplacian, the regional fractional Laplacian, and the peridynamic operator. These operators represent the infinitesimal generators of different stochastic processes, and their differences are especially significant on a bounded domain. We provide extensive numerical experiments to understand and compare these differences. We find that all four operators collapse to the classical Laplace operator as $\alpha \to 2$. The eigenvalues and eigenfunctions of the four operators are different, and the $k$-th (for $k \in \mathbb{N}$) eigenvalue of the spectral fractional Laplacian is always larger than those of the fractional Laplacian and the regional fractional Laplacian. For any $\alpha \in (0, 2)$, the peridynamic operator can provide a good approximation to the fractional Laplacian if the horizon size $\delta$ is sufficiently large. We find that the solution of the peridynamic model converges to that of the fractional Laplacian model at a rate of $O(\delta^{-\alpha})$. In contrast, although the regional fractional Laplacian can be used to approximate the fractional Laplacian as $\alpha \to 2$, it generally gives results inconsistent with those of the fractional Laplacian if $\alpha \ll 2$. Moreover, we make some conjectures based on our numerical results, which could contribute to the mathematical analysis of these operators.
NA · Mar 14, 2018
On Power Law Scaling Dynamics for Time-fractional Phase Field Models during Coarsening
Lizhen Chen, Jia Zhao, Hong Wang
In this paper, we study phase field models with fractional-order time derivatives. Phase field models have been widely used to study the coarsening dynamics of material systems with microstructures. It is known that phase field models are usually derived from energy variation, so they intrinsically obey certain energy dissipation laws. Recently, many works have investigated fractional-order phase field models, but little is known about the corresponding energy dissipation laws. We focus on time-fractional phase field models and report that the effective free energy and roughness obey universal power-law scaling dynamics during coarsening. Specifically, the effective free energy and roughness in the time-fractional phase field models scale according to a power law similar to that of the integer-order phase field models, where the power is linearly proportional to the fractional order. This universal scaling law is verified numerically for several phase field models, including Cahn-Hilliard equations with different variable mobilities and molecular beam epitaxy models. This finding sheds light on potential applications of time-fractional phase field models in studying coarsening dynamics and crystal growth.
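Power-law scaling of this kind is usually read off a log-log fit: if $E(t) \approx C t^{p}$, then $\log E = \log C + p \log t$, so the exponent $p$ is the least-squares slope. The sketch below exercises such a fit on synthetic, noise-free data generated with an assumed exponent $p = -\alpha/3$ and an assumed fractional order; these are not simulation results from the paper, and with real simulation output the fit window should be restricted to the coarsening regime.

```python
import numpy as np

def fit_power_law(t, values):
    """Least-squares fit of values ~ C * t**p on a log-log scale; returns (p, C)."""
    slope, intercept = np.polyfit(np.log(t), np.log(values), 1)
    return float(slope), float(np.exp(intercept))

alpha = 0.6                              # hypothetical fractional order
t = np.logspace(0, 3, 50)                # time window in arbitrary units
energy = 2.0 * t ** (-alpha / 3.0)       # synthetic, noise-free power law
p, C = fit_power_law(t, energy)
print(f"fitted exponent {p:.4f}, assumed exponent {-alpha / 3.0:.4f}, prefactor {C:.4f}")
```

Repeating the fit for several fractional orders and plotting the fitted exponent against $\alpha$ is one way to check a linear dependence of the power on the fractional order.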
NA · Mar 6, 2018
An Accurate and Efficient Algorithm for The Time-fractional Molecular Beam Epitaxy Model with Slope Selection
Lizhen Chen, Jia Zhao, Waixiang Cao et al.
In this paper, we propose a time-fractional molecular beam epitaxy (MBE) model with slope selection, together with an efficient, accurate, fully discrete, linear numerical approximation. The numerical scheme uses the fast algorithm for the Caputo fractional derivative operator in time discretization and the Fourier spectral method in spatial discretization. Refinement tests are conducted to verify the $(2-\alpha)$-order time convergence, where $\alpha \in (0, 1]$ is the fractional order of the derivative. Several numerical simulations are presented to demonstrate the accuracy and efficiency of the newly proposed scheme. By using the fast algorithm to calculate the Caputo fractional derivative, our numerical scheme makes long-time simulation of MBE coarsening practical, which is essential for applying the MBE model in practice. With the proposed fractional MBE model, we observe that, during coarsening dynamics with random initial conditions, the energy decays as $O(t^{-\alpha/3})$ and the roughness increases as $O(t^{\alpha/3})$. That is, the coarsening rate of the MBE model can be manipulated by the fractional order $\alpha$, and it is linearly proportional to $\alpha$. This is the first time such a scaling correlation has been reported in the literature. It provides a potential application field for fractional differential equations. Besides, the numerical approximation strategy proposed in this paper can be readily applied to study many classes of time-fractional and high-dimensional phase field models.
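The $(2-\alpha)$-order refinement test can be reproduced in miniature with the classical L1 discretization of the Caputo derivative, $D^\alpha u(t_n) \approx \frac{\Delta t^{-\alpha}}{\Gamma(2-\alpha)} \sum_{k=0}^{n-1} b_k \,[u(t_{n-k}) - u(t_{n-k-1})]$ with $b_k = (k+1)^{1-\alpha} - k^{1-\alpha}$. The sketch below uses this direct L1 sum, not the fast algorithm used in the paper, and the test function $u(t) = t^2$, whose Caputo derivative is $2t^{2-\alpha}/\Gamma(3-\alpha)$, is an illustrative assumption.

```python
import math
import numpy as np

def caputo_l1(u_vals, dt, alpha):
    """L1 approximation of the Caputo derivative of order alpha in (0, 1) at the last grid point.
    u_vals: samples u(t_0), ..., u(t_n) on a uniform grid with spacing dt."""
    n = len(u_vals) - 1
    k = np.arange(n)
    b = (k + 1.0) ** (1.0 - alpha) - k ** (1.0 - alpha)    # L1 weights b_k
    increments = u_vals[n - k] - u_vals[n - k - 1]          # u(t_{n-k}) - u(t_{n-k-1})
    return dt ** (-alpha) / math.gamma(2.0 - alpha) * np.sum(b * increments)

def l1_error(n, alpha, T=1.0):
    """Absolute error of the L1 scheme at t = T for u(t) = t^2."""
    t = np.linspace(0.0, T, n + 1)
    exact = 2.0 * T ** (2.0 - alpha) / math.gamma(3.0 - alpha)
    return abs(caputo_l1(t ** 2, T / n, alpha) - exact)

alpha = 0.5
e_coarse, e_fine = l1_error(200, alpha), l1_error(400, alpha)
observed_order = math.log2(e_coarse / e_fine)
print(f"observed order {observed_order:.3f}, expected about {2 - alpha}")
```

Each evaluation sums over the whole history, so a time-stepping loop costs $O(n^2)$ overall; removing that cost for long-time coarsening runs is the purpose of the fast algorithm.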
CV · Feb 28, 2023
Interactive Segmentation as Gaussian Process Classification
Minghao Zhou, Hong Wang, Qian Zhao et al.
Click-based interactive segmentation (IS) aims to extract target objects under user interaction. For this task, most current deep learning (DL)-based methods follow the general pipelines of semantic segmentation. Although they achieve promising performance, they do not fully and explicitly utilize and propagate the click information, which inevitably leads to unsatisfactory segmentation results, even at clicked points. To address this issue, in this paper we formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image. To solve this model, we use amortized variational inference to approximate the intractable GP posterior in a data-driven manner and then decouple the approximated GP posterior into double space forms for efficient sampling with linear complexity. We then construct a corresponding GP classification framework, named GPCIS, which is integrated with the deep kernel learning mechanism for more flexibility. The main features of the proposed GPCIS are: 1) under the explicit guidance of the derived GP posterior, the information contained in clicks can be finely propagated to the entire image to boost segmentation; 2) the accuracy of predictions at clicks has good theoretical support. These merits of GPCIS, as well as its good generality and high efficiency, are substantiated by comprehensive experiments on several benchmarks, compared with representative methods both quantitatively and qualitatively.
CV · Sep 30, 2023
MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings
Lei Yang, Jiaxin Yu, Xinyu Zhang et al.
Although most recent autonomous driving systems concentrate on developing perception methods based on ego-vehicle sensors, an overlooked alternative is to leverage intelligent roadside cameras to extend the ego vehicle's perception beyond its visual range. We observe that most existing monocular 3D object detectors rely on the ego-vehicle prior assumption that the optical axis of the camera is parallel to the ground. However, roadside cameras are mounted on poles at a pitch angle, which makes existing methods suboptimal for roadside scenes. In this paper, we introduce a novel framework for Roadside Monocular 3D object detection with ground-aware embeddings, named MonoGAE. Because roadside cameras are installed at fixed positions, the ground plane provides stable and strong prior knowledge. To reduce the domain gap between the ground geometry information and high-dimensional image features, we employ a supervised training paradigm with a ground plane to predict high-dimensional ground-aware embeddings. These embeddings are subsequently integrated with image features through cross-attention mechanisms. Furthermore, to improve the detector's robustness to variations in camera installation poses, we replace the ground plane depth map with a novel pixel-level refined ground plane equation map. Our approach demonstrates a substantial performance advantage over all previous monocular 3D object detectors on widely recognized 3D detection benchmarks for roadside cameras. The code and pre-trained models will be released soon.
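The "ground plane depth map" mentioned above can be computed per pixel by intersecting each viewing ray with the ground plane. The sketch below assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a plane written as $n \cdot X + d = 0$ in camera coordinates. The function names are hypothetical, and MonoGAE's refined equation map is not reproduced.

```python
def ground_depth(u, v, fx, fy, cx, cy, normal, offset):
    """Depth z of the ground point imaged at pixel (u, v), or None if the ray misses the ground.

    Camera coordinates: x right, y down, z forward. The ground plane is
    normal . X + offset = 0, and the intrinsics (fx, fy, cx, cy) follow a pinhole model.
    """
    # Back-project the pixel to a viewing ray X = z * r, normalized so that r_z = 1
    r = ((u - cx) / fx, (v - cy) / fy, 1.0)
    denom = normal[0] * r[0] + normal[1] * r[1] + normal[2] * r[2]
    if abs(denom) < 1e-12:
        return None  # ray parallel to the ground plane
    z = -offset / denom
    return z if z > 0.0 else None  # non-positive depth: intersection not in front of the camera

def ground_depth_map(width, height, fx, fy, cx, cy, normal, offset):
    """Per-pixel ground depth for a width x height image (rows indexed by v)."""
    return [[ground_depth(u, v, fx, fy, cx, cy, normal, offset) for u in range(width)]
            for v in range(height)]
```

For a level camera 1.5 m above flat ground, the plane is `normal=(0, 1, 0)`, `offset=-1.5`, and pixels above the horizon return `None`. A pitched roadside camera only changes `normal` and `offset`. Small errors in that estimated pose shift the whole depth map, which is the sensitivity the abstract's equation map is meant to reduce.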
AIMay 28
Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents
Ziyan Liu, Zhezheng Hao, Yeqiu Chen et al.
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajectories into compact memory. However, existing approaches typically train these memory policies with outcome-based reinforcement learning and therefore fail to localize where intermediate memory quality degrades. As interactions unfold, ambiguous recursive summaries progressively discard task-relevant information and introduce semantic noise. This exacerbates belief deviation, obscuring the agent's estimate of the latent task state and ultimately derailing long-horizon reasoning. We therefore argue that memory optimization should focus not merely on trajectory-level success, but also on the clarity of the belief induced by intermediate summaries. To this end, we introduce Belief Entropy, a self-supervised proxy that probes how uncertain the model remains about the latent task state given its current memory. Based on this proxy, we propose Metacognitive Memory Policy Optimization (MMPO). Instead of relying only on sparse outcome-based signals, MMPO provides fine-grained, memory-specific supervision by explicitly penalizing summaries that induce high epistemic uncertainty. Experiments show that MMPO consistently outperforms existing methods on diverse long-horizon tasks, maintaining 97.1% performance even when scaled to 1.75M-token contexts.
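As a rough illustration of an entropy-based penalty on summaries, the sketch below assumes the model exposes logits over a small discrete set of candidate latent task states after each summary. It also assumes the penalty is subtracted from the outcome reward with a weight `beta`. Both assumptions, and the helper names, are hypothetical. The abstract does not specify how MMPO computes or combines Belief Entropy.

```python
import math

def belief_entropy(logits):
    """Shannon entropy (nats) of softmax(logits) over hypothetical latent task states."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def shaped_reward(outcome_reward, summary_logits, beta=0.1):
    """Outcome reward minus beta times the mean belief entropy across intermediate summaries."""
    penalty = sum(belief_entropy(l) for l in summary_logits) / len(summary_logits)
    return outcome_reward - beta * penalty
```

A summary that leaves the model uniform over four states costs $\ln 4$ nats, while one that pins down a single state costs almost nothing. The penalty therefore rewards summaries that keep task-relevant information even when the final outcome is the same.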
MAMay 28
Evolve as a Team: Collaborative Self-Evolution for LLM-based Multi-Agent Systems
Zhezheng Hao, Tianfu Wang, Huanshuo Dong et al.
LLM-based multi-agent systems (MAS) have emerged as an effective paradigm for complex and long-horizon tasks. However, in real-world tasks, MAS often exhibit various failures during execution, and such failures are difficult to eliminate at design time. This motivates experience-driven MAS evolution, in which a system improves based on its own execution experience. Yet such evolution is challenging because MAS experience is lengthy and intricate, interleaving multiple agents' execution chains and communication messages, which makes it difficult to identify what should be improved. To address this challenge, we propose Meta-Team, an experience-driven MAS evolution framework based on collaborative self-evolution. Meta-Team preserves the execution context of each agent and coordinates post-task communication, enabling agents to exchange distributed evidence for evolution. Building on this design, Meta-Team conducts multi-scale self-evolution, transforming execution experience into reusable improvements to agent behaviors, inter-agent coordination, and team-level organization. Across six long-horizon agent benchmarks, Meta-Team consistently outperforms single-agent systems, hand-crafted MAS, and prior MAS evolution methods; further analyses demonstrate that Meta-Team enables more reliable and scalable MAS self-evolution.
CVSep 25, 2023
Recursive Counterfactual Deconfounding for Object Recognition
Jiayin Sun, Hong Wang, Qiulei Dong
Image recognition is a classic computer vision task that has been widely applied over the past decade. Most existing methods in the literature aim to learn discriminative features from labeled images for classification; however, they generally neglect confounders that infiltrate the learned features, resulting in poor performance when discriminating test images. To address this problem, we propose a Recursive Counterfactual Deconfounding model, called RCD, for object recognition in both closed-set and open-set scenarios based on counterfactual analysis. The proposed model consists of a factual graph and a counterfactual graph, in which the relationships among image features, model predictions, and confounders are built and updated recursively to learn more discriminative features. Because it operates recursively, subtler counterfactual features can be learned and eliminated progressively, and both the discriminability and the generalization of the proposed model improve accordingly. In addition, a negative correlation constraint is designed to further alleviate the negative effects of the counterfactual features during training. Extensive experimental results on both closed-set and open-set recognition tasks demonstrate that the proposed RCD model significantly outperforms 11 state-of-the-art baselines in most cases.
NANov 7, 2017
The finite steps of convergence of the fast thresholding algorithms with feedbacks
Ningning Han, Shidong Li, Zhanjie Song et al.
Iterative algorithms based on thresholding, feedback and null space tuning (NST+HT+FB) for sparse signal recovery are exceedingly effective and fast, particularly for large-scale problems. The core algorithm is shown to converge in finitely many steps under a (preconditioned) restricted isometry condition. In this paper, we present a new perspective for analyzing the algorithm, which shows that its efficiency can be further characterized by an estimate of the number of iterations required for guaranteed convergence. The convergence condition of NST+HT+FB is also improved. Moreover, an adaptive scheme (AdptNST+HT+FB) that does not require knowledge of the sparsity level is proposed, together with its convergence guarantee. The number of iterations required for finite-step convergence of the AdptNST+HT+FB scheme is also derived. It is further shown that the number of iterations can be significantly reduced by exploiting the structure of the specific sparse signal or of the random measurement matrix.
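The interplay of hard thresholding and null space tuning can be sketched as follows. Each iteration keeps the $s$ largest entries, then projects back onto the affine set $\{x : Ax = y\}$. This is a simplified illustration in the spirit of NST+HT: it omits the feedback (FB) step entirely and assumes $A$ has orthonormal rows, so $A^\top$ acts as its pseudo-inverse. All names are hypothetical.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out

def matvec(A, x):
    """A @ x for a dense list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """A^T @ r for a dense list-of-rows matrix."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def nst_ht(A, y, s, iters=50):
    """Simplified null-space-tuning + hard-thresholding loop (no feedback step).

    Assumes A has orthonormal rows (A A^T = I), so x = u + A^T (y - A u) satisfies A x = y.
    """
    x = rmatvec(A, y)  # minimum-norm solution of A x = y
    for _ in range(iters):
        u = hard_threshold(x, s)                               # sparsify
        residual = [yi - ai for yi, ai in zip(y, matvec(A, u))]
        x = [ui + ci for ui, ci in zip(u, rmatvec(A, residual))]  # tune along the null space
    return x
```

The projection step changes `u` only along directions that keep the measurements consistent. Every iterate is therefore feasible, and thresholding pushes it toward an $s$-sparse solution. The paper's finite-step guarantees depend on restricted isometry conditions that this toy loop does not check.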
NAOct 29, 2018
Spectral approximation of a variable coefficient fractional diffusion equation in one space dimension
Xiangcheng Zheng, V. J. Ervin, Hong Wang
In this article we consider the approximation of a variable coefficient (two-sided) fractional diffusion equation (FDE) with unknown $u$. By introducing an intermediate unknown, $q$, the variable coefficient FDE is rewritten as a lower order, constant coefficient FDE. A spectral scheme using Jacobi polynomials is presented to compute an approximation $q_{N}$ of $q$. The approximate solution $u_{N}$ of $u$ is obtained by post-processing $q_{N}$. An a priori error analysis is given for $(q \, - \, q_{N})$ and $(u \, - \, u_{N})$. Two numerical experiments are presented whose results demonstrate the sharpness of the derived error estimates.
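Spectral schemes of this kind expand the unknown in Jacobi polynomials $P_n^{(a,b)}$. The sketch below evaluates them with the standard three-term recurrence. The particular weights $(a,b)$ the paper ties to the fractional order are not given in the abstract, so they are left as parameters, and `jacobi` is a hypothetical helper name.

```python
def jacobi(n, a, b, x):
    """Evaluate the Jacobi polynomial P_n^{(a,b)}(x), a, b > -1, by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev = 1.0
    p = (a + 1.0) + (a + b + 2.0) * (x - 1.0) / 2.0  # P_1^{(a,b)}
    for k in range(2, n + 1):
        c = 2 * k + a + b
        a1 = 2 * k * (k + a + b) * (c - 2)
        a2 = (c - 1) * (c * (c - 2) * x + a * a - b * b)
        a3 = 2 * (k + a - 1) * (k + b - 1) * c
        p_prev, p = p, (a2 * p - a3 * p_prev) / a1
    return p
```

With $a=b=0$ this reduces to the Legendre polynomials, for example $P_2(0.5) = -0.125$. The endpoint value $P_n^{(a,b)}(1) = \binom{n+a}{n}$ is a quick sanity check. The recurrence is preferred over explicit coefficient formulas because it is stable and costs $O(n)$ per point.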
NANov 1, 2018
Wellposedness of the two-sided variable coefficient Caputo flux fractional diffusion equation and error estimate of its spectral approximation
Xiangcheng Zheng, V. J. Ervin, Hong Wang
In this article a two-sided variable coefficient fractional diffusion equation (FDE) is investigated, where the variable coefficient occurs outside the fractional integral operator. Under a suitable transformation, the variable coefficient equation is transformed into a constant coefficient equation. Then, using a spectral decomposition approach with Jacobi polynomials, we prove the well-posedness of the model and the regularity of its solution. A spectral approximation scheme is proposed and its accuracy is analyzed. Two numerical experiments are presented to demonstrate the derived error estimates.
SYJun 12, 2023
Evolving Testing Scenario Generation Method and Intelligence Evaluation Framework for Automated Vehicles
Yining Ma, Wei Jiang, Lingtong Zhang et al.
Interaction between background vehicles (BVs) and automated vehicles (AVs) in scenario-based testing plays a critical role in evaluating the intelligence of AVs. Current testing scenarios typically employ predefined or scripted BVs, which inadequately reflect the complexity of human-like social behaviors in real-world driving and lack a systematic metric for evaluating the comprehensive intelligence of AVs. Therefore, this paper proposes an evolving scenario generation method that uses deep reinforcement learning (DRL) to create human-like BVs for testing and intelligence evaluation of AVs. First, a class of driver models with human-like competitive, cooperative, and mutual driving motivations is designed. Then, using an improved "level-k" training procedure, the three distinct driver models acquire game-based interactive driving policies. These models are then assigned to BVs to generate evolving scenarios in which all BVs can interact continuously and evolve diverse content. Next, a framework covering safety, driving efficiency, and interaction utility is presented to evaluate and quantify the intelligence performance of three systems under test (SUTs), indicating the effectiveness of the evolving scenario for intelligence testing. Finally, the complexity and fidelity of the proposed evolving testing scenario are validated. The results demonstrate that the proposed evolving scenario exhibits the highest level of complexity compared with other baseline scenarios and has more than 85% similarity to naturalistic driving data. This highlights the potential of the proposed method to facilitate the development and evaluation of high-level AVs in a realistic and challenging environment.
NISep 3, 2024
When Digital Twin Meets 6G: Concepts, Obstacles, and Research Prospects
Wenshuai Liu, Yaru Fu, Zheng Shi et al.
The convergence of digital twin technology and the emerging 6G network presents both challenges and numerous research opportunities. This article explores the potential synergies between digital twin and 6G, highlighting the key challenges and proposing fundamental principles for their integration. We discuss the unique requirements and capabilities of digital twin in the context of 6G networks, such as sustainable deployment, real-time synchronization, seamless migration, predictive analytics, and closed-loop control. Furthermore, we identify research opportunities for leveraging digital twin and artificial intelligence to enhance various aspects of 6G, including network optimization, resource allocation, security, and intelligent service provisioning. This article aims to stimulate further research and innovation at the intersection of digital twin and 6G, paving the way for transformative applications and services in the future.
NAFeb 26, 2019
Numerical approximations for the variable coefficient fractional diffusion equations with non-smooth data
Xiangcheng Zheng, V. J. Ervin, Hong Wang
In this article we study the numerical approximation of a variable coefficient fractional diffusion equation. Using a change of variable, the variable coefficient fractional diffusion equation is transformed into a constant coefficient fractional diffusion equation of the same order. The transformed equation retains the desirable stability property of being an elliptic equation. A spectral approximation scheme is proposed and analyzed for the transformed equation, and error estimates for the approximate solution are derived. An approximation to the unknown of the variable coefficient fractional diffusion equation is then obtained by post-processing the computed approximation to the transformed equation. Error estimates are also presented for the approximation to the unknown of the variable coefficient equation with both smooth and non-smooth diffusivity coefficients and right-hand sides. Three numerical experiments are given whose convergence results are in strong agreement with the theoretically derived estimates.
AINov 8, 2022
SOTIF Entropy: Online SOTIF Risk Quantification and Mitigation for Autonomous Driving
Liang Peng, Boqi Li, Wenhao Yu et al.
Autonomous driving faces great challenges in complex traffic scenarios, where Safety of the Intended Functionality (SOTIF) risk can be triggered by the dynamic operational environment and by system insufficiencies. SOTIF risk is reflected not only externally, in the collision risk with objects outside the autonomous vehicle (AV), but also inherently, in the performance limitation risk of the implemented algorithms themselves. How to minimize SOTIF risk for autonomous driving is currently a critical, difficult, and unresolved issue. Therefore, this paper proposes the "Self-Surveillance and Self-Adaption System" as a systematic approach to minimizing SOTIF risk online, providing a unified solution for monitoring, quantifying, and mitigating both inherent and external risks. The core of this system is the risk monitoring of the artificial intelligence algorithms implemented within the AV. As a demonstration of the Self-Surveillance and Self-Adaption System, the risk monitoring of the perception algorithm, i.e., YOLOv5, is highlighted. Moreover, the inherent perception algorithm risk and the external collision risk are jointly quantified via SOTIF entropy, which is then propagated downstream to the decision-making module and mitigated. Finally, several challenging scenarios are demonstrated, and Hardware-in-the-Loop experiments are conducted to verify the efficiency and effectiveness of the system. The results demonstrate that the Self-Surveillance and Self-Adaption System enables dependable online monitoring, quantification, and mitigation of SOTIF risk in real-time critical traffic environments.
IVJun 5, 2023
Cross-Modal Vertical Federated Learning for MRI Reconstruction
Yunlu Yan, Hong Wang, Yawen Huang et al.
Federated learning enables multiple hospitals to cooperatively learn a shared model without disclosing private data. Existing methods often assume that the data from different hospitals share the same modalities. However, this setting is difficult to fully satisfy in practical applications, since imaging guidelines may differ between hospitals, which limits the number of individuals with the same set of modalities. To this end, we formulate the practical yet challenging task of cross-modal vertical federated learning, in which the data from multiple hospitals have different modalities, with a small amount of multi-modality data collected from the same individuals. To tackle this setting, we develop a novel framework, namely Federated Consistent Regularization constrained Feature Disentanglement (Fed-CRFD), which boosts MRI reconstruction by effectively exploiting the overlapping samples (individuals with multiple modalities) and solving the domain shift problem caused by different modalities. In particular, Fed-CRFD involves an intra-client feature disentanglement scheme that decouples data into modality-invariant and modality-specific features, where the modality-invariant features are leveraged to mitigate the domain shift problem. In addition, a cross-client latent representation consistency constraint is proposed specifically for the overlapping samples to further align the modality-invariant features extracted from different modalities. Hence, our method can fully exploit the multi-source data from hospitals while alleviating the domain shift problem. Extensive experiments on two typical MRI datasets demonstrate that our network clearly outperforms state-of-the-art MRI reconstruction methods. The source code will be publicly released upon the publication of this work.
NAJun 15, 2016
Wellposedness and regularity of steady-state two-sided variable-coefficient conservative space-fractional diffusion equations
Danping Yang, Hong Wang
We study the Dirichlet boundary-value problem of steady-state two-sided variable-coefficient conservative space-fractional diffusion equations. We show that the Galerkin weak formulation, which was proved to be coercive and continuous for a constant-coefficient analogue of the problem, loses its coercivity. We characterize the solution to the variable-coefficient problem in terms of the solutions of second-order diffusion equations along with a two-sided fractional integral equation. We then derive a Petrov-Galerkin formulation for this problem and prove that the weak formulation is weakly coercive, so the problem is well posed. Finally, we prove high-order regularity estimates of the true solution in a properly chosen norm of Riemann-Liouville derivatives.
IVSep 27, 2023
RSF-Conv: Rotation-and-Scale Equivariant Fourier Parameterized Convolution for Retinal Vessel Segmentation
Zihong Sun, Hong Wang, Qi Xie et al.
Retinal vessel segmentation is of great clinical significance for the diagnosis of many eye-related diseases, but it remains a formidable challenge due to the intricate vascular morphology. By effectively characterizing the translation symmetry present in retinal vessels, convolutional neural networks (CNNs) have achieved great success in retinal vessel segmentation. However, the rotation-and-scale symmetry, a more widespread image prior in retinal vessels, cannot be characterized by CNNs. Therefore, we propose a rotation-and-scale equivariant Fourier parameterized convolution (RSF-Conv) specifically for retinal vessel segmentation, and provide the corresponding equivariance analysis. As a general module, RSF-Conv can be integrated into existing networks in a plug-and-play manner while significantly reducing the number of parameters. For instance, we replace the traditional convolution filters in U-Net and Iter-Net with RSF-Convs and conduct comprehensive experiments. RSF-Conv+U-Net and RSF-Conv+Iter-Net not only have slight advantages under in-domain evaluation but, more importantly, outperform all comparison methods by a significant margin under out-of-domain evaluation. This indicates the remarkable generalization of RSF-Conv, which is of great practical significance for the prevalent cross-device and cross-hospital challenges in clinical practice. To comprehensively demonstrate the effectiveness of RSF-Conv, we also apply RSF-Conv+U-Net and RSF-Conv+Iter-Net to retinal artery/vein classification and achieve promising performance as well, indicating its clinical application potential.
CVMay 11Code
DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
Lingjun Zhang, Changjie Wu, Linzhe Shi et al.
End-to-end autonomous driving systems are increasingly integrating Vision-Language Model (VLM) architectures, incorporating text reasoning or visual reasoning to enhance the robustness and accuracy of driving decisions. However, the reasoning mechanisms employed in most methods are adapted directly from general domains and lack in-depth exploration tailored to autonomous driving scenarios, particularly within visual reasoning modules. In this paper, we propose a driving world model that performs parallel prediction of latent semantic features for consecutive future frames in the bird's-eye-view (BEV) space, thereby enabling long-horizon modeling of future world states. We also introduce an efficient and adaptive text reasoning mechanism that uses additional social knowledge and reasoning capabilities to further improve driving performance in challenging long-tail scenarios. The resulting approach is novel, efficient, and effective, and achieves state-of-the-art (SOTA) results on the closed-loop Bench2drive benchmark. Code is available at: https://github.com/hotdogcheesewhite/DeepSight.
CVJul 13, 2022
Context-Consistent Semantic Image Editing with Style-Preserved Modulation
Wuyang Luo, Su Yang, Hong Wang et al.
Semantic image editing uses local semantic label maps to generate the desired content in the edited region. A recent work borrows the SPADE block to achieve semantic image editing. However, it cannot produce pleasing results due to the style discrepancy between the edited region and the surrounding pixels. We attribute this to the fact that SPADE only uses an image-independent local semantic layout and ignores the image-specific styles contained in the known pixels. To address this issue, we propose a style-preserved modulation (SPM) comprising two modulation processes: the first modulation incorporates the contextual style and semantic layout and generates two fused modulation parameters; the second modulation employs the fused parameters to modulate feature maps. With these two modulations, SPM can inject the given semantic layout while preserving the image-specific context style. Moreover, we design a progressive architecture that generates the edited content in a coarse-to-fine manner. The proposed method obtains context-consistent results and significantly alleviates the unpleasant boundary between the generated regions and the known pixels.
CVJan 20, 2023
Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation
Chunwei Wu, Guitao Cao, Yan Li et al.
Source-free domain adaptation (SFDA), in which only a pre-trained source model is used to adapt to the target distribution, is a more general approach to achieving domain adaptation in the real world. However, it can be challenging to capture the inherent structure of the target features accurately because supervised information on the target domain is lacking. By analyzing the clustering performance of the target features, we show that they still contain core features related to discriminative attributes but lack the collation of semantic information. Inspired by this insight, we present Chaos to Order (CtO), a novel approach for SFDA that strives to constrain semantic credibility and propagate label information among target subpopulations. CtO divides the target data into inner and outlier samples based on an adaptive threshold of the learning state, customizing the learning strategy to best fit the data properties. Specifically, inner samples are used for learning intra-class structure thanks to their relatively well-clustered properties. The low-density outlier samples are regularized by input consistency to achieve high accuracy with respect to the ground-truth labels. By employing different learning strategies to propagate labels from the inner local samples to the outlier instances, CtO clusters the global samples from chaos to order. We further adaptively regulate the neighborhood affinity of the inner samples to constrain the local semantic credibility. In theoretical and empirical analyses, we demonstrate that our algorithm not only propagates labels from inner to outlier samples but also prevents local clustering from forming spurious clusters. Empirical evidence demonstrates that CtO outperforms the state of the art on three public benchmarks: Office-31, Office-Home, and VisDA.
NAFeb 11, 2019
Data-driven physics informed deep learning of solute transport with anomalous diffusion
Huan Liu, Hong Wang, Xiangcheng Zheng
The fractional advection-dispersion equation (FADE) has attracted increased attention from researchers because it provides an accurate description of challenging phenomena with long-range time memory and spatial interactions, such as the anomalous diffusion behavior of solute transport in porous media. In practice, fully characterizing the model parameters, such as the fluid velocity, the dispersion coefficient and the order of the fractional derivative, often requires a large number of experiments and measurements, so these parameters are hard to determine. In this paper, we employ the framework of feedforward deep neural networks (DNNs) to develop an efficient data-driven deep learning algorithm for inferring these parameters in FADEs such as the time-dependent space-fractional advection-dispersion equation (sFADE) and the variable-order fractional mobile/immobile equation (VoFMIE), in which the feedforward DNNs are trained to minimize a mean square error loss function formulated by means of finite difference approximations of sFADE and VoFMIE, respectively. Several numerical experiments, in which we discover the model parameters with the feedforward DNNs for both synthetic and field data, are presented to demonstrate the effectiveness and robustness of the proposed data-driven deep learning algorithm.
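A finite-difference residual for a space-fractional equation needs a discrete fractional derivative. One standard choice is the Grünwald-Letnikov (GL) formula sketched below. The abstract does not say which finite-difference scheme the paper uses, so this is an illustrative assumption, and the helper names are hypothetical. For advection-dispersion problems the shifted GL variant is usually preferred for numerical stability; the unshifted form is shown here for brevity.

```python
def gl_weights(alpha, count):
    """Grünwald-Letnikov weights g_k = (-1)^k * binom(alpha, k), for k = 0..count-1."""
    g = [1.0]
    for k in range(1, count):
        # recurrence g_k = g_{k-1} * (1 - (alpha + 1) / k) avoids computing binomials directly
        g.append(g[-1] * (1.0 - (alpha + 1.0) / k))
    return g

def gl_left_derivative(u, h, alpha, i):
    """Unshifted GL approximation of the left fractional derivative of order alpha at grid
    point i: h^(-alpha) * sum_{k=0}^{i} g_k * u[i-k], with grid spacing h."""
    g = gl_weights(alpha, i + 1)
    return sum(g[k] * u[i - k] for k in range(i + 1)) / h ** alpha
```

For integer orders the weights collapse to ordinary differences: $α=1$ gives $[1,-1,0,\dots]$ (a backward difference) and $α=2$ gives $[1,-2,1,0,\dots]$ (a second difference). That makes a convenient sanity check. In a parameter-inference loss, such a stencil would be applied to the network's predicted concentrations, and the squared residual averaged over grid points.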
PRFeb 5, 2019
Stable Lévy diffusion and related model fitting
Paramita Chakraborty, Xu Guo, Hong Wang
A fractional advection-dispersion equation (fADE) has been advocated for heavy-tailed flows where the usual Brownian diffusion models fail. A stochastic differential equation (SDE) driven by a stable Lévy process gives a forward equation that matches the space-fractional advection-dispersion equation, and thus provides the stochastic framework of particle tracking for heavy-tailed flows. For constant advection and dispersion coefficient functions, the solution to such an SDE is itself a stable process and can be derived easily by least-squares parameter fitting from the observed flow concentration data. However, in a more general scenario, a closed form for the solution to a stable SDE may not exist. We propose a numerical method for solving/generating a stable SDE in a general setting. The method combines a discretized finite volume scheme with characteristic lines to solve the fADE, i.e., the forward equation for the Markov process that solves the stable SDE. We then use a numerical scheme to generate the solution to the governing SDE from the fADE solution. Moreover, the functional form of the advection or dispersion coefficients is often not known in advance for given plume concentration data. We use a Levenberg--Marquardt (L-M) regularization method to estimate the advection and dispersion coefficient functions from the observed data (we present the case of linear advection) and then proceed with the SDE solution construction described above.
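For the constant-coefficient case mentioned above, particle paths of $dX_t = v\,dt + σ\,dL_t^{α}$ can be generated directly by a simple Euler-type scheme. That scheme is a standard alternative shown for illustration, not the paper's finite-volume and characteristic-line method. It draws symmetric $α$-stable increments with the Chambers-Mallows-Stuck formula and scales them by $Δt^{1/α}$. Names, the seed and the symmetric (skewness-free) case are assumptions.

```python
import math
import random

def symmetric_stable(alpha, rng):
    """Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable variable
    (characteristic function exp(-|t|^alpha)), 0 < alpha <= 2."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)  # Cauchy case
    return (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
            * (math.cos(v - alpha * v) / w) ** ((1.0 - alpha) / alpha))

def euler_stable_path(x0, v, sigma, alpha, t_end, n_steps, seed=0):
    """Euler scheme for dX = v dt + sigma dL (constant coefficients, symmetric alpha-stable L)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # alpha-stable increments over a step of length dt scale like dt^(1/alpha)
        x += v * dt + sigma * dt ** (1.0 / alpha) * symmetric_stable(alpha, rng)
        path.append(x)
    return path
```

At $α=2$ the sampler reduces to $2\sin V\sqrt{W}$, a Gaussian with variance 2, recovering Brownian motion. For $α<2$ occasional large jumps appear, which produce the heavy-tailed plumes the fADE describes. A histogram of many endpoints approximates the fADE solution at `t_end`.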
APJan 30, 2018
Fast procedures for Caputo fractional derivative and its applications to ordinary and partial differential equations
Zhengguang Liu, Aijie Cheng, Xiaoli Li et al.
In this paper, we develop fast procedures for solving linear systems arising from the discretization of ordinary and partial differential equations with a Caputo fractional derivative with respect to the time variable. First, we consider a finite difference scheme for solving a two-sided fractional ordinary differential equation. Furthermore, we present a fast solution technique to accelerate the Toeplitz matrix-vector multiplications arising from the finite difference discretization. This fast solution technique is based on the fast Fourier transform and depends on the special structure of the coefficient matrices; it reduces the computational work from the $O(N^{3})$ required by traditional methods to $O(N\log^{2}N)$, and the memory requirement from $O(N^{2})$ to $O(N)$, without using any lossy compression, where $N$ is the number of unknowns. Two finite difference schemes for solving time-fractional hyperbolic equations with different fractional orders $γ$ are considered. We present a fast solution technique that depends on the special structure of the coefficient matrices and rearranges the order of the unknowns. It reduces the computational work from the $O(N^2M)$ required by traditional methods to $O(N\log^{2}N)$, and the memory requirement from $O(NM)$ to $O(N)$, without using any lossy compression, where $N=τ^{-1}$ with $τ$ the time step size and $M=h^{-1}$ with $h$ the space step size. Importantly, a fast method is employed to solve the classical time-fractional diffusion equation at a lower cost of $O(MN\log^{2}N)$, whereas the direct method requires an overall computational complexity of $O(N^2M)$. Moreover, the applicability and accuracy of the scheme are demonstrated by numerical experiments that support our theoretical analysis.
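The FFT-based Toeplitz matrix-vector product mentioned above is usually built on the standard circulant-embedding trick. The $N\times N$ Toeplitz matrix is embedded in a circulant matrix of size at least $2N$. That circulant is diagonalized by the DFT, so one product costs $O(N\log N)$. The sketch below shows only this building block with a plain radix-2 FFT. How the paper assembles it into its overall $O(N\log^{2}N)$ solver is not reproduced, and the function names are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def toeplitz_matvec(col, row, x):
    """Compute T @ x, where T[i][j] = col[i-j] for i >= j and row[j-i] for j > i (col[0] == row[0])."""
    n = len(x)
    size = 1
    while size < 2 * n:  # circulant size: a power of two >= 2n
        size *= 2
    # First column of a circulant matrix whose leading n x n block equals T:
    # [t_0, ..., t_{n-1}, zeros, t_{-(n-1)}, ..., t_{-1}]
    c = list(col) + [0.0] * (size - 2 * n + 1) + list(reversed(row[1:]))
    xp = list(x) + [0.0] * (size - n)
    # Circulant matvec = circular convolution = inverse DFT of the pointwise product of DFTs
    prod = fft([a * b for a, b in zip(fft(c), fft(xp))], invert=True)
    return [v.real / size for v in prod[:n]]
```

Only the first column and row of $T$ are stored, which gives the $O(N)$ memory footprint noted in the abstract. Iterative solvers such as conjugate gradients need nothing beyond such products, so this routine can replace dense multiplication inside them.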
LGApr 26, 2022
Designing thermal radiation metamaterials via hybrid adversarial autoencoder and Bayesian optimization
Dezhao Zhu, Jiang Guo, Gang Yu et al.
Designing thermal radiation metamaterials is challenging, especially for problems with high degrees of freedom and complex objectives. In this letter, we develop a hybrid materials informatics approach that combines an adversarial autoencoder with Bayesian optimization to design narrowband thermal emitters at different target wavelengths. With only several hundred training data sets, new structures with optimal properties can be quickly identified in a compressed 2-dimensional latent space. This enables optimal design while evaluating far fewer than 0.001\% of all candidate structures, which greatly reduces the design period and cost. The proposed design framework can be easily extended to the design of other thermal radiation metamaterials with higher-dimensional features.
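In Bayesian optimization, the next latent point to evaluate is chosen by maximizing an acquisition function over a surrogate model's predictions. The abstract does not name the acquisition function. Expected improvement (EI) for maximization, in its standard closed form, is shown as a common choice. The surrogate mean `mu` and standard deviation `sigma` at a 2-D latent point are assumed to come from some probabilistic model such as a Gaussian process, and the function name is hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (maximization) for a Gaussian prediction N(mu, sigma^2).

    xi >= 0 is an optional exploration margin.
    """
    if sigma <= 0.0:
        return max(mu - best - xi, 0.0)  # deterministic prediction
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    return (mu - best - xi) * cdf + sigma * pdf
```

EI grows with both a high predicted objective and high uncertainty, so it balances exploiting promising latent regions against exploring unexplored ones. Evaluating it over a grid of 2-D latent codes is cheap, and only the maximizer is decoded and simulated. That is how such a loop keeps the number of expensive evaluations small.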
CVJul 2, 2024
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling
Minghao Zhou, Hong Wang, Yefeng Zheng et al.
Feature upsampling is a fundamental and indispensable ingredient of almost all current network structures for dense prediction tasks. Recently, a popular similarity-based feature upsampling pipeline has been proposed, which uses a high-resolution (HR) feature as guidance to help upsample the low-resolution (LR) deep feature based on their local similarity. Although this pipeline achieves promising performance, it has specific limitations: 1) HR query and LR key features are not well aligned; 2) the similarity between query and key features is computed with a fixed inner-product form; 3) neighbor selection is coarsely performed on LR features, resulting in mosaic artifacts. These shortcomings make existing methods along this pipeline primarily applicable to hierarchical network architectures with iterative features as guidance, and they do not readily extend to a broader range of structures, especially for direct high-ratio upsampling. To address these issues, we carefully optimize every methodological design. Specifically, we first propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives, and then construct a parameterized paired central difference convolution block for flexibly calculating the similarity between the well-aligned query and key features. In addition, we develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts. Based on these careful designs, we systematically construct a refreshed similarity-based feature upsampling framework named ReSFU. Extensive experiments substantiate that ReSFU is readily applicable to various types of architectures in a direct high-ratio upsampling manner and consistently achieves satisfactory performance on different dense prediction applications, showing superior generality and ease of deployment.
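The generic similarity-based pipeline described above can be sketched in one dimension with scalar feature values. Each HR query attends to a small window of LR neighbors, scores them with a fixed inner product against their keys, and outputs the softmax-weighted average of the LR values. This deliberately reproduces the baseline's limitations listed above (fixed inner-product similarity, neighbors selected on the LR grid). It does not reproduce ReSFU's alignment, paired central difference convolution, or HR-side neighbor selection. All names are hypothetical.

```python
import math

def similarity_upsample(lr_values, lr_keys, hr_queries, ratio, radius=1):
    """Upsample a 1-D LR feature by softmax-weighting LR neighbors with query-key similarity.

    lr_values: LR feature values; lr_keys: one key vector per LR position;
    hr_queries: one query vector per HR position (len(lr_values) * ratio of them).
    """
    out = []
    for i, q in enumerate(hr_queries):
        center = i // ratio  # LR position underneath this HR position
        idx = [j for j in range(center - radius, center + radius + 1) if 0 <= j < len(lr_values)]
        scores = [sum(a * b for a, b in zip(q, lr_keys[j])) for j in idx]  # fixed inner product
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append(sum(wk / z * lr_values[j] for wk, j in zip(w, idx)))
    return out
```

Because the output is a convex combination of LR values, a constant LR feature stays constant. All detail must therefore come from how the weights change across HR positions. When neighbors are chosen on the coarse LR grid, every HR position under the same LR cell sees the same candidate set, which is one source of the blocky mosaic artifacts the abstract describes.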
CL · Jun 7, 2023
STEPS: A Benchmark for Order Reasoning in Sequential Tasks
Weizhi Wang, Hong Wang, Xifeng Yan
Various human activities can be abstracted into a sequence of actions in natural text, e.g., cooking, repairing, and manufacturing. Such action sequences depend heavily on execution order, and disorder in an action sequence leads to failure when robots or AI agents carry out the task. Therefore, to verify the order reasoning capability of current neural models in sequential tasks, we propose a challenging benchmark named STEPS. STEPS involves two subtask settings: determining whether a given next step in a recipe is reasonable, and selecting the reasonable step in a multiple-choice question. We describe the data construction and task formulations, and benchmark most of the significant large language models (LLMs). The experimental results demonstrate that 1) commonsense reasoning about action order in sequential tasks is challenging for LLMs to resolve via zero-shot prompting or few-shot in-context learning, and 2) prompting methods still lag significantly behind tuning-based methods on STEPS.
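The two STEPS subtasks can be framed as a binary judgement and a multiple-choice selection, each scored by accuracy. The sketch below is a hypothetical Python harness: the record fields, the `judge`/`choose` callables (standing in for a prompted or tuned LLM), and accuracy as the metric are assumptions, not the benchmark's released format.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class NextStepItem:
    """Subtask 1 (hypothetical format): is `candidate` a reasonable next step?"""
    context: List[str]  # steps taken so far
    candidate: str
    label: bool


@dataclass
class MultiChoiceItem:
    """Subtask 2 (hypothetical format): pick the reasonable step among options."""
    context: List[str]
    options: List[str]
    answer: int  # index of the correct option


def accuracy(preds: Sequence, golds: Sequence) -> float:
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def evaluate_next_step(
    items: List[NextStepItem], judge: Callable[[List[str], str], bool]
) -> float:
    preds = [judge(it.context, it.candidate) for it in items]
    return accuracy(preds, [it.label for it in items])


def evaluate_multi_choice(
    items: List[MultiChoiceItem], choose: Callable[[List[str], List[str]], int]
) -> float:
    preds = [choose(it.context, it.options) for it in items]
    return accuracy(preds, [it.answer for it in items])
```

A trivial baseline such as "always answer yes" or "always pick option 0" gives a useful floor against which zero-shot, few-shot, and tuned models can be compared.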
AI · May 21
ArborKV: Structure-Aware KV Cache Management for Scaling Tree-based LLM Reasoning
Yeqiu Chen, Ziyan Liu, Zhenxin Huang et al.
Recent progress in LLM reasoning has increasingly shifted from single-pass generation to explicit search over intermediate reasoning states. Tree-of-Thoughts (ToT) organizes inference as a tree-structured search with branching and backtracking, but this substantially amplifies the key-value (KV) cache: retaining KV states for a frontier of partial trajectories quickly becomes a memory bottleneck that limits throughput and constrains search depth and width under fixed hardware budgets. We address this challenge by observing that KV reuse in ToT-style inference is governed by search dynamics: near-term decoding depends primarily on the active branch and its ancestors, whereas inactive subtrees have a low short-term reuse probability yet must remain recoverable for backtracking. Motivated by this, we propose ArborKV, a structure-aware eviction framework that couples a lightweight value estimator with a tree-aware allocation policy and performs purely token-extractive eviction with lazy rehydration to support revisits. Experiments on ToT-style reasoning benchmarks show that ArborKV reduces peak KV memory by up to ~4x while preserving accuracy close to that of full retention, enabling larger search configurations that would otherwise run out of memory under fixed device budgets.
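The structure-aware idea (protect the active branch and its ancestors, evict inactive subtrees but keep them recoverable) can be sketched as a small tree-indexed cache. This is a hypothetical illustration, not ArborKV: it evicts whole nodes rather than individual tokens, takes each node's `value` as given instead of estimating it, and models rehydration as a flag flip where a real system would recompute (re-prefill) the evicted KV entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    node_id: int
    parent: Optional[int]
    num_tokens: int         # KV entries held by this reasoning step
    value: float            # score from some value estimator (assumed given)
    resident: bool = True   # is this node's KV currently in device memory?


class TreeKVCache:
    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.nodes: Dict[int, Node] = {}

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def path_to_root(self, node_id: int) -> Set[int]:
        path, cur = set(), node_id
        while cur is not None:
            path.add(cur)
            cur = self.nodes[cur].parent
        return path

    def resident_tokens(self) -> int:
        return sum(n.num_tokens for n in self.nodes.values() if n.resident)

    def evict(self, active_leaf: int) -> List[int]:
        """Evict lowest-value nodes off the active path until within budget."""
        protected = self.path_to_root(active_leaf)
        candidates = sorted(
            (n for n in self.nodes.values()
             if n.resident and n.node_id not in protected),
            key=lambda n: n.value,
        )
        evicted = []
        for n in candidates:
            if self.resident_tokens() <= self.budget:
                break
            n.resident = False  # metadata kept so the node can be rehydrated
            evicted.append(n.node_id)
        return evicted

    def activate(self, leaf: int) -> List[int]:
        """Backtrack to `leaf`: lazily restore evicted nodes on its path."""
        rehydrate = [i for i in self.path_to_root(leaf) if not self.nodes[i].resident]
        for i in rehydrate:
            self.nodes[i].resident = True  # real system: recompute these KV states
        return rehydrate
```

After `activate`, the cache may exceed its budget again; calling `evict` with the new active leaf brings it back under budget, because the newly active path is then the protected one.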
CV · Sep 9, 2023
Exploring Robust Features for Improving Adversarial Robustness
Hong Wang, Yuefan Deng, Shinjae Yoo et al.
While deep neural networks (DNNs) have revolutionized many fields, their fragility to carefully designed adversarial attacks impedes their use in safety-critical applications. In this paper, we explore robust features that are not affected by adversarial perturbations, i.e., features that are invariant between a clean image and its adversarial examples, to improve the model's adversarial robustness. Specifically, we propose a feature disentanglement model that segregates the robust features from non-robust features and domain-specific features. Extensive experiments on four widely used datasets with different attacks demonstrate that the robust features obtained from our model improve adversarial robustness compared with state-of-the-art approaches. Moreover, the trained domain discriminator identifies the domain-specific features of clean images and adversarial examples almost perfectly. This enables adversarial example detection without incurring additional computational cost. It also allows us to use separate classifiers for clean images and adversarial examples, thereby avoiding any drop in clean-image accuracy.
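The detection-and-routing step can be sketched as follows: a domain discriminator applied to the domain-specific features flags each input as clean or adversarial, and the robust features are then sent to the matching classifier head. This NumPy sketch assumes a linear, already-trained discriminator, linear heads, and a 0.5 threshold; none of these details come from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def detect_and_classify(domain_feat, robust_feat, disc_w, disc_b,
                        clean_head, adv_head, threshold=0.5):
    """Route each sample to a clean or an adversarial classifier head.

    domain_feat: (N, D) domain-specific features from the disentanglement encoder
    robust_feat: (N, R) robust features
    disc_w, disc_b: weights (D,) and bias of a linear domain discriminator (assumed trained)
    clean_head, adv_head: (R, K) weights of two linear classifiers
    Returns (is_adv, predicted_class) per sample.
    """
    p_adv = sigmoid(domain_feat @ disc_w + disc_b)  # P(sample is adversarial)
    is_adv = p_adv > threshold
    logits_clean = robust_feat @ clean_head
    logits_adv = robust_feat @ adv_head
    # Pick, per sample, the logits of the head that matches its detected domain.
    logits = np.where(is_adv[:, None], logits_adv, logits_clean)
    return is_adv, logits.argmax(axis=1)
```

Keeping a dedicated head for inputs judged clean is what lets clean-image accuracy stay untouched, while the detection flag itself is available as a separate output.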
CV · Nov 25, 2022
Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition
Jiayin Sun, Hong Wang, Qiulei Dong
Triggered by the success of transformers in various visual tasks, the spatial self-attention mechanism has recently attracted increasing attention in the computer vision community. However, we empirically find that a typical vision transformer with the spatial self-attention mechanism cannot learn accurate attention maps for distinguishing different categories of fine-grained images. To address this problem, and motivated by the temporal attention mechanism in the brain, we propose a spatial-temporal attention network, called STAN, for learning fine-grained feature representations: the features learned by a sequence of spatial self-attention operations, corresponding to multiple moments, are aggregated progressively. STAN consists of four modules: a self-attention backbone module that learns a sequence of features with self-attention operations; a spatial feature self-organizing module that facilitates model training; a spatial-temporal feature learning module that aggregates the re-organized features via a long short-term memory (LSTM) network; and a context-aware module, implemented as the forget block of the spatial-temporal feature learning module, that preserves or forgets long-term memory using contextual information. We then propose a STAN-based method for open-set fine-grained recognition, called STAN-OSFGR, which integrates STAN with a linear classifier. Extensive experimental results on three fine-grained datasets and two coarse-grained datasets demonstrate that STAN-OSFGR significantly outperforms nine state-of-the-art open-set recognition methods in most cases.
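The spatial-temporal module aggregates per-moment features with an LSTM, and the context-aware module acts as its forget block. Below is a minimal NumPy sketch of that idea under stated assumptions: biases are omitted, the context vector enters only the forget gate through a hypothetical matrix `U_f`, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(D, H, E, seed=0):
    """Random weights for input size D, hidden size H, context size E."""
    rng = np.random.default_rng(seed)
    shape = (H, D + H)
    return {
        "W_i": rng.normal(scale=0.1, size=shape),
        "W_o": rng.normal(scale=0.1, size=shape),
        "W_g": rng.normal(scale=0.1, size=shape),
        "W_f": rng.normal(scale=0.1, size=shape),
        "U_f": rng.normal(scale=0.1, size=(H, E)),  # hypothetical context path
    }


def lstm_aggregate(features, context, params):
    """Aggregate a (T, D) feature sequence into one (H,) vector.

    features: one feature vector per self-attention "moment"
    context:  (E,) contextual vector (hypothetical, e.g. pooled backbone features)
    """
    H = params["W_i"].shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in features:
        z = np.concatenate([x, h])              # (D + H,)
        i = sigmoid(params["W_i"] @ z)          # input gate
        o = sigmoid(params["W_o"] @ z)          # output gate
        g = np.tanh(params["W_g"] @ z)          # candidate cell state
        # Context-aware forget gate: decides how much long-term memory to keep.
        f = sigmoid(params["W_f"] @ z + params["U_f"] @ context)
        c = f * c + i * g
        h = o * np.tanh(c)
    return h
```

The final hidden state could then be fed to a linear classifier; how unknown classes are rejected in the open-set setting is not specified here.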
LG · Jan 22
Learning Neural Operators from Partial Observations via Latent Autoregressive Modeling
Jingren Hou, Hong Wang, Pengyu Xu et al.
Real-world scientific applications frequently encounter incomplete observational data due to sensor limitations, geographic constraints, or measurement costs. Although neural operators have significantly advanced PDE solving in terms of computational efficiency and accuracy, their underlying assumption of fully observed spatial inputs severely restricts their applicability in real-world settings. We introduce the first systematic framework for learning neural operators from partial observations. We identify and formalize two fundamental obstacles: (i) the supervision gap in unobserved regions, which prevents effective learning of physical correlations, and (ii) the dynamic spatial mismatch between incomplete inputs and complete solution fields. Specifically, our proposed Latent Autoregressive Neural Operator (LANO) introduces two novel components designed explicitly to address these core difficulties: (i) a mask-to-predict training strategy that creates artificial supervision by strategically masking observed regions, and (ii) a Physics-Aware Latent Propagator that reconstructs solutions through boundary-first autoregressive generation in latent space. Additionally, we develop POBench-PDE, a comprehensive benchmark designed specifically for evaluating neural operators under partial observation across three PDE-governed tasks. LANO achieves state-of-the-art performance, with an 18–69% reduction in relative L2 error across all benchmarks under patch-wise missingness with missing rates below 50%, including real-world climate prediction. Our approach also handles practical scenarios with missing rates of up to 75%, partly bridging the gap between idealized research settings and the complexities of real-world scientific computing.
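The mask-to-predict idea can be illustrated on a 2-D grid: extra patches are hidden inside the observed region, the model sees only the remaining observed points, and the loss is computed on the hidden-but-observed points, where ground truth exists. The patch-wise random masking, the `mask_ratio` and `patch` parameters, and the masked MSE loss are assumptions for illustration; `relative_l2` is the usual relative L2 error, ||pred - true|| / ||true||.

```python
import numpy as np


def mask_to_predict(obs_mask, patch, mask_ratio, rng):
    """Hide extra patches inside the observed region to create supervision.

    obs_mask: (H, W) bool, True where the field was actually observed
    Returns (input_mask, target_mask):
      input_mask  - observed positions the model is allowed to see
      target_mask - observed positions hidden from the input (loss taken here)
    """
    H, W = obs_mask.shape
    hidden = np.zeros_like(obs_mask)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            if rng.random() < mask_ratio:       # hide this patch with prob. mask_ratio
                hidden[y:y + patch, x:x + patch] = True
    input_mask = obs_mask & ~hidden
    target_mask = obs_mask & hidden
    return input_mask, target_mask


def masked_mse(pred, true, target_mask):
    """Mean squared error restricted to positions where supervision exists."""
    diff = (pred - true)[target_mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


def relative_l2(pred, true):
    return float(np.linalg.norm(pred - true) / np.linalg.norm(true))
```

Because the targets are drawn only from observed points, the training signal never requires values in regions that were never measured.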
LG · Jan 14
HGATSolver: A Heterogeneous Graph Attention Solver for Fluid-Structure Interaction
Qin-Yi Zhang, Hong Wang, Siyao Liu et al.
Fluid-structure interaction (FSI) systems involve distinct physical domains, fluid and solid, governed by different partial differential equations and coupled at a dynamic interface. While learning-based solvers offer a promising alternative to costly numerical simulations, existing methods struggle to capture the heterogeneous dynamics of FSI within a unified framework. This challenge is further exacerbated by cross-domain response inconsistencies caused by interface coupling and by disparities in learning difficulty between fluid and solid regions, which lead to instability during prediction. To address these challenges, we propose the Heterogeneous Graph Attention Solver (HGATSolver). HGATSolver encodes the system as a heterogeneous graph, embedding physical structure directly into the model via distinct node and edge types for fluid, solid, and interface regions. This enables specialized message-passing mechanisms tailored to each physical domain. To stabilize explicit time stepping, we introduce a novel physics-conditioned gating mechanism that serves as a learnable, adaptive relaxation factor. Furthermore, an Inter-domain Gradient-Balancing Loss dynamically balances the optimization objectives across domains based on predictive uncertainty. Extensive experiments on two newly constructed FSI benchmarks and a public dataset demonstrate that HGATSolver achieves state-of-the-art performance, establishing an effective framework for surrogate modeling of coupled multi-physics systems.
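Two of HGATSolver's stabilizing ideas can be sketched in a few lines: a gate in (0, 1), computed from physical features, that acts as a per-node relaxation factor on the explicit update, and an uncertainty-based weighting of per-domain losses. This NumPy sketch is an assumption-laden illustration: the linear gate and its inputs are hypothetical, and the loss uses the common learnable log-variance weighting, swapped in for the paper's Inter-domain Gradient-Balancing Loss, whose exact form is not given in the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_step(u_prev, u_pred, phys_feat, w, b):
    """Relaxed explicit update: u_next = u_prev + g * (u_pred - u_prev).

    u_prev, u_pred: (N, C) node states at step t and the raw model prediction
    phys_feat:      (N, F) per-node physical features (hypothetical inputs)
    w, b:           (F,) weights and scalar bias of a linear gate (assumed learned)
    g in (0, 1) acts like a per-node relaxation factor.
    """
    g = sigmoid(phys_feat @ w + b)[:, None]  # (N, 1), broadcast over channels
    return u_prev + g * (u_pred - u_prev)


def uncertainty_weighted_loss(domain_losses, log_vars):
    """Combine per-domain losses L_d with learnable log-variances s_d as
    sum_d exp(-s_d) * L_d + s_d (a standard uncertainty-weighting form)."""
    losses = np.asarray(domain_losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-s) * losses + s))
```

In this form, a domain with high predicted uncertainty (large s_d) gets a smaller weight on its loss, while the additive s_d term keeps the weights from collapsing to zero.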
RO · Jan 26, 2023
Planning Automated Driving with Accident Experience Referencing and Common-sense Inferencing
Shaobo Qiu, Ji Li, Guoxi Chen et al.
Although a typical autopilot system far surpasses humans in terms of sensing accuracy, performance stability, and response agility, such a system still falls far behind humans in the wisdom to understand an unfamiliar environment with creativity, adaptivity, and resiliency. Current automated driving (AD) brains are basically expert systems featuring logical computations, which resemble the thinking flow of a left brain working at the tactical level. A right brain is needed to raise the safety of automated driving vehicles to the next generation by making intuitive strategic judgements that can supervise tactical action planning. In this work, we present the concept of an Automated Driving Strategical Brain (ADSB): a scene perception and scene safety evaluation framework that works at a higher abstraction level, incorporating experience referencing, common-sense inferencing, and goal-and-value judging capabilities to provide a contextual perspective for decision making within automated driving planning. The ADSB architecture is made up of the Experience Referencing Engine (ERE), the Common-sense Inferencing Engine (CIE), and the Goal and Value Keeper (GVK). A total of 1,614,748 cases from NHTSA's FARS/CRSS databases, covering 1975 to 2018, are used to train the ERE model. The core of CIE is COMET-BART, a model trained on ATOMIC, which can provide directional advice when tactical-level environmental perception conclusions are ambiguous; it can also use future scenario models to remind tactical-level decision systems to plan ahead of a perceived hazard scene. GVK can take in any additional expert-handwritten rules of a qualitative nature. Moreover, we believe that, with good scalability, the ADSB approach offers a potential solution to the problem of long-tail corner cases encountered in validating rule-based planning algorithms.
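One way to picture how the three ADSB components could supervise a tactical plan is the hypothetical Python skeleton below: ERE returns a risk estimate from past cases, CIE is consulted only when perception confidence is low, and GVK rules can veto the plan. The thresholds, the `Scene`/`Advice` types, and the veto logic are all illustrative assumptions, not the paper's design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Scene:
    perception_confidence: float  # confidence of tactical-level perception
    description: str              # textual scene summary (hypothetical)


@dataclass
class Advice:
    risk: float                   # estimated accident risk in [0, 1]
    note: str


Rule = Callable[[Scene, str], bool]  # True if the plan is allowed


def strategic_review(
    scene: Scene,
    plan: str,
    ere: Callable[[Scene, str], Advice],       # experience referencing (hypothetical)
    cie: Callable[[Scene], Optional[str]],     # common-sense inferencing (hypothetical)
    rules: List[Rule],                         # qualitative expert rules held by GVK
    ambiguity_threshold: float = 0.6,
    risk_limit: float = 0.3,
) -> Tuple[bool, List[str], List[str]]:
    """Return (approved, advisory notes, names of vetoing rules)."""
    notes = []
    advice = ere(scene, plan)
    notes.append(advice.note)
    # Only ask for common-sense advice when perception is ambiguous.
    if scene.perception_confidence < ambiguity_threshold:
        hint = cie(scene)
        if hint:
            notes.append(hint)
    vetoed = [rule.__name__ for rule in rules if not rule(scene, plan)]
    approved = advice.risk <= risk_limit and not vetoed
    return approved, notes, vetoed
```

Keeping the rules as plain predicates mirrors the abstract's point that GVK can absorb additional handwritten rules without retraining the learned engines.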