Ling Wang

LG
h-index96
49papers
708citations
Novelty50%
AI Score57

49 Papers

CVSep 10, 2023Code
3D Implicit Transporter for Temporally Consistent Keypoint Discovery

Chengliang Zhong, Yuhang Zheng, Yupeng Zheng et al.

Keypoint-based representation has proven advantageous in various visual and robotic tasks. However, the existing 2D and 3D methods for detecting keypoints mainly rely on geometric consistency to achieve spatial alignment, neglecting temporal consistency. To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information. However, the direct application of the Transporter to 3D point clouds is infeasible due to their structural differences from 2D images. Thus, we propose the first 3D version of the Transporter, which leverages hybrid 3D representation, cross attention, and implicit reconstruction. We apply this new learning system on 3D articulated objects and nonrigid animals (humans and rodents) and show that learned keypoints are spatio-temporally consistent. Additionally, we propose a closed-loop control strategy that utilizes the learned keypoints for 3D object manipulation and demonstrate its superior performance. Codes are available at https://github.com/zhongcl-thu/3D-Implicit-Transporter.

CVFeb 2, 2023Code
STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

Yupeng Zheng, Chengliang Zhong, Pengfei Li et al.

Self-supervised depth estimation draws a lot of attention recently as it can promote the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds during nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator, without using ground truth for either task. Our method tightly entangles two self-supervised tasks using a newly proposed uncertain pixel masking strategy. This strategy originates from the observation that nighttime images not only suffer from underexposed regions but also from overexposed regions. By fitting a bridge-shaped curve to the illumination map distribution, both regions are suppressed and two tasks are bridged naturally. We benchmark the method on two established datasets: nuScenes and RobotCar and demonstrate state-of-the-art performance on both of them. Detailed ablations also reveal the mechanism of our proposal. Last but not least, to mitigate the problem of sparse ground truth of existing datasets, we provide a new photo-realistically enhanced nighttime dataset based upon CARLA. It brings meaningful new challenges to the community. Codes, data, and models are available at https://github.com/ucaszyp/STEPS.

LGApr 25, 2022
Deep Reinforcement Learning for Online Routing of Unmanned Aerial Vehicles with Wireless Power Transfer

Kaiwen Li, Tao Zhang, Rui Wang et al.

The unmanned aerial vehicle (UAV) plays an vital role in various applications such as delivery, military mission, disaster rescue, communication, etc., due to its flexibility and versatility. This paper proposes a deep reinforcement learning method to solve the UAV online routing problem with wireless power transfer, which can charge the UAV remotely without wires, thus extending the capability of the battery-limited UAV. Our study considers the power consumption of the UAV and the wireless charging process. Unlike the previous works, we solve the problem by a designed deep neural network. The model is trained using a deep reinforcement learning method offline, and is used to optimize the UAV routing problem online. On small and large scale instances, the proposed model runs from four times to 500 times faster than Google OR-tools, the state-of-the-art combinatorial optimization solver, with identical solution quality. It also outperforms different types of heuristic and local search methods in terms of both run-time and optimality. In addition, once the model is trained, it can scale to new generated problem instances with arbitrary topology that are not seen during training. The proposed method is practically applicable when the problem scale is large and the response time is crucial.

LGOct 11, 2023Code
Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets

Zhongji Zhang, Yuhang Wang, Yinghao Zhu et al.

Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, severely impacting the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features specifically relevant to downstream tasks, developing a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and achieves faster training convergence, particularly when working with limited data. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at https://github.com/wang1yuhang/domain_invariant_network.

MNMay 8, 2022
FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction

Hanxuan Cai, Huimin Zhang, Duancheng Zhao et al.

Deep learning is an important method for molecular design and exhibits considerable ability to predict molecular properties, including physicochemical, bioactive, and ADME/T (absorption, distribution, metabolism, excretion, and toxicity) properties. In this study, we advanced a novel deep learning architecture, termed FP-GNN, which combined and simultaneously learned information from molecular graphs and fingerprints. To evaluate the FP-GNN model, we conducted experiments on 13 public datasets, an unbiased LIT-PCBA dataset, and 14 phenotypic screening datasets for breast cell lines. Extensive evaluation results showed that compared to advanced deep learning and conventional machine learning algorithms, the FP-GNN algorithm achieved state-of-the-art performance on these datasets. In addition, we analyzed the influence of different molecular fingerprints, and the effects of molecular graphs and molecular fingerprints on the performance of the FP-GNN model. Analysis of the anti-noise ability and interpretation ability also indicated that FP-GNN was competitive in real-world situations.

LGJul 4, 2023
Learning to Branch in Combinatorial Optimization with Graph Pointer Networks

Rui Wang, Zhiming Zhou, Tao Zhang et al.

Branch-and-bound is a typical way to solve combinatorial optimization problems. This paper proposes a graph pointer network model for learning the variable selection policy in the branch-and-bound. We extract the graph features, global features and historical features to represent the solver state. The proposed model, which combines the graph neural network and the pointer mechanism, can effectively map from the solver state to the branching variable decisions. The model is trained to imitate the classic strong branching expert rule by a designed top-k Kullback-Leibler divergence loss function. Experiments on a series of benchmark problems demonstrate that the proposed approach significantly outperforms the widely used expert-designed branching rules. Our approach also outperforms the state-of-the-art machine-learning-based branch-and-bound methods in terms of solving speed and search tree size on all the test instances. In addition, the model can generalize to unseen instances and scale to larger instances.

LGAug 30, 2022Code
HiGNN: Hierarchical Informative Graph Neural Networks for Molecular Property Prediction Equipped with Feature-Wise Attention

Weimin Zhu, Yi Zhang, DuanCheng Zhao et al.

Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNN) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural networks framework (termed HiGNN) for predicting molecular property by utilizing a co-representation learning of molecular graphs and chemically synthesizable BRICS fragments. Furthermore, a plug-and-play feature-wise attention block is first designed in HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of HiGNN model at the subgraph level, indicating that HiGNN as a powerful deep learning tool can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.

LGJul 3, 2023
Enhancing the Robustness of QMIX against State-adversarial Attacks

Weiran Guo, Guanjun Liu, Ziyuan Zhou et al.

Deep reinforcement learning (DRL) performance is generally impacted by state-adversarial attacks, a perturbation applied to an agent's observation. Most recent research has concentrated on robust single-agent reinforcement learning (SARL) algorithms against state-adversarial attacks. Still, there has yet to be much work on robust multi-agent reinforcement learning. Using QMIX, one of the popular cooperative multi-agent reinforcement algorithms, as an example, we discuss four techniques to improve the robustness of SARL algorithms and extend them to multi-agent scenarios. To increase the robustness of multi-agent reinforcement learning (MARL) algorithms, we train models using a variety of attacks in this research. We then test the models taught using the other attacks by subjecting them to the corresponding attacks throughout the training phase. In this way, we organize and summarize techniques for enhancing robustness when used with MARL.

CVJul 2, 2024
EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

Xu Zheng, Ling Wang, Kanghao Chen et al.

Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli presented to participants compromise their quality and fidelity in capturing precise brain activity. The study in neuroscience unveils that the relationship between visual and textual stimulus in EEG recordings provides valuable insights into the brain's ability to process and integrate multi-modal information simultaneously. Inspired by this, we propose a novel large-scale multi-modal dataset, named EIT-1M, with over 1 million EEG-image-text pairs. Our dataset is superior in its capacity of reflecting brain activities in simultaneously processing multi-modal information. To achieve this, we collected data pairs while participants viewed alternating sequences of visual-textual stimuli from 60K natural images and category-specific texts. Common semantic categories are also included to elicit better reactions from participants' brains. Meanwhile, response-based stimulus timing and repetition across blocks and sessions are included to ensure data diversity. To verify the effectiveness of EIT-1M, we provide an in-depth analysis of EEG data captured from multi-modal stimuli across different categories and participants, along with data quality scores for transparency. We demonstrate its validity on two tasks: 1) EEG recognition from visual or textual stimuli or both and 2) EEG-to-visual generation.

BMSep 17, 2022
VDDB: a comprehensive resource and machine learning platform for antiviral drug discovery

Shunming Tao, Yihao Chen, Jingxing Wu et al.

Virus infection is one of the major diseases that seriously threaten human health. To meet the growing demand for mining and sharing data resources related to antiviral drugs and to accelerate the design and discovery of new antiviral drugs, we presented an open-access antiviral drug resource and machine learning platform (VDDB), which, to the best of our knowledge, is the first comprehensive dedicated resource for experimentally verified potential drugs/molecules based on manually curated data. Currently, VDDB highlights 848 clinical vaccines, 199 clinical antibodies, as well as over 710,000 small molecules targeting 39 medically important viruses including SARS-CoV-2. Furthermore, VDDB stores approximately 3 million records of pharmacological data for these collected potential antiviral drugs/molecules, involving 314 cell infection-based phenotypic and 234 target-based genotypic assays. Based on these annotated pharmacological data, VDDB allows users to browse, search and download reliable information about these collects for various viruses of interest. In particular, VDDB also integrates 57 cell infection- and 117 target-based associated high-accuracy machine learning models to support various antivirals identification-related tasks, such as compound activity prediction, virtual screening, drug repositioning and target fishing. VDDB is freely accessible at http://vddb.idruglab.cn.

CVFeb 25
Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration

Chen Wu, Ling Wang, Zhuoran Zheng et al.

Ultra-High-Definition (UHD) image restoration is trapped in a scalability crisis: existing models, bound to pixel-wise operations, demand unsustainable computation. While state space models (SSMs) like Mamba promise linear complexity, their pixel-serial scanning remains a fundamental bottleneck for the millions of pixels in UHD content. We ask: must we process every pixel to understand the image? This paper introduces C$^2$SSM, a visual state space model that breaks this taboo by shifting from pixel-serial to cluster-serial scanning. Our core discovery is that the rich feature distribution of a UHD image can be distilled into a sparse set of semantic centroids via a neural-parameterized mixture model. C$^2$SSM leverages this to reformulate global modeling into a novel dual-path process: it scans and reasons over a handful of cluster centers, then diffuses the global context back to all pixels through a principled similarity distribution, all while a lightweight modulator preserves fine details. This cluster-centric paradigm achieves a decisive leap in efficiency, slashing computational costs while establishing new state-of-the-art results across five UHD restoration tasks. More than a solution, C$^2$SSM charts a new course for efficient large-scale vision: scan clusters, not pixels.

LGJan 17, 2024Code
ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

Liye Chen, Biaoshun Li, Yihao Chen et al.

Antibody-drug conjugate (ADC) has revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drug. Nevertheless, the realization of rational design of ADC is very difficult because the relationship between their structures and activities is difficult to understand. In the present study, we introduce a unified deep learning framework called ADCNet to help design potential ADCs. The ADCNet highly integrates the protein representation learning language model ESM-2 and small-molecule representation learning language model FG-BERT models to achieve activity prediction through learning meaningful features from antigen and antibody protein sequences of ADC, SMILES strings of linker and payload, and drug-antibody ratio (DAR) value. Based on a carefully designed and manually tailored ADC data set, extensive evaluation results reveal that ADCNet performs best on the test set compared to baseline machine learning models across all evaluation metrics. For example, it achieves an average prediction accuracy of 87.12%, a balanced accuracy of 0.8689, and an area under receiver operating characteristic curve of 0.9293 on the test set. In addition, cross-validation, ablation experiments, and external independent testing results further prove the stability, advancement, and robustness of the ADCNet architecture. For the convenience of the community, we develop the first online platform (https://ADCNet.idruglab.cn) for the prediction of ADCs activity based on the optimal ADCNet model, and the source code is publicly available at https://github.com/idrugLab/ADCNet.

CVFeb 26
SceneTransporter: Optimal Transport-Guided Compositional Latent Diffusion for Single-Image Structured 3D Scene Generation

Ling Wang, Hao-Xiang Guo, Xinzhou Wang et al.

We introduce SceneTransporter, an end-to-end framework for structured 3D scene generation from a single image. While existing methods generate part-level 3D objects, they often fail to organize these parts into distinct instances in open-world scenes. Through a debiased clustering probe, we reveal a critical insight: this failure stems from the lack of structural constraints within the model's internal assignment mechanism. Based on this finding, we reframe the task of structured 3D scene generation as a global correlation assignment problem. To solve this, SceneTransporter formulates and solves an entropic Optimal Transport (OT) objective within the denoising loop of the compositional DiT model. This formulation imposes two powerful structural constraints. First, the resulting transport plan gates cross-attention to enforce an exclusive, one-to-one routing of image patches to part-level 3D latents, preventing entanglement. Second, the competitive nature of the transport encourages the grouping of similar patches, a process that is further regularized by an edge-based cost, to form coherent objects and prevent fragmentation. Extensive experiments show that SceneTransporter outperforms existing methods on open-world scene generation, significantly improving instance-level coherence and geometric fidelity. Code and models will be publicly available at https://2019epwl.github.io/SceneTransporter/.

LGJul 17, 2024
Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

Runfa Chen, Ling Wang, Yu Du et al.

Learning policies for multi-entity systems in 3D environments is far more complicated against single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution of alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations and rotations. To this end, this paper proposes Subequivariant Hierarchical Neural Networks (SHNN) to facilitate multi-entity policy learning. In particular, SHNN first dynamically decouples the global space into local entity-level graphs via task assignment. Second, it leverages subequivariant message passing over the local entity-level graphs to devise local reference frames, remarkably compressing the representation redundancy, particularly in gravity-affected environments. Furthermore, to overcome the limitations of existing benchmarks in capturing the subtleties of multi-entity systems under the Euclidean symmetry, we propose the Multi-entity Benchmark (MEBEN), a new suite of environments tailored for exploring a wide range of multi-entity reinforcement learning. Extensive experiments demonstrate significant advancements of SHNN on the proposed benchmarks compared to existing methods. Comprehensive ablations are conducted to verify the indispensability of task assignment and subequivariance.

AIMay 14
Hypergraph Enterprise Agentic Reasoner over Heterogeneous Business Systems

Ling Wang, Songnan Liu, Jianan Wang et al.

Applying Large Language Models (LLMs) to heterogeneous enterprise systems is hindered by hallucinations and failures in multi-hop, n-ary reasoning. Existing paradigms (e.g., GraphRAG, NL2SQL) lack the semantic grounding and auditable execution required for these complex environments. We introduce HEAR, an enterprise agentic reasoner built on a Stratified Hypergraph Ontology. Its base Graph Layer virtualizes provenance-aware data interfaces, while the Hyperedge Layer encodes n-ary business rules and procedural protocols. Operating an evidence-driven reasoning loop, HEAR dynamically orchestrates ontology tools for structured multi-hop analysis without requiring LLM retraining. Evaluations on supply-chain tasks, including order fulfillment blockage root cause analysis (RCA), show HEAR achieves up to 94.7% accuracy. Crucially, HEAR demonstrates adaptive efficiency: utilizing procedural hyperedges to minimize token costs, while leveraging topological exploration for rigorous correctness on complex queries. By matching proprietary model performance with open-weight backbones and automating manual diagnostics, HEAR establishes a scalable, auditable foundation for enterprise intelligence.

CVSep 21, 2024
BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Ling Wang, Chen Wu, Lin Wang

Can we directly visualize what we imagine in our brain together with what we describe? The inherent nature of human perception reveals that, when we think, our body can combine language description and build a vivid picture in our brain. Intuitively, generative models should also hold such versatility. In this paper, we introduce BrainDreamer, a novel end-to-end language-guided generative framework that can mimic human reasoning and generate high-quality images from electroencephalogram (EEG) brain signals. Our method is superior in its capacity to eliminate the noise introduced by non-invasive EEG data acquisition and meanwhile achieve a more precise mapping between the EEG and image modality, thus leading to significantly better-generated images. Specifically, BrainDreamer consists of two key learning stages: 1) modality alignment and 2) image generation. In the alignment stage, we propose a novel mask-based triple contrastive learning strategy to effectively align EEG, text, and image embeddings to learn a unified representation. In the generation stage, we inject the EEG embeddings into the pre-trained Stable Diffusion model by designing a learnable EEG adapter to generate high-quality reasoning-coherent images. Moreover, BrainDreamer can accept textual descriptions (e.g., color, position, etc.) to achieve controllable image generation. Extensive experiments show that our method significantly outperforms prior arts in terms of generating quality and quantitative performance.

CVSep 20, 2024
DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring

Ling Wang, Chen Wu, Lin Wang

Autonomous vehicles and robots often struggle with reliable visual perception at night due to the low illumination and motion blur caused by the long exposure time of RGB cameras. Existing methods address this challenge by sequentially connecting the off-the-shelf pretrained low-light enhancement and deblurring models. Unfortunately, these methods often lead to noticeable artifacts (\eg, color distortions) in the over-exposed regions or make it hardly possible to learn the motion cues of the dark regions. In this paper, we interestingly find vision-language models, \eg, Contrastive Language-Image Pretraining (CLIP), can comprehensively perceive diverse degradation levels at night. In light of this, we propose a novel transformer-based joint learning framework, named DAP-LED, which can jointly achieve low-light enhancement and deblurring, benefiting downstream tasks, such as depth estimation, segmentation, and detection in the dark. The key insight is to leverage CLIP to adaptively learn the degradation levels from images at night. This subtly enables learning rich semantic information and visual representation for optimization of the joint tasks. To achieve this, we first introduce a CLIP-guided cross-fusion module to obtain multi-scale patch-wise degradation heatmaps from the image embeddings. Then, the heatmaps are fused via the designed CLIP-enhanced transformer blocks to retain useful degradation information for effective model optimization. Experimental results show that, compared to existing methods, our DAP-LED achieves state-of-the-art performance in the dark. Meanwhile, the enhanced results are demonstrated to be effective for three downstream tasks. For demo and more results, please check the project page: \url{https://vlislab22.github.io/dap-led/}.

LGJul 1, 2025Code
PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning

Weiran Guo, Guanjun Liu, Ziyuan Zhou et al.

Reinforcement Learning (RL) is widely used in tasks where agents interact with an environment to maximize rewards. Building on this foundation, Safe Reinforcement Learning (Safe RL) incorporates a cost metric alongside the reward metric, ensuring that agents adhere to safety constraints during decision-making. In this paper, we identify that Safe RL is vulnerable to backdoor attacks, which can manipulate agents into performing unsafe actions. First, we introduce the relevant concepts and evaluation metrics for backdoor attacks in Safe RL. It is the first attack framework in the Safe RL field that involves both Positive and Negative Action sample (PNAct) is to implant backdoors, where positive action samples provide reference actions and negative action samples indicate actions to be avoided. We theoretically point out the properties of PNAct and design an attack algorithm. Finally, we conduct experiments to evaluate the effectiveness of our proposed backdoor attack framework, evaluating it with the established metrics. This paper highlights the potential risks associated with Safe RL and underscores the feasibility of such attacks. Our code and supplementary material are available at https://github.com/azure-123/PNAct.

CVMay 12
Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

Chen Wu, Ling Wang, Zhuoran Zheng et al.

Guided depth super-resolution (GDSR) reconstructs HR depth maps from LR inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, hindering the establishment of efficient and semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach enabling dense, semantically-aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves global modeling with linear complexity. Furthermore, a cross-modal matching transform module is introduced to enhance interactive modeling quality by utilizing representative features from both modalities. Extensive experiments demonstrate competitive performance against state-of-the-art methods.

CVNov 14, 2025
From Events to Clarity: The Event-Guided Diffusion Framework for Dehazing

Ling Wang, Yunfan Lu, Wenzong Ma et al.

Clear imaging under hazy conditions is a critical task. Prior-based and neural methods have improved results. However, they operate on RGB frames, which suffer from limited dynamic range. Therefore, dehazing remains ill-posed and can erase structure and illumination details. To address this, we use event cameras for dehazing for the \textbf{first time}. Event cameras offer much higher HDR ($120 dBvs.60 dB$) and microsecond latency, therefore they suit hazy scenes. In practice, transferring HDR cues from events to frames is hard because real paired data are scarce. To tackle this, we propose an event-guided diffusion model that utilizes the strong generative priors of diffusion models to reconstruct clear images from hazy inputs by effectively transferring HDR information from events. Specifically, we design an event-guided module that maps sparse HDR event features, \textit{e.g.,} edges, corners, into the diffusion latent space. This clear conditioning provides precise structural guidance during generation, improves visual realism, and reduces semantic drift. For real-world evaluation, we collect a drone dataset in heavy haze (AQI = 341) with synchronized RGB and event sensors. Experiments on two benchmarks and our dataset achieve state-of-the-art results.

LGAug 5, 2024
Spatial-temporal Graph Convolutional Networks with Diversified Transformation for Dynamic Graph Representation Learning

Ling Wang, Yixiang Huang, Hao Wu

Dynamic graphs (DG) are often used to describe evolving interactions between nodes in real-world applications. Temporal patterns are a natural feature of DGs and are also key to representation learning. However, existing dynamic GCN models are mostly composed of static GCNs and sequence modules, which results in the separation of spatiotemporal information and cannot effectively capture complex temporal patterns in DGs. To address this problem, this study proposes a spatial-temporal graph convolutional networks with diversified transformation (STGCNDT), which includes three aspects: a) constructing a unified graph tensor convolutional network (GTCN) using tensor M-products without the need to represent spatiotemporal information separately; b) introducing three transformation schemes in GTCN to model complex temporal patterns to aggregate temporal information; and c) constructing an ensemble of diversified transformation schemes to obtain higher representation capabilities. Empirical studies on four DGs that appear in communication networks show that the proposed STGCNDT significantly outperforms state-of-the-art models in solving link weight estimation tasks due to the diversified transformations.

AIJan 15, 2024
Constrained Multi-objective Optimization with Deep Reinforcement Learning Assisted Operator Selection

Fei Ming, Wenyin Gong, Ling Wang et al.

Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used, however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-Network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.

CVDec 6, 2023
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation

Xinzhou Wang, Yikai Wang, Junliang Ye et al.

Advances in 3D generation have facilitated sequential 3D model generation (a.k.a 4D generation), yet its application for animatable objects with large motion remains scarce. Our work proposes AnimatableDreamer, a text-to-4D generation framework capable of generating diverse categories of non-rigid objects on skeletons extracted from a monocular video. At its core, AnimatableDreamer is equipped with our novel optimization design dubbed Canonical Score Distillation (CSD), which lifts 2D diffusion for temporal consistent 4D generation. CSD, designed from a score gradient perspective, generates a canonical model with warp-robustness across different articulations. Notably, it also enhances the authenticity of bones and skinning by integrating inductive priors from a diffusion model. Furthermore, with multi-view distillation, CSD infers invisible regions, thereby improving the fidelity of monocular non-rigid reconstruction. Extensive experiments demonstrate the capability of our method in generating high-flexibility text-guided 3D models from the monocular video, while also showing improved reconstruction performance over existing non-rigid reconstruction methods.

AIDec 3, 2024
Medical Multimodal Foundation Models in Clinical Diagnosis and Treatment: Applications, Challenges, and Future Directions

Kai Sun, Siyan Xue, Fuchun Sun et al.

Recent advancements in deep learning have significantly revolutionized the field of clinical diagnosis and treatment, offering novel approaches to improve diagnostic precision and treatment efficacy across diverse clinical domains, thus driving the pursuit of precision medicine. The growing availability of multi-organ and multimodal datasets has accelerated the development of large-scale Medical Multimodal Foundation Models (MMFMs). These models, known for their strong generalization capabilities and rich representational power, are increasingly being adapted to address a wide range of clinical tasks, from early diagnosis to personalized treatment strategies. This review offers a comprehensive analysis of recent developments in MMFMs, focusing on three key aspects: datasets, model architectures, and clinical applications. We also explore the challenges and opportunities in optimizing multimodal representations and discuss how these advancements are shaping the future of healthcare by enabling improved patient outcomes and more efficient clinical workflows.

CVNov 10, 2024
Dropout the High-rate Downsampling: A Novel Design Paradigm for UHD Image Restoration

Chen Wu, Ling Wang, Long Peng et al.

With the popularization of high-end mobile devices, Ultra-high-definition (UHD) images have become ubiquitous in our lives. The restoration of UHD images is a highly challenging problem due to the exaggerated pixel count, which often leads to memory overflow during processing. Existing methods either downsample UHD images at a high rate before processing or split them into multiple patches for separate processing. However, high-rate downsampling leads to significant information loss, while patch-based approaches inevitably introduce boundary artifacts. In this paper, we propose a novel design paradigm to solve the UHD image restoration problem, called D2Net. D2Net enables direct full-resolution inference on UHD images without the need for high-rate downsampling or dividing the images into several patches. Specifically, we ingeniously utilize the characteristics of the frequency domain to establish long-range dependencies of features. Taking into account the richer local patterns in UHD images, we also design a multi-scale convolutional group to capture local features. Additionally, during the decoding stage, we dynamically incorporate features from the encoding stage to reduce the flow of irrelevant information. Extensive experiments on three UHD image restoration tasks, including low-light image enhancement, image dehazing, and image deblurring, show that our model achieves better quantitative and qualitative results than state-of-the-art methods.

CVApr 11, 2024
Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition

Yang Chen, Jingcai Guo, Tian He et al.

Skeleton-based zero-shot action recognition aims to recognize unknown human actions based on the learned priors of the known skeleton-based actions and a semantic descriptor space shared by both known and unknown categories. However, previous works focus on establishing the bridges between the known skeleton representation space and semantic descriptions space at the coarse-grained level for recognizing unknown action categories, ignoring the fine-grained alignment of these two spaces, resulting in suboptimal performance in distinguishing high-similarity action categories. To address these challenges, we propose a novel method via Side information and dual-prompts learning for skeleton-based zero-shot action recognition (STAR) at the fine-grained level. Specifically, 1) we decompose the skeleton into several parts based on its topology structure and introduce the side information concerning multi-part descriptions of human body movements for alignment between the skeleton and the semantic space at the fine-grained level; 2) we design the visual-attribute and semantic-part prompts to improve the intra-class compactness within the skeleton space and inter-class separability within the semantic space, respectively, to distinguish the high-similarity actions. Extensive experiments show that our method achieves state-of-the-art performance in ZSL and GZSL settings on NTU RGB+D, NTU RGB+D 120, and PKU-MMD datasets.

LGJan 13, 2024
Tensor Graph Convolutional Network for Dynamic Graph Representation Learning

Ling Wang, Ye Yuan

Dynamic graphs (DG) describe dynamic interactions between entities in many practical scenarios. Most existing DG representation learning models combine graph convolutional network and sequence neural network, which model spatial-temporal dependencies through two different types of neural networks. However, this hybrid design cannot well capture the spatial-temporal continuity of a DG. In this paper, we propose a tensor graph convolutional network to learn DG representations in one convolution framework based on the tensor product with the following two-fold ideas: a) representing the information of DG by tensor form; b) adopting tensor product to design a tensor graph convolutional network modeling spatial-temporal feature simultaneously. Experiments on real-world DG datasets demonstrate that our model obtains state-of-the-art performance.

CVApr 1, 2024
Equivariant Local Reference Frames for Unsupervised Non-rigid Point Cloud Shape Correspondence

Ling Wang, Runfa Chen, Yikai Wang et al.

Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet itself is non-trivial given the exponential complexity stemming from inter-point degree-of-freedom, i.e., pose transformations. Based on the assumption of local rigidity, one solution for reducing complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are invariant to SE(3) transformations. However, the focus solely on local structure neglects global geometric contexts, resulting in less distinctive LRFs that lack crucial semantic information necessary for effective matching. Furthermore, such complexity introduces out-of-distribution geometric contexts during inference, thus complicating generalization. To this end, we introduce 1) EquiShape, a novel structure tailored to learn pair-wise LRFs with global structural cues for both spatial and semantic consistency, and 2) LRF-Refine, an optimization strategy generally applicable to LRF-based methods, aimed at addressing the generalization challenges. Specifically, for EquiShape, we employ cross-talk within separate equivariant graph neural networks (Cross-GVP) to build long-range dependencies to compensate for the lack of semantic information in local structure modeling, deducing pair-wise independent SE(3)-equivariant LRF vectors for each point. For LRF-Refine, the optimization adjusts LRFs within specific contexts and knowledge, enhancing the geometric and semantic generalizability of point features. Our overall framework surpasses the state-of-the-art methods by a large margin on three benchmarks. Code and models will be publicly available.

AIMar 6, 2024
An EnKF-LSTM Assimilation Algorithm for Crop Growth Model

Siqi Zhou, Ling Wang, Jie Liu et al.

Accurate and timely prediction of crop growth is of great significance to ensure crop yields and researchers have developed several crop models for the prediction of crop growth. However, there are large difference between the simulation results obtained by the crop models and the actual results, thus in this paper, we proposed to combine the simulation results with the collected crop data for data assimilation so that the accuracy of prediction will be improved. In this paper, an EnKF-LSTM data assimilation method for various crops is proposed by combining ensemble Kalman filter and LSTM neural network, which effectively avoids the overfitting problem of existing data assimilation methods and eliminates the uncertainty of the measured data. The verification of the proposed EnKF-LSTM method and the comparison of the proposed method with other data assimilation methods were performed using datasets collected by sensor equipment deployed on a farm.

LGFeb 19, 2024
Vehicle-group-based Crash Risk Prediction and Interpretation on Highways

Tianheng Zhu, Ling Wang, Yiheng Feng et al.

Previous studies in predicting crash risks primarily associated the number or likelihood of crashes on a road segment with traffic parameters or geometric characteristics, usually neglecting the impact of vehicles' continuous movement and interactions with nearby vehicles. Recent technology advances, such as Connected and Automated Vehicles (CAVs) and Unmanned Aerial Vehicles (UAVs) are able to collect high-resolution trajectory data, which enables trajectory-based risk analysis. This study investigates a new vehicle group (VG) based risk analysis method and explores risk evolution mechanisms considering VG features. An impact-based vehicle grouping method is proposed to cluster vehicles into VGs by evaluating their responses to the erratic behaviors of nearby vehicles. The risk of a VG is aggregated based on the risk between each vehicle pair in the VG, measured by inverse Time-to-Collision (iTTC). A Logistic Regression and a Graph Neural Network (GNN) are then employed to predict VG risks using aggregated and disaggregated VG information. Both methods achieve excellent performance with AUC values exceeding 0.93. For the GNN model, GNNExplainer with feature perturbation is applied to identify critical individual vehicle features and their directional impact on VG risks. Overall, this research contributes a new perspective for identifying, predicting, and interpreting traffic risks.

LGMar 8
Constraints Matrix Diffusion based Generative Neural Solver for Vehicle Routing Problems

Zhenwei Wang, Tiehua Zhang, Ning Xue et al.

Over the past decade, neural network solvers powered by generative artificial intelligence have garnered significant attention in the domain of vehicle routing problems (VRPs), owing to their exceptional computational efficiency and superior reasoning capabilities. In particular, autoregressive solvers integrated with reinforcement learning have emerged as a prominent trend. However, much of the existing work emphasizes large-scale generalization of neural approaches while neglecting the limited robustness of attention-based methods across heterogeneous distributions of problem parameters. Their improvements over heuristic search remain largely restricted to hand-curated, fixed-distribution benchmarks. Furthermore, these architectures tend to degrade significantly when node representations are highly similar or when tasks involve long decision horizons. To address the aforementioned limitations, we propose a novel fusion neural network framework that employs a discrete noise graph diffusion model to learn the underlying constraints of vehicle routing problems and generate a constraint assignment matrix. This matrix is subsequently integrated adaptively into the feature representation learning and decision process of the autoregressive solver, serving as a graph structure mask that facilitates the formation of solutions characterized by both global vision and local feature integration. To the best of our knowledge, this work represents the first comprehensive experimental investigation of neural network model solvers across a 378-combinatorial space spanning four distinct dimensions within the CVRPlib public dataset. Extensive experimental evaluations demonstrate that our proposed fusion model effectively captures and leverages problem constraints, achieving state-of-the-art performance across multiple benchmark datasets.

CVOct 28, 2025
TeleEgo: Benchmarking Egocentric AI Assistants in the Wild

Jiaqi Yan, Ruilong Ren, Jingren Liu et al.

Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks typically evaluate these abilities in isolation, lack realistic streaming scenarios, or support only short-term tasks. We introduce \textbf{TeleEgo}, a long-duration, streaming, omni-modal benchmark for evaluating egocentric AI assistants in realistic daily contexts. The dataset features over 14 hours per participant of synchronized egocentric video, audio, and text across four domains: work \& study, lifestyle \& routines, social activities, and outings \& culture. All data is aligned on a unified global timeline and includes high-quality visual narrations and speech transcripts, curated through human refinement.TeleEgo defines 12 diagnostic subtasks across three core capabilities: Memory (recalling past events), Understanding (interpreting the current moment), and Cross-Memory Reasoning (linking distant events). It contains 3,291 human-verified QA items spanning multiple question formats (single-choice, binary, multi-choice, and open-ended), evaluated strictly in a streaming setting. We propose two key metrics -- Real-Time Accuracy and Memory Persistence Time -- to jointly assess correctness, temporal responsiveness, and long-term retention. TeleEgo provides a realistic and comprehensive evaluation to advance the development of practical AI assistants.

LGJul 19, 2025
A Transformer-Based Conditional GAN with Multiple Instance Learning for UAV Signal Detection and Classification

Haochen Liu, Jia Bi, Xiaomin Wang et al.

Unmanned Aerial Vehicles (UAVs) are increasingly used in surveillance, logistics, agriculture, disaster management, and military operations. Accurate detection and classification of UAV flight states, such as hovering, cruising, ascending, or transitioning, which are essential for safe and effective operations. However, conventional time series classification (TSC) methods often lack robustness and generalization for dynamic UAV environments, while state of the art(SOTA) models like Transformers and LSTM based architectures typically require large datasets and entail high computational costs, especially with high-dimensional data streams. This paper proposes a novel framework that integrates a Transformer-based Generative Adversarial Network (GAN) with Multiple Instance Locally Explainable Learning (MILET) to address these challenges in UAV flight state classification. The Transformer encoder captures long-range temporal dependencies and complex telemetry dynamics, while the GAN module augments limited datasets with realistic synthetic samples. MIL is incorporated to focus attention on the most discriminative input segments, reducing noise and computational overhead. Experimental results show that the proposed method achieves superior accuracy 96.5% on the DroneDetect dataset and 98.6% on the DroneRF dataset that outperforming other SOTA approaches. The framework also demonstrates strong computational efficiency and robust generalization across diverse UAV platforms and flight states, highlighting its potential for real-time deployment in resource constrained environments.

LGJun 15, 2025
PDCNet: a benchmark and general deep learning framework for activity prediction of peptide-drug conjugates

Yun Liu, Jintu Huang, Yingying Zhu et al.

Peptide-drug conjugates (PDCs) represent a promising therapeutic avenue for human diseases, particularly in cancer treatment. Systematic elucidation of structure-activity relationships (SARs) and accurate prediction of the activity of PDCs are critical for the rational design and optimization of these conjugates. To this end, we carefully design and construct a benchmark PDCs dataset compiled from literature-derived collections and PDCdb database, and then develop PDCNet, the first unified deep learning framework for forecasting the activity of PDCs. The architecture systematically captures the complex factors underlying anticancer decisions of PDCs in real-word scenarios through a multi-level feature fusion framework that collaboratively characterizes and learns the features of peptides, linkers, and payloads. Leveraging a curated PDCs benchmark dataset, comprehensive evaluation results show that PDCNet demonstrates superior predictive capability, with the highest AUC, F1, MCC and BA scores of 0.9213, 0.7656, 0.7071 and 0.8388 for the test set, outperforming eight established traditional machine learning models. Multi-level validations, including 5-fold cross-validation, threshold testing, ablation studies, model interpretability analysis and external independent testing, further confirm the superiority, robustness, and usability of the PDCNet architecture. We anticipate that PDCNet represents a novel paradigm, incorporating both a benchmark dataset and advanced models, which can accelerate the design and discovery of new PDC-based therapeutic agents.

SYJun 5, 2025
Towards provable probabilistic safety for scalable embodied AI systems

Linxuan He, Qing-Shan Jia, Ang Li et al.

Embodied AI systems, comprising AI models and physical plants, are increasingly prevalent across various applications. Due to the rarity of system failures, ensuring their safety in complex operating environments remains a major challenge, which severely hinders their large-scale deployment in safety-critical domains, such as autonomous vehicles, medical devices, and robotics. While achieving provable deterministic safety--verifying system safety across all possible scenarios--remains theoretically ideal, the rarity and complexity of corner cases make this approach impractical for scalable embodied AI systems. Instead, empirical safety evaluation is employed as an alternative, but the absence of provable guarantees imposes significant limitations. To address these issues, we argue for a paradigm shift to provable probabilistic safety that integrates provable guarantees with progressive achievement toward a probabilistic safety boundary on overall system performance. The new paradigm better leverages statistical methods to enhance feasibility and scalability, and a well-defined probabilistic safety boundary enables embodied AI systems to be deployed at scale. In this Perspective, we outline a roadmap for provable probabilistic safety, along with corresponding challenges and potential solutions. By bridging the gap between theoretical safety assurance and practical deployment, this Perspective offers a pathway toward safer, large-scale adoption of embodied AI systems in safety-critical applications.

LGApr 26, 2025
High-order Graph Neural Networks with Common Neighbor Awareness for Link Prediction

Ling Wang, Minglian Han

Link prediction is a fundamental task in dynamic graph learning (DGL), inherently shaped by the topology of the DG. Recent advancements in dynamic graph neural networks (DGNN), primarily by modeling the relationships among nodes via a message passing scheme, have significantly improved link prediction performance. However, DGNNs heavily rely on the pairwise node interactions, which neglect the common neighbor interaction in DGL. To address this limitation, we propose a High-order Graph Neural Networks with Common Neighbor Awareness (HGNN-CNA) for link prediction with two-fold ideas: a) estimating correlation score by considering multi-hop common neighbors for capturing the complex interaction between nodes; b) fusing the correlation into the message-passing process to consider common neighbor interaction directly in DGL. Experimental results on three real DGs demonstrate that the proposed HGNN-CNA acquires a significant accuracy gain over several state-of-the-art models on the link prediction task.

NEJan 22, 2025
Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization

Xu Yang, Rui Wang, Kaiwen Li et al.

Differential evolution (DE) algorithm is recognized as one of the most effective evolutionary algorithms, demonstrating remarkable efficacy in black-box optimization due to its derivative-free nature. Numerous enhancements to the fundamental DE have been proposed, incorporating innovative mutation strategies and sophisticated parameter tuning techniques to improve performance. However, no single variant has proven universally superior across all problems. To address this challenge, we introduce a novel framework that employs reinforcement learning (RL) to automatically design DE for black-box optimization through meta-learning. RL acts as an advanced meta-optimizer, generating a customized DE configuration that includes an optimal initialization strategy, update rule, and hyperparameters tailored to a specific black-box optimization problem. This process is informed by a detailed analysis of the problem characteristics. In this proof-of-concept study, we utilize a double deep Q-network for implementation, considering a subset of 40 possible strategy combinations and parameter optimizations simultaneously. The framework's performance is evaluated against black-box optimization benchmarks and compared with state-of-the-art algorithms. The experimental results highlight the promising potential of our proposed framework.

LGMay 17, 2023
MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks

Wenfang Sun, Yingjun Du, Xiantong Zhen et al.

Meta-learning algorithms are able to learn a new task using previously learned knowledge, but they often require a large number of meta-training tasks which may not be readily available. To address this issue, we propose a method for few-shot learning with fewer tasks, which we call MetaModulation. The key idea is to use a neural network to increase the density of the meta-training tasks by modulating batch normalization parameters during meta-training. Additionally, we modify parameters at various network levels, rather than just a single layer, to increase task diversity. To account for the uncertainty caused by the limited training tasks, we propose a variational MetaModulation where the modulation parameters are treated as latent variables. We also introduce learning variational feature hierarchies by the variational MetaModulation, which modulates features at all layers and can consider task uncertainty and generate more diverse tasks. The ablation studies illustrate the advantages of utilizing a learnable task modulation at different levels and demonstrate the benefit of incorporating probabilistic variants in few-task meta-learning. Our MetaModulation and its variational variants consistently outperform state-of-the-art alternatives on four few-task meta-learning benchmarks.

LGFeb 24, 2022
Temporal Convolution Domain Adaptation Learning for Crops Growth Prediction

Shengzhe Wang, Ling Wang, Zhihao Lin et al.

Existing Deep Neural Nets on crops growth prediction mostly rely on availability of a large amount of data. In practice, it is difficult to collect enough high-quality data to utilize the full potential of these deep learning models. In this paper, we construct an innovative network architecture based on domain adaptation learning to predict crops growth curves with limited available crop data. This network architecture overcomes the challenge of data availability by incorporating generated data from the developed crops simulation model. We are the first to use the temporal convolution filters as the backbone to construct a domain adaptation network architecture which is suitable for deep learning regression models with very limited training data of the target domain. We conduct experiments to test the performance of the network and compare our proposed architecture with other state-of-the-art methods, including a recent LSTM-based domain adaptation network architecture. The results show that the proposed temporal convolution-based network architecture outperforms all benchmarks not only in accuracy but also in model size and convergence rate.

QMDec 25, 2021
Explainable Artificial Intelligence for Pharmacovigilance: What Features Are Important When Predicting Adverse Outcomes?

Isaac Ronald Ward, Ling Wang, Juan lu et al.

Explainable Artificial Intelligence (XAI) has been identified as a viable method for determining the importance of features when making predictions using Machine Learning (ML) models. In this study, we created models that take an individual's health information (e.g. their drug history and comorbidities) as inputs, and predict the probability that the individual will have an Acute Coronary Syndrome (ACS) adverse outcome. Using XAI, we quantified the contribution that specific drugs had on these ACS predictions, thus creating an XAI-based technique for pharmacovigilance monitoring, using ACS as an example of the adverse outcome to detect. Individuals aged over 65 who were supplied Musculo-skeletal system (anatomical therapeutic chemical (ATC) class M) or Cardiovascular system (ATC class C) drugs between 1993 and 2009 were identified, and their drug histories, comorbidities, and other key features were extracted from linked Western Australian datasets. Multiple ML models were trained to predict if these individuals would have an ACS related adverse outcome (i.e., death or hospitalisation with a discharge diagnosis of ACS), and a variety of ML and XAI techniques were used to calculate which features -- specifically which drugs -- led to these predictions. The drug dispensing features for rofecoxib and celecoxib were found to have a greater than zero contribution to ACS related adverse outcome predictions (on average), and it was found that ACS related adverse outcomes can be predicted with 72% accuracy. Furthermore, the XAI libraries LIME and SHAP were found to successfully identify both important and unimportant features, with SHAP slightly outperforming LIME. ML models trained on linked administrative health datasets in tandem with XAI algorithms can successfully quantify feature importance, and with further development, could potentially be used as pharmacovigilance monitoring techniques.

LGJul 26, 2021
Trade When Opportunity Comes: Price Movement Forecasting via Locality-Aware Attention and Iterative Refinement Labeling

Liang Zeng, Lei Wang, Hui Niu et al.

Price movement forecasting, aimed at predicting financial asset trends based on current market information, has achieved promising advancements through machine learning (ML) methods. Most existing ML methods, however, struggle with the extremely low signal-to-noise ratio and stochastic nature of financial data, often mistaking noises for real trading signals without careful selection of potentially profitable samples. To address this issue, we propose LARA, a novel price movement forecasting framework with two main components: Locality-Aware Attention (LA-Attention) and Iterative Refinement Labeling (RA-Labeling). (1) LA-Attention, enhanced by metric learning techniques, automatically extracts the potentially profitable samples through masked attention scheme and task-specific distance metrics. (2) RA-Labeling further iteratively refines the noisy labels of potentially profitable samples, and combines the learned predictors robust to the unseen and noisy samples. In a set of experiments on three real-world financial markets: stocks, cryptocurrencies, and ETFs, LARA significantly outperforms several machine learning based methods on the Qlib quantitative investment platform. Extensive ablation studies confirm LARA's superior ability in capturing more reliable trading opportunities.

CVApr 16, 2021
LAI Estimation of Cucumber Crop Based on Improved Fully Convolutional Network

Weiqi Shu, Ling Wang, Bolong Liu et al.

LAI (Leaf Area Index) is of great importance for crop yield estimation in agronomy. It is directly related to plant growth status, net assimilation rate, plant photosynthesis, and carbon dioxide in the environment. How to measure LAI accurately and efficiently is the key to the crop yield estimation problem. Manual measurement consumes a lot of human resources and material resources. Remote sensing technology is not suitable for near-Earth LAI measurement. Besides, methods based on traditional digital image processing are greatly affected by environmental noise and image exposure. Nowadays, deep learning is widely used in many fields. The improved FCN (Fully Convolutional Network) is proposed in our study for LAI measure task. Eighty-two cucumber images collected from our greenhouse are labeled to fine-tuning the pre-trained model. The result shows that the improved FCN model performs well on our dataset. Our method's mean IoU can reach 0.908, which is 11% better than conventional methods and 4.7% better than the basic FCN model.

CVFeb 3, 2021
A Deep Learning-Based Approach to Extracting Periosteal and Endosteal Contours of Proximal Femur in Quantitative CT Images

Yu Deng, Ling Wang, Chen Zhao et al.

Automatic CT segmentation of proximal femur is crucial for the diagnosis and risk stratification of orthopedic diseases; however, current methods for the femur CT segmentation mainly rely on manual interactive segmentation, which is time-consuming and has limitations in both accuracy and reproducibility. In this study, we proposed an approach based on deep learning for the automatic extraction of the periosteal and endosteal contours of proximal femur in order to differentiate cortical and trabecular bone compartments. A three-dimensional (3D) end-to-end fully convolutional neural network, which can better combine the information between neighbor slices and get more accurate segmentation results, was developed for our segmentation task. 100 subjects aged from 50 to 87 years with 24,399 slices of proximal femur CT images were enrolled in this study. The separation of cortical and trabecular bone derived from the QCT software MIAF-Femur was used as the segmentation reference. We randomly divided the whole dataset into a training set with 85 subjects for 10-fold cross-validation and a test set with 15 subjects for evaluating the performance of models. Two models with the same network structures were trained and they achieved a dice similarity coefficient (DSC) of 97.87% and 96.49% for the periosteal and endosteal contours, respectively. To verify the excellent performance of our model for femoral segmentation, we measured the volume of different parts of the femur and compared it with the ground truth and the relative errors between predicted result and ground truth are all less than 5%. It demonstrated a strong potential for clinical use, including the hip fracture risk prediction and finite element analysis.

CROct 15, 2020
Progressive Defense Against Adversarial Attacks for Deep Learning as a Service in Internet of Things

Ling Wang, Cheng Zhang, Zejian Luo et al.

Nowadays, Deep Learning as a service can be deployed in Internet of Things (IoT) to provide smart services and sensor data processing. However, recent research has revealed that some Deep Neural Networks (DNN) can be easily misled by adding relatively small but adversarial perturbations to the input (e.g., pixel mutation in input images). One challenge in defending DNN against these attacks is to efficiently identifying and filtering out the adversarial pixels. The state-of-the-art defense strategies with good robustness often require additional model training for specific attacks. To reduce the computational cost without loss of generality, we present a defense strategy called a progressive defense against adversarial attacks (PDAAA) for efficiently and effectively filtering out the adversarial pixel mutations, which could mislead the neural network towards erroneous outputs, without a-priori knowledge about the attack type. We evaluated our progressive defense strategy against various attack methods on two well-known datasets. The result shows it outperforms the state-of-the-art while reducing the cost of model training by 50% on average.

NEAug 2, 2020
Balancing Common Treatment and Epidemic Control in Medical Procurement during COVID-19: Transform-and-Divide Evolutionary Optimization

Yu-Jun Zheng, Xin Chen, Tie-Er Gan et al.

Balancing common disease treatment and epidemic control is a key objective of medical supplies procurement in hospitals during a pandemic such as COVID-19. This problem can be formulated as a bi-objective optimization problem for simultaneously optimizing the effects of common disease treatment and epidemic control. However, due to the large number of supplies, difficulties in evaluating the effects, and the strict budget constraint, it is difficult for existing evolutionary multiobjective algorithms to efficiently approximate the Pareto front of the problem. In this paper, we present an approach that first transforms the original high-dimensional, constrained multiobjective optimization problem to a low-dimensional, unconstrained multiobjective optimization problem, and then evaluates each solution to the transformed problem by solving a set of simple single-objective optimization subproblems, such that the problem can be efficiently solved by existing evolutionary multiobjective algorithms. We applied the transform-and-divide evolutionary optimization approach to six hospitals in Zhejiang Province, China, during the peak of COVID-19. Results showed that the proposed approach exhibits significantly better performance than that of directly solving the original problem. Our study has also shown that transform-and-divide evolutionary optimization based on problem-specific knowledge can be an efficient solution approach to many other complex problems and, therefore, enlarge the application field of evolutionary algorithms.

LGDec 1, 2019
Model Embedded DRL for Intelligent Greenhouse Control

Tinghao Zhang, Jingxu Li, Jingfeng Li et al.

Greenhouse environment is the key to influence crops production. However, it is difficult for classical control methods to give precise environment setpoints, such as temperature, humidity, light intensity and carbon dioxide concentration for greenhouse because it is uncertain nonlinear system. Therefore, an intelligent close loop control framework based on model embedded deep reinforcement learning (MEDRL) is designed for greenhouse environment control. Specifically, computer vision algorithms are used to recognize growing periods and sex of crops, followed by the crop growth models, which can be trained with different growing periods and sex. These model outputs combined with the cost factor provide the setpoints for greenhouse and feedback to the control system in real-time. The whole MEDRL system has capability to conduct optimization control precisely and conveniently, and costs will be greatly reduced compared with traditional greenhouse control approaches.

CHEM-PHAug 2, 2019
Retrosynthesis with Attention-Based NMT Model and Chemical Analysis of the "Wrong" Predictions

Hongliang Duan, Ling Wang, Chengyun Zhang et al.

We cast retrosynthesis as a machine translation problem by introducing a special Tensor2Tensor, an entire attention-based and fully data-driven model. Given a data set comprising about 50,000 diverse reactions extracted from USPTO patents, the model significantly outperforms seq2seq model (34.7%) on a top-1 accuracy by achieving 54.1%. For yielding better results, parameters such as batch size and training time are thoroughly investigated to train the model. Additionally, we offer a novel insight into the causes of grammatically invalid SMILES, and conduct a test in which experienced chemists pick out and analyze the "wrong" predictions that may be chemically plausible but differ from the ground truth. Actually, the effectiveness of our model is un-derestimated and the "true" top-1 accuracy can reach to 64.6%.

LGNov 9, 2018
Sample-Efficient Policy Learning based on Completely Behavior Cloning

Qiming Zou, Ling Wang, Ke Lu et al.

Direct policy search is one of the most important algorithm of reinforcement learning. However, learning from scratch needs a large amount of experience data and can be easily prone to poor local optima. In addition to that, a partially trained policy tends to perform dangerous action to agent and environment. In order to overcome these challenges, this paper proposed a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. By this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help agent learn at the high reward state region, and converge faster and better.

LGJun 18, 2012
Total Variation and Euler's Elastica for Supervised Learning

Tong Lin, Hanlin Xue, Ling Wang et al.

In recent years, total variation (TV) and Euler's elastica (EE) have been successfully applied to image processing tasks such as denoising and inpainting. This paper investigates how to extend TV and EE to the supervised learning settings on high dimensional data. The supervised learning problem can be formulated as an energy functional minimization under Tikhonov regularization scheme, where the energy is composed of a squared loss and a total variation smoothing (or Euler's elastica smoothing). Its solution via variational principles leads to an Euler-Lagrange PDE. However, the PDE is always high-dimensional and cannot be directly solved by common methods. Instead, radial basis functions are utilized to approximate the target function, reducing the problem to finding the linear coefficients of basis functions. We apply the proposed methods to supervised learning tasks (including binary classification, multi-class classification, and regression) on benchmark data sets. Extensive experiments have demonstrated promising results of the proposed methods.