Peng Peng

h-index11

28papers

831citations

Novelty48%

AI Score52

Ranked #33,358 of 205,806 authors (top 16%)#7,637 in LG (top 18%)

28 Papers

LGFeb 13, 2023

An Order-Invariant and Interpretable Hierarchical Dilated Convolution Neural Network for Chemical Fault Detection and Diagnosis

Mengxuan Li, Peng Peng, Min Wang et al.

Fault detection and diagnosis is significant for reducing maintenance costs and improving health and safety in chemical processes. Convolution neural network (CNN) is a popular deep learning algorithm with many successful applications in chemical fault detection and diagnosis tasks. However, convolution layers in CNN are very sensitive to the order of features, which can lead to instability in the processing of tabular data. Optimal order of features result in better performance of CNN models but it is expensive to seek such optimal order. In addition, because of the encapsulation mechanism of feature extraction, most CNN models are opaque and have poor interpretability, thus failing to identify root-cause features without human supervision. These difficulties inevitably limit the performance and credibility of CNN methods. In this paper, we propose an order-invariant and interpretable hierarchical dilated convolution neural network (HDLCNN), which is composed by feature clustering, dilated convolution and the shapley additive explanations (SHAP) method. The novelty of HDLCNN lies in its capability of processing tabular data with features of arbitrary order without seeking the optimal order, due to the ability to agglomerate correlated features of feature clustering and the large receptive field of dilated convolution. Then, the proposed method provides interpretability by including the SHAP values to quantify feature contribution. Therefore, the root-cause features can be identified as the features with the highest contribution. Computational experiments are conducted on the Tennessee Eastman chemical process benchmark dataset. Compared with the other methods, the proposed HDLCNN-SHAP method achieves better performance on processing tabular data with features of arbitrary order, detecting faults, and identifying the root-cause features.

LGFeb 3, 2023

SCCAM: Supervised Contrastive Convolutional Attention Mechanism for Ante-hoc Interpretable Fault Diagnosis with Limited Fault Samples

Mengxuan Li, Peng Peng, Jingxin Zhang et al.

In real industrial processes, fault diagnosis methods are required to learn from limited fault samples since the procedures are mainly under normal conditions and the faults rarely occur. Although attention mechanisms have become popular in the field of fault diagnosis, the existing attention-based methods are still unsatisfying for the above practical applications. First, pure attention-based architectures like transformers need a large number of fault samples to offset the lack of inductive biases thus performing poorly under limited fault samples. Moreover, the poor fault classification dilemma further leads to the failure of the existing attention-based methods to identify the root causes. To address the aforementioned issues, we innovatively propose a supervised contrastive convolutional attention mechanism (SCCAM) with ante-hoc interpretability, which solves the root cause analysis problem under limited fault samples for the first time. The proposed SCCAM method is tested on a continuous stirred tank heater and the Tennessee Eastman industrial process benchmark. Three common fault diagnosis scenarios are covered, including a balanced scenario for additional verification and two scenarios with limited fault samples (i.e., imbalanced scenario and long-tail scenario). The comprehensive results demonstrate that the proposed SCCAM method can achieve better performance compared with the state-of-the-art methods on fault classification and root cause analysis.

CVDec 21, 2022

UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering

Chenlu Zhan, Peng Peng, Hongsen Wang et al.

Medical Visual Question Answering (Medical-VQA) aims to to answer clinical questions regarding radiology images, assisting doctors with decision-making options. Nevertheless, current Medical-VQA models learn cross-modal representations through residing vision and texture encoders in dual separate spaces, which lead to indirect semantic alignment. In this paper, we propose UnICLAM, a Unified and Interpretable Medical-VQA model through Contrastive Representation Learning with Adversarial Masking. Specifically, to learn an aligned image-text representation, we first establish a unified dual-stream pre-training structure with the gradually soft-parameter sharing strategy. Technically, the proposed strategy learns a constraint for the vision and texture encoders to be close in a same space, which is gradually loosened as the higher number of layers. Moreover, for grasping the unified semantic representation, we extend the adversarial masking data augmentation to the contrastive representation learning of vision and text in a unified manner. Concretely, while the encoder training minimizes the distance between original and masking samples, the adversarial masking module keeps adversarial learning to conversely maximize the distance. Furthermore, we also intuitively take a further exploration to the unified adversarial masking augmentation model, which improves the potential ante-hoc interpretability with remarkable performance and efficiency. Experimental results on VQA-RAD and SLAKE public benchmarks demonstrate that UnICLAM outperforms existing 11 state-of-the-art Medical-VQA models. More importantly, we make an additional discussion about the performance of UnICLAM in diagnosing heart failure, verifying that UnICLAM exhibits superior few-shot adaption performance in practical disease diagnosis.

LGFeb 12, 2023

SCLIFD:Supervised Contrastive Knowledge Distillation for Incremental Fault Diagnosis under Limited Fault Data

Peng Peng, Hanrong Zhang, Mengxuan Li et al.

Intelligent fault diagnosis has made extraordinary advancements currently. Nonetheless, few works tackle class-incremental learning for fault diagnosis under limited fault data, i.e., imbalanced and long-tailed fault diagnosis, which brings about various notable challenges. Initially, it is difficult to extract discriminative features from limited fault data. Moreover, a well-trained model must be retrained from scratch to classify the samples from new classes, thus causing a high computational burden and time consumption. Furthermore, the model may suffer from catastrophic forgetting when trained incrementally. Finally, the model decision is biased toward the new classes due to the class imbalance. The problems can consequently lead to performance degradation of fault diagnosis models. Accordingly, we introduce a supervised contrastive knowledge distillation for incremental fault diagnosis under limited fault data (SCLIFD) framework to address these issues, which extends the classical incremental classifier and representation learning (iCaRL) framework from three perspectives. Primarily, we adopt supervised contrastive knowledge distillation (KD) to enhance its representation learning capability under limited fault data. Moreover, we propose a novel prioritized exemplar selection method adaptive herding (AdaHerding) to restrict the increase of the computational burden, which is also combined with KD to alleviate catastrophic forgetting. Additionally, we adopt the cosine classifier to mitigate the adverse impact of class imbalance. We conduct extensive experiments on simulated and real-world industrial processes under different imbalance ratios. Experimental results show that our SCLIFD outperforms the existing methods by a large margin.

AIJun 27, 2023

Generalized Out-of-distribution Fault Diagnosis (GOOFD) via Internal Contrastive Learning

Xingyue Wang, Hanrong Zhang, Xinlong Qiao et al.

Fault diagnosis is crucial in monitoring machines within industrial processes. With the increasing complexity of working conditions and demand for safety during production, diverse diagnosis methods are required, and an integrated fault diagnosis system capable of handling multiple tasks is highly desired. However, the diagnosis subtasks are often studied separately, and the current methods still need improvement for such a generalized system. To address this issue, we propose the Generalized Out-of-distribution Fault Diagnosis (GOOFD) framework to integrate diagnosis subtasks. Additionally, a unified fault diagnosis method based on internal contrastive learning and Mahalanobis distance is put forward to underpin the proposed generalized framework. The method involves feature extraction through internal contrastive learning and outlier recognition based on the Mahalanobis distance. Our proposed method can be applied to multiple faults diagnosis tasks and achieve better performance than the existing single-task methods. Experiments are conducted on benchmark and practical process datasets, indicating the effectiveness of the proposed framework.

LGJun 26, 2023

Hard Sample Mining Enabled Supervised Contrastive Feature Learning for Wind Turbine Pitch System Fault Diagnosis

Zixuan Wang, Bo Qin, Mengxuan Li et al.

The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple health conditions in the pitch system due to the long-term wear and tear poses challenges in accurately classifying them, thus increasing the maintenance cost of wind turbines or even damaging them. This paper proposes a novel method based on hard sample mining-enabled supervised contrastive learning (HSMSCL) to address this problem. The proposed method employs cosine similarity to identify hard samples and subsequently, leverages supervised contrastive learning to learn more discriminative representations by constructing hard sample pairs. Furthermore, the hard sample mining framework in the proposed method also constructs hard samples with learned representations to make the training process of the multilayer perceptron (MLP) more challenging and make it a more effective classifier. The proposed approach progressively improves the fault diagnosis model by introducing hard samples in the SCL and MLP phases, thus enhancing its performance in complex multi-class fault diagnosis tasks. To evaluate the effectiveness of the proposed method, two real datasets comprising wind turbine pitch system cog belt fracture data are utilized. The fault diagnosis performance of the proposed method is compared against existing methods, and the results demonstrate its superior performance. The proposed approach exhibits significant improvements in fault diagnosis performance, providing promising prospects for enhancing the reliability and efficiency of wind turbine pitch system fault diagnosis.

LGOct 19, 2022

Supervised Contrastive Learning with Tree-Structured Parzen Estimator Bayesian Optimization for Imbalanced Tabular Data

Shuting Tao, Peng Peng, Qi Li et al.

Class imbalance has a detrimental effect on the predictive performance of most supervised learning algorithms as the imbalanced distribution can lead to a bias preferring the majority class. To solve this problem, we propose a Supervised Contrastive Learning (SCL) method with Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets. Contrastive learning (CL) can extract the information hidden in data even without labels and has shown some potential for imbalanced learning tasks. SCL further considers the label information based on CL, which also addresses the insufficient data augmentation techniques of tabular data. Therefore, in this work, we propose to use SCL to learn a discriminative representation of imbalanced tabular data. Additionally, the hyper-parameter temperature of SCL has a decisive influence on the performance and is difficult to tune. We introduce TPE, a well-known Bayesian optimization technique, to automatically select the best temperature. Experiments are conducted on both binary and multi-class imbalanced tabular datasets. As shown in the results obtained, TPE outperforms three other hyper-parameter optimization (HPO) methods such as grid search, random search, and genetic algorithm. More importantly, the proposed SCL-TPE method achieves much-improved performance compared with the state-of-the-art methods.

82.8IRApr 22Code

HaS: Accelerating RAG through Homology-Aware Speculative Retrieval

Peng Peng, Weiwei Lin, Wentai Wu et al.

Retrieval-Augmented Generation (RAG) expands the knowledge boundary of large language models (LLMs) at inference by retrieving external documents as context. However, retrieval becomes increasingly time-consuming as the knowledge databases grow in size. Existing acceleration strategies either compromise accuracy through approximate retrieval, or achieve marginal gains by reusing results of strictly identical queries. We propose HaS, a homology-aware speculative retrieval framework that performs low-latency speculative retrieval over restricted scopes to obtain candidate documents, followed by validating whether they contain the required knowledge. The validation, grounded in the homology relation between queries, is formulated as a homologous query re-identification task: once a previously observed query is identified as a homologous re-encounter of the incoming query, the draft is deemed acceptable, allowing the system to bypass slow full-database retrieval. Benefiting from the prevalence of homologous queries under real-world popularity patterns, HaS achieves substantial efficiency gains. Extensive experiments demonstrate that HaS reduces retrieval latency by 23.74% and 36.99% across datasets with only a 1-2% marginal accuracy drop. As a plug-and-play solution, HaS also significantly accelerates complex multi-hop queries in modern agentic RAG pipelines. Source code is available at: https://github.com/ErrEqualsNil/HaS.

LGJan 16, 2025Code

Class Incremental Fault Diagnosis under Limited Fault Data via Supervised Contrastive Knowledge Distillation

Hanrong Zhang, Yifei Yao, Zixuan Wang et al.

Class-incremental fault diagnosis requires a model to adapt to new fault classes while retaining previous knowledge. However, limited research exists for imbalanced and long-tailed data. Extracting discriminative features from few-shot fault data is challenging, and adding new fault classes often demands costly model retraining. Moreover, incremental training of existing methods risks catastrophic forgetting, and severe class imbalance can bias the model's decisions toward normal classes. To tackle these issues, we introduce a Supervised Contrastive knowledge distiLlation for class Incremental Fault Diagnosis (SCLIFD) framework proposing supervised contrastive knowledge distillation for improved representation learning capability and less forgetting, a novel prioritized exemplar selection method for sample replay to alleviate catastrophic forgetting, and the Random Forest Classifier to address the class imbalance. Extensive experimentation on simulated and real-world industrial datasets across various imbalance ratios demonstrates the superiority of SCLIFD over existing approaches. Our code can be found at https://github.com/Zhang-Henry/SCLIFD_TII.

AIJun 22, 2022

Evolutionary Game-Theoretical Analysis for General Multiplayer Asymmetric Games

Xinyu Zhang, Peng Peng, Yushan Zhou et al.

Evolutionary game theory has been a successful tool to combine classical game theory with learning-dynamical descriptions in multiagent systems. Provided some symmetric structures of interacting players, many studies have been focused on using a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even for the state-of-the-art method, there are two limits. First, there is inaccuracy when analysing the simplified payoff table. Second, no existing work is able to deal with 2-population multiplayer asymmetric games. In this paper, we fill the gap between heuristic payoff table and dynamic analysis without any inaccuracy. In addition, we propose a general framework for $m$ versus $n$ 2-population multiplayer asymmetric games. Then, we compare our method with the state-of-the-art in some classic games. Finally, to illustrate our method, we perform empirical game-theoretical analysis on Wolfpack as well as StarCraft II, both of which involve complex multiagent interactions.

CVApr 29, 2021Code

Thermal Infrared Image Colorization for Nighttime Driving Scenes with Top-Down Guided Attention

Fuya Luo, Yunhan Li, Guang Zeng et al.

Benefitting from insensitivity to light and high penetration of foggy environments, infrared cameras are widely used for sensing in nighttime traffic scenes. However, the low contrast and lack of chromaticity of thermal infrared (TIR) images hinder the human interpretation and portability of high-level computer vision algorithms. Colorization to translate a nighttime TIR image into a daytime color (NTIR2DC) image may be a promising way to facilitate nighttime scene perception. Despite recent impressive advances in image translation, semantic encoding entanglement and geometric distortion in the NTIR2DC task remain under-addressed. Hence, we propose a toP-down attEntion And gRadient aLignment based GAN, referred to as PearlGAN. A top-down guided attention module and an elaborate attentional loss are first designed to reduce the semantic encoding ambiguity during translation. Then, a structured gradient alignment loss is introduced to encourage edge consistency between the translated and input images. In addition, pixel-level annotation is carried out on a subset of FLIR and KAIST datasets to evaluate the semantic preservation performance of multiple translation methods. Furthermore, a new metric is devised to evaluate the geometric consistency in the translation process. Extensive experiments demonstrate the superiority of the proposed PearlGAN over other image translation methods for the NTIR2DC task. The source code and labeled segmentation masks will be available at \url{https://github.com/FuyaLuo/PearlGAN/}.

40.4DBMar 11

MCI-SQL: Text-to-SQL with Metadata-Complete Context and Intermediate Correction

Qin Wang, Youhuan Li, Suixi Lin et al.

Text-to-SQL aims to translate natural language queries into SQL statements. Existing methods typically follow a pipeline of pre-processing, schema linking, candidate SQL generation, SQL alignment, and target SQL selection. However, these methods face significant challenges. First, they often struggle with column filtering during schema linking due to difficulties in comprehending raw metadata. Also, the candidate SQL generation process often suffers from reasoning errors, which limits accuracy improvements. To address these limitations, we propose a framework, called MCI-SQL, to efficiently and precisely generate SQL queries. Specifically, we assign metadata-complete contexts to each column, which significantly improves the accuracy of column filtering for schema linking. Also, for candidate SQL generation, we propose an intermediate correction mechanism that validates SQL queries and revises errors in a timely way. Moreover, we also propose effective optimizations in subsequent SQL alignment and selection phases, which further enhance the performance. Experiments on the widely-used BIRD benchmark show that MCI-SQL achieves execution accuracy of 74.45% on the development set and 76.41% on the test set, surpassing current published state-of-the-art results. In addition, we manually identify and correct 412 samples in the BIRD dataset, forming a new version named BIRD-clear, which is released together with our code on GitHub. We also evaluate our methods on BIRD-clear and find that MCI-SQL outperforms baselines by 8.47 percentage points in execution accuracy, further demonstrating the effectiveness and reliability of our framework.

CVNov 14, 2025

PROMISE: Prompt-Attentive Hierarchical Contrastive Learning for Robust Cross-Modal Representation with Missing Modalities

Jiajun Chen, Sai Cheng, Yutao Yuan et al.

Multimodal models integrating natural language and visual information have substantially improved generalization of representation models. However, their effectiveness significantly declines in real-world situations where certain modalities are missing or unavailable. This degradation primarily stems from inconsistent representation learning between complete multimodal data and incomplete modality scenarios. Existing approaches typically address missing modalities through relatively simplistic generation methods, yet these approaches fail to adequately preserve cross-modal consistency, leading to suboptimal performance. To overcome this limitation, we propose a novel multimodal framework named PROMISE, a PROMpting-Attentive HIerarchical ContraStive LEarning approach designed explicitly for robust cross-modal representation under conditions of missing modalities. Specifically, PROMISE innovatively incorporates multimodal prompt learning into a hierarchical contrastive learning framework, equipped with a specially designed prompt-attention mechanism. This mechanism dynamically generates robust and consistent representations for scenarios where particular modalities are absent, thereby effectively bridging the representational gap between complete and incomplete data. Extensive experiments conducted on benchmark datasets, along with comprehensive ablation studies, clearly demonstrate the superior performance of PROMISE compared to current state-of-the-art multimodal methods.

LGMay 13, 2025

FedRS-Bench: Realistic Federated Learning Datasets and Benchmarks in Remote Sensing

Haodong Zhao, Peng Peng, Chiyu Chen et al.

Remote sensing (RS) images are usually produced at an unprecedented scale, yet they are geographically and institutionally distributed, making centralized model training challenging due to data-sharing restrictions and privacy concerns. Federated learning (FL) offers a solution by enabling collaborative model training across decentralized RS data sources without exposing raw data. However, there lacks a realistic federated dataset and benchmark in RS. Prior works typically rely on manually partitioned single dataset, which fail to capture the heterogeneity and scale of real-world RS data, and often use inconsistent experimental setups, hindering fair comparison. To address this gap, we propose a realistic federated RS dataset, termed FedRS. FedRS consists of eight datasets that cover various sensors and resolutions and builds 135 clients, which is representative of realistic operational scenarios. Data for each client come from the same source, exhibiting authentic federated properties such as skewed label distributions, imbalanced client data volumes, and domain heterogeneity across clients. These characteristics reflect practical challenges in federated RS and support evaluation of FL methods at scale. Based on FedRS, we implement 10 baseline FL algorithms and evaluation metrics to construct the comprehensive FedRS-Bench. The experimental results demonstrate that FL can consistently improve model performance over training on isolated data silos, while revealing performance trade-offs of different methods under varying client heterogeneity and availability conditions. We hope FedRS-Bench will accelerate research on large-scale, realistic FL in RS by providing a standardized, rich testbed and facilitating fair comparisons across future works. The source codes and dataset are available at https://fedrs-bench.github.io/.

AIAug 3, 2025

SURE-Med: Systematic Uncertainty Reduction for Enhanced Reliability in Medical Report Generation

Yuhang Gu, Xingyu Hu, Yuyu Fan et al.

Automated medical report generation (MRG) holds great promise for reducing the heavy workload of radiologists. However, its clinical deployment is hindered by three major sources of uncertainty. First, visual uncertainty, caused by noisy or incorrect view annotations, compromises feature extraction. Second, label distribution uncertainty, stemming from long-tailed disease prevalence, biases models against rare but clinically critical conditions. Third, contextual uncertainty, introduced by unverified historical reports, often leads to factual hallucinations. These challenges collectively limit the reliability and clinical trustworthiness of MRG systems. To address these issues, we propose SURE-Med, a unified framework that systematically reduces uncertainty across three critical dimensions: visual, distributional, and contextual. To mitigate visual uncertainty, a Frontal-Aware View Repair Resampling module corrects view annotation errors and adaptively selects informative features from supplementary views. To tackle label distribution uncertainty, we introduce a Token Sensitive Learning objective that enhances the modeling of critical diagnostic sentences while reweighting underrepresented diagnostic terms, thereby improving sensitivity to infrequent conditions. To reduce contextual uncertainty, our Contextual Evidence Filter validates and selectively incorporates prior information that aligns with the current image, effectively suppressing hallucinations. Extensive experiments on the MIMIC-CXR and IU-Xray benchmarks demonstrate that SURE-Med achieves state-of-the-art performance. By holistically reducing uncertainty across multiple input modalities, SURE-Med sets a new benchmark for reliability in medical report generation and offers a robust step toward trustworthy clinical decision support.

ROFeb 27, 2022

Obstacle Avoidance of Resilient UAV Swarm Formation with Active Sensing System in the Dense Environment

Peng Peng, Wei Dong, Gang Chen et al.

This paper proposes a perception-shared and swarm trajectory global optimal (STGO) algorithm fused UAVs formation motion planning framework aided by an active sensing system. First, the point cloud received by each UAV is fit by the gaussian mixture model (GMM) and transmitted in the swarm. Resampling from the received GMM contributes to a global map, which is used as the foundation for consensus. Second, to improve flight safety, an active sensing system is designed to plan the observation angle of each UAV considering the unknown field, overlap of the field of view (FOV), velocity direction and smoothness of yaw rotation, and this planning problem is solved by the distributed particle swarm optimization (DPSO) algorithm. Last, for the formation motion planning, to ensure obstacle avoidance, the formation structure is allowed for affine transformation and is treated as the soft constraint on the control points of the B-spline. Besides, the STGO is introduced to avoid local minima. The combination of GMM communication and STGO guarantees a safe and strict consensus between UAVs. Tests on different formations in the simulation show that our algorithm can contribute to a strict consensus and has a success rate of at least 80% for obstacle avoidance in a dense environment. Besides, the active sensing system can increase the success rate of obstacle avoidance from 50% to 100% in some scenarios.

ROFeb 13, 2022

Continuous Occupancy Mapping in Dynamic Environments Using Particles

Gang Chen, Wei Dong, Peng Peng et al.

Particle-based dynamic occupancy maps were proposed in recent years to model the obstacles in dynamic environments. Current particle-based maps describe the occupancy status in discrete grid form and suffer from the grid size problem, wherein a large grid size is unfavorable for motion planning, while a small grid size lowers efficiency and causes gaps and inconsistencies. To tackle this problem, this paper generalizes the particle-based map into continuous space and builds an efficient 3D egocentric local map. A dual-structure subspace division paradigm, composed of a voxel subspace division and a novel pyramid-like subspace division, is proposed to propagate particles and update the map efficiently with the consideration of occlusions. The occupancy status of an arbitrary point in the map space can then be estimated with the particles' weights. To further enhance the performance of simultaneously modeling static and dynamic obstacles and minimize noise, an initial velocity estimation approach and a mixture model are utilized. Experimental results show that our map can effectively and efficiently model both dynamic obstacles and static obstacles. Compared to the state-of-the-art grid-form particle-based map, our map enables continuous occupancy estimation and substantially improves the performance in different resolutions.

ROJan 17, 2022

Risk-aware Trajectory Sampling for Quadrotor Obstacle Avoidance in Dynamic Environments

Gang Chen, Peng Peng, Peihan Zhang et al.

Obstacle avoidance of quadrotors in dynamic environments is still a very open problem. Current works commonly leverage traditional static maps to represent static obstacles and the detection and tracking of moving objects (DATMO) method to model dynamic obstacles separately. The detection module requires pre-training, and the dynamic obstacles can only be modeled with certain shapes, such as cylinders or ellipsoids. This work utilizes the dual-structure particle-based (DSP) dynamic occupancy map to represent the arbitrary-shaped static obstacles and dynamic obstacles simultaneously, and proposes an efficient risk-aware sampling-based local trajectory planner to realize safe flights in this map. The trajectory is planned by sampling motion primitives generated in the state space. Each motion primitive is divided into two phases: a short-term phase with a strict risk limitation and a relatively long-term phase designed to avoid high-risk regions. The risk is evaluated with the predicted particle-form future occupancy status, considering the time dimension. With an approach to split from and merge to an arbitrary global trajectory, the planner can also be used in the tasks with preplanned global trajectories. Comparison experiments show that the obstacle avoidance system composed of the DSP map and our planner performs the best in dynamic environments. In real-world tests, our quadrotor reaches a speed of 6 m/s with the motion capture system and 2.5 m/s with everything running on a low-price single-board computer.

CVNov 22, 2021

Learning Generalized Visual Odometry Using Position-Aware Optical Flow and Geometric Bundle Adjustment

Yijun Cao, Xianshi Zhang, Fuya Luo et al.

Recent visual odometry (VO) methods incorporating geometric algorithm into deep-learning architecture have shown outstanding performance on the challenging monocular VO task. Despite encouraging results are shown, previous methods ignore the requirement of generalization capability under noisy environment and various scenes. To address this challenging issue, this work first proposes a novel optical flow network (PANet). Compared with previous methods that predict optical flow as a direct regression task, our PANet computes optical flow by predicting it into the discrete position space with optical flow probability volume, and then converting it to optical flow. Next, we improve the bundle adjustment module to fit the self-supervised training pipeline by introducing multiple sampling, ego-motion initialization, dynamic damping factor adjustment, and Jacobi matrix weighting. In addition, a novel normalized photometric loss function is advanced to improve the depth estimation accuracy. The experiments show that the proposed system not only achieves comparable performance with other state-of-the-art self-supervised learning-based methods on the KITTI dataset, but also significantly improves the generalization capability compared with geometry-based, learning-based and hybrid VO systems on the noisy KITTI and the challenging outdoor (KAIST) scenes.

LGDec 24, 2020

SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II

Xiangjun Wang, Junxiao Song, Penghui Qi et al.

AlphaStar, the AI that reaches GrandMaster level in StarCraft II, is a remarkable milestone demonstrating what deep reinforcement learning can achieve in complex Real-Time Strategy (RTS) games. However, the complexities of the game, algorithms and systems, and especially the tremendous amount of computation needed are big obstacles for the community to conduct further research in this direction. We propose a deep reinforcement learning agent, StarCraft Commander (SCC). With order of magnitude less computation, it demonstrates top human performance defeating GrandMaster players in test matches and top professional players in a live event. Moreover, it shows strong robustness to various human strategies and discovers novel strategies unseen from human plays. In this paper, we will share the key insights and optimizations on efficient imitation learning and reinforcement learning for StarCraft II full game.

CVDec 1, 2020

A Unified Structure for Efficient RGB and RGB-D Salient Object Detection

Peng Peng, Yong-Jie Li

Salient object detection (SOD) has been well studied in recent years, especially using deep neural networks. However, SOD with RGB and RGB-D images is usually treated as two different tasks with different network structures that need to be designed specifically. In this paper, we proposed a unified and efficient structure with a cross-attention context extraction (CRACE) module to address both tasks of SOD efficiently. The proposed CRACE module receives and appropriately fuses two (for RGB SOD) or three (for RGB-D SOD) inputs. The simple unified feature pyramid network (FPN)-like structure with CRACE modules conveys and refines the results under the multi-level supervisions of saliency and boundaries. The proposed structure is simple yet effective; the rich context information of RGB and depth can be appropriately extracted and fused by the proposed structure efficiently. Experimental results show that our method outperforms other state-of-the-art methods in both RGB and RGB-D SOD tasks on various datasets and in terms of most metrics.

LGDec 1, 2020

How to fine-tune deep neural networks in few-shot learning?

Peng Peng, Jiugen Wang

Deep learning has been widely used in data-intensive applications. However, training a deep neural network often requires a large data set. When there is not enough data available for training, the performance of deep learning models is even worse than that of shallow networks. It has been proved that few-shot learning can generalize to new tasks with few training samples. Fine-tuning of a deep model is simple and effective few-shot learning method. However, how to fine-tune deep learning models (fine-tune convolution layer or BN layer?) still lack deep investigation. Hence, we study how to fine-tune deep models through experimental comparison in this paper. Furthermore, the weight of the models is analyzed to verify the feasibility of the fine-tuning method.

CVNov 20, 2020

Segmentation overlapping wear particles with few labelled data and imbalance sample

Peng Peng, Jiugen Wang

Ferrograph image segmentation is of significance for obtaining features of wear particles. However, wear particles are usually overlapped in the form of debris chains, which makes challenges to segment wear debris. An overlapping wear particle segmentation network (OWPSNet) is proposed in this study to segment the overlapped debris chains. The proposed deep learning model includes three parts: a region segmentation network, an edge detection network and a feature refine module. The region segmentation network is an improved U shape network, and it is applied to separate the wear debris form background of ferrograph image. The edge detection network is used to detect the edges of wear particles. Then, the feature refine module combines low-level features and high-level semantic features to obtain the final results. In order to solve the problem of sample imbalance, we proposed a square dice loss function to optimize the model. Finally, extensive experiments have been carried out on a ferrograph image dataset. Results show that the proposed model is capable of separating overlapping wear particles. Moreover, the proposed square dice loss function can improve the segmentation results, especially for the segmentation results of wear particle edge.

CVOct 14, 2020

Ferrograph image classification

Peng Peng, Jiugen Wang

It has been challenging to identify ferrograph images with a small dataset and various scales of wear particle. A novel model is proposed in this study to cope with these challenging problems. For the problem of insufficient samples, we first proposed a data augmentation algorithm based on the permutation of image patches. Then, an auxiliary loss function of image patch permutation recognition was proposed to identify the image generated by the data augmentation algorithm. Moreover, we designed a feature extraction loss function to force the proposed model to extract more abundant features and to reduce redundant representations. As for the challenge of large change range of wear particle size, we proposed a multi-scale feature extraction block to obtain the multi-scale representations of wear particles. We carried out experiments on a ferrograph image dataset and a mini-CIFAR-10 dataset. Experimental results show that the proposed model can improve the accuracy of the two datasets by 9% and 20% respectively compared with the baseline.

LGFeb 7, 2019

Artificial Intelligence for Prosthetics - challenge solutions

Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty et al.

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.

AIDec 18, 2018

Continual Match Based Training in Pommerman: Technical Report

Peng Peng, Liang Pang, Yufeng Yuan et al.

Continual learning is the ability of agents to improve their capacities throughout multiple tasks continually. While recent works in the literature of continual learning mostly focused on developing either particular loss functions or specialized structures of neural network explaining the episodic memory or neural plasticity, we study continual learning from the perspective of the training mechanism. Specifically, we propose a COnitnual Match BAsed Training (COMBAT) framework for training a population of advantage-actor-critic (A2C) agents in Pommerman, a partially observable multi-agent environment with no communication. Following the COMBAT framework, we trained an agent, namely, Navocado, that won the title of the top 1 learning agent in the NeurIPS 2018 Pommerman Competition. Two critical features of our agent are worth mentioning. Firstly, our agent did not learn from any demonstrations. Secondly, our agent is highly reproducible. As a technical report, we articulate the design of state space, action space, reward, and most importantly, the COMBAT framework for our Pommerman agent. We show in the experiments that Pommerman is a perfect environment for studying continual learning, and the agent can improve its performance by continually learning new skills without forgetting the old ones. Finally, the result in the Pommerman Competition verifies the robustness of our agent when competing with various opponents.

CRNov 17, 2017

Towards the Adoption of Anti-spoofing Protocols

Hang Hu, Peng Peng, Gang Wang

Email spoofing is a critical step of phishing, where the attacker impersonates someone the victim knows or trusts. In this paper, we conduct a qualitative study to explore why email spoofing is still possible after years of efforts to develop and deploy anti-spoofing protocols (e.g., SPF, DKIM, DMARC). First, we measure the protocol adoption by scanning 1 million Internet domains. We find the adoption rates are still low, especially for the new DMARC (3.1%). Second, to understand the reasons behind the low-adoption rate, we collect 4293 discussion threads (25.7K messages) from the Internet Engineering Task Force (IETF), a working group formed to develop and promote Internet standards. Our analysis shows key security and usability limitations in the protocol design, which makes it difficult to generate a positive "net effect" for a wide adoption. We validate our results by interviewing email administrators and discuss key implications for future anti-spoofing solutions.

AIMar 29, 2017

Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games

Peng Peng, Ying Wen, Yaodong Yang et al.

Many artificial intelligence (AI) applications often require multiple intelligent agents to work in a collaborative effort. Efficient learning for intra-agent communication and coordination is an indispensable step towards general AI. In this paper, we take StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of actor-critic formulation. We show that BiCNet can handle different types of combats with arbitrary numbers of AI agents for both sides. Our analysis demonstrates that without any supervisions such as human demonstrations or labelled data, BiCNet could learn various types of advanced coordination strategies that have been commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance, and possesses potential values for large-scale real-world applications.