Yang Peng

CV
h-index4
18papers
184citations
Novelty46%
AI Score38

18 Papers

CVJul 18, 2024Code
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving

Gongjin Lan, Yang Peng, Qi Hao et al.

Autonomous driving significantly benefits from data-driven deep neural networks. However, the data in autonomous driving typically fits the long-tailed distribution, in which the critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is still challenging. In this work, we propose a novel framework, SUSTechGAN, with customized dual attention modules, multi-scale generators, and a novel loss function to generate driving images for improving object detection of autonomous driving in adverse conditions. We test the SUSTechGAN and the well-known GANs to generate driving images in adverse conditions of rain and night and apply the generated images to retrain object detection networks. Specifically, we add generated images into the training datasets to retrain the well-known YOLOv5 and evaluate the improvement of the retrained YOLOv5 for object detection in adverse conditions. The experimental results show that the generated driving images by our SUSTechGAN significantly improved the performance of retrained YOLOv5 in rain and night conditions, which outperforms the well-known GANs. The open-source code, video description and datasets are available on the page 1 to facilitate image generation development in autonomous driving under adverse conditions.

LGApr 6, 2022
Federated Reinforcement Learning with Environment Heterogeneity

Hao Jin, Yang Peng, Wenhao Yang et al.

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.

CVSep 7, 2022
Hardware faults that matter: Understanding and Estimating the safety impact of hardware faults on object detection DNNs

Syed Qutub, Florian Geissler, Yang Peng et al.

Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments like automated driving or robotics. Therefore, it is paramount to verify the robustness of the detection under unexpected hardware faults like soft errors that can impact a systems perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at an image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory but can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric IVMOD (Image-wise Vulnerability Metric for Object Detection) to quantify vulnerability based on an incorrect image-wise object detection due to false positive (FPs) or false negative (FNs) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications, with e.g., up to (much greater than) 100 FPs generated, or up to approx. 90% of true positives (TPs) are lost in an image. Furthermore, with a single stuck-at-1 fault, an entire sequence of images can be affected, causing temporally persistent ghost detections that can be mistaken for actual objects (covering up to approx. 83% of the image). Furthermore, actual objects in the scene are continuously missed (up to approx. 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads against hardware faults.

AINov 14, 2022
Knowledge Base Completion using Web-Based Question Answering and Multimodal Fusion

Yang Peng, Daisy Zhe Wang

Over the past few years, large knowledge bases have been constructed to store massive amounts of knowledge. However, these knowledge bases are highly incomplete. To solve this problem, we propose a web-based question answering system system with multimodal fusion of unstructured and structured information, to fill in missing information for knowledge bases. To utilize unstructured information from the Web for knowledge base completion, we design a web-based question answering system using multimodal features and question templates to extract missing facts, which can achieve good performance with very few questions. To help improve extraction quality, the question answering system employs structured information from knowledge bases, such as entity types and entity-to-entity relatedness.

CVSep 14, 2023
BEA: Revisiting anchor-based object detection DNN using Budding Ensemble Architecture

Syed Sha Qutub, Neslihan Kose, Rafael Rosales et al.

This paper introduces the Budding Ensemble Architecture (BEA), a novel reduced ensemble architecture for anchor-based object detection models. Object detection models are crucial in vision-based tasks, particularly in autonomous systems. They should provide precise bounding box detections while also calibrating their predicted confidence scores, leading to higher-quality uncertainty estimates. However, current models may make erroneous decisions due to false positives receiving high scores or true positives being discarded due to low scores. BEA aims to address these issues. The proposed loss functions in BEA improve the confidence score calibration and lower the uncertainty error, which results in a better distinction of true and false positives and, eventually, higher accuracy of the object detection models. Both Base-YOLOv3 and SSD models were enhanced using the BEA method and its proposed loss functions. The BEA on Base-YOLOv3 trained on the KITTI dataset results in a 6% and 3.7% increase in mAP and AP50, respectively. Utilizing a well-balanced uncertainty estimation threshold to discard samples in real-time even leads to a 9.6% higher AP50 than its base model. This is attributed to a 40% increase in the area under the AP50-based retention curve used to measure the quality of calibration of confidence scores. Furthermore, BEA-YOLOV3 trained on KITTI provides superior out-of-distribution detection on Citypersons, BDD100K, and COCO datasets compared to the ensembles and vanilla models of YOLOv3 and Gaussian-YOLOv3.

DBDec 4, 2022
Query-Driven Knowledge Base Completion using Multimodal Path Fusion over Multimodal Knowledge Graph

Yang Peng, Daisy Zhe Wang

Over the past few years, large knowledge bases have been constructed to store massive amounts of knowledge. However, these knowledge bases are highly incomplete, for example, over 70% of people in Freebase have no known place of birth. To solve this problem, we propose a query-driven knowledge base completion system with multimodal fusion of unstructured and structured information. To effectively fuse unstructured information from the Web and structured information in knowledge bases to achieve good performance, our system builds multimodal knowledge graphs based on question answering and rule inference. We propose a multimodal path fusion algorithm to rank candidate answers based on different paths in the multimodal knowledge graphs, achieving much better performance than question answering, rule inference and a baseline fusion algorithm. To improve system efficiency, query-driven techniques are utilized to reduce the runtime of our system, providing fast responses to user queries. Extensive experiments have been conducted to demonstrate the effectiveness and efficiency of our system.

MLSep 29, 2023
Estimation and Inference in Distributional Reinforcement Learning

Liangyu Zhang, Yang Peng, Jiadong Liang et al.

In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete return distribution (denoted $η^π$) attained by a given policy $π$. We use the certainty-equivalence method to construct our estimator $\hatη^π$, given a generative model is available. In this circumstance we need a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2p}(1-γ)^{2p+2}}\right)$ to guarantee the $p$-Wasserstein metric between $\hatη^π$ and $η^π$ less than $\varepsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. Also, we show that under different mild assumptions a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\varepsilon^{2}(1-γ)^{4}}\right)$ suffices to ensure the Kolmogorov metric and total variation metric between $\hatη^π$ and $η^π$ is below $\varepsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hatη^π$. We demonstrate that the ``empirical process'' $\sqrt{n}(\hatη^π-η^π)$ converges weakly to a Gaussian process in the space of bounded functionals on Lipschitz function class $\ell^\infty(\mathcal{F}_{\text{W}})$, also in the space of bounded functionals on indicator function class $\ell^\infty(\mathcal{F}_{\text{KS}})$ and bounded measurable function class $\ell^\infty(\mathcal{F}_{\text{TV}})$ when some mild conditions hold. Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $η^π$.

LGApr 29, 2023
Semi-Infinitely Constrained Markov Decision Processes and Efficient Reinforcement Learning

Liangyu Zhang, Yang Peng, Wenhao Yang et al.

We propose a novel generalization of constrained Markov decision processes (CMDPs) that we call the \emph{semi-infinitely constrained Markov decision process} (SICMDP). Particularly, we consider a continuum of constraints instead of a finite number of constraints as in the case of ordinary CMDPs. We also devise two reinforcement learning algorithms for SICMDPs that we call SI-CRL and SI-CPO. SI-CRL is a model-based reinforcement learning algorithm. Given an estimate of the transition model, we first transform the reinforcement learning problem into a linear semi-infinitely programming (LSIP) problem and then use the dual exchange method in the LSIP literature to solve it. SI-CPO is a policy optimization algorithm. Borrowing the ideas from the cooperative stochastic approximation approach, we make alternative updates to the policy parameters to maximize the reward or minimize the cost. To the best of our knowledge, we are the first to apply tools from semi-infinitely programming (SIP) to solve constrained reinforcement learning problems. We present theoretical analysis for SI-CRL and SI-CPO, identifying their iteration complexity and sample complexity. We also conduct extensive numerical examples to illustrate the SICMDP model and demonstrate that our proposed algorithms are able to solve complex sequential decision-making tasks leveraging modern deep reinforcement learning techniques.

CVNov 10, 2023
Deep learning for 3D Object Detection and Tracking in Autonomous Driving: A Brief Survey

Yang Peng

Object detection and tracking are vital and fundamental tasks for autonomous driving, aiming at identifying and locating objects from those predefined categories in a scene. 3D point cloud learning has been attracting more and more attention among all other forms of self-driving data. Currently, there are many deep learning methods for 3D object detection. However, the tasks of object detection and tracking for point clouds still need intensive study due to the unique characteristics of point cloud data. To help get a good grasp of the present situation of this research, this paper shows recent advances in deep learning methods for 3D object detection and tracking.

LGJan 9, 2023
Finding Lookalike Customers for E-Commerce Marketing

Yang Peng, Changzheng Liu, Wei Shen

Customer-centric marketing campaigns generate a large portion of e-commerce website traffic for Walmart. As the scale of customer data grows larger, expanding the marketing audience to reach more customers is becoming more critical for e-commerce companies to drive business growth and bring more value to customers. In this paper, we present a scalable and efficient system to expand targeted audience of marketing campaigns, which can handle hundreds of millions of customers. We use a deep learning based embedding model to represent customers and an approximate nearest neighbor search method to quickly find lookalike customers of interest. The model can deal with various business interests by constructing interpretable and meaningful customer similarity metrics. We conduct extensive experiments to demonstrate the great performance of our system and customer embedding model.

MLFeb 20, 2025
A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

Yang Peng, Kaicheng Jin, Liangyu Zhang et al.

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy π. Previous works on statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return from streaming data is no more difficult than learning its expectation (value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ the exponential stability arguments for products of random matrices. Our results provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

MLMar 9, 2024
Statistical Efficiency of Distributional Temporal Difference Learning and Freedman's Inequality in Hilbert Spaces

Yang Peng, Liangyu Zhang, Zhihua Zhang

Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution $η^π$ for a given policy $π$. Distributional temporal difference learning has been accordingly proposed, which extends the classic temporal difference learning (TD) in RL. In this paper, we focus on the non-asymptotic statistical rates of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional TD (NTD). For a $γ$-discounted infinite-horizon tabular Markov decision process, we show that for NTD with a generative model, we need $\tilde{O}(\varepsilon^{-2}μ_{\min}^{-1}(1-γ)^{-3})$ interactions with the environment to achieve an $\varepsilon$-optimal estimator with high probability, when the estimation error is measured by the $1$-Wasserstein. This sample complexity bound is minimax optimal up to logarithmic factors. In addition, we revisit categorical distributional TD (CTD), showing that the same non-asymptotic convergence bounds hold for CTD in the case of the $1$-Wasserstein distance. We also extend our analysis to the more general setting where the data generating process is Markovian. In the Markovian setting, we propose variance-reduced variants of NTD and CTD, and show that both can achieve a $\tilde{O}(\varepsilon^{-2} μ_{π,\min}^{-1}(1-γ)^{-3}+t_{mix}μ_{π,\min}^{-1}(1-γ)^{-1})$ sample complexity bounds in the case of the $1$-Wasserstein distance, which matches the state-of-the-art statistical results for classic policy evaluation. To achieve the sharp statistical rates, we establish a novel Freedman's inequality in Hilbert spaces. This new Freedman's inequality would be of independent interest for statistical analysis of various infinite-dimensional online learning problems.

MLNov 16, 2025
Accelerated Distributional Temporal Difference Learning with Linear Function Approximation

Kaicheng Jin, Yang Peng, Jiansheng Yang et al.

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy. Previous works on statistical analysis of distributional TD learning focus mainly on the tabular case. We first consider the linear function approximation setting and conduct a fine-grained analysis of the linear-categorical Bellman equation. Building on this analysis, we further incorporate variance reduction techniques in our new algorithms to establish tight sample complexity bounds independent of the support size $K$ when $K$ is large. Our theoretical results imply that, when employing distributional TD learning with linear function approximation, learning the full distribution of the return function from streaming data is no more difficult than learning its expectation. This work provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

MLMay 7, 2024
Federated Control in Markov Decision Processes

Hao Jin, Yang Peng, Liangyu Zhang et al.

We study problems of federated control in Markov Decision Processes. To solve an MDP with large state space, multiple learning agents are introduced to collaboratively learn its optimal policy without communication of locally collected experience. In our settings, these agents have limited capabilities, which means they are restricted within different regions of the overall state space during the training process. In face of the difference among restricted regions, we firstly introduce concepts of leakage probabilities to understand how such heterogeneity affects the learning process, and then propose a novel communication protocol that we call Federated-Q protocol (FedQ), which periodically aggregates agents' knowledge of their restricted regions and accordingly modifies their learning problems for further training. In terms of theoretical analysis, we justify the correctness of FedQ as a communication protocol, then give a general result on sample complexity of derived algorithms FedQ-X with the RL oracle , and finally conduct a thorough study on the sample complexity of FedQ-SynQ. Specifically, FedQ-X has been shown to enjoy linear speedup in terms of sample complexity when workload is uniformly distributed among agents. Moreover, we carry out experiments in various environments to justify the efficiency of our methods.

CVNov 28, 2021
Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation

Yang Peng, Ping Liu, Yawei Luo et al.

Unsupervised domain adaptive person re-identification has received significant attention due to its high practical value. In past years, by following the clustering and finetuning paradigm, researchers propose to utilize the teacher-student framework in their methods to decrease the domain gap between different person re-identification datasets. Inspired by recent teacher-student framework based methods, which try to mimic the human learning process either by making the student directly copy behavior from the teacher or selecting reliable learning materials, we propose to conduct further exploration to imitate the human learning process from different aspects, \textit{i.e.}, adaptively updating learning materials, selectively imitating teacher behaviors, and analyzing learning materials structures. The explored three components, collaborate together to constitute a new method for unsupervised domain adaptive person re-identification, which is called Human Learning Imitation framework. The experimental results on three benchmark datasets demonstrate the efficacy of our proposed method.

LGAug 16, 2021
Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision

Florian Geissler, Syed Qutub, Sayanta Roychowdhury et al.

Convolutional neural networks (CNNs) have become an established part of numerous safety-critical computer vision applications, including human robot interactions and automated driving. Real-world implementations will need to guarantee their robustness against hardware soft errors corrupting the underlying platform memory. Based on the previously observed efficacy of activation clipping techniques, we build a prototypical safety case for classifier CNNs by demonstrating that range supervision represents a highly reliable fault detector and mitigator with respect to relevant bit flips, adopting an eight-exponent floating point data representation. We further explore novel, non-uniform range restriction methods that effectively suppress the probability of silent data corruptions and uncorrectable errors. As a safety-relevant end-to-end use case, we showcase the benefit of our approach in a vehicle classification scenario, using ResNet-50 and the traffic camera data set MIOVision. The quantitative evidence provided in this work can be leveraged to inspire further and possibly more complex CNN safety arguments.

NEOct 9, 2020
Bioinspired Bipedal Locomotion Control for Humanoid Robotics Based on EACO

Jingan Yang, Yang Peng

To construct a robot that can walk as efficiently and steadily as humans or other legged animals, we develop an enhanced elitist-mutated ant colony optimization~(EACO) algorithm with genetic and crossover operators in real-time applications to humanoid robotics or other legged robots. This work presents promoting global search capability and convergence rate of the EACO applied to humanoid robots in real-time by estimating the expected convergence rate using Markov chain. Furthermore, we put a special focus on the EACO algorithm on a wide range of problems, from ACO, real-coded GAs, GAs with neural networks~(NNs), particle swarm optimization~(PSO) to complex robotics systems including gait synthesis, dynamic modeling of parameterizable trajectories and gait optimization of humanoid robotics. The experimental results illustrate the capability of this method to discover the premature convergence probability, tackle successfully inherent stagnation, and promote the convergence rate of the EACO-based humanoid robotics systems and demonstrated the applicability and the effectiveness of our strategy for solving sophisticated optimization tasks. We found reliable and fast walking gaits with a velocity of up to 0.47m/s using the EACO optimization strategy. These findings have significant implications for understanding and tackling inherent stagnation and poor convergence rate of the EACO and provide new insight into the genetic architectures and control optimization of humanoid robotics.

AISep 11, 2020
To Root Artificial Intelligence Deeply in Basic Science for a New Generation of AI

Jingan Yang, Yang Peng

One of the ambitions of artificial intelligence is to root artificial intelligence deeply in basic science while developing brain-inspired artificial intelligence platforms that will promote new scientific discoveries. The challenges are essential to push artificial intelligence theory and applied technologies research forward. This paper presents the grand challenges of artificial intelligence research for the next 20 years which include:~(i) to explore the working mechanism of the human brain on the basis of understanding brain science, neuroscience, cognitive science, psychology and data science; (ii) how is the electrical signal transmitted by the human brain? What is the coordination mechanism between brain neural electrical signals and human activities? (iii)~to root brain-computer interface~(BCI) and brain-muscle interface~(BMI) technologies deeply in science on human behaviour; (iv)~making research on knowledge-driven visual commonsense reasoning~(VCR), develop a new inference engine for cognitive network recognition~(CNR); (v)~to develop high-precision, multi-modal intelligent perceptrons; (vi)~investigating intelligent reasoning and fast decision-making systems based on knowledge graph~(KG). We believe that the frontier theory innovation of AI, knowledge-driven modeling methodologies for commonsense reasoning, revolutionary innovation and breakthroughs of the novel algorithms and new technologies in AI, and developing responsible AI should be the main research strategies of AI scientists in the future.