SYMay 19, 2019Code
A Reinforcement Learning Approach for Intelligent Traffic Signal Control at Urban IntersectionsMengyu Guo, Pin Wang, Ching-Yao Chan et al.
Ineffective and inflexible traffic signal control at urban intersections can often lead to bottlenecks in traffic flows and cause congestion, delay, and environmental problems. How to manage traffic smartly by intelligent signal control is a significant challenge in urban traffic management. With recent advances in machine learning, especially reinforcement learning (RL), traffic signal control using advanced machine learning techniques represents a promising solution to tackle this problem. In this paper, we propose a RL approach for traffic signal control at urban intersections. Specifically, we use neural networks as Q-function approximator (a.k.a. Q-network) to deal with the complex traffic signal control problem where the state space is large and the action space can be discrete. The state space is defined based on real-time traffic information, i.e. vehicle position, direction and speed. The action space includes various traffic signal phases which are critical in generating a reasonable and realistic control mechanism, given the prominent spatial-temporal characteristics of urban traffic. In the simulation experiment, we use SUMO, an open source traffic simulator, to construct realistic urban intersection settings. Moreover, we use different traffic patterns, such as major/minor road traffic, through/left-turn lane traffic, tidal traffic, and varying demand traffic, to train a generalized traffic signal control model that can be adapted to various traffic conditions. The simulation results demonstrate the convergence and generalization performance of our RL approach as well as its significant benefits in terms of queue length and wait time over several benchmarking methods in traffic signal control.
CVMay 22
Coloring the Noise: Adversarial Sobolev Alignment for Faithful Image Super ResolutionHongbo Wang, Huaibo Huang, Pin Wang et al.
Generative priors in Image Super-Resolution (SR) often compromise faithful restoration, we attribute this limitation to a fundamental spectral misalignment between isotropic objectives and the intrinsic natural image manifold. While Direct Preference Optimization offers a path to alignment, its reliance on spectrally flat Gaussian noise fails to distinguish authentic high-frequency details from hallucinations. To bridge this geometric gap, we propose ASASR, a theoretically grounded framework that recasts the generative flow into a Sobolev-induced Riemannian geometry by explicitly coloring the noise transition kernel to mirror natural spectral decay. Driving this geometric alignment, we integrate a parametric adversary grounded in the Riesz Representation Theorem, which synthesizes targeted negative samples equivalent to worst-case Sobolev gradients to direct optimization along the tangent space of plausible structural failures. Extensive evaluations demonstrate that ASASR outperforms leading generative baselines, particularly in preserving spectral consistency and structural fidelity, offering a robust solution that effectively mitigates artifacts.
LGJun 25, 2022
Envelope imbalanced ensemble model with deep sample learning and local-global structure consistencyFan Li, Xiaoheng Zhang, Yongming Li et al.
The class imbalance problem is important and challenging. Ensemble approaches are widely used to tackle this problem because of their effectiveness. However, existing ensemble methods are always applied into original samples, while not considering the structure information among original samples. The limitation will prevent the imbalanced learning from being better. Besides, research shows that the structure information among samples includes local and global structure information. Based on the analysis above, an imbalanced ensemble algorithm with the deep sample pre-envelope network (DSEN) and local-global structure consistency mechanism (LGSCM) is proposed here to solve the problem.This algorithm can guarantee high-quality deep envelope samples for considering the local manifold and global structures information, which is helpful for imbalance learning. First, the deep sample envelope pre-network (DSEN) is designed to mine structure information among samples.Then, the local manifold structure metric (LMSM) and global structure distribution metric (GSDM) are designed to construct LGSCM to enhance distribution consistency of interlayer samples. Next, the DSEN and LGSCM are put together to form the final deep sample envelope network (DSEN-LG). After that, base classifiers are applied on the layers of deep samples respectively.Finally, the predictive results from base classifiers are fused through bagging ensemble learning mechanism. To demonstrate the effectiveness of the proposed method, forty-four public datasets and more than ten representative relevant algorithms are chosen for verification. The experimental results show that the algorithm is significantly better than other imbalanced ensemble algorithms.
LGNov 30, 2022
Overlapping oriented imbalanced ensemble learning method based on projective clustering and stagewise hybrid samplingFan Li, Bo Wang, Pin Wang et al.
The challenge of imbalanced learning lies not only in class imbalance problem, but also in the class overlapping problem which is complex. However, most of the existing algorithms mainly focus on the former. The limitation prevents the existing methods from breaking through. To address this limitation, this paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS). The DCSHS has three parts. Firstly, we design a projection clustering combination framework (PCC) guided by Davies-Bouldin clustering effectiveness index (DBI), which is used to obtain high-quality clusters and combine them to obtain a set of cross-complete subsets (CCS) with balanced class and low overlapping. Secondly, according to the characteristics of subset classes, a stage-wise hybrid sampling algorithm is designed to realize the de-overlapping and balancing of subsets. Finally, a projective clustering transfer mapping mechanism (CTM) is constructed for all processed subsets by means of transfer learning, thereby reducing class overlapping and explore structure information of samples. The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples, and learn as much information of overlapping samples as possible, thereby enhancing the class overlapping while class balancing. In the experimental section, more than 30 public datasets and over ten representative algorithms are chosen for verification. The experimental results show that the DCSHS is significantly best in terms of various evaluation criteria.
LGOct 25, 2022
A new Stack Autoencoder: Neighbouring Sample Envelope Embedded Stack Autoencoder Ensemble ModelChuanyan Zhou, Jie Ma, Fan Li et al.
Stack autoencoder (SAE), as a representative deep network, has unique and excellent performance in feature learning, and has received extensive attention from researchers. However, existing deep SAEs focus on original samples without considering the hierarchical structural information between samples. To address this limitation, this paper proposes a new SAE model-neighbouring envelope embedded stack autoencoder ensemble (NE_ESAE). Firstly, the neighbouring sample envelope learning mechanism (NSELM) is proposed for preprocessing of input of SAE. NSELM constructs sample pairs by combining neighbouring samples. Besides, the NSELM constructs a multilayer sample spaces by multilayer iterative mean clustering, which considers the similar samples and generates layers of envelope samples with hierarchical structural information. Second, an embedded stack autoencoder (ESAE) is proposed and trained in each layer of sample space to consider the original samples during training and in the network structure, thereby better finding the relationship between original feature samples and deep feature samples. Third, feature reduction and base classifiers are conducted on the layers of envelope samples respectively, and output classification results of every layer of samples. Finally, the classification results of the layers of envelope sample space are fused through the ensemble mechanism. In the experimental section, the proposed algorithm is validated with over ten representative public datasets. The results show that our method significantly has better performance than existing traditional feature learning methods and the representative deep autoencoders.
CLDec 22, 2023Code
YAYI 2: Multilingual Open-Source Large Language ModelsYin Luo, Qingchao Kong, Nan Xu et al.
As the latest advancements in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks, and even have been regarded as a potential path to the artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and gained comparable performances to proprietary models. However, these models are primarily designed for English scenarios and exhibit poor performances in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similar sized open-source models.
CVMar 23
Tuning Real-World Image Restoration at Inference: A Test-Time Scaling Paradigm for Flow Matching ModelsPurui Bai, Junxian Duan, Pin Wang et al.
Although diffusion-based real-world image restoration (Real-IR) has achieved remarkable progress, efficiently leveraging ultra-large-scale pre-trained text-to-image (T2I) models and fully exploiting their potential remain significant challenges. To address this issue, we propose ResFlow-Tuner, an image restoration framework based on the state-of-the-art flow matching model, FLUX.1-dev, which integrates unified multi-modal fusion (UMMF) with test-time scaling (TTS) to achieve unprecedented restoration performance. Our approach fully leverages the advantages of the Multi-Modal Diffusion Transformer (MM-DiT) architecture by encoding multi-modal conditions into a unified sequence that guides the synthesis of high-quality images. Furthermore, we introduce a training-free test-time scaling paradigm tailored for image restoration. During inference, this technique dynamically steers the denoising direction through feedback from a reward model (RM), thereby achieving significant performance gains with controllable computational overhead. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple standard benchmarks. This work not only validates the powerful capabilities of the flow matching model in low-level vision tasks but, more importantly, proposes a novel and efficient inference-time scaling paradigm suitable for large pre-trained models.
SDNov 17, 2021
Subject Enveloped Deep Sample Fuzzy Ensemble Learning Algorithm of Parkinson's Speech DataYiwen Wang, Fan Li, Xiaoheng Zhang et al.
Parkinson disease (PD)'s speech recognition is an effective way for its diagnosis, which has become a hot and difficult research area in recent years. As we know, there are large corpuses (segments) within one subject. However, too large segments will increase the complexity of the classification model. Besides, the clinicians interested in finding diagnostic speech markers that reflect the pathology of the whole subject. Since the optimal relevant features of each speech sample segment are different, it is difficult to find the uniform diagnostic speech markers. Therefore, it is necessary to reconstruct the existing large segments within one subject into few segments even one segment within one subject, which can facilitate the extraction of relevant speech features to characterize diagnostic markers for the whole subject. To address this problem, an enveloped deep speech sample learning algorithm for Parkinson's subjects based on multilayer fuzzy c-mean (MlFCM) clustering and interlayer consistency preservation is proposed in this paper. The algorithm can be used to achieve intra-subject sample reconstruction for Parkinson's disease (PD) to obtain a small number of high-quality prototype sample segments. At the end of the paper, several representative PD speech datasets are selected and compared with the state-of-the-art related methods, respectively. The experimental results show that the proposed algorithm is effective signifcantly.
AINov 2, 2021
Envelope Imbalance Learning Algorithm based on Multilayer Fuzzy C-means Clustering and Minimum Interlayer discrepancyFan Li, Xiaoheng Zhang, Pin Wang et al.
Imbalanced learning is important and challenging since the problem of the classification of imbalanced datasets is prevalent in machine learning and data mining fields. Sampling approaches are proposed to address this issue, and cluster-based oversampling methods have shown great potential as they aim to simultaneously tackle between-class and within-class imbalance issues. However, all existing clustering methods are based on a one-time approach. Due to the lack of a priori knowledge, improper setting of the number of clusters often exists, which leads to poor clustering performance. Besides, the existing methods are likely to generate noisy instances. To solve these problems, this paper proposes a deep instance envelope network-based imbalanced learning algorithm with the multilayer fuzzy c-means (MlFCM) and a minimum interlayer discrepancy mechanism based on the maximum mean discrepancy (MIDMD). This algorithm can guarantee high quality balanced instances using a deep instance envelope network in the absence of prior knowledge. In the experimental section, thirty-three popular public datasets are used for verification, and over ten representative algorithms are used for comparison. The experimental results show that the proposed approach significantly outperforms other popular methods.
SDAug 23, 2021
Subject Envelope based Multitype Reconstruction Algorithm of Speech Samples of Parkinson's DiseaseYongming Li, Chengyu Liu, Pin Wang et al.
The risk of Parkinson's disease (PD) is extremely serious, and PD speech recognition is an effective method of diagnosis nowadays. However, due to the influence of the disease stage, corpus, and other factors on data collection, the ability of every samples within one subject to reflect the status of PD vary. No samples are useless totally, and not samples are 100% perfect. This characteristic means that it is not suitable just to remove some samples or keep some samples. It is necessary to consider the sample transformation for obtaining high quality new samples. Unfortunately, existing PD speech recognition methods focus mainly on feature learning and classifier design rather than sample learning, and few methods consider the sample transformation. To solve the problem above, a PD speech sample transformation algorithm based on multitype reconstruction operators is proposed in this paper. The algorithm is divided into four major steps. Three types of reconstruction operators are designed in the algorithm: types A, B and C. Concerning the type A operator, the original dataset is directly reconstructed by designing a linear transformation to obtain the first dataset. The type B operator is designed for clustering and linear transformation of the dataset to obtain the second new dataset. The third operator, namely, the type C operator, reconstructs the dataset by clustering and convolution to obtain the third dataset. Finally, the base classifier is trained based on the three new datasets, and then the classification results are fused by decision weighting. In the experimental section, two representative PD speech datasets are used for verification. The results show that the proposed algorithm is effective. Compared with other algorithms, the proposed algorithm achieves apparent improvements in terms of classification accuracy.
IVAug 9, 2021
FA-GAN: Fused Attentive Generative Adversarial Networks for MRI Image Super-ResolutionMingfeng Jiang, Minghao Zhi, Liying Wei et al.
High-resolution magnetic resonance images can provide fine-grained anatomical information, but acquiring such data requires a long scanning time. In this paper, a framework called the Fused Attentive Generative Adversarial Networks(FA-GAN) is proposed to generate the super-resolution MR image from low-resolution magnetic resonance images, which can reduce the scanning time effectively but with high resolution MR images. In the framework of the FA-GAN, the local fusion feature block, consisting of different three-pass networks by using different convolution kernels, is proposed to extract image features at different scales. And the global feature fusion module, including the channel attention module, the self-attention module, and the fusion operation, is designed to enhance the important features of the MR image. Moreover, the spectral normalization process is introduced to make the discriminator network stable. 40 sets of 3D magnetic resonance images (each set of images contains 256 slices) are used to train the network, and 10 sets of images are used to test the proposed method. The experimental results show that the PSNR and SSIM values of the super-resolution magnetic resonance image generated by the proposed FA-GAN method are higher than the state-of-the-art reconstruction methods.
ROMay 29, 2021
A Survey of Deep Reinforcement Learning Algorithms for Motion Planning and Control of Autonomous VehiclesFei Ye, Shen Zhang, Pin Wang et al.
In this survey, we systematically summarize the current literature on studies that apply reinforcement learning (RL) to the motion planning and control of autonomous vehicles. Many existing contributions can be attributed to the pipeline approach, which consists of many hand-crafted modules, each with a functionality selected for the ease of human interpretation. However, this approach does not automatically guarantee maximal performance due to the lack of a system-level optimization. Therefore, this paper also presents a growing trend of work that falls into the end-to-end approach, which typically offers better performance and smaller system scales. However, their performance also suffers from the lack of expert data and generalization issues. Finally, the remaining challenges applying deep RL algorithms on autonomous driving are summarized, and future research directions are also presented to tackle these challenges.
LGMar 23, 2021
Meta-Adversarial Inverse Reinforcement Learning for Decision-making TasksPin Wang, Hanhan Li, Ching-Yao Chan
Learning from demonstrations has made great progress over the past few years. However, it is generally data hungry and task specific. In other words, it requires a large amount of data to train a decent model on a particular task, and the model often fails to generalize to new tasks that have a different distribution. In practice, demonstrations from new tasks will be continuously observed and the data might be unlabeled or only partially labeled. Therefore, it is desirable for the trained model to adapt to new tasks that have limited data samples available. In this work, we build an adaptable imitation learning model based on the integration of Meta-learning and Adversarial Inverse Reinforcement Learning (Meta-AIRL). We exploit the adversarial learning and inverse reinforcement learning mechanisms to learn policies and reward functions simultaneously from available training tasks and then adapt them to new tasks with the meta-learning framework. Simulation results show that the adapted policy trained with Meta-AIRL can effectively learn from limited number of demonstrations, and quickly reach the performance comparable to that of the experts on unseen tasks.
CVMar 11, 2021
Integrated Age Estimation MechanismFan Li, Yongming Li, Pin Wang et al.
Machine-learning-based age estimation has received lots of attention. Traditional age estimation mechanism focuses estimation age error, but ignores that there is a deviation between the estimated age and real age due to disease. Pathological age estimation mechanism the author proposed before introduces age deviation to solve the above problem and improves classification capability of the estimated age significantly. However,it does not consider the age estimation error of the normal control (NC) group and results in a larger error between the estimated age and real age of NC group. Therefore, an integrated age estimation mechanism based on Decision-Level fusion of error and deviation orientation model is proposed to solve the problem.Firstly, the traditional age estimation and pathological age estimation mechanisms are weighted together.Secondly, their optimal weights are obtained by minimizing mean absolute error (MAE) between the estimated age and real age of normal people. In the experimental section, several representative age-related datasets are used for verification of the proposed method. The results show that the proposed age estimation mechanism achieves a good tradeoff effect of age estimation. It not only improves the classification ability of the estimated age, but also reduces the age estimation error of the NC group. In general, the proposed age estimation mechanism is effective. Additionally, the mechanism is a framework mechanism that can be used to construct different specific age estimation algorithms, contributing to relevant research.
LGAug 28, 2020
Meta Reinforcement Learning-Based Lane Change Strategy for Autonomous VehiclesFei Ye, Pin Wang, Ching-Yao Chan et al.
Recent advances in supervised learning and reinforcement learning have provided new opportunities to apply related methodologies to automated driving. However, there are still challenges to achieve automated driving maneuvers in dynamically changing environments. Supervised learning algorithms such as imitation learning can generalize to new environments by training on a large amount of labeled data, however, it can be often impractical or cost-prohibitive to obtain sufficient data for each new environment. Although reinforcement learning methods can mitigate this data-dependency issue by training the agent in a trial-and-error way, they still need to re-train policies from scratch when adapting to new environments. In this paper, we thus propose a meta reinforcement learning (MRL) method to improve the agent's generalization capabilities to make automated lane-changing maneuvers at different traffic environments, which are formulated as different traffic congestion levels. Specifically, we train the model at light to moderate traffic densities and test it at a new heavy traffic density condition. We use both collision rate and success rate to quantify the safety and effectiveness of the proposed model. A benchmark model is developed based on a pretraining method, which uses the same network structure and training tasks as our proposed model for fair comparison. The simulation results shows that the proposed method achieves an overall success rate up to 20% higher than the benchmark model when it is generalized to the new environment of heavy traffic density. The collision rate is also reduced by up to 18% than the benchmark model. Finally, the proposed model shows more stable and efficient generalization capabilities adapting to the new environment, and it can achieve 100% successful rate and 0% collision rate with only a few steps of gradient updates.
CVJun 20, 2020
Deep Double-Side Learning Ensemble Model for Few-Shot Parkinson Speech RecognitionYongming Li, Lang Zhou, Lingyun Qin et al.
Diagnosis and therapeutic effect assessment of Parkinson disease based on voice data are very important,but its few-shot learning problem is challenging.Although deep learning is good at automatic feature extraction, it suffers from few-shot learning problem. Therefore, the general effective method is first conduct feature extraction based on prior knowledge, and then carry out feature reduction for subsequent classification. However, there are two major problems: 1) Structural information among speech features has not been mined and new features of higher quality have not been reconstructed. 2) Structural information between data samples has not been mined and new samples with higher quality have not been reconstructed. To solve these two problems, based on the existing Parkinson speech feature data set, a deep double-side learning ensemble model is designed in this paper that can reconstruct speech features and samples deeply and simultaneously. As to feature reconstruction, an embedded deep stacked group sparse auto-encoder is designed in this paper to conduct nonlinear feature transformation, so as to acquire new high-level deep features, and then the deep features are fused with original speech features by L1 regularization feature selection method. As to speech sample reconstruction, a deep sample learning algorithm is designed in this paper based on iterative mean clustering to conduct samples transformation, so as to obtain new high-level deep samples. Finally, the bagging ensemble learning mode is adopted to fuse the deep feature learning algorithm and the deep samples learning algorithm together, thereby constructing a deep double-side learning ensemble model. At the end of this paper, two representative speech datasets of Parkinson's disease were used for verification. The experimental results show that the proposed algorithm are effective.
LGFeb 17, 2020
Hybrid Embedded Deep Stacked Sparse Autoencoder with w_LPPD SVM EnsembleYongming Li, Yan Lei, Pin Wang et al.
Deep learning is a kind of feature learning method with strong nonliear feature transformation and becomes more and more important in many fields of artificial intelligence. Deep autoencoder is one representative method of the deep learning methods, and can effectively extract abstract the information of datasets. However, it does not consider the complementarity between the deep features and original features during deep feature transformation. Besides, it suffers from small sample problem. In order to solve these problems, a novel deep autoencoder - hybrid feature embedded stacked sparse autoencoder(HESSAE) has been proposed in this paper. HFESAE is capable to learn discriminant deep features with the help of embedding original features to filter weak hidden-layer outputs during training. For the issue that class representation ability of abstract information is limited by small sample problem, a feature fusion strategy has been designed aiming to combining abstract information learned by HFESAE with original feature and obtain hybrid features for feature reduction. The strategy is hybrid feature selection strategy based on L1 regularization followed by an support vector machine(SVM) ensemble model, in which weighted local discriminant preservation projection (w_LPPD), is designed and employed on each base classifier. At the end of this paper, several representative public datasets are used to verify the effectiveness of the proposed algorithm. The experimental results demonstrated that, the proposed feature learning method yields superior performance compared to other existing and state of art feature learning algorithms including some representative deep autoencoder methods.
LGFeb 10, 2020
Classification Algorithm of Speech Data of Parkinsons Disease Based on Convolution Sparse Kernel Transfer Learning with Optimal Kernel and Parallel Sample Feature SelectionXiaoheng Zhang, Yongming Li, Pin Wang et al.
Labeled speech data from patients with Parkinsons disease (PD) are scarce, and the statistical distributions of training and test data differ significantly in the existing datasets. To solve these problems, dimensional reduction and sample augmentation must be considered. In this paper, a novel PD classification algorithm based on sparse kernel transfer learning combined with a parallel optimization of samples and features is proposed. Sparse transfer learning is used to extract effective structural information of PD speech features from public datasets as source domain data, and the fast ADDM iteration is improved to enhance the information extraction performance. To implement the parallel optimization, the potential relationships between samples and features are considered to obtain high-quality combined features. First, features are extracted from a specific public speech dataset to construct a feature dataset as the source domain. Then, the PD target domain, including the training and test datasets, is encoded by convolution sparse coding, which can extract more in-depth information. Next, parallel optimization is implemented. To further improve the classification performance, a convolution kernel optimization mechanism is designed. Using two representative public datasets and one self-constructed dataset, the experiments compare over thirty relevant algorithms. The results show that when taking the Sakar dataset, MaxLittle dataset and DNSH dataset as target domains, the proposed algorithm achieves obvious improvements in classification accuracy. The study also found large improvements in the algorithms in this paper compared with nontransfer learning approaches, demonstrating that transfer learning is both more effective and has a more acceptable time cost.
LGFeb 7, 2020
Automated Lane Change Strategy using Proximal Policy Optimization-based Deep Reinforcement LearningFei Ye, Xuxin Cheng, Pin Wang et al.
Lane-change maneuvers are commonly executed by drivers to follow a certain routing plan, overtake a slower vehicle, adapt to a merging lane ahead, etc. However, improper lane change behaviors can be a major cause of traffic flow disruptions and even crashes. While many rule-based methods have been proposed to solve lane change problems for autonomous driving, they tend to exhibit limited performance due to the uncertainty and complexity of the driving environment. Machine learning-based methods offer an alternative approach, as Deep reinforcement learning (DRL) has shown promising success in many application domains including robotic manipulation, navigation, and playing video games. However, applying DRL to autonomous driving still faces many practical challenges in terms of slow learning rates, sample inefficiency, and safety concerns. In this study, we propose an automated lane change strategy using proximal policy optimization-based deep reinforcement learning, which shows great advantages in learning efficiency while still maintaining stable performance. The trained agent is able to learn a smooth, safe, and efficient driving policy to make lane-change decisions (i.e. when and how) in a challenging situation such as dense traffic scenarios. The effectiveness of the proposed policy is validated by using metrics of task success rate and collision rate. The simulation results demonstrate the lane change maneuvers can be efficiently learned and executed in a safe, smooth, and efficient manner.
CVFeb 4, 2020
Adversarially Robust Frame Sampling with Bounded IrregularitiesHanhan Li, Pin Wang
In recent years, video analysis tools for automatically extracting meaningful information from videos are widely studied and deployed. Because most of them use deep neural networks which are computationally expensive, feeding only a subset of video frames into such algorithms is desired. Sampling the frames with fixed rate is always attractive for its simplicity, representativeness, and interpretability. For example, a popular cloud video API generated video and shot labels by processing only the first frame of every second in a video. However, one can easily attack such strategies by placing chosen frames at the sampled locations. In this paper, we present an elegant solution to this sampling problem that is provably robust against adversarial attacks and introduces bounded irregularities as well.
CVFeb 4, 2020
TPPO: A Novel Trajectory Predictor with Pseudo OracleBiao Yang, Caizhen He, Pin Wang et al.
Forecasting pedestrian trajectories in dynamic scenes remains a critical problem in various applications, such as autonomous driving and socially aware robots. Such forecasting is challenging due to human-human and human-object interactions and future uncertainties caused by human randomness. Generative model-based methods handle future uncertainties by sampling a latent variable. However, few studies explored the generation of the latent variable. In this work, we propose the Trajectory Predictor with Pseudo Oracle (TPPO), which is a generative model-based trajectory predictor. The first pseudo oracle is pedestrians' moving directions, and the second one is the latent variable estimated from ground truth trajectories. A social attention module is used to aggregate neighbors' interactions based on the correlation between pedestrians' moving directions and future trajectories. This correlation is inspired by the fact that pedestrians' future trajectories are often influenced by pedestrians in front. A latent variable predictor is proposed to estimate latent variable distributions from observed and ground-truth trajectories. Moreover, the gap between these two distributions is minimized during training. Therefore, the latent variable predictor can estimate the latent variable from observed trajectories to approximate that estimated from ground-truth trajectories. We compare the performance of TPPO with related methods on several public datasets. Results demonstrate that TPPO outperforms state-of-the-art methods with low average and final displacement errors. The ablation study shows that the prediction performance will not dramatically decrease as sampling times decline during tests.
CVFeb 2, 2020
A Novel Graph based Trajectory Predictor with Pseudo OracleBiao Yang, Guocheng Yan, Pin Wang et al.
Pedestrian trajectory prediction in dynamic scenes remains a challenging and critical problem in numerous applications, such as self-driving cars and socially aware robots. Challenges concentrate on capturing pedestrians' motion patterns and social interactions, as well as handling the future uncertainties. Recent studies focus on modeling pedestrians' motion patterns with recurrent neural networks, capturing social interactions with pooling-based or graph-based methods, and handling future uncertainties by using random Gaussian noise as the latent variable. However, they do not integrate specific obstacle avoidance experience (OAE) that may improve prediction performance. For example, pedestrians' future trajectories are always influenced by others in front. Here we propose GTPPO (Graph-based Trajectory Predictor with Pseudo Oracle), an encoder-decoder-based method conditioned on pedestrians' future behaviors. Pedestrians' motion patterns are encoded with a long short-term memory unit, which introduces the temporal attention to highlight specific time steps. Their interactions are captured by a graph-based attention mechanism, which draws OAE into the data-driven learning process of graph attention. Future uncertainties are handled by generating multi-modal outputs with an informative latent variable. Such a variable is generated by a novel pseudo oracle predictor, which minimizes the knowledge gap between historical and ground-truth trajectories. Finally, the GTPPO is evaluated on ETH, UCY and Stanford Drone datasets, and the results demonstrate state-of-the-art performance. Besides, the qualitative evaluations show successful cases of handling sudden motion changes in the future. Such findings indicate that GTPPO can peek into the future.
LGNov 29, 2019
Quadratic Q-network for Learning Continuous Control for Autonomous VehiclesPin Wang, Hanhan Li, Ching-Yao Chan
Reinforcement Learning algorithms have recently been proposed to learn time-sequential control policies in the field of autonomous driving. Direct applications of Reinforcement Learning algorithms with discrete action space will yield unsatisfactory results at the operational level of driving where continuous control actions are actually required. In addition, the design of neural networks often fails to incorporate the domain knowledge of the targeting problem such as the classical control theories in our case. In this paper, we propose a hybrid model by combining Q-learning and classic PID (Proportion Integration Differentiation) controller for handling continuous vehicle control problems under dynamic driving environment. Particularly, instead of using a big neural network as Q-function approximation, we design a Quadratic Q-function over actions with multiple simple neural networks for finding optimal values within a continuous space. We also build an action network based on the domain knowledge of the control mechanism of a PID controller to guide the agent to explore optimal actions more efficiently.We test our proposed approach in simulation under two common but challenging driving situations, the lane change scenario and ramp merge scenario. Results show that the autonomous vehicle agent can successfully learn a smooth and efficient driving behavior in both situations.
AINov 19, 2019
Decision Making for Autonomous Driving via Augmented Adversarial Inverse Reinforcement LearningPin Wang, Dapeng Liu, Jiayu Chen et al.
Making decisions in complex driving environments is a challenging task for autonomous agents. Imitation learning methods have great potentials for achieving such a goal. Adversarial Inverse Reinforcement Learning (AIRL) is one of the state-of-art imitation learning methods that can learn both a behavioral policy and a reward function simultaneously, yet it is only demonstrated in simple and static environments where no interactions are introduced. In this paper, we improve and stabilize AIRL's performance by augmenting it with semantic rewards in the learning framework. Additionally, we adapt the augmented AIRL to a more practical and challenging decision-making task in a highly interactive environment in autonomous driving. The proposed method is compared with four baselines and evaluated by four performance metrics. Simulation results show that the augmented AIRL outperforms all the baseline methods, and its performance is comparable with that of the experts on all of the four metrics.
LGJun 6, 2019
Intention-aware Long Horizon Trajectory Prediction of Surrounding Vehicles using Dual LSTM NetworksLong Xin, Pin Wang, Ching-Yao Chan et al.
As autonomous vehicles (AVs) need to interact with other road users, it is of importance to comprehensively understand the dynamic traffic environment, especially the future possible trajectories of surrounding vehicles. This paper presents an algorithm for long-horizon trajectory prediction of surrounding vehicles using a dual long short term memory (LSTM) network, which is capable of effectively improving prediction accuracy in strongly interactive driving environments. In contrast to traditional approaches which require trajectory matching and manual feature selection, this method can automatically learn high-level spatial-temporal features of driver behaviors from naturalistic driving data through sequence learning. By employing two blocks of LSTMs, the proposed method feeds the sequential trajectory to the first LSTM for driver intention recognition as an intermediate indicator, which is immediately followed by a second LSTM for future trajectory prediction. Test results from real-world highway driving data show that the proposed method can, in comparison to state-of-art methods, output more accurate and reasonable estimate of different future trajectories over 5s time horizon with root mean square error (RMSE) for longitudinal and lateral prediction less than 5.77m and 0.49m, respectively.
ROJun 5, 2019
Continuous Control for Automated Lane Change Behavior Based on Deep Deterministic Policy Gradient AlgorithmPin Wang, Hanhan Li, Ching-Yao Chan
Lane change is a challenging task which requires delicate actions to ensure safety and comfort. Some recent studies have attempted to solve the lane-change control problem with Reinforcement Learning (RL), yet the action is confined to discrete action space. To overcome this limitation, we formulate the lane change behavior with continuous action in a model-free dynamic driving environment based on Deep Deterministic Policy Gradient (DDPG). The reward function, which is critical for learning the optimal policy, is defined by control values, position deviation status, and maneuvering time to provide the RL agent informative signals. The RL agent is trained from scratch without resorting to any prior knowledge of the environment and vehicle dynamics since they are not easy to obtain. Seven models under different hyperparameter settings are compared. A video showing the learning progress of the driving behavior is available. It demonstrates the RL vehicle agent initially runs out of road boundary frequently, but eventually has managed to smoothly and stably change to the target lane with a success rate of 100% under diverse driving situations in simulation.
ROApr 23, 2019
Driving Decision and Control for Autonomous Lane Change based on Deep Reinforcement LearningTianyu Shi, Pin Wang, Xuxin Cheng et al.
We apply Deep Q-network (DQN) with the consideration of safety during the task for deciding whether to conduct the maneuver. Furthermore, we design two similar Deep Q learning frameworks with quadratic approximator for deciding how to select a comfortable gap and just follow the preceding vehicle. Finally, a polynomial lane change trajectory is generated and Pure Pursuit Control is implemented for path tracking. We demonstrate the effectiveness of this framework in simulation, from both the decision-making and control layers. The proposed architecture also has the potential to be extended to other autonomous driving scenarios.
ROJan 31, 2019
A Data Driven Method of Optimizing Feedforward Compensator for Autonomous VehicleTianyu Shi, Pin Wang, Ching-Yao Chan et al.
A reliable controller is critical and essential for the execution of safe and smooth maneuvers of an autonomous vehicle.The controller must be robust to external disturbances, such as road surface, weather, and wind conditions, and so on.It also needs to deal with the internal parametric variations of vehicle sub-systems, including power-train efficiency, measurement errors, time delay,so on.Moreover, as in most production vehicles, the low-control commands for the engine, brake, and steering systems are delivered through separate electronic control units.These aforementioned factors introduce opaque and ineffectiveness issues in controller performance.In this paper, we design a feed-forward compensate process via a data-driven method to model and further optimize the controller performance.We apply the principal component analysis to the extraction of most influential features.Subsequently,we adopt a time delay neural network and include the accuracy of the predicted error in a future time horizon.Utilizing the predicted error,we then design a feed-forward compensate process to improve the control performance.Finally,we demonstrate the effectiveness of the proposed feed-forward compensate process in simulation scenarios.
ROApr 21, 2018
A Reinforcement Learning Based Approach for Automated Lane Change ManeuversPin Wang, Ching-Yao Chan, Arnaud de La Fortelle
Lane change is a crucial vehicle maneuver which needs coordination with surrounding vehicles. Automated lane changing functions built on rule-based models may perform well under pre-defined operating conditions, but they may be prone to failure when unexpected situations are encountered. In our study, we proposed a Reinforcement Learning based approach to train the vehicle agent to learn an automated lane change behavior such that it can intelligently make a lane change under diverse and even unforeseen scenarios. Particularly, we treated both state space and action space as continuous, and designed a Q-function approximator that has a closed- form greedy policy, which contributes to the computation efficiency of our deep Q-learning algorithm. Extensive simulations are conducted for training the algorithm, and the results illustrate that the Reinforcement Learning based vehicle agent is capable of learning a smooth and efficient driving policy for lane change maneuvers.
AIMar 25, 2018
Autonomous Ramp Merge Maneuver Based on Reinforcement Learning with Continuous Action SpacePin Wang, Ching-Yao Chan
Ramp merging is a critical maneuver for road safety and traffic efficiency. Most of the current automated driving systems developed by multiple automobile manufacturers and suppliers are typically limited to restricted access freeways only. Extending the automated mode to ramp merging zones presents substantial challenges. One is that the automated vehicle needs to incorporate a future objective (e.g. a successful and smooth merge) and optimize a long-term reward that is impacted by subsequent actions when executing the current action. Furthermore, the merging process involves interaction between the merging vehicle and its surrounding vehicles whose behavior may be cooperative or adversarial, leading to distinct merging countermeasures that are crucial to successfully complete the merge. In place of the conventional rule-based approaches, we propose to apply reinforcement learning algorithm on the automated vehicle agent to find an optimal driving policy by maximizing the long-term reward in an interactive driving environment. Most importantly, in contrast to most reinforcement learning applications in which the action space is resolved as discrete, our approach treats the action space as well as the state space as continuous without incurring additional computational costs. Our unique contribution is the design of the Q-function approximation whose format is structured as a quadratic function, by which simple but effective neural networks are used to estimate its coefficients. The results obtained through the implementation of our training platform demonstrate that the vehicle agent is able to learn a safe, smooth and timely merging policy, indicating the effectiveness and practicality of our approach.
ROMar 25, 2018
Automated Driving Maneuvers under Interactive Environment based on Deep Reinforcement LearningPin Wang, Ching-Yao Chan, Hanhan Li
Safe and efficient autonomous driving maneuvers in an interactive and complex environment can be considerably challenging due to the unpredictable actions of other surrounding agents that may be cooperative or adversarial in their interactions with the ego vehicle. One of the state-of-the-art approaches is to apply Reinforcement Learning (RL) to learn a time-sequential driving policy, to execute proper control strategy or tracking trajectory in dynamic situations. However, direct application of RL algorithms is not satisfactorily enough to deal with the cases in the autonomous driving domain, mainly due to the complex driving environment and continuous action space. In this paper, we adopt Q-learning as our basic learning framework and design a unique format of the Q-function approximator that consists of neural networks to handle the continuous action space challenge. The learning model is present in a closed form of continuous control variables and trained in a simulation platform that we have developed with embedded properties of real-time vehicle interactions. The proposed algorithm avoids invoking an additional actor network that learns to take actions, as in actor-critic algorithms. At the same time, some prior knowledge of vehicle dynamics is also fed into the model to assist learning. We test our algorithm with a challenging use case - lane change maneuver, to verify the practicability and feasibility of the proposed approach. Results from accumulated rewards and vehicle performance show that RL vehicle agents successfully learn a safe, comfort and efficient driving policy as defined in the reward function.
LGSep 7, 2017
Formulation of Deep Reinforcement Learning Architecture Toward Autonomous Driving for On-Ramp MergePin Wang, Ching-Yao Chan
Multiple automakers have in development or in production automated driving systems (ADS) that offer freeway-pilot functions. This type of ADS is typically limited to restricted-access freeways only, that is, the transition from manual to automated modes takes place only after the ramp merging process is completed manually. One major challenge to extend the automation to ramp merging is that the automated vehicle needs to incorporate and optimize long-term objectives (e.g. successful and smooth merge) when near-term actions must be safely executed. Moreover, the merging process involves interactions with other vehicles whose behaviors are sometimes hard to predict but may influence the merging vehicle optimal actions. To tackle such a complicated control problem, we propose to apply Deep Reinforcement Learning (DRL) techniques for finding an optimal driving policy by maximizing the long-term reward in an interactive environment. Specifically, we apply a Long Short-Term Memory (LSTM) architecture to model the interactive environment, from which an internal state containing historical driving information is conveyed to a Deep Q-Network (DQN). The DQN is used to approximate the Q-function, which takes the internal state as input and generates Q-values as output for action selection. With this DRL architecture, the historical impact of interactive environment on the long-term reward can be captured and taken into account for deciding the optimal control policy. The proposed architecture has the potential to be extended and applied to other autonomous driving scenarios such as driving through a complex intersection or changing lanes under varying traffic flow conditions.