Feng Yin

h-index: 29

Papers: 26

Citations: 670

Novelty: 38%

AI Score: 39

Ranked #78,973 of 194,257 authors (top 41%) · #17,600 in LG (top 44%)

26 Papers

34.0 · AI · Aug 22, 2023

ProAgent: Building Proactive Cooperative Agents with Large Language Models

Ceyao Zhang, Kaijie Yang, Siyi Hu et al. · pku

Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems. Current approaches to developing cooperative agents rely primarily on learning-based methods, whose policy generalization depends heavily on the diversity of teammates they interact with during the training phase. Such reliance, however, constrains the agents' capacity for strategic adaptation when cooperating with unfamiliar teammates, which becomes a significant challenge in zero-shot coordination scenarios. To address this challenge, we propose ProAgent, a novel framework that harnesses large language models (LLMs) to create proactive agents capable of dynamically adapting their behavior to enhance cooperation with teammates. ProAgent can analyze the present state and infer the intentions of teammates from observations. It then updates its beliefs in alignment with the teammates' subsequent actual behaviors. Moreover, ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios. Experimental evaluations conducted within the Overcooked-AI environment show that ProAgent outperforms five methods based on self-play and population-based training when cooperating with AI agents. Furthermore, when partnered with human proxy models, its performance exhibits an average improvement exceeding 10% compared to the current state-of-the-art method. For more information about our project, please visit https://pku-proagent.github.io.

14.5 · ML · May 28, 2022

Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

Lei Cheng, Feng Yin, Sergios Theodoridis et al.

Sparse modeling for signal processing and machine learning has been a focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can better exploit related prior information and naturally introduce robustness into the model, due to their unique capacity to marginalize out uncertainties related to the parameter estimates. Moreover, hyper-parameters associated with the adopted priors can be learnt via the training data. To implement sparsity-aware learning, the crucial point lies in the choice of the function regularizer for discriminative methods and the choice of the prior distribution for Bayesian learning. Over the last decade or so, due to the intense research on deep learning, emphasis has been put on discriminative techniques. However, a comeback of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models and inspire new paths for unsupervised learning, such as Bayesian tensor decomposition. The goal of this article is two-fold. First, to review, in a unified way, some recent advances in incorporating sparsity-promoting priors into three highly popular data modeling tools, namely deep neural networks, Gaussian processes, and tensor decomposition. Second, to review their associated inference techniques from different aspects, including evidence maximization via optimization and variational inference methods. Challenges such as the small data dilemma, automatic model structure search, and natural prediction uncertainty evaluation are also discussed. Typical signal processing and machine learning tasks are demonstrated.

9.8 · LG · Jan 21, 2023 · Code

Towards Flexibility and Interpretability of Gaussian Process State-Space Model

Zhidi Lin, Feng Yin, Juan Maroñas

The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP with a preliminary kernel, such as the squared exponential kernel or Matérn kernel, that is commonly used in GPSSM studies, limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a new class of probabilistic state-space models called TGPSSMs, which leverage a parametric normalizing flow to enrich the GP priors in the standard GPSSM, enabling greater flexibility and expressivity. Additionally, we present a scalable variational inference algorithm that offers a flexible and optimal structure for the variational distribution of latent states. The proposed algorithm is interpretable and computationally efficient due to the sparse GP representation and the bijective nature of normalizing flow. Moreover, we incorporate a constrained optimization framework into the algorithm to enhance the state-space representation capabilities and optimize the hyperparameters, leading to superior learning and inference performance. Experimental results on synthetic and real datasets corroborate that the proposed TGPSSM outperforms several state-of-the-art methods. The accompanying source code is available at https://github.com/zhidilin/TGPSSM.

7.7 · LG · Nov 28, 2023

Attentional Graph Neural Network Is All You Need for Robust Massive Network Localization

Wenzhong Yan, Feng Yin, Juntao Wang et al.

In this paper, we design Graph Neural Networks (GNNs) with attention mechanisms to tackle an important yet challenging nonlinear regression problem: massive network localization. We first review our previous network localization method based on Graph Convolutional Network (GCN), which can exhibit state-of-the-art localization accuracy, even under severe Non-Line-of-Sight (NLOS) conditions, by carefully preselecting a constant threshold for determining adjacency. As an extension, we propose a specially designed Attentional GNN (AGNN) model to resolve the sensitive thresholding issue of the GCN-based method and enhance the underlying model capacity. The AGNN comprises an Adjacency Learning Module (ALM) and Multiple Graph Attention Layers (MGAL), employing distinct attention architectures to systematically address the demerits of the GCN-based method, rendering it more practical for real-world applications. Comprehensive analyses are conducted to explain the superior performance of these methods, including a theoretical analysis of the AGNN's dynamic attention property and computational complexity, along with a systematic discussion of their robustness against NLOS measurements. Extensive experimental results demonstrate the effectiveness of the GCN-based and AGNN-based network localization methods. Notably, integrating attention mechanisms into the AGNN yields substantial improvements in localization accuracy, approaching the fundamental lower bound and showing approximately 37% to 53% reduction in localization error compared to the vanilla GCN-based method across various NLOS noise configurations. Both methods outperform all competing approaches by far in terms of localization accuracy, robustness, and computational time, especially for considerably large network sizes.

5.3 · LG · Sep 15, 2023 · Code

Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

Richard Cornelius Suwandi, Zhidi Lin, Feng Yin et al.

Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capability. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.

6.9 · LG · Dec 15, 2022 · Code

Output-Dependent Gaussian Process State-Space Model

Zhidi Lin, Lei Cheng, Feng Yin et al.

The Gaussian process state-space model (GPSSM) is a fully probabilistic state-space model that has attracted much attention over the past decade. However, the outputs of the transition function in the existing GPSSMs are assumed to be independent, meaning that the GPSSMs cannot exploit the inductive biases between different outputs and lose certain model capacities. To address this issue, this paper proposes an output-dependent and more realistic GPSSM by utilizing the well-known, simple yet practical linear model of coregionalization (LMC) framework to represent the output dependency. To jointly learn the output-dependent GPSSM and infer the latent states, we propose a variational sparse GP-based learning method that only gently increases the computational complexity. Experiments on both synthetic and real datasets demonstrate the superiority of the output-dependent GPSSM in terms of learning and inference performance.

1.8 · LG · Sep 4, 2022

A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset

Toluwalase A. Olukoga, Yin Feng

This study presents machine learning models that forecast and categorize lost circulation severity preemptively using a large, class-imbalanced drilling dataset. We demonstrate reproducible core techniques involved in tackling a large drilling engineering challenge utilizing easily interpretable machine learning approaches. We utilized a dataset of 65,000+ records with a class imbalance problem from Azadegan oilfield formations in Iran. Eleven of the dataset's seventeen parameters are chosen to be used in the classification of five lost circulation events. To generate classification models, we used six basic machine learning algorithms and four ensemble learning methods. Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machines (SVM), Classification and Regression Trees (CART), k-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB) are the six fundamental techniques. We also used bagging and boosting ensemble learning techniques in the investigation of solutions for improved predictive performance. The performance of these algorithms is measured using four metrics: accuracy, precision, recall, and F1-score. The F1-score weighted to represent the data imbalance is chosen as the preferred evaluation criterion. The CART model was found to be the best in class for identifying drilling fluid circulation loss events with an average weighted F1-score of 0.9904 and standard deviation of 0.0015. Upon application of ensemble learning techniques, a Random Forest ensemble of decision trees showed the best predictive performance. It identified and classified lost circulation events with a perfect weighted F1-score of 1.0. Using Permutation Feature Importance (PFI), the measured depth was found to be the most influential factor in accurately recognizing lost circulation events while drilling.
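The weighted F1-score that the abstract above selects as its evaluation criterion can be sketched in a few lines. This is a minimal illustration of the standard support-weighted definition, not the authors' evaluation code; the function name `weighted_f1` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_label in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n_label - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # A rare class contributes in proportion to how often it occurs.
        score += (n_label / total) * f1
    return score


# Hypothetical imbalanced labels (not taken from the paper's dataset).
y_true = ["none"] * 8 + ["severe"] * 2
y_pred = ["none"] * 8 + ["none", "severe"]
print(weighted_f1(y_true, y_pred))
```

Because each class is weighted by its frequency, a model that misses half of a rare class can still score highly, which is worth keeping in mind when reading weighted F1 values close to 1.0 on imbalanced data.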

36.3 · AI · Jan 7, 2024 · Code

Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects

Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang et al. · pku

Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications -- from serving as autonomous general-purpose task assistants to applications in coding, social, and economic domains, LLM-based agents offer extensive exploration opportunities. This paper surveys current research to provide an in-depth overview of LLM-based intelligent agents within single-agent and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We also delve into the mechanisms of deploying LLM-based agents in multi-agent systems, including multi-role collaboration, message passing, and strategies to alleviate communication issues between agents. The discussions also shed light on popular datasets and application scenarios. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.

15.7 · LG · Sep 22, 2025 · Code

Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs

Richard Cornelius Suwandi, Feng Yin, Juntao Wang et al.

The efficiency of Bayesian optimization (BO) relies heavily on the choice of the Gaussian process (GP) kernel, which plays a central role in balancing exploration and exploitation under limited evaluation budgets. Traditional BO methods often rely on fixed or heuristic kernel selection strategies, which can result in slow convergence or suboptimal solutions when the chosen kernel is poorly suited to the underlying objective function. To address this limitation, we propose a freshly-baked Context-Aware Kernel Evolution (CAKE) to enhance BO with large language models (LLMs). Concretely, CAKE leverages LLMs as the crossover and mutation operators to adaptively generate and refine GP kernels based on the observed data throughout the optimization process. To maximize the power of CAKE, we further propose BIC-Acquisition Kernel Ranking (BAKER) to select the most effective kernel through balancing the model fit measured by the Bayesian information criterion (BIC) with the expected improvement at each iteration of BO. Extensive experiments demonstrate that our fresh CAKE-based BO method consistently outperforms established baselines across a range of real-world tasks, including hyperparameter optimization, controller tuning, and photonic chip design. Our code is publicly available at https://github.com/richardcsuwandi/cake.

6.6 · LG · Sep 3, 2023 · Code

Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models

Zhidi Lin, Juan Maroñas, Ying Li et al.

The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating its ability to achieve similar inference performance compared to existing methods. Code is available at https://github.com/zhidilin/gpssmProj.

7.2 · LG · Oct 22, 2020 · Code

Graph Neural Network for Large-Scale Network Localization

Wenzhong Yan, Di Jin, Zhidi Lin et al.

Graph neural networks (GNNs) are widely used for classifying structured data in the context of machine learning. But surprisingly, they are rarely applied to regression problems. In this work, we adopt GNN for a classic but challenging nonlinear regression problem, namely network localization. Our main findings are in order. First, GNN is potentially the best solution to large-scale network localization in terms of accuracy, robustness and computational time. Second, proper thresholding of the communication range is essential to its superior performance. Simulation results corroborate that the proposed GNN-based method outperforms all state-of-the-art benchmarks by far. Such inspiring results are theoretically justified in terms of data aggregation, non-line-of-sight (NLOS) noise removal and low-pass filtering effect, all affected by the threshold for neighbor selection. Code is available at https://github.com/Yanzongzi/GNN-For-localization.
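The role of the neighbor-selection threshold described in the abstract above can be illustrated with a small sketch: nodes are connected only when their measured distance falls below a threshold, and each row of the resulting adjacency matrix is normalized so that aggregation averages over the selected neighbors. The helper names, the toy positions, and the row normalization are assumptions for illustration; the abstract does not specify the paper's exact normalization.

```python
import math


def threshold_adjacency(distances, threshold):
    """Binary adjacency with self-loops: edge iff measured distance <= threshold."""
    n = len(distances)
    return [
        [1 if i == j or distances[i][j] <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def row_normalize(adj):
    """Divide each row by its degree so aggregation becomes neighbor averaging."""
    return [[a / sum(row) for a in row] for row in adj]


# Hypothetical 2-D node positions; the last node is far from the others.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
distances = [[math.dist(p, q) for q in positions] for p in positions]

adj = threshold_adjacency(distances, threshold=1.5)
norm_adj = row_normalize(adj)
for row in norm_adj:
    print(row)
```

With this threshold the distant node keeps only its self-loop, so long (and typically less reliable) distance measurements never enter the averaging step.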

2.3 · LG · Jun 7, 2020 · Code

Optimally Combining Classifiers for Semi-Supervised Learning

Zhiguo Wang, Liusha Yang, Feng Yin et al.

This paper considers semi-supervised learning for tabular data. It is widely known that XGBoost, which is based on tree models, works well on heterogeneous features, while the transductive support vector machine can exploit the low-density separation assumption. However, little work has been done to combine them for end-to-end semi-supervised learning. In this paper, we find that these two methods have complementary properties and larger diversity, which motivates us to propose a new semi-supervised learning method that is able to adaptively combine the strengths of XGBoost and the transductive support vector machine. Instead of the majority vote rule, an optimization problem in terms of the ensemble weight is established, which helps to obtain more accurate pseudo labels for unlabeled data. The experimental results on the UCI data sets and a real commercial data set demonstrate the superior classification performance of our method over five state-of-the-art algorithms, improving test accuracy by about 3% to 4%. The partial code can be found at https://github.com/hav-cam-mit/CTO.

9.2 · ML · Apr 2, 2024 · Code

Preventing Model Collapse in Gaussian Process Latent Variable Models

Ying Li, Zhidi Lin, Feng Yin et al.

Gaussian process latent variable models (GPLVMs) are a versatile family of unsupervised learning models commonly used for dimensionality reduction. However, common challenges in modeling data with GPLVMs include inadequate kernel flexibility and improper selection of the projection noise, leading to a type of model collapse characterized by vague latent representations that do not reflect the underlying data structure. This paper addresses these issues by, first, theoretically examining the impact of projection variance on model collapse through the lens of a linear GPLVM. Second, we tackle model collapse due to inadequate kernel flexibility by integrating the spectral mixture (SM) kernel and a differentiable random Fourier feature (RFF) kernel approximation, which ensures computational scalability and efficiency through off-the-shelf automatic differentiation tools for learning the kernel hyperparameters, projection variance, and latent representations within the variational inference framework. The proposed GPLVM, named advisedRFLVM, is evaluated across diverse datasets and consistently outperforms various salient competing models, including state-of-the-art variational autoencoders (VAEs) and other GPLVM variants, in terms of informative latent representations and missing data imputation.
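The random Fourier feature (RFF) approximation mentioned in the abstract above replaces a stationary kernel with an inner product of random cosine features. The sketch below shows the idea for a plain squared exponential (RBF) kernel, which is swapped in here as the simplest case; the paper applies RFF to the spectral mixture kernel, and all function names and parameter values in this example are assumptions.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact squared exponential kernel value."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def sample_rff(dim, num_features, lengthscale=1.0, seed=0):
    """Draw frequencies from the RBF spectral density and uniform phases."""
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Feature map z(x) such that z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


freqs, phases = sample_rff(dim=2, num_features=4000)
x, y = [0.3, -0.2], [0.1, 0.4]
zx, zy = rff_map(x, freqs, phases), rff_map(y, freqs, phases)
approx = sum(a * b for a, b in zip(zx, zy))
exact = rbf_kernel(x, y)
print(exact, approx)
```

The approximation error shrinks roughly with the inverse square root of the number of features, and because the feature map is a composition of differentiable operations, kernel hyperparameters such as the lengthscale can be learned with automatic differentiation in a framework that supports it.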

10.4 · LG · Mar 15, 2024

Regularization-Based Efficient Continual Learning in Deep State-Space Models

Yuanhang Zhang, Zhidi Lin, Yiyong Sun et al.

Deep state-space models (DSSMs) have gained popularity in recent years due to their potent modeling capacity for dynamic systems. However, existing DSSM works are limited to single-task modeling, which requires retraining with historical task data upon revisiting a forepassed task. To address this limitation, we propose continual learning DSSMs (CLDSSMs), which are capable of adapting to evolving tasks without catastrophic forgetting. Our proposed CLDSSMs integrate mainstream regularization-based continual learning (CL) methods, ensuring efficient updates with constant computational and memory costs for modeling multiple dynamic systems. We also conduct a comprehensive cost analysis of each CL method applied to the respective CLDSSMs, and demonstrate the efficacy of CLDSSMs through experiments on real-world datasets. The results corroborate that while various competing CL methods exhibit different merits, the proposed CLDSSMs consistently outperform traditional DSSMs in terms of effectively addressing catastrophic forgetting, enabling swift and accurate parameter transfer to new tasks.

4.1 · LG · Apr 5, 2025

Vehicle Acceleration Prediction Considering Environmental Influence and Individual Driving Behavior

Wenxuan Wang, Lexing Zhang, Jiale Lei et al.

Accurate vehicle acceleration prediction is critical for intelligent driving control and energy efficiency management, particularly in environments with complex driving behavior dynamics. This paper proposes a general short-term vehicle acceleration prediction framework that jointly models environmental influence and individual driving behavior. The framework adopts a dual-input design. Environmental sequences, constructed from historical traffic variables such as percentile-based speed and acceleration statistics of multiple vehicles at specific spatial locations, capture group-level driving behavior influenced by the traffic environment. In parallel, individual driving behavior sequences represent motion characteristics of the target vehicle prior to the prediction point, reflecting personalized driving styles. These two inputs are processed using an LSTM Seq2Seq model enhanced with an attention mechanism, enabling accurate multi-step acceleration prediction. To demonstrate the effectiveness of the proposed method, an empirical study was conducted using high-resolution radar-video fused trajectory data collected from the exit section of the Guangzhou Baishi Tunnel. Drivers were clustered into three categories (conservative, moderate, and aggressive) based on key behavioral indicators, and a dedicated prediction model was trained for each group to account for driver heterogeneity. Experimental results show that the proposed method consistently outperforms four baseline models, yielding a 10.9% improvement in accuracy with the inclusion of historical traffic variables and a 33% improvement with driver classification. Although prediction errors increase with forecast distance, incorporating environment- and behavior-aware features significantly enhances model robustness.

7.8 · ML · Mar 24, 2025

Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems

Zhidi Lin, Ying Li, Feng Yin et al.

Gaussian process state-space models (GPSSMs) offer a principled framework for learning and inference in nonlinear dynamical systems with uncertainty quantification. However, existing GPSSMs are limited by the use of multiple independent stationary Gaussian processes (GPs), leading to prohibitive computational and parametric complexity in high-dimensional settings and restricted modeling capacity for non-stationary dynamics. To address these challenges, we propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics while significantly reducing model complexity. For the inference of the implicit process, we develop a variational inference algorithm that jointly approximates the posterior over the underlying GP and the neural network parameters defining the normalizing flows. To avoid explicit variational parameterization of the latent states, we further incorporate the ensemble Kalman filter (EnKF) into the variational framework, enabling accurate and efficient state estimation. Extensive empirical evaluations on synthetic and real-world datasets demonstrate the superior performance of our ETGPSSM in system dynamics learning, high-dimensional state estimation, and time-series forecasting, outperforming existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.

7.7 · LG · Dec 10, 2023 · Code

Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference

Zhidi Lin, Yiyong Sun, Feng Yin et al.

Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods.

6.5 · LG · Mar 18, 2021

Recent Advances in Data-Driven Wireless Communication Using Gaussian Processes: A Comprehensive Survey

Kai Chen, Qinglei Kong, Yijue Dai et al.

Data-driven paradigms are well-known and salient demands of future wireless communication. Empowered by big data and machine learning, next-generation data-driven communication systems will be intelligent with the characteristics of expressiveness, scalability, interpretability, and especially uncertainty modeling, which can confidently involve diversified latent demands and personalized services in the foreseeable future. In this paper, we review a promising family of nonparametric Bayesian machine learning methods, i.e., Gaussian processes (GPs), and their applications in wireless communication. Since GPs achieve expressive and interpretable learning with uncertainty quantification, they are particularly suitable for wireless communication. Moreover, they provide a natural framework for collaborating data and empirical models (DEM). Specifically, we first envision three-level motivations of data-driven wireless communication using GPs. Then, we present the background of GPs in terms of covariance structure and model inference. The expressiveness of the GP model using various interpretable kernel designs is surveyed, namely, stationary, non-stationary, deep, and multi-task kernels. Furthermore, we review distributed GPs with promising scalability, which are suitable for applications in wireless networks with a large number of distributed edge devices. Finally, we list representative solutions and promising techniques that adopt GPs in wireless communication systems.

18.1 · DC · Mar 8, 2020

FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing

Feng Yin, Zhidi Lin, Yue Xu et al.

In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various distributed model hyper-parameter optimization schemes. Then, we demonstrate various practical use cases that are summarized from a mixture of standard, newly published, and unpublished works, which cover a broad range of location services, including collaborative static localization/fingerprinting, indoor target tracking, outdoor navigation using low-sampling GPS, and spatio-temporal wireless traffic data modeling and prediction. Experimental results show that near-centralized data-fitting and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms. All the surveyed use cases fall under our newly proposed Federated Localization (FedLoc) framework, which aims at collaboratively building accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories. Future research directions are also discussed at the end of this paper.

5.0 · LG · Mar 1, 2020

Scalable Learning Paradigms for Data-Driven Wireless Communication

Yue Xu, Feng Yin, Wenjun Xu et al.

The marriage of wireless big data and machine learning techniques revolutionizes the wireless system by the data-driven philosophy. However, the ever exploding data volume and model complexity will limit centralized solutions to learn and respond within a reasonable time. Therefore, scalability becomes a critical issue to be solved. In this article, we aim to provide a systematic discussion on the building blocks of scalable data-driven wireless networks. On one hand, we discuss the forward-looking architecture and computing framework of scalable data-driven systems from a global perspective. On the other hand, we discuss the learning algorithms and model training strategies performed at each individual node from a local perspective. We also highlight several promising research directions in the context of scalable data-driven wireless communications to inspire future research.

1.0 · LG · Jul 5, 2019

Gaussian Processes for Analyzing Positioned Trajectories in Sports

Yuxin Zhao, Feng Yin, Fredrik Gunnarsson et al.

Kernel-based machine learning approaches have gained increasing interest in recent years for exploring and modeling large datasets. The Gaussian process (GP) is one example of such kernel-based approaches, which can provide very good performance for nonlinear modeling problems. In this work, we first propose a grey-box modeling approach to analyze the forces in cross country skiing races. To be more precise, a disciplined set of kinetic motion model formulae is combined with a data-driven Gaussian process regression model, which accounts for everything unknown in the system. Then, a modeling approach is proposed to analyze the kinetic flow of both individual skiers and clusters of skiers. The proposed approaches can be generally applied to use cases where positioned trajectories and kinetic measurements are available. The proposed approaches are evaluated using data collected from the Falun Nordic World Ski Championships 2015, in particular the Men's cross country 4×10 km relay. Forces during the cross country skiing races are analyzed and compared. Velocity models for skiers at different competition stages are also evaluated. Finally, comparisons between the grey-box and black-box approaches are carried out, where the grey-box approach can reduce the predictive uncertainty by 30% to 40%.

1.2MLJun 6, 2019

A General $\mathcal{O}(n^2)$ Hyper-Parameter Optimization for Gaussian Process Regression with Cross-Validation and Non-linearly Constrained ADMM

Linning Xu, Feng Yin, Jiawei Zhang et al.

Hyper-parameter optimization remains the core issue of Gaussian processes (GPs) for machine learning. The benchmark method using maximum likelihood (ML) estimation and gradient descent (GD) is impractical for processing big data due to its $O(n^3)$ complexity. Many sophisticated global or local approximation models, for instance, sparse GP and distributed GP, have been proposed to address this complexity issue. In this paper, we propose two novel and general-purpose GP hyper-parameter training schemes (GPCV-ADMM) by replacing ML with cross-validation (CV) as the fitting criterion and replacing GD with a non-linearly constrained alternating direction method of multipliers (ADMM) as the optimization method. The proposed schemes are of $O(n^2)$ complexity for any covariance matrix without special structure. We conduct various experiments based on both synthetic and real data sets, wherein the proposed schemes show excellent performance in terms of convergence, hyper-parameter estimation accuracy, and computational time in comparison with the traditional ML-based routines given in the GPML toolbox.
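As a rough illustration of where the $O(n^3)$ cost of the ML benchmark comes from, the sketch below evaluates the GP negative log marginal likelihood with a Cholesky factorization. It is not the GPCV-ADMM scheme from the paper; the squared-exponential kernel and the names `se_kernel` and `neg_log_marginal_likelihood` are assumptions chosen for illustration.

```python
import numpy as np


def se_kernel(x1, x2, length_scale, signal_var):
    """Squared-exponential kernel on 1-D inputs (an assumed kernel choice)."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def neg_log_marginal_likelihood(x, y, length_scale, signal_var, noise_var):
    """GP negative log marginal likelihood for zero-mean targets y."""
    n = x.shape[0]
    K = se_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(n)
    # The Cholesky factorization of the n-by-n covariance is the O(n^3) step
    # that every likelihood (and gradient) evaluation has to pay.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = 0.5 * y @ alpha
    # log det(K) = 2 * sum(log(diag(L))), so half of it is the plain sum.
    complexity = np.sum(np.log(np.diag(L)))
    return data_fit + complexity + 0.5 * n * np.log(2.0 * np.pi)
```

A gradient-based optimizer would call this function (and its gradient) at every iteration, which is why the cubic factorization dominates the training time for large $n$.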

6.0LGApr 21, 2019

Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series

Feng Yin, Lishuo Pan, Xinwei He et al.

Gaussian processes (GPs) for machine learning have been studied systematically over the past two decades and are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extent open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily closely by a newly proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of sub-kernels are used, either the Nyström or the random Fourier feature approximation can be adopted to deal efficiently with the computational demands. The unknown GP hyper-parameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving for the unknown hyper-parameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction method of multipliers (ADMM). The MM method matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a stable and efficient solver, while the ADMM has the potential to find a better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of a similar kind in terms of prediction mean-squared error and numerical stability.
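To make the "linear combination of sub-kernels" idea concrete, here is a minimal sketch that builds a kernel matrix as a non-negatively weighted sum of fixed sub-kernels. It assumes the standard one-dimensional spectral mixture component form $\exp(-2\pi^2\tau^2\sigma^2)\cos(2\pi\mu\tau)$ with frequencies $\mu$ placed on a grid and a single shared bandwidth $\sigma$; the grid layout and the names `gsm_sub_kernels` and `gsm_kernel` are illustrative assumptions and may differ from the paper's exact construction.

```python
import numpy as np


def gsm_sub_kernels(t, freqs, bandwidth):
    """Stack of fixed sub-kernel matrices, one per grid frequency (assumed form)."""
    tau = t[:, None] - t[None, :]
    envelope = np.exp(-2.0 * np.pi**2 * tau**2 * bandwidth**2)
    return np.stack([envelope * np.cos(2.0 * np.pi * mu * tau) for mu in freqs])


def gsm_kernel(weights, sub_kernels):
    """Kernel matrix as a non-negative linear combination of the sub-kernels."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("sub-kernel weights must be non-negative")
    # Contract the weight vector against the leading axis of the stack.
    return np.tensordot(weights, sub_kernels, axes=1)
```

Because the sub-kernels are fixed in advance, the kernel matrix is linear in the weights, so only the weights and the noise variance remain to be estimated.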

0.9CVMar 9, 2019

How Effectively Can Indoor Wireless Positioning Relieve Visual Tracking Pains: A Camera-Rao Bound Viewpoint

Panwen Hu, Zizheng Yan, Rui Huang et al.

Visual tracking is fragile in some difficult scenarios; for instance, appearance ambiguity and variation as well as occlusion can easily degrade most visual trackers to some extent. In this paper, visual tracking is empowered with wireless positioning to achieve high accuracy while maintaining robustness. Fundamentally different from previous works, this study does not involve any specific wireless positioning algorithm. Instead, we use the confidence region derived from the wireless positioning Cramér-Rao bound (CRB) as the search region of visual trackers. The proposed framework is low-cost and very simple to implement, yet readily leads to enhanced and more robust visual tracking performance in difficult scenarios, as corroborated by our experimental results. Most importantly, it is highly valuable for practitioners to pre-evaluate how effectively the wireless resources available at hand can alleviate the visual tracking pains.

4.1LGAug 3, 2018

Multitask Gaussian Process with Hierarchical Latent Interactions

Kai Chen, Twan van Laarhoven, Elena Marchiori et al.

The multitask Gaussian process (MTGP) is powerful for joint learning of multiple tasks with complicated correlation patterns. However, because they are assembled from additive independent latent functions, all current MTGPs, including the salient linear model of coregionalization (LMC) and convolution frameworks, cannot effectively represent and learn the hierarchical latent interactions between their latent functions. In this paper, we further investigate the interactions in the LMC of MTGP and then propose a novel kernel representation of the hierarchical interactions, which improves both the expressiveness and the interpretability of MTGP. Specifically, we express the interaction as a product of function interaction and coefficient interaction. The function interaction is modeled by using cross convolution of latent functions. The coefficient interaction between the LMCs is described as a cross coregionalization term. We validate that considering the interactions can promote knowledge transfer in MTGP and compare our approach with some state-of-the-art MTGPs on both synthetic and real-world datasets.

4.1LGAug 1, 2018

Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes

Kai Chen, Yijue Dai, Feng Yin et al.

Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time- and phase-modulated (TP) dependency structures to the original SM kernel for improved generalization of GPs. Specifically, by adopting Bienaymé's identity, we generalize the dependency structure through cross-covariance between the SM components. Then, we propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we improve the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. Finally, to enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we introduce a novel structure adaptation (SA) algorithm. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.