Hai-Tao Zhang

LG
h-index: 1
8 papers
530 citations
Novelty: 49%
AI Score: 38

8 Papers

SY Mar 16, 2017
Distributed Kalman filtering with minimum-time consensus algorithm

Ye Yuan, Ling Shi, Jun Liu et al.

Fueled by applications in sensor networks, recent years have witnessed a surge of interest in distributed estimation and filtering. A new approach to the Distributed Kalman Filter (DKF) is proposed here that integrates a local covariance computation scheme. Compared with well-established DKF methods, the present approach accelerates the convergence of the state estimates to those of the Centralized Kalman Filter (CKF). In addition, an algorithm is proposed that allows each node to compute the averaged measurement noise covariance matrix in a distributed way within a minimal number of discrete-time steps. Both theoretical analysis and extensive numerical simulations are conducted to show the feasibility and superiority of the proposed method.
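The distributed averaging of noise covariance matrices mentioned in this abstract can be illustrated with a standard asymptotic average-consensus iteration. This is only a sketch under stated assumptions: it is not the minimum-time algorithm of the paper, and the ring topology, the Metropolis weights, and the names `metropolis_weights` and `average_consensus` are chosen here purely for illustration.

```python
import numpy as np

def metropolis_weights(adjacency):
    """Build a symmetric, doubly stochastic weight matrix for an undirected graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    degrees = adjacency.sum(axis=1)
    n = len(adjacency)
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adjacency[i, j] > 0:
                weights[i, j] = 1.0 / (1.0 + max(degrees[i], degrees[j]))
        # The self-weight makes each row sum to one.
        weights[i, i] = 1.0 - weights[i].sum()
    return weights

def average_consensus(local_values, weights, steps):
    """Each node repeatedly replaces its value by a weighted mean of its neighbors' values."""
    values = np.asarray(local_values, dtype=float)  # shape (n_nodes, ...)
    for _ in range(steps):
        values = np.tensordot(weights, values, axes=1)
    return values

# Hypothetical 4-node ring; node i holds a local covariance R_i = s_i * I.
ring = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]
local_covariances = np.array([s * np.eye(2) for s in (1.0, 2.0, 3.0, 6.0)])
estimates = average_consensus(local_covariances, metropolis_weights(ring), steps=50)
```

After enough iterations every node's estimate approaches the network-wide average covariance. The asymptotic scheme only converges in the limit, whereas the abstract describes reaching the average in a minimal number of steps.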

LG Sep 25, 2025
EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense

Wei Huang, De-Tian Chu, Lin-Yuan Bai et al.

Modern email spam and phishing attacks have evolved far beyond keyword blacklists or simple heuristics. Adversaries now craft multi-modal campaigns that combine natural-language text with obfuscated URLs, forged headers, and malicious attachments, adapting their strategies within days to bypass filters. Traditional spam detection systems, which rely on static rules or single-modality models, struggle to integrate heterogeneous signals or to continuously adapt, leading to rapid performance degradation. We propose EvoMail, a self-evolving cognitive agent framework for robust detection of spam and phishing. EvoMail first constructs a unified heterogeneous email graph that fuses textual content, metadata (headers, senders, domains), and embedded resources (URLs, attachments). A Cognitive Graph Neural Network enhanced by a Large Language Model (LLM) performs context-aware reasoning across these sources to identify coordinated spam campaigns. Most critically, EvoMail engages in an adversarial self-evolution loop: a "red-team" agent generates novel evasion tactics, such as character obfuscation or AI-generated phishing text, while the "blue-team" detector learns from failures, compresses experiences into a memory module, and reuses them for future reasoning. Extensive experiments on real-world datasets (Enron-Spam, Ling-Spam, SpamAssassin, and TREC) and synthetic adversarial variants demonstrate that EvoMail consistently outperforms state-of-the-art baselines in detection accuracy, adaptability to evolving spam tactics, and interpretability of reasoning traces. These results highlight EvoMail's potential as a resilient and explainable defense framework against next-generation spam and phishing threats.

LG Jul 15, 2021
DeceFL: A Principled Decentralized Federated Learning Framework

Ye Yuan, Jun Liu, Dou Jin et al.

Traditional machine learning relies on a centralized data pipeline, i.e., data are provided to a central server for model training. In many applications, however, data are inherently fragmented. The decentralized nature of these databases presents the biggest challenge for collaboration: sending all decentralized datasets to a central server raises serious privacy concerns. Although there has been a joint effort to tackle this critical issue by proposing privacy-preserving machine learning frameworks, such as federated learning, most state-of-the-art frameworks are still built in a centralized way, in which a central client is needed to collect and distribute model information (instead of the data itself) from every other client. This leads to high communication pressure and high vulnerability to a failure of, or an attack on, the central client. Here we propose a principled decentralized federated learning algorithm (DeceFL), which does not require a central client and relies only on local information transmission between clients and their neighbors, representing a fully decentralized learning framework. It is further proven that every client reaches the global minimum with zero performance gap and achieves the same convergence rate $O(1/T)$ (where $T$ is the number of iterations in gradient descent) as centralized federated learning when the loss function is smooth and strongly convex. Finally, the proposed algorithm is applied to a number of problems to illustrate its effectiveness for both convex and nonconvex loss functions, demonstrating its applicability to a wide range of real-world medical and industrial applications.
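The idea of training without a central client can be sketched with generic decentralized gradient descent: each client mixes its parameters with those of its neighbors and then takes a local gradient step. This is a minimal illustration, not the DeceFL update as published. The quadratic local losses, the 4-client ring, the mixing matrix `ring_weights`, and the diminishing step size `1 / (t + 2)` are all assumptions made for the example.

```python
import numpy as np

def decentralized_gradient_descent(centers, weights, iterations):
    """Client i holds the local loss f_i(w) = 0.5 * (w - centers[i])**2.

    Each round a client averages its neighbors' parameters (one row of
    `weights`) and subtracts a step along its own local gradient. No
    client ever sees another client's data (here, its center).
    """
    centers = np.asarray(centers, dtype=float)
    params = np.zeros_like(centers)
    for t in range(iterations):
        step = 1.0 / (t + 2)            # diminishing step size (assumed)
        local_grads = params - centers  # gradient of each local quadratic
        params = weights @ params - step * local_grads
    return params

# Hypothetical ring of 4 clients with equal weights on self and both neighbors.
ring_weights = np.array([[1/3, 1/3, 0.0, 1/3],
                         [1/3, 1/3, 1/3, 0.0],
                         [0.0, 1/3, 1/3, 1/3],
                         [1/3, 0.0, 1/3, 1/3]])
client_centers = [1.0, 2.0, 3.0, 6.0]
final_params = decentralized_gradient_descent(client_centers, ring_weights, iterations=2000)
```

The minimizer of the summed loss is the mean of the centers, 3.0, and in this toy setting every client's parameter approaches that value while exchanging information only with its two neighbors.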

LG Mar 21, 2020
BoostTree and BoostForest for Ensemble Learning

Changming Zhao, Dongrui Wu, Jian Huang et al.

Bootstrap aggregating (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite model for more accurate and more reliable performance. They have been widely used in biology, engineering, healthcare, etc. This paper proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree model by gradient boosting. It increases the randomness (diversity) by drawing the cut-points randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest generally outperformed four classical ensemble learning approaches (Random Forest, Extra-Trees, XGBoost and LightGBM) on 35 classification and regression datasets. Remarkably, BoostForest tunes its parameters by simply sampling them randomly from a parameter pool, which can be easily specified, and its ensemble learning framework can also be used to combine many other base learners.
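The two sources of randomness named in this abstract, bootstrapped training data and randomly drawn cut-points, can be sketched as follows. The helper names and the choice of a uniform draw between the feature's minimum and maximum are assumptions for illustration; the abstract does not specify how BoostTree draws its cut-points.

```python
import numpy as np

def bootstrap_indices(n_samples, rng):
    """Sample row indices with replacement (the bagging step)."""
    return rng.integers(0, n_samples, size=n_samples)

def random_cut_point(feature_values, rng):
    """Draw a split threshold uniformly between the feature's min and max (assumed scheme)."""
    return rng.uniform(np.min(feature_values), np.max(feature_values))

rng = np.random.default_rng(0)
feature = np.array([0.2, 1.5, 3.1, 4.8, 7.0])

# One tree would see a bootstrap resample of the rows ...
rows = bootstrap_indices(len(feature), rng)
# ... and split a node at a randomly drawn threshold instead of an optimized one.
cut = random_cut_point(feature[rows], rng)
left_rows = rows[feature[rows] <= cut]
right_rows = rows[feature[rows] > cut]
```

Repeating this with different random draws yields trees that differ both in their training rows and in their split thresholds, which is the diversity an ensemble relies on.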

SY May 3, 2019
Collective Dynamics and Control for Multiple Unmanned Surface Vessels

Bin Liu, Zhiyong Chen, Hai-Tao Zhang et al.

A multi-unmanned surface vessel (USV) formation control system is established on a novel platform composed of three 1.2-meter-long hydraulic jet propulsion surface vessels, a differential GPS reference station, and inter-vessel Zigbee communication modules. The system is also equipped with an upper-level collective multi-USV protocol and a lower-level vessel dynamics controller. The system is capable of chasing and surrounding a target vessel. The results are supported by rigorous theoretical analysis in terms of asymptotic surrounding behavior and trajectory regulation. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed hardware and software architectures.

LG Dec 17, 2018
A General End-to-end Diagnosis Framework for Manufacturing Systems

Ye Yuan, Guijun Ma, Cheng Cheng et al.

The manufacturing sector is envisioned to be heavily influenced by artificial intelligence-based technologies, given the extraordinary increases in computational power and data volumes. A central challenge in the manufacturing sector is the need for a general framework that ensures satisfactory diagnosis and monitoring performance across different manufacturing applications. Here we propose a general data-driven, end-to-end framework for the monitoring of manufacturing systems. This framework, derived from deep learning techniques, evaluates fused sensory measurements to detect and even predict faults and wearing conditions. This work exploits the predictive power of deep learning to automatically extract hidden degradation features from noisy, time-course data. We have tested the proposed framework on ten representative datasets drawn from a wide variety of manufacturing applications. Results reveal that the framework performs well in the examined benchmark applications and can be applied in diverse contexts, indicating its potential use as a critical cornerstone in smart manufacturing.

SY Oct 1, 2018
Data-driven Discovery of Cyber-Physical Systems

Ye Yuan, Xiuchuan Tang, Wei Pan et al.

Cyber-physical systems (CPSs) embed software into the physical world. They appear in a wide range of applications such as smart grids, robotics, intelligent manufacturing and medical monitoring. CPSs have proved resistant to modeling due to their intrinsic complexity, which arises from the combination of physical and cyber components and the interaction between them. This study proposes a general framework for reverse engineering CPSs directly from data. The method involves the identification of physical systems as well as the inference of transition logic. It has been applied successfully to a number of real-world examples ranging from mechanical and electrical systems to medical applications. The novel framework seeks to enable researchers to make predictions concerning the trajectory of CPSs based on the discovered model. Such information has been proven essential for assessing the performance of CPSs, designing failure-proof CPSs and creating design guidelines for new CPSs.

SY Sep 14, 2018
Probabilistic Optimal Power Flow Considering Correlation of Wind Farms via Markov Chain Quasi-Monte Carlo Sampling

Weigao Sun, Mohsen Zamani, Hai-Tao Zhang et al.

The probabilistic characteristics of daily wind speed are not well captured by simple density functions such as Normal or Weibull distributions as suggested by the existing literature. The unmodeled uncertainties can cause unknown influences on the power system operation. In this paper, we develop a new stochastic scheme for the probabilistic optimal power flow (POPF) problem, which can cope with arbitrarily complex wind speed distributions and also take into account the correlation of different wind farms. A multivariate Gaussian mixture model (GMM) is employed to approximate actual wind speed distributions from multiple wind farms. Furthermore, we propose to adopt the Markov Chain Monte Carlo (MCMC) sampling technique to deliver wind speed samples as the input of POPF. As a novel step, we also integrate a Sobol-based quasi-Monte Carlo (QMC) technique into the MCMC sampling process to obtain a faster convergence rate. The IEEE 14- and 118-bus benchmark systems with additional wind farms are used to examine the effectiveness of the proposed POPF scheme.