Xue Zheng

LG
h-index10
5papers
17citations
Novelty52%
AI Score39

5 Papers

CLOct 11, 2023
KwaiYiiMath: Technical Report

Jiayi Fu, Lei Lin, Xiaoyang Gao et al.

Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce the KwaiYiiMath which enhances the mathematical reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT) and Reinforced Learning from Human Feedback (RLHF), including on both English and Chinese mathematical tasks. Meanwhile, we also constructed a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath can achieve state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with the similar size models, respectively.

LGApr 27, 2023
Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data

Omar Faruque, Francis Ndikum Nji, Mostafa Cham et al.

Clustering high-dimensional spatiotemporal data using an unsupervised approach is a challenging problem for many data-driven applications. Existing state-of-the-art methods for unsupervised clustering use different similarity and distance functions but focus on either spatial or temporal features of the data. Concentrating on joint deep representation learning of spatial and temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel algorithm for the temporal clustering of high-dimensional spatiotemporal data using an unsupervised deep learning method. Inspired by the U-net architecture, DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent representations of the spatiotemporal data. DSC also includes a unique layer for cluster assignment on latent representations that uses the Student's t-distribution. By optimizing the clustering loss and data reconstruction loss simultaneously, the algorithm gradually improves clustering assignments and the nonlinear mapping between low-dimensional latent feature space and high-dimensional original data space. A multivariate spatiotemporal climate dataset is used to evaluate the efficacy of the proposed method. Our extensive experiments show our approach outperforms both conventional and deep learning-based unsupervised clustering algorithms. Additionally, we compared the proposed model with its various variants (CNN encoder, CNN autoencoder, CNN-RNN encoder, CNN-RNN autoencoder, etc.) to get insight into using both the CNN and RNN layers in the autoencoder, and our proposed technique outperforms these variants in terms of clustering results.

48.9LGApr 27
TTCD:Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng et al.

The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery methods that can identify intricate contemporaneous and lagged relationships in non-stationary, nonlinear, and noisy settings. Existing constraint-based methods often rely heavily on conditional independence tests that degrade for limited data samples and complex distributions, while score-based methods impose strong statistical assumptions. Recent methods address special cases such as change point detection or distribution shifts, but struggle to provide a unified solution. We propose the Transformer Integrated Temporal Causal Discovery (TTCD) Framework, a novel end-to-end approach that learns contemporaneous and lagged causal relations from non-stationary time series. TTCD introduces a Non-Stationary Feature Learner integrating temporal and frequency-domain attention with dynamic non-stationarity profiling, and a custom Causal Structure Learner. A key innovation is reconstruction-guided causal signal distillation, to distill essential causal signals through the reconstruction process of the transformer decoder, which mitigates noise and spurious correlations while preserving meaningful dependencies. The Causal Structure Learner operates on distilled reconstructed signals to infer the underlying causal graph without restrictive assumptions on noise distributions or data generation processes. Experiments on synthetic, benchmark, and real world datasets show that TTCD consistently outperforms state-of-the-art baselines in both accuracy and consistency with domain knowledge, demonstrating the approach's effectiveness for causal discovery in challenging real world contexts.

LGApr 1, 2024
TS-CausalNN: Learning Temporal Causal Relations from Non-linear Non-stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng et al.

The growing availability and importance of time series data across various domains, including environmental science, epidemiology, and economics, has led to an increasing need for time-series causal discovery methods that can identify the intricate relationships in the non-stationary, non-linear, and often noisy real world data. However, the majority of current time series causal discovery methods assume stationarity and linear relations in data, making them infeasible for the task. Further, the recent deep learning-based methods rely on the traditional causal structure learning approaches making them computationally expensive. In this paper, we propose a Time-Series Causal Neural Network (TS-CausalNN) - a deep learning technique to discover contemporaneous and lagged causal relations simultaneously. Our proposed architecture comprises (i) convolutional blocks comprising parallel custom causal layers, (ii) acyclicity constraint, and (iii) optimization techniques using the augmented Lagrangian approach. In addition to the simple parallel design, an advantage of the proposed model is that it naturally handles the non-stationarity and non-linearity of the data. Through experiments on multiple synthetic and real world datasets, we demonstrate the empirical proficiency of our proposed approach as compared to several state-of-the-art methods. The inferred graphs for the real world dataset are in good agreement with the domain understanding.

LGOct 23, 2024
ProFL: Performative Robust Optimal Federated Learning

Xue Zheng, Tian Xie, Xuwei Tan et al.

Performative prediction is a framework that captures distribution shifts that occur during the training of machine learning models due to their deployment. As the trained model is used, data generation causes the model to evolve, leading to deviations from the original data distribution. The impact of such model-induced distribution shifts in federated learning is increasingly likely to transpire in real-life use cases. A recently proposed approach extends performative prediction to federated learning with the resulting model converging to a performative stable point, which may be far from the performative optimal point. Earlier research in centralized settings has shown that the performative optimal point can be achieved under model-induced distribution shifts, but these approaches require the performative risk to be convex and the training data to be noiseless, assumptions often violated in realistic federated learning systems. This paper overcomes all of these shortcomings and proposes Performative Robust Optimal Federated Learning, an algorithm that finds performative optimal points in federated learning from noisy and contaminated data. We present the convergence analysis under the Polyak-Lojasiewicz condition, which applies to non-convex objectives. Extensive experiments on multiple datasets demonstrate the advantage of Robust Optimal Federated Learning over the state-of-the-art.