Shudong Sun

AI
h-index: 1
3 papers
325 citations
Novelty: 38%
AI Score: 35

3 Papers

ML · Oct 13, 2025
Quantifying Dataset Similarity to Guide Transfer Learning

Shudong Sun, Hao Helen Zhang

Transfer learning has become a cornerstone of modern machine learning because it lets models leverage knowledge from related domains to learn more effectively. However, transferring from poorly aligned data can harm rather than help performance, so it is crucial to determine whether transfer will be beneficial before implementing it. This work addresses this challenge by proposing a new metric that measures dataset similarity and provides quantitative guidance on transferability. Existing methods in the literature largely focus on feature distributions while overlooking label information and predictive relationships, potentially missing critical transferability insights. In contrast, our proposed metric, the Cross-Learning Score (CLS), measures dataset similarity through bidirectional generalization performance between domains. We provide a theoretical justification for CLS by establishing its connection to the cosine similarity between the decision boundaries of the target and source datasets. Computationally, CLS is efficient because it bypasses expensive distribution estimation in high-dimensional problems. We further introduce a general framework that categorizes source datasets into positive, ambiguous, or negative transfer zones based on their CLS relative to the baseline error, enabling informed decisions. Additionally, we extend this approach to encoder-head architectures in deep learning to better reflect modern transfer pipelines. Extensive experiments on diverse synthetic and real-world tasks demonstrate that CLS can reliably predict whether transfer will improve or degrade performance, offering a principled tool for guiding data selection in transfer learning.
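
The abstract describes CLS only at a high level, so the sketch below is a hypothetical illustration of "bidirectional generalization performance", not the authors' definition. It assumes CLS is the mean of the two cross-domain error rates (a model fit on the source and scored on the target, and vice versa), uses a nearest-centroid classifier as a stand-in learner, and assigns zones with made-up thresholds (`low`, `high`) on the gap between CLS and a supplied baseline error. All function names are illustrative.

```python
# Hypothetical illustration of a bidirectional "cross-learning" score.
# The CLS formula and zone thresholds here are assumptions, not the paper's.
from statistics import fmean


def fit_centroids(X, y):
    """Stand-in learner: nearest-centroid classifier (mean vector per label)."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [fmean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict(centroids, x):
    # The label whose centroid is closest in squared Euclidean distance wins.
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def error_rate(centroids, X, y):
    return sum(predict(centroids, x) != label for x, label in zip(X, y)) / len(y)


def cross_learning_score(source, target):
    """Assumed CLS: mean of source->target and target->source error rates."""
    (Xs, ys), (Xt, yt) = source, target
    f_s, f_t = fit_centroids(Xs, ys), fit_centroids(Xt, yt)
    return 0.5 * (error_rate(f_s, Xt, yt) + error_rate(f_t, Xs, ys))


def transfer_zone(cls, baseline_error, low=0.05, high=0.15):
    """Hypothetical zoning by how far CLS exceeds the baseline error."""
    gap = cls - baseline_error
    if gap <= low:
        return "positive"
    if gap >= high:
        return "negative"
    return "ambiguous"


# Toy data: two well-separated classes; the second target has flipped labels.
Xs = [[0.0, 0.0], [0.5, 0.0], [3.0, 3.0], [3.5, 3.0]]
Xt = [[0.0, 0.5], [0.5, 0.5], [3.0, 3.5], [3.5, 3.5]]
ys = [0, 0, 1, 1]
aligned = cross_learning_score((Xs, ys), (Xt, [0, 0, 1, 1]))
flipped = cross_learning_score((Xs, ys), (Xt, [1, 1, 0, 0]))
print(aligned, transfer_zone(aligned, 0.0))  # 0.0 positive
print(flipped, transfer_zone(flipped, 0.0))  # 1.0 negative
```

The flipped-label target shares the source's features but not its predictive relationship, which is the kind of mismatch a purely feature-distribution comparison would miss and a cross-prediction score exposes.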

AI · Aug 12, 2013
Fighting Sample Degeneracy and Impoverishment in Particle Filters: A Review of Intelligent Approaches

Tiancheng Li, Shudong Sun, Tariq P. Sattar et al.

During the last two decades there has been growing interest in Particle Filtering (PF). However, PF suffers from two long-standing problems, referred to as sample degeneracy and impoverishment. We investigate methods that are particularly efficient at Particle Distribution Optimization (PDO) for fighting sample degeneracy and impoverishment, with an emphasis on intelligent choices. These methods draw on Markov Chain Monte Carlo methods, Mean-shift algorithms, artificial intelligence algorithms (e.g., Particle Swarm Optimization, Genetic Algorithm and Ant Colony Optimization), machine learning approaches (e.g., clustering, splitting and merging) and their hybrids, forming a coherent standpoint for enhancing the particle filter. The working mechanism, interrelationships, pros and cons of these approaches are described. In addition, approaches that are effective for dealing with high dimensionality are reviewed. Although these techniques improve filter performance in terms of accuracy, robustness and convergence, the advanced techniques employed in PF often incur additional computational cost, which can in turn offset the improvement obtained in real-life filtering. This fact, hidden in pure simulations, deserves the attention of users and designers of new filters.
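
The two problems named above can be made concrete with two standard tools that are not specific to this review: the effective sample size N_eff = 1 / sum(w_i^2) of the normalized weights (a small N_eff signals degeneracy) and systematic resampling (which, under degenerate weights, collapses the particle set onto a few distinct values, i.e. impoverishment). The sketch below is a minimal illustration under those assumptions; none of the PDO methods surveyed in the paper are implemented here.

```python
# Standard diagnostics for degeneracy and impoverishment (not a PDO method).
import random


def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2) after normalizing the weights to sum to 1."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def systematic_resample(particles, weights, rng):
    """Systematic resampling: one uniform offset, n evenly spaced pointers."""
    n = len(particles)
    total = sum(weights)
    u0 = rng.random() / n
    out, i, cumulative = [], 0, weights[0] / total
    for j in range(n):
        u = u0 + j / n
        # Advance to the particle whose cumulative-weight interval contains u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        out.append(particles[i])
    return out


particles = list(range(10))
weights = [0.91] + [0.01] * 9  # one particle dominates: degeneracy
print(round(effective_sample_size(weights), 2))  # about 1.21, far below 10
resampled = systematic_resample(particles, weights, random.Random(0))
print(len(set(resampled)))  # at most 2 distinct particles: impoverishment
```

Resampling removes the weight imbalance but not the underlying problem: the ten resampled particles carry almost no diversity, which is the gap the reviewed PDO approaches aim to close.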

AP · Jun 13, 2013
Adapting sample size in particle filters through KLD-resampling

Tiancheng Li, Shudong Sun, Tariq Pervez Sattar

This letter proposes an adaptive resampling method. It determines the number of particles to resample so that the Kullback-Leibler distance (KLD) between the distribution of particles before and after resampling does not exceed a pre-specified error bound. The method rests on the same basis as Fox's KLD-sampling but is implemented differently. KLD-sampling assumes that samples come from the true posterior distribution and ignores any mismatch between the true and the proposal distributions. In contrast, we incorporate the KLD measure into resampling, where the distribution of interest is exactly the posterior distribution. That is, for sample size adjustment, measuring the fit of the distribution represented by weighted particles via KLD during resampling is more theoretically rigorous and practically flexible than doing so during sampling. Simulations of target tracking demonstrate the efficiency of our method.
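
Fox's KLD-sampling bounds the particle count n for k occupied histogram bins by n = (k - 1)/(2ε) · [1 - 2/(9(k - 1)) + sqrt(2/(9(k - 1))) · z_{1-δ}]^3, where ε is the KLD error bound and z_{1-δ} is the upper 1-δ standard normal quantile. The sketch below applies that stopping rule inside a multinomial resampling loop over a 1-D state, which is how we read the abstract's description; the bin width, `n_min`, `n_max`, and the function names are illustrative assumptions, not parameters taken from the letter.

```python
# Sketch: KLD-based stopping rule inside a resampling loop (1-D state).
# bin_width, n_min, n_max and the function names are illustrative assumptions.
import math
import random
from itertools import accumulate
from statistics import NormalDist


def kld_bound(k, epsilon, delta):
    """Fox's bound on particle count for k occupied bins (0 when k < 2)."""
    if k < 2:
        return 0
    z = NormalDist().inv_cdf(1.0 - delta)  # upper 1 - delta quantile
    a = 2.0 / (9.0 * (k - 1))
    return math.ceil((k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z) ** 3)


def kld_resample(particles, weights, bin_width, epsilon=0.05, delta=0.01,
                 n_min=10, n_max=10_000, rng=None):
    """Draw from the weighted set until the KLD bound for the bins hit is met."""
    if rng is None:
        rng = random.Random()
    cum = list(accumulate(weights))
    out, bins = [], set()
    while len(out) < n_max:
        x = rng.choices(particles, cum_weights=cum, k=1)[0]
        out.append(x)
        bins.add(math.floor(x / bin_width))  # histogram bin of the new draw
        if len(out) >= n_min and len(out) >= kld_bound(len(bins), epsilon, delta):
            break
    return out


concentrated = kld_resample([0.1] * 5, [1.0] * 5, bin_width=1.0, rng=random.Random(1))
spread = kld_resample([float(i) for i in range(50)], [1.0] * 50, bin_width=1.0,
                      rng=random.Random(1))
print(len(concentrated))  # 10: one occupied bin, so only n_min applies
print(len(spread))        # larger: more occupied bins raise the bound
```

Because the bins are counted on the resampled particles, a tightly concentrated posterior yields a small particle set and a spread-out one yields a larger set, which is the adaptive behavior the letter targets.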