LGAIJul 12, 2025

Extension OL-MDISF: Online Learning from Mix-Typed, Drifted, and Incomplete Streaming Features

arXiv:2507.10594v21 citationsh-index: 3
Originality Incremental advance
AI Analysis

This work addresses incremental improvements for researchers dealing with nonstationary, heterogeneous, and weakly supervised data streams in online learning.

The paper tackles the challenges of online learning with mixed feature types, data drift, and incomplete supervision by proposing the OL-MDISF algorithm, which uses copula models, adaptive sliding windows, and label proximity to achieve stable performance, as demonstrated through experiments on 14 real-world datasets.

Online learning, where feature spaces can change over time, offers a flexible learning paradigm that has attracted considerable attention. However, it still faces three significant challenges. First, the heterogeneity of real-world data streams with mixed feature types presents challenges for traditional parametric modeling. Second, data stream distributions can shift over time, causing an abrupt and substantial decline in model performance. Additionally, the time and cost constraints make it infeasible to label every data instance in a supervised setting. To overcome these challenges, we propose a new algorithm Online Learning from Mix-typed, Drifted, and Incomplete Streaming Features (OL-MDISF), which aims to relax restrictions on both feature types, data distribution, and supervision information. Our approach involves utilizing copula models to create a comprehensive latent space, employing an adaptive sliding window for detecting drift points to ensure model stability, and establishing label proximity information based on geometric structural relationships. To demonstrate the model's efficiency and effectiveness, we provide theoretical analysis and comprehensive experimental results. This extension serves as a standalone technical reference to the original OL-MDISF method. It provides (i) a contextual analysis of OL-MDISF within the broader landscape of online learning, covering recent advances in mixed-type feature modeling, concept drift adaptation, and weak supervision, and (ii) a comprehensive set of experiments across 14 real-world datasets under two types of drift scenarios. These include full CER trends, ablation studies, sensitivity analyses, and temporal ensemble dynamics. We hope this document can serve as a reproducible benchmark and technical resource for researchers working on nonstationary, heterogeneous, and weakly supervised data streams.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes