LGNov 30, 2025
Prediction-space knowledge markets for communication-efficient federated learning on multimedia tasksWenzhang Du
Federated learning (FL) enables collaborative training over distributed multimedia data but suffers acutely from statistical heterogeneity and communication constraints, especially when clients deploy large models. Classic parameter-averaging methods such as FedAvg transmit full model weights and can diverge under nonindependent and identically distributed (non-IID) data. We propose KTA v2, a prediction-space knowledge trading market for FL. Each round, clients locally train on their private data, then share only logits on a small public reference set. The server constructs a client-client similarity graph in prediction space, combines it with reference-set accuracy to form per-client teacher ensembles, and sends back personalized soft targets for a second-stage distillation update. This two-stage procedure can be interpreted as approximate block-coordinate descent on a unified objective with prediction-space regularization. Experiments on FEMNIST, CIFAR-10 and AG News show that, under comparable or much lower communication budgets, KTA v2 consistently outperforms a local-only baseline and strong parameter-based methods (FedAvg, FedProx), and substantially improves over a FedMD-style global teacher. On CIFAR-10 with ResNet-18, KTA v2 reaches 57.7% test accuracy using approximately 1/1100 of FedAvg's communication, while on AG News it attains 89.3% accuracy with approximately 1/300 of FedAvg's traffic.
LGDec 2, 2025
Retrieval-Augmented Memory for Online LearningWenzhang Du
Retrieval-augmented models couple parametric predictors with non-parametric memories, but their use in streaming supervised learning with concept drift is not well understood. We study online classification in non-stationary environments and propose Retrieval-Augmented Memory for Online Learning (RAM-OL), a simple extension of stochastic gradient descent that maintains a small buffer of past examples. At each time step, RAM-OL retrieves a few nearest neighbours of the current input in the hidden representation space and updates the model jointly on the current example and the retrieved neighbours. We compare a naive replay variant with a gated replay variant that constrains neighbours using a time window, similarity thresholds, and gradient reweighting, in order to balance fast reuse of relevant past data against robustness to outdated regimes. From a theoretical perspective, we interpret RAM-OL under a bounded drift model and discuss how retrieval can reduce adaptation cost and improve regret constants when patterns recur over time. Empirically, we instantiate RAM-OL on a simple online multilayer perceptron and evaluate it on three real-world data streams derived from electricity pricing, electricity load, and airline delay data. On strongly and periodically drifting streams, RAM-OL improves prequential accuracy by up to about seven percentage points and greatly reduces variance across random seeds, while on a noisy airline stream the gated variant closely matches the purely online baseline. These results show that retrieval-augmented memory is a practical and robust tool for online learning under concept drift.
LGDec 26, 2025
Optimistic Feasible Search for Closed-Loop Fair Threshold Decision-MakingWenzhang Du
Closed-loop decision-making systems (e.g., lending, screening, or recidivism risk assessment) often operate under fairness and service constraints while inducing feedback effects: decisions change who appears in the future, yielding non-stationary data and potentially amplifying disparities. We study online learning of a one-dimensional threshold policy from bandit feedback under demographic parity (DP) and, optionally, service-rate constraints. The learner observes only a scalar score each round and selects a threshold; reward and constraint residuals are revealed only for the chosen threshold. We propose Optimistic Feasible Search (OFS), a simple grid-based method that maintains confidence bounds for reward and constraint residuals for each candidate threshold. At each round, OFS selects a threshold that appears feasible under confidence bounds and, among those, maximizes optimistic reward; if no threshold appears feasible, OFS selects the threshold minimizing optimistic constraint violation. This design directly targets feasible high-utility thresholds and is particularly effective for low-dimensional, interpretable policy classes where discretization is natural. We evaluate OFS on (i) a synthetic closed-loop benchmark with stable contraction dynamics and (ii) two semi-synthetic closed-loop benchmarks grounded in German Credit and COMPAS, constructed by training a score model and feeding group-dependent acceptance decisions back into population composition. Across all environments, OFS achieves higher reward with smaller cumulative constraint violation than unconstrained and primal-dual bandit baselines, and is near-oracle relative to the best feasible fixed threshold under the same sweep procedure. Experiments are reproducible and organized with double-blind-friendly relative outputs.
LGNov 30, 2025
City-Conditioned Memory for Multi-City Traffic and Mobility ForecastingWenzhang Du
Deploying spatio-temporal forecasting models across many cities is difficult: traffic networks differ in size and topology, data availability can vary by orders of magnitude, and new cities may provide only a short history of logs. Existing deep traffic models are typically trained per city and backbone, creating high maintenance cost and poor transfer to data-scarce cities. We ask whether a single, backbone-agnostic layer can condition on "which city this sequence comes from", improve accuracy in full- and low-data regimes, and support better cross-city adaptation with minimal code changes. We propose CityCond, a light-weight city-conditioned memory layer that augments existing spatio-temporal backbones. CityCond combines a city-ID encoder with an optional shared memory bank (CityMem). Given a city index and backbone hidden states, it produces city-conditioned features fused through gated residual connections. We attach CityCond to five representative backbones (GRU, TCN, Transformer, GNN, STGCN) and evaluate three regimes: full-data, low-data, and cross-city few-shot transfer on METR-LA and PEMS-BAY. We also run auxiliary experiments on SIND, a drone-based multi-agent trajectory dataset from a signalized intersection in Tianjin (we focus on pedestrian tracks). Across more than fourteen model variants and three random seeds, CityCond yields consistent improvements, with the largest gains for high-capacity backbones such as Transformers and STGCNs. CityMem reduces Transformer error by roughly one third in full-data settings and brings substantial gains in low-data and cross-city transfer. On SIND, simple city-ID conditioning modestly improves low-data LSTM performance. CityCond can therefore serve as a reusable design pattern for scalable, multi-city forecasting under realistic data constraints.
SDDec 4, 2025
Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service GraphsWenzhang Du
Subjective mean opinion scores (MOS) remain the de-facto target for non-intrusive speech and singing quality assessment. However, MOS is a scalar that collapses heterogeneous user expectations, ignores service-level objectives, and is difficult to compare across deployment graphs. We propose a contract-driven QoE auditing framework: each service graph G is evaluated under a set of human-interpretable experience contracts C, yielding a contract-level satisfaction vector Q(G, C). We show that (i) classical MOS regression is a special case with a degenerate contract set, (ii) contract-driven quality is more stable than MOS under graph view transformations (e.g., pooling by system vs. by system type), and (iii) the effective sample complexity of learning contracts is governed by contract semantics rather than merely the dimensionality of C. We instantiate the framework on URGENT2024 MOS (6.9k speech utterances with raw rating vectors) and SingMOS v1 (7,981 singing clips; 80 systems). On URGENT, we train a contract-aware neural auditor on self-supervised WavLM embeddings; on SingMOS, we perform contract-driven graph auditing using released rating vectors and metadata without decoding audio. Empirically, our auditor matches strong MOS predictors in MOS accuracy while providing calibrated contract probabilities; on SingMOS, Q(G, C) exhibits substantially smaller cross-view drift than raw MOS and graph-only baselines; on URGENT, difficulty curves reveal that mis-specified "simple" contracts can be harder to learn than richer but better aligned contract sets.
LGDec 4, 2025
Contract-Governed Training for Earth Observation: Observed Service Agreement Graphs and Coverage-Accuracy Trade-offsWenzhang Du
Earth observation (EO) models are frequently trained under implicit sampling policies that optimize global accuracy but provide no explicit guarantees on who (which regions, classes, or mission-critical strata) is being served throughout training. This paper introduces a contract-governed training paradigm for EO in which training samples are grouped into service contracts -- semantically meaningful units such as (dataset, region, rare-crop indicator) -- and each contract is assigned a target service share. We instantiate this paradigm as an Observed Service Agreement Graph (OSAG), a lightweight governance layer that (i) monitors contract-level exposure (coverage) during optimization, (ii) drives empirical coverage toward target shares via contract-normalized sampling weights, and (iii) exposes explicit accuracy-governance trade-offs through two knobs: a sampling mixture coefficient alpha and a contract-regularization weight lambda_C. We provide a compact theory in a toy setting: OSAG sampling concentrates empirical coverage to targets; coverage deviations upper-bound service-risk deviations; and contract design (coarse vs. fine) modulates governance cost. Experiments on AVIRIS hyperspectral scenes (Indian Pines plus Salinas) and multispectral Sentinel-2 EuroSAT demonstrate that OSAG can substantially reduce priority coverage error while maintaining global accuracy and improving high-priority accuracy. A EuroSAT coarse-vs-fine contract ablation further evidences how semantically refined contracts can reduce the accuracy cost per unit of governance improvement.
LGNov 16, 2025
CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian SketchingWenzhang Du
First-order optimizers are reliable but slow in sharp, anisotropic regions. We study a curvature-adaptive method that periodically sketches a low-rank Hessian subspace via Hessian--vector products and preconditions gradients only in that subspace, leaving the orthogonal complement first-order. For L-smooth non-convex objectives, we recover the standard O(1/T) stationarity guarantee with a widened stable stepsize range; under a Polyak--Lojasiewicz (PL) condition with bounded residual curvature outside the sketch, the loss contracts at refresh steps. On CIFAR-10/100 with ResNet-18/34, the method enters the low-loss region substantially earlier: measured by epochs to a pre-declared train-loss threshold (0.75), it reaches the threshold 2.95x faster than Adam on CIFAR-100/ResNet-18, while matching final test accuracy. The approach is one-knob: performance is insensitive to the sketch rank k across {1,3,5}, and k=0 yields a principled curvature-free ablation. We release anonymized logs and scripts that regenerate all figures and tables.
LGNov 24, 2025
When +1% Is Not Enough: A Paired Bootstrap Protocol for Evaluating Small ImprovementsWenzhang Du
Recent machine learning papers often report 1-2 percentage point improvements from a single run on a benchmark. These gains are highly sensitive to random seeds, data ordering, and implementation details, yet are rarely accompanied by uncertainty estimates or significance tests. It is therefore unclear when a reported +1-2% reflects a real algorithmic advance versus noise. We revisit this problem under realistic compute budgets, where only a few runs are affordable. We propose a simple, PC-friendly evaluation protocol based on paired multi-seed runs, bias-corrected and accelerated (BCa) bootstrap confidence intervals, and a sign-flip permutation test on per-seed deltas. The protocol is intentionally conservative and is meant as a guardrail against over-claiming. We instantiate it on CIFAR-10, CIFAR-10N, and AG News using synthetic no-improvement, small-gain, and medium-gain scenarios. Single runs and unpaired t-tests often suggest significant gains for 0.6-2.0 point improvements, especially on text. With only three seeds, our paired protocol never declares significance in these settings. We argue that such conservative evaluation is a safer default for small gains under tight budgets.
LGNov 22, 2025
Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful ReplayWenzhang Du
Many deployed learning systems must update models on streaming data under memory constraints. The default strategy, sequential fine-tuning on each new phase, is architecture-agnostic but often suffers catastrophic forgetting when later phases correspond to different sub-populations or tasks. Replay with a finite buffer is a simple alternative, yet its behaviour across generative and predictive objectives is not well understood. We present a unified study of stateful replay for streaming autoencoding, time series forecasting, and classification. We view both sequential fine-tuning and replay as stochastic gradient methods for an ideal joint objective, and use a gradient alignment analysis to show when mixing current and historical samples should reduce forgetting. We then evaluate a single replay mechanism on six streaming scenarios built from Rotated MNIST, ElectricityLoadDiagrams 2011-2014, and Airlines delay data, using matched training budgets and three seeds. On heterogeneous multi task streams, replay reduces average forgetting by a factor of two to three, while on benign time based streams both methods perform similarly. These results position stateful replay as a strong and simple baseline for continual learning in streaming environments.
LGNov 21, 2025
Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy LabelsWenzhang Du
Training data collected in the wild often contain noisy labels and outliers that substantially degrade the performance and reliability of deep neural networks. While data cleaning is commonly applied as a separate preprocessing stage, such two-stage pipelines neither fully exploit feedback from the downstream model nor adapt to unknown noise patterns. We propose Step-E, a simple framework that integrates sample selection and model learning into a single optimization process. At each epoch, Step-E ranks samples by loss and gradually increases the fraction of high-loss examples that are excluded from gradient updates after a brief warm-up stage, yielding an online curriculum that focuses on easy and consistent examples and eventually ignores persistent outliers. On CIFAR-100N, Step-E improves the test accuracy of a ResNet-18 model from 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%), clearly outperforming loss truncation, self-paced learning, and one-shot filtering while approaching the clean-label oracle at 60.5% (+/- 0.2%). On CIFAR-10N (aggre), Step-E also improves over the noisy baseline (85.3% vs. 83.9%) and nearly matches the clean-label oracle (85.9%), with only moderate training-time overhead.