LGApr 16, 2022
Towards cost-effective and resource-aware aggregation at Edge for Federated LearningAhmad Faraz Khan, Yuze Li, Xinran Wang et al.
Federated Learning (FL) is a machine learning approach that addresses privacy and data transfer costs by computing data at the source. It's particularly popular for Edge and IoT applications where the aggregator server of FL is in resource-capped edge data centers for reducing communication costs. Existing cloud-based aggregator solutions are resource-inefficient and expensive at the Edge, leading to low scalability and high latency. To address these challenges, this study compares prior and new aggregation methodologies under the changing demands of IoT and Edge applications. This work is the first to propose an adaptive FL aggregator at the Edge, enabling users to manage the cost and efficiency trade-off. An extensive comparative analysis demonstrates that the design improves scalability by up to 4X, time efficiency by 8X, and reduces costs by more than 2X compared to extant cloud-based static methodologies.
CLOct 6, 2022
Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive SummarizationYiyang Li, Lei Li, Marina Litvak et al.
The issue of factual consistency in abstractive summarization has received extensive attention in recent years, and the evaluation of factual consistency between summary and document has become an important and urgent task. Most of the current evaluation metrics are adopted from the question answering (QA) or natural language inference (NLI) task. However, the application of QA-based metrics is extremely time-consuming in practice while NLI-based metrics are lack of interpretability. In this paper, we propose a cloze-based evaluation framework called ClozE and show the great potential of the cloze-based metric. It inherits strong interpretability from QA, while maintaining the speed of NLI- level reasoning. We demonstrate that ClozE can reduce the evaluation time by nearly 96% relative to QA-based metrics while retaining their interpretability and performance through experiments on six human-annotated datasets and a meta-evaluation benchmark GO FIGURE (Gabriel et al., 2021). Finally, we discuss three important facets of ClozE in practice, which further shows better overall performance of ClozE compared to other metrics.
GTMar 18Code
On the Fragility of AI Agent CollusionJussi Keppo, Yuze Li, Gerry Tsoukalas et al.
Recent work shows that pricing with symmetric LLM agents leads to algorithmic collusion. We show that collusion is fragile under the heterogeneity typical of real deployments. In a stylized repeated-pricing model, heterogeneity in patience or data access reduces the set of collusive equilibria. Experiments with open-source LLM agents (totaling over 2,000 compute hours) align with these predictions: patience heterogeneity reduces price lift from 22% to 10% above competitive levels; asymmetric data access, to 7%. Increasing the number of competing LLMs breaks up collusion; so does cross-algorithm heterogeneity, that is, setting LLMs against Q-learning agents. But model-size differences (e.g., 32B vs. 14B weights) do not; they generate leader-follower dynamics that stabilize collusion. We discuss antitrust implications, such as enforcement actions restricting data-sharing and policies promoting algorithmic diversity.
CVMay 12
GeoQuery: Geometry-Query Diffusion for Sparse-View ReconstructionXiao Cao, Yuze Li, Youmin Zhang et al.
3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained under sparse-view constraints. While recent methods attempt to rectify artifacts in rendered views using image diffusion models, they typically rely on multi-view self-attention to retrieve information from reference images. We observe that this mechanism often fails when the rendered novel views output by 3DGS are heavily corrupted: damaged query features lead to erroneous cross-view retrieval, resulting in inconsistent rendering refinement. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues via a novel Geometry-guided Cross-view Attention (GCA) mechanism. First, by leveraging predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Furthermore, we design a new cross-view feature aggregation pipeline, in which we restrict the cross-view attention to a local window around each proxy query to effectively retrieve useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
CVMar 1
Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion TransferYuze Li, Dong Gong, Xiao Cao et al.
Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT independently extracts motion representations and accurately assigns them to different objects, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens only influence their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and progressively propagates them across frames efficiently. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional, and state-of-the-art performance in I2V-based multi-object multi-motion transfer.
CVApr 23
UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit DetectionYuze Li, Zhilei Liu
Facial action unit (AU) detection remains challenging because it involves heterogeneous, AU-specific uncertainties arising at both the representation and decision stages. Recent methods have improved discriminative feature learning, but they often treat the AU representations as deterministic, overlooking uncertainty caused by visual noise, subject-dependent appearance variations, and ambiguous inter-AU relationships, all of which can substantially degrade robustness. Meanwhile, conventional point-estimation classifiers often provide poorly calibrated confidence, producing overconfident predictions, especially under the severe label imbalance typical of AU datasets. We propose UAU-Net, an Uncertainty-aware AU detection framework that explicitly models uncertainty at both stages. At the representation stage, we introduce CV-AFE, a conditional VAE (CVAE)-based AU feature extraction module that learns probabilistic AU representations by jointly estimating feature means and variances across multiple spatio-temporal scales; conditioning on AU labels further enables CV-AFE to capture uncertainty associated with inter-AU dependencies. At the decision stage, we design AB-ENN, an Asymmetric Beta Evidential Neural Network for multi-label AU detection, which parameterizes predictive uncertainty with Beta distributions and mitigates overconfidence via an asymmetric loss tailored to highly imbalanced binary labels. Extensive experiments on BP4D and DISFA show that UAU-Net achieves strong AU detection performance, and further analyses indicate that modeling uncertainty in both representation learning and evidential prediction improves robustness and reliability.
LGMar 21, 2025
TRACE: Time SeRies PArameter EffiCient FinE-tuningYuze Li, Wei Zhu
We propose an efficient fine-tuning method for time series foundation models, termed TRACE: Time Series Parameter Efficient Fine-tuning. While pretrained time series foundation models are gaining popularity, they face the following challenges: (1) Unlike natural language tasks, time series data vary in frequency, channel numbers, historical/prediction lengths. For long-term forecasting tasks in particular, tailored fine-tuning can significantly enhance performance.(2) Existing parameter-efficient tuning methods like LoRA remain applicable but require adaptation to temporal characteristics. To address these challenges, our TRACE framework introduces two key innovations: (1) Gated DSIC (Gated Dynamic Simulation Importance Calculation), an unbiased LoRA module importance selection mechanism that ensures conditional parameter consistency before and after masking. Experiments demonstrate that Gated DSIC outperforms common fine-tuning. (2) Reconstructed prediction heads for long-term forecasting tasks, which achieve comparable or superior performance to linear probing heads while drastically reducing parameter counts. Extensive experiments on long-/short-term forecasting, anomaly detection and natural language tasks across diverse datasets, coupled with ablation studies, validate the effectiveness of our method.
NIMar 7
pqRPKI: A Practical RPKI Architecture for the Post-Quantum EraWeitong Li, Yuze Li, Taejoong Chung
The Resource Public Key Infrastructure (RPKI) secures Internet routing by binding IP prefixes to authorized Autonomous Systems, yet its RSA foundations are vulnerable to quantum adversaries. A naive swap to post-quantum (PQ) signatures (eg Falcon) is a poor fit for RPKI's bulk model: every relying party (RP) repeatedly fetches and validates the entire global repository, so larger keys and signatures inflate bandwidth and CPU cost, especially during a long dual-stack transition. We present pqRPKI , a post-quantum RPKI framework that pairs a multi-layer Merkle Tree Ladder (MTL) with RPKI objects, customized to relocate per-object verification material from certificates into the Manifest. To update RPKI for Merkle tree based schemes, pqRPKI redesign the RPKI manifest and delegation chain, introduces a ladder-guided sync and bulk-verification workflow that lets validators localize diffs top-down and rebuild trees bottom-up. pqRPKI also preserves current RPKI objects and encodings, supports both hosted and delegated operation, and provides an additive migration path that coexists with today's trust anchors for dual-stack deployment with little size overhead. Implemented as a working publication point (PP) and RPs, we show that pqRPKI reduces repository footprint to 546.8 MB on average (65.5%/83.1% smaller than Falcon/ML-DSA), cuts full-cycle validation to 102.7 s, and achieves 118.3 s end-to-end PP to Router time, enabling sub-2-minute operating cadences with full-repository validation each cycle. Dual-stack deployment with RSA only adds just 3.4% size overhead versus today's RPKI repositories.