Jun Fang

LG
h-index61
42papers
754citations
Novelty50%
AI Score56

42 Papers

AIMar 17, 2025
The Amazon Nova Family of Models: Technical Report and Model Card

Amazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

CVJul 8, 2023
Threshold-Consistent Margin Loss for Open-World Deep Metric Learning

Qin Zhang, Linghan Xu, Qingming Tang et al. · amazon-science

Existing losses used in deep metric learning (DML) for image retrieval often lead to highly non-uniform intra-class and inter-class representation structures across test classes and data distributions. When combined with the common practice of using a fixed threshold to declare a match, this gives rise to significant performance variations in terms of false accept rate (FAR) and false reject rate (FRR) across test classes and data distributions. We define this issue in DML as threshold inconsistency. In real-world applications, such inconsistency often complicates the threshold selection process when deploying commercial image retrieval systems. To measure this inconsistency, we propose a novel variance-based metric called Operating-Point-Inconsistency-Score (OPIS) that quantifies the variance in the operating characteristics across classes. Using the OPIS metric, we find that achieving high accuracy levels in a DML model does not automatically guarantee threshold consistency. In fact, our investigation reveals a Pareto frontier in the high-accuracy regime, where existing methods to improve accuracy often lead to degradation in threshold consistency. To address this trade-off, we introduce the Threshold-Consistent Margin (TCM) loss, a simple yet effective regularization technique that promotes uniformity in representation structures across classes by selectively penalizing hard sample pairs. Extensive experiments demonstrate TCM's effectiveness in enhancing threshold consistency while preserving accuracy, simplifying the threshold selection process in practical DML settings.

ITOct 16, 2019
Fast Compressed Power Spectrum Estimation: Towards A Practical Solution for Wideband Spectrum Sensing

Linxiao Yang, Jun Fang, Huiping Duan et al.

There has been a growing interest in wideband spectrum sensing due to its applications in cognitive radios and electronic surveillance. To overcome the sampling rate bottleneck for wideband spectrum sensing, in this paper, we study the problem of compressed power spectrum estimation whose objective is to reconstruct the power spectrum of a wide-sense stationary signal based on sub-Nyquist samples. By exploring the sampling structure inherent in the multicoset sampling scheme, we develop a computationally efficient method for power spectrum reconstruction. An important advantage of our proposed method over existing compressed power spectrum estimation methods is that our proposed method, whose primary computational task consists of fast Fourier transform (FFT), has a very low computational complexity. Such a merit makes it possible to efficiently implement the proposed algorithm in a practical field-programmable gate array (FPGA)-based system for real-time wideband spectrum sensing. Our proposed method also provides a new perspective on the power spectrum recovery condition, which leads to a result similar to what was reported in prior works. Simulation results are presented to show the computational efficiency and the effectiveness of the proposed method.

LGMay 30, 2022
Confederated Learning: Federated Learning with Decentralized Edge Servers

Bin Wang, Jun Fang, Hongbin Li et al.

Federated learning (FL) is an emerging machine learning paradigm that allows to accomplish model training without aggregating data at a central server. Most studies on FL consider a centralized framework, in which a single server is endowed with a central authority to coordinate a number of devices to perform model training in an iterative manner. Due to stringent communication and bandwidth constraints, such a centralized framework has limited scalability as the number of devices grows. To address this issue, in this paper, we propose a ConFederated Learning (CFL) framework. The proposed CFL consists of multiple servers, in which each server is connected with an individual set of devices as in the conventional FL framework, and decentralized collaboration is leveraged among servers to make full use of the data dispersed throughout the network. We develop an alternating direction method of multipliers (ADMM) algorithm for CFL. The proposed algorithm employs a random scheduling policy which randomly selects a subset of devices to access their respective servers at each iteration, thus alleviating the need of uploading a huge amount of information from devices to servers. Theoretical analysis is presented to justify the proposed method. Numerical results show that the proposed method can converge to a decent solution significantly faster than gradient-based FL algorithms, thus boasting a substantial advantage in terms of communication efficiency.

CVSep 27, 2022
Towards Regression-Free Neural Networks for Diverse Compute Platforms

Rahul Duggal, Hao Zhou, Shuo Yang et al. · amazon-science

With the shift towards on-device deep learning, ensuring a consistent behavior of an AI service across diverse compute platforms becomes tremendously important. Our work tackles the emergent problem of reducing predictive inconsistencies arising as negative flips: test samples that are correctly predicted by a less accurate model, but incorrectly by a more accurate one. We introduce REGression constrained Neural Architecture Search (REG-NAS) to design a family of highly accurate models that engender fewer negative flips. REG-NAS consists of two components: (1) A novel architecture constraint that enables a larger model to contain all the weights of the smaller one thus maximizing weight sharing. This idea stems from our observation that larger weight sharing among networks leads to similar sample-wise predictions and results in fewer negative flips; (2) A novel search reward that incorporates both Top-1 accuracy and negative flips in the architecture search metric. We demonstrate that \regnas can successfully find desirable architectures with few negative flips in three popular architecture search spaces. Compared to the existing state-of-the-art approach, REG-NAS enables 33-48% relative reduction of negative flips.

CVSep 9, 2024
Open-World Dynamic Prompt and Continual Visual Representation Learning

Youngeun Kim, Jun Fang, Qin Zhang et al. · amazon-science

The open world is inherently dynamic, characterized by ever-evolving concepts and distributions. Continual learning (CL) in this dynamic open-world environment presents a significant challenge in effectively generalizing to unseen test-time classes. To address this challenge, we introduce a new practical CL setting tailored for open-world visual representation learning. In this setting, subsequent data streams systematically introduce novel classes that are disjoint from those seen in previous training phases, while also remaining distinct from the unseen test classes. In response, we present Dynamic Prompt and Representation Learner (DPaRL), a simple yet effective Prompt-based CL (PCL) method. Our DPaRL learns to generate dynamic prompts for inference, as opposed to relying on a static prompt pool in previous PCL methods. In addition, DPaRL jointly learns dynamic prompt generation and discriminative representation at each training stage whereas prior PCL methods only refine the prompt learning throughout the process. Our experimental results demonstrate the superiority of our approach, surpassing state-of-the-art methods on well-established open-world image retrieval benchmarks by an average of 4.7% improvement in Recall@1 performance.

CVSep 30, 2022
An In-depth Study of Stochastic Backpropagation

Jun Fang, Mingze Xu, Hao Chen et al.

In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks. During backward propagation, SBP calculates the gradients by only using a subset of feature maps to save the GPU memory and computational cost. We interpret SBP as an efficient way to implement stochastic gradient decent by performing backpropagation dropout, which leads to considerable memory saving and training process speedup, with a minimal impact on the overall model accuracy. We offer some good practices to apply SBP in training image recognition models, which can be adopted in learning a wide range of deep neural networks. Experiments on image classification and object detection show that SBP can save up to 40% of GPU memory with less than 1% accuracy degradation.

NANov 16, 2017
A hybrid approach to solve the high-frequency Helmholtz equation with source singularity in smooth heterogeneous media

Jun Fang, Jianliang Qian, Leonardo Zepeda-Núñez et al.

We propose a hybrid approach to solve the high-frequency Helmholtz equation with point source terms in smooth heterogeneous media. The method is based on the ray-based finite element method (ray-FEM), whose original version can not handle the singularity close to point sources accurately. This pitfall is addressed by combining the ray-FEM, which is used to compute the smooth far-field of the solution accurately, with a high-order asymptotic expansion close to the point source, which is used to properly capture the singularity of the solution in the near-field. The method requires a fixed number of grid points per wavelength to accurately represent the wave field with an asymptotic convergence rate of $\mathcal{O}(ω^{-1/2})$, where $ω$ is the frequency parameter in the Helmholtz equation. In addition, a fast sweeping-type preconditioner is used to solve the resulting linear system. We present numerical examples in 2D to show both accuracy and efficiency of our method as the frequency increases. In particular, we provide numerical evidence of the convergence rate, and we show empirically that the overall complexity is $\mathcal{O}(ω^2)$ up to a poly-logarithmic factor.

NADec 10, 2015
Spectral Compressed Sensing via CANDECOMP/PARAFAC Decomposition of Incomplete Tensors

Jun Fang, Linxiao Yang, Hongbin Li

We consider the line spectral estimation problem which aims to recover a mixture of complex sinusoids from a small number of randomly observed time domain samples. Compressed sensing methods formulates line spectral estimation as a sparse signal recovery problem by discretizing the continuous frequency parameter space into a finite set of grid points. Discretization, however, inevitably incurs errors and leads to deteriorated estimation performance. In this paper, we propose a new method which leverages recent advances in tensor decomposition. Specifically, we organize the observed data into a structured tensor and cast line spectral estimation as a CANDECOMP/PARAFAC (CP) decomposition problem with missing entries. The uniqueness of the CP decomposition allows the frequency components to be super-resolved with infinite precision. Simulation results show that the proposed method provides a competitive estimate accuracy compared with existing state-of-the-art algorithms.

ITMay 8, 2022
Over-the-Air Federated Multi-Task Learning via Model Sparsification and Turbo Compressed Sensing

Haoming Ma, Xiaojun Yuan, Zhi Ding et al.

To achieve communication-efficient federated multitask learning (FMTL), we propose an over-the-air FMTL (OAFMTL) framework, where multiple learning tasks deployed on edge devices share a non-orthogonal fading channel under the coordination of an edge server (ES). In OA-FMTL, the local updates of edge devices are sparsified, compressed, and then sent over the uplink channel in a superimposed fashion. The ES employs over-the-air computation in the presence of intertask interference. More specifically, the model aggregations of all the tasks are reconstructed from the channel observations concurrently, based on a modified version of the turbo compressed sensing (Turbo-CS) algorithm (named as M-Turbo-CS). We analyze the performance of the proposed OA-FMTL framework together with the M-Turbo-CS algorithm. Furthermore, based on the analysis, we formulate a communication-learning optimization problem to improve the system performance by adjusting the power allocation among the tasks at the edge devices. Numerical simulations show that our proposed OAFMTL effectively suppresses the inter-task interference, and achieves a learning performance comparable to its counterpart with orthogonal multi-task transmission. It is also shown that the proposed inter-task power allocation optimization algorithm substantially reduces the overall communication overhead by appropriately adjusting the power allocation among the tasks.

AIFeb 4Code
Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Yansong Ning, Jun Fang, Naiqiang Tan et al.

Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat the entire interaction trajectories equally, overlooking the thought necessity and observation utility varies across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, including both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by KL-divergence. Experimental results on five agent benchmarks show that our constructed Agent-Omit-8B could obtain performance comparable to seven frontier LLM agent, and achieve the best effectiveness-efficiency trade-off than seven efficient LLM agents methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.

NANov 15, 2012
A Symmetry-based Decomposition Approach to Eigenvalue Problems: Formulation, Discretization, and Implementation

Jun Fang, Xingyu Gao, Aihui Zhou

In this paper, we propose a decomposition approach for eigenvalue problems with spatial symmetries, including the formulation, discretization as well as implementation. This approach can handle eigenvalue problems with either Abelian or non-Abelian symmetries, and is friendly for grid-based discretizations such as finite difference, finite element or finite volume methods. With the formulation, we divide the original eigenvalue problem into a set of subproblems and require only a smaller number of eigenpairs for each subproblem. We implement the decomposition approach with finite elements and parallelize our code in two levels. We show that the decomposition approach can improve the efficiency and scalability of iterative diagonalization. In particular, we apply the approach to solving Kohn--Sham equations of symmetric molecules consisting of hundreds of atoms.

HCMar 27
Routine Computing: A Systematic Review of Sensing Daily Life Dimensions Towards Human-Centered Goals

Borislav Pavlov, Jiajin Li, Jun Fang et al.

Human routines structure daily life, yet remain challenging for computational systems to understand. This paper presents the first systematic review of routine computing, a previously implicit but increasingly recognized field that focuses on computationally sensing and modeling human behaviors. It synthesizes 203 studies published up to August 2025. The paper presents a new taxonomy of the literature, focusing on temporal structures, behavioral interactions, cognitive aspects, and how variability and deviations are addressed. The common goals of routine computing extend across four major application domains, including accessibility care, the promotion of healthy habits, adaptive and context-aware support, and large-scale population insights. Persistent challenges that limit the design of truly human-centered systems are identified, including the gap between low-level activity recognition and high-level intent, the tension between personalization and generalization, unresolved privacy concerns, and data-related limitations. By consolidating these findings, this paper provides a foundational framework for HCI researchers, outlining principles for designing ethical, adaptive, and human-centered routine-aware systems.

CLFeb 12
WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models

Yangzhuo Li, Shengpeng Ji, Yifu Chen et al.

With the rapid integration of advanced reasoning capabilities into spoken dialogue models, the field urgently demands benchmarks that transcend simple interactions to address real-world complexity. However, current evaluations predominantly adhere to text-generation standards, overlooking the unique audio-centric characteristics of paralinguistics and colloquialisms, alongside the cognitive depth required by modern agents. To bridge this gap, we introduce WavBench, a comprehensive benchmark designed to evaluate realistic conversational abilities where prior works fall short. Uniquely, WavBench establishes a tripartite framework: 1) Pro subset, designed to rigorously challenge reasoning-enhanced models with significantly increased difficulty; 2) Basic subset, defining a novel standard for spoken colloquialism that prioritizes "listenability" through natural vocabulary, linguistic fluency, and interactive rapport, rather than rigid written accuracy; and 3) Acoustic subset, covering explicit understanding, generation, and implicit dialogue to rigorously evaluate comprehensive paralinguistic capabilities within authentic real-world scenarios. Through evaluating five state-of-the-art models, WavBench offers critical insights into the intersection of complex problem-solving, colloquial delivery, and paralinguistic fidelity, guiding the evolution of robust spoken dialogue models. The benchmark dataset and evaluation toolkit are available at https://naruto-2024.github.io/wavbench.github.io/.

CRJan 23
SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment

Xianya Fang, Xianying Luo, Yadong Wang et al.

Despite the intrinsic risk-awareness of Large Language Models (LLMs), current defenses often result in shallow safety alignment, rendering models vulnerable to disguised attacks (e.g., prefilling) while degrading utility. To bridge this gap, we propose SafeThinker, an adaptive framework that dynamically allocates defensive resources via a lightweight gateway classifier. Based on the gateway's risk assessment, inputs are routed through three distinct mechanisms: (i) a Standardized Refusal Mechanism for explicit threats to maximize efficiency; (ii) a Safety-Aware Twin Expert (SATE) module to intercept deceptive attacks masquerading as benign queries; and (iii) a Distribution-Guided Think (DDGT) component that adaptively intervenes during uncertain generation. Experiments show that SafeThinker significantly lowers attack success rates across diverse jailbreak strategies without compromising utility, demonstrating that coordinating intrinsic judgment throughout the generation process effectively balances robustness and practicality.

CLMay 17, 2025Code
Not All Thoughts are Generated Equal: Efficient LLM Reasoning via Multi-Turn Reinforcement Learning

Yansong Ning, Wei Li, Jun Fang et al.

Compressing long chain-of-thought (CoT) from large language models (LLMs) is an emerging strategy to improve the reasoning efficiency of LLMs. Despite its promising benefits, existing studies equally compress all thoughts within a long CoT, hindering more concise and effective reasoning. To this end, we first investigate the importance of different thoughts by examining their effectiveness and efficiency in contributing to reasoning through automatic long CoT chunking and Monte Carlo rollouts. Building upon the insights, we propose a theoretically bounded metric to jointly measure the effectiveness and efficiency of different thoughts. We then propose Long$\otimes$Short, an efficient reasoning framework that enables two LLMs to collaboratively solve the problem: a long-thought LLM for more effectively generating important thoughts, while a short-thought LLM for efficiently generating remaining thoughts. Specifically, we begin by synthesizing a small amount of cold-start data to fine-tune LLMs for long-thought and short-thought reasoning styles, respectively. Furthermore, we propose a synergizing-oriented multi-turn reinforcement learning, focusing on the model self-evolution and collaboration between long-thought and short-thought LLMs. Experimental results show that our method enables Qwen2.5-7B and Llama3.1-8B to achieve comparable performance compared to DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B, while reducing token length by over 80% across the MATH500, AIME24/25, AMC23, and GPQA Diamond benchmarks. Our data and code are available at https://github.com/usail-hkust/LongShort.

LGFeb 26
Compress the Easy, Explore the Hard: Difficulty-Aware Entropy Regularization for Efficient LLM Reasoning

Qin-Wen Luo, Sheng Ren, Xiang Chen et al.

Chain-of-Thought (CoT) has substantially empowered Large Language Models (LLMs) to tackle complex reasoning tasks, yet the verbose nature of explicit reasoning steps incurs prohibitive inference latency and computational costs, limiting real-world deployment. While existing compression methods - ranging from self-training to Reinforcement Learning (RL) with length constraints - attempt to mitigate this, they often sacrifice reasoning capability for brevity. We identify a critical failure mode in these approaches: explicitly optimizing for shorter trajectories triggers rapid entropy collapse, which prematurely shrinks the exploration space and stifles the discovery of valid reasoning paths, particularly for challenging questions requiring extensive deduction. To address this issue, we propose Compress responses for Easy questions and Explore Hard ones (CEEH), a difficulty-aware approach to RL-based efficient reasoning. CEEH dynamically assesses instance difficulty to apply selective entropy regularization: it preserves a diverse search space for currently hard questions to ensure robustness, while permitting aggressive compression on easier instances where the reasoning path is well-established. In addition, we introduce a dynamic optimal-length penalty anchored to the historically shortest correct response, which effectively counteracts entropy-induced length inflation and stabilizes the reward signal. Across six reasoning benchmarks, CEEH consistently reduces response length while maintaining accuracy comparable to the base model, and improves Pass@k relative to length-only optimization.

HCApr 7
SpeakSoftly: Scaffolding Nonviolent Communication in Intimate Relationships through LLM-Powered Just-In-Time Interventions

Ka I Chan, Hongbo Lan, Jun Fang et al.

Conflicts are common in text-based communication, particularly in intimate relationships, where misunderstandings can easily escalate into verbal aggression. To address this, we present SpeakSoftly, a system that applies Nonviolent Communication (NVC) principles to scaffold couples' conflict communication through LLM-powered just-in-time interventions. Informed by formative interviews with couples and NVC principles, we designed two core features: NVC-Prompt, which detects verbal aggression and suggests revisions to prevent escalation, and NVC-Guide, which analyzes dialogues to uncover users' feelings and needs, fostering self-awareness and perspective-taking. These features were implemented across three progressive intervention modes, each varying in intervention depth and tone: Basic Reminder, Neutral Guide, and Empathetic Guide. We conducted a mixed-methods user study with 18 couples across simulated and real-life conflict settings to evaluate the effectiveness of each mode. Results showed that Empathetic Guide significantly facilitated both behavioral and cognitive changes, while Neutral Guide was effective only for behavioral changes in simulated conflicts. In real-life conflicts, Neutral Guide showed distinct advantages due to lower cognitive load demands. We discuss the mechanisms behind these findings and propose design implications for in-situ interventions in high-stakes communication contexts.

CLFeb 12, 2025Code
DiMA: An LLM-Powered Ride-Hailing Assistant at DiDi

Yansong Ning, Shuowei Cai, Wei Li et al.

On-demand ride-hailing services like DiDi, Uber, and Lyft have transformed urban transportation, offering unmatched convenience and flexibility. In this paper, we introduce DiMA, an LLM-powered ride-hailing assistant deployed in DiDi Chuxing. Its goal is to provide seamless ride-hailing services and beyond through a natural and efficient conversational interface under dynamic and complex spatiotemporal urban contexts. To achieve this, we propose a spatiotemporal-aware order planning module that leverages external tools for precise spatiotemporal reasoning and progressive order planning. Additionally, we develop a cost-effective dialogue system that integrates multi-type dialog repliers with cost-aware LLM configurations to handle diverse conversation goals and trade-off response quality and latency. Furthermore, we introduce a continual fine-tuning scheme that utilizes real-world interactions and simulated dialogues to align the assistant's behavior with human preferred decision-making processes. Since its deployment in the DiDi application, DiMA has demonstrated exceptional performance, achieving 93% accuracy in order planning and 92% in response generation during real-world interactions. Offline experiments further validate DiMA capabilities, showing improvements of up to 70.23% in order planning and 321.27% in response generation compared to three state-of-the-art agent frameworks, while reducing latency by $0.72\times$ to $5.47\times$. These results establish DiMA as an effective, efficient, and intelligent mobile assistant for ride-hailing services. Our project is released at https://github.com/usail-hkust/DiMA and we also release the MCP service (https://mcp.didichuxing.com/api) to foster the ride-hailing research community.

AIJan 30
Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution

Hongze Mi, Yibo Feng, WenJie Lu et al.

Multimodal Large Language Model (MLLM) agents facilitate Graphical User Interface (GUI) automation but struggle with long-horizon, cross-application tasks due to limited context windows. While memory systems provide a viable solution, existing paradigms struggle to adapt to dynamic GUI environments, suffering from a granularity mismatch between high-level intent and low-level execution, and context pollution where the static accumulation of outdated experiences drives agents into hallucination. To address these bottlenecks, we propose the Darwinian Memory System (DMS), a self-evolving architecture that constructs memory as a dynamic ecosystem governed by the law of survival of the fittest. DMS decomposes complex trajectories into independent, reusable units for compositional flexibility, and implements Utility-driven Natural Selection to track survival value, actively pruning suboptimal paths and inhibiting high-risk plans. This evolutionary pressure compels the agent to derive superior strategies. Extensive experiments on real-world multi-app benchmarks validate that DMS boosts general-purpose MLLMs without training costs or architectural overhead, achieving average gains of 18.0% in success rate and 33.9% in execution stability, while reducing task latency, establishing it as an effective self-evolving memory system for GUI tasks.

LGDec 2, 2024Code
FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain

Zhengnan Li, Haoxuan Li, Hao Wang et al.

Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate this issue, we introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns and thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel \textbf{F}requency \textbf{S}implex \textbf{MLP} (FSMLP) framework for time series forecasting, comprising of two kinds of modules: \textbf{S}implex \textbf{C}hannel-\textbf{W}ise MLP (SCWM) and \textbf{F}requency \textbf{T}emporal \textbf{M}LP (FTM). The SCWM effectively leverages the Simplex-MLP to capture inter-channel dependencies, while the FTM is a simple yet efficient temporal MLP designed to extract temporal information from the data. Our theoretical analysis shows that the upper bound of the Rademacher Complexity for Simplex-MLP is lower than that for standard MLPs. Moreover, we validate our proposed method on seven benchmark datasets, demonstrating significant improvements in forecasting accuracy and efficiency, while also showcasing superior scalability. Additionally, we demonstrate that Simplex-MLP can improve other methods that use channel-wise MLP to achieve less overfitting and improved performance. Code are available \href{https://github.com/FMLYD/FSMLP}{\textcolor{red}{here}}.

LGNov 26, 2025
FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning

Jiaoyang Li, Jun Fang, Tianhao Gao et al.

Representation learning is fundamental to modern machine learning, powering applications such as text retrieval and multimodal understanding. However, learning robust and generalizable representations remains challenging. While prior work has demonstrated that active noise injection, a form of data augmentation, can enhance encoding performance, most existing methods rely on heuristic or static noise, overlooking the dynamic nature of feature distributions during training. In this work, we systematically study the role of noise in representation learning from both gradient-based and feature distribution perspectives, using InfoNCE loss as a representative example. Focusing on multimodal representation learning, we propose FANoise, a novel feature-adaptive noise injection strategy. By leveraging the dynamics of contrastive learning, FANoise effectively mitigates the negative impacts of noise while preserving its benefits. Under this theoretically grounded framework, comprehensive experiments demonstrate that FANoise consistently improves overall performance on multimodal tasks across various base VLM models.

LGFeb 28, 2024
Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach

Bin Wang, Jun Fang, Hongbin Li et al.

Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data dispersed over various data sources. Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability. In this work, we consider a multi-server FL framework, referred to as \emph{Confederated Learning} (CFL), in order to accommodate a larger number of users. A CFL system is composed of multiple networked edge servers, with each server connected to an individual set of users. Decentralized collaboration among servers is leveraged to harness all users' data for model training. Due to the potentially massive number of users involved, it is crucial to reduce the communication overhead of the CFL system. We propose a stochastic gradient method for distributed learning in the CFL framework. The proposed method incorporates a conditionally-triggered user selection (CTUS) mechanism as the central component to effectively reduce communication overhead. Relying on a delicately designed triggering condition, the CTUS mechanism allows each server to select only a small number of users to upload their gradients, without significantly jeopardizing the convergence performance of the algorithm. Our theoretical analysis reveals that the proposed algorithm enjoys a linear convergence rate. Simulation results show that it achieves substantial improvement over state-of-the-art algorithms in terms of communication efficiency.

HCApr 9
StoryEcho: A Generative Child-as-Actor Storytelling System for Picky-Eating Intervention

Yanuo Zhou, Jun Fang, Yuntao Wang et al.

Picky eating in children can undermine dietary diversity and the development of healthy eating habits, while also creating recurring tension in family feeding routines. Prior interventions have explored food-centered designs, enhanced utensils, and mealtime interactive systems, but few position children as active participants in intervention processes that extend beyond single mealtime interactions. To better understand everyday responses to picky eating and child-acceptable intervention mechanisms, we conducted a formative study with caregivers and kindergarten teachers. Based on the resulting design considerations and iterative stakeholder review, we designed StoryEcho, a generative child-as-actor storytelling system for picky eating intervention. StoryEcho engages children outside mealtimes through personalized stories in which the child appears as a persistent story character and later shapes story development through real-world food-related behavior. The system combines non-mealtime story engagement, lightweight post-meal feedback, and behavior-informed story updates to support repeated intervention across everyday family routines. We evaluated StoryEcho in a between-group field study with 11 families of preschool children. Results provide preliminary evidence that StoryEcho can significantly increase children's willingness to approach and try target low-preference foods while reducing parental pressure around feeding. These findings suggest the promise of generative child-as-actor storytelling as a design approach for home-based behavior support that unfolds through recurring family routines.

ARApr 4
Efficient Solving for Dynamic Data Structure Constraint Satisfaction Problem

Nanbing Li, Weijie Peng, Jin Luo et al.

Functional verification plays a central role in ensuring the correctness of modern integrated circuit designs, where constrained-random verification is widely adopted to generate diverse stimuli under high-level constraints. In industrial verification environments, constraint solving increasingly involves dynamic data structures whose shape and content are determined at runtime, causing the sets of variables and constraint instances to evolve across solver invocations, which in turn leads to substantial overhead when nested and high-dimensional structures repeatedly expand across solves. We formalize this class of problems as the Dynamic Data Structure Constraint Satisfaction Problem (D2SCSP),which captures the interaction between dynamic data structure expansion and constraint evaluation. We propose a dependency-guided problem partitioning framework combined with an incremental encoding and constraint activation mechanism, enabling reuse of solver state and encodings across multiple solves. The framework is integrated into an industrial SystemVerilog verification flow and implemented in the commercial simulator VeriSim. Experimental results on industrial benchmarks demonstrate significant performance improvements, achieving an average speedup of 24.80x over a baseline and 1.72x over a state-of-the-art commercial simulator, highlighting the practicality of the approach for real-world verification workflows.

LGFeb 9
Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning

Jinjin Guo, Yexin Li, Zhichao Huang et al.

Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform treatment of feature dimensions and the neglect of the intrinsic spectral structure of the learned features. Empirical evidence indicates that high-dimensional embeddings tend to collapse into narrow cones, concentrating task-relevant semantics in a small subspace, while the majority of dimensions remain occupied by noise and spurious correlations. Such spectral imbalance and entanglement undermine model generalization. We propose Spectral Disentanglement and Enhancement (SDE), a novel framework that bridges the gap between the geometry of the embedded spaces and their spectral properties. Our approach leverages singular value decomposition to adaptively partition feature dimensions into strong signals that capture task-critical semantics, weak signals that reflect ancillary correlations, and noise representing irrelevant perturbations. A curriculum-based spectral enhancement strategy is then applied, selectively amplifying informative components with theoretical guarantees on training stability. Building upon the enhanced features, we further introduce a dual-domain contrastive loss that jointly optimizes alignment in both the feature and spectral spaces, effectively integrating spectral regularization into the training process and encouraging richer, more robust representations. Extensive experiments on large-scale multimodal benchmarks demonstrate that SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods. SDE integrates seamlessly with existing contrastive pipelines, offering an effective solution for multimodal representation learning.

CVOct 16, 2025
Salient Concept-Aware Generative Data Augmentation

Tianchen Zhao, Xuanbai Chen, Zhihua Li et al. · amazon-science

Recent generative data augmentation methods conditioned on both image and text prompts struggle to balance between fidelity and diversity, as it is challenging to preserve essential image details while aligning with varied text prompts. This challenge arises because representations in the synthesis process often become entangled with non-essential input image attributes such as environmental contexts, creating conflicts with text prompts intended to modify these elements. To address this, we propose a personalized image generation framework that uses a salient concept-aware image embedding model to reduce the influence of irrelevant visual details during the synthesis process, thereby maintaining intuitive alignment between image and text inputs. By generating images that better preserve class-discriminative features with additional controlled variations, our framework effectively enhances the diversity of training datasets and thereby improves the robustness of downstream models. Our approach demonstrates superior performance across eight fine-grained vision datasets, outperforming state-of-the-art augmentation methods with averaged classification accuracy improvements by 0.73% and 6.5% under conventional and long-tail settings, respectively.

AISep 26, 2025
DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents

Yansong Ning, Rui Liu, Jun Wang et al.

Travel planning (TP) agent has recently worked as an emerging building block to interact with external tools and resources for travel itinerary generation, ensuring enjoyable user experience. Despite its benefits, existing studies rely on hand craft prompt and fixed agent workflow, hindering more flexible and autonomous TP agent. This paper proposes DeepTravel, an end to end agentic reinforcement learning framework for building autonomous travel planning agent, capable of autonomously planning, executing tools, and reflecting on tool responses to explore, verify, and refine intermediate actions in multi step reasoning. To achieve this, we first construct a robust sandbox environment by caching transportation, accommodation and POI data, facilitating TP agent training without being constrained by real world APIs limitations (e.g., inconsistent outputs). Moreover, we develop a hierarchical reward modeling system, where a trajectory level verifier first checks spatiotemporal feasibility and filters unsatisfied travel itinerary, and then the turn level verifier further validate itinerary detail consistency with tool responses, enabling efficient and precise reward service. Finally, we propose the reply augmented reinforcement learning method that enables TP agent to periodically replay from a failures experience buffer, emerging notable agentic capacity. We deploy trained TP agent on DiDi Enterprise Solutions App and conduct comprehensive online and offline evaluations, demonstrating that DeepTravel enables small size LLMs (e.g., Qwen3 32B) to significantly outperform existing frontier LLMs such as OpenAI o1, o3 and DeepSeek R1 in travel planning tasks.

CVMay 11, 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts

Zhaoyang Zhang, Yantao Shen, Kunyu Shi et al.

We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). With rich and structured information such as task input/output format, TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.

LGJun 27, 2021
Over-the-Air Federated Multi-Task Learning

Haoming Ma, Xiaojun Yuan, Dian Fan et al.

In this letter, we introduce over-the-air computation into the communication design of federated multi-task learning (FMTL), and propose an over-the-air federated multi-task learning (OA-FMTL) framework, where multiple learning tasks deployed on edge devices share a non-orthogonal fading channel under the coordination of an edge server (ES). Specifically, the model updates for all the tasks are transmitted and superimposed concurrently over a non-orthogonal uplink fading channel, and the model aggregations of all the tasks are reconstructed at the ES through a modified version of the turbo compressed sensing algorithm (Turbo-CS) that overcomes inter-task interference. Both convergence analysis and numerical results show that the OA-FMTL framework can significantly improve the system efficiency in terms of reducing the number of channel uses without causing substantial learning performance degradation.

IRFeb 3, 2021
Session-based Recommendation with Self-Attention Networks

Jun Fang

Session-based recommendation aims to predict user's next behavior from current session and previous anonymous sessions. Capturing long-range dependencies between items is a vital challenge in session-based recommendation. A novel approach is proposed for session-based recommendation with self-attention networks (SR-SAN) as a remedy. The self-attention networks (SAN) allow SR-SAN capture the global dependencies among all items of a session regardless of their distance. In SR-SAN, a single item latent vector is used to capture both current interest and global interest instead of session embedding which is composed of current interest embedding and global interest embedding. Some experiments have been performed on some open benchmark datasets. Experimental results show that the proposed method outperforms some state-of-the-arts by comparisons.

CVJan 31, 2020
Post-Training Piecewise Linear Quantization for Deep Neural Networks

Jun Fang, Ali Shafiee, Hamzah Abdel-Aziz et al.

Quantization plays an important role in the energy-efficient deployment of deep neural networks on resource-limited devices. Post-training quantization is highly desirable since it does not require retraining or access to the full training dataset. The well-established uniform scheme for post-training quantization achieves satisfactory results by converting neural networks from full-precision to 8-bit fixed-point integers. However, it suffers from significant performance degradation when quantizing to lower bit-widths. In this paper, we propose a piecewise linear quantization (PWLQ) scheme to enable accurate approximation for tensor values that have bell-shaped distributions with long tails. Our approach breaks the entire quantization range into non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. Optimal breakpoints that divide the entire range are found by minimizing the quantization error. Compared to state-of-the-art post-training quantization methods, experimental results show that our proposed method achieves superior performance on image classification, semantic segmentation, and object detection with minor overhead.

MLNov 5, 2018
Low-Rank Phase Retrieval via Variational Bayesian Learning

Kaihui Liu, Jiayi Wang, Zhengli Xing et al.

In this paper, we consider the problem of low-rank phase retrieval whose objective is to estimate a complex low-rank matrix from magnitude-only measurements. We propose a hierarchical prior model for low-rank phase retrieval, in which a Gaussian-Wishart hierarchical prior is placed on the underlying low-rank matrix to promote the low-rankness of the matrix. Based on the proposed hierarchical model, a variational expectation-maximization (EM) algorithm is developed. The proposed method is less sensitive to the choice of the initialization point and works well with random initialization. Simulation results are provided to illustrate the effectiveness of the proposed algorithm.

LGNov 6, 2017
Simultaneous Block-Sparse Signal Recovery Using Pattern-Coupled Sparse Bayesian Learning

Hang Xiao, Zhengli Xing, Linxiao Yang et al.

In this paper, we consider the block-sparse signals recovery problem in the context of multiple measurement vectors (MMV) with common row sparsity patterns. We develop a new method for recovery of common row sparsity MMV signals, where a pattern-coupled hierarchical Gaussian prior model is introduced to characterize both the block-sparsity of the coefficients and the statistical dependency between neighboring coefficients of the common row sparsity MMV signals. Unlike many other methods, the proposed method is able to automatically capture the block sparse structure of the unknown signal. Our method is developed using an expectation-maximization (EM) framework. Simulation results show that our proposed method offers competitive performance in recovering block-sparse common row sparsity pattern MMV signals.

MLAug 24, 2017
Bayesian Compressive Sensing Using Normal Product Priors

Zhou Zhou, Kaihui Liu, Jun Fang

In this paper, we introduce a new sparsity-promoting prior, namely, the "normal product" prior, and develop an efficient algorithm for sparse signal recovery under the Bayesian framework. The normal product distribution is the distribution of a product of two normally distributed variables with zero means and possibly different variances. Like other sparsity-encouraging distributions such as the Student's $t$-distribution, the normal product distribution has a sharp peak at origin, which makes it a suitable prior to encourage sparse solutions. A two-stage normal product-based hierarchical model is proposed. We resort to the variational Bayesian (VB) method to perform the inference. Simulations are conducted to illustrate the effectiveness of our proposed algorithm as compared with other state-of-the-art compressed sensing algorithms.

LGAug 8, 2017
Fast Low-Rank Bayesian Matrix Completion with Hierarchical Gaussian Prior Models

Linxiao Yang, Jun Fang, Huiping Duan et al.

The problem of low rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, a variational Bayesian method is developed for matrix completion, where the generalized approximate massage passing (GAMP) technique is embedded into the variational Bayesian inference in order to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over existing state-of-the-art matrix completion methods.

MLOct 10, 2016
Robust Bayesian Compressed sensing

Qian Wan, Huiping Duan, Jun Fang et al.

We consider the problem of robust compressed sensing whose objective is to recover a high-dimensional sparse signal from compressed measurements corrupted by outliers. A new sparse Bayesian learning method is developed for robust compressed sensing. The basic idea of the proposed method is to identify and remove the outliers from sparse signal recovery. To automatically identify the outliers, we employ a set of binary indicator hyperparameters to indicate which observations are outliers. These indicator hyperparameters are treated as random variables and assigned a beta process prior such that their values are confined to be binary. In addition, a Gaussian-inverse Gamma prior is imposed on the sparse signal to promote sparsity. Based on this hierarchical prior model, we develop a variational Bayesian method to estimate the indicator hyperparameters as well as the sparse signal. Simulation results show that the proposed method achieves a substantial performance improvement over existing robust compressed sensing techniques.

NAAug 31, 2016
Learning Dominant Wave Directions For Plane Wave Methods For High-Frequency Helmholtz Equations

Jun Fang, Jianliang Qian, Leonardo Zepeda-Núñez et al.

We present a ray-based finite element method (ray-FEM) by learning basis adaptive to the underlying high-frequency Helmholtz equation in smooth media. Based on the geometric optics ansatz of the wave field, we learn local dominant ray directions by probing the medium using low-frequency waves with the same source. Once local ray directions are extracted, they are incorporated into the finite element basis to solve the high-frequency Helmholtz equation. This process can be continued to further improve approximations for both local ray directions and the high frequency wave field iteratively. The method requires a fixed number of grid points per wavelength to represent the wave field and achieves an asymptotic convergence as the frequency $ω\rightarrow \infty$ without the pollution effect. A fast solver is developed for the resulting linear system with an empirical complexity $\mathcal{O}(ω^d)$ up to a poly-logarithmic factor. Numerical examples in 2D are presented to corroborate the claims.

NANov 15, 2015
An Iterative Reweighted Method for Tucker Decomposition of Incomplete Multiway Tensors

Linxiao Yang, Jun Fang, Hongbin Li et al.

We consider the problem of low-rank decomposition of incomplete multiway tensors. Since many real-world data lie on an intrinsically low dimensional subspace, tensor low-rank decomposition with missing entries has applications in many data analysis problems such as recommender systems and image inpainting. In this paper, we focus on Tucker decomposition which represents an Nth-order tensor in terms of N factor matrices and a core tensor via multilinear operations. To exploit the underlying multilinear low-rank structure in high-dimensional datasets, we propose a group-based log-sum penalty functional to place structural sparsity over the core tensor, which leads to a compact representation with smallest core tensor. The method for Tucker decomposition is developed by iteratively minimizing a surrogate function that majorizes the original objective function, which results in an iterative reweighted process. In addition, to reduce the computational complexity, an over-relaxed monotone fast iterative shrinkage-thresholding technique is adapted and embedded in the iterative reweighted process. The proposed method is able to determine the model complexity (i.e. multilinear rank) in an automatic way. Simulation results show that the proposed algorithm offers competitive performance compared with other existing algorithms.

LGJun 2, 2015
Global and Local Structure Preserving Sparse Subspace Learning: An Iterative Approach to Unsupervised Feature Selection

Nan Zhou, Yangyang Xu, Hong Cheng et al.

As we aim at alleviating the curse of high-dimensionality, subspace learning is becoming more popular. Existing approaches use either information about global or local structure of the data, and few studies simultaneously focus on global and local structures as the both of them contain important information. In this paper, we propose a global and local structure preserving sparse subspace learning (GLoSS) model for unsupervised feature selection. The model can simultaneously realize feature selection and subspace learning. In addition, we develop a greedy algorithm to establish a generic combinatorial model, and an iterative strategy based on an accelerated block coordinate descent is used to solve the GLoSS problem. We also provide whole iterate sequence convergence analysis of the proposed iterative algorithm. Extensive experiments are conducted on real-world datasets to show the superiority of the proposed approach over several state-of-the-art unsupervised feature selection approaches.

LGMar 7, 2015
Sparse Bayesian Dictionary Learning with a Gaussian Hierarchical Model

Linxiao Yang, Jun Fang, Hong Cheng et al.

We consider a dictionary learning problem whose objective is to design a dictionary such that the signals admits a sparse or an approximate sparse representation over the learned dictionary. Such a problem finds a variety of applications such as image denoising, feature extraction, etc. In this paper, we propose a new hierarchical Bayesian model for dictionary learning, in which a Gaussian-inverse Gamma hierarchical prior is used to promote the sparsity of the representation. Suitable priors are also placed on the dictionary and the noise variance such that they can be reasonably inferred from the data. Based on the hierarchical model, a variational Bayesian method and a Gibbs sampling method are developed for Bayesian inference. The proposed methods have the advantage that they do not require the knowledge of the noise variance \emph{a priori}. Numerical results show that the proposed methods are able to learn the dictionary with an accuracy better than existing methods, particularly for the case where there is a limited number of training signals.

ITNov 9, 2013
Pattern-Coupled Sparse Bayesian Learning for Recovery of Block-Sparse Signals

Jun Fang, Yanning Shen, Hongbin Li et al.

We consider the problem of recovering block-sparse signals whose structures are unknown \emph{a priori}. Block-sparse signals with nonzero coefficients occurring in clusters arise naturally in many practical scenarios. However, the knowledge of the block structure is usually unavailable in practice. In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns. Specifically, a pattern-coupled hierarchical Gaussian prior model is introduced to characterize the statistical dependencies among coefficients, in which a set of hyperparameters are employed to control the sparsity of signal coefficients. Unlike the conventional sparse Bayesian learning framework in which each individual hyperparameter is associated independently with each coefficient, in this paper, the prior for each coefficient not only involves its own hyperparameter, but also the hyperparameters of its immediate neighbors. In doing this way, the sparsity patterns of neighboring coefficients are related to each other and the hierarchical model has the potential to encourage structured-sparse solutions. The hyperparameters, along with the sparse signal, are learned by maximizing their posterior probability via an expectation-maximization (EM) algorithm. Numerical results show that the proposed algorithm presents uniform superiority over other existing methods in a series of experiments.