NAAug 28, 2018
Spectrum-Adapted Polynomial Approximation for Matrix FunctionsLi Fan, David I Shuman, Shashanka Ubaru et al.
We propose and investigate two new methods to approximate $f({\bf A}){\bf b}$ for large, sparse, Hermitian matrices ${\bf A}$. The main idea behind both methods is to first estimate the spectral density of ${\bf A}$, and then find polynomials of a fixed order that better approximate the function $f$ on areas of the spectrum with a higher density of eigenvalues. Compared to state-of-the-art methods such as the Lanczos method and truncated Chebyshev expansion, the proposed methods tend to provide more accurate approximations of $f({\bf A}){\bf b}$ at lower polynomial orders, and for matrices ${\bf A}$ with a large number of distinct interior eigenvalues and a small spectral width.
22.2IRMay 21
A Hierarchical Quantized Tokenization Framework for Task-Adaptive Graph Representation LearningYang Xiang, Li Fan, Chenke Yin et al.
Foundation models in language and vision benefit from a unified discrete token interface that converts raw inputs into sequences for scalable pre-training and inference. For graphs, an effective tokenizer should yield reusable discrete codes that capture both node semantics and relational structure across scales, yet prior quantization-based graph tokenizers typically combine residual vector quantization (RVQ) levels with fixed rules and often focus on a single structural view, limiting cross-task transfer. We present a hierarchical quantized tokenization framework with task-conditioned routing and dual-view token streams. It produces multi-scale codes and two synchronized sequences: a local stream that preserves node-level information and a diffusion-style multi-hop stream that summarizes connectivity. A lightweight router learns task-dependent mixtures over RVQ depths to select an appropriate granularity, while a gated cross-attention module aligns and fuses the two streams into a single token sequence without altering the downstream backbone encoder. Experiments on node classification and link prediction show consistent gains over strong quantized baselines at matched compute, with ablations verifying contributions from hierarchical quantization, adaptive routing, and fusion.
88.2DBMar 17Code
HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding StorageHaidong Rong, Jiashu Yao, Matthias Langer et al.
Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, where policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms -- cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency -- and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion key-value pairs per second (B-KV/s) find throughput, stable across load factors 0.50-1.00 (<5% variation), and delivers 1.4x higher find throughput than WarpCore (the strongest dictionary-semantic GPU baseline at lambda=0.50) and up to 2.6-9.4x over indirection-based GPU baselines. Since its open-source release in October 2022, HKV has been integrated into multiple open-source recommendation frameworks.
CVMar 16, 2023
GDDS: Pulmonary Bronchioles Segmentation with Group Deep Dense SupervisionMingyue Zhao, Shang Zhao, Quan Quan et al.
Airway segmentation, especially bronchioles segmentation, is an important but challenging task because distal bronchus are sparsely distributed and of a fine scale. Existing neural networks usually exploit sparse topology to learn the connectivity of bronchioles and inefficient shallow features to capture such high-frequency information, leading to the breakage or missed detection of individual thin branches. To address these problems, we contribute a new bronchial segmentation method based on Group Deep Dense Supervision (GDDS) that emphasizes fine-scale bronchioles segmentation in a simple-but-effective manner. First, Deep Dense Supervision (DDS) is proposed by constructing local dense topology skillfully and implementing dense topological learning on a specific shallow feature layer. GDDS further empowers the shallow features with better perception ability to detect bronchioles, even the ones that are not easily discernible to the naked eye. Extensive experiments on the BAS benchmark dataset have shown that our method promotes the network to have a high sensitivity in capturing fine-scale branches and outperforms state-of-the-art methods by a large margin (+12.8 % in BD and +8.8 % in TD) while only introducing a small number of extra parameters.
LGNov 2, 2023
Federated Linear Bandits with Finite Adversarial ActionsLi Fan, Ruida Zhou, Chao Tian et al.
We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem with finite adversarial action sets that may be different across clients. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm, which extends the principles of SupLinUCB and OFUL algorithms in linear contextual bandits. We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{d T})$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model. This matches the minimax lower bound and thus is order-optimal (up to polylog terms). We study both asynchronous and synchronous cases and show that the communication cost can be controlled as $O(d M^2 \log(d)\log(T))$ and $O(\sqrt{d^3 M^3} \log(d))$, respectively. The FedSupLinUCB design is further extended to two scenarios: (1) variance-adaptive, where a total regret of $\tilde{O} (\sqrt{d \sum \nolimits_{t=1}^{T} σ_t^2})$ can be achieved with $σ_t^2$ being the noise variance of round $t$; and (2) adversarial corruption, where a total regret of $\tilde{O}(\sqrt{dT} + d C_p)$ can be achieved with $C_p$ being the total corruption budget. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of FedSupLinUCB on both synthetic and real-world datasets.
CLSep 2, 2025Code
Implicit Reasoning in Large Language Models: A Comprehensive SurveyJindong Li, Yali Fu, Li Fan et al.
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Reasoning with LLMs is central to solving multi-step problems and complex decision-making. To support efficient reasoning, recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning, where reasoning occurs silently via latent structures without emitting intermediate textual steps. Implicit reasoning brings advantages such as lower generation cost, faster inference, and better alignment with internal computation. Although prior surveys have discussed latent representations in the context of reasoning, a dedicated and mechanism-level examination of how reasoning unfolds internally within LLMs remains absent. This survey fills that gap by introducing a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies. We organize existing methods into three execution paradigms based on \textbf{\textit{how and where internal computation unfolds}}: latent optimization, signal-guided control, and layer-recurrent execution. We also review structural, behavioral and representation-based evidence that supports the presence of implicit reasoning in LLMs. We further provide a structured overview of the evaluation metrics and benchmarks used in existing works to assess the effectiveness and reliability of implicit reasoning. We maintain a continuously updated project at: https://github.com/digailab/awesome-llm-implicit-reasoning.
96.4LGMar 21
Discriminative Representation Learning for Clinical PredictionYang Zhang, Li Fan, Samuel Lawrence et al.
Foundation models in healthcare have largely adopted self supervised pretraining objectives inherited from natural language processing and computer vision, emphasizing reconstruction and large scale representation learning prior to downstream adaptation. We revisit this paradigm in outcome centric clinical prediction settings and argue that, when high quality supervision is available, direct outcome alignment may provide a stronger inductive bias than generative pretraining. We propose a supervised deep learning framework that explicitly shapes representation geometry by maximizing inter class separation relative to within class variance, thereby concentrating model capacity along clinically meaningful axes. Across multiple longitudinal electronic health record tasks, including mortality and readmission prediction, our approach consistently outperforms masked, autoregressive, and contrastive pretraining baselines under matched model capacity. The proposed method improves discrimination, calibration, and sample efficiency, while simplifying the training pipeline to a single stage optimization. These findings suggest that in low entropy, outcome driven healthcare domains, supervision can act as the statistically optimal driver of representation learning, challenging the assumption that large scale self supervised pretraining is a prerequisite for strong clinical performance.
CVMar 1, 2021Code
PENet: Towards Precise and Efficient Image Guided Depth CompletionMu Hu, Shuling Wang, Bin Li et al.
Image guided depth completion is the task of generating a dense depth map from a sparse depth map and a high quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper proposes a two-branch backbone that consists of a color-dominant branch and a depth-dominant branch to exploit and fuse two modalities thoroughly. More specifically, one branch inputs a color image and a sparse depth map to predict a dense depth map. The other branch takes as inputs the sparse depth map and the previously predicted depth map, and outputs a dense depth map as well. The depth maps predicted from two branches are complimentary to each other and therefore they are adaptively fused. In addition, we also propose a simple geometric convolutional layer to encode 3D geometric cues. The geometric encoded backbone conducts the fusion of different modalities at multiple stages, leading to good depth completion results. We further implement a dilated and accelerated CSPN++ to refine the fused depth map efficiently. The proposed full model ranks 1st in the KITTI depth completion online leaderboard at the time of submission. It also infers much faster than most of the top ranked methods. The code of this work is available at https://github.com/JUGGHM/PENet_ICRA2021.
CVSep 6, 2023
Using Neural Networks for Fast SAR Roughness Estimation of High Resolution ImagesLi Fan, Jeova Farias Sales Rocha Neto
The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need of quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we proposed a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real time pixel-wise roughness estimation for high resolution real SAR imagery.
HCMay 22, 2024
A New Era in Human Factors Engineering: A Survey of the Applications and Prospects of Large Multimodal ModelsLi Fan, Lee Ching-Hung, Han Su et al.
In recent years, the potential applications of Large Multimodal Models (LMMs) in fields such as healthcare, social psychology, and industrial design have attracted wide research attention, providing new directions for human factors research. For instance, LMM-based smart systems have become novel research subjects of human factors studies, and LMM introduces new research paradigms and methodologies to this field. Therefore, this paper aims to explore the applications, challenges, and future prospects of LMM in the domain of human factors and ergonomics through an expert-LMM collaborated literature review. Specifically, a novel literature review method is proposed, and research studies of LMM-based accident analysis, human modelling and intervention design are introduced. Subsequently, the paper discusses future trends of the research paradigm and challenges of human factors and ergonomics studies in the era of LMMs. It is expected that this study can provide a valuable perspective and serve as a reference for integrating human factors with artificial intelligence.
OPTICSNov 19, 2024
Perfecting Imperfect Physical Neural Networks with Transferable Robustness using Sharpness-Aware TrainingTengji Xu, Zeyu Luo, Shaojie Liu et al.
AI models are essential in science and engineering, but recent advances are pushing the limits of traditional digital hardware. To address these limitations, physical neural networks (PNNs), which use physical substrates for computation, have gained increasing attention. However, developing effective training methods for PNNs remains a significant challenge. Current approaches, regardless of offline and online training, suffer from significant accuracy loss. Offline training is hindered by imprecise modeling, while online training yields device-specific models that can't be transferred to other devices due to manufacturing variances. Both methods face challenges from perturbations after deployment, such as thermal drift or alignment errors, which make trained models invalid and require retraining. Here, we address the challenges with both offline and online training through a novel technique called Sharpness-Aware Training (SAT), where we innovatively leverage the geometry of the loss landscape to tackle the problems in training physical systems. SAT enables accurate training using efficient backpropagation algorithms, even with imprecise models. PNNs trained by SAT offline even outperform those trained online, despite modeling and fabrication errors. SAT also overcomes online training limitations by enabling reliable transfer of models between devices. Finally, SAT is highly resilient to perturbations after deployment, allowing PNNs to continuously operate accurately under perturbations without retraining. We demonstrate SAT across three types of PNNs, showing it is universally applicable, regardless of whether the models are explicitly known. This work offers a transformative, efficient approach to training PNNs, addressing critical challenges in analog computing and enabling real-world deployment.
ITNov 12, 2024
Decision Feedback In-Context Symbol Detection over Block-Fading ChannelsLi Fan, Jing Yang, Cong Shen
Pre-trained Transformers, through in-context learning (ICL), have demonstrated exceptional capabilities to adapt to new tasks using example prompts \textit{without model update}. Transformer-based wireless receivers, where prompts consist of the pilot data in the form of transmitted and received signal pairs, have shown high estimation accuracy when pilot data are abundant. However, pilot information is often costly and limited in practice. In this work, we propose the \underline{DE}cision \underline{F}eedback \underline{IN}-Cont\underline{E}xt \underline{D}etection (DEFINED) solution as a new wireless receiver design, which bypasses channel estimation and directly performs symbol detection using the (sometimes extremely) limited pilot data. The key innovation in DEFINED is the proposed decision feedback mechanism in ICL, where we sequentially incorporate the detected symbols into the prompts to improve the detections for subsequent symbols. Extensive experiments across a broad range of wireless communication settings demonstrate that DEFINED achieves significant performance improvements, in some cases only needing a single pilot pair.
AIJun 23, 2025
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research TeamWeilun Yu, Shixiang Tang, Yonggui Huang et al.
Scientific progress increasingly relies on effective collaboration among researchers, a dynamic that large language models (LLMs) have only begun to emulate. While recent LLM-based scientist agents show promise in autonomous scientific discovery, they often lack the interactive reasoning and evaluation mechanisms essential to real-world research. We propose IDVSCI (Internal Discussion and Vote SCIentists), a multi-agent framework built on LLMs that incorporates two key innovations: a Dynamic Knowledge Exchange mechanism enabling iterative feedback among agents, and a Dual-Diversity Review paradigm that simulates heterogeneous expert evaluation. These components jointly promote deeper reasoning and the generation of more creative and impactful scientific ideas. To evaluate the effectiveness and generalizability of our approach, we conduct experiments on two datasets: a widely used benchmark in computer science and a new dataset we introduce in the health sciences domain. Results show that IDVSCI consistently achieves the best performance across both datasets, outperforming existing systems such as AI Scientist and VIRSCI. These findings highlight the value of modeling interaction and peer review dynamics in LLM-based autonomous research.
LGOct 29, 2024
RDSA: A Robust Deep Graph Clustering Framework via Dual Soft AssignmentYang Xiang, Li Fan, Tulika Saha et al.
Graph clustering is an essential aspect of network analysis that involves grouping nodes into separate clusters. Recent developments in deep learning have resulted in graph clustering, which has proven effective in many applications. Nonetheless, these methods often encounter difficulties when dealing with real-world graphs, particularly in the presence of noisy edges. Additionally, many denoising graph clustering methods tend to suffer from lower performance, training instability, and challenges in scaling to large datasets compared to non-denoised models. To tackle these issues, we introduce a new framework called the Robust Deep Graph Clustering Framework via Dual Soft Assignment (RDSA). RDSA consists of three key components: (i) a node embedding module that effectively integrates the graph's topological features and node attributes; (ii) a structure-based soft assignment module that improves graph modularity by utilizing an affinity matrix for node assignments; and (iii) a node-based soft assignment module that identifies community landmarks and refines node assignments to enhance the model's robustness. We assess RDSA on various real-world datasets, demonstrating its superior performance relative to existing state-of-the-art methods. Our findings indicate that RDSA provides robust clustering across different graph types, excelling in clustering effectiveness and robustness, including adaptability to noise, stability, and scalability.
CVMar 11, 2024
Skeleton Supervised Airway SegmentationMingyue Zhao, Han Li, Li Fan et al.
Fully-supervised airway segmentation has accomplished significant triumphs over the years in aiding pre-operative diagnosis and intra-operative navigation. However, full voxel-level annotation constitutes a labor-intensive and time-consuming task, often plagued by issues such as missing branches, branch annotation discontinuity, or erroneous edge delineation. label-efficient solutions for airway extraction are rarely explored yet primarily demanding in medical practice. To this end, we introduce a novel skeleton-level annotation (SkA) tailored to the airway, which simplifies the annotation workflow while enhancing annotation consistency and accuracy, preserving the complete topology. Furthermore, we propose a skeleton-supervised learning framework to achieve accurate airway segmentation. Firstly, a dual-stream buffer inference is introduced to realize initial label propagation from SkA, avoiding the collapse of direct learning from SkA. Then, we construct a geometry-aware dual-path propagation framework (GDP) to further promote complementary propagation learning, composed of hard geometry-aware propagation learning and soft geometry-aware propagation guidance. Experiments reveal that our proposed framework outperforms the competing methods with SKA, which amounts to only 1.96% airways, and achieves comparable performance with the baseline model that is fully supervised with 100% airways, demonstrating its significant potential in achieving label-efficient segmentation for other tubular structures, such as vessels.
LGJun 26, 2025
Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol DetectionLi Fan, Peng Wang, Jing Yang et al.
Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work, we propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models (1-2 layers) without increasing model depth. This design enables lightweight Transformers to achieve detection performance comparable to much deeper models, making them well-suited for deployment on resource-constrained mobile devices. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers, while maintaining storage and computational efficiency. This represents a promising direction for implementing Transformer-based algorithms in wireless receivers with limited computational resources.
OPTICSApr 21, 2025
Beyond Terabit/s Integrated Neuromorphic Photonic Processor for DSP-Free Optical InterconnectsBenshan Wang, Qiarong Xiao, Tengji Xu et al.
The rapid expansion of generative AI drives unprecedented demands for high-performance computing. Training large-scale AI models now requires vast interconnected GPU clusters across multiple data centers. Multi-scale AI training and inference demand uniform, ultra-low latency, and energy-efficient links to enable massive GPUs to function as a single cohesive unit. However, traditional electrical and optical interconnects, relying on conventional digital signal processors (DSPs) for signal distortion compensation, increasingly fail to meet these stringent requirements. To overcome these limitations, we present an integrated neuromorphic optical signal processor (OSP) that leverages deep reservoir computing and achieves DSP-free, all-optical, real-time processing. Experimentally, our OSP achieves a 100 Gbaud PAM4 per lane, 1.6 Tbit/s data center interconnect over a 5 km optical fiber in the C-band (equivalent to over 80 km in the O-band), far exceeding the reach of state-of-the-art DSP solutions, which are fundamentally constrained by chromatic dispersion in IMDD systems. Simultaneously, it reduces processing latency by four orders of magnitude and energy consumption by three orders of magnitude. Unlike DSPs, which introduce increased latency at high data rates, our OSP maintains consistent, ultra-low latency regardless of data rate scaling, making it ideal for future optical interconnects. Moreover, the OSP retains full optical field information for better impairment compensation and adapts to various modulation formats, data rates, and wavelengths. Fabricated using a mature silicon photonic process, the OSP can be monolithically integrated with silicon photonic transceivers, enhancing the compactness and reliability of all-optical interconnects. This research provides a highly scalable, energy-efficient, and high-speed solution, paving the way for next-generation AI infrastructure.
ITMar 20, 2025
Decision Feedback In-Context Learning for Wireless Symbol DetectionLi Fan, Wei Shen, Jing Yang et al.
Pre-trained Transformers, through in-context learning (ICL), have demonstrated exceptional capabilities to adapt to new tasks using example prompts without model update. Transformer-based wireless receivers, where prompts consist of the pilot data in the form of transmitted and received signal pairs, have shown high detection accuracy when pilot data are abundant. However, pilot information is often costly and limited in practice. In this work, we propose DEcision Feedback IN-ContExt Detection (DEFINED) as a new wireless receiver design, which bypasses channel estimation and directly performs symbol detection using the (sometimes extremely) limited pilot data. The key innovation in DEFINED is the proposed decision feedback mechanism in ICL, where we sequentially incorporate the detected symbols into the prompts as pseudo-labels to improve the detection for subsequent symbols. We further establish an error lower bound and provide theoretical insights into the model's generalization under channel distribution mismatch. Extensive experiments across a broad range of wireless settings demonstrate that a small Transformer trained with DEFINED achieves significant performance improvements over conventional methods, in some cases only needing a single pilot pair to achieve similar performance to the latter with more than 4 pilot pairs.
NAAug 7, 2015
Time Dependent Scattering from a GratingLi Fan, Peter Monk
Computing the electromagnetic field due to a periodic grating is critical for assessing the performance of thin film solar voltaic devices. In this paper we investigate the computation of these fields in the time domain (similar problems also arise in simulating antennas). Assuming a translation invariant periodic grating this reduces to solving the wave equation in a periodic domain. Materials used in practical devices have frequency dependent coefficients, and we provide a first proof of existence and uniqueness for a general class of such materials. Using Convolution Quadrature we can then prove time stepping error estimates. We end with some preliminary numerical results that demonstrate the convergence and stability of the scheme.