81.6CVApr 13Code
NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: AI Flash Portrait (Track 3)Ya-nan Guan, Shaonan Zhang, Hang Guo et al.
In this paper, we present a comprehensive overview of the NTIRE 2026 3rd Restore Any Image Model (RAIM) challenge, with a specific focus on Track 3: AI Flash Portrait. Despite significant advancements in deep learning for image restoration, existing models still encounter substantial challenges in real-world low-light portrait scenarios. Specifically, they struggle to achieve an optimal balance among noise suppression, detail preservation, and faithful illumination and color reproduction. To bridge this gap, this challenge aims to establish a novel benchmark for real-world low-light portrait restoration. We comprehensively evaluate the proposed algorithms utilizing a hybrid evaluation system that integrates objective quantitative metrics with rigorous subjective assessment protocols. For this competition, we provide a dataset containing 800 groups of real-captured low-light portrait data. Each group consists of a 1K-resolution low-light input image, a 1K ground truth (GT), and a 1K person mask. This challenge has garnered widespread attention from both academia and industry, attracting over 100 participating teams and receiving more than 3,000 valid submissions. This report details the motivation behind the challenge, the dataset construction process, the evaluation metrics, and the various phases of the competition. The released dataset and baseline code for this track are publicly available from the same \href{https://github.com/zsn1434/AI_Flash-BaseLine/tree/main}{GitHub repository}, and the official challenge webpage is hosted on \href{https://www.codabench.org/competitions/12885/}{CodaBench}.
SPSep 16, 2023
Intelligent machines work in unstructured environments by differential neuromorphic computingShengbo Wang, Shuo Gao, Chenyu Tang et al.
Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer inherent limitations, such as data-hungry procedures, and limited generalization capabilities. Herein, we present a memristor-based differential neuromorphic computing, perceptual signal processing and learning method for intelligent machines. The main features of environmental information such as amplification (>720%) and adaptation (<50%) of mechanical stimuli encoded in memristors, are extracted to obtain human-like processing in unstructured environments. The developed method takes advantage of the intrinsic multi-state property of memristors and exhibits good scalability and generalization, as confirmed by validation in two different application scenarios: object grasping and autonomous driving. In the former, a robot hand experimentally realizes safe and stable grasping through fast learning (in ~1 ms) the unknown object features (e.g., sharp corner and smooth surface) with a single memristor. In the latter, the decision-making information of 10 unstructured environments in autonomous driving (e.g., overtaking cars, pedestrians) is accurately (94%) extracted with a 40*25 memristor array. By mimicking the intrinsic nature of human low-level perception mechanisms, the electronic memristive neuromorphic circuit-based method, presented here shows the potential for adapting to diverse sensing technologies and helping intelligent machines generate smart high-level decisions in the real world.
99.2ETMay 29
Accelerating NBTI Aging Evaluation via Physics-Aware Graph Attention NetworksRenJie Zheng, HengXi Liu, ShuJun Gao et al.
As semiconductor technology advances to smaller nodes, Negative Bias Temperature Instability (NBTI) under prolonged workloads has emerged as a significant bottleneck constraining reliability-aware Design-Technology Co-Optimization (DTCO). Conventional TCAD simulations incur prohibitive computational overhead when evaluating device aging characteristics, making it difficult to satisfy the demand for efficient iterative design cycles. To address this challenge, this paper proposes an aging evaluation framework based on a physics-aware graph attention network (Physics-Aware RelGAT). By losslessly mapping unstructured device meshes into attributed graphs, this framework constructs a 45-dimensional device encoding scheme that integrates interface trap distributions and macroscopic electro-thermal stresses, achieving a direct mapping from underlying physical quantities to device degradation characteristics. To overcome the challenge of predicting currents that span multiple orders of magnitude, a dual-end normalization strategy and a log-scale loss function optimization are introduced, ensuring the model possesses high-precision fitting capabilities. Experimental results demonstrate that the model achieves a mean error of only 1.27% on an independent test set, achieving an acceleration of approximately 17,000 times compared to traditional TCAD simulations. This framework provides a solution for the assessment of circuit reliability in advanced process nodes that successfully balances physical fidelity with industrial-grade efficiency.
IVJun 25, 2023
SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology ImageLuyuan Xie, Cong Li, Zirui Wang et al.
The rapid identification and accurate diagnosis of breast cancer, known as the killer of women, have become greatly significant for those patients. Numerous breast cancer histopathological image classification methods have been proposed. But they still suffer from two problems. (1) These methods can only hand high-resolution (HR) images. However, the low-resolution (LR) images are often collected by the digital slide scanner with limited hardware conditions. Compared with HR images, LR images often lose some key features like texture, which deeply affects the accuracy of diagnosis. (2) The existing methods have fixed receptive fields, so they can not extract and fuse multi-scale features well for images with different magnification factors. To fill these gaps, we present a \textbf{S}ingle \textbf{H}istopathological \textbf{I}mage \textbf{S}uper-\textbf{R}esolution \textbf{C}lassification network (SHISRCNet), which consists of two modules: Super-Resolution (SR) and Classification (CF) modules. SR module reconstructs LR images into SR ones. CF module extracts and fuses the multi-scale features of SR images for classification. In the training stage, we introduce HR images into the CF module to enhance SHISRCNet's performance. Finally, through the joint training of these two modules, super-resolution and classified of LR images are integrated into our model. The experimental results demonstrate that the effects of our method are close to the SOTA methods with taking HR images as inputs.
94.0ARApr 9Code
A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM AcceleratorsCong Li, Chenhao Xue, Yi Ren et al.
Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this emerging technology provides strong performance gains over existing hardware, current 3D-DRAM accelerators (3D-Accelerators) rely on closed-source evaluation tools, limiting access to publicly available performance analysis methods. Moreover, existing designs are highly customized for specific scenarios, lacking a general and reusable full-stack modeling for 3D-Accelerators across diverse usecases. To bridge this fundamental gap, we present ATLAS, the first silicon-proven Architectural Three-dimesional-DRAM-based LLM Accelerator Simulation framework. Built on commercially deployed multi-layer 3D-DRAM technology, ATLAS introduces unified abstractions for both 3D-Accelerator system architecture and programming primitives to support arbitrary LLM inference scenarios. Validation against real silicon shows that ATLAS achieves $\le$8.57% simulation error and 97.26-99.96\% correlation with measured performance. Through design space exploration with ATLAS, we demonstrate its ability to guide architecture design and distill key takeaways for both 3D-DRAM memory system and 3D-Accelerator microarchitecture across scenarios. ATLAS will be open-sourced upon publication, enabling further research on 3D-Accelerators.
CRMar 3, 2023
NCL: Textual Backdoor Defense Using Noise-augmented Contrastive LearningShengfang Zhai, Qingni Shen, Xiaoyi Chen et al.
At present, backdoor attacks attract attention as they do great harm to deep learning models. The adversary poisons the training data making the model being injected with a backdoor after being trained unconsciously by victims using the poisoned dataset. In the field of text, however, existing works do not provide sufficient defense against backdoor attacks. In this paper, we propose a Noise-augmented Contrastive Learning (NCL) framework to defend against textual backdoor attacks when training models with untrustworthy data. With the aim of mitigating the mapping between triggers and the target label, we add appropriate noise perturbing possible backdoor triggers, augment the training dataset, and then pull homology samples in the feature space utilizing contrastive learning objective. Experiments demonstrate the effectiveness of our method in defending three types of textual backdoor attacks, outperforming the prior works.
80.7SEMar 20Code
Agentic Harness for Real-World CompilersYingwei Zheng, Cong Li, Shaohua Li et al.
Compilers are critical to modern computing, yet fixing compiler bugs is difficult. While recent large language model (LLM) advancements enable automated bug repair, compiler bugs pose unique challenges due to their complexity, deep cross-domain expertise requirements, and sparse, non-descriptive bug reports, necessitating compiler-specific tools. To bridge the gap, we introduce llvm-autofix, the first agentic harness designed to assist LLM agents in understanding and fixing compiler bugs. Our focus is on LLVM, one of the most widely used compiler infrastructures. Central to llvm-autofix are agent-friendly LLVM tools, a benchmark llvm-bench of reproducible LLVM bugs, and a tailored minimal agent llvm-autofix-mini for fixing LLVM bugs. Our evaluation demonstrates a performance decline of 60% in frontier models when tackling compiler bugs compared with common software bugs. Our minimal agent llvm-autofix-mini also outperforms the state-of-the-art by approximately 22%. This emphasizes the necessity for specialized harnesses like ours to close the loop between LLMs and compiler engineering. We believe this work establishes a foundation for advancing LLM capabilities in complex systems like compilers. GitHub: https://github.com/dtcxzyw/llvm-autofix
CVSep 10, 2024
Neuromorphic spatiotemporal optical flow: Enabling ultrafast visual perception beyond human capabilitiesShengbo Wang, Jingwen Zhao, Tongming Pu et al.
Optical flow, inspired by the mechanisms of biological visual systems, calculates spatial motion vectors within visual scenes that are necessary for enabling robotics to excel in complex and dynamic working environments. However, current optical flow algorithms, despite human-competitive task performance on benchmark datasets, remain constrained by unacceptable time delays (~0.6 seconds per inference, 4X human processing speed) in practical deployment. Here, we introduce a neuromorphic optical flow approach that addresses delay bottlenecks by encoding temporal information directly in a synaptic transistor array to assist spatial motion analysis. Compared to conventional spatial-only optical flow methods, our spatiotemporal neuromorphic optical flow offers the spatial-temporal consistency of motion information, rapidly identifying regions of interest in as little as 1-2 ms using the temporal motion cues derived from the embedded temporal information in the two-dimensional floating gate synaptic transistors. Thus, the visual input can be selectively filtered to achieve faster velocity calculations and various task execution. At the hardware level, due to the atomically sharp interfaces between distinct functional layers in two-dimensional van der Waals heterostructures, the synaptic transistor offers high-frequency response (~100 μs), robust non-volatility (>10000 s), and excellent endurance (>8000 cycles), enabling robust visual processing. In software benchmarks, our system outperforms state-of-the-art algorithms with a 400% speedup, frequently surpassing human-level performance while maintaining or enhancing accuracy by utilizing the temporal priors provided by the embedded temporal information.
LGMar 22, 2025Code
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLMCodefuse, Ling Team, Wenting Cai et al.
Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency. Many attempts have been released in the open source community to break the trade-off between performance and efficiency, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces yet another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50\% reduction in deployment resources compared to the similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of high-quality data for the annealing and post-training stages. The models and data can be accessed at~\url{https://huggingface.co/inclusionAI/Ling-Coder-lite}.
ARNov 6, 2025
AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIMYuanpeng Zhang, Xing Hu, Xi Chen et al.
SRAM Processing-in-Memory (PIM) has emerged as the most promising implementation for high-performance PIM, delivering superior computing density, energy efficiency, and computational precision. However, the pursuit of higher performance necessitates more complex circuit designs and increased operating frequencies, which exacerbate IR-drop issues. Severe IR-drop can significantly degrade chip performance and even threaten reliability. Conventional circuit-level IR-drop mitigation methods, such as back-end optimizations, are resource-intensive and often compromise power, performance, and area (PPA). To address these challenges, we propose AIM, comprehensive software and hardware co-design for architecture-level IR-drop mitigation in high-performance PIM. Initially, leveraging the bit-serial and in-situ dataflow processing properties of PIM, we introduce Rtog and HR, which establish a direct correlation between PIM workloads and IR-drop. Building on this foundation, we propose LHR and WDS, enabling extensive exploration of architecture-level IR-drop mitigation while maintaining computational accuracy through software optimization. Subsequently, we develop IR-Booster, a dynamic adjustment mechanism that integrates software-level HR information with hardware-based IR-drop monitoring to adapt the V-f pairs of the PIM macro, achieving enhanced energy efficiency and performance. Finally, we propose the HR-aware task mapping method, bridging software and hardware designs to achieve optimal improvement. Post-layout simulation results on a 7nm 256-TOPS PIM chip demonstrate that AIM achieves up to 69.2% IR-drop mitigation, resulting in 2.29x energy efficiency improvement and 1.152x speedup.
CLApr 24, 2024Code
Nyonic Technical ReportJunfeng Tian, Rui Wang, Cong Li et al.
This report details the development and key achievements of our latest language model designed for custom large language models. The advancements introduced include a novel Online Data Scheduler that supports flexible training data adjustments and curriculum learning. The model's architecture is fortified with state-of-the-art techniques such as Rotary Positional Embeddings, QK-LayerNorm, and a specially crafted multilingual tokenizer to enhance stability and performance. Moreover, our robust training framework incorporates advanced monitoring and rapid recovery features to ensure optimal efficiency. Our Wonton 7B model has demonstrated competitive performance on a range of multilingual and English benchmarks. Future developments will prioritize narrowing the performance gap with more extensively trained models, thereby enhancing the model's real-world efficacy and adaptability.GitHub: \url{https://github.com/nyonicai/nyonic-public}
LGMay 10, 2024
MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data AnalysisLuyuan Xie, Manqing Lin, Tianyu Luan et al.
Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures (system heterogeneity), across clients pose significant challenges in effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods using knowledge distillation require public datasets, raising privacy and data collection issues. Additionally, these datasets require additional local computing and storage resources, which is a burden for medical institutions with limited hardware conditions. In this paper, we introduce a novel federated learning paradigm, named Model Heterogeneous personalized Federated Learning via Injection and Distillation (MH-pFLID). Our framework leverages a lightweight messenger model that carries concentrated information to collect the information from each client. We also develop a set of receiver and transmitter modules to receive and send information from the messenger model, so that the information could be injected and distilled with efficiency.
SPJan 6, 2024
TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal ProcessingLuyuan Xie, Cong Li, Xin Zhang et al. · pku
Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the numerous excellent progresses have been made in previous works, we observe the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get more informative representations. We transform the input time-domain medical signals into spectrograms and design a time-frequency encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale representations from the augmented spectrograms. Our TRLS takes spectrogram as input with two types of different data augmentations and maximizes the similarity between positive ones, which effectively circumvents the problem of designing negative samples. Our evaluation of four real-world medical signal datasets focusing on medical signal classification shows that TRLS is superior to the existing frameworks.
GNFeb 18, 2024
scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type AnnotationCong Li, Meng Xiao, Pengfei Wang et al.
Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as the Foundation Model. This research focuses on how to train and adapt the Large Language Model with the capability to interpret and distinguish cell types in single-cell RNA sequencing data. Our preliminary research results indicate that these foundational models excel in accurately categorizing known cell types, demonstrating the potential of the Large Language Models as effective tools for uncovering new biological insights.
RODec 27, 2024
Toward Scalable Multirobot Control: Fast Policy Learning in Distributed MPCXinglong Zhang, Wei Pan, Cong Li et al.
Distributed model predictive control (DMPC) is promising in achieving optimal cooperative control in multirobot systems (MRS). However, real-time DMPC implementation relies on numerical optimization tools to periodically calculate local control sequences online. This process is computationally demanding and lacks scalability for large-scale, nonlinear MRS. This article proposes a novel distributed learning-based predictive control (DLPC) framework for scalable multirobot control. Unlike conventional DMPC methods that calculate open-loop control sequences, our approach centers around a computationally fast and efficient distributed policy learning algorithm that generates explicit closed-loop DMPC policies for MRS without using numerical solvers. The policy learning is executed incrementally and forward in time in each prediction interval through an online distributed actor-critic implementation. The control policies are successively updated in a receding-horizon manner, enabling fast and efficient policy learning with the closed-loop stability guarantee. The learned control policies could be deployed online to MRS with varying robot scales, enhancing scalability and transferability for large-scale MRS. Furthermore, we extend our methodology to address the multirobot safe learning challenge through a force field-inspired policy learning approach. We validate our approach's effectiveness, scalability, and efficiency through extensive experiments on cooperative tasks of large-scale wheeled robots and multirotor drones. Our results demonstrate the rapid learning and deployment of DMPC policies for MRS with scales up to 10,000 units.
ASNov 27, 2024
Wearable intelligent throat enables natural speech in stroke patients with dysarthriaChenyu Tang, Shuo Gao, Cong Li et al.
Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to enable fluent, emotionally expressive communication. The system utilizes ultrasensitive textile strain sensors to capture high-quality signals from the neck area and supports token-level processing for real-time, continuous speech decoding, enabling seamless, delay-free communication. In tests with five stroke patients with dysarthria, IT's LLM agents intelligently corrected token errors and enriched sentence-level emotional and logical coherence, achieving low error rates (4.2% word error rate, 2.9% sentence error rate) and a 55% increase in user satisfaction. This work establishes a portable, intuitive communication platform for patients with dysarthria with the potential to be applied broadly across different neurological conditions and in multi-language support systems.
IRSep 1, 2025
CSRM-LLM: Embracing Multilingual LLMs for Cold-Start Relevance Matching in Emerging E-commerce MarketsYujing Wang, Yiren Chen, Huoran Li et al.
As global e-commerce platforms continue to expand, companies are entering new markets where they encounter cold-start challenges due to limited human labels and user behaviors. In this paper, we share our experiences in Coupang to provide a competitive cold-start performance of relevance matching for emerging e-commerce markets. Specifically, we present a Cold-Start Relevance Matching (CSRM) framework, utilizing a multilingual Large Language Model (LLM) to address three challenges: (1) activating cross-lingual transfer learning abilities of LLMs through machine translation tasks; (2) enhancing query understanding and incorporating e-commerce knowledge by retrieval-based query augmentation; (3) mitigating the impact of training label errors through a multi-round self-distillation training strategy. Our experiments demonstrate the effectiveness of CSRM-LLM and the proposed techniques, resulting in successful real-world deployment and significant online gains, with a 45.8% reduction in defect ratio and a 0.866% uplift in session purchase rate.
GNDec 24, 2024
scReader: Prompting Large Language Models to Interpret scRNA-seq DataCong Li, Qingqing Long, Yuanchun Zhou et al.
Large language models (LLMs) have demonstrated remarkable advancements, primarily due to their capabilities in modeling the hidden relationships within text sequences. This innovation presents a unique opportunity in the field of life sciences, where vast collections of single-cell omics data from multiple species provide a foundation for training foundational models. However, the challenge lies in the disparity of data scales across different species, hindering the development of a comprehensive model for interpreting genetic data across diverse organisms. In this study, we propose an innovative hybrid approach that integrates the general knowledge capabilities of LLMs with domain-specific representation models for single-cell omics data interpretation. We begin by focusing on genes as the fundamental unit of representation. Gene representations are initialized using functional descriptions, leveraging the strengths of mature language models such as LLaMA-2. By inputting single-cell gene-level expression data with prompts, we effectively model cellular representations based on the differential expression levels of genes across various species and cell types. In the experiments, we constructed developmental cells from humans and mice, specifically targeting cells that are challenging to annotate. We evaluated our methodology through basic tasks such as cell annotation and visualization analysis. The results demonstrate the efficacy of our approach compared to other methods using LLMs, highlighting significant improvements in accuracy and interoperability. Our hybrid approach enhances the representation of single-cell data and offers a robust framework for future research in cross-species genetic analysis.
CVJul 30, 2025
DACA-Net: A Degradation-Aware Conditional Diffusion Network for Underwater Image EnhancementChang Huang, Jiahang Cao, Jun Ma et al.
Underwater images typically suffer from severe colour distortions, low visibility, and reduced structural clarity due to complex optical effects such as scattering and absorption, which greatly degrade their visual quality and limit the performance of downstream visual perception tasks. Existing enhancement methods often struggle to adaptively handle diverse degradation conditions and fail to leverage underwater-specific physical priors effectively. In this paper, we propose a degradation-aware conditional diffusion model to enhance underwater images adaptively and robustly. Given a degraded underwater image as input, we first predict its degradation level using a lightweight dual-stream convolutional network, generating a continuous degradation score as semantic guidance. Based on this score, we introduce a novel conditional diffusion-based restoration network with a Swin UNet backbone, enabling adaptive noise scheduling and hierarchical feature refinement. To incorporate underwater-specific physical priors, we further propose a degradation-guided adaptive feature fusion module and a hybrid loss function that combines perceptual consistency, histogram matching, and feature-level contrast. Comprehensive experiments on benchmark datasets demonstrate that our method effectively restores underwater images with superior colour fidelity, perceptual quality, and structural details. Compared with SOTA approaches, our framework achieves significant improvements in both quantitative metrics and qualitative visual assessments.
HCNov 28, 2024
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor ImpairmentChenyu Tang, Ruizhi Zhang, Shuo Gao et al.
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94\% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100\% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions -- issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.
LGJun 29, 2024
MH-pFLGB: Model Heterogeneous personalized Federated Learning via Global Bypass for Medical Image AnalysisLuyuan Xie, Manqing Lin, ChenMing Xu et al.
In the evolving application of medical artificial intelligence, federated learning is notable for its ability to protect training data privacy. Federated learning facilitates collaborative model development without the need to share local data from healthcare institutions. Yet, the statistical and system heterogeneity among these institutions poses substantial challenges, which affects the effectiveness of federated learning and hampers the exchange of information between clients. To address these issues, we introduce a novel approach, MH-pFLGB, which employs a global bypass strategy to mitigate the reliance on public datasets and navigate the complexities of non-IID data distributions. Our method enhances traditional federated learning by integrating a global bypass model, which would share the information among the clients, but also serves as part of the network to enhance the performance on each client. Additionally, MH-pFLGB provides a feature fusion module to better combine the local and global features. We validate \model{}'s effectiveness and adaptability through extensive testing on different medical tasks, demonstrating superior performance compared to existing state-of-the-art methods.
CVJun 29, 2024
pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image SegmentationLuyuan Xie, Manqing Lin, Siyuan Liu et al.
In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement (pFLFE), designed to mitigate these challenges. pFLFE consists of two main stages: feature enhancement and supervised learning. The first stage improves differentiation between foreground and background features, and the second uses these enhanced features for learning from segmentation masks. We also design an alternative training approach that requires fewer communication rounds without compromising segmentation quality, even with limited communication resources. Through experiments on three medical segmentation tasks, we demonstrate that pFLFE outperforms the state-of-the-art methods.
CVJun 15, 2024
Technique Report of CVPR 2024 PBDL ChallengesYing Fu, Yu Li, Shaodi You et al.
The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.
LGNov 1, 2021
GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory ProcessingZhe Zhou, Cong Li, Xuechao Wei et al.
Recently, Graph Neural Networks (GNNs) have become state-of-the-art algorithms for analyzing non-euclidean graph data. However, to realize efficient GNN training is challenging, especially on large graphs. The reasons are many-folded: 1) GNN training incurs a substantial memory footprint. Full-batch training on large graphs even requires hundreds to thousands of gigabytes of memory. 2) GNN training involves both memory-intensive and computation-intensive operations, challenging current CPU/GPU platforms. 3) The irregularity of graphs can result in severe resource under-utilization and load-imbalance problems. This paper presents a GNNear accelerator to tackle these challenges. GNNear adopts a DIMM-based memory system to provide sufficient memory capacity. To match the heterogeneous nature of GNN training, we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs), making full use of the high aggregated local bandwidth. We adopt a Centralized-Acceleration-Engine (CAE) to process the computation-intensive Update operations. We further propose several optimization strategies to deal with the irregularity of input graphs and improve GNNear's performance. Comprehensive evaluations on 16 GNN training tasks demonstrate that GNNear achieves 30.8$\times$/2.5$\times$ geomean speedup and 79.6$\times$/7.3$\times$(geomean) higher energy efficiency compared to Xeon E5-2698-v4 CPU and NVIDIA V100 GPU.
AIApr 12, 2021
Deep Attributed Network Representation Learning via Attribute Enhanced NeighborhoodCong Li, Min Shi, Bo Qu et al.
Attributed network representation learning aims at learning node embeddings by integrating network structure and attribute information. It is a challenge to fully capture the microscopic structure and the attribute semantics simultaneously, where the microscopic structure includes the one-step, two-step and multi-step relations, indicating the first-order, second-order and high-order proximity of nodes, respectively. In this paper, we propose a deep attributed network representation learning via attribute enhanced neighborhood (DANRL-ANE) model to improve the robustness and effectiveness of node representations. The DANRL-ANE model adopts the idea of the autoencoder, and expands the decoder component to three branches to capture different order proximity. We linearly combine the adjacency matrix with the attribute similarity matrix as the input of our model, where the attribute similarity matrix is calculated by the cosine similarity between the attributes based on the social homophily. In this way, we preserve the second-order proximity to enhance the robustness of DANRL-ANE model on sparse networks, and deal with the topological and attribute information simultaneously. Moreover, the sigmoid cross-entropy loss function is extended to capture the neighborhood character, so that the first-order proximity is better preserved. We compare our model with the state-of-the-art models on five real-world datasets and two network analysis tasks, i.e., link prediction and node classification. The DANRL-ANE model performs well on various networks, even on sparse networks or networks with isolated nodes given the attribute information is sufficient.
SIAug 23, 2020
TSAM: Temporal Link Prediction in Directed Networks based on Self-Attention MechanismJinsong Li, Jianhua Peng, Shuxin Liu et al.
The development of graph neural networks (GCN) makes it possible to learn structural features from evolving complex networks. Even though a wide range of realistic networks are directed ones, few existing works investigated the properties of directed and temporal networks. In this paper, we address the problem of temporal link prediction in directed networks and propose a deep learning model based on GCN and self-attention mechanism, namely TSAM. The proposed model adopts an autoencoder architecture, which utilizes graph attentional layers to capture the structural feature of neighborhood nodes, as well as a set of graph convolutional layers to capture motif features. A graph recurrent unit layer with self-attention is utilized to learn temporal variations in the snapshot sequence. We run comparative experiments on four realistic networks to validate the effectiveness of TSAM. Experimental results show that TSAM outperforms most benchmarks under two evaluation metrics.
CVNov 19, 2019
Dense Fusion Classmate Network for Land Cover ClassificationChao Tian, Cong Li, Jianping Shi
Recently, FCNs based methods have made great progress in semantic segmentation. Different with ordinary scenes, satellite image owns specific characteristics, which elements always extend to large scope and no regular or clear boundaries. Therefore, effective mid-level structure information extremely missing, precise pixel-level classification becomes tough issues. In this paper, a Dense Fusion Classmate Network (DFCNet) is proposed to adopt in land cover classification.
LGJul 10, 2019
Neural Input Search for Large Scale Recommendation ModelsManas R. Joglekar, Cong Li, Jay K. Adams et al.
Recommendation problems with large numbers of discrete items, such as products, webpages, or videos, are ubiquitous in the technology industry. Deep neural networks are being increasingly used for these recommendation problems. These models use embeddings to represent discrete items as continuous vectors, and the vocabulary sizes and embedding dimensions, although heavily influence the model's accuracy, are often manually selected in a heuristical manner. We present Neural Input Search (NIS), a technique for learning the optimal vocabulary sizes and embedding dimensions for categorical features. The goal is to maximize prediction accuracy subject to a constraint on the total memory used by all embeddings. Moreover, we argue that the traditional Single-size Embedding (SE), which uses the same embedding dimension for all values of a feature, suffers from inefficient usage of model capacity and training data. We propose a novel type of embedding, namely Multi-size Embedding (ME), which allows the embedding dimension to vary for different values of the feature. During training we use reinforcement learning to find the optimal vocabulary size for each feature and embedding dimension for each value of the feature. In experiments on two common types of large scale recommendation problems, i.e. retrieval and ranking problems, NIS automatically found better vocabulary and embedding sizes that result in $6.8\%$ and $1.8\%$ relative improvements on Recall@1 and ROC-AUC over manually optimized ones.
ROFeb 27, 2019
Stereo Visual Inertial LiDAR Simultaneous Localization and MappingWeizhao Shao, Srinivasan Vijayarangan, Cong Li et al.
Simultaneous Localization and Mapping (SLAM) is a fundamental task to mobile and aerial robotics. LiDAR based systems have proven to be superior compared to vision based systems due to its accuracy and robustness. In spite of its superiority, pure LiDAR based systems fail in certain degenerate cases like traveling through a tunnel. We propose Stereo Visual Inertial LiDAR (VIL) SLAM that performs better on these degenerate cases and has comparable performance on all other cases. VIL-SLAM accomplishes this by incorporating tightly-coupled stereo visual inertial odometry (VIO) with LiDAR mapping and LiDAR enhanced visual loop closure. The system generates loop-closure corrected 6-DOF LiDAR poses in real-time and 1cm voxel dense maps near real-time. VIL-SLAM demonstrates improved accuracy and robustness compared to state-of-the-art LiDAR methods.
LGFeb 21, 2019
Reduced-Rank Local Distance Metric Learning for k-NN ClassificationYInjie Huang, Cong Li, Michael Georgiopoulos et al.
We propose a new method for local distance metric learning based on sample similarity as side information. These local metrics, which utilize conical combinations of metric weight matrices, are learned from the pooled spatial characteristics of the data, as well as the similarity profiles between the pairs of samples, whose distances are measured. The main objective of our framework is to yield metrics, such that the resulting distances between similar samples are small and distances between dissimilar samples are above a certain threshold. For learning and inference purposes, we describe a transductive, as well as an inductive algorithm; the former approach naturally befits our framework, while the latter one is provided in the interest of faster learning. Experimental results on a collection of classification problems imply that the new methods may exhibit notable performance advantages over alternative metric learning approaches that have recently appeared in the literature.
CVFeb 16, 2019
R$^2$-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing ImagesJiangmiao Pang, Cong Li, Jianping Shi et al.
Recently, the convolutional neural network has brought impressive improvements for object detection. However, detecting tiny objects in large-scale remote sensing images still remains challenging. First, the extreme large input size makes the existing object detection solutions too slow for practical use. Second, the massive and complex backgrounds cause serious false alarms. Moreover, the ultratiny objects increase the difficulty of accurate detection. To tackle these problems, we propose a unified and self-reinforced network called remote sensing region-based convolutional neural network ($\mathcal{R}^2$-CNN), composing of backbone Tiny-Net, intermediate global attention block, and final classifier and detector. Tiny-Net is a lightweight residual structure, which enables fast and powerful features extraction from inputs. Global attention block is built upon Tiny-Net to inhibit false positives. Classifier is then used to predict the existence of targets in each patch, and detector is followed to locate them accurately if available. The classifier and detector are mutually reinforced with end-to-end training, which further speed up the process and avoid false alarms. Effectiveness of $\mathcal{R}^2$-CNN is validated on hundreds of GF-1 images and GF-2 images that are 18 000 $\times$ 18 192 pixels, 2.0-m resolution, and 27 620 $\times$ 29 200 pixels, 0.8-m resolution, respectively. Specifically, we can process a GF-1 image in 29.4 s on Titian X just with single thread. According to our knowledge, no previous solution can detect the tiny object on such huge remote sensing images gracefully. We believe that it is a significant step toward practical real-time remote sensing systems.
LGJul 11, 2017
Multi-Task Learning Using Neighborhood KernelsNiloofar Yousefi, Cong Li, Mansooreh Mollaghasemi et al.
This paper introduces a new and effective algorithm for learning kernels in a Multi-Task Learning (MTL) setting. Although, we consider a MTL scenario here, our approach can be easily applied to standard single task learning, as well. As shown by our empirical results, our algorithm consistently outperforms the traditional kernel learning algorithms such as uniform combination solution, convex combinations of base kernels as well as some kernel alignment-based models, which have been proven to give promising results in the past. We present a Rademacher complexity bound based on which a new Multi-Task Multiple Kernel Learning (MT-MKL) model is derived. In particular, we propose a Support Vector Machine-regularized model in which, for each task, an optimal kernel is learned based on a neighborhood-defining kernel that is not restricted to be positive semi-definite. Comparative experimental results are showcased that underline the merits of our neighborhood-defining framework in both classification and regression problems.
LGAug 20, 2014
Conic Multi-Task ClassificationCong Li, Michael Georgiopoulos, Georgios C. Anagnostopoulos
Traditionally, Multi-task Learning (MTL) models optimize the average of task-related objective functions, which is an intuitive approach and which we will be referring to as Average MTL. However, a more general framework, referred to as Conic MTL, can be formulated by considering conic combinations of the objective functions instead; in this framework, Average MTL arises as a special case, when all combination coefficients equal 1. Although the advantage of Conic MTL over Average MTL has been shown experimentally in previous works, no theoretical justification has been provided to date. In this paper, we derive a generalization bound for the Conic MTL method, and demonstrate that the tightest bound is not necessarily achieved, when all combination coefficients equal 1; hence, Average MTL may not always be the optimal choice, and it is important to consider Conic MTL. As a byproduct of the generalization bound, it also theoretically explains the good experimental results of previous relevant works. Finally, we propose a new Conic MTL model, whose conic combination coefficients minimize the generalization bound, instead of choosing them heuristically as has been done in previous methods. The rationale and advantage of our model is demonstrated and verified via a series of experiments by comparing with several other methods.
LGApr 11, 2014
Pareto-Path Multi-Task Multiple Kernel LearningCong Li, Michael Georgiopoulos, Georgios C. Anagnostopoulos
A traditional and intuitively appealing Multi-Task Multiple Kernel Learning (MT-MKL) method is to optimize the sum (thus, the average) of objective functions with (partially) shared kernel function, which allows information sharing amongst tasks. We point out that the obtained solution corresponds to a single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO) problem, which considers the concurrent optimization of all task objectives involved in the Multi-Task Learning (MTL) problem. Motivated by this last observation and arguing that the former approach is heuristic, we propose a novel Support Vector Machine (SVM) MT-MKL framework, that considers an implicitly-defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of objective functions as a special case. Using algorithms we derived, we demonstrate through a series of experimental results that the framework is capable of achieving better classification performance, when compared to other similar MTL approaches.
LGJan 21, 2014
A Unifying Framework for Typical Multi-Task Multiple Kernel Learning ProblemsCong Li, Michael Georgiopoulos, Georgios C. Anagnostopoulos
Over the past few years, Multi-Kernel Learning (MKL) has received significant attention among data-driven feature selection techniques in the context of kernel-based learning. MKL formulations have been devised and solved for a broad spectrum of machine learning problems, including Multi-Task Learning (MTL). Solving different MKL formulations usually involves designing algorithms that are tailored to the problem at hand, which is, typically, a non-trivial accomplishment. In this paper we present a general Multi-Task Multi-Kernel Learning (Multi-Task MKL) framework that subsumes well-known Multi-Task MKL formulations, as well as several important MKL approaches on single-task problems. We then derive a simple algorithm that can solve the unifying framework. To demonstrate the flexibility of the proposed framework, we formulate a new learning problem, namely Partially-Shared Common Space (PSCS) Multi-Task MKL, and demonstrate its merits through experimentation.
LGDec 9, 2013
Multi-Task Classification Hypothesis Space with Improved Generalization BoundsCong Li, Michael Georgiopoulos, Georgios C. Anagnostopoulos
This paper presents a RKHS, in general, of vector-valued functions intended to be used as hypothesis space for multi-task classification. It extends similar hypothesis spaces that have previously considered in the literature. Assuming this space, an improved Empirical Rademacher Complexity-based generalization bound is derived. The analysis is itself extended to an MKL setting. The connection between the proposed hypothesis space and a Group-Lasso type regularizer is discussed. Finally, experimental results, with some SVM-based Multi-Task Learning problems, underline the quality of the derived bounds and validate the paper's analysis.
LGDec 9, 2013
Kernel-based Distance Metric Learning in the Output SpaceCong Li, Michael Georgiopoulos, Georgios C. Anagnostopoulos
In this paper we present two related, kernel-based Distance Metric Learning (DML) methods. Their respective models non-linearly map data from their original space to an output space, and subsequent distance measurements are performed in the output space via a Mahalanobis metric. The dimensionality of the output space can be directly controlled to facilitate the learning of a low-rank metric. Both methods allow for simultaneous inference of the associated metric and the mapping to the output space, which can be used to visualize the data, when the output space is 2- or 3-dimensional. Experimental results for a collection of classification tasks illustrate the advantages of the proposed methods over other traditional and kernel-based DML approaches.