CVMar 21, 2023Code
The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge DetectorCaixia Zhou, Yaping Huang, Mengyang Pu et al.
Deep learning-based edge detectors heavily rely on pixel-wise labels which are often provided by multiple annotators. Existing methods fuse multiple annotations using a simple voting process, ignoring the inherent ambiguity of edges and labeling bias of annotators. In this paper, we propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations. Specifically, we first convert the deterministic label space into a learnable Gaussian distribution, whose variance measures the degree of ambiguity among different annotations. Then we regard the learned variance as the estimated uncertainty of the predicted edge maps, and pixels with higher uncertainty are likely to be hard samples for edge detection. Therefore we design an adaptive weighting loss to emphasize the learning from those pixels with high uncertainty, which helps the network to gradually concentrate on the important pixels. UAED can be combined with various encoder-decoder backbones, and the extensive experiments demonstrate that UAED achieves superior performance consistently across multiple edge detection benchmarks. The source code is available at \url{https://github.com/ZhouCX117/UAED}
CLSep 20, 2024Code
Advancing Event Causality Identification via Heuristic Semantic Dependency Inquiry NetworkHaoran Li, Qiang Gao, Hongmei Wu et al.
Event Causality Identification (ECI) focuses on extracting causal relations between events in texts. Existing methods for ECI primarily rely on causal features and external knowledge. However, these approaches fall short in two dimensions: (1) causal features between events in a text often lack explicit clues, and (2) external knowledge may introduce bias, while specific problems require tailored analyses. To address these issues, we propose SemDI - a simple and effective Semantic Dependency Inquiry Network for ECI. SemDI captures semantic dependencies within the context using a unified encoder. Then, it utilizes a Cloze Analyzer to generate a fill-in token based on comprehensive context understanding. Finally, this fill-in token is used to inquire about the causal relation between two events. Extensive experiments demonstrate the effectiveness of SemDI, surpassing state-of-the-art methods on three widely used benchmarks. Code is available at https://github.com/hrlics/SemDI.
STOct 28, 2022
Incorporating Interactive Facts for Stock Selection via Neural Recursive ODEsQiang Gao, Xinzhu Zhou, Kunpeng Zhang et al.
Stock selection attempts to rank a list of stocks for optimizing investment decision making, aiming at minimizing investment risks while maximizing profit returns. Recently, researchers have developed various (recurrent) neural network-based methods to tackle this problem. Without exceptions, they primarily leverage historical market volatility to enhance the selection performance. However, these approaches greatly rely on discrete sampled market observations, which either fail to consider the uncertainty of stock fluctuations or predict continuous stock dynamics in the future. Besides, some studies have considered the explicit stock interdependence derived from multiple domains (e.g., industry and shareholder). Nevertheless, the implicit cross-dependencies among different domains are under-explored. To address such limitations, we present a novel stock selection solution -- StockODE, a latent variable model with Gaussian prior. Specifically, we devise a Movement Trend Correlation module to expose the time-varying relationships regarding stock movements. We design Neural Recursive Ordinary Differential Equation Networks (NRODEs) to capture the temporal evolution of stock volatility in a continuous dynamic manner. Moreover, we build a hierarchical hypergraph to incorporate the domain-aware dependencies among the stocks. Experiments conducted on two real-world stock market datasets demonstrate that StockODE significantly outperforms several baselines, such as up to 18.57% average improvement regarding Sharpe Ratio.
SEApr 10Code
DeepGuard: Secure Code Generation via Multi-Layer Semantic AggregationLi Huang, Zhongxin Liu, Yifan Wu et al.
Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.
LGApr 8
OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at ScaleDihong Jiang, Ruoqi Cao, Zhiyuan Dang et al.
While traditional tree-based ensemble methods have long dominated tabular tasks, deep neural networks and emerging foundation models have challenged this primacy, yet no consensus exists on a universally superior paradigm. Existing benchmarks typically contain fewer than 100 datasets, raising concerns about evaluation sufficiency and potential selection biases. To address these limitations, we introduce OmniTabBench, the largest tabular benchmark to date, comprising 3030 datasets spanning diverse tasks that are comprehensively collected from diverse sources and categorized by industry using large language models. We conduct an unprecedented large-scale empirical evaluation of state-of-the-art models from all model families on OmniTabBench, confirming the absence of a dominant winner. Furthermore, through a decoupled metafeature analysis, which examines individual properties such as dataset size, feature types, feature and target skewness/kurtosis, we elucidate conditions favoring specific model categories, providing clearer, more actionable guidance than prior compound-metric studies.
AIDec 16, 2025
Intention Chain-of-Thought Prompting with Dynamic Routing for Code GenerationShen Li, Li Huang, Shaoxiong Zhan et al.
Large language models (LLMs) exhibit strong generative capabilities and have shown great potential in code generation. Existing chain-of-thought (CoT) prompting methods enhance model reasoning by eliciting intermediate steps, but suffer from two major limitations: First, their uniform application tends to induce overthinking on simple tasks. Second, they lack intention abstraction in code generation, such as explicitly modeling core algorithmic design and efficiency, leading models to focus on surface-level structures while neglecting the global problem objective. Inspired by the cognitive economy principle of engaging structured reasoning only when necessary to conserve cognitive resources, we propose RoutingGen, a novel difficulty-aware routing framework that dynamically adapts prompting strategies for code generation. For simple tasks, it adopts few-shot prompting; for more complex ones, it invokes a structured reasoning strategy, termed Intention Chain-of-Thought (ICoT), which we introduce to guide the model in capturing task intention, such as the core algorithmic logic and its time complexity. Experiments across three models and six standard code generation benchmarks show that RoutingGen achieves state-of-the-art performance in most settings, while reducing total token usage by 46.37% on average across settings. Furthermore, ICoT outperforms six existing prompting baselines on challenging benchmarks.
CVOct 17, 2022
Cutting-Splicing data augmentation: A novel technology for medical image segmentationLianting Hu, Huiying Liang, Jiajie Tang et al.
Background: Medical images are more difficult to acquire and annotate than natural images, which results in data augmentation technologies often being used in medical image segmentation tasks. Most data augmentation technologies used in medical segmentation were originally developed on natural images and do not take into account the characteristic that the overall layout of medical images is standard and fixed. Methods: Based on the characteristics of medical images, we developed the cutting-splicing data augmentation (CS-DA) method, a novel data augmentation technology for medical image segmentation. CS-DA augments the dataset by splicing different position components cut from different original medical images into a new image. The characteristics of the medical image result in the new image having the same layout as and similar appearance to the original image. Compared with classical data augmentation technologies, CS-DA is simpler and more robust. Moreover, CS-DA does not introduce any noise or fake information into the newly created image. Results: To explore the properties of CS-DA, many experiments are conducted on eight diverse datasets. On the training dataset with the small sample size, CS-DA can effectively increase the performance of the segmentation model. When CS-DA is used together with classical data augmentation technologies, the performance of the segmentation model can be further improved and is much better than that of CS-DA and classical data augmentation separately. We also explored the influence of the number of components, the position of the cutting line, and the splicing method on the CS-DA performance. Conclusions: The excellent performance of CS-DA in the experiment has confirmed the effectiveness of CS-DA, and provides a new data augmentation idea for the small sample segmentation task.
LGSep 15, 2024
Enhancing Data Quality through Self-learning on Imbalanced Financial Risk DataXu Sun, Zixuan Qin, Shun Zhang et al.
In the financial risk domain, particularly in credit default prediction and fraud detection, accurate identification of high-risk class instances is paramount, as their occurrence can have significant economic implications. Although machine learning models have gained widespread adoption for risk prediction, their performance is often hindered by the scarcity and diversity of high-quality data. This limitation stems from factors in datasets such as small risk sample sizes, high labeling costs, and severe class imbalance, which impede the models' ability to learn effectively and accurately forecast critical events. This study investigates data pre-processing techniques to enhance existing financial risk datasets by introducing TriEnhance, a straightforward technique that entails: (1) generating synthetic samples specifically tailored to the minority class, (2) filtering using binary feedback to refine samples, and (3) self-learning with pseudo-labels. Our experiments across six benchmark datasets reveal the efficacy of TriEnhance, with a notable focus on improving minority class calibration, a key factor for developing more robust financial risk prediction systems.
CVJul 4, 2024
Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image ClassificationXuerong Zhang, Li Huang, Jing Lv et al.
Semi-supervised learning is attracting blooming attention, due to its success in combining unlabeled data. However, pseudo-labeling-based semi-supervised approaches suffer from two problems in image classification: (1) Existing methods might fail to adopt suitable thresholds since they either use a pre-defined/fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior performance and slow convergence. (2) Discarding unlabeled data with confidence below the thresholds results in the loss of discriminating information. To solve these issues, we develop an effective method to make sufficient use of unlabeled data. Specifically, we design a self adaptive threshold pseudo-labeling strategy, which thresholds for each class can be dynamically adjusted to increase the number of reliable samples. Meanwhile, in order to effectively utilise unlabeled data with confidence below the thresholds, we propose an unreliable sample contrastive loss to mine the discriminative information in low-confidence samples by learning the similarities and differences between sample features. We evaluate our method on several classification benchmarks under partially labeled settings and demonstrate its superiority over the other approaches.
LGOct 14, 2025Code
MoRA: On-the-fly Molecule-aware Low-Rank Adaptation Framework for LLM-based Multi-Modal Molecular AssistantTao Yin, Xiaohong Zhang, Jiacheng Zhang et al.
Effectively integrating molecular graph structures with Large Language Models (LLMs) is a key challenge in drug discovery. Most existing multi-modal alignment methods typically process these structures by fine-tuning the LLM or adding a static adapter simultaneously. However, these approaches have two main limitations: (1) it optimizes a shared parameter space across all molecular inputs, limiting the model's ability to capture instance-specific structural features; and (2) fine-tuning the LLM for molecular tasks can lead to catastrophic forgetting, undermining its general reasoning capabilities. In this paper, instead of static task-oriented adaptation, we propose an instance-specific parameter space alignment approach for each molecule on-the-fly. To this end, we introduce Molecule-aware Low-Rank Adaptation (MoRA) that produces a unique set of low-rank adaptation weights for each input molecular graph. These weights are then dynamically injected into a frozen LLM, allowing the model to adapt its reasoning to the structure of each molecular input, while preserving the LLM's core knowledge. Extensive experiments demonstrate that on key molecular tasks, such as chemical reaction prediction and molecular captioning, MoRA's instance-specific dynamic adaptation outperforms statically adapted baselines, including a 14.1% relative improvement in reaction prediction exact match and a 22% reduction in error for quantum property prediction. The code is available at https://github.com/jk-sounds/MoRA.
LGSep 29, 2025Code
ScatterAD: Temporal-Topological Scattering Mechanism for Time Series Anomaly DetectionTao Yin, Xiaohong Zhang, Shaochen Fu et al.
One main challenge in time series anomaly detection for industrial IoT lies in the complex spatio-temporal couplings within multivariate data. However, traditional anomaly detection methods focus on modeling spatial or temporal dependencies independently, resulting in suboptimal representation learning and limited sensitivity to anomalous dispersion in high-dimensional spaces. In this work, we conduct an empirical analysis showing that both normal and anomalous samples tend to scatter in high-dimensional space, especially anomalous samples are markedly more dispersed. We formalize this dispersion phenomenon as scattering, quantified by the mean pairwise distance among sample representations, and leverage it as an inductive signal to enhance spatio-temporal anomaly detection. Technically, we propose ScatterAD to model representation scattering across temporal and topological dimensions. ScatterAD incorporates a topological encoder for capturing graph-structured scattering and a temporal encoder for constraining over-scattering through mean squared error minimization between neighboring time steps. We introduce a contrastive fusion mechanism to ensure the complementarity of the learned temporal and topological representations. Additionally, we theoretically show that maximizing the conditional mutual information between temporal and topological views improves cross-view consistency and enhances more discriminative representations. Extensive experiments on multiple public benchmarks show that ScatterAD achieves state-of-the-art performance on multivariate time series anomaly detection. Code is available at this repository: https://github.com/jk-sounds/ScatterAD.
AIAug 18, 2025Code
E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language ModelRonghao Lin, Shuai Shen, Weipeng Hu et al.
Multimodal Empathetic Response Generation (MERG) is crucial for building emotionally intelligent human-computer interactions. Although large language models (LLMs) have improved text-based ERG, challenges remain in handling multimodal emotional content and maintaining identity consistency. Thus, we propose E3RG, an Explicit Emotion-driven Empathetic Response Generation System based on multimodal LLMs which decomposes MERG task into three parts: multimodal empathy understanding, empathy memory retrieval, and multimodal response generation. By integrating advanced expressive speech and video generative models, E3RG delivers natural, emotionally rich, and identity-consistent responses without extra training. Experiments validate the superiority of our system on both zero-shot and few-shot settings, securing Top-1 position in the Avatar-based Multimodal Empathy Challenge on ACM MM 25. Our code is available at https://github.com/RH-Lin/E3RG.
CLOct 23, 2020Code
KINNEWS and KIRNEWS: Benchmarking Cross-Lingual Text Classification for Kinyarwanda and KirundiRubungo Andre Niyongabo, Hong Qu, Julia Kreutzer et al.
Recent progress in text classification has been focused on high-resource languages such as English and Chinese. For low-resource languages, amongst them most African languages, the lack of well-annotated data and effective preprocessing, is hindering the progress and the transfer of successful methods. In this paper, we introduce two news datasets (KINNEWS and KIRNEWS) for multi-class classification of news articles in Kinyarwanda and Kirundi, two low-resource African languages. The two languages are mutually intelligible, but while Kinyarwanda has been studied in Natural Language Processing (NLP) to some extent, this work constitutes the first study on Kirundi. Along with the datasets, we provide statistics, guidelines for preprocessing, and monolingual and cross-lingual baseline models. Our experiments show that training embeddings on the relatively higher-resourced Kinyarwanda yields successful cross-lingual transfer to Kirundi. In addition, the design of the created datasets allows for a wider use in NLP beyond text classification in future studies, such as representation learning, cross-lingual learning with more distant languages, or as base for new annotations for tasks such as parsing, POS tagging, and NER. The datasets, stopwords, and pre-trained embeddings are publicly available at https://github.com/Andrews2017/KINNEWS-and-KIRNEWS-Corpus .
IRJan 6
M-RAG: Making RAG Faster, Stronger, and More EfficientSun Xu, Tongkai Xu, Baiheng Xie et al.
Retrieval-Augmented Generation (RAG) has become a widely adopted paradigm for enhancing the reliability of large language models (LLMs). However, RAG systems are sensitive to retrieval strategies that rely on text chunking to construct retrieval units, which often introduce information fragmentation, retrieval noise, and reduced efficiency. Recent work has even questioned the necessity of RAG, arguing that long-context LLMs may eliminate multi-stage retrieval pipelines by directly processing full documents. Nevertheless, expanded context capacity alone does not resolve the challenges of relevance filtering, evidence prioritization, and isolating answer-bearing information. To this end, we proposed M-RAG, a novel Chunk-free retrieval strategy. Instead of retrieving coarse-grained textual chunks, M-RAG extracts structured, k-v decomposition meta-markers, with a lightweight, intent-aligned retrieval key for retrieval and a context-rich information value for generation. Under this setting, M-RAG enables efficient and stable query-key similarity matching without sacrificing expressive ability. Experimental results on the LongBench subtasks demonstrate that M-RAG outperforms chunk-based RAG baselines across varying token budgets, particularly under low-resource settings. Extensive analysis further reveals that M-RAG retrieves more answer-friendly evidence with high efficiency, validating the effectiveness of decoupling retrieval representation from generation and highlighting the proposed strategy as a scalable and robust alternative to existing chunk-based methods.
LGFeb 4
Scaling DPPs for RAG: Density Meets DiversityXun Sun, Baiheng Xie, Li Huang et al.
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevance responses that are aligned with factual evidence and evolving corpora. Standard RAG pipelines construct context through relevance ranking, performing point-wise scoring between the user query and each corpora chunk. This formulation, however, ignores interactions among retrieved candidates, leading to redundant contexts that dilute density and fail to surface complementary evidence. We argue that effective retrieval should optimize jointly for both density and diversity, ensuring the grounding evidence that is dense in information yet diverse in coverage. In this study, we propose ScalDPP, a diversity-aware retrieval mechanism for RAG that incorporates Determinantal Point Processes (DPPs) through a lightweight P-Adapter, enabling scalable modeling of inter-chunk dependencies and complementary context selection. In addition, we develop a novel set-level objective, Diverse Margin Loss (DML), that enforces ground-truth complementary evidence chains to dominate any equally sized redundant alternatives under DPP geometry. Experimental results demonstrate the superiority of ScalDPP, substantiating our core statement in practice.
CVMay 15, 2024
AMSNet: Netlist Dataset for AMS CircuitsZhuofu Tao, Yichen Shi, Yiru Huo et al.
Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehensive dataset delineating the schematic-netlist relationship. We therefore design an automatic technique for converting schematics into netlists, and create dataset AMSNet, encompassing transistor-level schematics and corresponding SPICE format netlists. With a growing size, AMSNet can significantly facilitate exploration of MLLM applications in AMS circuit design. We have made an initial set of netlists public, and will make both our netlist generation tool and the full dataset available upon publishing of this paper.
LGFeb 4
CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal LearningRonghao Lin, Qiaolin He, Sijie Mai et al.
Multimodal machine learning, mimicking the human brain's ability to integrate various modalities has seen rapid growth. Most previous multimodal models are trained on perfectly paired multimodal input to reach optimal performance. In real-world deployments, however, the presence of modality is highly variable and unpredictable, causing the pre-trained models in suffering significant performance drops and fail to remain robust with dynamic missing modalities circumstances. In this paper, we present a novel Cyclic INformative Learning framework (CyIN) to bridge the gap between complete and incomplete multimodal learning. Specifically, we firstly build an informative latent space by adopting token- and label-level Information Bottleneck (IB) cyclically among various modalities. Capturing task-related features with variational approximation, the informative bottleneck latents are purified for more efficient cross-modal interaction and multimodal fusion. Moreover, to supplement the missing information caused by incomplete multimodal input, we propose cross-modal cyclic translation by reconstruct the missing modalities with the remained ones through forward and reverse propagation process. With the help of the extracted and reconstructed informative latents, CyIN succeeds in jointly optimizing complete and incomplete multimodal learning in one unified model. Extensive experiments on 4 multimodal datasets demonstrate the superior performance of our method in both complete and diverse incomplete scenarios.
SEJul 21, 2025
Do AI models help produce verified bug fixes?Li Huang, Ilgiz Mustafin, Marco Piccioni et al.
Among areas of software engineering where AI techniques -- particularly, Large Language Models -- seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR), the production of satisfactory corrections to software bugs. Does this expectation materialize in practice? How do we find out, making sure that proposed corrections actually work? If programmers have access to LLMs, how do they actually use them to complement their own skills? To answer these questions, we took advantage of the availability of a program-proving environment, which formally determines the correctness of proposed fixes, to conduct a study of program debugging with two randomly assigned groups of programmers, one with access to LLMs and the other without, both validating their answers through the proof tools. The methodology relied on a division into general research questions (Goals in the Goal-Query-Metric approach), specific elements admitting specific answers (Queries), and measurements supporting these answers (Metrics). While applied so far to a limited sample size, the results are a first step towards delineating a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs. These results caused surprise as compared to what one might expect from the use of AI for debugging and APR. The contributions also include: a detailed methodology for experiments in the use of LLMs for debugging, which other projects can reuse; a fine-grain analysis of programmer behavior, made possible by the use of full-session recording; a definition of patterns of use of LLMs, with 7 distinct categories; and validated advice for getting the best of LLMs for debugging and Automatic Program Repair.
CVMay 14, 2025
AMSnet 2.0: A Large AMS Database with AI Segmentation for Net DetectionYichen Shi, Zhuofu Tao, Yuhao Gao et al.
Current multimodal large language models (MLLMs) struggle to understand circuit schematics due to their limited recognition capabilities. This could be attributed to the lack of high-quality schematic-netlist training data. Existing work such as AMSnet applies schematic parsing to generate netlists. However, these methods rely on hard-coded heuristics and are difficult to apply to complex or noisy schematics in this paper. We therefore propose a novel net detection mechanism based on segmentation with high robustness. The proposed method also recovers positional information, allowing digital reconstruction of schematics. We then expand AMSnet dataset with schematic images from various sources and create AMSnet 2.0. AMSnet 2.0 contains 2,686 circuits with schematic images, Spectre-formatted netlists, OpenAccess digital schematics, and positional information for circuit components and nets, whereas AMSnet only includes 792 circuits with SPICE netlists but no digital schematics.
AISep 22, 2025
AI Pangaea: Unifying Intelligence Islands for Adapting Myriad TasksJianlong Chang, Haixin Wang, Zhiyuan Dang et al.
The pursuit of artificial general intelligence continuously demands generalization in one model across myriad tasks, even those not seen before. However, current AI models are isolated from each other for being limited to specific tasks, now first defined as Intelligence Islands. To unify Intelligence Islands into one, we propose Pangaea, the first AI supercontinent akin to the geological Pangaea. Pangaea encodes any data into a unified format and accumulates universal knowledge through pre-training on 296 datasets across diverse modalities. Eventually, it demonstrates remarkable generalization across 45 general tasks and 15 scientific tasks encompassing a wide range of scientific subjects. By investigating Pangaea deeper, the scaling effect of modality is revealed, quantifying the universal knowledge accumulation across modalities as the cumulative distribution function of a geometric distribution. On the whole, Pangaea shows strong potential to handle myriad tasks, indicating a new direction toward artificial general intelligence.
IVDec 31, 2024
SS-CTML: Self-Supervised Cross-Task Mutual Learning for CT Image ReconstructionGaofeng Chen, Yaoduo Zhang, Li Huang et al.
Supervised deep-learning (SDL) techniques with paired training datasets have been widely studied for X-ray computed tomography (CT) image reconstruction. However, due to the difficulties of obtaining paired training datasets in clinical routine, the SDL methods are still away from common uses in clinical practices. In recent years, self-supervised deep-learning (SSDL) techniques have shown great potential for the studies of CT image reconstruction. In this work, we propose a self-supervised cross-task mutual learning (SS-CTML) framework for CT image reconstruction. Specifically, a sparse-view scanned and a limited-view scanned sinogram data are first extracted from a full-view scanned sinogram data, which results in three individual reconstruction tasks, i.e., the full-view CT (FVCT) reconstruction, the sparse-view CT (SVCT) reconstruction, and limited-view CT (LVCT) reconstruction. Then, three neural networks are constructed for the three reconstruction tasks. Considering that the ultimate goals of the three tasks are all to reconstruct high-quality CT images, we therefore construct a set of cross-task mutual learning objectives for the three tasks, in which way, the three neural networks can be self-supervised optimized by learning from each other. Clinical datasets are adopted to evaluate the effectiveness of the proposed framework. Experimental results demonstrate that the SS-CTML framework can obtain promising CT image reconstruction performance in terms of both quantitative and qualitative measurements.
LGMay 21, 2024
Multiscale lubrication simulation based on fourier feature networks with trainable frequencyYihu Tang, Li Huang, Limin Wu et al.
Rough surface lubrication simulation is crucial for designing and optimizing tribological performance. Despite the growing application of Physical Information Neural Networks (PINNs) in hydrodynamic lubrication analysis, their use has been primarily limited to smooth surfaces. This is due to traditional PINN methods suffer from spectral bias, favoring to learn low-frequency features and thus failing to analyze rough surfaces with high-frequency signals. To date, no PINN methods have been reported for rough surface lubrication. To overcome these limitations, this work introduces a novel multi-scale lubrication neural network architecture that utilizes a trainable Fourier feature network. By incorporating learnable feature embedding frequencies, this architecture automatically adapts to various frequency components, thereby enhancing the analysis of rough surface characteristics. This method has been tested across multiple surface morphologies, and the results have been compared with those obtained using the finite element method (FEM). The comparative analysis demonstrates that this approach achieves a high consistency with FEM results. Furthermore, this novel architecture surpasses traditional Fourier feature networks with fixed feature embedding frequencies in both accuracy and computational efficiency. Consequently, the multi-scale lubrication neural network model offers a more efficient tool for rough surface lubrication analysis.
CVDec 16, 2021
AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point CloudsYihan Hu, Zhuangzhuang Ding, Runzhou Ge et al.
There have been two streams in the 3D detection from point clouds: single-stage methods and two-stage methods. While the former is more computationally efficient, the latter usually provides better detection accuracy. By carefully examining the two-stage approaches, we have found that if appropriately designed, the first stage can produce accurate box regression. In this scenario, the second stage mainly rescores the boxes such that the boxes with better localization get selected. From this observation, we have devised a single-stage anchor-free network that can fulfill these requirements. This network, named AFDetV2, extends the previous work by incorporating a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head. As a result, the detection accuracy is drastically boosted in the single-stage. To evaluate our approach, we have conducted extensive experiments on the Waymo Open Dataset and the nuScenes Dataset. We have observed that our AFDetV2 achieves the state-of-the-art results on these two datasets, superior to all the prior arts, including both the single-stage and the two-stage 3D detectors. AFDetV2 won the 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge 2021. In addition, a variant of our model AFDetV2-Base was entitled the "Most Efficient Model" by the Challenge Sponsor, showing a superior computational efficiency. To demonstrate the generality of this single-stage method, we have also applied it to the first stage of the two-stage networks. Without exception, the results show that with the strengthened backbone and the rescoring approach, the second stage refinement is no longer needed.
NEDec 13, 2021
Human-Level Control through Directly-Trained Deep Spiking Q-NetworksGuisong Liu, Wenjie Deng, Xiurui Xie et al.
As the third-generation neural networks, Spiking Neural Networks (SNNs) have great potential on neuromorphic hardware because of their high energy-efficiency. However, Deep Spiking Reinforcement Learning (DSRL), i.e., the Reinforcement Learning (RL) based on SNNs, is still in its preliminary stage due to the binary output and the non-differentiable property of the spiking function. To address these issues, we propose a Deep Spiking Q-Network (DSQN) in this paper. Specifically, we propose a directly-trained deep spiking reinforcement learning architecture based on the Leaky Integrate-and-Fire (LIF) neurons and Deep Q-Network (DQN). Then, we adapt a direct spiking learning algorithm for the Deep Spiking Q-Network. We further demonstrate the advantages of using LIF neurons in DSQN theoretically. Comprehensive experiments have been conducted on 17 top-performing Atari games to compare our method with the state-of-the-art conversion method. The experimental results demonstrate the superiority of our method in terms of performance, stability, robustness and energy-efficiency. To the best of our knowledge, our work is the first one to achieve state-of-the-art performance on multiple Atari games with the directly-trained SNN.
CVJul 29, 2021
Real-Time Anchor-Free Single-Stage 3D Detection with IoU-AwarenessRunzhou Ge, Zhuangzhuang Ding, Yihan Hu et al.
In this report, we introduce our winning solution to the Real-time 3D Detection and also the "Most Efficient Model" in the Waymo Open Dataset Challenges at CVPR 2021. Extended from our last year's award-winning model AFDet, we have made a handful of modifications to the base model, to improve the accuracy and at the same time to greatly reduce the latency. The modified model, named as AFDetV2, is featured with a lite 3D Feature Extractor, an improved RPN with extended receptive field and an added sub-head that produces an IoU-aware confidence score. These model enhancements, together with enriched data augmentation, stochastic weights averaging, and a GPU-based implementation of voxelization, lead to a winning accuracy of 73.12 mAPH/L2 for our AFDetV2 with a latency of 60.06 ms, and an accuracy of 72.57 mAPH/L2 for our AFDetV2-base, entitled as the "Most Efficient Model" by the challenge sponsor, with a winning latency of 55.86 ms.
CVJun 28, 2020
2nd Place Solution for Waymo Open Dataset Challenge -- 2D Object DetectionSijia Chen, Yu Wang, Li Huang et al.
A practical autonomous driving system urges the need to reliably and accurately detect vehicles and persons. In this report, we introduce a state-of-the-art 2D object detection system for autonomous driving scenarios. Specifically, we integrate both popular two-stage detector and one-stage detector with anchor free fashion to yield a robust detection. Furthermore, we train multiple expert models and design a greedy version of the auto ensemble scheme that automatically merges detections from different models. Notably, our overall detection system achieves 70.28 L2 mAP on the Waymo Open Dataset v1.2, ranking the 2nd place in the 2D detection track of the Waymo Open Dataset Challenges.
CVJun 28, 2020
1st Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D TrackingYu Wang, Sijia Chen, Li Huang et al.
This technical report presents the online and real-time 2D and 3D multi-object tracking (MOT) algorithms that reached the 1st places on both Waymo Open Dataset 2D tracking and 3D tracking challenges. An efficient and pragmatic online tracking-by-detection framework named HorizonMOT is proposed for camera-based 2D tracking in the image space and LiDAR-based 3D tracking in the 3D world space. Within the tracking-by-detection paradigm, our trackers leverage our high-performing detectors used in the 2D/3D detection challenges and achieved 45.13% 2D MOTA/L2 and 63.45% 3D MOTA/L2 in the 2D/3D tracking challenges.
CVJun 28, 2020
1st Place Solution for Waymo Open Dataset Challenge -- 3D Detection and Domain AdaptationZhuangzhuang Ding, Yihan Hu, Runzhou Ge et al.
In this technical report, we introduce our winning solution "HorizonLiDAR3D" for the 3D detection track and the domain adaptation track in Waymo Open Dataset Challenge at CVPR 2020. Many existing 3D object detectors include prior-based anchor box design to account for different scales and aspect ratios and classes of objects, which limits its capability of generalization to a different dataset or domain and requires post-processing (e.g. Non-Maximum Suppression (NMS)). We proposed a one-stage, anchor-free and NMS-free 3D point cloud object detector AFDet, using object key-points to encode the 3D attributes, and to learn an end-to-end point cloud object detection without the need of hand-engineering or learning the anchors. AFDet serves as a strong baseline in our winning solution and significant improvements are made over this baseline during the challenges. Specifically, we design stronger networks and enhance the point cloud data using densification and point painting. To leverage camera information, we append/paint additional attributes to each point by projecting them to camera space and gathering image-based perception information. The final detection performance also benefits from model ensemble and Test-Time Augmentation (TTA) in both the 3D detection track and the domain adaptation track. Our solution achieves the 1st place with 77.11% mAPH/L2 and 69.49% mAPH/L2 respectively on the 3D detection track and the domain adaptation track.
CVJun 23, 2020
AFDet: Anchor Free One Stage 3D Object DetectionRunzhou Ge, Zhuangzhuang Ding, Yihan Hu et al.
High-efficiency point cloud 3D object detection operated on embedded systems is important for many robotics applications including autonomous driving. Most previous works try to solve it using anchor-based detection methods which come with two drawbacks: post-processing is relatively complex and computationally expensive; tuning anchor parameters is tricky. We are the first to address these drawbacks with an anchor free and Non-Maximum Suppression free one stage detector called AFDet. The entire AFDet can be processed efficiently on a CNN accelerator or a GPU with the simplified post-processing. Without bells and whistles, our proposed AFDet performs competitively with other one stage anchor-based methods on KITTI validation set and Waymo Open Dataset validation set.
SEOct 31, 2019
Formal Verification of Dynamic and Stochastic Behaviors for Automotive SystemsLi Huang, Tian Liang, Eun-Young Kang
Formal analysis of functional and non-functional requirements is crucial in automotive systems. The behaviors of those systems often rely on complex dynamics as well as on stochastic behaviors. We have proposed a probabilistic extension of Clock Constraint Specification Language, called PrCCSL,for specification of (non)-functional requirements and proved the correctness of requirements by mapping the semantics of the specifications into UPPAAL models. Previous work is extended in this paper by including an extension of PrCCSL, called PrCCSL*, for specification of stochastic and dynamic system behaviors, as well as complex requirements related to multiple events. To formally analyze the system behaviors/requirements specified in PrCCSL*, the PrCCSL* specifications are translated into stochastic UPPAAL models for formal verification. We implement an automatic translation tool, namely ProTL, which can also perform formal analysis on PrCCSL* specifications using UPPAAL-SMC as an analysis backend. Our approach is demonstrated on two automotive systems case studies.
SEOct 31, 2019
Formal Analysis of Hybrid-Dynamic Timing Behaviors in Cyber-Physical SystemsLi Huang, Eun-Young Kang
Ensuring correctness of timed behaviors in cyber-physical systems (CPS) using closed-loop verification is challenging due to the hybrid dynamics in both systems and environments. Simulink and Stateflow are tools for model-based design that support a variety of mechanisms for modeling and analyzing hybrid dynamics of real-time embedded systems. In this paper, we present an SMT-based approach for formal analysis of the hybrid-dynamic timing behaviors of CPS modeled in Simulink blocks and Stateflow states (S/S). The hierarchically interconnected S/S are flattened and translated into the input language of SMT solver for formal verification. A translation algorithm is provided to facilitate the translation. Formal verification of timing constraints against the S/S models is reduced to the validity checking of the resulting SMT encodings. The applicability of our approach is demonstrated on an unmanned surface vessel case study.
SEApr 15, 2019
SMT-based Probabilistic Analysis of Timing Constraints in Cyber-Physical SystemsLi Huang, Eun-Young Kang
Modeling and analysis of timing constraints is crucial in cyber-physical systems (CPS). EAST-ADL is an architectural language dedicated to safety-critical embedded system design. SIMULINK/STATEFLOW (S/S) is a widely used industrial tool for modeling and analysis of embedded systems. In most cases, a bounded number of violations of timing constraints in systems would not lead to system failures when the results of the violations are negligible, called Weakly-Hard (WH). We have previously defined a probabilistic extension of Clock Constraint Specification Language (CCSL), called PrCCSL, for formal specification of EAST-ADL timing constraints in the context of WH. In this paper, we propose an SMT-based approach for probabilistic analysis of EAST-ADL timing constraints in CPS modeled in S/S: an automatic transformation from S/S models to the input language of SMT solver is provided; timing constraints specified in PrCCSL are encoded into SMT formulas and the probabilistic analysis of timing constraints is reduced to the validity checking of the resulting SMT encodings. Our approach is demonstrated a cooperative automotive system case study.
LGMar 22, 2019
Patient Clustering Improves Efficiency of Federated Machine Learning to predict mortality and hospital stay time using distributed Electronic Medical RecordsLi Huang, Dianbo Liu
Electronic medical records (EMRs) supports the development of machine learning algorithms for predicting disease incidence, patient response to treatment, and other healthcare events. But insofar most algorithms have been centralized, taking little account of the decentralized, non-identically independently distributed (non-IID), and privacy-sensitive characteristics of EMRs that can complicate data collection, sharing and learning. To address this challenge, we introduced a community-based federated machine learning (CBFL) algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geological locations, and learnt one model for each community. Throughout the learning process, the data was kept local on hospitals, while locally-computed results were aggregated on a server. Evaluation results show that CBFL outperformed the baseline FL algorithm in terms of Area Under the Receiver Operating Characteristic Curve (ROC AUC), Area Under the Precision-Recall Curve (PR AUC), and communication cost between hospitals and the server. Furthermore, communities' performance difference could be explained by how dissimilar one community was to others.
LGNov 30, 2018
LoAdaBoost: loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care dataLi Huang, Yifeng Yin, Zeng Fu et al.
Intensive care data are valuable for improvement of health care, policy making and many other purposes. Vast amount of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a big challenge due to regulatory, operational and security reasons. One potential solution is federated machine learning, which is a method that sends machine learning algorithms simultaneously to all data sources, trains models in each source and aggregates the learned models. This strategy allows utilization of valuable data without moving them. One challenge in applying federated machine learning is the possibly different distributions of data from diverse sources. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated the performance of learning in IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.
SEJun 29, 2018
Probabilistic Analysis of Weakly-Hard Real-Time SystemsEun-Young Kang, Dongrui Mu, Li Huang
Modeling and analysis of non-functional properties, such as timing constraints, is crucial in automotive real-time embedded systems. EAST-ADL is a domain specific architectural language dedicated to safetycritical automotive embedded system design. We have previously specified EAST-ADL timing constraints in Clock Constraint Specification Language (CCSL) and proved the correctness of specification by mapping the semantics of the constraints into Uppaal models amenable to model checking. In most cases, a bounded number of violations of timing constraints in automotive systems would not lead to system failures when the results of the violations are negligible, called Weakly-Hard (WH). Previous work is extended in this paper by including support for probabilistic analysis of timing constraints in the context of WH: Probabilistic extension of CCSL, called PrCCSL, is defined and the EAST-ADL timing constraints with stochastic properties are specified in PrCCSL. The semantics of the extended constraints in PrCCSL is translated into Uppaal-SMC models for formal verification. Furthermore, a set of mapping rules is proposed to facilitate guarantee of translation. Our approach is demonstrated on an autonomous traffic sign recognition vehicle case study.
SEJun 20, 2018
Formal Specification & Analysis of Autonomous Systems in PrCCSL/Simulink Design VerifierEun-Young Kang, Li Huang
Modeling and analysis of timing constraints is crucial in automotive systems. EAST-ADL is a domain specific architectural language dedicated to safety-critical automotive embedded system design. In most cases, a bounded number of violations of timing constraints in systems would not lead to system failures when the results of the violations are negligible, called Weakly-Hard (WH). We have previously specified EAST-ADL timing constraints in Clock Constraint Specification Language (CCSL) and transformed timed behaviors in CCSL into formal models amenable to model checking. Previous work is extended in this paper by including support for probabilistic analysis of timing constraints in the context of WH: Probabilistic extension of CCSL, called PrCCSL, is defined and the EAST-ADL timing constraints with stochastic properties are specified in PrCCSL. The semantics of the extended constraints in PrCCSL is translated into Proof Objective Models that can be verified using SIMULINK DESIGN VERIFIER. Furthermore, a set of mapping rules is proposed to facilitate guarantee of translation. Our approach is demonstrated on an autonomous traffic sign recognition vehicle case study.
SEMar 16, 2018
Model-based Verification and Validation of an Autonomous Vehicle SystemEun-Young Kang, Dongrui Mu, Li Huang et al.
The software development for Cyber-Physical Systems (CPS), e.g., autonomous vehicles, requires both functional and non-functional quality assurance to guarantee that the CPS operates safely and effectively. EAST-ADL is a domain specific architectural language dedicated to safety-critical automotive embedded system design. We have previously modified EAST-ADL to include energy constraints and transformed energy-aware real-time (ERT) behaviors modeled in EAST-ADL/STATEFLOW into UPPAAL models amenable to formal verification. Previous work is extended in this paper by including support for SIMULINK and an integration of Simulink/Stateflow within a same tool-chain. Simulink/Stateflow models are transformed, based on extended ERT constraints in EAST-ADL, into verifiable UPPAAL models with stochastic semantics and integrate the translation with formal statistical analysis techniques: Probabilistic extension of EAST-ADL constraints is defined as a semantics denotation. A set of mapping rules is proposed to facilitate the guarantee of translation. Formal analysis on both functional- and non-functional properties is performed using SIMULINK DESIGN VERIFIER/UPPAAL-SMC. The analysis techniques are validated and demonstrated on the autonomous traffic sign recognition vehicle case study.
SEMar 16, 2018
Formal Analysis of Non-functional Properties for a Cooperative Automotive SystemEun-Young Kang, Li Huang, Dongrui Mu
Modeling and analysis of nonfunctional requirements is crucial in automotive systems. EAST-ADL is an architectural language dedicated to safety-critical automotive system design. We have previously modified EAST-ADL to include energy constraints and transformed energy-aware timed (ET) behaviors modeled in SIMULINK/STATEFLOW into UPPAAL models amenable to formal verification. Previous work is extended in this paper by including support for SIMULINK DESIGN VERIFIER (SDV), i.e., the ET constraints are translated into proof objective models that can be verified using SDV. Furthermore, probabilistic extension of EAST-ADL constraints is defined and the semantics of the extended constraints is translated into verifiable UPPAAL models with stochastic semantics for formal verification. A set of mapping rules are proposed to facilitate the guarantee of translation. Verification & Validation are performed on the extended timing and energy constraints using SDV and UPPAAL-SMC. Our approach is demonstrated on a cooperative automotive system case study.
CVFeb 14, 2018
Web-Scale Responsive Visual Search at BingHoudong Hu, Yan Wang, Linjun Yang et al.
In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing. The system accommodates tens of billions of images in the index, with thousands of features for each image, and can respond in less than 200 ms. In order to overcome the challenges in relevance, latency, and scalability in such large scale of data, we employ a cascaded learning-to-rank framework based on various latest deep learning visual features, and deploy in a distributed heterogeneous computing platform. Quantitative and qualitative experiments show that our system is able to support various applications on Bing website and apps.
STR-ELDec 6, 2016
Recommender Engine for Continuous Time Quantum Monte Carlo MethodsLi Huang, Yi-feng Yang, Lei Wang
Recommender systems play an essential role in the modern business world. They recommend favorable items like books, movies, and search queries to users based on their past preferences. Applying similar ideas and techniques to Monte Carlo simulations of physical systems boosts their efficiency without sacrificing accuracy. Exploiting the quantum to classical mapping inherent in the continuous-time quantum Monte Carlo methods, we construct a classical molecular gas model to reproduce the quantum distributions. We then utilize powerful molecular simulation techniques to propose efficient quantum Monte Carlo updates. The recommender engine approach provides a general way to speed up the quantum impurity solvers.
COMP-PHOct 10, 2016
Accelerate Monte Carlo Simulations with Restricted Boltzmann MachinesLi Huang, Lei Wang
Despite their exceptional flexibility and popularity, the Monte Carlo methods often suffer from slow mixing times for challenging statistical physics problems. We present a general strategy to overcome this difficulty by adopting ideas and techniques from the machine learning community. We fit the unnormalized probability of the physical model to a feedforward neural network and reinterpret the architecture as a restricted Boltzmann machine. Then, exploiting its feature detection ability, we utilize the restricted Boltzmann machine for efficient Monte Carlo updates and to speed up the simulation of the original physical system. We implement these ideas for the Falicov-Kimball model and demonstrate improved acceptance ratio and autocorrelation time near the phase transition point.