HE Jul 10, 2023
Observation of high-energy neutrinos from the Galactic plane
R. Abbasi, M. Ackermann, J. Adams et al.
The origin of high-energy cosmic rays, atomic nuclei that continuously impact Earth's atmosphere, has been a mystery for over a century. Due to deflection in interstellar magnetic fields, cosmic rays from the Milky Way arrive at Earth from random directions. However, near their sources and during propagation, cosmic rays interact with matter and produce high-energy neutrinos. We search for neutrino emission using machine learning techniques applied to ten years of data from the IceCube Neutrino Observatory. We identify neutrino emission from the Galactic plane at the 4.5$σ$ level of significance, by comparing diffuse emission models to a background-only hypothesis. The signal is consistent with modeled diffuse emission from the Galactic plane, but could also arise from a population of unresolved point sources.
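To put the quoted 4.5σ in more familiar terms, the significance can be converted to a tail probability. The sketch below assumes the one-sided Gaussian convention, which the abstract does not state; the function name is illustrative only.

```python
import math

def one_sided_p_value(n_sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

# Significance quoted in the abstract.
p = one_sided_p_value(4.5)
print(f"one-sided p-value for 4.5 sigma: {p:.2e}")
```

Under this convention the background-only hypothesis would produce a fluctuation at least this large only a few times per million trials.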
HEP-EX Sep 7, 2022
Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube
R. Abbasi, M. Ackermann, J. Adams et al.
IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
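The abstract says events are represented as point cloud graphs but does not say how the graph is built. As a minimal sketch, the code below assumes a k-nearest-neighbour rule on sensor positions, which is one common way to turn a point cloud into a graph; the function name, the choice of k, and the hit coordinates are all hypothetical, and node features beyond position are omitted.

```python
import math

def knn_edges(points, k):
    """Return directed edges (i, j) linking each point to its k nearest neighbours."""
    edges = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges

# Hypothetical sensor hits as (x, y, z) positions in metres.
hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 10.0, 10.0)]
edges = knn_edges(hits, k=2)
print(edges)
```

A graph built this way has a fixed out-degree per node regardless of how irregular the sensor layout is, which is one reason such constructions suit non-grid detector geometries.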
CL Oct 1, 2025
Characterizing Model Behavior Under Synthetic Data Training: An Empirical Study Across Scales and Mixing Ratios
Y. Du, G. Wu, G. Tang et al.
Synthetic data generated by large language models has become integral to modern NLP training pipelines, from bootstrapping reasoning capabilities to augmenting instruction-following datasets. While recent work demonstrates successful applications maintaining high external data ratios, systematic understanding of how synthetic data proportion affects model behavior across different scales remains limited. This paper presents a controlled empirical study examining model performance, calibration, and output characteristics when trained on varying synthetic-to-external data ratios. Using the Pythia model suite (410M-12B parameters) across five diverse tasks, we evaluate models after one to three training iterations with synthetic data proportions ranging from 0-50%. Our key findings include: models maintain stable performance with up to 20% synthetic data, but degradation accelerates beyond 30%; larger models (6.9B-12B) show greater robustness to synthetic data than smaller models (410M-1.4B); calibration degradation precedes accuracy loss, providing an early warning signal; and task characteristics matter, with reasoning tasks degrading faster than retrieval tasks under synthetic data training. Importantly, we find that current best practices, such as those employed in STaR and Self-Instruct systems that maintain greater than 80% external data, operate well within safe regimes identified by our experiments. We provide practical guidance for practitioners on synthetic data budgets based on model scale and task requirements, alongside detailed comparison with concurrent work including Shumailov et al.'s model collapse findings.
CL Aug 17, 2025
Cognitive Decision Routing in Large Language Models: When to Think Fast, When to Think Slow
Y. Du, C. Guo, W. Wang et al.
Large Language Models (LLMs) face a fundamental challenge in deciding when to rely on rapid, intuitive responses versus engaging in slower, more deliberate reasoning. Inspired by Daniel Kahneman's dual-process theory and his insights on human cognitive biases, we propose a novel Cognitive Decision Routing (CDR) framework that dynamically determines the appropriate reasoning strategy based on query characteristics. Our approach addresses the current limitations where models either apply uniform reasoning depth or rely on computationally expensive methods for all queries. We introduce a meta-cognitive layer that analyzes query complexity through multiple dimensions: correlation strength between given information and required conclusions, domain boundary crossings, stakeholder multiplicity, and uncertainty levels. Through extensive experiments on diverse reasoning tasks, we demonstrate that CDR achieves superior performance while reducing computational costs by 34% compared to uniform deep reasoning approaches. Our framework shows particular strength in professional judgment tasks, achieving 23% improvement in consistency and 18% better accuracy on expert-level evaluations. This work bridges cognitive science principles with practical AI system design, offering a principled approach to adaptive reasoning in LLMs.
AI Jan 14, 2025
A Driver Advisory System Based on Large Language Model for High-speed Train
Y. C. Luo, J. Xun, W. Wang et al.
With the rapid development of China's high-speed railway, drivers face increasingly significant technical challenges during operations, such as fault handling. Currently, drivers depend on the onboard mechanic when facing technical issues such as traction loss or sensor faults. This dependency can hinder effective operation, and even lead to accidents, while drivers wait for faults to be addressed. To enhance the accuracy and explainability of actions during fault handling, an Intelligent Driver Advisory System (IDAS) framework based on a large language model (LLM), named IDAS-LLM, is introduced. First, the LLM is fine-tuned on a constructed railway knowledge question-and-answer dataset to improve answer accuracy on railway-related questions. Next, the Retrieval-augmented Generation (RAG) architecture is integrated into the system design to enhance the explainability of generated responses. Comparative experiments are conducted using the constructed railway driving knowledge assessment dataset. Results indicate that domain-fine-tuned LLMs show an improvement in answer accuracy by an average of 10%, outperforming some current mainstream LLMs. Additionally, the inclusion of the RAG framework increases the average recall rate of question-and-answer sessions by about 4%. Finally, the fault handling capability of IDAS-LLM is demonstrated through simulations of real operational scenarios, proving that the proposed framework has practical application prospects.
CV Oct 12, 2021
Topic Scene Graph Generation by Attention Distillation from Caption
W. Wang, R. Wang, X. Chen
If an image tells a story, the image caption is the briefest narrator. Generally, a scene graph prefers to be an omniscient generalist, while the image caption is more willing to be a specialist, which outlines the gist. Lots of previous studies have found that a scene graph is not as practical as expected unless it can reduce the trivial contents and noises. In this respect, the image caption is a good tutor. To this end, we let the scene graph borrow the ability from the image caption so that it can be a specialist on the basis of remaining all-around, resulting in the so-called Topic Scene Graph. What an image caption pays attention to is distilled and passed to the scene graph for estimating the importance of partial objects, relationships, and events. Specifically, during the caption generation, the attention about individual objects in each time step is collected, pooled, and assembled to obtain the attention about relationships, which serves as weak supervision for regularizing the estimated importance scores of relationships. In addition, as this attention distillation process provides an opportunity for combining the generation of image caption and scene graph together, we further transform the scene graph into linguistic form with rich and free-form expressions by sharing a single generation model with image caption. Experiments show that attention distillation brings significant improvements in mining important relationships without strong supervision, and the topic scene graph shows great potential in subsequent applications.
LG Aug 18, 2019
Robust DCD-Based Recursive Adaptive Algorithms
Y. Yu, L. Lu, Z. Zheng et al.
The dichotomous coordinate descent (DCD) algorithm has been successfully used for significant reduction in the complexity of recursive least squares (RLS) algorithms. In this work, we generalize the application of the DCD algorithm to RLS adaptive filtering in impulsive noise scenarios and derive a unified update formula. By employing different robust strategies against impulsive noise, we develop novel computationally efficient DCD-based robust recursive algorithms. Furthermore, to equip the proposed algorithms with the ability to track abrupt changes in unknown systems, a simple variable forgetting factor mechanism is also developed. Simulation results for channel identification scenarios in impulsive noise demonstrate the effectiveness of the proposed algorithms.
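For orientation, the sketch below shows the conventional exponentially weighted RLS recursion that this line of work starts from. It is not the DCD-based or robust variant proposed in the paper: it uses the standard matrix update, has no protection against impulsive noise, and keeps the forgetting factor fixed. The function name, the test system `true_w`, and all parameter values are assumptions made for illustration.

```python
import random

def rls_identify(inputs, desired, order, forgetting=0.99, delta=1000.0):
    """Exponentially weighted RLS; returns the final weight estimate."""
    w = [0.0] * order
    # P tracks the inverse of the weighted input autocorrelation matrix.
    P = [[delta if r == c else 0.0 for c in range(order)] for r in range(order)]
    for n in range(order - 1, len(inputs)):
        x = [inputs[n - i] for i in range(order)]  # regressor, newest sample first
        Px = [sum(P[r][c] * x[c] for c in range(order)) for r in range(order)]
        denom = forgetting + sum(x[r] * Px[r] for r in range(order))
        gain = [v / denom for v in Px]
        err = desired[n] - sum(w[i] * x[i] for i in range(order))  # a priori error
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * x^T P) / forgetting; P is symmetric, so x^T P equals Px^T.
        P = [[(P[r][c] - gain[r] * Px[c]) / forgetting for c in range(order)]
             for r in range(order)]
    return w

rng = random.Random(0)
true_w = [0.5, -0.3]  # hypothetical unknown system to identify
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
# Noise-free system output; d[0] is a placeholder that the filter never reads.
d = [0.0] + [true_w[0] * u[n] + true_w[1] * u[n - 1] for n in range(1, len(u))]
w_hat = rls_identify(u, d, order=2)
print(w_hat)
```

The per-sample cost here is dominated by the update of `P`, which grows quadratically with the filter order; that cost is what complexity-reduction approaches such as DCD aim to lower.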
CR Dec 21, 2015
Generating Searchable Public-Key Ciphertexts with Hidden Structures for Fast Keyword Search
P. Xu, Q. Wu, W. Wang et al.
Existing semantically secure public-key searchable encryption schemes take search time linear in the total number of ciphertexts. This makes retrieval from large-scale databases prohibitive. To alleviate this problem, this paper proposes Searchable Public-Key Ciphertexts with Hidden Structures (SPCHS) for keyword search as fast as possible without sacrificing semantic security of the encrypted keywords. In SPCHS, all keyword-searchable ciphertexts are structured by hidden relations, and with the search trapdoor corresponding to a keyword, the minimum information of the relations is disclosed to a search algorithm as the guidance to find all matching ciphertexts efficiently. We construct a simple SPCHS scheme from scratch in which the ciphertexts have a hidden star-like structure. We prove our scheme to be semantically secure based on the decisional bilinear Diffie-Hellman assumption in the Random Oracle (RO) model. The search complexity of our scheme is dependent on the actual number of the ciphertexts containing the queried keyword, rather than the number of all ciphertexts. Finally, we present a generic SPCHS construction from anonymous identity-based encryption and collision-free full-identity malleable Identity-Based Key Encapsulation Mechanism (IBKEM) with anonymity. We illustrate two collision-free full-identity malleable IBKEM instances, which are semantically secure and anonymous, respectively, in the RO and standard models. The latter instance enables us to construct an SPCHS scheme with semantic security in the standard model.
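The complexity claim (search cost tied to the number of matching ciphertexts rather than to the size of the whole store) can be illustrated with a toy index. The sketch below is not the SPCHS scheme: it is symmetric-key rather than public-key, uses no pairings, stores document ids in the clear, and carries no security proof. It only mimics the idea that a per-keyword token reveals a hidden chain of addresses; every name in it is hypothetical.

```python
import hashlib
import hmac

def trapdoor(key: bytes, keyword: str) -> bytes:
    """Per-keyword search token (toy stand-in for a search trapdoor)."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def slot(token: bytes, index: int) -> bytes:
    """Storage address of the index-th entry in a keyword's hidden chain."""
    return hashlib.sha256(token + index.to_bytes(8, "big")).digest()

class ToyIndex:
    def __init__(self):
        self.store = {}  # address -> document id; addresses look unrelated
        self.steps = 0   # lookups performed by the most recent search

    def add(self, key: bytes, keyword: str, doc_id: str, position: int):
        """Place doc_id at the given position of the keyword's chain."""
        self.store[slot(trapdoor(key, keyword), position)] = doc_id

    def search(self, token: bytes):
        """Walk the chain revealed by the token until an empty slot is hit."""
        results, i, self.steps = [], 0, 0
        while True:
            self.steps += 1
            doc = self.store.get(slot(token, i))
            if doc is None:
                return results
            results.append(doc)
            i += 1

key = b"demo-key"
idx = ToyIndex()
idx.add(key, "alpha", "d1", 0)
idx.add(key, "alpha", "d2", 1)
for n in range(100):  # unrelated entries that a search for "alpha" never touches
    idx.add(key, "beta", f"b{n}", n)
found = idx.search(trapdoor(key, "alpha"))
print(found, idx.steps)
```

In this toy, a search performs one lookup per match plus one terminating lookup, however many entries other keywords contribute to the store.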