Hongjun An

AI
h-index16
12papers
135citations
Novelty61%
AI Score55

12 Papers

AIAug 1, 2024
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context

Hongjun An, Yifan Chen, Zhe Sun et al.

Current large language models (LLMs) primarily utilize next-token prediction method for inference, which significantly impedes their processing speed. In this paper, we introduce a novel inference methodology termed next-sentence prediction, aiming at enhancing the inference efficiency of LLMs. We present Sentence Variational Autoencoder (SentenceVAE), which includes a Sentence Encoder to compress multiple tokens in a sentence into a single token, and a Sentence Decoder to reconstruct it. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that employ a sentence-by-sentence inference method. In addition, the SentenceVAE module of SLLMs can maintain the integrity of the original semantic content by segmenting the context into sentences, thereby improving accuracy while boosting inference speed. Moreover, compared to previous LLMs, SLLMs process fewer tokens over equivalent context length, significantly reducing memory demands for self-attention computation and facilitating the handling of longer context. Extensive experiments on Wanjuan dataset have revealed that the proposed method can accelerate inference speed by 204~365%, reduce perplexity (PPL) to 46~75% of its original metric, and decrease memory overhead by 86~91% for the equivalent context length, compared to previous token-by-token methods.

CVApr 14, 2024Code
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging

Xuelong Li, Hongjun An, Haofei Zhao et al.

In this paper, we introduce StreakNet-Arch, a real-time, end-to-end binary-classification framework based on our self-developed Underwater Carrier LiDAR-Radar (UCLR) that embeds Self-Attention and our novel Double Branch Cross Attention (DBC-Attention) to enhance scatter suppression. Under controlled water tank validation conditions, StreakNet-Arch with Self-Attention or DBC-Attention outperforms traditional bandpass filtering and achieves higher $F_1$ scores than learning-based MP networks and CNNs at comparable model size and complexity. Real-time benchmarks on an NVIDIA RTX 3060 show a constant Average Imaging Time (54 to 84 ms) regardless of frame count, versus a linear increase (58 to 1,257 ms) for conventional methods. To facilitate further research, we contribute a publicly available streak-tube camera image dataset contains 2,695,168 real-world underwater 3D point cloud data. More importantly, we validate our UCLR system in a South China Sea trial, reaching an error of 46mm for 3D target at 1,000 m depth and 20 m range. Source code and data are available at https://github.com/BestAnHongjun/StreakNet .

8.9AIApr 16
Sequence Search: Automated Sequence Design using Neural Architecture Search

Rokgi Hong, Hongjun An, Sooyeon Ji et al.

Developing an MR sequence is challenging and remains largely constrained by human intuition. Recently, AI-driven approaches have been proposed; however, most require an initial sequence for parameter optimization or extensive training datasets, limiting their general applicability. In this study, we propose "Sequence Search," an automated sequence design framework based on neural architecture search. The method takes tissue properties, imaging parameters, and design objectives as inputs and generates pulse sequences satisfying the design objectives, without requiring prior knowledge of conventional sequence structures. Sequence Search iteratively generates candidate sequences through neural architecture search and optimizes them via a differentiable Bloch simulator and objective-specific loss functions using gradient-based learning. The framework successfully replicated conventional spin-echo, T2-weighted spin-echo, and inversion recovery sequences. Less intuitive solutions were also discovered, such as three-RF spin-echo-like sequences with reduced RF energy and refocusing phases deviating from the conventional Hahn-echo. This work establishes a generalizable framework for automated MR sequence design, highlighting the potential to explore configurations beyond conventional designs based on human intuition.

LGDec 29, 2025
Theoretical Foundations of Scaling Law in Familial Models

Huan Song, Qingfei Zhao, Ting Long et al.

Neural scaling laws have become foundational for optimizing large language model (LLM) training, yet they typically assume a single dense model output. This limitation effectively overlooks "Familial models, a transformative paradigm essential for realizing ubiquitous intelligence across heterogeneous device-edge-cloud hierarchies. Transcending static architectures, familial models integrate early exits with relay-style inference to spawn G deployable sub-models from a single shared backbone. In this work, we theoretically and empirically extend the scaling law to capture this "one-run, many-models" paradigm by introducing Granularity (G) as a fundamental scaling variable alongside model size (N) and training tokens (D). To rigorously quantify this relationship, we propose a unified functional form L(N, D, G) and parameterize it using large-scale empirical runs. Specifically, we employ a rigorous IsoFLOP experimental design to strictly isolate architectural impact from computational scale. Across fixed budgets, we systematically sweep model sizes (N) and granularities (G) while dynamically adjusting tokens (D). This approach effectively decouples the marginal cost of granularity from the benefits of scale, ensuring high-fidelity parameterization of our unified scaling law. Our results reveal that the granularity penalty follows a multiplicative power law with an extremely small exponent. Theoretically, this bridges fixed-compute training with dynamic architectures. Practically, it validates the "train once, deploy many" paradigm, demonstrating that deployment flexibility is achievable without compromising the compute-optimality of dense baselines.

53.1AIMar 23
Silicon Bureaucracy and AI Test-Oriented Education: Contamination Sensitivity and Score Confidence in LLM Benchmarks

Yiliang Song, Hongjun An, Jiangan Chen et al.

Public benchmarks increasingly govern how large language models (LLMs) are ranked, selected, and deployed. We frame this benchmark-centered regime as Silicon Bureaucracy and AI Test-Oriented Education, and argue that it rests on a fragile assumption: that benchmark scores directly reflect genuine generalization. In practice, however, such scores may conflate exam-oriented competence with principled capability, especially when contamination and semantic leakage are difficult to exclude from modern training pipelines. We therefore propose an audit framework for analyzing contamination sensitivity and score confidence in LLM benchmarks. Using a router-worker setup, we compare a clean-control condition with noisy conditions in which benchmark problems are systematically deleted, rewritten, and perturbed before being passed downstream. For a genuinely clean benchmark, noisy conditions should not consistently outperform the clean-control baseline. Yet across multiple models, we find widespread but heterogeneous above-baseline gains under noisy conditions, indicating that benchmark-related cues may be reassembled and can reactivate contamination-related memory. These results suggest that similar benchmark scores may carry substantially different levels of confidence. Rather than rejecting benchmarks altogether, we argue that benchmark-based evaluation should be supplemented with explicit audits of contamination sensitivity and score confidence.

CLFeb 26
Ruyi2 Technical Report

Huan Song, Shuyu Tian, Junyi Hao et al.

Large Language Models (LLMs) face significant challenges regarding deployment costs and latency, necessitating adaptive computing strategies. Building upon the AI Flow framework, we introduce Ruyi2 as an evolution of our adaptive model series designed for efficient variable-depth computation. While early-exit architectures offer a viable efficiency-performance balance, the Ruyi model and existing methods often struggle with optimization complexity and compatibility with large-scale distributed training. To bridge this gap, Ruyi2 introduces a stable "Familial Model" based on Megatron-LM. By using 3D parallel training, it achieves a 2-3 times speedup over Ruyi, while performing comparably to same-sized Qwen3 models. These results confirm that family-based parameter sharing is a highly effective strategy, establishing a new "Train Once, Deploy Many" paradigm and providing a key reference for balancing architectural efficiency with high-performance capabilities.

AIJan 23
CreditAudit: 2$^\text{nd}$ Dimension for LLM Evaluation and Selection

Yiliang Song, Hongjun An, Jiangong Xiao et al.

Leaderboard scores on public benchmarks have been steadily rising and converging, with many frontier language models now separated by only marginal differences. However, these scores often fail to match users' day to day experience, because system prompts, output protocols, and interaction modes evolve under routine iteration, and in agentic multi step pipelines small protocol shifts can trigger disproportionate failures, leaving practitioners uncertain about which model to deploy. We propose CreditAudit, a deployment oriented credit audit framework that evaluates models under a family of semantically aligned and non adversarial system prompt templates across multiple benchmarks, reporting mean ability as average performance across scenarios and scenario induced fluctuation sigma as a stability risk signal, and further mapping volatility into interpretable credit grades from AAA to BBB via cross model quantiles with diagnostics that mitigate template difficulty drift. Controlled experiments on GPQA, TruthfulQA, and MMLU Pro show that models with similar mean ability can exhibit substantially different fluctuation, and stability risk can overturn prioritization decisions in agentic or high failure cost regimes. By providing a 2D and grade based language for regime specific selection, CreditAudit supports tiered deployment and more disciplined allocation of testing and monitoring effort, enabling more objective and trustworthy model evaluation for real world use.

AIJun 14, 2025
AI Flow: Perspectives, Scenarios, and Approaches

Hongjun An, Wenhan Hu, Sida Huang et al.

Pioneered by the foundational information theory by Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models that are reshaping industries and redefining human-machine collaboration. However, the realization of ubiquitous intelligence faces considerable challenges due to substantial resource consumption in large models and high communication bandwidth demands. To address these challenges, AI Flow has been introduced as a multidisciplinary framework that integrates cutting-edge IT and CT advancements, with a particular emphasis on the following three key points. First, device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters to optimize scalability and efficiency for low-latency model inference. Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features, enabling effective collaboration and the flexibility to adapt to varying resource constraints and dynamic scenarios. Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow. By leveraging communication networks to enhance connectivity, the collaboration among AI models across heterogeneous nodes achieves emergent intelligence that surpasses the capability of any single model. The innovations of AI Flow provide enhanced intelligence, timely responsiveness, and ubiquitous accessibility to AI services, paving the way for the tighter fusion of AI techniques and communication systems.

IVMar 10, 2025
Large model enhanced computational ghost imaging

Yifan Chen, Hongjun An, Zhe Sun et al.

Ghost imaging (GI) achieves 2D image reconstruction through high-order correlation of 1D bucket signals and 2D light field information, particularly demonstrating enhanced detection sensitivity and high-quality image reconstruction via efficient photon collection in scattering media. Recent investigations have established that deep learning (DL) can substantially enhance the ghost imaging reconstruction quality. Furthermore, with the emergence of large models like SDXL, GPT-4, etc., the constraints of conventional DL in parameters and architecture have been transcended, enabling models to comprehensively explore relationships among all distinct positions within feature sequences. This paradigm shift has significantly advanced the capability of DL in restoring severely degraded and low-resolution imagery, making it particularly advantageous for noise-robust image reconstruction in GI applications. In this paper, we propose the first large imaging model with 1.4 billion parameters that incorporates the physical principles of GI (GILM). The proposed GILM implements a skip connection mechanism to mitigate gradient explosion challenges inherent in deep architectures, ensuring sufficient parametric capacity to capture intricate correlations among object single-pixel measurements. Moreover, GILM leverages multi-head attention mechanism to learn spatial dependencies across pixel points during image reconstruction, facilitating the extraction of comprehensive object information for subsequent reconstruction. We validated the effectiveness of GILM through a series of experiments, including simulated object imaging, imaging objects in free space, and imaging object located 52 meters away in underwater environment. The experimental results show that GILM effectively analyzes the fluctuation trends of the collected signals, thereby optimizing the recovery of the object's image from the acquired data.

LGNov 1, 2024
Physics in Next-token Prediction

Hongjun An, Yiliang Song, Xuelong Li

We discovered the underlying physics in Next-token Prediction (NTP). We identified the law of information conservation within NTP and proposed the First Law of Information Capacity (IC-1), demonstrating that the essence of intelligence emergence in auto-regressive models is fundamentally a process of information transfer. We also introduced Landauer's Principle into NTP, formulating the Second Law of Information Capacity (IC-2), which establishes the relationship between auto-regressive model training and energy consumption. Additionally, we presented several corollaries, which hold practical significance for production practices. Finally, we demonstrate the consistency between the Law of Information Capacity and the Scaling Law for Neural Language Models, the Knowledge Capacity Scaling Laws, and the Scaling Laws for Precision.

IVMay 7, 2021
Deep reinforcement learning-designed radiofrequency waveform in MRI

Dongmyung Shin, Younghoon Kim, Chungseok Oh et al.

Carefully engineered radiofrequency (RF) pulses play a key role in a number of systems such as mobile phone, radar, and magnetic resonance imaging. The design of an RF waveform, however, is often posed as an inverse problem with no general solution. As a result, various design methods each with a specific purpose have been developed based on the intuition of human experts. In this work, we propose an artificial intelligence (AI)-powered RF pulse design framework, DeepRF, which utilizes the self-learning characteristics of deep reinforcement learning to generate a novel RF pulse. The effectiveness of DeepRF is demonstrated using four types of RF pulses that are commonly used. The DeepRF-designed pulses successfully satisfy the design criteria while reporting reduced energy. Analyses demonstrate the pulses utilize new mechanisms of magnetization manipulation, suggesting the potentials of DeepRF in discovering unseen design dimensions beyond human intuition. This work may lay the foundation for an emerging field of AI-driven RF waveform design.

IVFeb 4, 2021
DIFFnet: Diffusion parameter mapping network generalized for input diffusion gradient schemes and bvalues

Juhung Park, Woojin Jung, Eun-Jung Choi et al.

In MRI, deep neural networks have been proposed to reconstruct diffusion model parameters. However, the inputs of the networks were designed for a specific diffusion gradient scheme (i.e., diffusion gradient directions and numbers) and a specific b-value that are the same as the training data. In this study, a new deep neural network, referred to as DIFFnet, is developed to function as a generalized reconstruction tool of the diffusion-weighted signals for various gradient schemes and b-values. For generalization, diffusion signals are normalized in a q-space and then projected and quantized, producing a matrix (Qmatrix) as an input for the network. To demonstrate the validity of this approach, DIFFnet is evaluated for diffusion tensor imaging (DIFFnetDTI) and for neurite orientation dispersion and density imaging (DIFFnetNODDI). In each model, two datasets with different gradient schemes and b-values are tested. The results demonstrate accurate reconstruction of the diffusion parameters at substantially reduced processing time (approximately 8.7 times and 2240 times faster processing time than conventional methods in DTI and NODDI, respectively; less than 4% mean normalized root-mean-square errors (NRMSE) in DTI and less than 8% in NODDI). The generalization capability of the networks was further validated using reduced numbers of diffusion signals from the datasets. Different from previously proposed deep neural networks, DIFFnet does not require any specific gradient scheme and b-value for its input. As a result, it can be adopted as an online reconstruction tool for various complex diffusion imaging.