CL · Sep 26, 2023
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen et al.
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors between 3.6 and 5.4.
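As a rough illustration of the low-rank decomposition idea in this abstract, the sketch below keeps a weight matrix W frozen and adds a trainable product B·A of rank r. The matrix sizes, the zero initialization of B, and the helper names are assumptions for illustration (the zero initialization follows the common LoRA convention); none of them are taken from the paper, and the 0.08% figure depends on the authors' own configuration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, scale=1.0):
    """Return W + scale * (B @ A); W stays frozen, only B and A would be trained."""
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_fraction(d, k, r):
    """Adapter parameters r*(d+k) relative to the d*k parameters of the frozen matrix."""
    return r * (d + k) / (d * k)

# Hypothetical sizes, chosen only to keep the example small.
d, k, r = 6, 4, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
b = [[0.0] * r for _ in range(d)]   # B starts at zero, so the update starts at zero
a = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]

w_eff = lora_effective_weight(w, b, a)
```

With B at zero the adapted weight equals the pretrained one, so training starts from the pretrained behavior; for a single 768×768 matrix and r = 8, `trainable_fraction` gives about 2% of that matrix's parameters.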
CV · Feb 9 · Code
MOVA: Towards Scalable and Synchronized Video-Audio Generation
SII-OpenMOSS Team, Donghua Yu, Mingshu Chen et al.
Audio is indispensable for real-world video, yet generation models have largely overlooked audio components. Current approaches to producing audio-visual content often rely on cascaded pipelines, which increase cost, accumulate errors, and degrade overall quality. While systems such as Veo 3 and Sora 2 emphasize the value of simultaneous generation, joint multimodal modeling introduces unique challenges in architecture, data, and training. Moreover, the closed-source nature of existing systems limits progress in the field. In this work, we introduce MOVA (MOSS Video and Audio), an open-source model capable of generating high-quality, synchronized audio-visual content, including realistic lip-synced speech, environment-aware sound effects, and content-aligned music. MOVA employs a Mixture-of-Experts (MoE) architecture, with a total of 32B parameters, of which 18B are active during inference. It supports the IT2VA (Image-Text to Video-Audio) generation task. By releasing the model weights and code, we aim to advance research and foster a vibrant community of creators. The released codebase features comprehensive support for efficient inference, LoRA fine-tuning, and prompt enhancement.
IR · Oct 27, 2022
AutoAttention: Automatic Field Pair Selection for Attention in User Behavior Modeling
Zuowu Zheng, Xiaofeng Gao, Junwei Pan et al.
In click-through rate (CTR) prediction models, a user's interest is usually represented as a fixed-length vector based on her historical behaviors. Recently, several methods have been proposed to learn an attentive weight for each user behavior and conduct weighted sum pooling. However, these methods only manually select several fields from the target item side as the query to interact with the behaviors, neglecting the other target item fields, as well as user and context fields. Directly including all these fields in the attention may introduce noise and deteriorate the performance. In this paper, we propose a novel model named AutoAttention, which includes all item/user/context side fields as the query, and assigns a learnable weight for each field pair between behavior fields and query fields. Pruning these field pairs via the learnable weights leads to automatic field pair selection, so as to identify and remove noisy field pairs. Despite including more fields, the computation cost of AutoAttention remains low thanks to a simple attention function and field pair selection. Extensive experiments on the public dataset and Tencent's production dataset demonstrate the effectiveness of the proposed approach.
CL · Dec 23, 2025 · Code
Multi-hop Reasoning via Early Knowledge Alignment
Yuxin Wang, Shicheng Fang, Bo Wang et al.
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for Large Language Models (LLMs) to address knowledge-intensive queries requiring domain-specific or up-to-date information. To handle complex multi-hop questions that are challenging for single-step retrieval, iterative RAG approaches incorporating reinforcement learning have been proposed. However, existing iterative RAG systems typically plan to decompose questions without leveraging information about the available retrieval corpus, leading to inefficient retrieval and reasoning chains that cascade into suboptimal performance. In this paper, we introduce Early Knowledge Alignment (EKA), a simple but effective module that aligns LLMs with the retrieval set before planning in iterative RAG systems, using contextually relevant retrieved knowledge. Extensive experiments on six standard RAG datasets demonstrate that by establishing a stronger reasoning foundation, EKA significantly improves retrieval precision, reduces cascading errors, and enhances both performance and efficiency. Our analysis from an entropy perspective demonstrates that incorporating early knowledge reduces unnecessary exploration during the reasoning process, enabling the model to focus more effectively on relevant information subsets. Moreover, EKA proves effective as a versatile, training-free inference strategy that scales seamlessly to large models. Generalization tests across diverse datasets and retrieval corpora confirm the robustness of our approach. Overall, EKA advances the state-of-the-art in iterative RAG systems while illuminating the critical interplay between structured reasoning and efficient exploration in reinforcement learning-augmented frameworks. The code is released at https://github.com/yxzwang/EarlyKnowledgeAlignment.
PL · Mar 8, 2024 · Code
LLM4Decompile: Decompiling Binary Code with Large Language Models
Hanzhuo Tan, Qi Luo, Jing Li et al.
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results. Our code, dataset, and models are released at https://github.com/albertan017/LLM4Decompile
CL · Nov 1, 2025
Zero-RAG: Towards Retrieval-Augmented Generation with Zero Redundant Knowledge
Qi Luo, Xiaonan Li, Junqi Dai et al.
Retrieval-Augmented Generation (RAG), which usually uses a large external corpus to supplement the knowledge of Large Language Models (LLMs), has shown remarkable results in addressing LLM hallucinations. However, with the development of LLMs, the internal knowledge of LLMs has expanded significantly, thus causing significant knowledge redundancy between the external corpus and LLMs. On the one hand, the indexing cost of dense retrieval is highly related to the corpus size, and thus significant redundant knowledge intensifies the dense retrieval's workload. On the other hand, the redundant knowledge in the external corpus is not helpful to LLMs, and our exploratory analysis shows that it instead hurts the RAG performance on those questions which the LLM can answer by itself. To address these issues, we propose Zero-RAG. Specifically, we first propose the Mastery-Score metric to identify redundant knowledge in the RAG corpus and prune it. After pruning, answers to "mastered" questions rely primarily on internal knowledge of the LLM. To better harness the internal capacity, we propose Query Router and Noise-Tolerant Tuning to avoid the distraction of irrelevant documents and thus further improve the LLM's utilization of internal knowledge with the pruned corpus. Experimental results show that Zero-RAG prunes the Wikipedia corpus by 30% and accelerates the retrieval stage by 22%, without compromising RAG's performance.
CL · Oct 31, 2025
MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval
Qi Luo, Xiaonan Li, Yuxin Wang et al.
Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data, resulting in factual inaccuracies and weak adaptability to new information. Retrieval-Augmented Generation (RAG) addresses this issue by grounding LLMs in external knowledge; however, the effectiveness of RAG critically depends on whether the model can adequately access relevant information. Existing RAG systems rely on a single retriever with fixed top-k selection, restricting access to a narrow and static subset of the corpus. As a result, this single-retriever paradigm has become the primary bottleneck for comprehensive external information acquisition, especially in tasks requiring corpus-level reasoning. To overcome this limitation, we propose MARAG-R1, a reinforcement-learned multi-tool RAG framework that enables LLMs to dynamically coordinate multiple retrieval mechanisms for broader and more precise information access. MARAG-R1 equips the model with four retrieval tools -- semantic search, keyword search, filtering, and aggregation -- and learns both how and when to use them through a two-stage training process: supervised fine-tuning followed by reinforcement learning. This design allows the model to interleave reasoning and retrieval, progressively gathering sufficient evidence for corpus-level synthesis. Experiments on GlobalQA, HotpotQA, and 2WikiMultiHopQA demonstrate that MARAG-R1 substantially outperforms strong baselines and achieves new state-of-the-art results in corpus-level reasoning tasks.
CL · Oct 30, 2025
Towards Global Retrieval Augmented Generation: A Benchmark for Corpus-Level Reasoning
Qi Luo, Xiaonan Li, Tingshuo Fan et al.
Retrieval-augmented generation (RAG) has emerged as a leading approach to reducing hallucinations in large language models (LLMs). Current RAG evaluation benchmarks primarily focus on what we call local RAG: retrieving relevant chunks from a small subset of documents to answer queries that require only localized understanding within specific text chunks. However, many real-world applications require a fundamentally different capability -- global RAG -- which involves aggregating and analyzing information across entire document collections to derive corpus-level insights (for example, "What are the top 10 most cited papers in 2023?"). In this paper, we introduce GlobalQA -- the first benchmark specifically designed to evaluate global RAG capabilities, covering four core task types: counting, extremum queries, sorting, and top-k extraction. Through systematic evaluation across different models and baselines, we find that existing RAG methods perform poorly on global tasks, with the strongest baseline achieving only 1.51 F1 score. To address these challenges, we propose GlobalRAG, a multi-tool collaborative framework that preserves structural coherence through chunk-level retrieval, incorporates LLM-driven intelligent filters to eliminate noisy documents, and integrates aggregation modules for precise symbolic computation. On the Qwen2.5-14B model, GlobalRAG achieves 6.63 F1 compared to the strongest baseline's 1.51 F1, validating the effectiveness of our method.
CR · Mar 2
DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern
Xiaoyi Pang, Xuanyi Hao, Pengyu Liu et al.
Recent intelligent systems integrate powerful Large Language Models (LLMs) through APIs, but their trustworthiness may be critically undermined by targeted attacks like backdoor and prompt injection attacks, which secretly force LLMs to generate specific malicious sequences. Existing defensive approaches for such threats typically rely on high access rights, impose prohibitive costs, and hinder normal inference, rendering them impractical for real-world scenarios. To solve these limitations, we introduce DualSentinel, a lightweight and unified defense framework that can accurately and promptly detect the activation of targeted attacks alongside the LLM generation process. We first identify a characteristic of compromised LLMs, termed Entropy Lull: when a targeted attack successfully hijacks the generation process, the LLM exhibits a distinct period of abnormally low and stable token probability entropy, indicating it is following a fixed path rather than making creative choices. DualSentinel leverages this pattern by developing an innovative dual-check approach. It first employs a magnitude and trend-aware monitoring method to proactively and sensitively flag an entropy lull pattern at runtime. Upon such flagging, it triggers a lightweight yet powerful secondary verification based on task-flipping. An attack is confirmed only if the entropy lull pattern persists across both the original and the flipped task, proving that the LLM's output is coercively controlled. Extensive evaluations show that DualSentinel is both highly effective (superior detection accuracy with near-zero false positives) and remarkably efficient (negligible additional cost), offering a truly practical path toward securing deployed LLMs. The source code can be accessed at https://doi.org/10.5281/zenodo.18479273.
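The first-stage check described in this abstract, flagging a stretch of abnormally low and stable token-probability entropy, might be sketched as follows. This is only an illustration: it assumes per-step next-token probability distributions are available, the window length and both thresholds are hypothetical, and the task-flipping verification stage is not modeled.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def has_entropy_lull(entropies, window=4, max_level=0.5, max_spread=0.1):
    """Return True if any run of `window` consecutive steps is both low and stable.

    `max_level` bounds the entropy magnitude and `max_spread` bounds its variation
    inside the window; both are hypothetical thresholds, not values from the paper.
    """
    for start in range(len(entropies) - window + 1):
        chunk = entropies[start:start + window]
        if max(chunk) <= max_level and max(chunk) - min(chunk) <= max_spread:
            return True
    return False

# Toy distributions over a four-token vocabulary.
uniform = [0.25, 0.25, 0.25, 0.25]   # the model is genuinely undecided
peaked = [0.97, 0.01, 0.01, 0.01]    # the model follows an almost fixed path

normal_trace = [shannon_entropy(uniform)] * 6
hijacked_trace = [shannon_entropy(uniform)] * 2 + [shannon_entropy(peaked)] * 4

print(has_entropy_lull(normal_trace), has_entropy_lull(hijacked_trace))
```

In this toy trace only the second sequence contains four consecutive low, nearly constant entropy values, so only it is flagged; a real monitor would also need the trend-aware logic and secondary verification the abstract describes.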
AI · Jun 5, 2023
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis
Dengfeng Ke, Yayue Deng, Yukang Jia et al.
Regressive Text-to-Speech (TTS) systems utilize an attention mechanism to generate alignment between text and acoustic feature sequences. Alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and rhythm via duration control. However, current attention algorithms used in speech synthesis cannot control rhythm using external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate of nearly 0.6%, compared with 11.8% for the baseline system. Moreover, nearly 60% of subjects prefer the speech synthesized with RC-Attention to that with Forward Attention, because the former has more natural rhythm.
LG · Mar 20
Graph-Aware Text-Only Backdoor Poisoning for Text-Attributed Graphs
Qi Luo, Minghui Xu, Dongxiao Yu et al.
Many learning systems now use graph data in which each node also contains text, such as papers with abstracts or users with posts. Because these texts often come from open platforms, an attacker may be able to quietly poison a small part of the training data and later make the model produce wrong predictions on demand. This paper studies that risk in a realistic setting where the attacker edits only node text and does not change the graph structure. We propose TAGBD, a text-only backdoor attack for text-attributed graphs. TAGBD first finds training nodes that are easier to influence, then generates natural-looking trigger text with the help of a shadow graph model, and finally injects the trigger by either replacing the original text or appending a short phrase. Experiments on three benchmark datasets show that the attack is highly effective, transfers across different graph models, and remains strong under common defenses. These results demonstrate that text alone is a practical attack channel in graph learning systems and suggest that future defenses should inspect both graph links and node content.
RO · Sep 23, 2020 · Code
TDR-OBCA: A Reliable Planner for Autonomous Driving in Free-Space Environment
Runxin He, Jinyun Zhou, Shu Jiang et al.
This paper presents an optimization-based collision avoidance trajectory generation method for autonomous driving in free-space environments, with enhanced robustness, driving comfort and efficiency. Starting from the hybrid optimization-based framework, we introduce two warm start methods, temporal and dual variable warm starts, to improve the efficiency. We also reformulate the problem to improve the robustness and efficiency. We name this new algorithm TDR-OBCA. With these changes, compared with the original hybrid optimization, we achieve a 96.67% decrease in failure rate with respect to initial conditions, a 13.53% increase in driving comfort, and a 3.33% to 44.82% increase in planner efficiency as the number of obstacles scales. We validate our results in hundreds of simulation scenarios and hundreds of hours of public road tests in both the U.S. and China. Our source code is available at https://github.com/ApolloAuto/apollo.
LG · Apr 24
Utility-Aware Data Pricing: Token-Level Quality and Empirical Training Gain for LLMs
Minghui Xu, Qi Luo, Kun Li
Traditional data valuation methods based on "row-count × quality coefficient" paradigms fail to capture the nuanced, nonlinear contributions that data makes to Large Language Model (LLM) capabilities. This paper presents a dynamic data valuation framework that transitions from static accounting to utility-based pricing. Our approach operates on three layers: (1) token-level information density metrics using Shannon entropy and Data Quality Scores; (2) empirical training gain measurement through influence functions, proxy model strategies, and Data Shapley values; and (3) cryptographic verifiability through hash-based commitments, Merkle trees, and a tamper-evident training ledger. We provide comprehensive experimental validation on three real domains (instruction following, mathematical reasoning, and code summarization), demonstrating that proxy-based empirical gain achieves near-perfect ranking alignment with realized utility, substantially outperforming row-count and token-count baselines. This framework enables a fair Data-as-a-Service economy where high-reasoning data is priced according to its actual contribution to model intelligence, while providing the transparency and auditability necessary for trustworthy data markets.
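To make the third layer (hash-based commitments and Merkle trees) more concrete, here is a minimal sketch of committing to a list of data records with a Merkle root. The hash function (SHA-256), the leaf and node prefixes, and the rule for odd-sized levels are choices made for this sketch and are not taken from the paper.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    """Raw SHA-256 digest of a byte string."""
    return hashlib.sha256(data).digest()

def merkle_root(records):
    """Compute a hex Merkle root over a non-empty list of byte strings.

    Leaves and internal nodes are hashed with distinct prefixes, and an
    unpaired node at the end of a level is promoted unchanged; both are
    conventions assumed for this sketch.
    """
    if not records:
        raise ValueError("need at least one record")
    level = [sha256(b"\x00" + record) for record in records]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(sha256(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
    return level[0].hex()

# Hypothetical training records; changing any one of them changes the root.
batch = [b"example record 1", b"example record 2", b"example record 3"]
commitment = merkle_root(batch)
tampered = merkle_root([b"example record 1", b"EDITED", b"example record 3"])
print(commitment != tampered)
```

Publishing only the root lets a data seller commit to a dataset without revealing it, and any later edit to a record yields a different root, which is the tamper-evidence property such a ledger relies on.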
PL · Mar 7, 2025
Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?
Qingyuan Liang, Zhao Zhang, Zeyu Sun et al. · pku
Grammar serves as a cornerstone in programming languages and software engineering, providing frameworks to define the syntactic space and program structure. Existing research demonstrates the effectiveness of grammar-based code representations in small-scale models, showing their ability to reduce syntax errors and enhance performance. However, as language models scale to the billion level or beyond, syntax-level errors become rare, making it unclear whether grammar information still provides performance benefits. To explore this, we develop a series of billion-scale GrammarCoder models, incorporating grammar rules in the code generation process. Experiments on HumanEval (+) and MBPP (+) demonstrate a notable improvement in code generation accuracy. Further analysis shows that grammar-based representations enhance LLMs' ability to discern subtle code differences, reducing semantic errors caused by minor variations. These findings suggest that grammar-based code representations remain valuable even in billion-scale models, not only by maintaining syntax correctness but also by improving semantic differentiation.
IV · Mar 31, 2024
MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with its Application for Colonic Polyp Image Segmentation
Chen Peng, Zhiqin Qian, Kunyu Wang et al.
Biomedical image segmentation is a very important part of disease diagnosis. The term "colonic polyps" refers to polypoid lesions that occur on the surface of the colonic mucosa within the intestinal lumen. In clinical practice, early detection of polyps is conducted through colonoscopy examinations and biomedical image processing. Therefore, accurate polyp image segmentation is of great significance in colonoscopy examinations. Convolutional Neural Network (CNN) is a common automatic segmentation method, but its main disadvantage is the long training time. Transformer utilizes a self-attention mechanism, which essentially assigns different importance weights to each piece of information, thus achieving high computational efficiency during segmentation. However, a potential drawback is the risk of information loss. In the study reported in this paper, based on the well-known hybridization principle, we proposed a method to combine CNN and Transformer to retain the strengths of both, and we applied this method to build a system called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment to compare MugenNet with other CNN models on five publicly available datasets. An ablation experiment on MugenNet was conducted as well. The experimental results show that MugenNet achieves significantly higher processing speed and accuracy compared with CNN alone. The general implication of our work is a method to optimally combine two complementary methods of machine learning.
SD · Sep 30, 2021
Emergency Vehicles Audio Detection and Localization in Autonomous Driving
Hongyi Sun, Xinyi Liu, Kecheng Xu et al.
Emergency vehicles in service have right-of-way over all other vehicles. Hence, all other vehicles are supposed to take proper actions to yield to emergency vehicles with active sirens. As this task requires the cooperation between ears and eyes for human drivers, it also needs audio detection as a supplement to vision-based algorithms for fully autonomous driving vehicles. In urban driving scenarios, we need to know both the existence of emergency vehicles and their positions relative to us to decide the proper actions. We present a novel system that spans from collecting real-world siren data to deploying models, using only two cost-efficient microphones. We are able to achieve promising performance for each task separately, especially within the crucial 10m to 50m distance range to react (the size of our ego vehicle is around 5m in length and 2m in width). Within that range, the recall rate for determining the existence of sirens is 99.16%, the median and mean absolute angle errors are 9.64° and 19.18° respectively, and the median and mean absolute distance errors are 9.30m and 10.58m respectively. We also benchmark various machine learning approaches that can determine the siren existence and sound source localization, which includes direction and distance, simultaneously within 50ms of latency.
RO · Nov 9, 2020
A Learning-Based Tune-Free Control Framework for Large Scale Autonomous Driving System Deployment
Yu Wang, Shu Jiang, Weiman Lin et al.
This paper presents the design of a tune-free (human-out-of-the-loop parameter tuning) control framework, aiming at accelerating large-scale autonomous driving system deployment on various vehicles and driving environments. The framework consists of three machine-learning-based procedures, which jointly automate the control parameter tuning for autonomous driving, including: a learning-based dynamic modeling procedure, to enable the control-in-the-loop simulation with highly accurate vehicle dynamics for parameter tuning; a learning-based open-loop mapping procedure, to solve the feedforward control parameter tuning; and more significantly, a Bayesian-optimization-based closed-loop parameter tuning procedure, to automatically tune feedback control (PID, LQR, MRAC, MPC, etc.) parameters in a simulation environment. The paper shows an improvement in control performance with a significant increase in parameter tuning efficiency, in both simulation and road tests. This framework has been validated on different vehicles in the US and China.
RO · Nov 1, 2020
DRF: A Framework for High-Accuracy Autonomous Driving Vehicle Modeling
Shu Jiang, Yu Wang, Longtao Lin et al.
An accurate vehicle dynamic model is the key to bridging the gap between simulation and real road tests in autonomous driving. In this paper, we present a Dynamic model-Residual correction model Framework (DRF) for vehicle dynamic modeling. On top of any existing open-loop dynamic model, this framework builds a Residual Correction Model (RCM) by integrating deep Neural Networks (NN) with a Sparse Variational Gaussian Process (SVGP) model. RCM takes a sequence of vehicle control commands and dynamic status for a certain time duration as modeling inputs, extracts underlying context from this sequence with deep encoder networks, and predicts open-loop dynamic model prediction errors. Five vehicle dynamic models are derived from DRF via encoder variation. Our contribution is consolidated by experiments evaluating absolute trajectory error and the similarity between DRF outputs and the ground truth. Compared to classic rule-based and learning-based vehicle dynamic models, DRF achieves a 74.12% to 85.02% drop in absolute trajectory error across all DRF variations.
RO · Sep 23, 2020
DL-IAPS and PJSO: A Path/Speed Decoupled Trajectory Optimization and its Application in Autonomous Driving
Jinyun Zhou, Runxin He, Yu Wang et al.
This paper presents a free-space trajectory optimization algorithm for autonomous driving vehicles, which decouples the collision-free trajectory planning problem into a Dual-Loop Iterative Anchoring Path Smoothing (DL-IAPS) and a Piece-wise Jerk Speed Optimization (PJSO). The work leads to remarkable driving performance improvements, including more precise collision avoidance, higher control feasibility and better driving comfort, which are often hard to realize in other existing path/speed decoupled trajectory optimization methods. Our algorithm's efficiency, robustness and adaptiveness to complex driving scenarios have been validated by both simulations and real on-road tests.
RO · Jun 11, 2020
Data Driven Prediction Architecture for Autonomous Driving and its Application on Apollo Platform
Kecheng Xu, Xiangquan Xiao, Jinghao Miao et al.
Autonomous driving vehicles (ADVs) are on the road at large scale. For safe and efficient operations, ADVs must be able to predict the future states of, and interact with, road entities in complex, real-world driving scenarios. How to migrate a well-trained prediction model from one geo-fenced area to another is essential in scaling the ADV operation and is difficult most of the time, since the terrains, traffic rules, entity distributions, and driving/walking patterns can differ largely between geo-fenced operation areas. In this paper, we introduce a highly automated learning-based prediction model pipeline, which has been deployed on the Baidu Apollo self-driving platform, to support data annotation, feature extraction, model training/tuning and deployment for different prediction learning sub-modules. This pipeline is completely automatic without any human intervention and shows an up to 400% efficiency increase in parameter tuning when deployed at scale in different scenarios across nations.
SE · Jul 23, 2018
Assessing Test Case Prioritization on Real Faults and Mutants
Qi Luo, Kevin Moran, Denys Poshyvanyk et al.
Test Case Prioritization (TCP) is an important component of regression testing, allowing for earlier detection of faults or helping to reduce testing time and cost. While several TCP approaches exist in the research literature, a growing number of studies have evaluated them against synthetic software defects, called mutants. Hence, it is currently unclear to what extent TCP performance on mutants would be representative of the performance achieved on real faults. To answer this fundamental question, we conduct the first empirical study comparing the performance of TCP techniques applied to both real-world and mutation faults. The context of our study includes eight well-studied TCP approaches, 35k+ mutation faults, and 357 real-world faults from five Java systems in the Defects4J dataset. Our results indicate that the relative performance of the studied TCP techniques on mutants may not strongly correlate with performance on real faults, depending upon attributes of the subject programs. This suggests that, in certain contexts, the best performing technique on a set of mutants may not be the best technique in practice when applied to real faults. We also illustrate that these correlations vary for mutants generated by different operators depending on whether chosen operators reflect typical faults of a subject program. This highlights the importance, particularly for TCP, of developing mutation operators tailored for specific program domains.
SE · Jun 26, 2018
How Do Static and Dynamic Test Case Prioritization Techniques Perform on Modern Software Systems? An Extensive Study on GitHub Projects
Qi Luo, Kevin Moran, Lingming Zhang et al.
Test Case Prioritization (TCP) is an increasingly important regression testing technique for reordering test cases according to a pre-defined goal, particularly as agile practices gain adoption. To better understand these techniques, we perform the first extensive study aimed at empirically evaluating four static TCP techniques, comparing them with state-of-research dynamic TCP techniques across several quality metrics. This study was performed on 58 real-world Java programs encompassing 714 KLoC and results in several notable observations. First, our results across two effectiveness metrics (the Average Percentage of Faults Detected, APFD, and the cost-cognizant APFDc) illustrate that at test-class granularity, these metrics tend to correlate, but this correlation does not hold at test-method granularity. Second, our analysis shows that static techniques can be surprisingly effective, particularly when measured by APFDc. Third, we found that TCP techniques tend to perform better on larger programs, but that program size does not affect comparative performance measures between techniques. Fourth, software evolution does not significantly impact comparative performance results between TCP techniques. Fifth, neither the number nor the type of mutants utilized dramatically impacts measures of TCP effectiveness under typical experimental settings. Finally, our similarity analysis illustrates that highly prioritized test cases tend to uncover dissimilar faults.
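For readers unfamiliar with the APFD metric named in this abstract, the sketch below computes it using its standard definition, APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of tests, m the number of faults, and TF_i the 1-based position of the first test that detects fault i. The function name, the test and fault identifiers, and the assumption that every counted fault is detected by at least one test in the ordering are illustrative choices, not details from the study.

```python
def apfd(ordering, detects):
    """Average Percentage of Faults Detected for one prioritized test ordering.

    ordering: list of test ids in execution order.
    detects:  dict mapping a test id to the set of fault ids it detects.
    Only faults detected by some test in `ordering` are counted (an assumption
    of this sketch).
    """
    first_hit = {}
    for position, test in enumerate(ordering, start=1):
        for fault in detects.get(test, ()):
            first_hit.setdefault(fault, position)  # keep the earliest position
    n, m = len(ordering), len(first_hit)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    return 1 - sum(first_hit.values()) / (n * m) + 1 / (2 * n)

# Hypothetical fault matrix: t3 finds both faults, t2 finds one, t1 finds none.
detects = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}}

good = apfd(["t3", "t2", "t1"], detects)  # 1 - 2/6 + 1/6, about 0.833
bad = apfd(["t1", "t2", "t3"], detects)   # 1 - 5/6 + 1/6, about 0.333
print(round(good, 3), round(bad, 3))
```

Running the fault-revealing test first raises the score, which is the behavior a prioritization technique is rewarded for; the cost-cognizant APFDc additionally weights tests by cost and faults by severity and is not covered here.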
SE · Jan 18, 2018
A Large-Scale Empirical Comparison of Static and Dynamic Test Case Prioritization Techniques
Qi Luo, Kevin Moran, Denys Poshyvanyk
The large body of existing research in Test Case Prioritization (TCP) techniques can be broadly classified into two categories: dynamic techniques (that rely on run-time execution information) and static techniques (that operate directly on source and test code). Absent from this current body of work is a comprehensive study aimed at understanding and evaluating the static approaches and comparing them to dynamic approaches on a large set of projects. In this work, we perform the first extensive study aimed at empirically evaluating four static TCP techniques, comparing them with state-of-research dynamic TCP techniques at different test-case granularities (e.g., method and class-level) in terms of effectiveness, efficiency and similarity of faults detected. This study was performed on 30 real-world Java programs encompassing 431 KLoC. In terms of effectiveness, we find that the static call-graph-based technique outperforms the other static techniques at test-class level, but the topic-model-based technique performs better at test-method level. In terms of efficiency, the static call-graph-based technique is also the most efficient when compared to other static techniques. When examining the similarity of faults detected for the four static techniques compared to the four dynamic ones, we find that on average, the faults uncovered by these two groups of techniques are quite dissimilar, with the top 10% of test cases agreeing on only 25% - 30% of detected faults. This prompts further research into the severity/importance of faults uncovered by these techniques, and into the potential for combining static and dynamic information for more effective approaches.
AI · Nov 14, 2016
An Evaluation of Information Sharing Parking Guidance Policies Using a Bayesian Approach
Xinyi Wu, Kartik Balkumar, Qi Luo et al.
Real-time parking occupancy information is critical for a parking management system to help drivers park more efficiently. Recent advances in connected and automated vehicle technologies enable sensor-equipped cars (probe cars) to detect and broadcast available parking spaces when driving through parking lots. In this paper, we evaluate the impact of market penetration of probe cars on the system performance, and investigate different parking guidance policies to improve the data acquisition process. We adopt a simulation-based approach to impose four policies on an off-street parking lot, influencing the behavior of probe cars to park in assigned parking spaces. This in turn affects the scanning route and the parking space occupancy estimations. The last policy we propose is a near-optimal guidance strategy that maximizes the information gain of posteriors. The results suggest that an efficient information gathering policy can compensate for low penetration of connected and automated vehicles. We also highlight the policy trade-offs that occur while attempting to maximize information gain through exploration and improve assignment accuracy through exploitation. Our results can assist urban policy makers in designing and managing smart parking systems.
RO · Jul 22, 2016
A Statistical Method for Parking Spaces Occupancy Detection via Automotive Radars
Qi Luo, Romesh Saigal, Robert Hampshire et al.
Real-time parking occupancy information is valuable for guiding drivers searching for parking spaces. Recently, many parking detection systems using range-based on-vehicle sensors have been invented, but they disregard the practical difficulty of obtaining access to raw sensory data, which are required for any feature-based algorithm. In this paper, we focus on a system using short-range radars (SRR) embedded in Advanced Driver Assistance Systems (ADAS) to collect occupancy information and broadcast it through a connected vehicle network. The challenge that the data transmitted through the ADAS unit has been encoded to sparse points is overcome by a statistical method instead of feature extraction. We propose a two-step classification algorithm combining Mean-Shift clustering and Support Vector Machine to analyze SRR-GPS data, and evaluate it through field experiments. The results show that the average Type I error rate is 15.23% for off-street parking and 32.62% for on-street parking. In both cases the Type II error rates are less than 20%. Bayesian updating can recursively improve the mapping results. This paper can provide a comprehensive method to elevate automotive sensors for the parking detection function.