CR · Feb 17
SecCodeBench-V2 Technical Report
Longfei Chen, Ji Zhao, Lanxiao Cui et al.
We introduce SecCodeBench-V2, a publicly released benchmark for evaluating the ability of Large Language Model (LLM) copilots to generate secure code. SecCodeBench-V2 comprises 98 generation and fix scenarios derived from Alibaba Group's industrial production, where the underlying security issues span 22 common CWE (Common Weakness Enumeration) categories across five programming languages: Java, C, Python, Go, and JavaScript. SecCodeBench-V2 adopts a function-level task formulation: each scenario provides a complete project scaffold and requires the model to implement or patch a designated target function under fixed interfaces and dependencies. For each scenario, SecCodeBench-V2 provides executable proof-of-concept (PoC) test cases for both functional validation and security verification. All test cases are authored and double-reviewed by security experts, ensuring high fidelity, broad coverage, and reliable ground truth. Beyond the benchmark itself, we build a unified evaluation pipeline that assesses models primarily via dynamic execution. For most scenarios, we compile and run model-generated artifacts in isolated environments and execute PoC test cases to validate both functional correctness and security properties. For scenarios where security issues cannot be adjudicated with deterministic test cases, we additionally employ an LLM-as-a-judge oracle. To summarize performance across heterogeneous scenarios and difficulty levels, we design a Pass@K-based scoring protocol with principled aggregation over scenarios and severity, enabling holistic and comparable evaluation across models. Overall, SecCodeBench-V2 provides a rigorous and reproducible foundation for assessing the security posture of AI coding assistants, with results and artifacts released at https://alibaba.github.io/sec-code-bench. The benchmark is publicly available at https://github.com/alibaba/sec-code-bench.
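The abstract does not spell out the scoring formula, so the following is only a minimal sketch of what a Pass@K score with severity-weighted aggregation might look like. It assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), and the function names, severity labels, and weights are all hypothetical, not taken from SecCodeBench-V2.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples contain a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def weighted_score(results, weights, k):
    """Severity-weighted mean of per-scenario pass@k.

    results: list of (severity, n, c) tuples, one per scenario.
    weights: dict mapping severity label to a weight (hypothetical values).
    """
    total = sum(weights[sev] for sev, _, _ in results)
    return sum(weights[sev] * pass_at_k(n, c, k) for sev, n, c in results) / total


# Hypothetical example: two scenarios, 10 samples each.
example = [("high", 10, 5), ("medium", 10, 10)]
score = weighted_score(example, {"high": 2.0, "medium": 1.0}, k=1)
```

Weighting by severity makes a failure on a high-severity scenario cost more than one on a low-severity scenario; the actual protocol may aggregate differently.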
SD · Feb 12
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
Daiqing Wu, Xuan Zhang, Dongbao Yang et al.
The maturation of Large Audio Language Models (LALMs) has raised growing expectations for them to comprehend complex audio much like humans. Current efforts primarily replicate text-based reasoning by contextualizing audio content through a one-time encoding, which introduces a critical information bottleneck. Drawing inspiration from human cognition, we propose audio-interleaved reasoning to break through this bottleneck. It treats audio as an active reasoning component, enabling sustained audio engagement and perception-grounded analysis. To instantiate it, we introduce a two-stage training framework, first teaching LALMs to localize salient audio segments through supervised fine-tuning, and then incentivizing proficient re-listening via reinforcement learning. In parallel, a structured data generation pipeline is developed to produce high-quality training data. Consequently, we present Echo, a LALM capable of dynamically re-listening to audio on demand during reasoning. On audio comprehension benchmarks, Echo achieves overall superiority in both challenging expert-level and general-purpose tasks. Comprehensive analysis further confirms the efficiency and generalizability of audio-interleaved reasoning, establishing it as a promising direction for advancing audio comprehension. Project page: https://github.com/wdqqdw/Echo.
LG · Sep 18, 2023
Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective
Laixin Xie, Yang Ouyang, Longfei Chen et al.
Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation: the need for different imputation methods for different missing data mechanisms, heavy dependence on assumptions about the data distribution, and the potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and its dissimilarity to other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study contributes a practical solution for ML modeling in the presence of missing data that achieves high predictive accuracy and model interpretability.
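To make the contrastive objective concrete, here is a minimal sketch of an InfoNCE-style loss for one anchor. The abstract does not state which loss the authors use, so InfoNCE, the temperature value, and the function names are assumptions; the vectors stand for embeddings that some encoder (not shown) would produce for an incomplete sample, its complete counterpart, and other samples.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor embedding.

    anchor:    embedding of the incomplete sample (assumed)
    positive:  embedding of its complete counterpart (assumed)
    negatives: embeddings of other samples
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# The loss is small when the anchor is close to its positive and far from
# the negative, and large when those roles are reversed.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls the incomplete sample toward its complete counterpart and pushes it away from other samples, which matches the similarity and dissimilarity described above.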
90.3 · HC · Mar 20
ConSearcher: Supporting Conversational Information Seeking in Online Communities with Member Personas
Shiwei Wu, Xinyue Chen, Yuheng Liu et al.
Many people browse online communities to learn from others' experiences and opinions, e.g., for constructing travel plans. Conversational search powered by large language models (LLMs) could ease this information-seeking task, but it remains under-investigated within online communities. In this paper, we first conducted an exploratory study (N=10) that indicated the helpfulness of a classic conversational search tool and identified room for improvement. Then, we proposed ConSearcher, an LLM-powered tool with dynamically generated member personas based on user queries to facilitate conversational search in the community. In ConSearcher, users can clarify their interests by checking what a simulated member similar to them may ask and get responses from diverse members' perspectives. A within-subjects study (N=27) showed that, compared to two conversational search baselines, ConSearcher led to significantly better information-seeking outcomes and higher user engagement but raised concerns about over-personalization. We discuss implications for supporting conversational information seeking in online communities.
77.7 · SE · Apr 1
SmartPoC: Generating Executable and Validated PoCs for Smart Contract Bug Reports
Longfei Chen, Ruibin Yan, Taiyu Wong et al.
Smart contracts are commonly audited through static analysis to explore vulnerabilities. However, static approaches typically produce heterogeneous findings rather than reproducible, executable proof-of-concept (PoC) test cases, leading to costly and ad hoc manual validation. Large language models (LLMs) offer a promising way to translate audit reports into PoC test cases, but face three major challenges: noisy inputs, lack of execution grounding, and missing runtime oracles. We present SmartPoC, an end-to-end approach for validating reported vulnerabilities in audit reports by generating and executing PoC test cases with automated exploitability verification. SmartPoC first extracts a focused function-level slice from each report to reduce noise, centering on the key functions referenced in a finding and augmenting them with execution-relevant neighbors. To improve executability, we wrap LLM-based PoC synthesis in a generate-repair-execute loop, combining deterministic pre-execution sanitization with feedback-driven post-execution debugging. We further use differential verification as an oracle to confirm the exploitability of generated test cases. On the SmartBugs-Vul and FORGE-Vul benchmarks, SmartPoC achieves confirmation precision of 98.32% and 98.65%, with recall of 84.17% and 85.28%, respectively. On a recent Etherscan verified-source corpus, SmartPoC confirms 64 bugs from 545 audit findings at an average cost of $0.03.
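The generate-repair-execute loop described above can be sketched roughly as follows. This is an illustrative control-flow skeleton only: the callables `generate`, `sanitize`, and `execute`, the `max_rounds` parameter, and their signatures are hypothetical placeholders and not SmartPoC's actual interfaces.

```python
from typing import Callable, Optional, Tuple


def generate_repair_execute(
    generate: Callable[[Optional[str]], str],
    sanitize: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the first candidate PoC that executes, or None if none does.

    generate: produces a candidate from the previous execution log (None at first)
    sanitize: deterministic pre-execution cleanup of the candidate
    execute:  runs the candidate and returns (succeeded, log)
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = sanitize(generate(feedback))
        succeeded, log = execute(candidate)
        if succeeded:
            return candidate
        # Feed the failure log into the next generation round.
        feedback = log
    return None


# Stub components standing in for an LLM and a test runner.
def stub_generate(feedback):
    return "  broken poc " if feedback is None else "  fixed poc "


def stub_execute(candidate):
    return candidate == "fixed poc", "compile error"


result = generate_repair_execute(stub_generate, str.strip, stub_execute)
```

Bounding the number of rounds keeps cost predictable; a candidate that executes would still need a separate oracle, such as the differential verification mentioned above, to confirm exploitability.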
CE · Sep 19, 2020
Analysis of tunnel failure characteristics under multiple explosion loads based on persistent homology-based machine learning
Shengdong Zhang, Shihui You, Longfei Chen et al.
The study of tunnel failure characteristics under loading from an external explosion source is an important problem in tunnel design and protection; in particular, constructing an intelligent topological description of the tunnel failure process is of great significance. The failure characteristics of tunnels under explosive loading are described using the discrete element method and persistent homology-based machine learning. First, a discrete element model of a shallow buried tunnel was established in discrete element software, the explosive load was treated as equivalent to a series of uniformly distributed loads acting on the surface according to Saint-Venant's principle, and the dynamic response of the tunnel under multiple explosive loads was obtained through iterative calculation. The topological characteristics of the surrounding rock are studied by persistent homology-based machine learning. The geometric, physical, and inter-unit characteristics of the tunnel subjected to explosive loading are extracted, the nonlinear mapping between the topological quantities of persistent homology and the failure characteristics of the surrounding rock is established, and an intelligent description of the tunnel failure characteristics is obtained. The research shows that the length of the longest bar in the Betti-1 barcode is closely related to the stability of the tunnel and can be used for effective early warning of tunnel failure, and that an intelligent description of the tunnel failure process can be established, providing a new idea for tunnel engineering protection.
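As a small illustration of the quantity highlighted in the abstract, the sketch below picks the longest bar from a one-dimensional (Betti-1) persistence barcode given as (birth, death) pairs. The interval values are made up, the function name is hypothetical, and skipping bars with infinite death is an assumption made here for simplicity; computing the barcode itself would require a persistent homology library and is not shown.

```python
import math


def longest_bar(intervals):
    """Return the (birth, death) pair with the largest finite persistence.

    intervals: iterable of (birth, death) pairs from a Betti-1 barcode.
    Bars that never die (death == inf) are skipped by assumption.
    Returns None when no finite bar exists.
    """
    finite = [(b, d) for b, d in intervals if not math.isinf(d)]
    if not finite:
        return None
    return max(finite, key=lambda bar: bar[1] - bar[0])


# Made-up barcode for illustration only.
barcode = [(0.1, 0.4), (0.2, 0.9), (0.3, 0.5), (0.05, math.inf)]
bar = longest_bar(barcode)
persistence = bar[1] - bar[0]
```

Tracking this persistence value across loading steps is one way such a feature could feed an early-warning indicator, in the spirit of the finding reported above.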