Wenxin Li

AI
h-index39
32papers
607citations
Novelty43%
AI Score55

32 Papers

90.9CLJun 1
SPADE-Bench: Evaluating Spontaneous Strategic Deception in Agents via Plan-Action Divergence

Yuyan Bu, Haowei Li, Qirui Zheng et al.

As LLM-based agents expand their operational scope, reliability becomes a prerequisite for real-world deployment. However, in practical applications, human users cannot monitor every immediate behavior; instead, the execution process often remains a black box, leaving users dependent solely on the agent's self-reported updates. This opacity creates a critical risk: agents may present observer-facing reports that diverge from their executed actions, rendering the system uncontrollable, especially in high-stakes autonomous scenarios. We term such self-reported plan-action divergence as agent deception. To assess this, we introduce SPADE-Bench, a benchmark designed to evaluate spontaneous plan-action divergence. Unlike prior deception benchmarks, SPADE-Bench simultaneously integrates actual tool execution and controlled pressure scenarios. This design ensures ecological validity and rigorously distinguishes strategic deception from mere hallucination through controlled plan-action comparisons under pressure. Experiments across mainstream models confirm that agent deception is a genuine and pressing issue in tool-use contexts. By providing a comprehensive and robust evaluation framework, SPADE-Bench fills a critical gap in agent safety, facilitating the community's progress toward building trustworthy and controllable autonomous systems.

85.7AIMar 18
ShuttleEnv: An Interactive Data-Driven RL Environment for Badminton Strategy Modeling

Ang Li, Xinyang Gong, Bozhou Chen et al.

We present ShuttleEnv, an interactive and data-driven simulation environment for badminton, designed to support reinforcement learning and strategic behavior analysis in fast-paced adversarial sports. The environment is grounded in elite-player match data and employs explicit probabilistic models to simulate rally-level dynamics, enabling realistic and interpretable agent-opponent interactions without relying on physics-based simulation. In this demonstration, we showcase multiple trained agents within ShuttleEnv and provide live, step-by-step visualization of badminton rallies, allowing attendees to explore different play styles, observe emergent strategies, and interactively analyze decision-making behaviors. ShuttleEnv serves as a reusable platform for research, visualization, and demonstration of intelligent agents in sports AI. Our ShuttleEnv demo video URL: https://drive.google.com/file/d/1hTR4P16U27H2O0-w316bR73pxE2ucczX/view

AIJan 22
BotzoneBench: Scalable LLM Evaluation via Graded AI Anchors

Lingfeng Li, Yunlong Lu, Yuefei Zhang et al.

Large Language Models (LLMs) are increasingly deployed in interactive environments requiring strategic decision-making, yet systematic evaluation of these capabilities remains challenging. Existing benchmarks for LLMs primarily assess static reasoning through isolated tasks and fail to capture dynamic strategic abilities. Recent game-based evaluations employ LLM-vs-LLM tournaments that produce relative rankings dependent on transient model pools, incurring quadratic computational costs and lacking stable performance anchors for longitudinal tracking. The central challenge is establishing a scalable evaluation framework that measures LLM strategic reasoning against consistent, interpretable standards rather than volatile peer models. Here we show that anchoring LLM evaluation to fixed hierarchies of skill-calibrated game Artificial Intelligence (AI) enables linear-time absolute skill measurement with stable cross-temporal interpretability. Built on the Botzone platform's established competitive infrastructure, our BotzoneBench evaluates LLMs across eight diverse games spanning deterministic perfect-information board games to stochastic imperfect-information card games. Through systematic assessment of 177,047 state-action pairs from five flagship models, we reveal significant performance disparities and identify distinct strategic behaviors, with top-performing models achieving proficiency comparable to mid-to-high-tier specialized game AI in multiple domains. This anchored evaluation paradigm generalizes beyond games to any domain with well-defined skill hierarchies, establishing a scalable and reusable framework for assessing interactive AI capabilities.

AIJun 22, 2022
Evolutionary Game-Theoretical Analysis for General Multiplayer Asymmetric Games

Xinyu Zhang, Peng Peng, Yushan Zhou et al.

Evolutionary game theory has been a successful tool to combine classical game theory with learning-dynamical descriptions in multiagent systems. Provided some symmetric structures of interacting players, many studies have been focused on using a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even for the state-of-the-art method, there are two limits. First, there is inaccuracy when analysing the simplified payoff table. Second, no existing work is able to deal with 2-population multiplayer asymmetric games. In this paper, we fill the gap between heuristic payoff table and dynamic analysis without any inaccuracy. In addition, we propose a general framework for $m$ versus $n$ 2-population multiplayer asymmetric games. Then, we compare our method with the state-of-the-art in some classic games. Finally, to illustrate our method, we perform empirical game-theoretical analysis on Wolfpack as well as StarCraft II, both of which involve complex multiagent interactions.

AIMay 29, 2025Code
InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback

Boyuan Chen, Donghai Hong, Jiaming Ji et al.

As multimodal large models (MLLMs) continue to advance across challenging tasks, a key question emerges: What essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment -- not limited to language, but also involving multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support multi-turn, multimodal interaction. In particular, they should comprehend interleaved multimodal contexts and respond coherently in ongoing exchanges. In this work, we present an initial exploration through the InterMT -- the first preference dataset for multi-turn multimodal interaction, grounded in real human feedback. In this exploration, we particularly emphasize the importance of human oversight, introducing expert annotations to guide the process, motivated by the fact that current MLLMs lack such complex interactive capabilities. InterMT captures human preferences at both global and local levels into nine sub-dimensions, consists of 15.6k prompts, 52.6k multi-turn dialogue instances, and 32.4k human-labeled preference pairs. To compensate for the lack of capability for multi-modal understanding and generation, we introduce an agentic workflow that leverages tool-augmented MLLMs to construct multi-turn QA instances. To further this goal, we introduce InterMT-Bench to assess the ability of MLLMs in assisting judges with multi-turn, multimodal tasks. We demonstrate the utility of \InterMT through applications such as judge moderation and further reveal the multi-turn scaling law of judge model. We hope the open-source of our data can help facilitate further research on aligning current MLLMs to the next step. Our project website can be found at https://pku-intermt.github.io .

CVMar 6Code
Can we Trust Unreliable Voxels? Exploring 3D Semantic Occupancy Prediction under Label Noise

Wenxin Li, Kunyu Peng, Di Wen et al.

3D semantic occupancy prediction is a cornerstone of robotic perception, yet real-world voxel annotations are inherently corrupted by structural artifacts and dynamic trailing effects. This raises a critical but underexplored question: can autonomous systems safely rely on such unreliable occupancy supervision? To systematically investigate this issue, we establish OccNL, the first benchmark dedicated to 3D occupancy under occupancy-asymmetric and dynamic trailing noise. Our analysis reveals a fundamental domain gap: state-of-the-art 2D label noise learning strategies collapse catastrophically in sparse 3D voxel spaces, exposing a critical vulnerability in existing paradigms. To address this challenge, we propose DPR-Occ, a principled label noise-robust framework that constructs reliable supervision through dual-source partial label reasoning. By synergizing temporal model memory with representation-level structural affinity, DPR-Occ dynamically expands and prunes candidate label sets to preserve true semantics while suppressing noise propagation. Extensive experiments on SemanticKITTI demonstrate that DPR-Occ prevents geometric and semantic collapse under extreme corruption. Notably, even at 90% label noise, our method achieves significant performance gains (up to 2.57% mIoU and 13.91% IoU) over existing label noise learning baselines adapted to the 3D occupancy prediction task. By bridging label noise learning and 3D perception, OccNL and DPR-Occ provide a reliable foundation for safety-critical robotic perception in dynamic environments. The benchmark and source code will be made publicly available at https://github.com/mylwx/OccNL.

DCNov 27, 2025Code
PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel

Jinjun Yi, Zhixin Zhao, Yitao Hu et al.

LLM serving is increasingly dominated by decode attention, which is a memory-bound operation due to massive KV cache loading from global memory. Meanwhile, real-world workloads exhibit substantial, hierarchical shared prefixes across requests (e.g., system prompts, tools/templates, RAG). Existing attention implementations fail to fully exploit prefix sharing: one-query-per-CTA execution repeatedly loads shared prefix KV cache, while one-size-fits-all tiling leaves on-chip resources idle and exacerbates bubbles for uneven KV lengths. These choices amplify memory bandwidth pressure and stall memory-bound decode attention. This paper introduces PAT, a prefix-aware attention kernel implementation for LLM decoding that organizes execution with a pack-forward-merge paradigm. PAT packs queries by shared prefix to reduce repeated memory accesses, runs a customized multi-tile kernel to achieve high resource efficiency. It further applies practical multi-stream forwarding and KV splitting to reduce resource bubbles. The final merge performs online softmax with negligible overhead. We implement PAT as an off-the-shelf plugin for vLLM. Evaluation on both real-world and synthetic workloads shows that PAT reduces attention latency by 53.5% on average and TPOT by 17.0-93.1% under the same configurations against state-of-the-art attention kernels. PAT's source code is publicly available at https://github.com/flashserve/PAT.

CVSep 20, 2025Code
Segment-to-Act: Label-Noise-Robust Action-Prompted Video Segmentation Towards Embodied Intelligence

Wenxin Li, Kunyu Peng, Di Wen et al.

Embodied intelligence relies on accurately segmenting objects actively involved in interactions. Action-based video object segmentation addresses this by linking segmentation with action semantics, but it depends on large-scale annotations and prompts that are costly, inconsistent, and prone to multimodal noise such as imprecise masks and referential ambiguity. To date, this challenge remains unexplored. In this work, we take the first step by studying action-based video object segmentation under label noise, focusing on two sources: textual prompt noise (category flips and within-category noun substitutions) and mask annotation noise (perturbed object boundaries to mimic imprecise supervision). Our contributions are threefold. First, we introduce two types of label noises for the action-based video object segmentation task. Second, we build up the first action-based video object segmentation under a label noise benchmark ActiSeg-NL and adapt six label-noise learning strategies to this setting, and establish protocols for evaluating them under textual, boundary, and mixed noise. Third, we provide a comprehensive analysis linking noise types to failure modes and robustness gains, and we introduce a Parallel Mask Head Mechanism (PMHM) to address mask annotation noise. Qualitative evaluations further reveal characteristic failure modes, including boundary leakage and mislocalization under boundary perturbations, as well as occasional identity substitutions under textual flips. Our comparative analysis reveals that different learning strategies exhibit distinct robustness profiles, governed by a foreground-background trade-off where some achieve balanced performance while others prioritize foreground accuracy at the cost of background precision. The established benchmark and source code will be made publicly available at https://github.com/mylwx/ActiSeg-NL.

AIJan 13
Adapting Rules of Official International Mahjong for Online Players

Chucai Wang, Lingfeng Li, Yunlong Lu et al.

As one of the worldwide spread traditional game, Official International Mahjong can be played and promoted online through remote devices instead of requiring face-to-face interaction. However, online players have fragmented playtime and unfixed combination of opponents in contrary to offline players who have fixed opponents for multiple rounds of play. Therefore, the rules designed for offline players need to be modified to ensure the fairness of online single-round play. Specifically, We employ a world champion AI to engage in self-play competitions and conduct statistical data analysis. Our study reveals the first-mover advantage and issues in the subgoal scoring settings. Based on our findings, we propose rule adaptations to make the game more suitable for the online environment, such as introducing compensatory points for the first-mover advantage and refining the scores of subgoals for different tile patterns. Compared with the traditional method of rotating positions over multiple rounds to balance first-mover advantage, our compensatory points mechanism in each round is more convenient for online players. Furthermore, we implement the revised Mahjong game online, which is open for online players. This work is an initial attempt to use data from AI systems to evaluate Official Internatinoal Mahjong's game balance and develop a revised version of the traditional game better adapted for online players.

78.4LGMay 7
Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

Yongyi Wang, Hanyu Liu, Lingfeng Li et al.

Decision Transformer (DT) formulates offline reinforcement learning as autoregressive sequence modeling, achieving promising results by predicting actions from a sequence of Return-to-Go (RTG), state, and action tokens. However, RTG is a scalar that summarizes future rewards, containing far less information than typical state or action vectors, yet it consumes the same computational budget per token. Worse, the self-attention cost of Transformers grows quadratically with sequence length, so including RTG as a separate token adds unnecessary overhead. We propose SlimDT, which removes RTG from the autoregressive sequence. Instead, we inject RTG information into the state representations before the sequential modeling step, allowing the Transformer to process only a compact (state, action) sequence. This reduces the sequence length by one-third, directly improving inference efficiency. On the D4RL benchmark, SlimDT surpasses standard DT across various tasks and achieves performance comparable to existing state-of-the-art methods. Decoupling a sparse conditioning signal from an information-rich sequence thus yields both computational gains and higher task performance.

DCNov 1, 2025
EPARA: Parallelizing Categorized AI Inference in Edge Clouds

Yubo Wang, Yubo Cui, Tuo Shi et al.

With the increasing adoption of AI applications such as large language models and computer vision AI, the computational demands on AI inference systems are continuously rising, making the enhancement of task processing capacity using existing hardware a primary objective in edge clouds. We propose EPARA, an end-to-end AI parallel inference framework in edge, aimed at enhancing the edge AI serving capability. Our key idea is to categorize tasks based on their sensitivity to latency/frequency and requirement for GPU resources, thereby achieving both request-level and service-level task-resource allocation. EPARA consists of three core components: 1) a task-categorized parallelism allocator that decides the parallel mode of each task, 2) a distributed request handler that performs the calculation for the specific request, and 3) a state-aware scheduler that periodically updates service placement in edge clouds. We implement a EPARA prototype and conduct a case study on the EPARA operation for LLMs and segmentation tasks. Evaluation through testbed experiments involving edge servers, embedded devices, and microcomputers shows that EPARA achieves up to 2.1$\times$ higher goodput in production workloads compared to prior frameworks, while adapting to various edge AI inference tasks.

AIJan 22
Decoupling Return-to-Go for Efficient Decision Transformer

Yongyi Wang, Hanyu Liu, Lingfeng Li et al.

The Decision Transformer (DT) has established a powerful sequence modeling approach to offline reinforcement learning. It conditions its action predictions on Return-to-Go (RTG), using it both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, as only the most recent RTG affects action prediction. We show that this redundancy can impair DT's performance through experiments. To resolve this, we propose the Decoupled DT (DDT). DDT simplifies the architecture by processing only observation and action sequences through the Transformer, using the latest RTG to guide the action prediction. This streamlined approach not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and establishes competitive performance against state-of-the-art DT variants across multiple offline RL tasks.

LGJun 23, 2025
Quantum-Classical Hybrid Quantized Neural Network

Wenxin Li, Chuan Wang, Hongdong Zhu et al.

Here in this work, we present a novel Quadratic Binary Optimization (QBO) model for quantized neural network training, enabling the use of arbitrary activation and loss functions through spline interpolation. We introduce Forward Interval Propagation (FIP), a method designed to tackle the challenges of non-linearity and the multi-layer composite structure in neural networks by discretizing activation functions into linear subintervals. This approach preserves the universal approximation properties of neural networks while allowing complex nonlinear functions to be optimized using quantum computers, thus broadening their applicability in artificial intelligence. We provide theoretical upper bounds on the approximation error and the number of Ising spins required, by deriving the sample complexity of the empirical risk minimization problem, from an optimization perspective. A significant challenge in solving the associated Quadratic Constrained Binary Optimization (QCBO) model on a large scale is the presence of numerous constraints. When employing the penalty method to handle these constraints, tuning a large number of penalty coefficients becomes a critical hyperparameter optimization problem, increasing computational complexity and potentially affecting solution quality. To address this, we employ the Quantum Conditional Gradient Descent (QCGD) algorithm, which leverages quantum computing to directly solve the QCBO problem. We prove the convergence of QCGD under a quantum oracle with randomness and bounded variance in objective value, as well as under limited precision constraints in the coefficient matrix. Additionally, we provide an upper bound on the Time-To-Solution for the QCBO solving process. We further propose a training algorithm with single-sample bit-scale optimization.

AIAug 6, 2025
Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling

Yongyi Wang, Lingfeng Li, Bozhou Chen et al.

Recent research has developed benchmarks for memory-augmented reinforcement learning (RL) algorithms, providing Partially Observable Markov Decision Process (POMDP) environments where agents depend on past observations to make decisions. While many benchmarks incorporate sufficiently complex real-world problems, they lack controllability over the degree of challenges posed to memory models. In contrast, synthetic environments enable fine-grained manipulation of dynamics, making them critical for detailed and rigorous evaluation of memory-augmented RL. Our study focuses on POMDP synthesis with three key contributions: 1. A theoretical framework for analyzing POMDPs, grounded in Memory Demand Structure (MDS), transition invariance, and related concepts; 2. A methodology leveraging linear process dynamics, state aggregation, and reward redistribution to construct customized POMDPs with predefined properties; 3. Empirically validated series of POMDP environments with increasing difficulty levels, designed based on our theoretical insights. Our work clarifies the challenges of memory-augmented RL in solving POMDPs, provides guidelines for analyzing and designing POMDP environments, and offers empirical support for selecting memory models in RL tasks.

AIJun 30, 2025
Constructing Non-Markovian Decision Process via History Aggregator

Yongyi Wang, Wenxin Li

In the domain of algorithmic decision-making, non-Markovian dynamics manifest as a significant impediment, especially for paradigms such as Reinforcement Learning (RL), thereby exerting far-reaching consequences on the advancement and effectiveness of the associated systems. Nevertheless, the existing benchmarks are deficient in comprehensively assessing the capacity of decision algorithms to handle non-Markovian dynamics. To address this deficiency, we have devised a generalized methodology grounded in category theory. Notably, we established the category of Markov Decision Processes (MDP) and the category of non-Markovian Decision Processes (NMDP), and proved the equivalence relationship between them. This theoretical foundation provides a novel perspective for understanding and addressing non-Markovian dynamics. We further introduced non-Markovianity into decision-making problem settings via the History Aggregator for State (HAS). With HAS, we can precisely control the state dependency structure of decision-making problems in the time series. Our analysis demonstrates the effectiveness of our method in representing a broad range of non-Markovian dynamics. This approach facilitates a more rigorous and flexible evaluation of decision algorithms by testing them in problem settings where non-Markovian dynamics are explicitly constructed.

AISep 16, 2025
HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making

Xingxing Hong, Yungong Wang, Dexin Jin et al.

Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, we introduce HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. We also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. We integrate state-of-the-art MARL algorithms and LLM-based agents with our benchmark and conduct comprehensive experiments. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.

AIJun 20, 2025
Style-Preserving Policy Optimization for Game Agents

Lingfeng Li, Yunlong Lu, Yongyi Wang et al.

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency, whereas methods based on evolution algorithms generate agents with diverse play styles but exhibit subpar performance compared to RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies loss objectives for both online and offline samples and introduces an implicit constraint to approximate demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.

CLJun 17, 2025
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary

Qirui Zheng, Xingbo Wang, Keyuan Cheng et al.

The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding field, offering benefits such as unlimited availability and personalized narration. However, current researches in this area remain fragmented, and a comprehensive survey that systematically unifies existing efforts is still missing. To bridge this gap, our survey introduces a unified framework that systematically organizes the AI-GGC landscape. We present a novel taxonomy focused on three core commentator capabilities: Live Observation, Strategic Analysis, and Historical Recall. Commentary is further categorized into three functional types: Descriptive, Analytical, and Background. Building on this structure, we provide an in-depth review of state-of-the-art methods, datasets, and evaluation metrics across various game genres. Finally, we highlight key challenges such as real-time reasoning, multimodal integration, and evaluation bottlenecks, and outline promising directions for future research and system development in AI-GGC.

AIJun 17, 2025
Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents

Lingfeng Li, Yunlong Lu, Yongyi Wang et al.

People need to internalize the skills of AI agents to improve their own capabilities. Our paper focuses on Mahjong, a multiplayer game involving imperfect information and requiring effective long-term decision-making amidst randomness and hidden information. Through the efforts of AI researchers, several impressive Mahjong AI agents have already achieved performance levels comparable to those of professional human players; however, these agents are often treated as black boxes from which few insights can be gleaned. This paper introduces Mxplainer, a parameterized search algorithm that can be converted into an equivalent neural network to learn the parameters of black-box agents. Experiments conducted on AI and human player data demonstrate that the learned parameters provide human-understandable insights into these agents' characteristics and play styles. In addition to analyzing the learned parameters, we also showcase how our search-based framework can locally explain the decision-making processes of black-box agents for most Mahjong game states.

CLSep 3, 2023
Business Process Text Sketch Automation Generation Using Large Language Model

Rui Zhu, Quanzhou Hu, Wenxin Li et al.

Business Process Management (BPM) is gaining increasing attention as it has the potential to cut costs while boosting output and quality. Business process document generation is a crucial stage in BPM. However, due to a shortage of datasets, data-driven deep learning techniques struggle to deliver the expected results. We propose an approach to transform Conditional Process Trees (CPTs) into Business Process Text Sketches (BPTSs) using Large Language Models (LLMs). The traditional prompting approach (Few-shot In-Context Learning) tries to get the correct answer in one go, and it can find the pattern of transforming simple CPTs into BPTSs, but for close-domain and CPTs with complex hierarchy, the traditional prompts perform weakly and with low correctness. We suggest using this technique to break down a difficult CPT into a number of basic CPTs and then solve each one in turn, drawing inspiration from the divide-and-conquer strategy. We chose 100 process trees with depths ranging from 2 to 5 at random, as well as CPTs with many nodes, many degrees of selection, and cyclic nesting. Experiments show that our method can achieve a correct rate of 93.42%, which is 45.17% better than traditional prompting methods. Our proposed method provides a solution for business process document generation in the absence of datasets, and secondly, it becomes potentially possible to provide a large number of datasets for the process model extraction (PME) domain.

SEJul 14, 2021
FAPR: Fast and Accurate Program Repair for Introductory Programming Courses

Yunlong Lu, Na Meng, Wenxin Li

In introductory programming courses, it is challenging for instructors to provide debugging feedback on students' incorrect programs. Some recent tools automatically offer program repair feedback by identifying any differences between incorrect and correct programs, but suffer from issues related to scalability, accuracy, and cross-language portability. This paper presents FAPR -- our novel approach that suggests repairs based on program differences in a fast and accurate manner. FAPR is different from current tools in three aspects. First, it encodes syntactic information into token sequences to enable high-speed comparison between incorrect and correct programs. Second, to accurately extract program differences, FAPR adopts a novel matching algorithm that maximizes token-level matches and minimizes statement-level differences. Third, FAPR relies on testing instead of static/dynamic analysis to validate and refine candidate repairs, so it eliminates the language dependency or high runtime overhead incurred by complex program analysis. We implemented FAPR to suggest repairs for both C and C++ programs; our experience shows the great cross-language portability of FAPR. More importantly, we empirically compared FAPR with a state-of-the-art tool Clara. FAPR suggested repairs for over 95.5% of incorrect solutions. We sampled 250 repairs among FAPR's suggestions, and found 89.6% of the samples to be minimal and correct. FAPR outperformed Clara by suggesting repairs for more cases, creating smaller repairs, producing higher-quality fixes, and causing lower runtime overheads. Our results imply that FAPR can potentially help instructors or TAs to effectively locate bugs in incorrect code, and to provide debugging hints/guidelines based on those generated repairs.

LGDec 16, 2020
A Note on Optimizing the Ratio of Monotone Supermodular Functions

Wenxin Li

We show that for the problem of minimizing (or maximizing) the ratio of two supermodular functions, no bounded approximation ratio can be achieved via polynomial number of queries, if the two supermodular functions are both monotone non-decreasing or non-increasing.

DSJun 16, 2020
Submodular Maximization in Clean Linear Time

Wenxin Li, Moran Feldman, Ehsan Kazemi et al.

In this paper, we provide the first deterministic algorithm that achieves the tight $1-1/e$ approximation guarantee for submodular maximization under a cardinality (size) constraint while making a number of queries that scales only linearly with the size of the ground set $n$. To complement our result, we also show strong information-theoretic lower bounds. More specifically, we show that when the maximum cardinality allowed for a solution is constant, no algorithm making a sub-linear number of function evaluations can guarantee any constant approximation ratio. Furthermore, when the constraint allows the selection of a constant fraction of the ground set, we show that any algorithm making fewer than $Ω(n/\log(n))$ function evaluations cannot perform better than an algorithm that simply outputs a uniformly random subset of the ground set of the right size. We then provide a variant of our deterministic algorithm for the more general knapsack constraint, which is the first linear-time algorithm that achieves $1/2$-approximation guarantee for this constraint. Finally, we extend our results to the general case of maximizing a monotone submodular function subject to the intersection of a $p$-set system and multiple knapsack constraints. We extensively evaluate the performance of our algorithms on multiple real-life machine learning applications, including movie recommendation, location summarization, twitter text summarization and video summarization.

IVMar 27, 2020
Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection

Jianpeng Zhang, Yutong Xie, Guansong Pang et al.

Cluster of viral pneumonia occurrences during a short period of time may be a harbinger of an outbreak or pandemic, like SARS, MERS, and recent COVID-19. Rapid and accurate detection of viral pneumonia using chest X-ray can be significantly useful in large-scale screening and epidemic prevention, particularly when other chest imaging modalities are less available. Viral pneumonia often have diverse causes and exhibit notably different visual appearances on X-ray images. The evolution of viruses and the emergence of novel mutated viruses further result in substantial dataset shift, which greatly limits the performance of classification approaches. In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into an one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module. If the anomaly score produced by the anomaly detection module is large enough or the confidence score estimated by the confidence prediction module is small enough, we accept the input as an anomaly case (i.e., viral pneumonia). The major advantage of our approach over binary classification is that we avoid modeling individual viral pneumonia classes explicitly and treat all known viral pneumonia cases as anomalies to reinforce the one-class model. The proposed model outperforms binary classification models on the clinical X-VIRAL dataset that contains 5,977 viral pneumonia (no COVID-19) cases, 18,619 non-viral pneumonia cases, and 18,774 healthy controls.

LGFeb 7, 2019
Artificial Intelligence for Prosthetics - challenge solutions

Łukasz Kidziński, Carmichael Ong, Sharada Prasanna Mohanty et al.

In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.

AINov 14, 2018
Layout Design for Intelligent Warehouse by Evolution with Fitness Approximation

Haifeng Zhang, Zilong Guo, Han Cai et al.

With the rapid growth of the express industry, intelligent warehouses that employ autonomous robots for carrying parcels have been widely used to handle the vast express volume. For such warehouses, the warehouse layout design plays a key role in improving the transportation efficiency. However, this work is still done by human experts, which is expensive and leads to suboptimal results. In this paper, we aim to automate the warehouse layout designing process. We propose a two-layer evolutionary algorithm to efficiently explore the warehouse layout space, where an auxiliary objective fitness approximation model is introduced to predict the outcome of the designed warehouse layout and a two-layer population structure is proposed to incorporate the approximation model into the ordinary evolution framework. Empirical experiments show that our method can efficiently design effective warehouse layouts that outperform both heuristic-designed and vanilla evolution-designed warehouse layouts.

CVMay 8, 2018
N2RPP: An Adversarial Network to Rebuild Plantar Pressure for ACLD Patients

Yi Zhang, Zhengfei Wang, Guoxiong Xu et al.

Foot is a vital part of human, and lots of valuable information is embedded. Plantar pressure is one of which contains this information and it describes human walking features. It is proved that once one has trouble with lower limb, the distribution of plantar pressure will change to some degree. Plantar pressure can be converted into images according to some simple standards. In this paper, we take full advantage of these plantar pressure images for medical usage. We present N2RPP, a generative adversarial network (GAN) based method to rebuild plantar pressure images of anterior cruciate ligament deficiency (ACLD) patients from low dimension features, which are extracted from an autoencoder. Through the result of experiments, the extracted features are a useful representation to describe and rebuild plantar pressure images. According to N2RPP's results, we find out that there are several noteworthy differences between normal people and patients. This can provide doctors a rough direction of adjusting plantar pressure to a better distribution to reduce patients' sore and pain during the rehabilitation treatment for ACLD.

CVFeb 28, 2018
A Model for Medical Diagnosis Based on Plantar Pressure

Guoxiong Xu, Zhengfei Wang, Hongshi Huang et al.

The process of determining which disease or condition explains a person's symptoms and signs can be very complicated and may be inaccurate in some cases. The general belief is that diagnosing diseases relies on doctors' keen intuition, rich experience and professional equipment. In this work, we employ ideas from recent advances in plantar pressure research and from the powerful capacity of the convolutional neural network for learning representations. Here, we propose a model using convolutional neural network based on plantar pressure for medical diagnosis. Our model learns a network that maps plantar pressure data to its corresponding medical diagnostic label. We then apply our model to make the medical diagnosis on datasets we collected from cooperative hospital and achieve an accuracy of 98.36%. We demonstrate that the model base on the convolutional neural network is competitive in medical diagnosis.

CVJan 4, 2018
ICFVR 2017: 3rd International Competition on Finger Vein Recognition

Yi Zhang, Houjun Huang, Haifeng Zhang et al.

In recent years, finger vein recognition has become an important sub-field in biometrics and been applied to real-world applications. The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets. In order to motivate research on finger vein recognition, we released the largest finger vein data set up to now and hold finger vein recognition competitions based on our data set every year. In 2017, International Competition on Finger Vein Recognition(ICFVR) is held jointly with IJCB 2017. 11 teams registered and 10 of them joined the final evaluation. The winner of this year dramatically improved the EER from 2.64% to 0.483% compared to the winner of last year. In this paper, we introduce the process and results of ICFVR 2017 and give insights on development of state-of-art finger vein recognition algorithms.

AIJul 5, 2017
Learning to Design Games: Strategic Environments in Reinforcement Learning

Haifeng Zhang, Jun Wang, Zhiming Zhou et al.

In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment. In this paper, we extend this setting by considering the environment is not given, but controllable and learnable through its interaction with the agent at the same time. This extension is motivated by environment design scenarios in the real-world, including game design, shopping space design and traffic signal design. Theoretically, we find a dual Markov decision process (MDP) w.r.t. the environment to that w.r.t. the agent, and derive a policy gradient solution to optimizing the parametrized environment. Furthermore, discontinuous environments are addressed by a proposed general generative framework. Our experiments on a Maze game design task show the effectiveness of the proposed algorithms in generating diverse and challenging Mazes against various agent settings.

CVDec 17, 2016
A Fusion Method Based on Decision Reliability Ratio for Finger Vein Verification

Liao Ni, Yi Zhang, He Zheng et al.

Finger vein verification has developed a lot since its first proposal, but there is still not a perfect algorithm. It is proved that algorithms with the same overall accuracy may have different misclassified patterns. We could make use of this complementation to fuse individual algorithms together for more precise result. According to our observation, algorithm has different confidence on its decisions but it is seldom considered in fusion methods. Our work is first to define decision reliability ratio to quantify this confidence, and then propose the Maximum Decision Reliability Ratio (MDRR) fusion method incorporating Weighted Voting. Experiment conducted on a data set of 1000 fingers and 5 images per finger proves the effectiveness of the method. The classifier obtained by MDRR method gets an accuracy of 99.42% while the maximum accuracy of the original individual classifiers is 97.77%. The experiment results also show the MDRR outperforms the traditional fusion methods as Voting, Weighted Voting, Sum and Weighted Sum.