Feng Xiao

CV
h-index: 20
36 papers
141 citations
Novelty: 47%
AI Score: 54

36 Papers

95.3 SY May 28
Robustness Enhancement of Consensus Networks: the Optimal Memory Depth

Jiamin Wang, Jian Liu, Feng Xiao et al.

Understanding what governs collective robustness and how it can be enhanced remains a central pursuit in network science. This paper investigates the robustness of multi-agent consensus networks, quantified by the $H_2$ performance metric, and delves into the enhancing effect of agents' local memory on it. Inspired by the hierarchical temporal structure of memory observed in neuroscience, we focus on the role of memory depth, which reflects the temporal features of memory from recent to remote. Building on linear extrapolation, we propose a consensus protocol with single-step memory and tunable memory depth, derive the necessary and sufficient condition for achieving consensus, and show that the protocol exhibits an inheritable consensus property across memory depths. Furthermore, analytical expressions for the $H_2$ performance metric, which depend on the memory factor, memory depth, coupling gain, and Laplacian spectrum, are established. Under balanced usage of real-time and memory information, we demonstrate that memory at any accessible depth enhances $H_2$ performance, and the optimal memory depth occurs at either the most recent or the most remote memory, contingent upon certain parameter regions. Further detailed discussions are provided to clarify the broader implications of our findings.

COMP-PH Feb 4, 2016
Boundary Variation Diminishing (BVD) reconstruction: a new approach to improve Godunov scheme

Ziyao Sun, Satoshi Inaba, Feng Xiao

This paper presents a new approach, so-called boundary variation diminishing (BVD), for reconstructions that minimize the discontinuities (jumps) at cell interfaces in Godunov-type schemes. It is motivated by the observation that diminishing the jump at the cell boundary might effectively reduce the dissipation in the numerical flux. Unlike existing practices, which seek high-order polynomials within mesh cells while assuming that discontinuities always lie at the cell interfaces, we propose a new strategy that combines a high-order polynomial-based interpolation with a jump-like reconstruction that allows a discontinuity to be partly represented within the mesh cell rather than at the interface. It is shown that new schemes of high fidelity for both continuous and discontinuous solutions can be devised by the BVD guideline with properly chosen candidate reconstruction schemes. Excellent numerical results have been obtained for both scalar and Euler conservation laws, with substantially improved solution quality in comparison with existing methods. This work provides a simple and accurate alternative of great practical significance to the current Godunov paradigm, which overly pursues smoothness within the mesh cell under the questionable premise that discontinuities only appear at cell interfaces.
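The selection idea in this abstract can be illustrated with a minimal sketch. It assumes each cell already has several candidate reconstructions evaluated at its left and right edges, and it simply picks, for every interface, the pair of candidates with the smallest jump. The function name and data layout are hypothetical, and this is a simplification for illustration, not the authors' exact algorithm.

```python
from itertools import product

def bvd_interface_states(right_edge, left_edge):
    """Pick, per interface, the candidate pair with the smallest jump.

    right_edge[k][i]: value of candidate k at the right edge of cell i.
    left_edge[k][i]:  value of candidate k at the left edge of cell i.
    Returns a list of (q_left, q_right) for the interfaces between
    cell i and cell i + 1, for i = 0 .. n_cells - 2.
    """
    n_cand = len(right_edge)
    n_cells = len(right_edge[0])
    states = []
    for i in range(n_cells - 1):
        # Try every combination of (candidate in cell i, candidate in cell i+1)
        # and keep the one whose interface jump is smallest.
        a, b = min(
            product(range(n_cand), repeat=2),
            key=lambda ab: abs(right_edge[ab[0]][i] - left_edge[ab[1]][i + 1]),
        )
        states.append((right_edge[a][i], left_edge[b][i + 1]))
    return states
```

The two returned states per interface would then be passed to a Riemann solver in a Godunov-type scheme; that step is omitted here.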

COMP-PH Jun 22, 2012
A note on the general multi-moment constrained flux reconstruction formulation for high order schemes

Feng Xiao, Satoshi Ii, Chungang Chen et al.

This paper presents a general formulation to construct high order numerical schemes by using multi-moment constraint conditions on the flux function reconstruction. The new formulation, so-called multi-moment constrained flux reconstruction (MMC-FR), distinguishes itself essentially from the flux reconstruction formulation (FR) of Huynh (2007) by imposing not only the continuity constraint conditions on the flux function at the cell boundary, but also other types of constraints, which may include those on the spatial derivatives or the point values. This formulation can also be interpreted as a blend of Lagrange interpolation and Hermite interpolation, which provides a numerical framework to accommodate a wider spectrum of high order schemes. Some representative schemes will be presented and evaluated through Fourier analysis and numerical tests.

63.3 CV May 20 Code
Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

Siqi Wei, Hongbin Xu, Feng Xiao et al.

Existing approaches for unsupervised 3D point cloud segmentation predominantly rely on a purely visual similarity-based learning-by-clustering paradigm, which suffers from a fundamental limitation: long-tail ambiguity. In such a paradigm, features of minor classes are consistently absorbed by dominant clusters, leading to severely imbalanced predictions. To address this issue, we propose LangTail, a language-guided hierarchical learning framework that leverages the balanced world knowledge encoded in language models to mitigate long-tail ambiguity in unsupervised 3D segmentation. The key idea is to establish multi-level associations between language-derived semantic priors and visually underrepresented minor classes, thereby compensating for the biased attention of purely visual clustering toward dominant classes. Specifically, LangTail first constructs an entity-level semantic prior from language models, capturing balanced and fine-grained world knowledge across categories. These priors are injected into a hierarchical clustering framework via contrastive alignment. This guides multi-granularity semantic structure formation and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories. Extensive experiments on ScanNet-v2, S3DIS, and nuScenes demonstrate that LangTail consistently outperforms existing methods by significant margins, i.e., +13.5, +12.9, and +8.9 mIoU, respectively. These results demonstrate the effectiveness of language priors in improving the representation of minority classes in 3D point clouds. The code will be released at: https://github.com/Whisky0129/langtail_official.

43.3 LG May 28
CLUBench: A Clustering Benchmark

Feng Xiao, Dazhi Fu, Chris Ding et al.

Clustering is a fundamental problem in data science with a long-standing research history, yielding numerous insightful algorithms. Despite this progress, a systematic and large-scale empirical evaluation that jointly considers conventional algorithms, deep learning-based methods, and recent foundation model-based clustering remains largely absent, leading to limited guidance on algorithm selection and deployment. To address this gap, we introduce CLUBench, a comprehensive clustering benchmark comprising 24 algorithms of diverse principles evaluated on 131 datasets across tabular, text, and image data, involving 178,815 experiments. Importantly, our analyses of (i) the impact of hyperparameter tuning, (ii) the impact of data types and characteristics, (iii) the impact of pretrained embeddings, (iv) large language model-based clustering, (v) the similarity of algorithms, and (vi) the low-rank structures of performance matrices yield meaningful insights and promising pathways for clustering research. For instance, our study reveals that: 1) None of the evaluated deep clustering methods exhibits a significant advantage over the top-performing conventional clustering algorithms (e.g., KMeans, SpeClu) in terms of average performance; 2) For image and text clustering tasks, combining pretrained embeddings with conventional clustering algorithms (e.g., KMeans, SpeClu) offers effective and efficient clustering; 3) Clustering remains a challenging and nontrivial problem, even in the era of increasingly dominant foundation models. Moreover, we propose to use the low-rank structure in cross-model performance matrices to efficiently approximate the overall performance evaluation in practical applications. We further demonstrate the feasibility of model selection based on the performance matrices across all hyperparameter configurations.

CR Jul 23, 2024
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Huiyu Xu, Wenhui Zhang, Zhibo Wang et al.

Recently, advanced Large Language Models (LLMs) such as GPT-4 have been integrated into many real-world applications like Code Copilot. These applications have significantly expanded the attack surface of LLMs, exposing them to a variety of threats. Among them, jailbreak attacks that induce toxic responses through jailbreak prompts have raised critical safety concerns. To identify these threats, a growing number of red teaming approaches simulate potential adversarial scenarios by crafting jailbreak prompts to test the target LLM. However, existing red teaming methods do not consider the unique vulnerabilities of LLMs in different scenarios, making it difficult to adjust the jailbreak prompts to find context-specific vulnerabilities. Meanwhile, these methods are limited to refining jailbreak templates using a few mutation operations, lacking the automation and scalability to adapt to different scenarios. To enable context-aware and efficient red teaming, we abstract and model existing attacks into a coherent concept called "jailbreak strategy" and propose a multi-agent LLM system named RedAgent that leverages these strategies to generate context-aware jailbreak prompts. By self-reflecting on contextual feedback in an additional memory buffer, RedAgent continuously learns how to leverage these strategies to achieve effective jailbreaks in specific contexts. Extensive experiments demonstrate that our system can jailbreak most black-box LLMs in just five queries, doubling the efficiency of existing red teaming methods. Additionally, RedAgent can jailbreak customized LLM applications more efficiently. By generating context-aware jailbreak prompts towards applications on GPTs, we discover 60 severe vulnerabilities of these real-world applications with only two queries per vulnerability. We have reported all found issues and communicated with OpenAI and Meta for bug fixes.

NA May 26, 2016
A note on implementation of boundary variation diminishing algorithm to high-order local polynomial-based schemes

Yoshiaki Abe, Ziyao Sun, Feng Xiao

A novel approach for selecting appropriate reconstructions is applied to hyperbolic conservation laws in the high-order local polynomial-based framework, e.g., the discontinuous Galerkin (DG) and flux reconstruction (FR) schemes. The high-order polynomial approximation generally fails to correctly capture a strong discontinuity inside a cell due to the Runge phenomenon, and is therefore replaced by a more stable approximation on the basis of a troubled-cell indicator such as that used in the total variation bounded (TVB) limiter. This paper examines the applicability of a new algorithm, so-called boundary variation diminishing (BVD) reconstruction, to the weighted essentially non-oscillatory (WENO) methodology in the FR framework including the nodal type DG method. The BVD reconstruction adaptively chooses a proper approximation for the solution function so as to minimize the jump between values at the left and right sides of cell boundaries without any ad hoc constant such as the TVB parameter. Several numerical tests are conducted for a linear advection equation and for the selection of an appropriate reconstruction, where the results of the BVD algorithm are comparable to those using the conventional TVB limiter. Note that the present work is limited to third or lower order polynomials, leaving the implementation for higher-order schemes as future work.

LG Jul 9, 2023
Restricted Generative Projection for One-Class Classification and Anomaly Detection

Feng Xiao, Ruoyu Sun, Jicong Fan

We present a simple framework for one-class classification and anomaly detection. The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution. Crucially, the target distribution should be sufficiently simple, compact, and informative. The simplicity is to ensure that we can sample from the distribution easily, the compactness is to ensure that the decision boundary between normal data and abnormal data is clear and reliable, and the informativeness is to ensure that the transformed data preserve the important information of the original data. Therefore, we propose to use truncated Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between hyperspheres, as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data small enough. Comparative studies on multiple benchmark datasets verify the effectiveness of our methods in comparison to baselines.
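The target distributions named above are easy to sample from, which is the simplicity property the abstract asks for. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the construction (a normalized Gaussian direction combined with a suitably distributed radius) is the standard textbook method rather than anything taken from the paper.

```python
import math
import random

def _unit_direction(dim, rng):
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def sample_on_hypersphere(dim, radius=1.0, rng=random):
    """Uniform sample on the surface of a hypersphere."""
    return [radius * x for x in _unit_direction(dim, rng)]

def sample_in_hypersphere(dim, radius=1.0, rng=random):
    """Uniform sample inside a hypersphere (volume grows like r**dim)."""
    r = radius * rng.random() ** (1.0 / dim)
    return [r * x for x in _unit_direction(dim, rng)]

def sample_between_hyperspheres(dim, r_in, r_out, rng=random):
    """Uniform sample in the shell between radii r_in and r_out."""
    u = rng.random()
    r = (u * (r_out ** dim - r_in ** dim) + r_in ** dim) ** (1.0 / dim)
    return [r * x for x in _unit_direction(dim, rng)]
```

Samples like these could serve as the reference set when measuring the distance between the transformed data distribution and the target; the choice of distance is left to the paper.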

CL May 12, 2025 Code
TiSpell: A Semi-Masked Methodology for Tibetan Spelling Correction covering Multi-Level Error with Data Augmentation

Yutong Liu, Feng Xiao, Ziyue Zhang et al.

Multi-level Tibetan spelling correction addresses errors at both the character and syllable levels within a unified model. Existing methods focus mainly on single-level correction and lack effective integration of both levels. Moreover, there are no open-source datasets or augmentation methods tailored for this task in Tibetan. To tackle this, we propose a data augmentation approach using unlabeled text to generate multi-level corruptions, and introduce TiSpell, a semi-masked model capable of correcting both character- and syllable-level errors. Although syllable-level correction is more challenging due to its reliance on global context, our semi-masked strategy simplifies this process. We synthesize nine types of corruptions on clean sentences to create a robust training set. Experiments on both simulated and real-world data demonstrate that TiSpell, trained on our dataset, outperforms baseline models and matches the performance of state-of-the-art approaches, confirming its effectiveness.

LG Jan 2
IRPM: Intergroup Relative Preference Modeling for Pointwise Generative Reward Models

Haonan Song, Qingchen Xie, Huan Zhu et al.

Generative Reward Models (GRMs) have demonstrated strong performance in reward modeling, due to their interpretability and potential for refinement through reinforcement learning (RL). However, widely used pairwise GRMs create a computational bottleneck in reinforcement learning from human feedback (RLHF), when calibrating or aggregating preference signals over n candidates, often incurring O(n^2) pairwise judgments. To address this issue, we propose Intergroup Relative Preference Modeling (IRPM), an RL-based method that extends the Bradley–Terry preference-learning paradigm via intergroup comparisons to train pointwise GRMs from pairwise preference data. IRPM derives pointwise reward for each response by contrasting groups of chosen vs. rejected samples, enabling pointwise scores comparable across candidate sets and O(n) reward evaluation for a variable number of candidates during RL training, while preserving interpretability and scalability. Experiments show that IRPM achieves state-of-the-art performance among pointwise GRMs on RM-Bench, JudgeBench and RewardBench, and approaches the performance of leading pairwise GRMs. In addition, IRPM achieves substantial gains in post-training evaluations, demonstrating its effectiveness.
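The cost argument above can be made concrete with a small sketch. It shows the standard Bradley–Terry preference probability, the number of judgments an all-pairs comparison needs, and how pointwise scores rank candidates with one score per candidate. The function names are hypothetical, and this is not IRPM's training procedure, only the background it builds on.

```python
import math

def bradley_terry_prob(r_a, r_b):
    """Standard Bradley-Terry probability that a is preferred over b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def pairwise_judgments(n):
    """Number of judgments needed to compare every pair of n candidates."""
    return n * (n - 1) // 2

def rank_by_pointwise(scores):
    """Rank candidates (best first) from one score per candidate."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With 8 candidates, an all-pairs comparison needs 28 judgments, whereas pointwise scoring needs 8 reward evaluations followed by a sort.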

93.1 OC Apr 9
Robust Control of General Linear Delay Systems under Dissipativity: Part I -- A KSD-based Framework

Qian Feng, Wei Xing Zheng, Xiaoyu Wang et al.

This paper introduces an effective framework for designing memoryless dissipative full-state feedback for general linear delay systems via the Krasovskiĭ functional (KF) approach, where an arbitrary finite number of pointwise and general distributed delays (DDs) exists in the state, input and output. To handle the infinite dimensionality of DDs, we employ the Kronecker-Seuret Decomposition (KSD) which we recently proposed for analyzing matrix-valued functions in the context of delay systems. The KSD enables factorization or least-squares approximation of any number of $L^2$ DD kernels from any number of DDs without introducing conservatism. This also facilitates the construction of a complete-type KF with flexible integral kernels by means of a novel integral inequality derived from the least-squares principle. Our solution includes two theorems and an iterative algorithm to compute controller gains without relying on nonlinear solvers. A numerical example is tested to show the effectiveness of the proposed approach.

LG Sep 23, 2024
Kriformer: A Novel Spatiotemporal Kriging Approach Based on Graph Transformers

Renbin Pan, Feng Xiao, Hegui Zhang et al.

Accurately estimating data in sensor-less areas is crucial for understanding system dynamics, such as traffic state estimation and environmental monitoring. This study addresses challenges posed by sparse sensor deployment and unreliable data by framing the problem as a spatiotemporal kriging task and proposing a novel graph transformer model, Kriformer. This model estimates data at locations without sensors by mining spatial and temporal correlations, even with limited resources. Kriformer utilizes transformer architecture to enhance the model's perceptual range and solve edge information aggregation challenges, capturing spatiotemporal information effectively. A carefully constructed positional encoding module embeds the spatiotemporal features of nodes, while a sophisticated spatiotemporal attention mechanism enhances estimation accuracy. The multi-head spatial interaction attention module captures subtle spatial relationships between observed and unobserved locations. During training, a random masking strategy prompts the model to learn with partial information loss, allowing the spatiotemporal embedding and multi-head attention mechanisms to synergistically capture correlations among locations. Experimental results show that Kriformer excels in representation learning for unobserved locations, validated on two real-world traffic speed datasets, demonstrating its effectiveness in spatiotemporal kriging tasks.

CL Jul 16, 2025 Code
Text-ADBench: Text Anomaly Detection Benchmark based on LLMs Embedding

Feng Xiao, Jicong Fan

Text anomaly detection is a critical task in natural language processing (NLP), with applications spanning fraud detection, misinformation identification, spam detection, and content moderation. Despite significant advances in large language models (LLMs) and anomaly detection algorithms, the absence of standardized and comprehensive benchmarks for evaluating the existing anomaly detection methods on text data limits rigorous comparison and development of innovative approaches. This work performs a comprehensive empirical study and introduces a benchmark for text anomaly detection, leveraging embeddings from diverse pre-trained language models across a wide array of text datasets. Our work systematically evaluates the effectiveness of embedding-based text anomaly detection by incorporating (1) early language models (GloVe, BERT); (2) multiple LLMs (LLaMa-2, LLama-3, Mistral, OpenAI (small, ada, large)); (3) multi-domain text datasets (news, social media, scientific publications); (4) comprehensive evaluation metrics (AUROC, AUPRC). Our experiments reveal a critical empirical insight: embedding quality significantly governs anomaly detection efficacy, and deep learning-based approaches demonstrate no performance advantage over conventional shallow algorithms (e.g., KNN, Isolation Forest) when leveraging LLM-derived embeddings. In addition, we observe strongly low-rank characteristics in cross-model performance matrices, which enable an efficient strategy for rapid model evaluation (or embedding evaluation) and selection in practical applications. Furthermore, by open-sourcing our benchmark toolkit that includes all embeddings from different models and code at https://github.com/jicongfan/Text-Anomaly-Detection-Benchmark, this work provides a foundation for future research in robust and scalable text anomaly detection systems.
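As an illustration of the embedding-based pipeline described above, here is a minimal sketch of a KNN anomaly scorer evaluated with AUROC, using only the standard library. The embeddings are assumed to be precomputed vectors. The function names, the mean-distance scoring rule, and the value of `k` are assumptions for illustration, not the benchmark toolkit's API.

```python
import math

def knn_scores(train, test, k=2):
    """Anomaly score = mean distance to the k nearest normal embeddings."""
    scores = []
    for x in test:
        dists = sorted(math.dist(x, t) for t in train)
        scores.append(sum(dists[:k]) / k)
    return scores

def auroc(scores, labels):
    """AUROC as the fraction of (anomaly, normal) pairs ranked correctly.

    labels: 1 for anomaly, 0 for normal. Ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice `train` would hold embeddings of normal text from one of the language models listed above, and `test` a mix of normal and anomalous samples.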

58.2 CL Apr 13
Triviality Corrected Endogenous Reward

Xinda Wang, Zhengxu Hou, Yangshijie Zhang et al.

Reinforcement learning for open-ended text generation is constrained by the lack of verifiable rewards, necessitating reliance on judge models that require either annotated data or powerful closed-source models. Inspired by recent work on unsupervised reinforcement learning for mathematical reasoning using confidence-based endogenous rewards, we investigate whether this principle can be adapted to open-ended writing tasks. We find that directly applying confidence rewards leads to Triviality Bias: the policy collapses toward high-probability outputs, reducing diversity and meaningful content. We propose TCER (Triviality Corrected Endogenous Reward), which addresses this bias by rewarding the relative information gain between a specialist policy and a generalist reference policy, modulated by a probability-dependent correction mechanism. Across multiple writing benchmarks and model architectures, TCER achieves consistent improvements without external supervision. Furthermore, TCER also transfers effectively to mathematical reasoning, validating the generality of our approach across different generation tasks.

24.1 AI Apr 7
SignalClaw: LLM-Guided Evolutionary Synthesis of Interpretable Traffic Signal Control Skills

Da Lei, Feng Xiao, Lu Li et al.

Traffic signal control (TSC) requires strategies that are both effective and interpretable for deployment, yet reinforcement learning produces opaque neural policies while program synthesis depends on restrictive domain-specific languages. We present SignalClaw, a framework that uses large language models (LLMs) as evolutionary skill generators to synthesize and refine interpretable control skills for adaptive TSC. Each skill includes rationale, selection guidance, and executable code, making policies human-inspectable and self-documenting. At each generation, evolution signals from simulation metrics such as queue percentiles, delay trends, and stagnation are translated into natural language feedback to guide improvement. SignalClaw also introduces event-driven compositional evolution: an event detector identifies emergency vehicles, transit priority, incidents, and congestion via TraCI, and a priority dispatcher selects specialized skills. Each skill is evolved independently, and a priority chain enables runtime composition without retraining. We evaluate SignalClaw on routine and event-injected SUMO scenarios against four baselines. On routine scenarios, it achieves an average delay of 7.8 to 9.2 seconds, within 3 to 10 percent of the best method, with low variance across random seeds. Under event scenarios, it yields the lowest emergency delay (11.2 to 18.5 seconds, versus 42.3 to 72.3 for MaxPressure and 78.5 to 95.3 for DQN) and the lowest transit person delay (9.8 to 11.5 seconds, versus 38.7 to 45.2 for MaxPressure). In mixed events, the dispatcher composes skills effectively while maintaining stable overall delay. The evolved skills progress from simple linear rules to conditional strategies with multi-feature interactions, while remaining fully interpretable and directly modifiable by traffic engineers.

CV Nov 4, 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models

Tianfan Peng, Yuntao Du, Pengzhou Ji et al.

Large multimodal models (LMMs) often suffer from severe inference inefficiency due to the large number of visual tokens introduced by image encoders. While recent token compression methods, such as pruning and merging, have shown promise in reducing redundancy, their evaluation remains fragmented and inconsistent. In this work, we present UniPruneBench, a unified and extensible benchmark for visual token pruning in multimodal LLMs. UniPruneBench provides standardized protocols across six ability dimensions and ten datasets, covering ten representative compression algorithms and three families of LMMs (LLaVA-v1.5, Intern-VL3, and Qwen2.5-VL). Beyond task accuracy, it incorporates system-level metrics such as runtime and prefilling latency to provide a holistic view. Our experiments uncover several key findings: (1) random pruning is a surprisingly strong baseline, (2) no single method consistently outperforms others across scenarios, (3) pruning sensitivity varies significantly across tasks, with OCR being most vulnerable, and (4) pruning ratio is the dominant factor governing performance degradation. We believe UniPruneBench will serve as a reliable foundation for future research on efficient multimodal modeling.

CL Jan 12
Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning

Ziheng Li, Liu Kang, Feng Xiao et al.

Group Relative Policy Optimization (GRPO) has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. However, standard GRPO employs a coarse-grained credit assignment mechanism that propagates group-level rewards uniformly to every token in a sequence, neglecting the varying contribution of individual reasoning steps. We address this limitation by introducing Outcome-grounded Advantage Reshaping (OAR), a fine-grained credit assignment mechanism that redistributes advantages based on how much each token influences the model's final answer. We instantiate OAR via two complementary strategies: (1) OAR-P, which estimates outcome sensitivity through counterfactual token perturbations, serving as a high-fidelity attribution signal; (2) OAR-G, which uses an input-gradient sensitivity proxy to approximate the influence signal with a single backward pass. These importance signals are integrated with a conservative Bi-Level advantage reshaping scheme that suppresses low-impact tokens and boosts pivotal ones while preserving the overall advantage mass. Empirical results on extensive mathematical reasoning benchmarks demonstrate that while OAR-P sets the performance upper bound, OAR-G achieves comparable gains with negligible computational overhead, both significantly outperforming a strong GRPO baseline, pushing the boundaries of critic-free LLM reasoning.
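The idea of redistributing a sequence-level advantage across tokens while keeping its total mass can be sketched as follows. This is a deliberately simple proportional reweighting, assumed here for illustration only; it is not the paper's Bi-Level scheme, and the function name and the nonnegative `importance` input are hypothetical.

```python
def reshape_advantages(advantage, importance):
    """Spread one sequence-level advantage over tokens by importance.

    Uniform GRPO-style credit would give every token `advantage`.
    Here each token gets advantage * n * w / sum(w), so pivotal tokens
    receive more credit while the sum over tokens stays n * advantage.
    """
    n = len(importance)
    total = sum(importance)
    if total <= 0:
        # No usable signal: fall back to uniform credit.
        return [advantage] * n
    return [advantage * n * w / total for w in importance]
```

For example, `reshape_advantages(1.0, [1.0, 3.0])` gives the second token three times the credit of the first, and the two values still sum to 2.0, the same total as uniform assignment.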

CV Mar 13, 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention

Feng Xiao, Hongbin Xu, Qiuxia Wu et al.

3D visual grounding aims to automatically locate the 3D region of the specified object given the corresponding textual description. Existing works fail to distinguish similar objects, especially when multiple referred objects are involved in the description. Experiments show that direct matching of the language and visual modalities has limited capacity to comprehend complex referential relationships in utterances. This is mainly due to the interference caused by redundant visual information in cross-modal alignment. To strengthen relation-oriented mapping between different modalities, we propose SeCG, a semantic-enhanced relational learning model based on a graph network with our designed memory graph attention layer. Our method replaces the original language-independent encoding with cross-modal encoding in visual analysis. More text-related feature expressions are obtained through the guidance of global semantics and implicit relationships. Experimental results on the ReferIt3D and ScanRefer benchmarks show that the proposed method outperforms the existing state-of-the-art methods, particularly improving the localization performance for the multi-relation challenges.

CL Apr 9, 2025
Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations

Zican Dong, Han Peng, Peiyu Liu et al.

Mixture-of-Experts (MoE) models achieve a favorable trade-off between performance and inference efficiency by activating only a subset of experts. However, the memory overhead of storing all experts remains a major limitation, especially in large-scale MoE models such as DeepSeek-R1 (671B). In this study, we investigate domain specialization and expert redundancy in large-scale MoE models and uncover a consistent behavior we term few-shot expert localization: with only a few in-domain demonstrations, the model consistently activates a sparse and stable subset of experts on tasks within the same domain. Building on this observation, we propose a simple yet effective pruning framework, EASY-EP, that leverages a few domain-specific demonstrations to identify and retain only the most relevant experts. EASY-EP comprises two key components: output-aware expert importance assessment and expert-level token contribution estimation. The former evaluates the importance of each expert for the current token by considering the gating scores and L2 norm of the outputs of activated experts, while the latter assesses the contribution of tokens based on representation similarities before and after routed experts. Experiments on DeepSeek-R1 and DeepSeek-V3-0324 show that, with only half the experts, our method achieves performance comparable to the full model and $2.99\times$ throughput under the same memory budget.
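The output-aware importance idea can be sketched roughly as follows. The abstract says only that gating scores and output L2 norms are considered; combining them as a product and summing over tokens is an assumption made here for illustration, and the token contribution weighting is omitted. All names and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def expert_importance(tokens):
    """Accumulate a per-expert score over demonstration tokens.

    tokens: one list per token of (expert_id, gate_score, output_vector)
    for the experts activated on that token.
    Assumed score per activation: gate_score * ||output_vector||_2.
    """
    scores = defaultdict(float)
    for activated in tokens:
        for expert_id, gate, out in activated:
            scores[expert_id] += gate * math.sqrt(sum(v * v for v in out))
    return dict(scores)

def keep_top_experts(scores, k):
    """Return the ids of the k highest-scoring experts, sorted by id."""
    return sorted(sorted(scores, key=scores.get, reverse=True)[:k])
```

Experts outside the retained set would be dropped from memory for the target domain; how routing is handled after pruning is not covered by this sketch.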

CV Oct 14, 2024
4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting

Wanlin Liang, Hongbin Xu, Weitao Chen et al.

3D neural style transfer has gained significant attention for its potential to provide user-friendly stylization with spatial consistency. However, existing 3D style transfer methods often fall short in inference efficiency and generalization ability, and struggle to handle dynamic scenes with temporal consistency. In this paper, we introduce 4DStyleGaussian, a novel 4D style transfer framework designed to achieve real-time stylization of arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence. Our approach leverages an embedded 4D Gaussian Splatting technique, which is trained using a reversible neural network for reducing content loss in the feature distillation process. Utilizing the 4D embedded Gaussians, we predict a 4D style transformation matrix that facilitates spatially and temporally consistent style transfer with Gaussian Splatting. Experiments demonstrate that our method can achieve high-quality and zero-shot stylization for 4D scenarios with enhanced efficiency and spatial-temporal consistency.

CV Oct 12, 2024
ControLRM: Fast and Controllable 3D Generation via Large Reconstruction Model

Hongbin Xu, Weitao Chen, Zhipeng Zhou et al.

Despite recent advancements in 3D generation methods, achieving controllability remains a challenging issue. Current approaches utilizing score-distillation sampling are hindered by laborious procedures that consume a significant amount of time. Furthermore, the process of first generating 2D representations and then mapping them to 3D lacks internal alignment between the two forms of representation. To address these challenges, we introduce ControLRM, an end-to-end feed-forward model designed for rapid and controllable 3D generation using a large reconstruction model (LRM). ControLRM comprises a 2D condition generator, a condition encoding transformer, and a triplane decoder transformer. Instead of training our model from scratch, we advocate for a joint training framework. In the condition training branch, we lock the triplane decoder and reuse the deep and robust encoding layers pretrained with millions of 3D data in LRM. In the image training branch, we unlock the triplane decoder to establish an implicit alignment between the 2D and 3D representations. To ensure unbiased evaluation, we curate evaluation samples from three distinct datasets (G-OBJ, GSO, ABO) rather than relying on cherry-picking manual generation. The comprehensive experiments conducted on quantitative and qualitative comparisons of 3D controllability and generation quality demonstrate the strong generalization capacity of our proposed approach.

CV Mar 13, 2024
StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields

Hongbin Xu, Weitao Chen, Feng Xiao et al.

4D style transfer aims at transferring arbitrary visual style to the synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF) but fail to handle the 4D dynamic scenes limited by the static scene assumption. Consequently, we aim to handle the novel challenging problem of 4D style transfer for the first time, which further requires the consistency of stylized results on dynamic objects. In this paper, we introduce StyleDyRF, a method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on the feature volume in a data-driven fashion. To obtain the canonical feature volume, the rays at each time step are deformed with the geometric prior of a pre-trained dynamic NeRF to render the feature map under the supervision of pre-trained visual encoders. With the content and style cues in the canonical feature volume and the style image, we can learn the style transformation matrix from their covariance matrices with lightweight neural networks. The learned style transformation matrix can reflect a direct matching of feature covariance from the content volume to the given style pattern, in analogy with the optimization of the Gram matrix in traditional 2D neural style transfer. The experimental results show that our method not only renders 4D photorealistic style transfer results in a zero-shot manner but also outperforms existing methods in terms of visual quality and consistency.

AIAug 7, 2025
MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media

Rui Lu, Jinhe Bi, Yunpu Ma et al.

Social media has evolved into a complex multimodal environment where text, images, and other signals interact to shape nuanced meanings, often concealing harmful intent. Identifying such intent, whether sarcasm, hate speech, or misinformation, remains challenging due to cross-modal contradictions, rapid cultural shifts, and subtle pragmatic cues. To address these challenges, we propose MV-Debate, a multi-view agent debate framework with dynamic reflection gating for unified multimodal harmful content detection. MV-Debate assembles four complementary debate agents, a surface analyst, a deep reasoner, a modality contrast, and a social contextualist, to analyze content from diverse interpretive perspectives. Through iterative debate and reflection, the agents refine responses under a reflection-gain criterion, ensuring both accuracy and efficiency. Experiments on three benchmark datasets demonstrate that MV-Debate significantly outperforms strong single-model and existing multi-agent debate baselines. This work highlights the promise of multi-agent debate in advancing reliable social intent detection in safety-critical online contexts.

IVApr 2, 2025
Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms

Junchi Zhou, Haozhou Wang, Yoichiro Kato et al.

Developing computer vision-based rice phenotyping techniques is crucial for precision field management and accelerating breeding, thereby continuously advancing rice production. Among phenotyping tasks, distinguishing image components is a key prerequisite for characterizing plant growth and development at the organ scale, enabling deeper insights into eco-physiological processes. However, due to the fine structure of rice organs and complex illumination within the canopy, this task remains highly challenging, underscoring the need for a high-quality training dataset. Such datasets are scarce, both due to a lack of large, representative collections of rice field images and the time-intensive nature of annotation. To address this gap, we established the first comprehensive multi-class rice semantic segmentation dataset, RiceSEG. We gathered nearly 50,000 high-resolution, ground-based images from five major rice-growing countries (China, Japan, India, the Philippines, and Tanzania), encompassing over 6,000 genotypes across all growth stages. From these original images, 3,078 representative samples were selected and annotated with six classes (background, green vegetation, senescent vegetation, panicle, weeds, and duckweed) to form the RiceSEG dataset. Notably, the sub-dataset from China spans all major genotypes and rice-growing environments from the northeast to the south. Both state-of-the-art convolutional neural networks and transformer-based semantic segmentation models were used as baselines. While these models perform reasonably well in segmenting background and green vegetation, they face difficulties during the reproductive stage, when canopy structures are more complex and multiple classes are involved. These findings highlight the importance of our dataset for developing specialized segmentation models for rice and other crops.

CVOct 11, 2025
B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding

Feng Xiao, Hongbin Xu, Hai Ci et al.

Localizing 3D objects using natural language is essential for robotic scene understanding. The descriptions often involve multiple spatial relationships to distinguish similar objects, making 3D-language alignment difficult. Current methods only model relationships for pairwise objects, ignoring the global perceptual significance of n-ary combinations in multi-modal relational understanding. To address this, we propose a novel progressive relational learning framework for 3D object grounding. We extend relational learning from binary to n-ary to identify visual relations that match the referential description globally. Given the absence of specific annotations for referred objects in the training data, we design a grouped supervision loss to facilitate n-ary relational learning. In the scene graph created with n-ary relationships, we use a multi-modal network with hybrid attention mechanisms to further localize the target within the n-ary combinations. Experiments and ablation studies on the ReferIt3D and ScanRefer benchmarks demonstrate that our method outperforms the state-of-the-art and prove the advantages of n-ary relational perception in 3D localization.

CVMay 7, 2025
LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding

Feng Xiao, Hongbin Xu, Guocan Zhao et al.

3D visual grounding aims to localize the unique target described by natural languages in 3D scenes. The significant gap between 3D and language modalities makes it a notable challenge to distinguish multiple similar objects through the described spatial relationships. Current methods attempt to achieve cross-modal understanding in complex scenes via a target-centered learning mechanism, ignoring the modeling of referred objects. We propose a novel 3D visual grounding framework that constructs language-guided scene graphs with referred object discrimination to improve relational perception. The framework incorporates a dual-branch visual encoder that leverages pre-trained 2D semantics to enhance and supervise the multi-modal 3D encoding. Furthermore, we employ graph attention to promote relationship-oriented information fusion in cross-modal interaction. The learned object representations and scene graph structure enable effective alignment between 3D visual content and textual descriptions. Experimental results on popular benchmarks demonstrate our superior performance compared to state-of-the-art methods, especially in handling the challenges of multiple similar distractors.

CVApr 21, 2025
Cyc3D: Fine-grained Controllable 3D Generation via Cycle Consistency Regularization

Hongbin Xu, Chaohui Yu, Feng Xiao et al.

Despite the remarkable progress of 3D generation, achieving controllability, i.e., ensuring consistency between generated 3D content and input conditions like edge and depth, remains a significant challenge. Existing methods often struggle to maintain accurate alignment, leading to noticeable discrepancies. To address this issue, we propose Cyc3D, a new framework that enhances controllable 3D generation by explicitly encouraging cyclic consistency between the second-order 3D content, generated based on extracted signals from the first-order generation, and its original input controls. Specifically, we employ an efficient feed-forward backbone that can generate a 3D object from an input condition and a text prompt. Given an initial viewpoint and a control signal, a novel view is rendered from the generated 3D content, from which the extracted condition is used to regenerate the 3D content. This re-generated output is then rendered back to the initial viewpoint, followed by another round of control signal extraction, forming a cyclic process with two consistency constraints. View consistency ensures coherence between the two generated 3D objects, measured by semantic similarity to accommodate generative diversity. Condition consistency aligns the final extracted signal with the original input control, preserving structural or geometric details throughout the process. Extensive experiments on popular benchmarks demonstrate that Cyc3D significantly improves controllability, especially for fine-grained details, outperforming existing methods across various conditions (e.g., +14.17% PSNR for edge, +6.26% PSNR for sketch).

CVSep 23, 2020
Demand Forecasting in Bike-sharing Systems Based on A Multiple Spatiotemporal Fusion Network

Xiao Yan, Gang Kou, Feng Xiao et al.

Bike-sharing systems (BSSs) have become increasingly popular around the globe and have attracted a wide range of research interests. In this paper, the demand forecasting problem in BSSs is studied. Spatial and temporal features are critical for demand forecasting in BSSs, but it is challenging to extract spatiotemporal dynamics. Another challenge is to capture the relations between spatiotemporal dynamics and external factors, such as weather, day-of-week, and time-of-day. To address these challenges, we propose a multiple spatiotemporal fusion network named MSTF-Net. MSTF-Net consists of multiple spatiotemporal blocks: 3D convolutional network (3D-CNN) blocks, eidetic 3D convolutional long short-term memory networks (E3D-LSTM) blocks, and fully-connected (FC) blocks. Specifically, 3D-CNN blocks highlight extracting short-term spatiotemporal dependence in each fragment (i.e., closeness, period, and trend); E3D-LSTM blocks further extract long-term spatiotemporal dependence over all fragments; FC blocks extract nonlinear correlations of external factors. Finally, the latent representations of E3D-LSTM and FC blocks are fused to obtain the final prediction. For two real-world datasets, it is shown that MSTF-Net outperforms seven state-of-the-art models.

SESep 4, 2020
A Framework and DataSet for Bugs in Ethereum Smart Contracts

Pengcheng Zhang, Feng Xiao, Xiapu Luo

Ethereum is the largest blockchain platform that supports smart contracts. Users deploy smart contracts by publishing the smart contract's bytecode to the blockchain. Since the data in the blockchain cannot be modified, even if these contracts contain bugs, it is not possible to patch deployed smart contracts with code updates. Moreover, there is currently neither a comprehensive classification framework for Ethereum smart contract bugs, nor detailed criteria for detecting bugs in smart contracts, making it difficult for developers to fully understand the negative effects of bugs and design new approaches to detect bugs. In this paper, to fill the gap, we first collect as many smart contract bugs as possible from multiple sources and divide these bugs into 9 categories by extending the IEEE Standard Classification for Software Anomalies. Then, we design the criteria for detecting each kind of bug, and construct a dataset of smart contracts covering all kinds of bugs. With our framework and dataset, developers can learn about smart contract bugs and develop new tools to detect and locate bugs in smart contracts. Moreover, we evaluate the state-of-the-art tools for smart contract analysis with our dataset and obtain some interesting findings: 1) Mythril, Slither and Remix are the most worthwhile combination of analysis tools. 2) There are still 10 kinds of bugs that cannot be detected by any analysis tool.

CRDec 30, 2019
ICSTrace: A Malicious IP Traceback Model for Attacking Data of Industrial Control System

Feng Xiao, Qiang Xu

Because attacks against industrial control systems are mostly organized and premeditated, IP traceback is significant for the security of industrial control systems. Based on the infrastructure of the Internet, we have developed a novel malicious IP traceback model, ICSTrace, without deploying any new services. The model extracts the function codes and their parameters from the attack data according to the format of the industrial control protocol, and employs a short sequence probability method to transform the function codes and their parameters into a vector, which characterizes the attack pattern of malicious IP addresses. Furthermore, a Partial Seeded K-Means algorithm is proposed for clustering the patterns, which helps in tracing the attacks back to an organization. ICSTrace is evaluated on attack data captured by large-scale deployed honeypots for industrial control systems, and the results demonstrate that ICSTrace is effective for malicious IP traceback in industrial control systems.
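As a rough illustration of turning a sequence of function codes into a vector, the sketch below computes the relative frequency of every length-n subsequence over a fixed alphabet. This is only one plausible reading of a "short sequence probability" feature, not the ICSTrace implementation; the function name `short_sequence_vector`, the choice n=2, and the Modbus-style example codes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def short_sequence_vector(codes, alphabet, n=2):
    """Map a sequence of function codes to relative frequencies of its length-n subsequences."""
    grams = [tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    # Fixed ordering over all possible n-grams so that vectors from different IPs are comparable.
    return [counts[g] / total if total else 0.0 for g in product(alphabet, repeat=n)]


# Hypothetical session using Modbus-style codes:
# 1 (read coils), 3 (read holding registers), 6 (write single register).
vector = short_sequence_vector([3, 3, 6, 3], alphabet=[1, 3, 6])
```

Vectors built this way have the same length for every source address, so they could be fed to a clustering step such as a seeded K-Means variant.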

SENov 21, 2019
SolidityCheck: Quickly Detecting Smart Contract Problems Through Regular Expressions

Pengcheng Zhang, Feng Xiao, Xiapu Luo

As a blockchain platform that has developed vigorously in recent years, Ethereum differs from Bitcoin in that it introduces smart contracts into the blockchain. Solidity is one of the most mature and widely used smart contract programming languages; it is used to write smart contracts and deploy them on the blockchain. However, once data is written to the blockchain, it cannot be modified. Because Ethereum smart contracts are stored in the blockchain, code problems such as re-entrancy vulnerabilities or integer overflows can no longer be repaired after deployment. Currently, there is still a lack of an efficient and effective approach for detecting these problems in Solidity. In this paper, we first classify all the possible problems in Solidity, then propose a smart contract problem detection approach for Solidity, namely SolidityCheck. The approach uses regular expressions to define the characteristics of problematic statements and uses regular matching and program instrumentation to prevent or detect problems. Finally, a large number of experiments are performed to show that SolidityCheck is superior to existing approaches.
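To make the idea of regular-expression matching on problematic statements concrete, here is a minimal sketch of a line-based scanner. The two patterns, their labels, and the `scan` helper are hypothetical examples chosen for illustration; they are not the rules SolidityCheck actually uses, and a real tool would need far more careful handling of comments, strings, and multi-line statements.

```python
import re

# Hypothetical patterns for illustration only; not the rules used by SolidityCheck.
PATTERNS = {
    "tx.origin used for authorization": re.compile(r"\btx\.origin\b"),
    "low-level call forwarding value": re.compile(r"\.call\.value\s*\("),
}


def scan(source: str):
    """Return (line_number, label) pairs for lines matching any pattern."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore single-line comments
        for label, pattern in PATTERNS.items():
            if pattern.search(code):
                findings.append((number, label))
    return findings


EXAMPLE = """contract Wallet {
    function withdraw(uint amount) public {
        require(tx.origin == owner); // authorization check
        msg.sender.call.value(amount)("");
    }
}"""

findings = scan(EXAMPLE)
```

Pattern matching of this kind is fast because it never builds a syntax tree, which is also why it can miss problems that span several statements.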

LGApr 15, 2019
Learning Spatiotemporal Features of Ride-sourcing Services with Fusion Convolutional Network

Feng Xiao, Dapeng Zhang, Gang Kou et al.

To collectively forecast the demand for ride-sourcing services in all regions of a city, deep learning approaches have been applied with commendable results. However, the local statistical differences throughout the geographical layout of the city make the spatial stationarity assumption of the convolution invalid, which limits the performance of CNNs on the demand forecasting task. In this paper, we propose a novel deep learning framework called LC-ST-FCN (locally connected spatiotemporal fully-convolutional neural network) to address the unique challenges of the region-level demand forecasting problem within one end-to-end (E2E) architecture. We first employ 3D convolutional layers to fuse the spatial and temporal information present in the input and then feed the spatiotemporal features extracted by the 3D convolutional layers to the subsequent 2D convolutional layers. Afterward, the prediction value of each region is obtained by the locally connected convolutional layers, which relax the parameter sharing scheme. We evaluate the proposed model on a real dataset from a ride-sourcing service platform (DiDiChuxing) and observe significant improvements compared with a range of baseline models. We also illustrate the effectiveness of our proposed model by visualizing how different types of convolutional layers transform their input and capture useful features. The visualization results show that the fully convolutional architecture enables the model to better localize the related regions, and that the locally connected layers play an important role in dealing with the local statistical differences and activating useful regions.
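The difference between a standard convolution and a locally connected layer can be shown in one dimension with plain Python. This is a generic sketch of the two layer types, not the LC-ST-FCN architecture; the function names and the toy inputs are assumptions for illustration.

```python
def shared_conv1d(x, kernel):
    """Standard convolution: the same kernel weights are applied at every position."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def locally_connected1d(x, kernels):
    """Locally connected layer: each output position has its own kernel weights."""
    k = len(kernels[0])
    if len(kernels) != len(x) - k + 1:
        raise ValueError("need exactly one kernel per output position")
    return [sum(x[i + j] * kernels[i][j] for j in range(k)) for i in range(len(kernels))]


x = [1, 2, 3, 4]
shared = shared_conv1d(x, [1, -1])
local = locally_connected1d(x, [[1, -1], [0, 1], [2, 0]])
```

When every position is given the same kernel, the locally connected layer reduces to the shared convolution; letting the kernels differ is what allows each region to respond to its own local statistics, at the cost of more parameters.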

SPOct 12, 2018
PatternListener: Cracking Android Pattern Lock Using Acoustic Signals

Man Zhou, Qian Wang, Jingxiao Yang et al.

Pattern lock has been widely used for authentication to protect user privacy on mobile devices (e.g., smartphones and tablets). Given its pervasive usage, the compromise of pattern lock could lead to serious consequences. Several attacks have been constructed to crack the lock. However, these approaches require the attackers to either be physically close to the target device or be able to manipulate the network facilities (e.g., WiFi hotspots) used by the victims. Therefore, the effectiveness of the attacks is significantly impacted by the environment of mobile devices. Also, these attacks are not scalable since they cannot easily infer unlock patterns of a large number of devices. Motivated by an observation that fingertip motions on the screen of a mobile device can be captured by analyzing surrounding acoustic signals on it, we propose PatternListener, a novel acoustic attack that cracks pattern lock by analyzing imperceptible acoustic signals reflected by the fingertip. It leverages speakers and microphones of the victim's device to play imperceptible audio and record the acoustic signals reflected by the fingertip. In particular, it infers each unlock pattern by analyzing individual lines that compose the pattern and are the trajectories of the fingertip. We propose several algorithms to construct signal segments according to the captured signals for each line and infer possible candidates of each individual line according to the signal segments. Finally, we map all line candidates into grid patterns and thereby obtain the candidates of the entire unlock pattern. We implement a PatternListener prototype by using off-the-shelf smartphones and thoroughly evaluate it using 130 unique patterns. The real experimental results demonstrate that PatternListener can successfully exploit over 90% of patterns within five attempts.

LGSep 26, 2017
A Deep Learning Model for Traffic Flow State Classification Based on Smart Phone Sensor Data

Wenwen Tu, Feng Xiao, Liping Fu et al.

This study proposes a Deep Belief Network model to classify traffic flow states. The model is capable of processing massive, high-density, and noise-contaminated data sets generated from smartphone sensors. The statistical features of vehicle acceleration, angular acceleration, and GPS speed data, recorded by smartphone software, are analyzed, and then used as input for traffic flow state classification. Data from a five-day experiment is used to train and test the proposed model. A total of 747,856 sets of data are generated and used for both traffic flow state classification and sensitivity analysis of input variables. The result shows that the proposed Deep Belief Network model is superior to traditional machine learning methods in both classification performance and computational efficiency.

NAAug 2, 2017
Some practical versions of boundary variation diminishing (BVD) algorithm

Xi Deng, Bin Xie, Feng Xiao

This short note presents some variant schemes of the boundary variation diminishing (BVD) algorithm in one dimension, with the results of numerical tests for the linear advection equation, to facilitate practical use. Although presented in 1D, all the schemes are simple and easy to implement in multiple dimensions on structured and unstructured grids for nonlinear and system equations.

NIApr 5, 2017
CHAOS: an SDN-based Moving Target Defense System

Juan Wang, Feng Xiao, Jianwei Huang et al.

The static nature of current cyber systems has made them easy to attack and compromise. By constantly changing a system, Moving Target Defense (MTD) has provided a promising way to reduce or move the attack surface that is available for exploitation by an adversary. However, current network-based MTD obfuscates networks indiscriminately, which makes some key network services, such as web and DNS services, unavailable, because much information about these services has to be open to the outside and remain real so as not to compromise their usability. Moreover, the indiscriminate obfuscation also severely reduces the performance of networks. In this paper, we propose CHAOS, an SDN (Software-defined networking)-based MTD system, which discriminately obfuscates hosts with different security levels in a network. In CHAOS, we introduce a Chaos Tower Obfuscation (CTO) method, which uses a Chaos Tower Structure (CTS) to depict the hierarchy of all the hosts in an intranet and provides a more unpredictable and flexible obfuscation method. We also present the design of CHAOS, which leverages SDN features to obfuscate the attack surface, including IP obfuscation, port obfuscation, and fingerprint obfuscation, thereby enhancing the unpredictability of the networking environment. We develop fast CTO algorithms to achieve a different degree of obfuscation for the hosts in each layer. Our experimental results show that a network protected by CHAOS is capable of decreasing the percentage of information disclosure effectively while guaranteeing the normal flow of traffic.