49.9 · CR · Mar 24
How Far Should We Need to Go : Evaluate Provenance-based Intrusion Detection Systems in Industrial Scenarios
Yue Xiao, Ling Jiang, Sen Nie et al.
Provenance-based Intrusion Detection Systems (PIDSes) have been widely used to detect Advanced Persistent Threats (APTs). Although many studies achieve high performance in the evaluations of their original papers, their performance in industrial scenarios remains unclear. To fill this gap, we conduct the first systematic evaluation and analysis of PIDSes in industrial scenarios. We first analyze the differences between the data from DARPA datasets and that collected in industrial scenarios, identifying three main new characteristics in industry: heterogeneous multi-source inputs, more powerful attackers, and increasing benign activity complexity. We then build several datasets to evaluate five state-of-the-art PIDSes. The evaluation results reveal challenges for existing PIDSes, including poor portability across different hosts and platforms, low detection performance against real-world attacks, and high false positive rates with ever-changing benign activities. Based on the evaluation results and our industrial practices, we provide several insights to solve or explain the above problems. For example, we propose a method to mitigate the high false positives, which reduces manual effort by 2/3. Finally, we propose several research suggestions to improve PIDSes.
CL · Jul 8, 2025 · Code
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Haoxin Wang, Xianhan Peng, Xucheng Huang et al.
In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions and a realistic task dataset derived from authentic e-commerce dialogues. These tasks, covering a wide range of business scenarios, are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o achieve a pass^3 score of only 10-20% on our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. The code and data have been made publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.
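The abstract does not define pass^3. As a rough illustration only, the sketch below assumes it follows a common convention for pass^k: the probability that k independent trials of the same task all succeed, estimated per task as C(c, k) / C(n, k) from c successes in n trials and then averaged over tasks. The function names and the sample counts are hypothetical and are not taken from ECom-Bench.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k independent trials of one task ALL succeed.

    Assumed convention (not confirmed by the abstract): C(c, k) / C(n, k),
    where c is the number of successful trials out of n.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 1 <= k <= trials:
        raise ValueError("k must lie between 1 and trials")
    # comb(c, k) is 0 when c < k, so a task with too few successes scores 0.
    return comb(successes, k) / comb(trials, k)


def benchmark_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (successes, trials) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_hat_k(c, n, k) for c, n in results) / len(results)


if __name__ == "__main__":
    # Hypothetical outcomes for three tasks, three trials each.
    toy_results = [(3, 3), (2, 3), (0, 3)]
    print(benchmark_pass_hat_k(toy_results, k=3))
```

Under this convention a task counts toward pass^3 only if it is solved in every one of the three trials, which is why the metric is much stricter than a single-trial success rate.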
CV · Jul 16, 2022
Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection
Qian Ye, Ling Jiang, Wang Zhen et al.
Low-cost autonomous agents, including autonomous driving vehicles, chiefly adopt monocular 3D object detection to perceive the surrounding environment. This paper studies 3D intermediate representation methods, which generate intermediate 3D features for subsequent tasks. For example, the 3D features can be taken as input not only for detection, but also for end-to-end prediction and/or planning that require a bird's-eye-view feature representation. In this study, we found that when generating 3D representations, previous methods do not maintain consistency between the objects' implicit poses in the latent space, especially orientations, and the explicitly observed poses in Euclidean space, which can substantially hurt model performance. To tackle this problem, we present a novel monocular detection method, the first to be aware of the poses and to purposefully guarantee that they are consistent between the implicit and explicit features. Second, we introduce a local ray attention mechanism to efficiently transform image features to voxels at accurate 3D locations. Third, we propose a handcrafted Gaussian positional encoding function, which outperforms the sinusoidal encoding function while retaining the benefit of being continuous. Results show that our method improves on the state-of-the-art 3D intermediate representation method by 3.15%. We are ranked 1st among all reported monocular methods on both the 3D and BEV detection benchmarks on the KITTI leaderboard as of the result's submission time.
CV · May 15, 2024 · Code
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
Wanting Xu, Yang Liu, Langping He et al.
We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, employing the LLaVA paradigm for modal alignment. The result, which we call Xmodel-VLM, is a lightweight yet powerful multimodal vision language model. Extensive testing across numerous classic multimodal benchmarks has revealed that despite its smaller size and faster execution, Xmodel-VLM delivers performance comparable to that of larger models. Our model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelVLM.
CL · Jul 7, 2025 · Code
MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents
Ming Gong, Xucheng Huang, Chenghan Yang et al.
Recent advances in large language models (LLMs) have enabled new applications in e-commerce customer service. However, their capabilities remain constrained in complex, multimodal scenarios. We present MindFlow, the first open-source multimodal LLM agent tailored for e-commerce. Built on the CoALA framework, it integrates memory, decision-making, and action modules, and adopts a modular "MLLM-as-Tool" strategy for effective visual-textual reasoning. Evaluated via online A/B testing and simulation-based ablation, MindFlow demonstrates substantial gains in handling complex queries, improving user satisfaction, and reducing operational costs, with a 93.53% relative improvement observed in real-world deployments.
LG · Nov 23, 2025 · Code
Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu, Xiaolong Zhong, Ling Jiang
Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present Xmodel-2.5, a 1.3-billion-parameter small language model designed as a drop-in agent core. Training with maximal-update parameterization (μP) allows hyper-parameters tuned on a 20M-parameter proxy to transfer directly to the full model, even under the parameter-tied tie-word-embedding architecture. A 1.4T-token Warmup-Stable-Decay curriculum is used, and we further show that switching from AdamW to Muon during the decay phase improves the 13-task reasoning average by 4.58% while keeping every other hyper-parameter fixed, verifying that early AdamW stability can be paired with late Muon sharpening for better downstream performance. FP8-mixed-precision training balances accuracy and throughput. All checkpoints, recipes, and evaluation code are released under the Apache-2.0 license (https://huggingface.co/XiaoduoAILab/Xmodel-2.5 and, for training checkpoints, https://huggingface.co/XiaoduoAILab/Xmodel-2.5-history). Training code and evaluation harness: https://github.com/XiaoduoAILab/Xmodel-2.5.
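To make the Warmup-Stable-Decay idea concrete, here is a minimal sketch of such a learning-rate schedule together with an optimizer choice that switches at the start of the decay phase. It is an illustration under stated assumptions, not the Xmodel-2.5 recipe: the linear warmup, the linear decay shape, the `min_lr_ratio` floor, the step counts, and the helper names `wsd_lr` and `optimizer_for_step` are all hypothetical.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_steps: int, decay_steps: int,
           min_lr_ratio: float = 0.1) -> float:
    """Learning rate at `step` for a warmup / stable / decay schedule.

    Assumptions (not from the abstract): linear warmup, constant plateau,
    then linear decay down to min_lr_ratio * peak_lr.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: interpolate linearly from peak_lr to the floor.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr * (1.0 - (1.0 - min_lr_ratio) * progress)


def optimizer_for_step(step: int, total_steps: int, decay_steps: int) -> str:
    """Return which optimizer family would be active at `step`.

    Mirrors the abstract's idea of AdamW first and Muon during decay;
    the returned strings are labels only, no optimizer is constructed.
    """
    return "muon" if step >= total_steps - decay_steps else "adamw"


if __name__ == "__main__":
    # Hypothetical run: 1000 steps, 10 warmup steps, 100 decay steps.
    for s in (0, 9, 500, 900, 950, 1000):
        print(s, round(wsd_lr(s, 1000, 1.0, 10, 100), 4),
              optimizer_for_step(s, 1000, 100))
```

Passing the phase lengths as integer step counts keeps the phase boundaries exact, so the optimizer switch lands on the same step at which the decay begins.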
CL · Jun 5, 2024 · Code
Xmodel-LM Technical Report
Yichuan Wang, Yang Liu, Yu Yan et al.
We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization, Xmodel-LM exhibits remarkable performance despite its smaller size. It notably surpasses existing open-source language models of similar scale. Our model checkpoints and code are publicly accessible on GitHub at https://github.com/XiaoduoAILab/XmodelLM.
CL · Sep 23, 2025
MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
Yizhe Huang, Yang Liu, Ruiyu Zhao et al.
Large Language Model-based agents (LLM-based agents) are increasingly deployed in customer service, yet they often forget across sessions, repeat errors, and lack mechanisms for continual self-improvement. This makes them unreliable in dynamic settings where stability and consistency are critical. To better evaluate these properties, we emphasize two indicators: task success rate as a measure of overall effectiveness, and consistency metrics such as Pass^k to capture reliability across multiple trials. To address the limitations of existing approaches, we propose MemOrb, a lightweight, plug-and-play verbal reinforcement memory layer that distills multi-turn interactions into compact strategy reflections. These reflections are stored in a shared memory bank and retrieved to guide decision-making, without requiring any fine-tuning. Experiments show that MemOrb significantly improves both success rate and stability, achieving up to a 63 percentage-point gain in multi-turn success rate and delivering more consistent performance across repeated trials. Our results demonstrate that structured reflection is a powerful mechanism for enhancing the long-term reliability of frozen LLM agents in customer service scenarios.
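The store-and-retrieve loop described above can be pictured with a toy sketch. This is not MemOrb's implementation: the `ReflectionBank` class, the word-overlap (Jaccard) scoring, and the prompt layout are hypothetical stand-ins chosen so the example runs with the standard library alone. A real system would more plausibly use an embedding-based retriever and an LLM to write the reflections.

```python
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens; a deliberately crude text representation."""
    return set(text.lower().split())


@dataclass
class ReflectionBank:
    """Hypothetical shared memory bank of short strategy reflections."""
    reflections: list[str] = field(default_factory=list)

    def add(self, reflection: str) -> None:
        """Store one reflection distilled from a finished conversation."""
        self.reflections.append(reflection)

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        """Return up to top_k reflections ranked by Jaccard word overlap."""
        query_tokens = _tokens(query)

        def score(reflection: str) -> float:
            reflection_tokens = _tokens(reflection)
            union = query_tokens | reflection_tokens
            return len(query_tokens & reflection_tokens) / len(union) if union else 0.0

        ranked = sorted(self.reflections, key=score, reverse=True)
        # Drop reflections that share no words with the query.
        return [r for r in ranked[:top_k] if score(r) > 0.0]


def build_prompt(bank: ReflectionBank, user_message: str) -> str:
    """Prepend retrieved reflections to the message sent to a frozen LLM."""
    hints = bank.retrieve(user_message)
    hint_block = "\n".join(f"- {h}" for h in hints) or "- (none)"
    return f"Lessons from past sessions:\n{hint_block}\n\nCustomer: {user_message}"


if __name__ == "__main__":
    bank = ReflectionBank()
    bank.add("refund requests need the order number before any action")
    bank.add("shipping delays: offer tracking link first")
    print(build_prompt(bank, "customer asks for a refund on order"))
```

Because the reflections only change the prompt, the underlying model stays frozen, which matches the abstract's point that no fine-tuning is required.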
CL · Aug 27, 2025
Survey of Specialized Large Language Model
Chenghan Yang, Ruiyu Zhao, Yang Liu et al.
The rapid evolution of specialized large language models (LLMs) has transitioned from simple domain adaptation to sophisticated native architectures, marking a paradigm shift in AI development. This survey systematically examines this progression across healthcare, finance, legal, and technical domains. Beyond the wide use of specialized LLMs, recent LLM agents apply technical breakthroughs such as domain-native designs that go beyond fine-tuning, a growing emphasis on parameter efficiency through sparse computation and quantization, and increasing integration of multimodal capabilities. Our analysis reveals how these innovations address fundamental limitations of general-purpose LLMs in professional applications, with specialized models consistently showing performance gains on domain-specific benchmarks. The survey further highlights the implications for the e-commerce field, with the aim of filling gaps in that field.
CV · Jul 19, 2019
A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs
Ling Jiang, Yang Hu, Xilin Xia et al.
The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban floods and managing their risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents, which are more widely available. However, current high-resolution DEM reconstruction approaches mainly focus on natural topography. Few attempts have been made for urban topography, which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on a convolutional neural network (CNN) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is first trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct the urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method in terms of numerical accuracy and morphological accuracy. The proposed DEM reconstruction approach is applied to a 121 km² urbanized area in London, UK. Compared with other commonly used methods, the CNN-based approach produces superior results, providing a cost-effective, innovative method to acquire high-resolution DEMs in other data-scarce environments.