LGNov 18, 2023Code
Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-makingYueyuan Li, Songan Zhang, Mingyang Jiang et al.
Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems. However, existing simulators often fall short in diverse scenarios or interactive behavior models for traffic participants. This deficiency underscores the need for a flexible, reliable, user-friendly open-source simulator. Addressing this challenge, Tactics2D adopts a modular approach to traffic scenario construction, encompassing road elements, traffic regulations, behavior models, physics simulations for vehicles, and event detection mechanisms. By integrating numerous commonly utilized algorithms and configurations, Tactics2D empowers users to construct their driving scenarios effortlessly, just like assembling building blocks. Users can effectively evaluate the performance of driving decision-making models across various scenarios by leveraging both public datasets and user-collected real-world data. For access to the source code and community support, please visit the official GitHub page for Tactics2D at https://github.com/WoodOxen/Tactics2D.
AIMay 14
Herculean: An Agentic Benchmark for Financial IntelligenceXueqing Peng, Zhuohan Xie, Yupeng Cao et al.
As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skilled benchmark for agentic financial intelligence spanning four representative workflows, including Trading, Hedging, Market Insights, and Auditing. Each workflow is instantiated as a standardized MCP-based skill environment with its own tools, interaction dynamics, constraints, and success criteria, enabling consistent end-to-end assessment of heterogeneous agent systems. Across frontier agents, we find agents perform relatively well on Trading and Market Insights, but struggle substantially on Hedging and Auditing, where long-horizon coordination, state consistency, and structured verification are critical. Overall, our results point to a key gap in current agents in turning financial reasoning into dependable workflow execution in high-stakes financial workflows.
LGMay 12
TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation LearningFan Ma, Qier An, Peng Chen et al.
Learning transferable representations for electroencephalography (EEG) remains challenging because EEG signals are inherently multi-channel and non-stationary. Channels observed at the same time provide coupled measurements of neural activity, while the relevant temporal dynamics vary across contexts. This structure is poorly matched by architectures that apply uniform computation across time or route each channel patch independently. To this end, we propose TRACE, an autoregressive EEG pre-training framework that predicts future EEG patches from causal context while performing temporally adaptive and cross-channel coherent computation. At each temporal step, TRACE derives an expert routing decision from the causal cross-channel history and applies it jointly to all channels at that step. This preserves instantaneous cross-channel coherence while allowing different temporal regimes to activate different computation. Since routing is defined over the available channel set and causal temporal context, TRACE is compatible with heterogeneous pre-training across corpora with different channel counts, montages, sequence lengths, and recording domains. Across eight downstream EEG benchmarks, TRACE is evaluated in both settings: when downstream domains are seen only as unlabeled pre-training data and when downstream datasets are completely unseen during pre-training. It obtains the best results on several benchmarks while remaining competitive on motor imagery and clinical event classification tasks, with ablations supporting the importance of cross-channel temporal routing.
CLFeb 1Code
Ebisu: Benchmarking Large Language Models in Japanese FinanceXueqing Peng, Ruoyu Xiang, Fan Zhang et al.
Japanese finance combines agglutinative, head-final linguistic structure, mixed writing systems, and high-context communication norms that rely on indirect expression and implicit commitment, posing a substantial challenge for LLMs. We introduce Ebisu, a benchmark for native Japanese financial language understanding, comprising two linguistically and culturally grounded, expert-annotated tasks: JF-ICR, which evaluates implicit commitment and refusal recognition in investor-facing Q&A, and JF-TE, which assesses hierarchical extraction and ranking of nested financial terminology from professional disclosures. We evaluate a diverse set of open-source and proprietary LLMs spanning general-purpose, Japanese-adapted, and financial models. Results show that even state-of-the-art systems struggle on both tasks. While increased model scale yields limited improvements, language- and domain-specific adaptation does not reliably improve performance, leaving substantial gaps unresolved. Ebisu provides a focused benchmark for advancing linguistically and culturally grounded financial NLP. All datasets and evaluation scripts are publicly released.
CLJun 16, 2025
MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial ApplicationXueqing Peng, Lingfei Qian, Yan Wang et al.
Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.
LGMay 30, 2025
ROAD: Responsibility-Oriented Reward Design for Reinforcement Learning in Autonomous DrivingYongming Chen, Miner Chen, Liewen Liao et al.
Reinforcement learning (RL) in autonomous driving employs a trial-and-error mechanism, enhancing robustness in unpredictable environments. However, crafting effective reward functions remains challenging, as conventional approaches rely heavily on manual design and demonstrate limited efficacy in complex scenarios. To address this issue, this study introduces a responsibility-oriented reward function that explicitly incorporates traffic regulations into the RL framework. Specifically, we introduced a Traffic Regulation Knowledge Graph and leveraged Vision-Language Models alongside Retrieval-Augmented Generation techniques to automate reward assignment. This integration guides agents to adhere strictly to traffic laws, thus minimizing rule violations and optimizing decision-making performance in diverse driving conditions. Experimental validations demonstrate that the proposed methodology significantly improves the accuracy of assigning accident responsibilities and effectively reduces the agent's liability in traffic incidents.
CVJul 2, 2018
PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic SegmentationMingyang Jiang, Yiran Wu, Tianqi Zhao et al.
Recently, 3D understanding research sheds light on extracting features from point cloud directly, which requires effective shape pattern description of point clouds. Inspired by the outstanding 2D shape descriptor SIFT, we design a module called PointSIFT that encodes information of different orientations and is adaptive to scale of shape. Specifically, an orientation-encoding unit is designed to describe eight crucial orientations, and multi-scale representation is achieved by stacking several orientation-encoding units. PointSIFT module can be integrated into various PointNet-based architecture to improve the representation ability. Extensive experiments show our PointSIFT-based framework outperforms state-of-the-art method on standard benchmark datasets. The code and trained model will be published accompanied by this paper.