SYOct 10, 2022
Reducing Action Space: Reference-Model-Assisted Deep Reinforcement Learning for Inverter-based Volt-Var ControlQiong Liu, Ye Guo, Lirong Deng et al.
Reference-model-assisted deep reinforcement learning (DRL) for inverter-based Volt-Var Control (IB-VVC) in active distribution networks is proposed. We investigate that a large action space increases the learning difficulties of DRL and degrades the optimization performance in the process of generating data and training neural networks. To reduce the action space of DRL, we design a reference-model-assisted DRL approach. We introduce definitions of the reference model, reference-model-based optimization, and reference actions. The reference-model-assisted DRL learns the residual actions between the reference actions and optimal actions, rather than learning the optimal actions directly. Since the residual actions are considerably smaller than the optimal actions for a reference model, we can design a smaller action space for the reference-model-assisted DRL. It reduces the learning difficulties of DRL and optimises the performance of the reference-model-assisted DRL approach. It is noteworthy that the reference-model-assisted DRL approach is compatible with any policy gradient DRL algorithms for continuous action problems. This work takes the soft actor-critic algorithm as an example and designs a reference-model-assisted soft actor-critic algorithm. Simulations show that 1) large action space degrades the performance of DRL in the whole training stage, and 2) reference-model-assisted DRL requires fewer iteration times and returns a better optimization performance.
CVAug 8, 2025Code
Can Large Models Fool the Eye? A New Turing Test for Biological AnimationZijian Chen, Lirong Deng, Zhengyu Chen et al.
Evaluating the abilities of large models and manifesting their gaps are challenging. Current benchmarks adopt either ground-truth-based score-form evaluation on static datasets or indistinct textual chatbot-style human preferences collection, which may not provide users with immediate, intuitive, and perceptible feedback on performance differences. In this paper, we introduce BioMotion Arena, a novel framework for evaluating large language models (LLMs) and multimodal large language models (MLLMs) via visual animation. Our methodology draws inspiration from the inherent visual perception of motion patterns characteristic of living organisms that utilizes point-light source imaging to amplify the performance discrepancies between models. Specifically, we employ a pairwise comparison evaluation and collect more than 45k votes for 53 mainstream LLMs and MLLMs on 90 biological motion variants. Data analyses show that the crowd-sourced human votes are in good agreement with those of expert raters, demonstrating the superiority of our BioMotion Arena in offering discriminative feedback. We also find that over 90\% of evaluated models, including the cutting-edge open-source InternVL3 and proprietary Claude-4 series, fail to produce fundamental humanoid point-light groups, much less smooth and biologically plausible motions. This enables BioMotion Arena to serve as a challenging benchmark for performance visualization and a flexible evaluation framework without restrictions on ground-truth.
CVSep 6, 2025Code
PictOBI-20k: Unveiling Large Multimodal Models in Visual Decipherment for Pictographic Oracle Bone CharactersZijian Chen, Wenjie Hua, Jinhao Li et al.
Deciphering oracle bone characters (OBCs), the oldest attested form of written Chinese, has remained the ultimate, unwavering goal of scholars, offering an irreplaceable key to understanding humanity's early modes of production. Current decipherment methodologies of OBC are primarily constrained by the sporadic nature of archaeological excavations and the limited corpus of inscriptions. With the powerful visual perception capability of large multimodal models (LMMs), the potential of using LMMs for visually deciphering OBCs has increased. In this paper, we introduce PictOBI-20k, a dataset designed to evaluate LMMs on the visual decipherment tasks of pictographic OBCs. It includes 20k meticulously collected OBC and real object images, forming over 15k multi-choice questions. We also conduct subjective annotations to investigate the consistency of the reference point between humans and LMMs in visual reasoning. Experiments indicate that general LMMs possess preliminary visual decipherment skills, and LMMs are not effectively using visual information, while most of the time they are limited by language priors. We hope that our dataset can facilitate the evaluation and optimization of visual attention in future OBC-oriented LMMs. The code and dataset will be available at https://github.com/OBI-Future/PictOBI-20k.
CVAug 9, 2025
MMReID-Bench: Unleashing the Power of MLLMs for Effective and Versatile Person Re-identificationJinhao Li, Zijian Chen, Lirong Deng et al.
Person re-identification (ReID) aims to retrieve the images of an interested person in the gallery images, with wide applications in medical rehabilitation, abnormal behavior detection, and public security. However, traditional person ReID models suffer from uni-modal capability, leading to poor generalization ability in multi-modal data, such as RGB, thermal, infrared, sketch images, textual descriptions, etc. Recently, the emergence of multi-modal large language models (MLLMs) shows a promising avenue for addressing this problem. Despite this potential, existing methods merely regard MLLMs as feature extractors or caption generators, which do not fully unleash their reasoning, instruction-following, and cross-modal understanding capabilities. To bridge this gap, we introduce MMReID-Bench, the first multi-task multi-modal benchmark specifically designed for person ReID. The MMReID-Bench includes 20,710 multi-modal queries and gallery images covering 10 different person ReID tasks. Comprehensive experiments demonstrate the remarkable capabilities of MLLMs in delivering effective and versatile person ReID. Nevertheless, they also have limitations in handling a few modalities, particularly thermal and infrared data. We hope MMReID-Bench can facilitate the community to develop more robust and generalizable multimodal foundation models for person ReID.
AIMar 30, 2022
Reducing Learning Difficulties: One-Step Two-Critic Deep Reinforcement Learning for Inverter-based Volt-Var ControlQiong Liu, Ye Guo, Lirong Deng et al.
A one-step two-critic deep reinforcement learning (OSTC-DRL) approach for inverter-based volt-var control (IB-VVC) in active distribution networks is proposed in this paper. Firstly, considering IB-VVC can be formulated as a single-period optimization problem, we formulate the IB-VVC as a one-step Markov decision process rather than the standard Markov decision process, which simplifies the DRL learning task. Then we design the one-step actor-critic DRL scheme which is a simplified version of recent DRL algorithms, and it avoids the issue of Q value overestimation successfully. Furthermore, considering two objectives of VVC: minimizing power loss and eliminating voltage violation, we utilize two critics to approximate the rewards of two objectives separately. It simplifies the approximation tasks of each critic, and avoids the interaction effect between two objectives in the learning process of critic. The OSTC-DRL approach integrates the one-step actor-critic DRL scheme and the two-critic technology. Based on the OSTC-DRL, we design two centralized DRL algorithms. Further, we extend the OSTC-DRL to multi-agent OSTC-DRL for decentralized IB-VVC and design two multi-agent DRL algorithms. Simulations demonstrate that the proposed OSTC-DRL has a faster convergence rate and a better control performance, and the multi-agent OSTC-DRL works well for decentralized IB-VVC problems.