Xinjia Yu

CL
h-index: 5
3 papers
15 citations
Novelty: 42%
AI Score: 31

3 Papers

CL · Jun 2, 2025
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

Chaoyue He, Xin Zhou, Yi Wu et al.

We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social, and Governance (ESG) and sustainability-focused question answering. ESGenius comprises two key components: (i) ESGenius-QA, a collection of 1,136 Multiple-Choice Questions (MCQs) generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics. Each question is systematically linked to its corresponding source text, enabling transparent evaluation and supporting Retrieval-Augmented Generation (RAG) methods; and (ii) ESGenius-Corpus, a meticulously curated repository of 231 foundational frameworks, standards, reports, and recommendation documents from 7 authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of LLMs, we implement a rigorous two-stage evaluation protocol -- Zero-Shot and RAG. Extensive experiments across 50 LLMs (0.5B to 671B) demonstrate that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies around 55--70%, highlighting a significant knowledge gap for LLMs in this specialized, interdisciplinary domain. However, models employing RAG demonstrate significant performance improvements, particularly for smaller models. For example, DeepSeek-R1-Distill-Qwen-14B improves from 63.82% (zero-shot) to 80.46% with RAG. These results demonstrate the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first comprehensive QA benchmark designed to rigorously evaluate LLMs on ESG and sustainability knowledge, providing a critical tool to advance trustworthy AI in this vital domain.
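The zero-shot versus RAG comparison described in the abstract comes down to scoring multiple-choice answers under two settings and comparing the accuracies. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the function names `mcq_accuracy` and `percentage_point_gain` and the toy answer lists are hypothetical, and only the two reported accuracies (63.82% and 80.46%) come from the abstract.

```python
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of multiple-choice questions answered correctly.

    Hypothetical helper: compares option letters case-insensitively.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


def percentage_point_gain(zero_shot_acc: float, rag_acc: float) -> float:
    """Return the absolute gain (in percentage points) from zero-shot to RAG."""
    return rag_acc - zero_shot_acc


# Toy data (assumed, for illustration only; not drawn from ESGenius-QA).
answers = ["A", "C", "B", "D"]
zero_shot_preds = ["A", "B", "B", "A"]
rag_preds = ["A", "C", "B", "A"]

toy_zero_shot = mcq_accuracy(zero_shot_preds, answers)
toy_rag = mcq_accuracy(rag_preds, answers)
toy_gain = percentage_point_gain(toy_zero_shot, toy_rag)

# Gain implied by the figures reported in the abstract for
# DeepSeek-R1-Distill-Qwen-14B (63.82% zero-shot, 80.46% with RAG).
reported_gain = round(percentage_point_gain(63.82, 80.46), 2)
```

Applied to the figures in the abstract, the gain for DeepSeek-R1-Distill-Qwen-14B is 16.64 percentage points.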

CV · Apr 5, 2019
Prediction-Tracking-Segmentation

Jianren Wang, Yihui He, Xiaobo Wang et al.

We introduce a prediction-driven method for visual tracking and segmentation in videos. Instead of relying solely on matching appearance cues for tracking, we build a predictive model that efficiently guides the search for more accurate tracking regions. With the proposed prediction mechanism, we improve the model's robustness against distractions and occlusions during tracking. We demonstrate significant improvements over state-of-the-art methods not only on visual tracking tasks (VOT 2016 and VOT 2018) but also on video segmentation datasets (DAVIS 2016 and DAVIS 2017).

CY · Jan 21, 2016
Emotional Interaction between Artificial Companion Agents and the Elderly

Xinjia Yu

Artificial companion agents are defined as hardware or software entities designed to provide companionship to a person. The senior population has a particular need for companionship. Artificial companion agents have been demonstrated to be useful in therapy, offering emotional companionship and facilitating socialization. However, there is a lack of empirical studies on what artificial agents should do and how they can communicate better with human beings. To address these functional research problems, we attempt to establish a model that guides artificial companion designers in meeting the emotional needs of the elderly by fulfilling absent roles in their social interactions. We call this model the Role Fulfilling Model. The model uses role as a key concept to analyse, from an emotional perspective, the functionalities the elderly require of artificial companion agent designs and technologies. To evaluate the effectiveness of this model, we propose a serious game platform named Happily Aging in Place. This game will help us involve a large number of senior users through crowdsourcing to test our model and hypothesis. To improve emotional communication between artificial companion agents and users, this book draft addresses an important but largely overlooked aspect of affective computing: how can companion agents express mixed emotions through facial expressions? Furthermore, does individual heterogeneity among users affect the perception of the same facial expressions? Some preliminary results on gender differences have been found. The perception of facial expressions across different age groups and cultural backgrounds will be examined in future studies.