CL · Jul 4, 2024
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
Amy Xin, Yunjia Qi, Zijun Yao et al. · pku
Specialized entity linking (EL) models are well trained to map mentions to unique knowledge base (KB) entities according to a given context. However, they struggle to disambiguate long-tail entities because their training data is limited. Extensively pre-trained large language models (LLMs), by contrast, possess broader knowledge of uncommon entities. Yet, lacking specialized EL training, LLMs frequently fail to generate accurate KB entity names, which limits their standalone effectiveness in EL. Observing that LLMs are more adept at context generation than at EL execution, we introduce LLM-Augmented Entity Linking (LLMAEL), the first framework to enhance specialized EL models with LLM data augmentation. LLMAEL leverages off-the-shelf, tuning-free LLMs as context augmenters, generating entity descriptions that serve as additional input for specialized EL models. Experiments show that LLMAEL sets new state-of-the-art results across 6 widely adopted EL benchmarks: compared to prior methods that integrate tuning-free LLMs into EL, LLMAEL achieves an absolute 8.9% gain in EL accuracy. We release our code and datasets.
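The augmentation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names (`augment_context`, `link_with_augmentation`), the choice to append the description to the context, and both stubs are assumptions made here, standing in for a real LLM call and a real specialized EL model.

```python
from typing import Callable

# Hypothetical interfaces: (mention, context) -> text.
Describer = Callable[[str, str], str]   # stands in for a tuning-free LLM
Linker = Callable[[str, str], str]      # stands in for a specialized EL model


def augment_context(mention: str, context: str, describe: Describer) -> str:
    """Append an LLM-generated entity description to the original context."""
    description = describe(mention, context).strip()
    return f"{context} {description}" if description else context


def link_with_augmentation(
    mention: str, context: str, describe: Describer, linker: Linker
) -> str:
    """Run the EL model on the augmented context instead of the raw one."""
    return linker(mention, augment_context(mention, context, describe))


# Stubs for illustration only (assumptions, not real models).
def stub_describe(mention: str, context: str) -> str:
    return f"{mention} is the smallest planet in the Solar System."


def stub_linker(mention: str, context: str) -> str:
    # Toy disambiguation: look for a keyword in the context.
    return "Mercury (planet)" if "planet" in context.lower() else "Mercury (element)"


raw_context = "Mercury was visible at dawn."
baseline = stub_linker("Mercury", raw_context)
result = link_with_augmentation("Mercury", raw_context, stub_describe, stub_linker)
```

In this toy setup the raw context lacks the disambiguating keyword, so the stub linker only picks the planet once the generated description is appended.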
AI · Oct 23, 2025
Multi-Step Reasoning for Embodied Question Answering via Tool Augmentation
Mingliang Zhai, Hansheng Liang, Xiaomeng Fan et al.
Embodied Question Answering (EQA) requires agents to explore 3D environments to obtain observations and answer questions related to the scene. Existing methods leverage VLMs to directly explore the environment and answer questions without explicit thinking or planning, which limits their reasoning ability and results in excessive or inefficient exploration as well as ineffective responses. In this paper, we introduce ToolEQA, an agent that integrates external tools with multi-step reasoning. The external tools provide more useful information for completing the task, helping the model derive better exploration directions in the next reasoning step and thus obtain additional effective information. This enables ToolEQA to generate more accurate responses with a shorter exploration distance. To enhance the model's ability for tool use and multi-step reasoning, we further design a novel EQA data generation pipeline that automatically constructs large-scale EQA tasks with reasoning trajectories and corresponding answers. Based on the pipeline, we collect the EQA-RT dataset, which contains about 18K tasks divided into a training set, EQA-RT-Train, and two test sets, EQA-RT-Seen (scenes overlapping with the training set) and EQA-RT-Unseen (novel scenes). Experiments on EQA-RT-Seen and EQA-RT-Unseen show that ToolEQA improves the success rate by 9.2% to 20.2% over state-of-the-art baselines, while outperforming the zero-shot ToolEQA by 10% in success rate. In addition, ToolEQA achieves state-of-the-art performance on the HM-EQA, OpenEQA, and EXPRESS-Bench datasets, demonstrating its generality. Our homepage is at https://tooleqa.github.io.
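The general idea of interleaving tool calls with multi-step reasoning can be sketched as a simple loop. This is a generic illustration under stated assumptions, not ToolEQA's actual architecture: the `policy` and `tools` interfaces, the `"answer"` action name, and the stub tool `detect` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A policy returns (tool_name, argument), or ("answer", final_answer) to stop.
Policy = Callable[[str, List[str]], Tuple[str, str]]


def run_episode(
    question: str, policy: Policy, tools: Dict[str, Tool], max_steps: int = 5
) -> Tuple[str, List[str]]:
    """Alternate reasoning steps and tool calls until the policy answers."""
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = policy(question, history)
        if action == "answer":
            return arg, history
        observation = tools[action](arg)  # the tool result feeds the next step
        history.append(f"{action}({arg}) -> {observation}")
    return "unknown", history  # step budget exhausted without an answer


# Stubs for illustration only (assumptions, not real models or tools).
def stub_policy(question: str, history: List[str]) -> Tuple[str, str]:
    if not history:
        return ("detect", "sofa")
    return ("answer", history[-1].split("-> ")[-1])


stub_tools: Dict[str, Tool] = {"detect": lambda name: "red"}

answer, trace = run_episode("What color is the sofa?", stub_policy, stub_tools)
```

Capping the loop with `max_steps` is one simple way to bound exploration; how the actual agent limits its exploration distance is not specified in the abstract.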