CL | Feb 24, 2025

HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization

Zhenghao Liu, Haolan Wang, Xinze Li, Qiushi Xiong, Xiaocui Yang, Yu Gu, Yukun Yan, Qi Shi, Fangfang Li, Ge Yu, Maosong Sun

arXiv:2502.17315v1 | 10.94 citations | h-index: 11 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of capturing structural semantics in tabular data for AI systems, representing an incremental advancement in multimodal learning for table reasoning.

The paper aims to improve table understanding in large language models. It introduces HIPPO, a hybrid-modal preference optimization method that represents tables as both text and images, and reports a 4% improvement over various table reasoning models on table question answering and table fact verification tasks.

Tabular data contains rich structural semantics and plays a crucial role in organizing and manipulating information. To better capture these structural semantics, this paper introduces the HybrId-modal Preference oPtimizatiOn (HIPPO) model, which represents tables using both text and images, and optimizes MLLMs to effectively learn more comprehensive table information from these multiple modalities. Specifically, HIPPO samples model responses from hybrid-modal table representations and designs a modality-consistent sampling strategy to enhance response diversity and mitigate modality bias during DPO training. Experimental results on table question answering and table fact verification tasks demonstrate the effectiveness of HIPPO, achieving a 4% improvement over various table reasoning models. Further analysis reveals that HIPPO not only enhances reasoning abilities based on unimodal table representations but also facilitates the extraction of crucial and distinct semantics from different modal representations. All data and code are available at https://github.com/NEUIR/HIPPO.
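
The abstract mentions sampling responses from different table representations and then training with DPO. As a rough illustration only, the sketch below pairs correct and incorrect sampled responses and evaluates the standard DPO loss for one pair. The `Sample` fields, the `build_pairs` rule, and the default `beta` are assumptions made for this example; the abstract does not describe the modality-consistent sampling strategy in detail, so this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled response. All field names are hypothetical."""
    modality: str   # "text", "image", or "hybrid" table input
    answer: str     # the model's final answer
    correct: bool   # whether the answer matches the gold label


def build_pairs(samples):
    """Pair every correct response (chosen) with every incorrect one (rejected).

    This pairing rule is an assumption for illustration; pairs may mix modalities.
    """
    chosen = [s for s in samples if s.correct]
    rejected = [s for s in samples if not s.correct]
    return [(c, r) for c in chosen for r in rejected]


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); for very negative m it is about -m.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin


if __name__ == "__main__":
    samples = [
        Sample("text", "Paris", True),
        Sample("image", "Lyon", False),
        Sample("hybrid", "Paris", True),
    ]
    for chosen, rejected in build_pairs(samples):
        print(chosen.modality, "preferred over", rejected.modality)
    # Toy log-probabilities: the policy already favors the chosen response.
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

In a real training setup the log-probabilities would come from the policy and a frozen reference model scoring the same responses; the loss shrinks as the policy assigns relatively more probability to the chosen response than the reference does.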
