IR · AI · Dec 5, 2024

HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

Manish Bhattarai, Ryan Barron, Maksim Eren, Minh Vu, Vesselin Grantcharov, Ismael Boureima, Valentin Stanev, Cynthia Matuszek, Vladimir Valtchinov, Kim Rasmussen, Boian Alexandrov

arXiv:2412.04661v116.612 citations · h-index: 25 · Has Code

Originality: Highly original

AI Analysis

HEAL addresses the need for efficient domain adaptation in retrieval-augmented generation, offering a method that is less computationally intensive than full fine-tuning.

The paper tackles the problem of aligning large language model embeddings with domain-specific content to improve retrieval relevance and reduce hallucinations in retrieval-augmented generation, achieving enhanced retrieval and classification across domains like healthcare and cybersecurity.

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external document retrieval to provide domain-specific or up-to-date knowledge. The effectiveness of RAG depends on the relevance of retrieved documents, which is influenced by the semantic alignment of embeddings with the domain's specialized content. Although full fine-tuning can align language models to specific domains, it is computationally intensive and demands substantial data. This paper introduces Hierarchical Embedding Alignment Loss (HEAL), a novel method that leverages hierarchical fuzzy clustering with matrix factorization within contrastive learning to efficiently align LLM embeddings with domain-specific content. HEAL computes level/depth-wise contrastive losses and incorporates hierarchical penalties to align embeddings with the underlying relationships in label hierarchies. This approach enhances retrieval relevance and document classification, effectively reducing hallucinations in LLM outputs. In our experiments, we benchmark and evaluate HEAL across diverse domains, including Healthcare, Material Science, Cyber-security, and Applied Maths.
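To make the idea of level-wise contrastive losses with hierarchical penalties more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes hard cluster labels at each hierarchy level (the paper describes fuzzy clustering via matrix factorization), a standard supervised contrastive loss at each level, and hand-picked per-level weights. The function names `supcon_loss` and `hierarchical_contrastive_loss`, the temperature, and the weights are all hypothetical choices for illustration.

```python
import numpy as np

def supcon_loss(emb, labels, temperature=0.1):
    """Supervised contrastive loss for one hierarchy level (batch of >= 2 samples).

    Samples sharing a label are treated as positives for each other.
    Assumption: hard labels; the paper's clustering is fuzzy.
    """
    labels = np.asarray(labels)
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize rows
    sim = z @ z.T / temperature                           # scaled cosine similarity
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    sim = sim - sim.max(axis=1, keepdims=True)            # numerical stability
    exp_sim = np.exp(sim) * not_self                      # exclude self-pairs
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                                   # skip anchors without positives
    if not has_pos.any():
        return 0.0
    mean_log_prob_pos = (log_prob * pos).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(-mean_log_prob_pos.mean())

def hierarchical_contrastive_loss(emb, level_labels, level_weights, temperature=0.1):
    """Weighted sum of per-level contrastive losses.

    level_labels[l] holds the cluster id of every sample at depth l;
    level_weights[l] is the (assumed) penalty applied to that depth.
    """
    total = 0.0
    for labels, weight in zip(level_labels, level_weights):
        total += weight * supcon_loss(emb, labels, temperature)
    return total

# Toy batch: two tight groups of two embeddings each.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
level_labels = [
    [0, 0, 1, 1],  # depth 0: coarse clusters
    [0, 1, 2, 3],  # depth 1: every sample in its own leaf (no positives)
]
level_weights = [1.0, 0.5]  # illustrative values only
loss = hierarchical_contrastive_loss(emb, level_labels, level_weights)
```

In this toy batch the embeddings already agree with the coarse clusters, so the loss is small; assigning labels that cut across the two groups would raise it. In a training setup, this scalar would be minimized with respect to the parameters of the embedding model or an adapter on top of it.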
