DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs
This addresses the problem of limited LLM progress in specialized fields like molecular analysis, though it appears incremental as it builds on retrieval-augmented methods for a specific domain.
The paper tackles the challenge of LLMs struggling with domain-specific syntax like biomolecules by proposing DRAK, a retrieval-augmented knowledge framework, which exceeds previous benchmarks on six molecular tasks in the Mol-Instructions dataset.
Large Language Models (LLMs) encounter challenges with the unique syntax of specific domains, such as biomolecules. Existing fine-tuning or modality alignment techniques struggle to bridge the domain knowledge gap and understand complex molecular data, limiting LLMs' progress in specialized fields. To overcome these limitations, we propose an expandable and adaptable non-parametric knowledge injection framework named Domain-specific Retrieval-Augmented Knowledge (DRAK), aimed at enhancing reasoning capabilities in specific domains. Utilizing knowledge-aware prompts and gold label-induced reasoning, DRAK has developed profound expertise in the molecular domain and the capability to handle a broad spectrum of analysis tasks. We evaluated two distinct forms of DRAK variants, proving that DRAK exceeds previous benchmarks on six molecular tasks within the Mol-Instructions dataset. Extensive experiments have underscored DRAK's formidable performance and its potential to unlock molecular insights, offering a unified paradigm for LLMs to tackle knowledge-intensive tasks in specific domains. Our code will be available soon.