CL AI IR LG SD AS · Aug 8, 2024

wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech

Khai Le-Duc, Quy-Anh Dang, Tan-Hanh Pham, Truong-Son Hy

arXiv:2408.04174v1 · 1.91 citations · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the neglect of speech data in knowledge graphs, data that could further enhance LLMs and search engines. The contribution is incremental, however, because it applies existing GNN methods to a new modality.

The paper tackles the problem of knowledge graphs being limited to text data by introducing wav2graph, the first framework for supervised learning of knowledge graphs from speech, and reports baseline results for node classification and link prediction on both human and ASR transcripts.

Knowledge graphs (KGs) enhance the performance of large language models (LLMs) and search engines by providing structured, interconnected data that improves reasoning and context-awareness. However, existing KGs focus only on text data, neglecting other modalities such as speech. In this work, we introduce wav2graph, the first framework for supervised learning of knowledge graphs from speech data. Our pipeline is straightforward: (1) constructing a KG from transcribed spoken utterances and a named entity database, (2) converting the KG into embedding vectors, and (3) training graph neural networks (GNNs) for node classification and link prediction tasks. Through extensive experiments in inductive and transductive learning settings using state-of-the-art GNN models, we provide baseline results and error analysis for node classification and link prediction on human transcripts and automatic speech recognition (ASR) transcripts, including evaluations with both encoder-based and decoder-based node embeddings, as well as monolingual and multilingual acoustic pre-trained models. All related code, data, and models are published online.
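The three-step pipeline above can be illustrated with a toy, dependency-free sketch. Everything in it is hypothetical and not taken from the paper: the transcripts, the entity database, the rule that links an utterance to every single-token entity it mentions, and the hashed character-trigram vectors standing in for the encoder-based or decoder-based node embeddings the authors evaluate. Step 3 is reduced to one untrained round of mean-neighbour message passing followed by a sigmoid dot-product link scorer; the paper's GNN architectures, training procedure, and node-classification head are not reproduced.

```python
import math
import re
import zlib
from collections import defaultdict

# Hypothetical inputs: transcripts (human or ASR) and a named-entity database
# mapping surface strings to entity types. Neither reflects the paper's data.
TRANSCRIPTS = {
    "utt1": "book a flight to paris on monday",
    "utt2": "what is the weather in paris",
    "utt3": "remind me on monday to call alice",
}
ENTITY_DB = {"paris": "location", "monday": "date", "alice": "person"}


def build_kg(transcripts, entity_db):
    """Step 1 (assumed rule): connect each utterance node to every
    single-token entity from the database that appears in its transcript."""
    labels, edges = {}, set()
    for utt_id, text in transcripts.items():
        labels[utt_id] = "utterance"
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for entity, etype in entity_db.items():
            if entity in tokens:
                labels[entity] = etype  # node label = entity type
                edges.add((utt_id, entity))
    return labels, edges


def embed(text, dim=16):
    """Step 2 (stand-in): L2-normalised hashed character-trigram vector.
    crc32 is used instead of hash() so results are stable across runs."""
    vec = [0.0] * dim
    padded = f"  {text} "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def mean_aggregate(features, edges):
    """Step 3 (partial): one round of untrained message passing where each
    node's new vector is the mean of its own and its neighbours' vectors."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    out = {}
    for node, h in features.items():
        group = [h] + [features[n] for n in neighbours[node]]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def link_score(hidden, u, v):
    """Sigmoid of the dot product: a common decoder for link prediction."""
    dot = sum(a * b for a, b in zip(hidden[u], hidden[v]))
    return 1.0 / (1.0 + math.exp(-dot))


if __name__ == "__main__":
    labels, edges = build_kg(TRANSCRIPTS, ENTITY_DB)
    # Utterance nodes are embedded from their transcript, entity nodes from their name.
    features = {n: embed(TRANSCRIPTS.get(n, n)) for n in labels}
    hidden = mean_aggregate(features, edges)
    print(sorted(edges))
    print(round(link_score(hidden, "utt1", "paris"), 3))
```

Running it prints the five utterance-entity edges and a link score for the pair (utt1, paris). In a real setup the aggregation layer would carry learned weight matrices and be trained on held-out edges (link prediction) or on entity-type labels (node classification), in either a transductive or an inductive split.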

