Slot Filling as a Reasoning Task for SpeechLLMs
This work targets slot-filling accuracy in speech processing, though the contribution appears incremental: it adapts existing reasoning methods to speechLLMs.
The authors tackle slot filling in speech large language models by integrating reasoning through a chain-of-thought framework. Intermediate reasoning steps improve performance, but a reasoning LLM developed mainly for math, logic and coding domains may be inferior as a foundation model.
We propose integrating reasoning into speech large language models (speechLLMs) for the end-to-end slot-filling task. Inspired by the recent development of reasoning LLMs, we use a chain-of-thought framework to decompose the slot-filling task into multiple reasoning steps, create a reasoning dataset, and apply supervised fine-tuning to a speechLLM. We distinguish between regular and reasoning speechLLMs and experiment with different types and sizes of LLMs as their text foundation models. We demonstrate performance improvements from introducing intermediate reasoning steps. However, we show that a reasoning textual LLM developed mainly for math, logic and coding domains might be inferior as a foundation model for a reasoning speechLLM. We further show that hybrid speechLLMs, built on a hybrid text foundation LLM and fine-tuned to preserve both direct and reasoning modes of operation, perform better than those fine-tuned with only one mode of operation.
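To make the distinction between the direct and reasoning modes concrete, the following is a minimal sketch of how a supervised fine-tuning target might be built for one utterance. The abstract does not specify the reasoning steps, so the three-step layout, the `<think>` delimiters, the JSON answer format, and the function name `build_target` are all assumptions made for illustration, not the authors' method.

```python
import json


def build_target(transcript, slots, mode="reasoning"):
    """Return a fine-tuning target string for one utterance.

    slots: list of (slot_type, value) pairs.
    The step layout below is hypothetical; the source does not
    describe the actual decomposition used to build its dataset.
    """
    answer = json.dumps({slot_type: value for slot_type, value in slots})
    if mode == "direct":
        # Direct mode: the target is only the final slot dictionary.
        return answer
    if mode != "reasoning":
        raise ValueError(f"unknown mode: {mode}")
    # Reasoning mode: intermediate steps precede the final answer.
    steps = [
        f"Step 1 (transcript): {transcript}",
        "Step 2 (mentions): " + ", ".join(value for _, value in slots),
        "Step 3 (types): "
        + ", ".join(f"{value} -> {slot_type}" for slot_type, value in slots),
    ]
    return "<think>\n" + "\n".join(steps) + "\n</think>\n" + answer


if __name__ == "__main__":
    example_slots = [("city", "Boston"), ("date", "tomorrow")]
    text = "book a flight to Boston tomorrow"
    print(build_target(text, example_slots, mode="direct"))
    print(build_target(text, example_slots, mode="reasoning"))
```

Under these assumptions, a hybrid training set would simply contain targets of both kinds, so that one fine-tuned model keeps both modes of operation.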