Improving Speech Recognition Accuracy of Local POI Using Geographical Models
This work addresses inaccurate voice search for local POI caused by multiple dialects and a massive number of POI. The contribution is incremental: it builds on existing speech recognition methods and adapts them with geographic information.
This paper tackles the challenge of speech recognition for local points of interest (POI) by proposing a geographic acoustic model (Geo-AM) and geo-specific language models (Geo-LMs), resulting in a 6.5% to 10.1% relative character error rate reduction on an accent test set and over 18.7% relative reduction on a Tencent Map task.
Voice search for points of interest (POI) is becoming increasingly popular. However, speech recognition for local POI remains a challenge because of the many dialects involved and the massive number of POI. This paper improves speech recognition accuracy for local POI in two ways. First, a geographic acoustic model (Geo-AM) is proposed; it handles the multi-dialect problem using a dialect-specific input feature and a dialect-specific top layer. Second, a group of geo-specific language models (Geo-LMs) is integrated into the speech recognition system to improve recognition accuracy for long-tail and homophone POI. During decoding, the relevant language models are selected on demand according to the user's geographic location. Experiments show that the proposed Geo-AM achieves a 6.5% to 10.1% relative character error rate (CER) reduction on an accent test set, and that the Geo-AM and Geo-LMs together achieve over 18.7% relative CER reduction on the Tencent Map task.
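The on-demand selection of geo-specific language models during decoding can be illustrated with a small sketch. This is not the paper's implementation: the bounding boxes, the toy probability tables, the candidate names, the linear interpolation, and the weight of 0.5 are all assumptions made for illustration. It shows only the general idea that a location-selected model can separate two homophone POI candidates that a general model scores similarly.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Region:
    """A rough rectangular region (hypothetical; real systems may use finer geo-partitions)."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


# Illustrative bounding boxes only, not taken from the paper.
REGIONS: List[Region] = [
    Region("beijing", 39.4, 41.1, 115.4, 117.5),
    Region("shenzhen", 22.4, 22.9, 113.7, 114.7),
]

# Toy "language models": probability of each homophone POI candidate.
# All numbers are made up for this sketch.
GENERAL_LM: Dict[str, float] = {"candidate_A": 0.6, "candidate_B": 0.4}
GEO_LMS: Dict[str, Dict[str, float]] = {
    "beijing": {"candidate_A": 0.9, "candidate_B": 0.1},
    "shenzhen": {"candidate_A": 0.1, "candidate_B": 0.9},
}


def select_region(lat: float, lon: float) -> Optional[str]:
    """Return the name of the first region containing the location, if any."""
    for region in REGIONS:
        if region.contains(lat, lon):
            return region.name
    return None


def lm_score(candidate: str, lat: float, lon: float, geo_weight: float = 0.5) -> float:
    """Interpolate the general LM with the geo LM chosen by location.

    Falls back to the general LM alone when no region matches.
    """
    p_general = GENERAL_LM.get(candidate, 0.0)
    region = select_region(lat, lon)
    if region is None:
        return p_general
    p_geo = GEO_LMS[region].get(candidate, 0.0)
    return (1.0 - geo_weight) * p_general + geo_weight * p_geo


def best_candidate(candidates: List[str], lat: float, lon: float) -> str:
    """Pick the candidate with the highest interpolated LM score."""
    return max(candidates, key=lambda c: lm_score(c, lat, lon))


if __name__ == "__main__":
    homophones = ["candidate_A", "candidate_B"]
    print(best_candidate(homophones, 39.9, 116.4))   # user located in the Beijing box
    print(best_candidate(homophones, 22.54, 114.06)) # user located in the Shenzhen box
```

In this toy setup the general model slightly prefers one candidate everywhere, while the location-selected model shifts the decision toward the POI that is plausible near the user. A production decoder would apply such scores inside the search rather than over a final candidate list.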