CL · AI · LG · Jan 28, 2021

BERTaú: Itaú BERT for digital customer service

arXiv:2101.12015v3 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, domain-specific NLP models in banking customer service, though it is incremental as it adapts an existing BERT architecture to a specialized dataset.

The paper builds a Portuguese financial-domain language model for digital customer service, BERTaú, which improves FAQ retrieval MRR by 22%, sentiment analysis F1 by 2.1%, and NER F1 by 4.4% over off-the-shelf baselines, while representing the same sequence in up to 66% fewer tokens.

In the last few years, three topics have received increased interest: deep learning, NLP, and conversational agents. Bringing these three together to create a strong digital customer experience, deploy it in production, and solve real-world problems is innovative and disruptive. We introduce a new Portuguese financial-domain language representation model called BERTaú. BERTaú is an uncased BERT-base model trained from scratch with data from the Itaú virtual assistant chatbot solution. Our novel contribution is that the BERTaú pretrained language model requires less data, reaches state-of-the-art performance in three NLP tasks, and yields a smaller, lighter model that makes deployment feasible. We developed three tasks to validate our model: information retrieval with Frequently Asked Questions (FAQ) from Itaú bank, sentiment analysis on our virtual assistant data, and a named entity recognition (NER) solution. All proposed tasks are real-world solutions in production in our environment, and using a specialist model proved effective when compared with Google's multilingual BERT and Facebook's DPRQuestionEncoder, available at Hugging Face. BERTaú improves performance by 22% in FAQ retrieval MRR, 2.1% in sentiment analysis F1 score, and 4.4% in NER F1 score, and can represent the same sequence in up to 66% fewer tokens than off-the-shelf models.
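The FAQ retrieval result above is reported as mean reciprocal rank (MRR). The following is a minimal sketch of how that metric is commonly computed, assuming each query has exactly one correct FAQ entry. The function name and the FAQ identifiers are hypothetical and are not taken from the paper, whose evaluation code is not shown here.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries.

    rankings[i] is the ordered list of candidates returned for query i,
    best first; gold[i] is the single correct candidate for that query.
    A query whose correct item is missing from its ranking contributes 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, correct in zip(rankings, gold):
        for position, candidate in enumerate(ranking, start=1):
            if candidate == correct:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical FAQ ids, for illustration only.
rankings = [
    ["faq_3", "faq_1", "faq_7"],  # correct answer at rank 2 -> 1/2
    ["faq_2", "faq_5"],           # correct answer at rank 1 -> 1
    ["faq_9", "faq_4"],           # correct answer not retrieved -> 0
]
gold = ["faq_1", "faq_2", "faq_8"]
mrr = mean_reciprocal_rank(rankings, gold)
print(mrr)  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```

Because MRR rewards placing the correct FAQ near the top of the list, it suits a customer service setting where users typically read only the first few suggestions.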
