AIMay 6
FinRAG-12B: A Production-Validated Recipe for Grounded Question Answering in BankingDenys Katerenchuk, Pablo Duboue, Keelan Evanini et al.
Large language models (LLMs) are rapidly being adopted across various domains. However, their adoption in banking industry faces resistance due to demands for high accuracy, regulatory compliance, and the need for verifiable and grounded responses. We present a unified, data-efficient framework for training grounded domain-specific LLMs that optimizes answer quality, citation grounding, and calibrated refusal under real-world deployment constraints. First, we describe a data generation pipeline that combines LLM-as-a-Judge filtering, citation annotation, and curriculum learning with only 143M tokens. The resulting 12B model achieves high answer quality outperforming GPT-4.1 on citation grounding, with a modest citation tradeoff versus the untuned base. Second, we propose a calibrated refusal mechanism: training on 22% unanswerable examples yield a 12% "I don't know" rate, substantially improving over the base model's unsafe 4.3% rate while avoiding GPT-4.1's over-refusal (20.2%). Third, we present an end-to-end methodology spanning from data curation to quantized serving. The system is deployed at 40+ financial institutions, achieving a 7.1 percentage point improvement in query resolution (p < 0.001). Additionally, the model delivers 3-5x faster responses at 20-50x lower cost compared to GPT-4.1.
CLMay 22, 2024
''You should probably read this'': Hedge Detection in TextDenys Katerenchuk, Rivka Levitan
Humans express ideas, beliefs, and statements through language. The manner of expression can carry information indicating the author's degree of confidence in their statement. Understanding the certainty level of a claim is crucial in areas such as medicine, finance, engineering, and many others where errors can lead to disastrous results. In this work, we apply a joint model that leverages words and part-of-speech tags to improve hedge detection in text and achieve a new top score on the CoNLL-2010 Wikipedia corpus.
SOC-PHJan 5, 2025
Traits of a Leader: User Influence Level Prediction through Sociolinguistic ModelingDenys Katerenchuk, Rivka Levitan
Recognition of a user's influence level has attracted much attention as human interactions move online. Influential users have the ability to sway others' opinions to achieve some goals. As a result, predicting users' level of influence can help to understand social networks, forecast trends, prevent misinformation, etc. However, predicting user influence is a challenging problem because the concept of influence is specific to a situation or a domain, and user communications are limited to text. In this work, we define user influence level as a function of community endorsement and develop a model that significantly outperforms the baseline by leveraging demographic and personality data. This approach consistently improves RankDCG scores across eight different domains.
CLDec 20, 2018
A Survey of Hierarchy Identification in Social NetworksDenys Katerenchuk
Humans are social by nature. Throughout history, people have formed communities and built relationships. Most relationships with coworkers, friends, and family are developed during face-to-face interactions. These relationships are established through explicit means of communications such as words and implicit such as intonation, body language, etc. By analyzing human interactions we can derive information about the relationships and influence among conversation participants. However, with the development of the Internet, people started to communicate through text in online social networks. Interestingly, they brought their communicational habits to the Internet. Many social network users form relationships with each other and establish communities with leaders and followers. Recognizing these hierarchical relationships is an important task because it will help to understand social networks and predict future trends, improve recommendations, better target advertisement, and improve national security by identifying leaders of anonymous terror groups. In this work, I provide an overview of current research in this area and present the state-of-the-art approaches to deal with the problem of identifying hierarchical relationships in social networks.
CYApr 6, 2018
Understanding Actors and Evaluating Personae with Gaussian EmbeddingsHannah Kim, Denys Katerenchuk, Daniel Billet et al.
Understanding narrative content has become an increasingly popular topic. Nonetheless, research on identifying common types of narrative characters, or personae, is impeded by the lack of automatic and broad-coverage evaluation methods. We argue that computationally modeling actors provides benefits, including novel evaluation mechanisms for personae. Specifically, we propose two actor-modeling tasks, cast prediction and versatility ranking, which can capture complementary aspects of the relation between actors and the characters they portray. For an actor model, we present a technique for embedding actors, movies, character roles, genres, and descriptive keywords as Gaussian distributions and translation vectors, where the Gaussian variance corresponds to actors' versatility. Empirical results indicate that (1) the technique considerably outperforms TransE (Bordes et al. 2013) and ablation baselines and (2) automatically identified persona topics (Bamman, O'Connor, and Smith 2013) yield statistically significant improvements in both tasks, whereas simplistic persona descriptors including age and gender perform inconsistently, validating prior research.
CLMar 2, 2018
Age Group Classification with Speech and Metadata Multimodality FusionDenys Katerenchuk
Children comprise a significant proportion of TV viewers and it is worthwhile to customize the experience for them. However, identifying who is a child in the audience can be a challenging task. Identifying gender and age from audio commands is a well-studied problem but is still very challenging to get good accuracy when the utterances are typically only a couple of seconds long. We present initial studies of a novel method which combines utterances with user metadata. In particular, we develop an ensemble of different machine learning techniques on different subsets of data to improve child detection. Our initial results show a 9.2\% absolute improvement over the baseline, leading to a state-of-the-art performance.
IRMar 2, 2018
RankDCG: Rank-Ordering Evaluation MeasureDenys Katerenchuk, Andrew Rosenberg
Ranking is used for a wide array of problems, most notably information retrieval (search). There are a number of popular approaches to the evaluation of ranking such as Kendall's $τ$, Average Precision, and nDCG. When dealing with problems such as user ranking or recommendation systems, all these measures suffer from various problems, including an inability to deal with elements of the same rank, inconsistent and ambiguous lower bound scores, and an inappropriate cost function. We propose a new measure, rankDCG, that addresses these problems. This is a modification of the popular nDCG algorithm. We provide a number of criteria for any effective ranking algorithm and show that only rankDCG satisfies all of them. Results are presented on constructed and real data sets. We release a publicly available rankDCG evaluation package.