CLSep 10, 2024
BERTScoreVisualizer: A Web Tool for Understanding Simplified Text Evaluation with BERTScoreSebastian Jaskowski, Sahasra Chava, Agam Shah · gatech
The BERTScore metric is commonly used to evaluate automatic text simplification systems. However, current implementations of the metric fail to provide complete visibility into all information the metric can produce. Notably, the specific token matchings can be incredibly useful in generating clause-level insight into the quality of simplified text. We address this by introducing BERTScoreVisualizer, a web application that goes beyond reporting precision, recall, and F1 score and provides a visualization of the matching between tokens. We believe that our software can help improve the analysis of text simplification systems by specifically showing where generated, simplified text deviates from reference text. We host our code and demo on GitHub.
CLFeb 18, 2024
Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market AnalysisAgam Shah, Arnav Hiray, Pratvi Shah et al. · gatech
In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. We also demonstrate the practical utility of our proposed model by constructing a novel measure of optimism. Here, we observe the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code are publicly (under CC BY 4.0 license) available on GitHub.
CLMay 15, 2025
Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications GloballyAgam Shah, Siddhant Sukhani, Huzaifa Pardawala et al. · gatech
Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations can disproportionately impact vulnerable populations. To address this, we introduce the World Central Banks (WCB) dataset, the most comprehensive monetary policy corpus to date, comprising over 380k sentences from 25 central banks across diverse geographic regions, spanning 28 years of historical data. After uniformly sampling 1k sentences per bank (25k total) across all available years, we annotate and review each sentence using dual annotators, disagreement resolutions, and secondary expert reviews. We define three tasks: Stance Detection, Temporal Classification, and Uncertainty Estimation, with each sentence annotated for all three. We benchmark seven Pretrained Language Models (PLMs) and nine Large Language Models (LLMs) (Zero-Shot, Few-Shot, and with annotation guide) on these tasks, running 15,075 benchmarking experiments. We find that a model trained on aggregated data across banks significantly surpasses a model trained on an individual bank's data, confirming the principle "the whole is greater than the sum of its parts." Additionally, rigorous human evaluations, error analyses, and predictive tasks validate our framework's economic utility. Our artifacts are accessible through the HuggingFace and GitHub under the CC-BY-NC-SA 4.0 license.