Carolina Camassa

h-index1

4papers

153citations

Novelty29%

AI Score38

Ranked #87,319 of 194,257 authors (top 45%)#16,454 in CL (top 53%)

4 Papers

0.5CLSep 7, 2023

Loquacity and Visible Emotion: ChatGPT as a Policy Advisor

Claudia Biancotti, Carolina Camassa

ChatGPT, a software seeking to simulate human conversational abilities, is attracting increasing attention. It is sometimes portrayed as a groundbreaking productivity aid, including for creative work. In this paper, we run an experiment to assess its potential in complex writing tasks. We ask the software to compose a policy brief for the Board of the Bank of Italy. We find that ChatGPT can accelerate workflows by providing well-structured content suggestions, and by producing extensive, linguistically correct text in a matter of seconds. It does, however, require a significant amount of expert supervision, which partially offsets productivity gains. If the app is used naively, output can be incorrect, superficial, or irrelevant. Superficiality is an especially problematic limitation in the context of policy advice intended for high-level audiences.

31.1CYOct 16, 2023

Legal NLP Meets MiCAR: Advancing the Analysis of Crypto White Papers

Carolina Camassa

In the rapidly evolving field of crypto assets, white papers are essential documents for investor guidance, and are now subject to unprecedented content requirements under the European Union's Markets in Crypto-Assets Regulation (MiCAR). Natural Language Processing (NLP) can serve as a powerful tool for both analyzing these documents and assisting in regulatory compliance. This paper delivers two contributions to the topic. First, we survey existing applications of textual analysis to unregulated crypto asset white papers, uncovering a research gap that could be bridged with interdisciplinary collaboration. We then conduct an analysis of the changes introduced by MiCAR, highlighting the opportunities and challenges of integrating NLP within the new regulatory framework. The findings set the stage for further research, with the potential to benefit regulators, crypto asset issuers, and investors.

6.5CLMay 19

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

Carolina Camassa, Derek Shiller

Language models are trained to follow instructions, but they are also powerful pattern completers. What happens when these two objectives conflict? We construct conversations in which a user instruction to behave in a target way T (e.g., always output a specific token, answer in a particular language, or adopt a persona) is opposed by N hardcoded assistant turns demonstrating a competing pattern P. We then measure instruction-following (IF) rates in this setting, across 13 models and 16 different instructions, for up to 50 turns. Average instruction-following rates range from 1% to 99% across models, largely uncorrelated with standard capability benchmarks. The transition from instruction-following to pattern-following is universal but highly model-dependent. Robustness is modulated both by instruction content, with models resisting induction longer when instructions align with their trained value priors, and by output format, with diverse multi-token responses proving substantially more resistant than single-token outputs. Chain-of-thought reasoning improves robustness but does not eliminate susceptibility, and can produce dissociation between correct deliberation and incorrect output. When asked to predict their behavior in this setting, models achieve 83.5% accuracy on average but systematically underestimate their own resistance to induction pressure. These results suggest that instruction-following remains brittle under induction pressure even for otherwise capable models, and that output diversity, rather than semantic engagement with the input, is the primary factor predicting robustness.

14.5CYNov 1, 2024Code

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

Claudia Biancotti, Carolina Camassa, Andrea Coletta et al.

Advancements in large language models (LLMs) have renewed concerns about AI alignment - the consistency between human and AI goals and values. As various jurisdictions enact legislation on AI safety, the concept of alignment must be defined and measured across different domains. This paper proposes an experimental framework to assess whether LLMs adhere to ethical and legal standards in the relatively unexplored context of finance. We prompt twelve LLMs to impersonate the CEO of a financial institution and test their willingness to misuse customer assets to repay outstanding corporate debt. Beginning with a baseline configuration, we adjust preferences, incentives and constraints, analyzing the impact of each adjustment with logistic regression. Our findings reveal significant heterogeneity in the baseline propensity for unethical behavior of LLMs. Factors such as risk aversion, profit expectations, and regulatory environment consistently influence misalignment in ways predicted by economic theory, although the magnitude of these effects varies across LLMs. This paper highlights both the benefits and limitations of simulation-based, ex post safety testing. While it can inform financial authorities and institutions aiming to ensure LLM safety, there is a clear trade-off between generality and cost.