Christoffer Loeffler

LG
h-index16
6papers
21citations
Novelty44%
AI Score38

6 Papers

LGMay 25
Retrieval-Augmented Detection of Potentially Abusive Clauses in Chilean Terms of Service

Christoffer Loeffler, Tomás Rey Pizarro, Daniel Ignacio Miranda Vásquez et al.

Online Terms of Service often function as contracts of adhesion, creating asymmetries that may expose consumers to potentially abusive clauses. In Chile, assessing such clauses is legally challenging because some provisions clearly violate mandatory consumer law, whereas others depend on broader standards such as good faith and contractual imbalance. We present a retrieval-augmented generation framework for the automated detection and classification of potentially abusive clauses in Chilean Terms of Service. Designed for local execution, it combines efficient clause detection, hybrid dense--sparse retrieval, reranking, and prompt augmentation to support medium-sized open-weight language models. We also introduce the Chilean Abusive Terms of Service Extended corpus, comprising 100 contracts and 10,029 annotated clauses in 24 legally grounded categories spanning illegal, dark, and gray clauses. Experiments comparing commercial and open-weight language models, fine-tuned encoders, and traditional baselines show that retrieval-augmented prompting substantially improves performance and enables local models to approach larger cloud-based systems at lower computational and token cost. The study also contributes a refined legal annotation scheme and a practical design for AI-assisted consumer contract review.

LGJul 26, 2022
Active Learning of Ordinal Embeddings: A User Study on Football Data

Christoffer Loeffler, Kion Fallah, Stefano Fenu et al.

Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as proxy for similarity in information retrieval of similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset. We adapt an entropy-based active learning method with recent work from triplet mining to collect easy-to-answer but still informative annotations from human participants and use them to train a deep convolutional network that generalizes to unseen samples. Our user study shows that our approach improves the quality of the information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we collect accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, qualitative self-assessment and statements, as well as the effects of mixed-expertise annotators and their consistency on model performance and transfer-learning.

CVMar 14, 2022
Don't Get Me Wrong: How to Apply Deep Visual Interpretations to Time Series

Christoffer Loeffler, Wei-Cheng Lai, Bjoern Eskofier et al.

The correct interpretation of convolutional models is a hard problem for time series data. While saliency methods promise visual validation of predictions for image and language processing, they fall short when applied to time series. These tend to be less intuitive and represent highly diverse data, such as the tool-use time series dataset. Furthermore, saliency methods often generate varied, conflicting explanations, complicating the reliability of these methods. Consequently, a rigorous objective assessment is necessary to establish trust in them. This paper investigates saliency methods on time series data to formulate recommendations for interpreting convolutional models and implements them on the tool-use time series problem. To achieve this, we first employ nine gradient-, propagation-, or perturbation-based post-hoc saliency methods across six varied and complex real-world datasets. Next, we evaluate these methods using five independent metrics to generate recommendations. Subsequently, we implement a case study focusing on tool-use time series using convolutional classification models. Our results validate our recommendations that indicate that none of the saliency methods consistently outperforms others on all metrics, while some are sometimes ahead. Our insights and step-by-step guidelines allow experts to choose suitable saliency methods for a given model and dataset.

CVApr 2, 2025
Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers

Lukas Boehm, Jonas Leo Mueller, Christoffer Loeffler et al.

Understanding the perceptual invariances of artificial neural networks is essential for improving explainability and aligning models with human vision. Metamers - stimuli that are physically distinct yet produce identical neural activations - serve as a valuable tool for investigating these invariances. We introduce a novel approach to metamer generation by leveraging ensembles of artificial neural networks, capturing shared representational subspaces across diverse architectures, including convolutional neural networks and vision transformers. To characterize the properties of the generated metamers, we employ a suite of image-based metrics that assess factors such as semantic fidelity and naturalness. Our findings show that convolutional neural networks generate more recognizable and human-like metamers, while vision transformers produce realistic but less transferable metamers, highlighting the impact of architectural biases on representational invariances.

CLFeb 2, 2025
Predicting potentially abusive clauses in Chilean terms of services with natural language processing

Christoffer Loeffler, Andrea Martínez Freile, Tomás Rey Pizarro

This study addresses the growing concern of information asymmetry in consumer contracts, exacerbated by the proliferation of online services with complex Terms of Service that are rarely even read. Even though research on automatic analysis methods is conducted, the problem is aggravated by the general focus on English-language Machine Learning approaches and on major jurisdictions, such as the European Union. We introduce a new methodology and a substantial dataset addressing this gap. We propose a novel annotation scheme with four categories and a total of 20 classes, and apply it on 50 online Terms of Service used in Chile. Our evaluation of transformer-based models highlights how factors like language- and/or domain-specific pre-training, few-shot sample size, and model architecture affect the detection and classification of potentially abusive clauses. Results show a large variability in performance for the different tasks and models, with the highest macro-F1 scores for the detection task ranging from 79% to 89% and micro-F1 scores up to 96%, while macro-F1 scores for the classification task range from 60% to 70% and micro-F1 scores from 64% to 80%. Notably, this is the first Spanish-language multi-label classification dataset for legal clauses, applying Chilean law and offering a comprehensive evaluation of Spanish-language models in the legal domain. Our work lays the ground for future research in method development for rarely considered legal analysis and potentially leads to practical applications to support consumers in Chile and Latin America as a whole.

LGJul 9, 2020
IALE: Imitating Active Learner Ensembles

Christoffer Loeffler, Christopher Mutschler

Active learning (AL) prioritizes the labeling of the most informative data samples. However, the performance of AL heuristics depends on the structure of the underlying classifier model and the data. We propose an imitation learning scheme that imitates the selection of the best expert heuristic at each stage of the AL cycle in a batch-mode pool-based setting. We use DAGGER to train the policy on a dataset and later apply it to datasets from similar domains. With multiple AL heuristics as experts, the policy is able to reflect the choices of the best AL heuristics given the current state of the AL process. Our experiment on well-known datasets show that we both outperform state of the art imitation learners and heuristics.