AI, Jul 17, 2022
Certain and Uncertain Inference with Indicative Conditionals
Paul Égré, Lorenzo Rossi, Jan Sprenger
This paper develops a trivalent semantics for the truth conditions and the probability of the natural language indicative conditional. Our framework rests on trivalent truth conditions first proposed by W. Cooper and yields two logics of conditional reasoning: (i) a logic C of inference from certain premises; and (ii) a logic U of inference from uncertain premises. But whereas C is monotonic for the conditional, U is not, and whereas C obeys Modus Ponens, U does not without restrictions. We show systematic correspondences between trivalent and probabilistic representations of inferences in either framework, and we use the distinction between the two systems to cast light, in particular, on McGee's puzzle about Modus Ponens. The result is a unified account of the semantics and epistemology of indicative conditionals that can be fruitfully applied to analyzing the validity of conditional inferences.
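As a rough illustration of what a trivalent conditional looks like, the sketch below encodes one common presentation of a Cooper-style truth table: the conditional is indeterminate when the antecedent is false and otherwise takes the value of the consequent. The encoding of the three values as 1, 1/2 and 0, the function names, and the ratio used for the probability (weight of worlds where the conditional is true over weight of worlds where it is true or false) are assumptions of this sketch, not details stated in the abstract.

```python
from fractions import Fraction

# Assumed encoding of the three truth values: true, indeterminate, false.
T, I, F = Fraction(1), Fraction(1, 2), Fraction(0)


def cooper_conditional(a, c):
    """Value of 'if a then c' under the assumed Cooper-style table.

    Indeterminate when the antecedent is false; otherwise the value of c.
    """
    return I if a == F else c


def conditional_probability(worlds):
    """Probability of the conditional over weighted worlds.

    worlds is a list of (weight, a, c) triples with classical a and c.
    The ratio below (true weight over true-or-false weight) is an
    assumption of this sketch; for classical a and c it equals P(c | a).
    """
    true_w = sum(w for w, a, c in worlds if cooper_conditional(a, c) == T)
    defined_w = sum(w for w, a, c in worlds if cooper_conditional(a, c) != I)
    if defined_w == 0:
        raise ValueError("the conditional is indeterminate in every world")
    return true_w / defined_w


# Hypothetical toy distribution over four worlds.
toy_worlds = [
    (Fraction(3, 10), T, T),
    (Fraction(1, 10), T, F),
    (Fraction(4, 10), F, T),
    (Fraction(2, 10), F, F),
]
toy_probability = conditional_probability(toy_worlds)
```

On the toy distribution, only the two worlds with a true antecedent count, so the ratio is 3/10 over 4/10.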
LG, Jul 20, 2024
Fairness Interventions: A Study in AI Explainability
Thomas Souverain, Johnathan Nguyen, Nicolas Meric et al.
This paper presents a philosophical and experimental study of fairness interventions in AI classification, centered on the explainability of corrective methods. We argue that ensuring fairness requires not only satisfying a target criterion, but also explaining which variables constrain its realization. When corrections are used to mitigate advantage transparently, they must remain sensitive to the distribution of true labels. To illustrate this approach, we built FairDream, a fairness package whose mechanism is made transparent for lay users: it increases the weight the model assigns to errors on disadvantaged groups. While a user may intend to achieve Demographic Parity with the correction method, experiments show that FairDream tends towards Equalized Odds, revealing a conservative bias inherent to the data environment. We clarify the relationship between these fairness criteria, analyze FairDream's reweighting process, and compare its trade-offs with closely related GridSearch models. Finally, we justify the normative preference for Equalized Odds via an epistemological interpretation of the results, using their proximity to Simpson's paradox. The paper thus unites normative, epistemological, and empirical explanations of fairness interventions, to ensure transparency for users.
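The contrast between the two criteria can be made concrete with their standard definitions: Demographic Parity compares positive prediction rates across groups, while Equalized Odds compares true positive and false positive rates. The sketch below is a minimal illustration on hypothetical toy data; the function names are assumptions and nothing here reproduces FairDream's own code.

```python
def _rate(flags):
    """Share of 1s in a list of 0/1 flags."""
    if not flags:
        raise ValueError("empty group slice")
    return sum(flags) / len(flags)


def demographic_parity_gap(y_pred, group):
    """Largest difference in positive prediction rate between groups."""
    rates = [
        _rate([p for p, g in zip(y_pred, group) if g == name])
        for name in sorted(set(group))
    ]
    return max(rates) - min(rates)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group difference in TPR or FPR."""
    gaps = []
    for label in (1, 0):  # label 1 yields the TPR, label 0 the FPR
        rates = [
            _rate([p for t, p, g in zip(y_true, y_pred, group)
                   if g == name and t == label])
            for name in sorted(set(group))
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy data: a perfect classifier on two groups whose
# base rates of the positive label differ (2/4 versus 1/4).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equalized_odds_gap(y_true, y_pred, group)
```

In this toy case the classifier satisfies Equalized Odds exactly but not Demographic Parity, because the groups differ in their distribution of true labels.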
CL, Sep 12, 2023
Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO
Benjamin Icard, Vincent Claveau, Ghislain Atemezing et al.
We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture, trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the value of this neural version for enriching the lexicons of the symbolic version and for producing versions in other languages.
CL, Jul 4, 2024
HYBRINFOX at CheckThat! 2024 -- Task 2: Enriching BERT Models with the Expert System VAGO for Subjectivity Detection
Morgane Casanova, Julien Chanson, Benjamin Icard et al.
This paper presents the HYBRINFOX method used to solve Task 2 (subjectivity detection) of the CLEF 2024 CheckThat! competition. The method is distinctive in using a hybrid system that combines a RoBERTa model fine-tuned for subjectivity detection, a frozen sentence-BERT (sBERT) model to capture semantics, and several scores calculated by the English version of the expert system VAGO, which was developed independently of this task to measure vagueness and subjectivity in texts based on the lexicon. In English, the HYBRINFOX method ranked 1st with a macro F1 score of 0.7442 on the evaluation data. For the other languages, the method used a translation step into English, producing more mixed results (ranking 1st in Multilingual and 2nd in Italian over the baseline, but under the baseline in Bulgarian, German, and Arabic). We explain the principles of our hybrid approach, and outline ways in which the method could be improved for languages other than English.
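For readers unfamiliar with the reported metric, macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below computes it on hypothetical toy labels; the label names and data are illustrative assumptions and are unrelated to the competition data.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels for a two-class subjectivity task.
gold = ["SUBJ", "SUBJ", "OBJ", "OBJ", "OBJ", "OBJ"]
pred = ["SUBJ", "OBJ", "OBJ", "OBJ", "OBJ", "SUBJ"]

toy_macro_f1 = macro_f1(gold, pred)
```

Here the per-class scores are 0.5 for SUBJ and 0.75 for OBJ, and the macro average weights them equally regardless of class size.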
CL, Mar 24, 2024
A Multi-Label Dataset of French Fake News: Human and Machine Insights
Benjamin Icard, François Maine, Morgane Casanova et al.
We present a corpus of 100 documents, OBSINFOX, selected from 17 sources of French press considered unreliable by expert agencies, annotated using 11 labels by 8 annotators. By collecting more labels than usual, from more annotators than is typical, we can identify features that humans consider characteristic of fake news, and compare them to the predictions of automated classifiers. We present a topic and genre analysis using Gate Cloud, indicative of the prevalence of satire-like text in the corpus. We then use the subjectivity analyzer VAGO, and a neural version of it, to clarify the link between ascriptions of the label Subjective and ascriptions of the label Fake News. The annotated dataset is available online at the following URL: https://github.com/obs-info/obsinfox
Keywords: Fake News, Multi-Labels, Subjectivity, Vagueness, Detail, Opinion, Exaggeration, French Press
CL, Jul 4, 2024
HYBRINFOX at CheckThat! 2024 -- Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation
Géraud Faye, Morgane Casanova, Benjamin Icard et al.
This paper summarizes the experiments and results of the HYBRINFOX team for the CheckThat! 2024 - Task 1 competition. We propose an approach enriching Language Models such as RoBERTa with embeddings produced by triples (subject ; predicate ; object) extracted from the text sentences. Our analysis of the development data shows that this method improves on the performance of Language Models alone. On the evaluation data, its best performance was in English, where it achieved an F1 score of 71.1 and ranked 12th out of 27 candidates. On the other languages (Dutch and Arabic), it obtained more mixed results. We identify future research directions for adapting this processing pipeline to more recent Large Language Models.
CL, Feb 6, 2024
Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification
Géraud Faye, Benjamin Icard, Morgane Casanova et al.
This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with articles from the regular French press, with their URLs masked, to conduct a human annotation experiment using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification. They include the analyzer VAGO to measure discourse vagueness and subjectivity, a TF-IDF model serving as a baseline, and four different classifiers: two RoBERTa-based models, CATS using syntax, and one XGBoost combining syntactic and semantic features.
CL, Apr 28, 2024
Explaining vague language
Paul Égré, Benjamin Spector
Why is language vague? Vagueness may be explained and rationalized if it can be shown that vague language is more useful to speaker and hearer than precise language. In a well-known paper, Lipman proposes a game-theoretic account of vagueness in terms of mixed strategy that leads to a puzzle: vagueness cannot be strictly better than precision at equilibrium. More recently, Égré, Spector, Mortier and Verheyen have put forward a Bayesian account of vagueness establishing that using vague words can be strictly more informative than using precise words. This paper proposes to compare both results and to explain why they are not in contradiction. Lipman's definition of vagueness relies exclusively on a property of signaling strategies, without making any assumptions about the lexicon, whereas Égré et al.'s involves a layer of semantic content. We argue that the semantic account of vagueness is needed, and more adequate and explanatory of vagueness.
CL, Oct 27, 2021
Combining Vagueness Detection with Deep Learning to Identify Fake News
Paul Guélorget, Benjamin Icard, Guillaume Gadek et al.
In this paper, we combine two independent detection methods for identifying fake news: the algorithm VAGO uses semantic rules combined with NLP techniques to measure vagueness and subjectivity in texts, while the classifier FAKE-CLF relies on Convolutional Neural Network classification and supervised deep learning to classify texts as biased or legitimate. We compare the results of the two methods on four corpora. We find a positive correlation between the vagueness and subjectivity measures obtained by VAGO and the classification of text as biased by FAKE-CLF. The comparison yields mutual benefits: VAGO helps explain the results of FAKE-CLF, and conversely, FAKE-CLF helps us corroborate and expand VAGO's database. The use of two complementary techniques (rule-based vs. data-driven) proves a fruitful approach to the challenging problem of identifying fake news.
LO, Dec 4, 2014
Knowledge, Justification, and Adequate Reasons
Paul Égré, Paul Marty, Bryan Renne
Is knowledge definable as justified true belief ("JTB")? We argue that one can legitimately answer positively or negatively, depending on whether or not one's true belief is justified by what we call adequate reasons. To facilitate our argument we introduce a simple propositional logic of reason-based belief, and give an axiomatic characterization of the notion of adequacy for reasons. We show that this logic is sufficiently flexible to accommodate various useful features, including quantification over reasons. We use our framework to contrast two notions of JTB: one internalist, the other externalist. We argue that Gettier cases essentially challenge the internalist notion but not the externalist one. Our approach commits us to a form of infallibilism about knowledge, but it also leaves us with a puzzle, namely whether knowledge involves the possession of only adequate reasons, or leaves room for some inadequate reasons. We favor the latter position, which reflects a milder and more realistic version of infallibilism.