Un duel probabiliste pour départager deux présidents (LIA @ DEFT'2005)
This work addresses a specific NLP challenge, distinguishing between the speeches of two French presidents, but it is incremental: it applies existing probabilistic methods to a new dataset.
The paper tackles the DEFT'05 challenge, a binary classification task that combines author identification and thematic break detection in speeches. It uses probabilistic models such as Markov chains and Bayes models, and reports promising results: a precision of 0.890, a recall of 0.955, and an F-score of 0.925.
We present a set of probabilistic models applied to binary classification as defined in the DEFT'05 challenge. The challenge consisted of a mixture of two different problems in Natural Language Processing: author identification (a sequence of François Mitterrand's sentences might have been inserted into a speech by Jacques Chirac) and thematic break detection (the subjects addressed by the two authors are assumed to be different). Markov chains, Bayes models and an adaptive process were used to identify the authorship of these sequences. A probabilistic model of the internal coherence of speeches was employed to identify thematic breaks. Adding this model was shown to improve the quality of the results. A comparison with different approaches demonstrates the superiority of a strategy that combines learning, coherence and adaptation. Applied to the DEFT'05 test data, the results in terms of precision (0.890), recall (0.955) and F-score (0.925) are very promising.
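To make the Markov-chain idea concrete, the following is a minimal sketch of attributing a sentence to one of two authors with first-order (bigram) Markov chains. It is an illustration only, not the system described in the abstract: the add-one smoothing, the whitespace tokenization, the function names (`train_bigram_model`, `log_likelihood`, `classify`), the labels `author_a` and `author_b`, and the toy sentences are all assumptions introduced here. The coherence model and the adaptive process are not covered.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Collect bigram and context counts from whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def log_likelihood(sentence, model, vocab_size):
    """Log-probability of a sentence under a bigram model (add-one smoothing, an assumption)."""
    context_counts, bigram_counts, _ = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def classify(sentence, models):
    """Return the label whose Markov chain gives the sentence the highest likelihood."""
    shared_vocab = set().union(*(model[2] for model in models.values()))
    return max(
        models,
        key=lambda label: log_likelihood(sentence, models[label], len(shared_vocab)),
    )


# Hypothetical toy corpora, invented for illustration (not DEFT'05 data).
models = {
    "author_a": train_bigram_model([
        "the economy needs reform",
        "reform of the economy is urgent",
    ]),
    "author_b": train_bigram_model([
        "europe must stay united",
        "a united europe is our future",
    ]),
}

print(classify("the economy is urgent", models))
print(classify("europe is our future", models))
```

Comparing log-likelihoods rather than raw probabilities avoids numerical underflow on long sentences, and sharing one vocabulary size across both models keeps the smoothed probabilities comparable.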