Jerome Francois

CR
h-index: 8
Papers: 4
Citations: 2
Novelty: 40%
AI Score: 36

4 Papers

CL · Mar 30
The Necessity of Setting Temperature in LLM-as-a-Judge

Lujun Li, Lama Sleem, Yangjie Xu et al.

LLM-as-a-Judge has emerged as an effective and low-cost paradigm for evaluating text quality and factual correctness. Prior studies have shown substantial agreement between LLM judges and human experts, even on tasks that are difficult to assess automatically. In practice, researchers commonly use fixed temperature settings during evaluation, most often 0.1 or 1.0, a convention that is largely empirical rather than principled. However, recent research suggests that LLM performance is non-trivially sensitive to temperature, that lower temperatures do not universally yield optimal outcomes, and that these effects are highly task-dependent. This raises a critical research question: does temperature influence judge performance in LLM-centric evaluation? To address it, we systematically investigate the relationship between temperature and judge performance through a series of controlled experiments, and we further adopt a causal inference framework within our empirical statistical analysis to rigorously examine the direct causal effect of temperature on judge behavior, offering actionable engineering insights for the design of LLM-centric evaluation pipelines.
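
Temperature matters because most LLM decoders divide the next-token logits by a temperature T before the softmax: T < 1 sharpens the distribution toward the top token, while larger T flattens it. The sketch below illustrates only that standard sampling step, using made-up logits for three hypothetical verdict tokens; it is not the paper's experimental or causal-analysis setup.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, sharpened (T < 1) or flattened (T > 1)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a judge model might assign to the verdict tokens "5", "4", "3".
verdict_logits = [2.0, 1.0, 0.5]
for t in (0.1, 1.0):
    probs = softmax_with_temperature(verdict_logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these illustrative logits, T = 0.1 puts almost all probability on the top verdict, while T = 1.0 leaves it with roughly 63%, so repeated judge calls at the higher temperature can return different verdicts.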

CR · Nov 14, 2025
NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks

Lama Sleem, Jerome Francois, Lujun Li et al.

Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content despite their alignment with ethical guidelines. Crafting universal filtering rules remains difficult because such rules inherently depend on specific contexts. To address these challenges without relying on threshold calibration or model fine-tuning, this work introduces a semantic consistency analysis between successful and unsuccessful responses, demonstrating that a negation-aware scoring approach captures meaningful patterns. Building on this insight, a novel detection framework called NegBLEURT Forest is proposed to evaluate the degree of alignment between outputs elicited by adversarial prompts and expected safe behaviors. It identifies anomalous responses using the Isolation Forest algorithm, enabling reliable jailbreak detection. Experimental results show that the proposed method consistently ranks first or second in accuracy across diverse models on the crafted dataset, while competing approaches are notably sensitive to model and data variations.
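
The Isolation Forest step rests on a simple idea: points that random axis-aligned splits separate after only a few cuts are likely anomalies. Below is a minimal, from-scratch sketch of that algorithm applied to hypothetical, precomputed per-response similarity scores. The score values, function names and single-feature setup are assumptions for illustration; NegBLEURT scoring itself is not reproduced, and this is not the authors' pipeline.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def avg_path_length(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a uniformly random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(point, node, depth=0):
    """Depth at which the point lands in a leaf, plus c(size) for unsplit leaves."""
    if node[0] == "leaf":
        return depth + avg_path_length(node[1])
    _, dim, split, left, right = node
    child = left if point[dim] < split else right
    return path_length(point, child, depth + 1)

def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1); values closer to 1 mean 'isolated quickly'."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    c = avg_path_length(psi)
    scores = []
    for p in points:
        mean_h = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c) if c > 0 else 0.5)
    return scores

# Hypothetical similarity scores: 20 consistent responses and one outlier.
scores = [(0.85 + 0.005 * i,) for i in range(20)] + [(0.10,)]
anomaly = isolation_forest_scores(scores, n_trees=100, seed=42)
flagged = max(range(len(scores)), key=anomaly.__getitem__)
print(flagged, round(anomaly[flagged], 3))
```

No threshold is tuned here: the sketch simply flags the highest-scoring response, which is one way to read "without relying on threshold calibration", not necessarily the paper's decision rule.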

CR · Aug 19, 2020
Early Identification of Services in HTTPS Traffic

Wazen M. Shbair, Thibault Cholez, Jerome Francois et al.

Traffic monitoring is essential for network management tasks that ensure security and QoS. However, the continuous growth of HTTPS traffic undermines the effectiveness of current service-level monitoring, which must either rely on unreliable parameters from the TLS handshake (X.509 certificate, SNI) or decrypt the traffic. We propose a new machine learning-based method to identify HTTPS services without decryption. By extracting statistical features from TLS handshake packets and from a small number of application data packets, we can identify HTTPS services very early in the session. Extensive experiments on a substantial, open dataset show that our method achieves good accuracy, and a prototype implementation confirms that HTTPS services can indeed be identified early in the session.
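
The abstract does not specify the features or the classifier, so the sketch below only illustrates the general idea under stated assumptions. It summarizes the sizes of the first few packets of a session (standing in for statistics over handshake and early application-data packets) and assigns the service whose feature centroid is nearest. The packet sizes, service names, feature set and nearest-centroid classifier are all hypothetical choices, not the authors' method.

```python
import math
from statistics import mean, pstdev

def flow_features(packet_sizes, n_packets=5):
    """Illustrative features: min, max, mean and std of the first n packet sizes."""
    head = packet_sizes[:n_packets]
    return [min(head), max(head), mean(head), pstdev(head)]

def train_centroids(labelled_flows, n_packets=5):
    """Average the feature vectors per service label (a nearest-centroid stand-in model)."""
    grouped = {}
    for label, sizes in labelled_flows:
        grouped.setdefault(label, []).append(flow_features(sizes, n_packets))
    return {label: [mean(column) for column in zip(*vectors)]
            for label, vectors in grouped.items()}

def classify(packet_sizes, centroids, n_packets=5):
    """Return the service whose centroid is closest in Euclidean distance."""
    features = flow_features(packet_sizes, n_packets)
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical packet sizes (bytes) for two made-up services.
training = [
    ("service-a", [517, 1400, 80, 300, 120]),
    ("service-a", [517, 1380, 90, 310, 110]),
    ("service-b", [583, 1460, 1460, 1460, 900]),
    ("service-b", [583, 1460, 1460, 1400, 950]),
]
centroids = train_centroids(training)
print(classify([583, 1460, 1460, 1450, 920, 1200, 64], centroids))
```

Because only the first `n_packets` sizes feed the features, a decision is available as soon as those packets have been observed, which is the property that early identification depends on; later packets in the flow are ignored.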

CR · Aug 19, 2020
A Survey of HTTPS Traffic and Services Identification Approaches

Wazen M. Shbair, Thibault Cholez, Jerome Francois et al.

HTTPS is rising quickly alongside Internet users' need for security and privacy when accessing the Web, and it is becoming the predominant application protocol on the Internet. This migration towards a secure Web brings important challenges in managing HTTPS traffic while guaranteeing basic network properties such as security, QoS and reliability. Encryption undermines the effectiveness of standard monitoring techniques and makes it difficult for ISPs and network administrators to properly identify and manage the services behind HTTPS traffic. This survey details the techniques used to monitor HTTPS traffic, from the most basic level of protocol identification (TLS, HTTPS) to the finest-grained identification of precise services. We show that protocol identification is well mastered, while more precise levels remain challenging despite recent advances. We also describe practical solutions, which lead us to discuss the trade-off between security and privacy and the research directions for guaranteeing both.