Catherine Faron Zucker

h-index: 18

Papers: 5

Citations: 27

Novelty: 22%

AI Score: 27

Ranked #152,862 of 194,257 authors (top 79%); #26,704 in CL (top 87%)

5 Papers

2.7 · CL · Nov 5, 2025

Overcoming the Generalization Limits of SLM Finetuning for Shape-Based Extraction of Datatype and Object Properties

Célian Ringwald, Fabien Gandon, Catherine Faron et al.

Small language models (SLMs) have shown promise for relation extraction (RE) when extracting RDF triples guided by SHACL shapes focused on common datatype properties. This paper investigates how SLMs handle both datatype and object properties for complete RDF graph extraction. We show that the key bottleneck is the long-tail distribution of rare properties. To address this issue, we evaluate several strategies: stratified sampling, weighted loss, dataset scaling, and template-based synthetic data augmentation. We show that the best strategy for performing equally well across unbalanced target properties is to build a training set in which the number of occurrences of each property exceeds a given threshold. To enable reproducibility, we have publicly released our datasets, experimental results, and code. Our findings offer practical guidance for training shape-aware SLMs and highlight promising directions for future work in semantic RE.

1.7 · IR · Dec 20, 2018

SMILK, linking natural language and data from the web

Cédric Lopez, Molka Dhouib, Elena Cabrio et al.

As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to: (1) enrich knowledge bases and link data on the web, and conversely (2) use this linked data to improve text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.

3.0 · AI · Aug 29, 2014

Challenges in Bridging Social Semantics and Formal Semantics on the Web

Fabien Lucien Gandon, Michel Buffa, Elena Cabrio et al.

This paper describes several results of Wimmics, a research lab whose name stands for: web-instrumented man-machine interactions, communities, and semantics. The approaches introduced here rely on graph-oriented knowledge representation, reasoning, and operationalization to model and support actors, actions, and interactions in web-based epistemic communities. The research results are applied to support and foster interactions in online communities and to manage their resources.

3.8 · CL · Feb 20, 2013

Towards a Semantic-based Approach for Modeling Regulatory Documents in Building Industry

Khalil Riad Bouzidi, Catherine Faron-Zucker, Bruno Fies et al.

Regulations in the Building Industry are becoming increasingly complex and involve more than one technical area. They cover products, components, and project implementation. They also play an important role in ensuring the quality of a building and in minimizing its environmental impact. In this paper, we are particularly interested in modeling the regulatory constraints derived from the Technical Guides issued by CSTB and used to validate Technical Assessments. First, we describe our approach for modeling regulatory constraints in the SBVR language and formalizing them in the SPARQL language. Second, we describe how we model the compliance-checking processes described in the CSTB Technical Guides. Third, we show how we implement these processes to assist industry professionals in drafting Technical Documents in order to acquire a Technical Assessment; a compliance report is automatically generated to explain the compliance or noncompliance of these Technical Documents.

3.3 · IR · Feb 19, 2013

An Ontology for Modelling and Supporting the Process of Authoring Technical Assessments

Khalil Riad Bouzidi, Bruno Fies, Marc Bourdeau et al.

In this paper, we present a semantic web approach for modelling the process of creating new technical and regulatory documents related to the Building sector. This industry, like others, is currently experiencing phenomenal growth in its technical and regulatory texts. It is therefore urgent and crucial to improve the process of creating regulations by automating it as much as possible. We focus on the creation of particular technical documents issued by the French Scientific and Technical Centre for Building (CSTB), called Technical Assessments, and we propose services based on Semantic Web models and techniques for modelling the process of their creation.