Gürkan Solmaz

CL
h-index18
6papers
10citations
Novelty42%
AI Score38

6 Papers

LGApr 13, 2022
Label Augmentation with Reinforced Labeling for Weak Supervision

Gürkan Solmaz, Flavio Cirillo, Fabio Maresca et al.

Weak supervision (WS) is an alternative to the traditional supervised learning to address the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling data samples using labeling functions (LFs) instead of hand-labeling each data point. However, the existing approach fails to fully exploit the domain knowledge encoded into LFs, especially when the LFs' coverage is low. This is due to the common data programming pipeline that neglects to utilize data features during the generative process. This paper proposes a new approach called reinforced labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the LFs' outputs to cases not covered by LFs based on similarities among samples. Thus, RL can lead to higher labeling coverage for training an end classifier. The experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains. The new approach produces significant performance improvement, leading up to +21 points in accuracy and +61 points in F1 scores compared to the state-of-the-art data programming approach.

CLFeb 25, 2025Code
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study investigates the sub-problems and methods within these core challenges, such as input representation, chunking, prompting, selection of LLMs, and multimodal models. It examines the effect of different design choices through LayIE-LLM, a new, open-source, layout-aware IE test suite, benchmarking against traditional, fine-tuned IE models. The results on two IE datasets show that LLMs require adjustment of the IE pipeline to achieve competitive performance: the optimized configuration found with LayIE-LLM achieves 13.3--37.5 F1 points more than a general-practice baseline configuration using the same LLM. To find a well-working configuration, we develop a one-factor-at-a-time (OFAT) method that achieves near-optimal results. Our method is only 0.8--1.8 points lower than the best full factorial exploration with a fraction (2.8%) of the required computation. Overall, we demonstrate that, if well-configured, general-purpose LLMs match the performance of specialized models, providing a cost-effective, finetuning-free alternative. Our test-suite is available at https://github.com/gayecolakoglu/LayIE-LLM.

CLSep 18, 2025
TextMine: Data, Evaluation Framework and Ontology-guided LLM Pipeline for Humanitarian Mine Action

Chenyue Zhou, Gürkan Solmaz, Flavio Cirillo et al.

Humanitarian Mine Action (HMA) addresses the challenge of detecting and removing landmines from conflict regions. Much of the life-saving operational knowledge produced by HMA agencies is buried in unstructured reports, limiting the transferability of information between agencies. To address this issue, we propose TextMine: the first dataset, evaluation framework and ontology-guided large language model (LLM) pipeline for knowledge extraction in the HMA domain. TextMine structures HMA reports into (subject, relation, object)-triples, thus creating domain-specific knowledge. To ensure real-world relevance, we created the dataset in collaboration with Cambodian Mine Action Center (CMAC). We further introduce a bias-aware evaluation framework that combines human-annotated triples with an LLM-as-Judge protocol to mitigate position bias in reference-free scoring. Our experiments show that ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and enhance format adherence by 20.9% compared to baseline models. We publicly release the dataset and code.

CLSep 15, 2025
AgenticIE: An Adaptive Agent for Information Extraction from Complex Regulatory Documents

Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. There are two challenges to make DoPs machine and human accessible through automated key-value pair extraction (KVP) and question answering (QA): (1) While some of their content is standardized, DoPs vary widely in layout, schema, and format; (2) Both users and documents are multilingual. Existing static or LLM-only Information Extraction (IE) pipelines fail to adapt to this structural document and user diversity. Our domain-specific, agentic system addresses these challenges through a planner-executor-responder architecture. The system infers user intent, detects document language and modality, and orchestrates tools dynamically for robust, traceable reasoning while avoiding tool misuse or execution loops. Our agent outperforms baselines (ROUGE: 0.783 vs. 0.703/0.608) with better cross-lingual stability (17-point vs. 21-26-point variation).

CYMay 15, 2024
Desk-AId: Humanitarian Aid Desk Assessment with Geospatial AI for Predicting Landmine Areas

Flavio Cirillo, Gürkan Solmaz, Yi-Hsuan Peng et al.

The process of clearing areas, namely demining, starts by assessing and prioritizing potential hazardous areas (i.e., desk assessment) to go under thorough investigation of experts, who confirm the risk and proceed with the mines clearance operations. This paper presents Desk-AId that supports the desk assessment phase by estimating landmine risks using geospatial data and socioeconomic information. Desk-AId uses a Geospatial AI approach specialized to landmines. The approach includes mixed data sampling strategies and context-enrichment by historical conflicts and key multi-domain facilities (e.g., buildings, roads, health sites). The proposed system addresses the issue of having only ground-truth for confirmed hazardous areas by implementing a new hard-negative data sampling strategy, where negative points are sampled in the vicinity of hazardous areas. Experiments validate Desk-Aid in two domains for landmine risk assessment: 1) country-wide, and 2) uncharted study areas). The proposed approach increases the estimation accuracies up to 92%, for different classification models such as RandomForest (RF), Feedforward Neural Networks (FNN), and Graph Neural Networks (GNN).

HCJun 28, 2021
Design Considerations for Data Daemons: Co-creating Design Futures to Explore Ethical Personal Data Management

Wiebke Toussaint, Alejandra Gomez Ortega, Jered Vroon et al.

Mobile applications and online service providers track our virtual and physical behaviour more actively and with a broader scope than ever before. This has given rise to growing concerns about ethical personal data management. Even though regulation and awareness around data ethics are increasing, end-users are seldom engaged when defining and designing what a future with ethical personal data management should look like. We explore a participatory process that uses design futures, the Future workshop method and design fictions to envision ethical personal data management with end-users and designers. To engage participants effectively, we needed to bridge their differential expertise and make the abstract concepts of data and ethics tangible. By concretely presenting personal data management and control as fictitious entities called Data Daemons, we created a shared understanding of these abstract concepts, and empowered non-expert end-users and designers to become actively engaged in the design process.