CL, Oct 15, 2025

Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis

arXiv:2510.14128v2 | 1 citation | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of digital resources for studying food culture in underrepresented languages such as Macedonian. Its contribution is incremental, since it applies existing methods to new data.

The paper tackles the under-representation of Macedonian recipes in computational gastronomy by constructing the first systematic Macedonian recipe dataset through web scraping and structured parsing. The resulting resource highlights distinctive ingredient combinations in Macedonian cuisine.

Computational gastronomy increasingly relies on diverse, high-quality recipe datasets to capture regional culinary traditions. Although there are large-scale collections for major languages, Macedonian recipes remain under-represented in digital research. In this work, we present the first systematic effort to construct a Macedonian recipe dataset through web scraping and structured parsing. We address challenges in processing heterogeneous ingredient descriptions, including unit, quantity, and descriptor normalization. An exploratory analysis of ingredient frequency and co-occurrence patterns, using measures such as Pointwise Mutual Information and Lift score, highlights distinctive ingredient combinations that characterize Macedonian cuisine. The resulting dataset contributes a new resource for studying food culture in underrepresented languages and offers insights into the unique patterns of Macedonian culinary tradition.
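The two co-occurrence measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions, Lift(a, b) = P(a, b) / (P(a) P(b)) and PMI(a, b) = log2 of that ratio, with probabilities estimated as the fraction of recipes containing an ingredient or a pair. The abstract does not state how the authors estimate these quantities, so the function name `cooccurrence_scores`, the log base, and the toy recipes below are all hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(recipes):
    """Return {(a, b): (pmi, lift)} for every ingredient pair that co-occurs.

    Assumption: probabilities are recipe-level frequencies, i.e. the share
    of recipes that contain an ingredient (or both ingredients of a pair).
    """
    n = len(recipes)
    item_counts = Counter()
    pair_counts = Counter()
    for ingredients in recipes:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        unique = sorted(set(ingredients))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        lift = p_ab / (p_a * p_b)
        # PMI is the base-2 logarithm of lift under these definitions.
        scores[(a, b)] = (math.log2(lift), lift)
    return scores


# Hypothetical toy recipes, for illustration only.
recipes = [
    ["pepper", "eggplant", "garlic"],
    ["pepper", "eggplant"],
    ["flour", "egg"],
    ["flour", "egg", "garlic"],
]
scores = cooccurrence_scores(recipes)
pmi, lift = scores[("eggplant", "pepper")]
print(f"eggplant + pepper: PMI={pmi:.2f}, lift={lift:.2f}")
```

In this toy input, eggplant and pepper each appear in half of the recipes and always together, so their lift is 2 and their PMI is 1 bit. Eggplant and garlic co-occur exactly as often as independence would predict, giving a lift of 1 and a PMI of 0. Pairs that never co-occur are omitted, because their PMI would be undefined (log of zero).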

