7.3 · CL · Jun 2
EURO-5K: When Does Domain Pretraining Matter? Benchmarking Transformers for EU Reporting Obligation Extraction
Marios Koniaris, Vasileios Kotronis, Eugenia Giannini et al.
Extracting reporting obligations from EU legislation is critical for assessing and reducing regulatory reporting burden. However, distinguishing reporting requirements from structurally similar provisions requires specialised legal understanding. Current legal NLP methods lack specialised datasets with clear guidelines and comparative evaluation of extraction paradigms and domain adaptation strategies. We curate EURO-5K, a corpus of sentence-level reporting obligations and challenging negative examples from 136 EU legislative acts. On this dataset, we train and compare discriminative token-classification models (BERT-style) and generative span-extraction models (LLMs), evaluating both full fine-tuning and parameter-efficient QLoRA against baselines (pattern and dependency-based extraction, few-shot prompting). Results show that fully fine-tuned generic and legal BERT models achieve similar performance (0.89 F1), while fine-tuned LLMs match encoder accuracy for sentence-level extraction. Legal pretraining offers only small gains for generative models. In contrast, it is clearly beneficial when adaptation capacity is constrained, as parameter-efficient tuning of Legal-BERT outperforms its generic counterpart. Learning curve analysis demonstrates that legal pretraining accelerates early learning with minimal data. All approaches converge around 3K samples with diminishing returns thereafter, validating dataset sufficiency. Cross-dataset evaluation on two external regulatory corpora shows that our models behave as specialised reporting obligation extractors rather than generic regulatory classifiers. We release EURO-5K, trained models, and an interactive demo with explainability visualizations and structured RDF export. These demonstrate that both paradigms and parameter-efficient training provide practical tools for regulatory compliance automation.
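As a reading aid for the F1 figures quoted in the abstract above, here is a minimal sketch of how precision, recall, and F1 can be computed for span extraction. It assumes exact-match scoring over `(start, end)` character offsets; the abstract does not say which matching scheme the authors use, and the function name `span_f1` and the span representation are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (start, end) character offsets.
Span = Tuple[int, int]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match span scoring.

    Assumption: a predicted span counts as correct only if it matches a
    gold span exactly. The paper may use a different scheme (for example
    token-level or partial-overlap scoring).
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only; not taken from EURO-5K.
gold_spans = [(0, 5), (10, 20), (30, 40)]
predicted_spans = [(0, 5), (10, 20), (50, 60), (70, 80)]
precision, recall, f1 = span_f1(gold_spans, predicted_spans)
```

F1 is the harmonic mean of precision and recall, so it drops when either missed obligations or spurious extractions become frequent.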
NI · Oct 6, 2020
Network-aware Recommendations in the Wild: Methodology, Realistic Evaluations, Experiments
Savvas Kastanakis, Pavlos Sermpezis, Vasileios Kotronis et al.
Joint caching and recommendation has recently been proposed as a new paradigm for increasing the efficiency of mobile edge caching. Early findings demonstrate significant gains in network performance. However, previous works evaluated the proposed schemes exclusively in simulation environments. Hence, it remains uncertain whether the claimed benefits hold in real settings. In this paper, we propose a methodology for evaluating joint network and recommendation schemes in real content services using only publicly available information. We apply our methodology to the YouTube service and conduct extensive measurements to investigate the potential performance gains. Our results show that significant gains can be achieved in practice; e.g., an 8 to 10 times increase in the cache hit ratio from cache-aware recommendations. Finally, we build an experimental testbed and conduct experiments with real users; we make our code and datasets available to facilitate further research. To the best of our knowledge, this is the first realistic evaluation (over a real service, with real measurements and user experiments) of the joint caching and recommendation paradigm. Our findings provide experimental evidence for the feasibility and benefits of this paradigm, validate assumptions of previous works, and provide insights that can drive future research.
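The cache hit ratio mentioned in the abstract above is the fraction of requests served from the cache. The sketch below shows only that definition; the function name `cache_hit_ratio`, the item identifiers, and both request traces are hypothetical toy data, not measurements from the paper.

```python
from typing import Hashable, Iterable


def cache_hit_ratio(requests: Iterable[Hashable], cached: Iterable[Hashable]) -> float:
    """Fraction of requests whose item is present in the cache."""
    request_list = list(requests)
    if not request_list:
        return 0.0
    cached_items = set(cached)
    hits = sum(1 for item in request_list if item in cached_items)
    return hits / len(request_list)


# Hypothetical toy traces: the second one imagines users following
# recommendations that favour cached items more often.
cached_videos = {"a", "b"}
baseline_trace = ["a", "c", "d", "e", "f", "g", "h", "i", "j", "k"]
cache_aware_trace = ["a", "b", "a", "c", "b", "d", "a", "e", "b", "f"]

baseline_chr = cache_hit_ratio(baseline_trace, cached_videos)
cache_aware_chr = cache_hit_ratio(cache_aware_trace, cached_videos)
```

Comparing the two ratios on real traces is the kind of measurement the abstract summarises; the toy numbers here carry no empirical meaning.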
NI · Jan 9, 2018
A Survey among Network Operators on BGP Prefix Hijacking
Pavlos Sermpezis, Vasileios Kotronis, Alberto Dainotti et al.
BGP prefix hijacking is a threat to Internet operators and users. Several mechanisms or modifications to BGP that protect the Internet against it have been proposed. However, most operators have not deployed them and are reluctant to do so in the near future. Instead, they rely on basic - and often inefficient - proactive defenses to reduce the impact of hijacking events, or on detection based on third-party services and reactive approaches that might take up to several hours. In this work, we present the results of a survey we conducted among 75 network operators to study: (a) the operators' awareness of BGP prefix hijacking attacks, (b) presently used defenses (if any) against BGP prefix hijacking, (c) the willingness to adopt new defense mechanisms, and (d) reasons that may hinder the deployment of BGP prefix hijacking defenses. We expect the findings of this survey to increase the understanding of existing BGP hijacking defenses and the needs of network operators, as well as to contribute towards designing new defense mechanisms that satisfy the requirements of the operators.