Robert Wrembel

LG
3papers
19citations
Novelty48%
AI Score39

3 Papers

LGMay 10, 2022
Quality versus speed in energy demand prediction for district heating systems

Witold Andrzejewski, Jedrzej Potoniec, Maciej Drozdowski et al.

In this paper, we consider energy demand prediction in district heating systems. Effective energy demand prediction is essential in combined heat power systems when offering electrical energy in competitive electricity markets. To address this problem, we propose two sets of algorithms: (1) a novel extension to the algorithm proposed by E. Dotzauer and (2) an autoregressive predictor based on hour-of-week adjusted linear regression on moving averages of energy consumption. These two methods are compared against state-of-the-art artificial neural networks. Energy demand predictor algorithms have various computational costs and prediction quality. While prediction quality is a widely used measure of predictor superiority, computational costs are less frequently analyzed and their impact is not so extensively studied. When predictor algorithms are constantly updated using new data, some computationally expensive forecasting methods may become inapplicable. The computational costs can be split into training and execution parts. The execution part is the cost paid when the already trained algorithm is applied to predict something. In this paper, we evaluate the above methods with respect to the quality and computational costs, both in the training and in the execution. The comparison is conducted on a real-world dataset from a district heating system in the northwest part of Poland.

23.2DBMay 15
Relational Database Data Lineage Ontology

Jakub Dutkiewicz, Paweł Misiorek, Robert Wrembel

Modeling data lineage in relational databases remains a challenging problem, particularly in scenarios involving incomplete or missing dependencies between database objects. In this paper, we propose a novel ontology for relational database data lineage, designed to provide a richer and more expressive semantic representation supporting discovering the lineage links by means of knowledge graphs (KGs). Building upon our previous work on KG-based lineage discovery, the proposed ontology extends the earlier model with additional concepts capturing structural, semantic, and transformation-level characteristics of relational data. These extensions enable more precise encoding of lineage evidence. To evaluate the impact of the proposed ontology, we conduct a comparative study using a KG-based inductive link prediction framework. Specifically, we assess the performance of a graph neural network model based on path embeddings under two settings: using the original baseline ontology and the newly proposed one. Experimental results demonstrate that the application of the enriched semantic model leads to improvements in lineage link prediction performance, as measured by AUC and Hits@10 metrics.

LGMar 2, 2018
PRESISTANT: Learning based assistant for data pre-processing

Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet et al.

Data pre-processing is one of the most time consuming and relevant steps in a data analysis process (e.g., classification task). A given data pre-processing operator (e.g., transformation) can have positive, negative or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right pre-processing operators. However, when it comes to non-experts, they are overwhelmed by the amount of pre-processing operators and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim at providing assistance to non-expert users by recommending data pre-processing operators that are ranked according to their impact on the final analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of 5 different classification algorithms, such as J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the recommendations provided by our tool, show that PRESISTANT can effectively help non-experts in order to achieve improved results in their analytical tasks.