Dániel Pfeifer

3papers

2citations

Novelty37%

AI Score34

Ranked #130,557 of 205,806 authors (top 63%)#1,748 in ML (top 50%)

3 Papers

4.4MLMay 14

From Data to Action: Accelerating Refinery Optimization with AI

Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth et al.

Nowadays refinery optimization utilizes sheer amounts of data, which can be handled with modern Linear Programming (LP) software, but the interpreting and applying the results remains challenging. Large petrochemical companies use massive models, with hundreds of thousands of input matrix elements. The LP solution is mathematically correct, but simplifications are made in the model, and data supply errors may occur. Therefore, further insight is needed to trust the results. The LP solver does not have a memory, so additional understanding could be gained by analyzing historical data and comparing it to the current plan. As such, machine learning approaches were suggested to support decision making based on the LP solution. Among these, Anomaly Detection tools are proposed to be used in tandem with the LP output. A transformed version of the popular ECOD methodology is applied. New methods are proposed to handle high-dimensional data: choosing the most informative pairs. Then, this is used alongside two 2D Anomaly Detection algorithms, revealing several business opportunities and data supply errors in the MOL refinery scheduling and planning architecture.

MLAug 28, 2024

Generalized Naive Bayes

Edith Alice Kovács, Anna Ország, Dániel Pfeifer et al.

In this paper we introduce the so-called Generalized Naive Bayes structure as an extension of the Naive Bayes structure. We give a new greedy algorithm that finds a good fitting Generalized Naive Bayes (GNB) probability distribution. We prove that this fits the data at least as well as the probability distribution determined by the classical Naive Bayes (NB). Then, under a not very restrictive condition, we give a second algorithm for which we can prove that it finds the optimal GNB probability distribution, i.e. best fitting structure in the sense of KL divergence. Both algorithms are constructed to maximize the information content and aim to minimize redundancy. Based on these algorithms, new methods for feature selection are introduced. We discuss the similarities and differences to other related algorithms in terms of structure, methodology, and complexity. Experimental results show, that the algorithms introduced outperform the related algorithms in many cases.

MLMay 10, 2022

Matrix and graph representations of vine copula structures

Dániel Pfeifer, Edith Alice Kovács

Vine copulas can efficiently model multivariate probability distributions. This paper focuses on a more thorough understanding of their structures, since in the literature, vine copula representations are often ambiguous. The graph representations include the original, cherry and chordal graph sequence structures, which we show equivalence between. Importantly we also show a new result, namely that when a perfect elimination ordering of a vine structure is given, then it can always be uniquely represented with a matrix. O. M. Nápoles has shown a way to represent vines in a matrix, and we algorithmify this previous approach, while also showing a new method for constructing such a matrix, through cherry tree sequences. We also calculate the runtime of these algorithms. Lastly, we prove that these two matrix-building algorithms are equivalent if the same perfect elimination ordering is being used.