LG · Jul 26, 2023
A new algorithm for Subgroup Set Discovery based on Information Gain
Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras et al.
Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that occur in a dataset with a frequency higher than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, enabling valuable insights and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new Subgroup Discovery (SD) algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) as multiple criteria for pattern selection. The algorithm aims to address several limitations of state-of-the-art SD algorithms: the need to fine-tune key parameters for each dataset, the use of a single, manually set pattern search criterion, the use of non-overlapping data structures for subgroup space exploration, and the inability to search for patterns while fixing some relevant dataset variables. We compare the performance of IGSD with two state-of-the-art SD algorithms, FSSD and SSD++, on eleven datasets. For the performance evaluation, we also propose to complement standard SD measures with IG, OR, and p-value. The results show that FSSD and SSD++ provide less reliable patterns and smaller pattern sets than IGSD for all datasets considered. Additionally, IGSD provides better OR values than FSSD and SSD++, indicating a stronger dependence between patterns and targets. Moreover, the patterns obtained for one of the datasets have been validated by a group of domain experts, and the patterns provided by IGSD show better agreement with the experts than those obtained by FSSD and SSD++. These results demonstrate the suitability of IGSD as a method for pattern discovery and suggest that including non-standard SD metrics allows a better evaluation of discovered patterns.
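To make the two selection criteria concrete, the following is a minimal Python sketch that scores a single pattern against a binary target using the standard textbook definitions of Information Gain and Odds Ratio over a 2x2 contingency table. The abstract does not give IGSD's exact formulas or thresholds, so this is an illustration under assumptions, not the algorithm itself: the zero-cell correction (Haldane-Anscombe, adding 0.5 to every cell) and the `select_patterns` filter with its `min_ig` and `min_or` values are hypothetical choices made here.

```python
import math
from typing import Sequence


def entropy(pos: int, neg: int) -> float:
    """Shannon entropy (in bits) of a two-class count distribution."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def contingency(covered: Sequence[bool], target: Sequence[bool]) -> tuple[int, int, int, int]:
    """2x2 table (a, b, c, d): a = covered & positive, b = covered & negative,
    c = not covered & positive, d = not covered & negative."""
    a = b = c = d = 0
    for cov, pos in zip(covered, target):
        if cov and pos:
            a += 1
        elif cov:
            b += 1
        elif pos:
            c += 1
        else:
            d += 1
    return a, b, c, d


def information_gain(covered, target) -> float:
    """Target entropy minus the weighted entropy after splitting rows
    into those covered / not covered by the pattern."""
    a, b, c, d = contingency(covered, target)
    n = a + b + c + d
    before = entropy(a + c, b + d)
    after = (a + b) / n * entropy(a, b) + (c + d) / n * entropy(c, d)
    return before - after


def odds_ratio(covered, target, correction: float = 0.5) -> float:
    """(a*d)/(b*c); if any cell is zero, add `correction` to every cell
    (Haldane-Anscombe correction, an assumption here) to keep it finite."""
    a, b, c, d = contingency(covered, target)
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


def select_patterns(patterns, target, min_ig=0.1, min_or=2.0):
    """Hypothetical multi-criteria filter: keep patterns passing BOTH thresholds."""
    kept = {}
    for name, covered in patterns.items():
        ig, odds = information_gain(covered, target), odds_ratio(covered, target)
        if ig >= min_ig and odds >= min_or:
            kept[name] = (ig, odds)
    return kept


# Toy data: each pattern is a boolean mask over the same eight rows.
target = [True, True, True, False, False, False, False, True]
patterns = {
    "p1": [True, True, True, True, False, False, False, False],
    "p2": [True, False, True, False, True, False, True, False],
}
print(select_patterns(patterns, target))
```

Here `p1` covers three of the four positive rows (IG of about 0.189 bits, OR = 9.0), while `p2` is independent of the target (IG = 0, OR = 1.0) and is filtered out. Requiring both criteria rejects patterns that look strong on only one measure.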
PL · Jan 25, 2017
Towards Automatic Learning of Heuristics for Mechanical Transformations of Procedural Code
Guillermo Vigueras, Manuel Carro, Salvador Tamarit et al.
Current trends in next-generation exascale systems point toward integrating a wide range of specialized (co-)processors into traditional supercomputers. Given the efficiency of heterogeneous systems in terms of watts and FLOPS per unit of surface, making heterogeneous platforms accessible to a wider range of users is an important problem to tackle. However, heterogeneous platforms limit the portability of applications and increase development complexity because of the programming skills they require. Program transformation can make heterogeneous systems easier to program by defining a step-wise transformation process that translates a given initial code into semantically equivalent final code adapted to a specific platform. Program transformation systems require efficient transformation strategies to tackle the combinatorial problem that arises from the large set of transformations applicable at each step of the process. In this paper we propose a machine learning-based approach to learn heuristics that define program transformation strategies. Our approach proposes a novel combination of reinforcement learning and classification methods to efficiently tackle the problems inherent to this type of system. Preliminary results demonstrate the suitability of this approach.
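The combinatorial problem of choosing which transformation to apply at each step can be illustrated with a small reinforcement learning sketch. The abstract does not name the RL algorithm, so this uses tabular Q-learning as a stand-in and covers only the RL half of the approach (no classification step). Everything about the setting is hypothetical: the four transformation names, the rule that `fuse` and `vectorize` exclude each other, and the invented cost model in which vectorizing pays off only after a loop interchange.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: a program "state" records which transformations
# were applied as flags (interchange, tile, vectorize, fuse). The cost model is invented.
ACTIONS = ("interchange", "tile", "vectorize", "fuse", "stop")
FLAG = {"interchange": 0, "tile": 1, "vectorize": 2, "fuse": 3}
START = (False, False, False, False)


def cost(state):
    """Hypothetical runtime estimate; vectorizing pays off only after interchange."""
    interchanged, tiled, vectorized, fused = state
    c = 100.0
    if interchanged:
        c -= 10
    if tiled:
        c -= 15
    if fused:
        c -= 20
    if vectorized:
        c -= 40 if interchanged else 5
    return c


def legal_actions(state):
    """Each transformation applies at most once; fuse and vectorize exclude each other."""
    acts = [a for a in ACTIONS[:-1] if not state[FLAG[a]]]
    if state[FLAG["fuse"]] and "vectorize" in acts:
        acts.remove("vectorize")
    if state[FLAG["vectorize"]] and "fuse" in acts:
        acts.remove("fuse")
    return acts + ["stop"]


def step(state, action):
    """Deterministic transition; the reward is the cost reduction the action achieves."""
    if action == "stop":
        return state, 0.0, True
    flags = list(state)
    flags[FLAG[action]] = True
    nxt = tuple(flags)
    return nxt, cost(state) - cost(nxt), False


def q_learning(episodes=3000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular epsilon-greedy Q-learning, undiscounted (every episode terminates)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = START, False
        while not done:
            acts = legal_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)  # explore
            else:
                action = max(acts, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            future = 0.0 if done else max(q[(nxt, a)] for a in legal_actions(nxt))
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
    return q


def greedy_rollout(q):
    """Follow the learned strategy from the initial program until it chooses to stop."""
    state, done, plan = START, False, []
    while not done:
        action = max(legal_actions(state), key=lambda a: q[(state, a)])
        plan.append(action)
        state, _, done = step(state, action)
    return plan, state


q_table = q_learning()
plan, final_state = greedy_rollout(q_table)
print(plan, cost(final_state))
```

Under this toy cost model, a one-step greedy heuristic would pick `fuse` first (largest immediate reward, 20), which blocks vectorization and ends at cost 55. The learned strategy instead applies interchange, tile, and vectorize in some order and reaches cost 35, which is the kind of long-range trade-off a learned heuristic is meant to capture.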
PL · Jan 12, 2017
Towards a Semantics-Aware Code Transformation Toolchain for Heterogeneous Systems
Salvador Tamarit, Julio Mariño, Guillermo Vigueras et al.
Obtaining good performance when programming heterogeneous computing platforms poses significant challenges. We present a program transformation environment, implemented in Haskell, where architecture-agnostic scientific C code with semantic annotations is transformed into functionally equivalent code better suited for a given platform. The transformation steps are represented as rules that can be fired when certain syntactic and semantic conditions are fulfilled. These rules are not hard-wired into the rewriting engine: they are written in a C-like language and are automatically processed and incorporated into the rewriting engine. This makes it possible for end users to add their own rules or to provide sets of rules adapted to specific domains or purposes.
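The idea of rules that fire only when both a syntactic and a semantic condition hold, and that live outside the engine as data, can be sketched in a few lines. This is a hypothetical Python illustration, not the toolchain's Haskell engine or its C-like rule language: the tuple-based IR, the `"pure"`/`"impure"` annotation values, the `Rule` and `RewriteEngine` names, and both example rules are assumptions made here. The semantic condition matters because rewriting `x * 2` into `x + x` duplicates `x`, which is only safe when `x` has no side effects.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy IR: nested tuples such as ("mul", ("var", "a"), ("const", 2)).
Expr = tuple


def is_pure(e: Expr, annotations: dict) -> bool:
    """Semantic check: pure if every variable/call in e is annotated 'pure'."""
    kind = e[0]
    if kind == "const":
        return True
    if kind in ("var", "call"):
        return annotations.get(e[1]) == "pure"
    return all(is_pure(child, annotations) for child in e[1:])


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Expr], bool]          # syntactic condition
    condition: Callable[[Expr, dict], bool]  # semantic condition (uses annotations)
    rewrite: Callable[[Expr], Expr]


@dataclass
class RewriteEngine:
    annotations: dict
    rules: list = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        """Rules are data, so users can register new ones without editing the engine."""
        self.rules.append(rule)

    def _rewrite_once(self, e: Expr) -> Expr:
        # Rewrite operands first (bottom-up), then try each rule at this node.
        if e[0] in ("add", "mul"):
            e = (e[0], self._rewrite_once(e[1]), self._rewrite_once(e[2]))
        for rule in self.rules:
            if rule.matches(e) and rule.condition(e, self.annotations):
                return rule.rewrite(e)
        return e

    def normalize(self, e: Expr, max_passes: int = 100) -> Expr:
        """Apply rewriting passes until a fixpoint (or max_passes) is reached."""
        for _ in range(max_passes):
            new = self._rewrite_once(e)
            if new == e:
                break
            e = new
        return e


engine = RewriteEngine(annotations={"a": "pure", "rand": "impure"})
# Hypothetical user-supplied rules.
engine.add_rule(Rule(
    name="mul_by_one",  # x * 1 -> x: purely syntactic
    matches=lambda e: e[0] == "mul" and e[2] == ("const", 1),
    condition=lambda e, ann: True,
    rewrite=lambda e: e[1],
))
engine.add_rule(Rule(
    name="double_to_add",  # x * 2 -> x + x duplicates x, so x must be side-effect free
    matches=lambda e: e[0] == "mul" and e[2] == ("const", 2),
    condition=lambda e, ann: is_pure(e[1], ann),
    rewrite=lambda e: ("add", e[1], e[1]),
))

print(engine.normalize(("mul", ("mul", ("var", "a"), ("const", 1)), ("const", 2))))
# ('add', ('var', 'a'), ('var', 'a'))
print(engine.normalize(("mul", ("call", "rand"), ("const", 2))))
# ('mul', ('call', 'rand'), ('const', 2)): unchanged, the semantic condition blocks the rule
```

Keeping the match and the condition as separate fields mirrors the description above: the same syntactic pattern can be reused with different semantic guards, and a domain-specific rule set is just a different list passed to `add_rule`.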
PL · Mar 10, 2016
Proceedings of the First Workshop on Program Transformation for Programmability in Heterogeneous Architectures
Salvador Tamarit, Julio Mariño, Guillermo Vigueras et al.
This volume contains the proceedings of PROHA 2016, the first workshop on Program Transformation for Programmability in Heterogeneous Architectures, held on March 12, 2016 in Barcelona, Spain, as an affiliated workshop of CGO 2016, the 14th International Symposium on Code Generation and Optimization. Developing and maintaining high-performance applications and libraries for heterogeneous architectures while preserving their semantics and with reasonable efficiency is a time-consuming task that is often only possible for experts. It often requires manually adapting sequential, platform-agnostic code to different infrastructures and keeping the changes in all of these infrastructures in sync. These program modification tasks are costly and error-prone. Tools to assist in and, if possible, automate such transformations are of course of great interest. However, such tools may need significant reasoning and knowledge processing capabilities, including, for example, being able to process machine-understandable descriptions of what a piece of code is expected to do; to perform program transformations inside a context in which they are applicable; and to use strategies to identify the sequence of transformations leading to the best resulting code, among others.