José Moreira

ML
5 papers
62 citations
Novelty 44%
AI Score 26

5 Papers

CV · Mar 8, 2023
Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

Victor Ferrari, Rafael Sousa, Marcio Pereira et al.

Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. This algorithm introduces: (a) Convolution Slicing Analysis (CSA) - a convolution-specific 3D cache-blocking analysis pass that focuses on tile reuse over the cache hierarchy; (b) Convolution Slicing Optimization (CSO) - a code-generation pass that uses CSA to generate a tiled direct-convolution macro-kernel; and (c) Vector-Based Packing (VBP) - an architecture-specific optimized input-tensor packing solution based on vector-register shift instructions for convolutions with unitary stride. Experiments conducted on 393 convolutions from full ONNX-MLIR machine-learning models indicate that the elimination of the Im2Col transformation and the use of fast packing routines result in a total packing time reduction, on full model inference, of 2.0x - 3.9x on Intel x86 and 3.6x - 7.2x on IBM POWER10. The speedup over an Im2Col + BLAS method based on current BLAS implementations for end-to-end machine-learning model inference is in the range of 9% - 25% for Intel x86 and 10% - 42% for IBM POWER10 architectures. The total convolution speedup for model inference is 12% - 27% on Intel x86 and 26% - 46% on IBM POWER10. SConv also outperforms BLAS GEMM, when computing pointwise convolutions, in more than 83% of the 219 tested instances.
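
To make the contrast in this abstract concrete, the following is a minimal sketch of the two general strategies it names: Im2Col followed by a matrix product, and a direct convolution loop nest. It is not SConv and includes none of its tiling or packing passes; it assumes a single channel, unit stride, and no padding, and all function names are hypothetical.

```python
# Illustrative sketch only (not SConv): single-channel 2D convolution
# (cross-correlation form), unit stride, no padding. Names are hypothetical.

def im2col(image, kh, kw):
    """Copy every kh x kw patch of `image` into one row of a patch matrix."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv_im2col(image, kernel):
    """Im2Col approach: materialize the patch matrix, then do a matrix-vector product."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    flat = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


def conv_direct(image, kernel):
    """Direct approach: read the input in place, with no intermediate patch matrix."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]

patch_matrix = im2col(image, 2, 2)
# The patch matrix holds 9 * 4 = 36 values for a 16-value input.
patch_elements = sum(len(row) for row in patch_matrix)
same_result = conv_im2col(image, kernel) == conv_direct(image, kernel)
```

Both functions compute the same output; the difference is that the Im2Col path first duplicates overlapping input values into a larger matrix, which is the transformation the abstract reports eliminating.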

ML · Jul 28, 2022 · Code
A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting

Martim Sousa, Ana Maria Tomé, José Moreira

This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.

ML · Jul 6, 2022
Improved conformalized quantile regression

Martim Sousa, Ana Maria Tomé, José Moreira

Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantile and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we get adaptive prediction intervals that account for heteroscedasticity. However, the aforementioned conformal step lacks adaptiveness as described in (Romano et al., 2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables weighted by their permutation importance with an optimized k-means and apply k conformal steps. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both in open datasets.
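
As a rough illustration of the idea in this abstract, the sketch below contrasts a single conformal step with one conformal step per cluster. It uses the standard conformalized quantile regression score from Romano et al. (2019); the quantile predictions and the cluster labels are hard-coded toy values, and the clustering itself (k-means weighted by permutation importance in the paper) is not implemented. All function names are hypothetical.

```python
import math

# Toy sketch, not the authors' code. Quantile predictions and cluster labels
# are assumed to be given; only the conformal step(s) are shown.

def conformity_scores(y, q_lo, q_hi):
    """CQR score: positive when y falls outside [q_lo, q_hi], negative inside."""
    return [max(lo - yi, yi - hi) for yi, lo, hi in zip(y, q_lo, q_hi)]


def conformal_correction(scores, alpha):
    """The ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def per_cluster_corrections(scores, clusters, alpha):
    """One conformal step per cluster instead of a single global one."""
    grouped = {}
    for s, c in zip(scores, clusters):
        grouped.setdefault(c, []).append(s)
    return {c: conformal_correction(v, alpha) for c, v in grouped.items()}


def conformalize(lo, hi, correction):
    """Widen (or shrink, if negative) a predicted interval by the correction."""
    return (lo - correction, hi + correction)


# Calibration set: cluster 0 is low-noise, cluster 1 is high-noise.
y = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
q_lo = [0.5, 1.5, 2.0, 8.0, 15.0, 20.0]
q_hi = [1.5, 2.5, 3.5, 14.0, 22.0, 26.0]
clusters = [0, 0, 0, 1, 1, 1]
alpha = 0.25

scores = conformity_scores(y, q_lo, q_hi)
global_q = conformal_correction(scores, alpha)
cluster_q = per_cluster_corrections(scores, clusters, alpha)

# A new point in the low-noise cluster with predicted interval [4.0, 5.0]:
global_interval = conformalize(4.0, 5.0, global_q)
cluster_interval = conformalize(4.0, 5.0, cluster_q[0])
```

In this toy data the single global correction is driven by the high-noise cluster and widens every interval equally, while the per-cluster correction leaves the low-noise region with a much tighter interval.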

DB · Aug 20, 2025
A DBMS-independent approach for capturing provenance polynomials through query rewriting

Paulo Pintor, Rogério Costa, José Moreira

In today's data-driven ecosystems, ensuring data integrity, traceability and accountability is important. Provenance polynomials constitute a powerful formalism for tracing the origin and the derivations made to produce database query results. Despite their theoretical expressiveness, current implementations have limitations in handling aggregations and nested queries, and some of them are tightly coupled to a single Database Management System (DBMS), hindering interoperability and broader applicability. This paper presents a query rewriting-based approach for annotating Structured Query Language (SQL) queries with provenance polynomials. The proposed methods are DBMS-independent and support Select-Projection-Join-Union-Aggregation (SPJUA) operations and nested queries, through recursive propagation of provenance annotations. This constitutes the first full implementation of semiring-based theory for provenance polynomials extended with semimodule structures. The paper also presents an experimental evaluation to assess the validity of the proposed methods and compare their performance against state-of-the-art systems using benchmark data and queries. The results indicate that our solution delivers a comprehensive implementation of the theoretical formalisms proposed in the literature, and demonstrates improved performance and scalability, outperforming existing methods.
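
For readers unfamiliar with provenance polynomials, the sketch below shows the basic semiring rule behind them: a join multiplies the annotations of the tuples it combines, and a projection or union that merges duplicates adds them. It works on in-memory Python data and is not the paper's SQL query-rewriting method; aggregation and nested queries are not covered, and the relation contents, token names, and helper functions are all hypothetical.

```python
from collections import Counter

# Toy sketch of semiring provenance, not the paper's method.
# A polynomial is a Counter mapping a monomial (sorted tuple of tokens)
# to its integer coefficient.

def token(name):
    """Polynomial consisting of a single provenance token."""
    return Counter({(name,): 1})


def add(p, q):
    """Semiring addition: used when alternative derivations are merged."""
    result = Counter(p)
    result.update(q)  # Counter.update adds coefficients
    return result


def mul(p, q):
    """Semiring multiplication: used when tuples are combined by a join."""
    result = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            result[tuple(sorted(m1 + m2))] += c1 * c2
    return result


def show(p):
    """Render a polynomial with monomials in sorted order."""
    terms = []
    for monomial, coeff in sorted(p.items()):
        prefix = "" if coeff == 1 else f"{coeff}*"
        terms.append(prefix + "*".join(monomial))
    return " + ".join(terms)


# Annotated relations: R(a, b) and S(b, c), one token per base tuple.
R = {(1, "x"): token("r1"), (2, "x"): token("r2")}
S = {("x", 10): token("s1")}

# Equivalent of: SELECT DISTINCT c FROM R JOIN S ON R.b = S.b
result = {}
for (a, b_r), ann_r in R.items():
    for (b_s, c), ann_s in S.items():
        if b_r == b_s:
            joined = mul(ann_r, ann_s)                          # join -> product
            result[(c,)] = add(result.get((c,), Counter()), joined)  # merge -> sum

provenance_of_10 = show(result[(10,)])
```

The resulting polynomial records that the output tuple has two derivations, each using one tuple of R together with the single tuple of S.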

DC · Aug 9, 2017
Enabling Massive Deep Neural Networks with the GraphBLAS

Jeremy Kepner, Manoj Kumar, José Moreira et al.

Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than the corresponding dense matrix implementation. This paper provides a brief description of the mathematics underlying the GraphBLAS. In addition, the equations of a typical DNN are rewritten in a form designed to use the GraphBLAS. An implementation of the DNN is given using a preliminary GraphBLAS C library. The performance of the GraphBLAS implementation is measured relative to a standard dense linear algebra library implementation. For various sizes of DNN weight matrices, it is shown that the GraphBLAS sparse implementation outperforms a BLAS dense implementation as the weight matrix becomes sparser.
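
The memory argument in this abstract can be illustrated with a small sketch of one DNN layer computed from a sparse weight matrix. The layer form y' = ReLU(W y + b) is a common feed-forward formulation assumed here, not taken from the paper, and the code uses plain Python dictionaries rather than the GraphBLAS C API; all names are hypothetical.

```python
# Toy sketch, not GraphBLAS: one layer y' = ReLU(W y + b) with W stored
# row by row as {column: value} dictionaries holding only nonzero weights.

def to_sparse(dense):
    """Keep only the nonzero entries of each row."""
    return [{j: v for j, v in enumerate(row) if v != 0} for row in dense]


def sparse_layer(w_sparse, y, bias):
    """ReLU(W y + b), touching only the stored nonzero weights."""
    return [max(0.0, sum(v * y[j] for j, v in row.items()) + b)
            for row, b in zip(w_sparse, bias)]


def dense_layer(w_dense, y, bias):
    """Reference dense computation over every weight, including zeros."""
    return [max(0.0, sum(w * yj for w, yj in zip(row, y)) + b)
            for row, b in zip(w_dense, bias)]


W = [[0, 2, 0],
     [0, 0, 0],
     [1, 0, -3]]
y = [1, 2, 3]
bias = [0.5, -1, 0]

W_sparse = to_sparse(W)
stored_sparse = sum(len(row) for row in W_sparse)  # 3 nonzero weights
stored_dense = sum(len(row) for row in W)          # 9 weights
out_sparse = sparse_layer(W_sparse, y, bias)
out_dense = dense_layer(W, y, bias)
```

Both layers return the same activations, while the sparse form stores and multiplies only the nonzero weights, which is the source of the memory and work savings the abstract describes as matrices become sparser.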