Mário A. T. Figueiredo

LG
h-index: 21
40 papers
3,471 citations
Novelty: 42%
AI Score: 44

40 Papers

OC · May 9, 2012
Alternating Direction Algorithms for Constrained Sparse Regression: Application to Hyperspectral Unmixing

José M. Bioucas-Dias, Mário A. T. Figueiredo

Convex optimization problems are common in hyperspectral unmixing. Examples include: the constrained least squares (CLS) and the fully constrained least squares (FCLS) problems, which are used to compute the fractional abundances in linear mixtures of known spectra; the constrained basis pursuit (CBP) problem, which is used to find sparse (i.e., with a small number of non-zero terms) linear mixtures of spectra from large libraries; the constrained basis pursuit denoising (CBPDN) problem, which is a generalization of BP that admits modeling errors. In this paper, we introduce two new algorithms to efficiently solve these optimization problems, based on the alternating direction method of multipliers, a method from the augmented Lagrangian family. The algorithms are termed SUnSAL (sparse unmixing by variable splitting and augmented Lagrangian) and C-SUnSAL (constrained SUnSAL). C-SUnSAL solves the CBP and CBPDN problems, while SUnSAL solves CLS and FCLS, as well as a more general version thereof, called constrained sparse regression (CSR). C-SUnSAL and SUnSAL are shown to outperform off-the-shelf methods in terms of speed and accuracy.
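As a rough illustration of the variable-splitting idea, the sketch below applies ADMM to a nonnegative sparse regression problem, min 0.5*||Ax - y||^2 + lam*||x||_1 subject to x >= 0. It is not the authors' SUnSAL code: the function names, the penalty rho, the iteration count, the toy data, and the omission of the sum-to-one constraint are all assumptions made for brevity.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def admm_nonneg_lasso(A, y, lam=0.1, rho=1.0, iters=500):
    """ADMM with the split x = z (hypothetical helper, not the SUnSAL API)."""
    m, n = len(A), len(A[0])
    # Precompute A^T A + rho*I and A^T y once; a real solver would also
    # cache a factorization instead of re-solving at every iteration.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) + (rho if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Aty = [sum(A[k][i] * y[k] for k in range(m)) for i in range(n)]
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        # x-update: quadratic subproblem, solved as a linear system.
        x = solve(AtA, [Aty[i] + rho * (z[i] - u[i]) for i in range(n)])
        # z-update: soft-threshold by lam/rho, then project onto z >= 0.
        z = [max(0.0, x[i] + u[i] - lam / rho) for i in range(n)]
        # Scaled dual update.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


# Toy library of three made-up "spectra" (columns of A) observed in four bands.
A = [[1.0, 0.2, 0.0],
     [0.5, 1.0, 0.1],
     [0.0, 0.6, 1.0],
     [0.2, 0.0, 0.8]]
y = [0.7, 0.35, 0.0, 0.14]  # 0.7 times the first column
abundances = admm_nonneg_lasso(A, y, lam=0.01)
```

Splitting the variable this way keeps each step cheap: the x-update is a linear solve with a fixed matrix, and the z-update is an elementwise threshold.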

LG · Mar 4, 2022
Differentiable Causal Discovery Under Latent Interventions

Gonçalo R. A. Faria, André F. T. Martins, Mário A. T. Figueiredo

Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and one observation distribution, but where we do not know which distribution originated each sample and how the intervention affected the system, i.e., interventions are entirely latent. We propose a method based on neural networks and variational inference that addresses this scenario by framing it as learning a shared causal graph among an infinite mixture (under a Dirichlet process prior) of intervention structural causal models. Experiments with synthetic and real data show that our approach and its semi-supervised variant are able to discover causal relations in this challenging scenario.

LG · Jun 27, 2022
Human-AI Collaboration in Decision-Making: Beyond Learning to Defer

Diogo Leitão, Pedro Saleiro, Mário A. T. Figueiredo et al.

Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.

LG · Jul 13, 2022
Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions

José Pombal, André F. Cruz, João Bravo et al.

In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.

LG · Jun 27, 2022
Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction

José Pombal, Pedro Saleiro, Mário A. T. Figueiredo et al.

The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been paid to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account opening fraud detection case study as an example, we study the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts, and the problem of selective labels.

LG · Mar 29, 2023
Fairness-Aware Data Valuation for Supervised Learning

José Pombal, Pedro Saleiro, Mário A. T. Figueiredo et al.

Data valuation is an ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data valuatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.

LG · Sep 4, 2022
ProBoost: a Boosting Method for Probabilistic Classifiers

Fábio Mendonça, Sheikh Shanawaz Mostafa, Fernando Morgado-Dias et al.

ProBoost, a new boosting algorithm for probabilistic classifiers, is proposed in this work. This algorithm uses the epistemic uncertainty of each training sample to determine the most challenging/uncertain ones; the relevance of these samples is then increased for the next weak learner, producing a sequence that progressively focuses on the samples found to have the highest uncertainty. In the end, the weak learners' outputs are combined into a weighted ensemble of classifiers. Three methods are proposed to manipulate the training set: undersampling, oversampling, and weighting the training samples according to the uncertainty estimated by the weak learners. Furthermore, two approaches are studied regarding the ensemble combination. The weak learner herein considered is a standard convolutional neural network, and the probabilistic models underlying the uncertainty estimation use either variational inference or Monte Carlo dropout. The experimental evaluation carried out on MNIST benchmark datasets shows that ProBoost yields a significant performance improvement. The results are further highlighted by assessing the relative achievable improvement, a metric proposed in this work, which shows that a model with only four weak learners leads to an improvement exceeding 12% in this metric (for either accuracy, sensitivity, or specificity), in comparison to the model learned without ProBoost.

LG · Mar 11, 2024 · Code
Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints

Jean V. Alves, Diogo Leitão, Sérgio Jesus et al.

Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the deferral under cost and capacity constraints framework (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average 8.4% reduction in the misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf

LG · Dec 20, 2023 · Code
FiFAR: A Fraud Detection Dataset for Learning to Defer

Jean V. Alves, Diogo Leitão, Sérgio Jesus et al.

Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also provide a realistic definition of human work capacity constraints, an aspect of L2D systems that is often overlooked, allowing for extensive testing of assignment systems under real-world conditions. We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions, and benchmark these baselines under an array of 300 distinct testing scenarios. We believe that this dataset will serve as a pivotal instrument in facilitating a systematic, rigorous, reproducible, and transparent evaluation and comparison of L2D methods, thereby fostering the development of more synergistic human-AI collaboration in decision-making systems. The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset

ML · Dec 10, 2025 · Code
LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes

Tiago Brogueira, Mário A. T. Figueiredo

Binary classification is one of the oldest, most prevalent, and studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively little attention. The area under the receiver operating characteristic curve (AUROC) has long been a standard choice for model comparison. Despite its advantages, AUROC is not always ideal, particularly for problems that are invariant to local exchange of classes (LxC), a new form of metric invariance introduced in this work. To address this limitation, we propose LxCIM (LxC-invariant metric), which is not only rank-based and invariant under local exchange of classes, but also intuitive, logically consistent, and always computable, while enabling more detailed analysis through the cumulative accuracy-decision rate curve. Moreover, LxCIM exhibits clear theoretical connections to AUROC, accuracy, and the area under the accuracy-decision rate curve (AUDRC). These relationships allow for multiple complementary interpretations: as a symmetric form of AUROC, a rank-based analogue of accuracy, or a more representative and more interpretable variant of AUDRC. Finally, we demonstrate the direct applicability of LxCIM to the bivariate causal discovery problem (which exhibits invariance to local exchange of classes) and show how it addresses the acknowledged limitations of existing metrics used in this field. All code and implementation details are publicly available at github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data.

LG · Mar 14, 2023
Distinguishing Cause from Effect on Categorical Data: The Uniform Channel Model

Mário A. T. Figueiredo, Catarina A. Oliveira

Distinguishing cause from effect using observations of a pair of random variables is a core problem in causal discovery. Most approaches proposed for this task, namely additive noise models (ANM), are only adequate for quantitative data. We propose a criterion to address the cause-effect problem with categorical variables (living in sets with no meaningful order), inspired by seeing a conditional probability mass function (pmf) as a discrete memoryless channel. We select as the most likely causal direction the one in which the conditional pmf is closer to a uniform channel (UC). The rationale is that, in a UC, as in an ANM, the conditional entropy (of the effect given the cause) is independent of the cause distribution, in agreement with the principle of independence of cause and mechanism. Our approach, which we call the uniform channel model (UCM), thus extends the ANM rationale to categorical variables. To assess how close a conditional pmf (estimated from data) is to a UC, we use statistical testing, supported by a closed-form estimate of a UC channel. On the theoretical front, we prove identifiability of the UCM and show its equivalence with a structural causal model with a low-cardinality exogenous variable. Finally, the proposed method compares favorably with recent state-of-the-art alternatives in experiments on synthetic, benchmark, and real data.
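The entropy argument above can be checked numerically with a small sketch. It assumes, since the abstract does not define the term, that a uniform channel is one whose rows (the conditional pmfs of the effect given each cause value) are permutations of one another; the channel matrices and cause distributions are made-up illustrations, and no statistical test from the paper is reproduced.

```python
import math


def row_entropy(row):
    """Shannon entropy (in bits) of one conditional pmf p(y | x)."""
    return -sum(p * math.log2(p) for p in row if p > 0)


def conditional_entropy(px, channel):
    """H(Y | X) = sum_x p(x) * H(Y | X = x)."""
    return sum(p * row_entropy(row) for p, row in zip(px, channel))


# Rows are permutations of each other (assumed meaning of "uniform channel").
uniform_channel = [
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.1, 0.7],
]
# A generic channel whose rows have different entropies.
generic_channel = [
    [0.7, 0.2, 0.1],
    [0.4, 0.3, 0.3],
    [0.9, 0.05, 0.05],
]

px_a = [1 / 3, 1 / 3, 1 / 3]  # two hypothetical cause distributions
px_b = [0.8, 0.1, 0.1]

uc_gap = abs(conditional_entropy(px_a, uniform_channel)
             - conditional_entropy(px_b, uniform_channel))
generic_gap = abs(conditional_entropy(px_a, generic_channel)
                  - conditional_entropy(px_b, generic_channel))
```

Under this assumption `uc_gap` is zero up to rounding, because every row contributes the same entropy regardless of how the cause is distributed, while `generic_gap` is not.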

CL · May 3, 2024
Conformal Prediction for Natural Language Processing: A Survey

Margarida M. Campos, António Farinhas, Chrysoula Zerva et al.

The rapid proliferation of large language models and natural language processing (NLP) applications creates a crucial need for uncertainty quantification to mitigate risks such as hallucinations and to enhance decision-making reliability in critical applications. Conformal prediction is emerging as a theoretically sound and practically useful framework, combining flexibility with strong statistical guarantees. Its model-agnostic and distribution-free nature makes it particularly promising to address the current shortcomings of NLP systems that stem from the absence of uncertainty quantification. This paper provides a comprehensive survey of conformal prediction techniques, their guarantees, and existing applications in NLP, pointing to directions for future research and open challenges.

ML · Dec 17, 2025
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point

Carlos Couto, José Mourão, Mário A. T. Figueiredo et al.

Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for some classes of teacher-student problems, when the teacher and student networks have matching weights, showing that the smaller eigenvalues of the Hessian determine long-time learning performance. For linear networks, we analytically establish that for large networks the spectrum asymptotically follows a convolution of a scaled chi-square distribution with a scaled Marchenko-Pastur distribution. We numerically analyse the Hessian spectrum for polynomial and other non-linear networks. Furthermore, we show that the rank of the Hessian matrix can be seen as an effective number of parameters for networks using polynomial activation functions. For a generic non-linear activation function, such as the error function, we empirically observe that the Hessian matrix is always full rank.
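The role of the smallest eigenvalues can be seen in a minimal sketch that runs gradient descent on a quadratic approximation of the loss, written in the Hessian eigenbasis. The eigenvalues, learning rate, and step count are hypothetical, and this is a generic quadratic model, not the teacher-student setup analysed in the paper.

```python
def gradient_descent_errors(eigenvalues, w0, lr, steps):
    """Run GD on L(w) = 0.5 * sum_i eig_i * w_i**2 (optimum at w = 0)."""
    w = list(w0)
    for _ in range(steps):
        # The gradient along eigen-direction i is eig_i * w_i, so each
        # coordinate shrinks by the factor (1 - lr * eig_i) per step.
        w = [wi - lr * lam * wi for wi, lam in zip(w, eigenvalues)]
    return w


def loss_terms(eigenvalues, w):
    """Per-direction contributions to the quadratic loss."""
    return [0.5 * lam * wi * wi for lam, wi in zip(eigenvalues, w)]


eigenvalues = [1.0, 0.1, 0.01]  # hypothetical Hessian spectrum
w_final = gradient_descent_errors(eigenvalues, [1.0, 1.0, 1.0], lr=0.5, steps=200)
terms = loss_terms(eigenvalues, w_final)
# Share of the remaining loss that sits in the smallest-eigenvalue direction.
slow_fraction = terms[-1] / sum(terms)
```

After many steps almost all of the remaining loss lies along the direction with the smallest eigenvalue, which is why that part of the spectrum governs long-time performance in this simplified picture.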

LG · Feb 20, 2025
Sparse Activations as Conformal Predictors

Margarida M. Campos, João Calém, Sophia Sklaviadis et al.

Conformal prediction is a distribution-free framework for uncertainty quantification that replaces point predictions with sets, offering marginal coverage guarantees (i.e., ensuring that the prediction sets contain the true label with a specified probability, in expectation). In this paper, we uncover a novel connection between conformal prediction and sparse softmax-like transformations, such as sparsemax and $γ$-entmax (with $γ> 1$), which may assign nonzero probability only to a subset of labels. We introduce new non-conformity scores for classification that make the calibration process correspond to the widely used temperature scaling method. At test time, applying these sparse transformations with the calibrated temperature leads to a support set (i.e., the set of labels with nonzero probability) that automatically inherits the coverage guarantees of conformal prediction. Through experiments on computer vision and text classification benchmarks, we demonstrate that the proposed method achieves competitive results in terms of coverage, efficiency, and adaptiveness compared to standard non-conformity scores based on softmax.
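A minimal sketch of the mechanism, assuming the standard sparsemax projection and a temperature that simply divides the scores: the prediction set is read off as the support of the sparse output. The scores and temperature values are made up, and the calibration step that would choose the temperature on held-out data is not shown.

```python
def sparsemax(scores):
    """Euclidean projection of the score vector onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # The condition holds exactly for the first k sorted entries;
        # the last j that satisfies it fixes the threshold tau.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    return [max(0.0, s - tau) for s in scores]


def support_set(scores, temperature):
    """Labels that receive nonzero probability after temperature scaling."""
    probs = sparsemax([s / temperature for s in scores])
    return [label for label, p in enumerate(probs) if p > 0.0]


scores = [2.0, 1.0, 0.1]  # hypothetical classifier scores for three labels
small_set = support_set(scores, temperature=1.0)
medium_set = support_set(scores, temperature=2.0)
large_set = support_set(scores, temperature=10.0)
```

Raising the temperature flattens the scores and enlarges the support, so in this sketch the temperature acts as the single knob that trades set size against coverage.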

LG · Jan 16, 2024
DiConStruct: Causal Concept-based Explanations through Black-Box Distillation

Ricardo Moreira, Jacopo Bono, Mário Cardoso et al.

Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, as far as we know to date, no method fulfills these three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Because of this, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.

LG · Aug 4, 2021
Sparse Continuous Distributions and Fenchel-Young Losses

André F. T. Martins, Marcos Treviso, António Farinhas et al.

Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fusedmax) has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $Ω$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $Ω$ is a Tsallis negentropy with parameter $α$, we obtain "deformed exponential families," which include $α$-entmax and sparsemax ($α=2$) as particular cases. For quadratic energy functions, the resulting densities are $β$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $Ω$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $α\in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.

LG · Nov 30, 2020
TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

João Bento, Pedro Saleiro, André F. Cruz et al.

Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions contributes on average only 41% of the model's score; iv) the model assigns notably high attribution to the client's age, suggesting potentially discriminatory reasoning, later confirmed by higher false positive rates for older clients.

LG · Nov 3, 2020
Control with adaptive Q-learning

João Pedro Araújo, Mário A. T. Figueiredo, Miguel Ayala Botto

This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning (SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), in two classical control problems (Pendulum and Cartpole). AQL adaptively partitions the state-action space of a Markov decision process (MDP), while learning the control policy, i.e., the mapping from states to actions. The main difference between AQL and SPAQL is that the latter learns time-invariant policies, where the mapping from states to actions does not depend explicitly on the time step. This paper also proposes the SPAQL with terminal state (SPAQL-TS), an improved version of SPAQL tailored for the design of regulators for control problems. The time-invariant policies are shown to result in a better performance than the time-variant ones in both problems studied. These algorithms are particularly suited to RL problems where the action space is finite, as is the case with the Cartpole problem. SPAQL-TS solves the OpenAI Gym Cartpole problem, while also displaying a higher sample efficiency than trust region policy optimization (TRPO), a standard RL algorithm for solving control tasks. Moreover, the policies learned by SPAQL are interpretable, while TRPO policies are typically encoded as neural networks, and therefore hard to interpret. Interpretable policies and sample efficiency are the major advantages of SPAQL.

ML · Sep 1, 2020
Variational Mixture of Normalizing Flows

Guilherme G. P. Freitas Pires, Mário A. T. Figueiredo

In the past few years, deep generative models, such as generative adversarial networks, variational autoencoders, and their variants, have seen wide adoption for the task of modelling complex data distributions. In spite of the outstanding sample quality achieved by those early methods, they model the target distributions implicitly, in the sense that the probability density functions induced by them are not explicitly accessible. This fact renders those methods unfit for tasks that require, for example, scoring new instances of data with the learned distributions. Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions, and by using transformations designed to have tractable and cheaply computable Jacobians. Although flexible, this framework lacked (until recently) a way to introduce discrete structure (such as the one found in mixtures) in the models it allows to construct, in an unsupervised scenario. The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model. This procedure is based on variational inference, and uses a variational posterior parameterized by a neural network. As will become clear, this model naturally lends itself to (multimodal) density estimation, semi-supervised learning, and clustering. The proposed model is illustrated on two synthetic datasets, as well as on a real-world dataset. Keywords: Deep generative models, normalizing flows, variational inference, probabilistic modelling, mixture models.
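The change-of-variables formula and the mixture structure can be illustrated with a one-dimensional sketch in which each component is an affine flow mapping data to a standard normal base variable. The function names, weights, scales, and shifts are hypothetical; real flows use far richer transformations, and the exact posterior computed here stands in for the neural variational posterior that the paper trains.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x, scale, shift):
    """Change of variables: log p(x) = log p_Z(f(x)) + log |f'(x)|."""
    z = (x - shift) / scale              # f(x), the map from data to base space
    log_abs_jacobian = -math.log(abs(scale))
    return standard_normal_logpdf(z) + log_abs_jacobian


def component_terms(x, weights, scales, shifts):
    """log(weight_k) + log p_k(x) for every mixture component."""
    return [math.log(w) + affine_flow_logpdf(x, a, b)
            for w, a, b in zip(weights, scales, shifts)]


def mixture_logpdf(x, weights, scales, shifts):
    """Log-density of the mixture, computed with a stable log-sum-exp."""
    terms = component_terms(x, weights, scales, shifts)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def responsibilities(x, weights, scales, shifts):
    """Exact posterior p(component | x), usable as a soft cluster assignment."""
    terms = component_terms(x, weights, scales, shifts)
    m = max(terms)
    exps = [math.exp(t - m) for t in terms]
    total = sum(exps)
    return [e / total for e in exps]


weights, scales, shifts = [0.3, 0.7], [0.5, 2.0], [-2.0, 3.0]
density_at_zero = math.exp(mixture_logpdf(0.0, weights, scales, shifts))
assignment = responsibilities(-2.0, weights, scales, shifts)
```

Because every component has an explicit density, the mixture can score new points directly, and the component posterior gives the clustering behaviour mentioned in the abstract.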

LG · Jun 15, 2020
Equilibrium Propagation for Complete Directed Neural Networks

Matilde Tristany Farinha, Sérgio Pequito, Pedro A. Santos et al.

Artificial neural networks, one of the most successful approaches to supervised learning, were originally inspired by their biological counterparts. However, the most successful learning algorithm for artificial neural networks, backpropagation, is considered biologically implausible. We contribute to the topic of biologically plausible neuronal learning by building upon and extending the equilibrium propagation learning framework. Specifically, we introduce: a new neuronal dynamics and learning rule for arbitrary network architectures; a sparsity-inducing method able to prune irrelevant connections; a dynamical-systems characterization of the models, using Lyapunov theory.

LG · Jun 12, 2020
Sparse and Continuous Attention Mechanisms

André F. T. Martins, António Farinhas, Marcos Treviso et al.

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and alpha-entmax), which have varying support, being able to assign zero probability to irrelevant categories. This paper expands that work in two directions: first, we extend alpha-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for alpha in {1,2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions.

LG · Jan 18, 2020
A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

Marek Śmieja, Łukasz Struski, Mário A. T. Figueiredo

In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.

CV · Jul 19, 2018
Conditional Random Fields as Recurrent Neural Networks for 3D Medical Imaging Segmentation

Miguel Monteiro, Mário A. T. Figueiredo, Arlindo L. Oliveira

The Conditional Random Field as a Recurrent Neural Network layer is a recently proposed algorithm meant to be placed on top of an existing Fully-Convolutional Neural Network to improve the quality of semantic segmentation. In this paper, we test whether this algorithm, which was shown to improve semantic segmentation for 2D RGB images, is able to improve segmentation quality for 3D multi-modal medical images. We developed an implementation of the algorithm which works for any number of spatial dimensions, input/output image channels, and reference image channels. As far as we know, this is the first publicly available implementation of this sort. We tested the algorithm with two distinct 3D medical imaging datasets and concluded that the performance differences observed were not statistically significant. Finally, in the discussion section of the paper, we go into the reasons why this technique transfers poorly from natural images to medical images.

CVJan 2, 2018
Scene-Adapted Plug-and-Play Algorithm with Guaranteed Convergence: Applications to Data Fusion in Imaging

Afonso M. Teodoro, José M. Bioucas-Dias, Mário A. T. Figueiredo

The recently proposed plug-and-play (PnP) framework allows leveraging recent developments in image denoising to tackle other, more involved, imaging inverse problems. In a PnP method, a black-box denoiser is plugged into an iterative algorithm, taking the place of a formal denoising step that corresponds to the proximity operator of some convex regularizer. While this approach offers flexibility and excellent performance, convergence of the resulting algorithm may be hard to analyze, as most state-of-the-art denoisers lack an explicit underlying objective function. In this paper, we propose a PnP approach where a scene-adapted prior (i.e., where the denoiser is targeted to the specific scene being imaged) is plugged into ADMM (alternating direction method of multipliers), and prove convergence of the resulting algorithm. Finally, we apply the proposed framework in two different imaging inverse problems: hyperspectral sharpening/fusion and image deblurring from blurred/noisy image pairs.
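
The plug-in step described above can be sketched as a scaled-form ADMM loop in which the proximity operator is replaced by a call to a denoiser. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `pnp_admm`, the dense matrix `A`, the penalty `rho`, and the generic `denoiser` callable are all hypothetical (the paper plugs in a scene-adapted denoiser, which is not reproduced here).

```python
import numpy as np

def pnp_admm(A, y, denoiser, rho=1.0, n_iter=50):
    """Hypothetical plug-and-play ADMM sketch for observations y ~ A x."""
    n = A.shape[1]
    x = np.zeros(n)
    v = np.zeros(n)  # split copy of x, handled by the denoiser
    u = np.zeros(n)  # scaled dual variable
    lhs = A.T @ A + rho * np.eye(n)  # fixed system matrix of the quadratic step
    Aty = A.T @ y
    for _ in range(n_iter):
        # Data-fit step: minimize 0.5*||A x - y||^2 + 0.5*rho*||x - v + u||^2
        x = np.linalg.solve(lhs, Aty + rho * (v - u))
        # Denoising step: the black-box denoiser stands in for a proximity operator
        v = denoiser(x + u)
        # Dual update
        u = u + x - v
    return x
```

With a convex proximity operator as `denoiser` (for example soft-thresholding) the loop is ordinary ADMM; with a black-box denoiser, convergence is exactly the question the abstract addresses.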

CVSep 6, 2017
Blind image deblurring using class-adapted image priors

Marina Ljubenović, Mário A. T. Figueiredo

Blind image deblurring (BID) is an ill-posed inverse problem, usually addressed by imposing prior knowledge on the (unknown) image and on the blurring filter. Most of the work on BID has focused on natural images, using image priors based on statistical properties of generic natural images. However, in many applications, it is known that the image being recovered belongs to some specific class (e.g., text, face, fingerprints), and exploiting this knowledge allows obtaining more accurate priors. In this work, we propose a method where a Gaussian mixture model (GMM) is used to learn a class-adapted prior, by training on a dataset of clean images of that class. Experiments show the competitiveness of the proposed method in terms of restoration quality when dealing with images containing text, faces, or fingerprints. Additionally, experiments show that the proposed method is able to handle text images at high noise levels, outperforming state-of-the-art methods specifically designed for BID of text images.

CVFeb 8, 2017
Scene-adapted plug-and-play algorithm with convergence guarantees

Afonso M. Teodoro, José M. Bioucas-Dias, Mário A. T. Figueiredo

Recent frameworks, such as the so-called plug-and-play, allow us to leverage the developments in image denoising to tackle other, and more involved, problems in image processing. As the name suggests, state-of-the-art denoisers are plugged into an iterative algorithm that alternates between a denoising step and the inversion of the observation operator. While these tools offer flexibility, the convergence of the resulting algorithm may be difficult to analyse. In this paper, we plug a state-of-the-art denoiser, based on a Gaussian mixture model, in the iterations of an alternating direction method of multipliers and prove the algorithm is guaranteed to converge. Moreover, we build upon the concept of scene-adapted priors where we learn a model targeted to a specific scene being imaged, and apply the proposed method to address the hyperspectral sharpening problem.

CVMay 23, 2016
Image Restoration with Locally Selected Class-Adapted Models

Afonso M. Teodoro, José M. Bioucas-Dias, Mário A. T. Figueiredo

State-of-the-art algorithms for imaging inverse problems (namely deblurring and reconstruction) are typically iterative, involving a denoising operation as one of their steps. Using a state-of-the-art denoising method in this context is not trivial, and is the focus of current work. Recently, we have proposed to use a class-adapted denoiser (patch-based using Gaussian mixture models) in a so-called plug-and-play scheme, wherein a state-of-the-art denoiser is plugged into an iterative algorithm, leading to results that outperform the best general-purpose algorithms, when applied to an image of a known class (e.g., faces, text, brain MRI). In this paper, we extend that approach to handle situations where the image being processed is from one of a collection of possible classes or, more importantly, contains regions of different classes. More specifically, we propose a method to locally select one of a set of class-adapted Gaussian mixture patch priors, previously estimated from clean images of those classes. Our approach may be seen as simultaneously performing segmentation and restoration, thus contributing to bridging the gap between image restoration/reconstruction and analysis.

CVFeb 12, 2016
Image Restoration and Reconstruction using Variable Splitting and Class-adapted Image Priors

Afonso M. Teodoro, José M. Bioucas-Dias, Mário A. T. Figueiredo

This paper proposes using a Gaussian mixture model as a prior for solving two image inverse problems, namely image deblurring and compressive imaging. We capitalize on the fact that variable splitting algorithms, like ADMM, are able to decouple the handling of the observation operator from that of the regularizer, and plug a state-of-the-art algorithm into the pure denoising step. Furthermore, we show that, when applied to a specific type of image, a Gaussian mixture model trained from a database of images of the same type is able to outperform current state-of-the-art methods.

DSSep 15, 2014
The Ordered Weighted $\ell_1$ Norm: Atomic Formulation, Projections, and Algorithms

Xiangrong Zeng, Mário A. T. Figueiredo

The ordered weighted $\ell_1$ norm (OWL) was recently proposed, with two different motivations: its good statistical properties as a sparsity promoting regularizer, and the fact that it generalizes the so-called {\it octagonal shrinkage and clustering algorithm for regression} (OSCAR), which has the ability to cluster/group regression variables that are highly correlated. This paper contains several contributions to the study and application of OWL regularization: the derivation of the atomic formulation of the OWL norm; the derivation of the dual of the OWL norm, based on its atomic formulation; a new and simpler derivation of the proximity operator of the OWL norm; an efficient scheme to compute the Euclidean projection onto an OWL ball; the instantiation of the conditional gradient (CG, also known as Frank-Wolfe) algorithm for linear regression problems under OWL regularization; the instantiation of accelerated projected gradient algorithms for the same class of problems. Finally, a set of experiments gives evidence that accelerated projected gradient algorithms are considerably faster than CG, for the class of problems considered.
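
One commonly used way to evaluate the proximity operator of a sorted, non-increasingly weighted $\ell_1$ norm is sketched below: sort the magnitudes, subtract the weights, project onto the non-increasing cone with pool-adjacent-violators, clip at zero, then undo the sort and restore the signs. This is an illustrative sketch, not necessarily the derivation given in the paper. The names `project_nonincreasing` and `prox_owl` are hypothetical, and the code assumes a non-empty input with non-negative, non-increasing weights `w`.

```python
import numpy as np

def project_nonincreasing(z):
    """Euclidean projection onto {u : u[0] >= u[1] >= ...} (pool-adjacent-violators)."""
    sums, counts = [], []
    for value in z:
        sums.append(float(value))
        counts.append(1)
        # Merge blocks while the previous block mean is below the current one
        while len(sums) > 1 and sums[-2] / counts[-2] < sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    return np.concatenate([np.full(c, s / c) for s, c in zip(sums, counts)])

def prox_owl(v, w):
    """Sketch of the prox of x -> sum_i w[i] * |x|_[i] evaluated at v."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(-np.abs(v))      # indices that sort |v| in non-increasing order
    z = np.abs(v)[order] - w            # shift the sorted magnitudes by the weights
    u = np.maximum(project_nonincreasing(z), 0.0)
    out = np.empty_like(v)
    out[order] = u                      # undo the sort
    return np.sign(v) * out             # restore the signs
```

With equal weights the sketch reduces to soft-thresholding; with unequal weights, nearby magnitudes can be merged to a common value, which is the clustering behavior the abstract refers to.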

CVApr 11, 2014
Decreasing Weighted Sorted $\ell_1$ Regularization

Xiangrong Zeng, Mário A. T. Figueiredo

We consider a new family of regularizers, termed {\it weighted sorted $\ell_1$ norms} (WSL1), which generalizes the recently introduced {\it octagonal shrinkage and clustering algorithm for regression} (OSCAR) and also contains the $\ell_1$ and $\ell_{\infty}$ norms as particular instances. We focus on a special case of the WSL1, the {\sl decreasing WSL1} (DWSL1), where the elements of the argument vector are sorted in non-increasing order and the weights are also non-increasing. In this paper, after showing that the DWSL1 is indeed a norm, we derive two key tools for its use as a regularizer: the dual norm and the Moreau proximity operator.
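
The definition above can be made concrete with a short sketch that evaluates a weighted sorted $\ell_1$ value and checks the two special cases named in the abstract. The function name `sorted_l1` and the example vector are assumptions made for illustration only.

```python
import numpy as np

def sorted_l1(x, w):
    """Weighted sorted l1 value: sum_i w[i] * |x|_[i], with |x|_[1] >= |x|_[2] >= ..."""
    magnitudes = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return float(np.dot(np.asarray(w, dtype=float), magnitudes))

x = np.array([0.5, -3.0, 2.0])
l1_value = sorted_l1(x, [1.0, 1.0, 1.0])    # all-ones weights give the l1 norm
linf_value = sorted_l1(x, [1.0, 0.0, 0.0])  # weights (1, 0, ..., 0) give the l-infinity norm
```

Both weight vectors used here are non-increasing, so these two cases fall within the decreasing (DWSL1) family.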

LGFeb 20, 2014
Group-sparse Matrix Recovery

Xiangrong Zeng, Mário A. T. Figueiredo

We apply the OSCAR (octagonal shrinkage and clustering algorithm for regression) in recovering group-sparse matrices (two-dimensional, 2D, arrays) from compressive measurements. We propose a 2D version of OSCAR (2OSCAR) consisting of the $\ell_1$ norm and the pair-wise $\ell_{\infty}$ norm, which is convex but non-differentiable. We show that the proximity operator of 2OSCAR can be computed based on that of OSCAR. The 2OSCAR problem can thus be efficiently solved by state-of-the-art proximal splitting algorithms. Experiments on group-sparse 2D array recovery show that 2OSCAR regularization solved by the SpaRSA algorithm is the fastest choice, while the PADMM algorithm (with debiasing) yields the most accurate results.

CVFeb 20, 2014
Robust Binary Fused Compressive Sensing using Adaptive Outlier Pursuit

Xiangrong Zeng, Mário A. T. Figueiredo

We propose a new method, {\it robust binary fused compressive sensing} (RoBFCS), to recover sparse piece-wise smooth signals from 1-bit compressive measurements. The proposed method is a modification of our previous {\it binary fused compressive sensing} (BFCS) algorithm, which is based on the {\it binary iterative hard thresholding} (BIHT) algorithm. As in BIHT, the data term of the objective function is a one-sided $\ell_1$ (or $\ell_2$) norm. Experiments show that the proposed algorithm is able to take advantage of the piece-wise smoothness of the original signal and detect sign flips and correct them, achieving more accurate recovery than BFCS and BIHT.

CVFeb 20, 2014
Binary Fused Compressive Sensing: 1-Bit Compressive Sensing meets Group Sparsity

Xiangrong Zeng, Mário A. T. Figueiredo

We propose a new method, {\it binary fused compressive sensing} (BFCS), to recover sparse piece-wise smooth signals from 1-bit compressive measurements. The proposed algorithm is a modification of the previous {\it binary iterative hard thresholding} (BIHT) algorithm, where, in addition to the sparsity constraint, the total variation of the recovered signal is bounded from above. As in BIHT, the data term of the objective function is a one-sided $\ell_1$ (or $\ell_2$) norm. Experiments on the recovery of sparse piece-wise smooth signals show that the proposed algorithm is able to take advantage of the piece-wise smoothness of the original signal, achieving more accurate recovery than BIHT.

CVFeb 20, 2014
Exploiting Two-Dimensional Group Sparsity in 1-Bit Compressive Sensing

Xiangrong Zeng, Mário A. T. Figueiredo

We propose a new approach, {\it two-dimensional fused binary compressive sensing} (2DFBCS), to recover 2D sparse piece-wise smooth signals from 1-bit measurements, exploiting 2D group sparsity for 1-bit compressive sensing recovery. The proposed method is a modified 2D version of the previous {\it binary iterative hard thresholding} (2DBIHT) algorithm, where the objective function includes a 2D one-sided $\ell_1$ (or $\ell_2$) penalty function encouraging agreement with the observed data, an indicator function of $K$-sparsity, and a total variation (TV) or modified TV (MTV) constraint. The subgradient of the 2D one-sided $\ell_1$ (or $\ell_2$) penalty and the projection onto the $K$-sparsity and TV or MTV constraint can be computed efficiently, allowing the application of algorithms of the {\it forward-backward splitting} (a.k.a. {\it iterative shrinkage-thresholding}) family. Experiments on the recovery of 2D sparse piece-wise smooth signals show that the proposed approach is able to take advantage of the piece-wise smoothness of the original signal, achieving more accurate recovery than 2DBIHT. More specifically, 2DFBCS with the MTV and the $\ell_2$ penalty performs best amongst the algorithms tested.

LGOct 18, 2013
A novel sparsity and clustering regularization

Xiangrong Zeng, Mário A. T. Figueiredo

We propose a novel SPARsity and Clustering (SPARC) regularizer, a modified version of the previous octagonal shrinkage and clustering algorithm for regression (OSCAR). The proposed regularizer consists of a $K$-sparse constraint and a pair-wise $\ell_{\infty}$ norm restricted to the $K$ largest components in magnitude. It is able to separably enforce $K$-sparsity and encourage the non-zeros to be equal in magnitude. Moreover, it can accurately group the features without shrinking their magnitude. SPARC is closely related to OSCAR, so the proximity operator of the former can be efficiently computed based on that of the latter, which allows proximal splitting algorithms to be used to solve problems with SPARC regularization. Experiments on synthetic data and with benchmark breast cancer data show that SPARC is a competitive group-sparsity inducing regularizer for regression and classification.

CVSep 24, 2013
Solving OSCAR regularization problems by proximal splitting algorithms

Xiangrong Zeng, Mário A. T. Figueiredo

The OSCAR (octagonal shrinkage and clustering algorithm for regression) regularizer consists of an L_1 norm plus a pair-wise L_inf norm (responsible for its grouping behavior) and was proposed to encourage group sparsity in scenarios where the groups are a priori unknown. The OSCAR regularizer has a non-trivial proximity operator, which limits its applicability. We reformulate this regularizer as a weighted sorted L_1 norm, and propose its grouping proximity operator (GPO) and approximate proximity operator (APO), thus making state-of-the-art proximal splitting algorithms (PSAs) available to solve inverse problems with OSCAR regularization. The GPO is in fact the APO followed by additional grouping and averaging operations, which are costly in time and storage, explaining why algorithms with APO are much faster than those with GPO. The convergence of PSAs with GPO is guaranteed, since GPO is an exact proximity operator. Although convergence of PSAs with APO may not be guaranteed, we have experimentally found that APO behaves similarly to GPO when the regularization parameter of the pair-wise L_inf norm is set to an appropriately small value. Experiments on recovery of group-sparse signals (with unknown groups) show that PSAs with APO are very fast and accurate.
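
The reformulation as a weighted sorted L_1 norm can be checked numerically. The sketch below assumes the usual OSCAR form, lam1 times the L_1 norm plus lam2 times the sum over pairs of max(|x_i|, |x_j|). After sorting magnitudes in non-increasing order, the entry of (0-based) rank i is the larger element in n-1-i pairs, which gives the weights used in `oscar_sorted`. The function names and the parameters `lam1` and `lam2` are hypothetical.

```python
import numpy as np
from itertools import combinations

def oscar_pairwise(x, lam1, lam2):
    """OSCAR value computed directly from its pair-wise definition."""
    a = np.abs(np.asarray(x, dtype=float))
    pair_term = sum(max(a[i], a[j]) for i, j in combinations(range(len(a)), 2))
    return float(lam1 * a.sum() + lam2 * pair_term)

def oscar_sorted(x, lam1, lam2):
    """Same value written as a weighted sorted L_1 norm."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # non-increasing magnitudes
    n = len(a)
    weights = lam1 + lam2 * (n - 1 - np.arange(n))         # rank i gets lam1 + lam2*(n-1-i)
    return float(weights @ a)
```

The pair-wise form costs O(n^2) operations, while the sorted form needs only a sort, which is one practical reason to prefer the reformulation.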

OCOct 9, 2012
Deconvolving Images with Unknown Boundaries Using the Alternating Direction Method of Multipliers

Mariana S. C. Almeida, Mário A. T. Figueiredo

The alternating direction method of multipliers (ADMM) has recently sparked interest as a flexible and efficient optimization tool for imaging inverse problems, namely deconvolution and reconstruction under non-smooth convex regularization. ADMM achieves state-of-the-art speed by adopting a divide and conquer strategy, wherein a hard problem is split into simpler, efficiently solvable sub-problems (e.g., using fast Fourier or wavelet transforms, or simple proximity operators). In deconvolution, one of these sub-problems involves a matrix inversion (i.e., solving a linear system), which can be done efficiently (in the discrete Fourier domain) if the observation operator is circulant, i.e., under periodic boundary conditions. This paper extends ADMM-based image deconvolution to the more realistic scenario of unknown boundary, where the observation operator is modeled as the composition of a convolution (with arbitrary boundary conditions) with a spatial mask that keeps only pixels that do not depend on the unknown boundary. The proposed approach also handles, at no extra cost, problems that combine the recovery of missing pixels (i.e., inpainting) with deconvolution. We show that the resulting algorithms inherit the convergence guarantees of ADMM and illustrate their performance on non-periodic deblurring (with and without inpainting of interior pixels) under total-variation and frame-based regularization.
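
The masked observation model can be illustrated in one dimension: a periodic (circulant) convolution followed by a mask that discards the outputs affected by wrap-around gives the same values as a boundary-independent "valid" convolution. This is only a sketch of the modeling idea, not the paper's algorithm; the helper names, the signal length, and the blur kernel are assumptions.

```python
import numpy as np

def circular_convolve(x, h):
    """Periodic convolution computed in the discrete Fourier domain."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, n)))

def boundary_free_mask(n, k):
    """True for outputs of a length-k kernel that involve no wrap-around samples."""
    mask = np.zeros(n, dtype=bool)
    mask[k - 1:] = True
    return mask

rng = np.random.default_rng(0)
x = rng.standard_normal(32)             # hypothetical 1D "image"
h = np.array([0.25, 0.5, 0.25])         # hypothetical blur kernel
mask = boundary_free_mask(len(x), len(h))
masked_periodic = circular_convolve(x, h)[mask]
valid = np.convolve(x, h, mode="valid")  # uses only samples inside the signal
```

Because the convolution stays circulant, the associated linear system can still be handled with FFTs, while the mask removes the dependence on the boundary assumption.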

OCMar 16, 2010
Multiplicative Noise Removal Using Variable Splitting and Constrained Optimization

José M. Bioucas-Dias, Mário A. T. Figueiredo

Multiplicative noise (also known as speckle noise) models are central to the study of coherent imaging systems, such as synthetic aperture radar and sonar, and ultrasound and laser imaging. These models introduce two additional layers of difficulties with respect to the standard Gaussian additive noise scenario: (1) the noise is multiplied by (rather than added to) the original image; (2) the noise is not Gaussian, with Rayleigh and Gamma being commonly used densities. These two features of multiplicative noise models preclude the direct application of most state-of-the-art algorithms, which are designed for solving unconstrained optimization problems where the objective has two terms: a quadratic data term (log-likelihood), reflecting the additive and Gaussian nature of the noise, plus a convex (possibly nonsmooth) regularizer (e.g., a total variation or wavelet-based regularizer/prior). In this paper, we address these difficulties by: (1) converting the multiplicative model into an additive one by taking logarithms, as proposed by some other authors; (2) using variable splitting to obtain an equivalent constrained problem; and (3) dealing with this optimization problem using the augmented Lagrangian framework. A set of experiments shows that the proposed method, which we name MIDAL (multiplicative image denoising by augmented Lagrangian), yields state-of-the-art results both in terms of speed and denoising performance.
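
Step (1) above, converting the multiplicative model into an additive one by taking logarithms, can be sketched as follows. The Gamma noise with unit mean, the parameter `L`, and the synthetic positive image are assumptions chosen for illustration; this is not the MIDAL algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4                                                 # hypothetical Gamma shape parameter
x = rng.uniform(0.5, 2.0, size=1000)                  # hypothetical positive image
n = rng.gamma(shape=L, scale=1.0 / L, size=x.shape)   # Gamma noise with mean 1
y = x * n                                             # multiplicative observation model

log_y = np.log(y)                                     # log domain: log y = log x + log n
additive_noise = log_y - np.log(x)                    # equals log n, an additive term
```

The additive term `log n` is still not Gaussian, so the data term in the log domain is not the usual quadratic one.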

OCJan 14, 2010
Restoration of Poissonian Images Using Alternating Direction Optimization

Mário A. T. Figueiredo, José M. Bioucas-Dias

Much research has been devoted to the problem of restoring Poissonian images, namely for medical and astronomical applications. However, the restoration of these images using state-of-the-art regularizers (such as those based on multiscale representations or total variation) is still an active research area, since the associated optimization problems are quite challenging. In this paper, we propose an approach to deconvolving Poissonian images, which is based on an alternating direction optimization method. The standard regularization (or maximum a posteriori) restoration criterion, which combines the Poisson log-likelihood with a (non-smooth) convex regularizer (log-prior), leads to hard optimization problems: the log-likelihood is non-quadratic and non-separable, the regularizer is non-smooth, and there is a non-negativity constraint. Using standard convex analysis tools, we present sufficient conditions for existence and uniqueness of solutions of these optimization problems, for several types of regularizers: total-variation, frame-based analysis, and frame-based synthesis. We attack these problems with an instance of the alternating direction method of multipliers (ADMM), which belongs to the family of augmented Lagrangian algorithms. We study sufficient conditions for convergence and show that these are satisfied, either under total-variation or frame-based (analysis and synthesis) regularization. The resulting algorithms are shown to outperform alternative state-of-the-art methods, both in terms of speed and restoration accuracy.

OCDec 17, 2009
An Augmented Lagrangian Approach to the Constrained Optimization Formulation of Imaging Inverse Problems

Manya V. Afonso, José M. Bioucas-Dias, Mário A. T. Figueiredo

We propose a new fast algorithm for solving one of the standard approaches to ill-posed linear inverse problems (IPLIP), where a (possibly non-smooth) regularizer is minimized under the constraint that the solution explains the observations sufficiently well. Although the regularizer and constraint are usually convex, several particular features of these problems (huge dimensionality, non-smoothness) preclude the use of off-the-shelf optimization tools and have stimulated a considerable amount of research. In this paper, we propose a new efficient algorithm to handle one class of constrained problems (often known as basis pursuit denoising) tailored to image recovery applications. The proposed algorithm, which belongs to the family of augmented Lagrangian methods, can be used to deal with a variety of imaging IPLIP, including deconvolution and reconstruction from compressive observations (such as MRI), using either total-variation or wavelet-based (or, more generally, frame-based) regularization. The proposed algorithm is an instance of the so-called "alternating direction method of multipliers", for which sufficient conditions for convergence are known; we show that these conditions are satisfied by the proposed algorithm. Experiments on a set of image restoration and reconstruction benchmark problems show that the proposed algorithm is a strong contender for the state-of-the-art.