LG · May 12, 2022 · Code
Distinction Maximization Loss: Efficiently Improving Out-of-Distribution Detection and Uncertainty Estimation by Replacing the Loss and Calibrating
David Macêdo, Cleber Zanchettin, Teresa Ludermir
Building robust deterministic neural networks remains a challenge. On the one hand, some approaches improve out-of-distribution detection at the cost of reducing classification accuracy in some situations. On the other hand, some methods simultaneously increase classification accuracy, uncertainty estimation, and out-of-distribution detection at the expense of reducing the inference efficiency. In this paper, we propose training deterministic neural networks using our DisMax loss, which works as a drop-in replacement for the usual SoftMax loss (i.e., the combination of the linear output layer, the SoftMax activation, and the cross-entropy loss). Starting from the IsoMax+ loss, we create each logit based on the distances to all prototypes, rather than just the one associated with the correct class. We also introduce a mechanism to combine images to construct what we call fractional probability regularization. Moreover, we present a fast way to calibrate the network after training. Finally, we propose a composite score to perform out-of-distribution detection. Our experiments show that DisMax usually outperforms current approaches simultaneously in classification accuracy, uncertainty estimation, and out-of-distribution detection while maintaining deterministic neural network inference efficiency. The code to reproduce the results is available at https://github.com/dlmacedo/distinction-maximization-loss.
LG · May 30, 2021 · Code
Enhanced Isotropy Maximization Loss: Seamless and High-Performance Out-of-Distribution Detection Simply Replacing the SoftMax Loss
David Macêdo, Teresa Ludermir
Current out-of-distribution detection approaches usually present special requirements (e.g., collecting outlier data and hyperparameter validation) and produce side effects (e.g., classification accuracy drop and slow/inefficient inferences). Recently, entropic out-of-distribution detection has been proposed as a seamless approach (i.e., a solution that avoids all previously mentioned drawbacks). The entropic out-of-distribution detection solution uses the IsoMax loss for training and the entropic score for out-of-distribution detection. The IsoMax loss works as a drop-in replacement of the SoftMax loss (i.e., the combination of the output linear layer, the SoftMax activation, and the cross-entropy loss) because swapping the SoftMax loss with the IsoMax loss requires no changes in the model's architecture or training procedures/hyperparameters. In this paper, we perform what we call an isometrization of the distances used in the IsoMax loss. Additionally, we propose replacing the entropic score with the minimum distance score. Experiments showed that these modifications significantly increase out-of-distribution detection performance while keeping the solution seamless. Besides being competitive with or outperforming all major current approaches, the proposed solution avoids all their current limitations, in addition to being much easier to use because only a simple loss replacement for training the neural network is required. The code to replace the SoftMax loss with the IsoMax+ loss and reproduce the results is available at https://github.com/dlmacedo/entropic-out-of-distribution-detection.
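The minimum distance score described above can be illustrated with a small sketch. It assumes that "isometrization" means measuring Euclidean distances between L2-normalized features and L2-normalized class prototypes, multiplied by a scalar distance scale; the abstract does not spell this out, so the function names, the normalization, and the `distance_scale` parameter are illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(vector):
    """Return the vector scaled to unit Euclidean norm (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def isometric_distances(feature, prototypes, distance_scale=1.0):
    """Distances from a feature vector to every class prototype.

    Assumption: both sides are L2-normalized and the result is multiplied
    by the absolute value of a scalar distance scale.
    """
    f = l2_normalize(feature)
    return [abs(distance_scale) * math.dist(f, l2_normalize(p)) for p in prototypes]


def minimum_distance_score(feature, prototypes, distance_scale=1.0):
    """Higher (less negative) scores suggest an in-distribution input."""
    return -min(isometric_distances(feature, prototypes, distance_scale))


# Hypothetical two-class prototypes in a 2-D feature space.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
near_score = minimum_distance_score([0.9, 0.1], prototypes)   # close to class 0
far_score = minimum_distance_score([-1.0, -1.0], prototypes)  # far from both classes
print(near_score > far_score)
```

In this toy setup, an input is flagged as out-of-distribution when its score falls below a threshold chosen on held-out in-distribution data.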
CY · Jan 29, 2025
International AI Safety Report
Yoshua Bengio, Sören Mindermann, Daniel Privitera et al. · eth-zurich, mit
The first International AI Safety Report comprehensively synthesizes the current evidence on the capabilities, risks, and safety of advanced AI systems. The report was mandated by the nations attending the AI Safety Summit in Bletchley, UK. Thirty nations, the UN, the OECD, and the EU each nominated a representative to the report's Expert Advisory Panel. A total of 100 AI experts contributed, representing diverse perspectives and disciplines. Led by the report's Chair, these independent experts collectively had full discretion over the report's content.
CY · Nov 5, 2024
International Scientific Report on the Safety of Advanced AI (Interim Report)
Yoshua Bengio, Sören Mindermann, Daniel Privitera et al. · eth-zurich
This is the interim publication of the first International Scientific Report on the Safety of Advanced AI. The report synthesises the scientific understanding of general-purpose AI -- AI that can perform a wide variety of tasks -- with a focus on understanding and managing its risks. A diverse group of 75 AI experts contributed to this report, including an international Expert Advisory Panel nominated by 30 countries, the EU, and the UN. Led by the Chair, these independent experts collectively had full discretion over the report's content. The final report is available at arXiv:2501.17805.
LG · Aug 2, 2021
Metodos de Agrupamentos em dois Estagios [Two-Stage Clustering Methods]
Jefferson Souza, Teresa Ludermir
This work investigates the use of two-stage clustering methods. Four techniques were proposed: SOMK, SOMAK, ASCAK, and SOINAK. SOMK consists of a SOM (Self-Organizing Map) followed by the K-means algorithm; SOMAK combines a SOM with the Ant K-means (AK) algorithm; ASCAK combines the ASCA (Ant System-based Clustering Algorithm) with AK; and SOINAK combines the Self-Organizing Incremental Neural Network (SOINN) with AK. Of the four proposed techniques, SOINAK performed best when applied to pattern recognition problems.
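LG · Jul 29, 2021
Otimizacao de pesos e funcoes de ativacao de redes neurais aplicadas na previsao de series temporais [Optimization of Weights and Activation Functions of Neural Networks Applied to Time Series Forecasting]
Gecynalda Gomes, Teresa Ludermir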
Neural networks have been applied to time series prediction with good experimental results, which indicate a high capacity to approximate functions with good precision. Most neural models used in these applications use activation functions with fixed parameters. However, it is known that the choice of activation function strongly influences the complexity and performance of the neural network, and that only a limited number of activation functions have been used. In this work, we propose a family of asymmetric activation functions with a free parameter for neural networks and show that this family satisfies the requirements of the universal approximation theorem. We use a methodology for the global optimization of both the free parameter of these activation functions and the weights of the connections between the processing units of the neural network. The central idea of the proposed methodology is to simultaneously optimize the weights and the activation function used in a multilayer perceptron (MLP) network, through an approach that combines the advantages of simulated annealing, tabu search, and a local learning algorithm, with the purpose of improving performance in the fitting and forecasting of time series. We chose two learning algorithms: backpropagation with momentum (BPM) and Levenberg-Marquardt (LM).
NE · Jul 18, 2021
Otimizacao de Redes Neurais atraves de Algoritmos Geneticos Celulares [Optimization of Neural Networks through Cellular Genetic Algorithms]
Anderson da Silva, Teresa Ludermir
This work proposes a methodology for automatically searching for Artificial Neural Networks (ANNs) using a Cellular Genetic Algorithm (CGA). The goal of this methodology is to find compact networks with good performance on classification problems. The main motivation for this work is the difficulty of configuring compact ANNs with good performance. The CGA searches for the components of the ANN in the same way as a common Genetic Algorithm (GA), but it differs by incorporating a Cellular Automaton (CA) that gives the GA individuals a location. The location imposed by the CA aims to control the spread of solutions in the population in order to maintain genetic diversity for longer. This genetic diversity is important for obtaining good results with GAs.
NE · Jul 10, 2021
Meta-aprendizado para otimizacao de parametros de redes neurais [Meta-Learning for Parameter Optimization of Neural Networks]
Tarsicio Lucas, Teresa Ludermir, Ricardo Prudencio et al.
The optimization of Artificial Neural Networks (ANNs) is an important task for the successful use of these models in real-world applications. The solutions adopted for this task are generally expensive, involving trial-and-error procedures or expert knowledge that is not always available. In this work, we investigated the use of meta-learning for the optimization of ANNs. Meta-learning is a research field that aims to automatically acquire knowledge relating features of learning problems to the performance of learning algorithms. Meta-learning techniques were originally proposed and evaluated for the algorithm selection problem and later for the optimization of parameters of Support Vector Machines. However, meta-learning can be adopted as a more general strategy to optimize ANN parameters, which motivates new efforts in this research direction. In the current work, we performed a case study using meta-learning to choose the number of hidden nodes for MLP networks, an important parameter for achieving good network performance. We generated a base of meta-examples associated with 93 regression problems. Each meta-example was generated from a regression problem and stored 16 features describing the problem (e.g., number of attributes and correlation among the problem attributes) and the best number of nodes for this problem, empirically chosen from a range of possible values. This set of meta-examples was given as input to a meta-learner, which was able to predict the best number of nodes for new problems based on their features. The experiments performed in this case study revealed satisfactory results.
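The meta-example idea above can be made concrete with a toy sketch. The abstract does not name the meta-learner, so the k-nearest-neighbours regressor below is an assumption chosen for brevity, and the function name, the two-feature meta-examples (the study uses 16 features), and all values are hypothetical.

```python
import math


def predict_hidden_nodes(meta_examples, new_features, k=3):
    """Predict a hidden-node count for a new problem.

    meta_examples: list of (feature_vector, best_number_of_nodes) pairs.
    Assumption: a k-nearest-neighbours average stands in for the meta-learner.
    In practice the features would be scaled to comparable ranges first.
    """
    ranked = sorted(meta_examples, key=lambda ex: math.dist(ex[0], new_features))
    nearest = ranked[:k]
    return round(sum(nodes for _, nodes in nearest) / len(nearest))


# Hypothetical meta-examples: (number of attributes, mean attribute correlation) -> best nodes.
meta_examples = [
    ([4, 0.1], 2),
    ([5, 0.2], 4),
    ([50, 0.9], 20),
    ([60, 0.8], 24),
]
suggested_nodes = predict_hidden_nodes(meta_examples, [5, 0.15], k=2)
print(suggested_nodes)
```

The prediction replaces a trial-and-error search over node counts with a lookup against problems that have already been solved.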
NE · Jul 5, 2021
Uso de GSO cooperativos com decaimentos de pesos para otimizacao de redes neurais [Use of Cooperative GSOs with Weight Decay for Neural Network Optimization]
Danielle Silva, Teresa Ludermir
Training Artificial Neural Networks is a complex task of great importance in supervised learning problems. Evolutionary Algorithms are widely used as global optimization techniques, and these approaches have been applied to Artificial Neural Networks to perform various tasks. The Group Search Optimizer (GSO) is an optimization algorithm inspired by the search behaviour of animals. In this article, we present two new hybrid approaches: CGSO-Hk-WD and CGSO-Sk-WD. Cooperative GSOs are based on the divide-and-conquer paradigm, employing cooperative behaviour between GSO groups to improve the performance of the standard GSO. We also apply the weight decay (WD) strategy to increase the generalizability of the networks. The results show that cooperative GSOs achieve better performance than the traditional GSO on classification problems in benchmark datasets such as Cancer, Diabetes, Ecoli, and Glass.
LG · Feb 17, 2021
Training Aware Sigmoidal Optimizer
David Macêdo, Pedro Dreyer, Teresa Ludermir et al.
Proper optimization of deep neural networks is an open research question, since an optimal procedure to change the learning rate throughout training is still unknown. Manually defining a learning rate schedule involves troublesome, time-consuming trial-and-error procedures to determine hyperparameters such as learning rate decay epochs and learning rate decay rates. Although adaptive learning rate optimizers automate this process, recent studies suggest they may produce overfitting and reduce performance when compared to fine-tuned learning rate schedules. Considering that the loss functions of deep neural networks present landscapes with many more saddle points than local minima, we proposed the Training Aware Sigmoidal Optimizer (TASO), which consists of a two-phase automated learning rate schedule. The first phase uses a high learning rate to quickly traverse the numerous saddle points, while the second phase uses a low learning rate to slowly approach the center of the local minimum previously found. We compared the proposed approach with commonly used adaptive learning rate optimizers such as Adam, RMSProp, and Adagrad. Our experiments showed that TASO outperformed all competing methods in both optimal (i.e., performing hyperparameter validation) and suboptimal (i.e., using default hyperparameters) scenarios.
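A two-phase schedule of this kind can be sketched with a sigmoid-shaped curve over training progress. The abstract does not give the formula, so the parameterization below (initial and final learning rates, a steepness `alpha`, and a transition point `beta`) is an assumption chosen to show the shape, not necessarily the exact TASO definition.

```python
import math


def sigmoidal_lr(epoch, total_epochs, lr_initial=0.1, lr_final=0.001,
                 alpha=25.0, beta=0.5):
    """Sigmoid-shaped learning rate schedule (illustrative assumption).

    Stays near lr_initial early in training, drops around the fraction
    `beta` of training, then stays near lr_final. `alpha` controls how
    sharp the transition is.
    """
    progress = epoch / total_epochs
    return lr_final + (lr_initial - lr_final) / (1.0 + math.exp(alpha * (progress - beta)))


# Learning rates over a hypothetical 100-epoch run.
schedule = [sigmoidal_lr(epoch, 100) for epoch in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

The high plateau corresponds to the phase that traverses saddle points, and the low plateau to the phase that approaches the local minimum.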
LG · Oct 13, 2020
Similarity Based Stratified Splitting: an approach to train better classifiers
Felipe Farias, Teresa Ludermir, Carmelo Bastos-Filho
We propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses both the output and input space information to split the data. The splits are generated using similarity functions among samples to place similar samples in different splits. This approach allows for a better representation of the data in the training phase. This strategy leads to a more realistic performance estimation when used in real-world applications. We evaluate our proposal on twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest, and K-Nearest Neighbors, and five similarity functions: Cityblock, Chebyshev, Cosine, Correlation, and Euclidean. According to the Wilcoxon signed-rank test, our approach consistently outperformed ordinary stratified 10-fold cross-validation in 75% of the assessed scenarios.
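One way to read "place similar samples in different splits" is sketched below. Within each class, a random pivot is grouped with its nearest remaining neighbours, and each member of the group is sent to a different fold. This greedy procedure, the Euclidean distance, and the function name are assumptions for illustration; the abstract does not describe the exact algorithm.

```python
import math
import random
from collections import defaultdict


def similarity_based_stratified_split(X, y, n_splits=3, seed=0):
    """Split sample indices into folds, spreading similar samples apart.

    Assumption: greedy grouping by Euclidean distance within each class.
    Class labels must be sortable. Returns a list of index lists.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(y):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        remaining = by_class[label][:]
        while remaining:
            # Pick a pivot, then take its most similar remaining samples.
            pivot = remaining.pop(rng.randrange(len(remaining)))
            remaining.sort(key=lambda i: math.dist(X[pivot], X[i]))
            group = [pivot] + remaining[: n_splits - 1]
            remaining = remaining[n_splits - 1:]
            # Send each member of the similar group to a different fold.
            fold_order = list(range(n_splits))
            rng.shuffle(fold_order)
            for fold_id, index in zip(fold_order, group):
                folds[fold_id].append(index)
    return folds


# Hypothetical data: two classes with three nearby samples each.
X = [[0, 0], [0, 0.1], [0, 0.2], [5, 5], [5, 5.1], [5, 5.2]]
y = [0, 0, 0, 1, 1, 1]
folds = similarity_based_stratified_split(X, y, n_splits=3)
print(folds)
```

Each fold can then serve once as the test set, as in ordinary stratified k-fold cross-validation.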
LG · Jun 7, 2020
Entropic Out-of-Distribution Detection: Seamless Detection of Unknown Examples
David Macêdo, Tsang Ing Ren, Cleber Zanchettin et al.
In this paper, we argue that the unsatisfactory out-of-distribution (OOD) detection performance of neural networks is mainly due to the SoftMax loss anisotropy and propensity to produce low entropy probability distributions in disagreement with the principle of maximum entropy. Current OOD detection approaches usually do not directly fix the SoftMax loss drawbacks, but rather build techniques to circumvent them. Unfortunately, those methods usually produce undesired side effects (e.g., classification accuracy drop, additional hyperparameters, slower inferences, and collecting extra data). In the opposite direction, we propose replacing the SoftMax loss with a novel loss function that does not suffer from the mentioned weaknesses. The proposed IsoMax loss is isotropic (exclusively distance-based) and provides high entropy posterior probability distributions. Replacing the SoftMax loss with the IsoMax loss requires no model or training changes. Additionally, the models trained with IsoMax loss produce inferences as fast and energy-efficient as those trained using SoftMax loss. Moreover, no classification accuracy drop is observed. The proposed method does not rely on outlier/background data, hyperparameter tuning, temperature calibration, feature extraction, metric learning, adversarial training, ensemble procedures, or generative models. Our experiments showed that IsoMax loss works as a seamless SoftMax loss drop-in replacement that significantly improves neural networks' OOD detection performance. Hence, it may be used as a baseline OOD detection approach to be combined with current or future OOD detection techniques to achieve even higher results.
LG · Aug 15, 2019
Entropic Out-of-Distribution Detection
David Macêdo, Tsang Ing Ren, Cleber Zanchettin et al.
Out-of-distribution (OOD) detection approaches usually present special requirements (e.g., hyperparameter validation, collection of outlier data) and produce side effects (e.g., classification accuracy drop, slower energy-inefficient inferences). We argue that these issues are a consequence of the SoftMax loss anisotropy and disagreement with the maximum entropy principle. Thus, we propose the IsoMax loss and the entropic score. The seamless drop-in replacement of the SoftMax loss by IsoMax loss requires neither additional data collection nor hyperparameter validation. The trained models do not exhibit classification accuracy drop and produce fast energy-efficient inferences. Moreover, our experiments show that training neural networks with IsoMax loss significantly improves their OOD detection performance. The IsoMax loss exhibits state-of-the-art performance under the mentioned conditions (fast energy-efficient inference, no classification accuracy drop, no collection of outlier data, and no hyperparameter validation), which we call the seamless OOD detection task. In future work, current OOD detection methods may replace the SoftMax loss with the IsoMax loss to improve their performance on the commonly studied non-seamless OOD detection problem.
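The pairing of a distance-based loss with an entropic score can be illustrated as follows. The sketch assumes that the logits are negative Euclidean distances to class prototypes multiplied by a scale factor, and that the entropic score is the negative entropy of the resulting probabilities. Neither formula appears in the abstract, so the function names and the `entropic_scale` value are illustrative assumptions.

```python
import math


def isomax_probabilities(feature, prototypes, entropic_scale=10.0):
    """Softmax over negative scaled distances to class prototypes (assumed form)."""
    logits = [-entropic_scale * math.dist(feature, p) for p in prototypes]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropic_score(probabilities):
    """Negative entropy: values closer to zero suggest an in-distribution input."""
    return sum(p * math.log(p) for p in probabilities if p > 0)


# Hypothetical two-class prototypes in a 2-D feature space.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
confident = entropic_score(isomax_probabilities([1.0, 0.0], prototypes))
ambiguous = entropic_score(isomax_probabilities([0.5, 0.5], prototypes))
print(confident > ambiguous)
```

An input that is equally far from every prototype yields a near-uniform distribution, so its entropy is high and its score is low.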