Heikki Huttunen

CV
24papers
1,350citations
Novelty37%
AI Score26

24 Papers

CVSep 21, 2021Code
Towards a Real-Time Facial Analysis System

Bishwo Adhikari, Xingyang Ni, Esa Rahtu et al.

Facial analysis is an active research area in computer vision, with many practical applications. Most of the existing studies focus on addressing one specific task and maximizing its performance. For a complete facial analysis system, one needs to solve these tasks efficiently to ensure a smooth experience. In this work, we present a system-level design of a real-time facial analysis system. With a collection of deep neural networks for object detection, classification, and regression, the system recognizes age, gender, facial expression, and facial similarity for each person that appears in the camera view. We investigate the parallelization and interplay of individual tasks. Results on common off-the-shelf architecture show that the system's accuracy is comparable to the state-of-the-art methods, and the recognition speed satisfies real-time requirements. Moreover, we propose a multitask network for jointly predicting the first three attributes, i.e., age, gender, and facial expression. Source code and trained models are available at https://github.com/mahehu/TUT-live-age-estimator.

CVAug 16, 2021Code
On the Importance of Encrypting Deep Features

Xingyang Ni, Heikki Huttunen, Esa Rahtu

In this study, we analyze model inversion attacks with only two assumptions: feature vectors of user data are known, and a black-box API for inference is provided. On the one hand, limitations of existing studies are addressed by opting for a more practical setting. Experiments have been conducted on state-of-the-art models in person re-identification, and two attack scenarios (i.e., recognizing auxiliary attributes and reconstructing user data) are investigated. Results show that an adversary could successfully infer sensitive information even under severe constraints. On the other hand, it is advisable to encrypt feature vectors, especially for a machine learning model in production. As an alternative to traditional encryption methods such as AES, a simple yet effective method termed ShuffleBits is presented. More specifically, the binary sequence of each floating-point number gets shuffled. Deployed using the one-time pad scheme, it serves as a plug-and-play module that is applicable to any neural network, and the resulting model directly outputs deep features in encrypted form. Source code is publicly available at https://github.com/nixingyang/ShuffleBits.

CVJun 17, 2021Code
using multiple losses for accurate facial age estimation

Yi Zhou, Heikki Huttunen, Tapio Elomaa

Age estimation is an essential challenge in computer vision. With the advances of convolutional neural networks, the performance of age estimation has been dramatically improved. Existing approaches usually treat age estimation as a classification problem. However, the age labels are ambiguous, thus make the classification task difficult. In this paper, we propose a simple yet effective approach for age estimation, which improves the performance compared to classification-based methods. The method combines four classification losses and one regression loss representing different class granularities together, and we name it as Age-Granularity-Net. We validate the Age-Granularity-Net framework on the CVPR Chalearn 2016 dataset, and extensive experiments show that the proposed approach can reduce the prediction error compared to any individual loss. The source code link is https://github.com/yipersevere/age-estimation.

CVJul 15, 2020Code
Adaptive L2 Regularization in Person Re-Identification

Xingyang Ni, Liang Fang, Heikki Huttunen

We introduce an adaptive L2 regularization mechanism in the setting of person re-identification. In the literature, it is common practice to utilize hand-picked regularization factors which remain constant throughout the training procedure. Unlike existing approaches, the regularization factors in our proposed method are updated adaptively through backpropagation. This is achieved by incorporating trainable scalar variables as the regularization factors, which are further fed into a scaled hard sigmoid function. Extensive experiments on the Market-1501, DukeMTMC-reID and MSMT17 datasets validate the effectiveness of our framework. Most notably, we obtain state-of-the-art performance on MSMT17, which is the largest dataset for person re-identification. Source code is publicly available at https://github.com/nixingyang/AdaptiveL2Regularization.

SEJun 30, 2019Code
Are SonarQube Rules Inducing Bugs?

Valentina Lenarduzzi, Francesco Lomio, Heikki Huttunen et al.

Background. The popularity of tools for analyzing Technical Debt, and particularly the popularity of SonarQube, is increasing rapidly. SonarQube proposes a set of coding rules, which represent something wrong in the code that will soon be reflected in a fault or will increase maintenance effort. However, our local companies were not confident in the usefulness of the rules proposed by SonarQube and contracted us to investigate the fault-proneness of these rules. Objective. In this work we aim at understanding which SonarQube rules are actually fault-prone and to understand which machine learning models can be adopted to accurately identify fault-prone rules. Method. We designed and conducted an empirical study on 21 well-known mature open-source projects. We applied the SZZ algorithm to label the fault-inducing commits. We analyzed the fault-proneness by comparing the classification power of seven machine learning models. Result. Among the 202 rules defined for Java by SonarQube, only 25 can be considered to have relatively low fault-proneness. Moreover, violations considered as "bugs" by SonarQube were generally not fault-prone and, consequently, the fault-prediction power of the model proposed by SonarQube is extremely low. Conclusion. The rules applied by SonarQube for calculating technical debt should be thoroughly investigated and their harmfulness needs to be further confirmed. Therefore, companies should carefully consider which rules they really need to apply, especially if their goal is to reduce fault-proneness.

CVMay 10, 2021
Sample selection for efficient image annotation

Bishwo Adhikari, Esa Rahtu, Heikki Huttunen

Supervised object detection has been proven to be successful in many benchmark datasets achieving human-level performances. However, acquiring a large amount of labeled image samples for supervised detection training is tedious, time-consuming, and costly. In this paper, we propose an efficient image selection approach that samples the most informative images from the unlabeled dataset and utilizes human-machine collaboration in an iterative train-annotate loop. Image features are extracted by the CNN network followed by the similarity score calculation, Euclidean distance. Unlabeled images are then sampled into different approaches based on the similarity score. The proposed approach is straightforward, simple and sampling takes place prior to the network training. Experiments on datasets show that our method can reduce up to 80% of manual annotation workload, compared to full manual labeling setting, and performs better than random sampling.

CVMay 9, 2021
Selective Probabilistic Classifier Based on Hypothesis Testing

Saeed Bakhshi Germi, Esa Rahtu, Heikki Huttunen

In this paper, we propose a simple yet effective method to deal with the violation of the Closed-World Assumption for a classifier. Previous works tend to apply a threshold either on the classification scores or the loss function to reject the inputs that violate the assumption. However, these methods cannot achieve the low False Positive Ratio (FPR) required in safety applications. The proposed method is a rejection option based on hypothesis testing with probabilistic networks. With probabilistic networks, it is possible to estimate the distribution of outcomes instead of a single output. By utilizing Z-test over the mean and standard deviation for each class, the proposed method can estimate the statistical significance of the network certainty and reject uncertain outputs. The proposed method was experimented on with different configurations of the COCO and CIFAR datasets. The performance of the proposed method is compared with the Softmax Response, which is a known top-performing method. It is shown that the proposed method can achieve a broader range of operation and cover a lower FPR than the alternative.

CVAug 6, 2020
Handwritten Character Recognition from Wearable Passive RFID

Leevi Raivio, Han He, Johanna Virkki et al.

In this paper we study the recognition of handwritten characters from data captured by a novel wearable electro-textile sensor panel. The data is collected sequentially, such that we record both the stroke order and the resulting bitmap. We propose a preprocessing pipeline that fuses the sequence and bitmap representations together. The data is collected from ten subjects containing altogether 7500 characters. We also propose a convolutional neural network architecture, whose novel upsampling structure enables successful use of conventional ImageNet pretrained networks, despite the small input size of only 10x10 pixels. The proposed model reaches 72\% accuracy in experimental tests, which can be considered good accuracy for this challenging dataset. Both the data and the model are released to the public.

LGJul 2, 2020
Iterative Bounding Box Annotation for Object Detection

Bishwo Adhikari, Heikki Huttunen

Manual annotation of bounding boxes for object detection in digital images is tedious, and time and resource consuming. In this paper, we propose a semi-automatic method for efficient bounding box annotation. The method trains the object detector iteratively on small batches of labeled images and learns to propose bounding boxes for the next batch, after which the human annotator only needs to correct possible errors. We propose an experimental setup for simulating the human actions and use it for comparing different iteration strategies, such as the order in which the data is presented to the annotator. We experiment on our method with three datasets and show that it can reduce the human annotation effort significantly, saving up to 75% of total manual annotation work.

CVJun 29, 2020
Vehicle Attribute Recognition by Appearance: Computer Vision Methods for Vehicle Type, Make and Model Classification

Xingyang Ni, Heikki Huttunen

This paper studies vehicle attribute recognition by appearance. In the literature, image-based target recognition has been extensively investigated in many use cases, such as facial recognition, but less so in the field of vehicle attribute recognition. We survey a number of algorithms that identify vehicle properties ranging from coarse-grained level (vehicle type) to fine-grained level (vehicle make and model). Moreover, we discuss two alternative approaches for these tasks, including straightforward classification and a more flexible metric learning method. Furthermore, we design a simulated real-world scenario for vehicle attribute recognition and present an experimental comparison of the two approaches.

LGAug 23, 2019
Bayesian Receiver Operating Characteristic Metric for Linear Classifiers

Syeda Sakira Hassan, Heikki Huttunen, Jari Niemi et al.

We propose a novel classifier accuracy metric: the Bayesian Area Under the Receiver Operating Characteristic Curve (CBAUC). The method estimates the area under the ROC curve and is related to the recently proposed Bayesian Error Estimator. The metric can assess the quality of a classifier using only the training dataset without the need for computationally expensive cross-validation. We derive a closed-form solution of the proposed accuracy metric for any linear binary classifier under the Gaussianity assumption, and study the accuracy of the proposed estimator using simulated and real-world data. These experiments confirm that the closed-form CBAUC is both faster and more accurate than conventional AUC estimators.

LGMay 1, 2019
Surface Type Classification for Autonomous Robot Indoor Navigation

Francesco Lomio, Erjon Skenderi, Damoon Mohamadi et al.

In this work we describe the preparation of a time series dataset of inertial measurements for determining the surface type under a wheeled robot. The data consists of over 7600 labeled time series samples, with the corresponding surface type annotation. This data was used in two public competitions with over 1500 participant in total. Additionally, we describe the performance of state-of-art deep learning models for time series classification, as well as propose a baseline model based on an ensemble of machine learning methods. The baseline achieves an accuracy of over 68% with our nine-category dataset.

LGApr 16, 2019
HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning

Oguzhan Gencoglu, Mark van Gils, Esin Guldogan et al.

Recent advancements in machine learning research, i.e., deep learning, introduced methods that excel conventional algorithms as well as humans in several complex tasks, ranging from detection of objects in images and speech recognition to playing difficult strategic games. However, the current methodology of machine learning research and consequently, implementations of the real-world applications of such algorithms, seems to have a recurring HARKing (Hypothesizing After the Results are Known) issue. In this work, we elaborate on the algorithmic, economic and social reasons and consequences of this phenomenon. We present examples from current common practices of conducting machine learning research (e.g. avoidance of reporting negative results) and failure of generalization ability of the proposed algorithms and datasets in actual real-life usage. Furthermore, a potential future trajectory of machine learning research and development from the perspective of accountable, unbiased, ethical and privacy-aware algorithmic decision making is discussed. We would like to emphasize that with this discussion we neither claim to provide an exhaustive argumentation nor blame any specific institution or individual on the raised issues. This is simply a discussion put forth by us, insiders of the machine learning field, reflecting on us.

CVOct 1, 2018
Elastic Neural Networks for Classification

Yi Zhou, Yue Bai, Shuvra S. Bhattacharyya et al.

In this work we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we study a framework, where we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss for feeding the gradient to the early layers. The framework - which we name Elastic network - is tested with several well-known networks on CIFAR10 and CIFAR100 datasets, and the experimental results show that the proposed framework improves the accuracy on both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve the performance and discuss the reasons. Finally, as a side product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to current computational budget.

CVSep 14, 2018
Real Time System for Facial Analysis

Janne Tommola, Pedram Ghazi, Bishwo Adhikari et al.

In this paper we describe the anatomy of a real-time facial analysis system. The system recognizes the age, gender and facial expression from users in appearing in front of the camera. All components are based on convolutional neural networks, whose accuracy we study on commonly used training and evaluation sets. A key contribution of the work is the description of the interplay between processing threads for frame grabbing, face detection and the three types of recognition. The python code for executing the system uses common libraries--keras/tensorflow, opencv and dlib--and is available for download.

ASAug 2, 2018
Acoustic Scene Classification: A Competition Review

Shayan Gharib, Honain Derrar, Daisuke Niizumi et al.

In this paper we study the problem of acoustic scene classification, i.e., categorization of audio sequences into mutually exclusive classes based on their spectral content. We describe the methods and results discovered during a competition organized in the context of a graduate machine learning course; both by the students and external participants. We identify the most suitable methods and study the impact of each by performing an ablation study of the mixture of approaches. We also compare the results with a neural network baseline, and show the improvement over that. Finally, we discuss the impact of using a competition as a part of a university course, and justify its importance in the curriculum based on student feedback.

CVAug 1, 2018
Classification of Building Information Model (BIM) Structures with Deep Learning

Francesco Lomio, Ricardo Farinha, Mauri Laasonen et al.

In this work we study an application of machine learning to the construction industry and we use classical and modern machine learning methods to categorize images of building designs into three classes: Apartment building, Industrial building or Other. No real images are used, but only images extracted from Building Information Model (BIM) software, as these are used by the construction industry to store building designs. For this task, we compared four different methods: the first is based on classical machine learning, where Histogram of Oriented Gradients (HOG) was used for feature extraction and a Support Vector Machine (SVM) for classification; the other three methods are based on deep learning, covering common pre-trained networks as well as ones designed from scratch. To validate the accuracy of the models, a database of 240 images was used. The accuracy achieved is 57% for the HOG + SVM model, and above 89% for the neural networks.

LGAug 1, 2018
Binarized Convolutional Neural Networks for Efficient Inference on GPUs

Mir Khan, Heikki Huttunen, Jani Boutellier

Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive,which can make their feasible mplementation on embedded and low-power devices difficult. In this paper convolutional neural network binarization is implemented on GPU-based platforms for real-time inference on resource constrained devices. In binarized networks, all weights and intermediate computations between layers are quantized to +1 and -1, allowing multiplications and additions to be replaced with bit-wise operations between 32-bit words. This representation completely eliminates the need for floating point multiplications and additions and decreases both the computational load and the memory footprint compared to a full-precision network implemented in floating point, making it well-suited for resource-constrained environments. We compare the performance of our implementation with an equivalent floating point implementation on one desktop and two embedded GPU platforms. Our implementation achieves a maximum speed up of 7. 4X with only 4.4% loss in accuracy compared to a reference implementation.

CVJul 10, 2018
Embedded Implementation of a Deep Learning Smile Detector

Pedram Ghazi, Antti P. Happonen, Jani Boutellier et al.

In this paper we study the real time deployment of deep learning algorithms in low resource computational environments. As the use case, we compare the accuracy and speed of neural networks for smile detection using different neural network architectures and their system level implementation on NVidia Jetson embedded platform. We also propose an asynchronous multithreading scheme for parallelizing the pipeline. Within this framework, we experimentally compare thirteen widely used network topologies. The experiments show that low complexity architectures can achieve almost equal performance as larger ones, with a fraction of computation required.

CVJul 3, 2018
Faster Bounding Box Annotation for Object Detection in Indoor Scenes

Bishwo Adhikari, Jukka Peltomäki, Jussi Puura et al.

This paper proposes an approach for rapid bounding box annotation for object detection datasets. The procedure consists of two stages: The first step is to annotate a part of the dataset manually, and the second step proposes annotations for the remaining samples using a model trained with the first stage annotations. We experimentally study which first/second stage split minimizes to total workload. In addition, we introduce a new fully labeled object detection dataset collected from indoor scenes. Compared to other indoor datasets, our collection has more class categories, different backgrounds, lighting conditions, occlusion and high intra-class differences. We train deep learning based object detectors with a number of state-of-the-art models and compare them in terms of speed and accuracy. The fully annotated dataset is released freely available for the research community.

CVJul 2, 2018
Elastic Neural Networks: A Scalable Framework for Embedded Computer Vision

Yue Bai, Shuvra S. Bhattacharyya, Antti P. Happonen et al.

We propose a new framework for image classification with deep neural networks. The framework introduces intermediate outputs to the computational graph of a network. This enables flexible control of the computational load and balances the tradeoff between accuracy and execution time. Moreover, we present an interesting finding that the intermediate outputs can act as a regularizer at training time, improving the prediction accuracy. In the experimental section we demonstrate the performance of our proposed framework with various commonly used pretrained deep networks in the use case of apparent age estimation.

LGFeb 21, 2017
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Emre Çakır, Giambattista Parascandolo, Toni Heittola et al.

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.

SDApr 4, 2016
Recurrent Neural Networks for Polyphonic Sound Event Detection in Real Life Recordings

Giambattista Parascandolo, Heikki Huttunen, Tuomas Virtanen

In this paper we present an approach to polyphonic sound event detection in real life recordings based on bi-directional long short term memory (BLSTM) recurrent neural networks (RNNs). A single multilabel BLSTM RNN is trained to map acoustic features of a mixture signal consisting of sounds from multiple classes, to binary activity indicators of each event class. Our method is tested on a large database of real-life recordings, with 61 classes (e.g. music, car, speech) from 10 different everyday contexts. The proposed method outperforms previous approaches by a large margin, and the results are further improved using data augmentation techniques. Overall, our system reports an average F1-score of 65.5% on 1 second blocks and 64.7% on single frames, a relative improvement over previous state-of-the-art approach of 6.8% and 15.1% respectively.

CVFeb 23, 2016
Car Type Recognition with Deep Neural Networks

Heikki Huttunen, Fatemeh Shokrollahi Yancheshmeh, Ke Chen

In this paper we study automatic recognition of cars of four types: Bus, Truck, Van and Small car. For this problem we consider two data driven frameworks: a deep neural network and a support vector machine using SIFT features. The accuracy of the methods is validated with a database of over 6500 images, and the resulting prediction accuracy is over 97 %. This clearly exceeds the accuracies of earlier studies that use manually engineered feature extraction pipelines.