Ognjen Arandjelovic

CV
h-index11
35papers
663citations
Novelty45%
AI Score39

35 Papers

CVAug 23, 2023
Multimodal Latent Emotion Recognition from Micro-expression and Physiological Signals

Liangfei Zhang, Yifei Qian, Ognjen Arandjelovic et al.

This paper discusses the benefits of incorporating multimodal data for improving latent emotion recognition accuracy, focusing on micro-expression (ME) and physiological signals (PS). The proposed approach presents a novel multimodal learning framework that combines ME and PS, including a 1D separable and mixable depthwise inception network, a standardised normal distribution weighted feature fusion method, and depth/physiology guided attention modules for multimodal learning. Experimental results show that the proposed approach outperforms the benchmark method, with the weighted fusion method and guided attention modules both contributing to enhanced performance.

CVJan 14
Do Transformers Understand Ancient Roman Coin Motifs Better than CNNs?

David Reid, Ognjen Arandjelovic

Automated analysis of ancient coins has the potential to help researchers extract more historical insights from large collections of coins and to help collectors understand what they are buying or selling. Recent research in this area has shown promise in focusing on identification of semantic elements as they are commonly depicted on ancient coins, by using convolutional neural networks (CNNs). This paper is the first to apply the recently proposed Vision Transformer (ViT) deep learning architecture to the task of identification of semantic elements on coins, using fully automatic learning from multi-modal data (images and unstructured text). This article summarises previous research in the area, discusses the training and implementation of ViT and CNN models for ancient coins analysis and provides an evaluation of their performance. The ViT models were found to outperform the newly trained CNN models in accuracy.

LGDec 14, 2023
Context-PEFT: Efficient Multi-Modal, Multi-Task Fine-Tuning

Avelina Asada Hadji-Kyriacou, Ognjen Arandjelovic

This paper introduces a novel Parameter-Efficient Fine-Tuning (PEFT) framework for multi-modal, multi-task transfer learning with pre-trained language models. PEFT techniques such as LoRA, BitFit and IA3 have demonstrated comparable performance to full fine-tuning of pre-trained models for specific downstream tasks, all while demanding significantly fewer trainable parameters and reduced GPU memory consumption. However, in the context of multi-modal fine-tuning, the need for architectural modifications or full fine-tuning often becomes apparent. To address this we propose Context-PEFT, which learns different groups of adaptor parameters based on the token's domain. This approach enables LoRA-like weight injection without requiring additional architectural changes. Our method is evaluated on the COCO captioning task, where it outperforms full fine-tuning under similar data constraints while simultaneously offering a substantially more parameter-efficient and computationally economical solution.

CVDec 12, 2021
Magnifying Networks for Images with Billions of Pixels

Neofytos Dimitriou, Ognjen Arandjelovic

The shift towards end-to-end deep learning has brought unprecedented advances in many areas of computer vision. However, deep neural networks are trained on images with resolutions that rarely exceed $1,000 \times 1,000$ pixels. The growing use of scanners that create images with extremely high resolutions (average can be $100,000 \times 100,000$ pixels) thereby presents novel challenges to the field. Most of the published methods preprocess high-resolution images into a set of smaller patches, imposing an a priori belief on the best properties of the extracted patches (magnification, field of view, location, etc.). Herein, we introduce Magnifying Networks (MagNets) as an alternative deep learning solution for gigapixel image analysis that does not rely on a preprocessing stage nor requires the processing of billions of pixels. MagNets can learn to dynamically retrieve any part of a gigapixel image, at any magnification level and field of view, in an end-to-end fashion with minimal ground truth (a single global, slide-level label). Our results on the publicly available Camelyon16 and Camelyon17 datasets corroborate to the effectiveness and efficiency of MagNets and the proposed optimization framework for whole slide image classification. Importantly, MagNets process far less patches from each slide than any of the existing approaches ($10$ to $300$ times less).

CVDec 10, 2021
Short and Long Range Relation Based Spatio-Temporal Transformer for Micro-Expression Recognition

Liangfei Zhang, Xiaopeng Hong, Ognjen Arandjelovic et al.

Being spontaneous, micro-expressions are useful in the inference of a person's true emotions even if an attempt is made to conceal them. Due to their short duration and low intensity, the recognition of micro-expressions is a difficult task in affective computing. The early work based on handcrafted spatio-temporal features which showed some promise, has recently been superseded by different deep learning approaches which now compete for the state of the art performance. Nevertheless, the problem of capturing both local and global spatio-temporal patterns remains challenging. To this end, herein we propose a novel spatio-temporal transformer architecture -- to the best of our knowledge, the first purely transformer based approach (i.e. void of any convolutional network use) for micro-expression recognition. The architecture comprises a spatial encoder which learns spatial patterns, a temporal aggregator for temporal dimension analysis, and a classification head. A comprehensive evaluation on three widely used spontaneous micro-expression data sets, namely SMIC-HS, CASME II and SAMM, shows that the proposed approach consistently outperforms the state of the art, and is the first framework in the published literature on micro-expression recognition to achieve the unweighted F1-score greater than 0.9 on any of the aforementioned data sets.

CVJul 16, 2020
A New Look at Ghost Normalization

Neofytos Dimitriou, Ognjen Arandjelovic

Batch normalization (BatchNorm) is an effective yet poorly understood technique for neural network optimization. It is often assumed that the degradation in BatchNorm performance to smaller batch sizes stems from it having to estimate layer statistics using smaller sample sizes. However, recently, Ghost normalization (GhostNorm), a variant of BatchNorm that explicitly uses smaller sample sizes for normalization, has been shown to improve upon BatchNorm in some datasets. Our contributions are: (i) we uncover a source of regularization that is unique to GhostNorm, and not simply an extension from BatchNorm, (ii) three types of GhostNorm implementations are described, two of which employ BatchNorm as the underlying normalization technique, (iii) by visualising the loss landscape of GhostNorm, we observe that GhostNorm consistently decreases the smoothness when compared to BatchNorm, (iv) we introduce Sequential Normalization (SeqNorm), and report superior performance over state-of-the-art methodologies on both CIFAR--10 and CIFAR--100 datasets.

CVMar 7, 2019
Understanding Ancient Coin Images

Jessica Cooper, Ognjen Arandjelovic

In recent years, a range of problems within the broad umbrella of automatic, computer vision based analysis of ancient coins has been attracting an increasing amount of attention. Notwithstanding this research effort, the results achieved by the state of the art in the published literature remain poor and far from sufficiently well performing for any practical purpose. In the present paper we present a series of contributions which we believe will benefit the interested community. Firstly, we explain that the approach of visual matching of coins, universally adopted in all existing published papers on the topic, is not of practical interest because the number of ancient coin types exceeds by far the number of those types which have been imaged, be it in digital form (e.g. online) or otherwise (traditional film, in print, etc.). Rather, we argue that the focus should be on the understanding of the semantic content of coins. Hence, we describe a novel method which uses real-world multimodal input to extract and associate semantic concepts with the correct coin images and then using a novel convolutional neural network learn the appearance of these concepts. Empirical evidence on a real-world and by far the largest data set of ancient coins, we demonstrate highly promising results.

IVFeb 10, 2019
Colorectal Cancer Outcome Prediction from H&E Whole Slide Images using Machine Learning and Automatically Inferred Phenotype Profiles

Xingzhi Yue, Neofytos Dimitriou, Ognjen Arandjelovic

Digital pathology (DP) is a new research area which falls under the broad umbrella of health informatics. Owing to its potential for major public health impact, in recent years DP has been attracting much research attention. Nevertheless, a wide breadth of significant conceptual and technical challenges remain, few of them greater than those encountered in the field of oncology. The automatic analysis of digital pathology slides of cancerous tissues is particularly problematic due to the inherent heterogeneity of the disease, extremely large images, amongst numerous others. In this paper we introduce a novel machine learning based framework for the prediction of colorectal cancer outcome from whole digitized haematoxylin & eosin (H&E) stained histopathology slides. Using a real-world data set we demonstrate the effectiveness of the method and present a detailed analysis of its different elements which corroborate its ability to extract and learn salient, discriminative, and clinically meaningful content.

CVAug 26, 2017
Synthesising Wider Field Images from Narrow-Field Retinal Video Acquired Using a Low-Cost Direct Ophthalmoscope (Arclight) Attached to a Smartphone

Keylor Daniel Chaves Viquez, Ognjen Arandjelovic, Andrew Blaikie et al.

Access to low cost retinal imaging devices in low and middle income countries is limited, compromising progress in preventing needless blindness. The Arclight is a recently developed low-cost solar powered direct ophthalmoscope which can be attached to the camera of a smartphone to acquire retinal images and video. However, the acquired data is inherently limited by the optics of direct ophthalmoscopy, resulting in a narrow field of view with associated corneal reflections, limiting its usefulness. In this work we describe the first fully automatic method utilizing videos acquired using the Arclight attached to a mobile phone camera to create wider view, higher quality still images comparable with images obtained using much more expensive and bulky dedicated traditional retinal cameras.

CVMay 27, 2016
Achieving stable subspace clustering by post-processing generic clustering results

Duc-Son Pham, Ognjen Arandjelovic, Svetha Venkatesh

We propose an effective subspace selection scheme as a post-processing step to improve results obtained by sparse subspace clustering (SSC). Our method starts by the computation of stable subspaces using a novel random sampling scheme. Thus constructed preliminary subspaces are used to identify the initially incorrectly clustered data points and then to reassign them to more suitable clusters based on their goodness-of-fit to the preliminary model. To improve the robustness of the algorithm, we use a dominant nearest subspace classification scheme that controls the level of sensitivity against reassignment. We demonstrate that our algorithm is convergent and superior to the direct application of a generic alternative such as principal component analysis. On several popular datasets for motion segmentation and face clustering pervasively used in the sparse subspace clustering literature the proposed method is shown to reduce greatly the incidence of clustering errors while introducing negligible disturbance to the data points already correctly clustered.

CLMay 16, 2016
Identification of promising research directions using machine learning aided medical literature analysis

Victor Andrei, Ognjen Arandjelovic

The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aide in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora, and of tracking complex temporal changes within it.

CVApr 5, 2016
Highly accurate gaze estimation using a consumer RGB-depth sensor

Reza Shoja Ghiass, Ognjen Arandjelovic

Determining the direction in which a person is looking is an important problem in a wide range of HCI applications. In this paper we describe a highly accurate algorithm that performs gaze estimation using an affordable and widely available device such as Kinect. The method we propose starts by performing accurate head pose estimation achieved by fitting a person specific morphable model of the face to depth data. The ordinarily competing requirements of high accuracy and high speed are met concurrently by formulating the fitting objective function as a combination of terms which excel either in accurate or fast fitting, and then by adaptively adjusting their relative contributions throughout fitting. Following pose estimation, pose normalization is done by re-rendering the fitted model as a frontal face. Finally gaze estimates are obtained through regression from the appearance of the eyes in synthetic, normalized images. Using EYEDIAP, the standard public dataset for the evaluation of gaze estimation algorithms from RGB-D data, we demonstrate that our method greatly outperforms the state of the art.

CVMar 16, 2016
Descriptor transition tables for object retrieval using unconstrained cluttered video acquired using a consumer level handheld mobile device

Warren Rieutort-Louis, Ognjen Arandjelovic

Visual recognition and vision based retrieval of objects from large databases are tasks with a wide spectrum of potential applications. In this paper we propose a novel recognition method from video sequences suitable for retrieval from databases acquired in highly unconstrained conditions e.g. using a mobile consumer-level device such as a phone. On the lowest level, we represent each sequence as a 3D mesh of densely packed local appearance descriptors. While image plane geometry is captured implicitly by a large overlap of neighbouring regions from which the descriptors are extracted, 3D information is extracted by means of a descriptor transition table, learnt from a single sequence for each known gallery object. These allow us to connect local descriptors along the 3rd dimension (which corresponds to viewpoint changes), thus resulting in a set of variable length Markov chains for each video. The matching of two sets of such chains is formulated as a statistical hypothesis test, whereby a subset of each is chosen to maximize the likelihood that the corresponding video sequences show the same object. The effectiveness of the proposed algorithm is empirically evaluated on the Amsterdam Library of Object Images and a new highly challenging video data set acquired using a mobile phone. On both data sets our method is shown to be successful in recognition in the presence of background clutter and large viewpoint changes.

CVMar 2, 2016
Learnt quasi-transitive similarity for retrieval from large collections of faces

Ognjen Arandjelovic

We are interested in identity-based retrieval of face sets from large unlabelled collections acquired in uncontrolled environments. Given a baseline algorithm for measuring the similarity of two face sets, the meta-algorithm introduced in this paper seeks to leverage the structure of the data corpus to make the best use of the available baseline. In particular, we show how partial transitivity of inter-personal similarity can be exploited to improve the retrieval of particularly challenging sets which poorly match the query under the baseline measure. We: (i) describe the use of proxy sets as a means of computing the similarity between two sets, (ii) introduce transitivity meta-features based on the similarity of salient modes of appearance variation between sets, (iii) show how quasi-transitivity can be learnt from such features without any labelling or manual intervention, and (iv) demonstrate the effectiveness of the proposed methodology through experiments on the notoriously challenging YouTube database and two successful baselines from the literature.

QMFeb 27, 2016
Prediction of future hospital admissions - what is the tradeoff between specificity and accuracy?

Ieva Vasiljeva, Ognjen Arandjelovic

Large amounts of electronic medical records collected by hospitals across the developed world offer unprecedented possibilities for knowledge discovery using computer based data mining and machine learning. Notwithstanding significant research efforts, the use of this data in the prediction of disease development has largely been disappointing. In this paper we examine in detail a recently proposed method which has in preliminary experiments demonstrated highly promising results on real-world data. We scrutinize the authors' claims that the proposed model is scalable and investigate whether the tradeoff between prediction specificity (i.e. the ability of the model to predict a wide number of different ailments) and accuracy (i.e. the ability of the model to make the correct prediction) is practically viable. Our experiments conducted on a data corpus of nearly 3,000,000 admissions support the authors' expectations and demonstrate that the high prediction accuracy is maintained well even when the number of admission types explicitly included in the model is increased to account for 98% of all admissions in the corpus. Thus several promising directions for future work are highlighted.

IRDec 25, 2015
Discovering topic structures of a temporally evolving document corpus

Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung et al.

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS) -- both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithms's free parameters have on its output, and discuss the significance of the findings both in the context of the practical application of our algorithm as well as in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields.

CVJun 23, 2015
Automatic vehicle tracking and recognition from aerial image sequences

Ognjen Arandjelovic

This paper addresses the problem of automated vehicle tracking and recognition from aerial image sequences. Motivated by its successes in the existing literature focus on the use of linear appearance subspaces to describe multi-view object appearance and highlight the challenges involved in their application as a part of a practical system. A working solution which includes steps for data extraction and normalization is described. In experiments on real-world data the proposed methodology achieved promising results with a high correct recognition rate and few, meaningful errors (type II errors whereby genuinely similar targets are sometimes being confused with one another). Directions for future research and possible improvements of the proposed method are discussed.

CVApr 21, 2015
Automatic Face Recognition from Video

Ognjen Arandjelovic

The objective of this work is to automatically recognize faces from video sequences in a realistic, unconstrained setup in which illumination conditions are extreme and greatly changing, viewpoint and user motion pattern have a wide variability, and video input is of low quality. At the centre of focus are face appearance manifolds: this thesis presents a significant advance of their understanding and application in the sphere of face recognition. The two main contributions are the Generic Shape-Illumination Manifold recognition algorithm and the Anisotropic Manifold Space clustering. The Generic Shape-Illumination Manifold is evaluated on a large data corpus acquired in real-world conditions and its performance is shown to greatly exceed that of state-of-the-art methods in the literature and the best performing commercial software. Empirical evaluation of the Anisotropic Manifold Space clustering on a popular situation comedy is also described with excellent preliminary results.

CVApr 21, 2015
The adaptable buffer algorithm for high quantile estimation in non-stationary data streams

Ognjen Arandjelovic, Duc-Son Pham, Svetha Venkatesh

The need to estimate a particular quantile of a distribution is an important problem which frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semi-automatic surveillance analytics systems which detect abnormalities in close-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the memory for storing observations is limited. We make several major contributions: (i) we derive an important theoretical result which shows that the change in the quantile of a stream is constrained regardless of the stochastic properties of data, (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings, (iii) we introduce a novel algorithm which implements the aforementioned design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of data and progressively narrowing down its focus in the periods of quasi-stationary stochasticity, and (iv) we present a comprehensive evaluation of the proposed algorithm and compare it with the existing methods in the literature on both synthetic data sets and three large `real-world' streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high valued and the available buffer capacity severely limited.

CVApr 21, 2015
Groupwise registration of aerial images

Ognjen Arandjelovic, Duc-Son Pham, Svetha Venkatesh

This paper addresses the task of time separated aerial image registration. The ability to solve this problem accurately and reliably is important for a variety of subsequent image understanding applications. The principal challenge lies in the extent and nature of transient appearance variation that a land area can undergo, such as that caused by the change in illumination conditions, seasonal variations, or the occlusion by non-persistent objects (people, cars). Our work introduces several novelties: (i) unlike all previous work on aerial image registration, we approach the problem using a set-based paradigm; (ii) we show how local, pair-wise constraints can be used to enforce a globally good registration using a constraints graph structure; (iii) we show how a simple holistic representation derived from raw aerial images can be used as a basic building block of the constraints graph in a manner which achieves both high registration accuracy and speed. We demonstrate: (i) that the proposed method outperforms the state-of-the-art for pair-wise registration already, achieving greater accuracy and reliability, while at the same time reducing the computational cost of the task; and (ii) that the increase in the number of available images in a set consistently reduces the average registration error.

CVApr 21, 2015
Viewpoint distortion compensation in practical surveillance systems

Ognjen Arandjelovic, Duc-Son Pham, Svetha Venkatesh

Our aim is to estimate the perspective-effected geometric distortion of a scene from a video feed. In contrast to all previous work we wish to achieve this using from low-level, spatio-temporally local motion features used in commercial semi-automatic surveillance systems. We: (i) describe a dense algorithm which uses motion features to estimate the perspective distortion at each image locus and then polls all such local estimates to arrive at the globally best estimate, (ii) present an alternative coarse algorithm which subdivides the image frame into blocks, and uses motion features to derive block-specific motion characteristics and constrain the relationships between these characteristics, with the perspective estimate emerging as a result of a global optimization scheme, and (iii) report the results of an evaluation using nine large sets acquired using existing close-circuit television (CCTV) cameras. Our findings demonstrate that both of the proposed methods are successful, their accuracy matching that of human labelling using complete visual data.

IRFeb 8, 2015
Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

Adham Beykikhoshk, Ognjen Arandjelovic, Dinh Phung et al.

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) - an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we will make freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.

LGNov 14, 2014
Sample-targeted clinical trial adaptation

Ognjen Arandjelovic

Clinical trial adaptation refers to any adjustment of the trial protocol after the onset of the trial. The main goal is to make the process of introducing new medical interventions to patients more efficient by reducing the cost and the time associated with evaluating their safety and efficacy. The principal question is how should adaptation be performed so as to minimize the chance of distorting the outcome of the trial. We propose a novel method for achieving this. Unlike previous work our approach focuses on trial adaptation by sample size adjustment. We adopt a recently proposed stratification framework based on collected auxiliary data and show that this information together with the primary measured variables can be used to make a probabilistically informed choice of the particular sub-group a sample should be removed from. Experiments on simulated data are used to illustrate the effectiveness of our method and its application in practice.

CVJul 28, 2014
A unified framework for thermal face recognition

Reza Shoja Ghiass, Ognjen Arandjelovic, Hakim Bendada et al.

The reduction of the cost of infrared (IR) cameras in recent years has made IR imaging a highly viable modality for face recognition in practice. A particularly attractive advantage of IR-based over conventional, visible spectrum-based face recognition stems from its invariance to visible illumination. In this paper we argue that the main limitation of previous work on face recognition using IR lies in its ad hoc approach to treating different nuisance factors which affect appearance, prohibiting a unified approach that is capable of handling concurrent changes in multiple (or indeed all) major extrinsic sources of variability, which is needed in practice. We describe the first approach that attempts to achieve this - the framework we propose achieves outstanding recognition performance in the presence of variable (i) pose, (ii) facial expression, (iii) physiological state, (iv) partial occlusion due to eye-wear, and (v) quasi-occlusion due to facial hair growth.

CVJun 28, 2014
A framework for improving the performance of verification algorithms with a low false positive rate requirement and limited training data

Ognjen Arandjelovic

In this paper we address the problem of matching patterns in the so-called verification setting in which a novel, query pattern is verified against a single training pattern: the decision sought is whether the two match (i.e. belong to the same class) or not. Unlike previous work which has universally focused on the development of more discriminative distance functions between patterns, here we consider the equally important and pervasive task of selecting a distance threshold which fits a particular operational requirement - specifically, the target false positive rate (FPR). First, we argue on theoretical grounds that a data-driven approach is inherently ill-conditioned when the desired FPR is low, because by the very nature of the challenge only a small portion of training data affects or is affected by the desired threshold. This leads us to propose a general, statistical model-based method instead. Our approach is based on the interpretation of an inter-pattern distance as implicitly defining a pattern embedding which approximately distributes patterns according to an isotropic multi-variate normal distribution in some space. This interpretation is then used to show that the distribution of training inter-pattern distances is the non-central chi2 distribution, differently parameterized for each class. Thus, to make the class-specific threshold choice we propose a novel analysis-by-synthesis iterative algorithm which estimates the three free parameters of the model (for each class) using task-specific constraints. The validity of the premises of our work and the effectiveness of the proposed method are demonstrated by applying the method to the task of set-based face verification on a large database of pseudo-random head motion videos.

CVJan 31, 2014
Hallucinating optimal high-dimensional subspaces

Ognjen Arandjelovic

Linear subspace representations of appearance variation are pervasive in computer vision. This paper addresses the problem of robustly matching such subspaces (computing the similarity between them) when they are used to describe the scope of variations within sets of images of different (possibly greatly so) scales. A naive solution of projecting the low-scale subspace into the high-scale image space is described first and subsequently shown to be inadequate, especially at large scale discrepancies. A successful approach is proposed instead. It consists of (i) an interpolated projection of the low-scale subspace into the high-scale space, which is followed by (ii) a rotation of this initial estimate within the bounds of the imposed ``downsampling constraint''. The optimal rotation is found in the closed-form which best aligns the high-scale reconstruction of the low-scale subspace with the reference it is compared to. The method is evaluated on the problem of matching sets of (i) face appearances under varying illumination and (ii) object appearances under varying viewpoint, using two large data sets. In comparison to the naive matching, the proposed algorithm is shown to greatly increase the separation of between-class and within-class similarities, as well as produce far more meaningful modes of common appearance on which the match score is based.

CVJan 29, 2014
Infrared face recognition: a comprehensive review of methodologies and databases

Reza Shoja Ghiass, Ognjen Arandjelovic, Hakim Bendada et al.

Automatic face recognition is an area with immense practical potential which includes a wide range of commercial and law enforcement applications. Hence it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in face recognition continues to improve, benefitting from advances in a range of different research fields such as image processing, pattern recognition, computer graphics, and physiology. Systems based on visible spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are: (i) a summary of the inherent properties of infrared imaging which makes this modality promising in the context of face recognition, (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies, (iii) a description of the main databases of infrared facial images available to the researcher, and lastly (iv) a discussion of the most promising avenues for future research.

CVJun 20, 2013
Computer simulation based parameter selection for resistance exercise

Ognjen Arandjelovic

In contrast to most scientific disciplines, sports science research has been characterized by comparatively little effort investment in the development of relevant phenomenological models. Scarcer yet is the application of said models in practice. We present a framework which allows resistance training practitioners to employ a recently proposed neuromuscular model in actual training program design. The first novelty concerns the monitoring aspect of coaching. A method for extracting training performance characteristics from loosely constrained video sequences, effortlessly and with minimal human input, using computer vision is described. The extracted data is subsequently used to fit the underlying neuromuscular model. This is achieved by solving an inverse dynamics problem corresponding to a particular exercise. Lastly, a computer simulation of hypothetical training bouts, using athlete-specific capability parameters, is used to predict the effected adaptation and changes in performance. The software described here allows the practitioner to manipulate hypothetical training parameters and immediately see their effect on predicted adaptation for a specific athlete. Thus, this work presents a holistic view of the monitoring-assessment-adjustment loop.

CVJun 14, 2013
Live-wire 3D medical images segmentation

Ognjen Arandjelovic

This report describes the design, implementation, evaluation and original enhancements to the Live-Wire method for 2D and 3D image segmentation. Live-Wire 2D employs a semi-automatic paradigm; the user is asked to select a few boundary points of the object to segment, to steer the process in the right direction, while the result is displayed in real time. In our implementation segmentation is extended to three dimensions by performing this process on a slice-by-slice basis. User's time and involvement is further reduced by allowing him to specify object contours in planes orthogonal to the slices. If these planes are chosen strategically, Live-Wire 3D can perform 2D segmentation in the plane of each slice automatically. This report also proposes two improvements to the original method, path heating and a new graph edge feature function based on variance of path properties along the boundary. We show that these improvements lead up to a 33% reduction in interaction with the user, and improved delineation in presence of strong interfering edges.

CVJun 14, 2013
Matching objects across the textured-smooth continuum

Ognjen Arandjelovic

The problem of 3D object recognition is of immense practical importance, with the last decade witnessing a number of breakthroughs in the state of the art. Most of the previous work has focused on the matching of textured objects using local appearance descriptors extracted around salient image points. The recently proposed bag of boundaries method was the first to address directly the problem of matching smooth objects using boundary features. However, no previous work has attempted to achieve a holistic treatment of the problem by jointly using textural and shape features which is what we describe herein. Due to the complementarity of the two modalities, we fuse the corresponding matching scores and learn their relative weighting in a data specific manner by optimizing discriminative performance on synthetically distorted data. For the textural description of an object we adopt a representation in the form of a histogram of SIFT based visual words. Similarly the apparent shape of an object is represented by a histogram of discretized features capturing local shape. On a large public database of a diverse set of objects, the proposed method is shown to outperform significantly both purely textural and purely shape based approaches for matching across viewpoint variation.

CVJun 10, 2013
Discriminative k-means clustering

Ognjen Arandjelovic

The k-means algorithm is a partitional clustering method. Over 60 years old, it has been successfully used for a variety of problems. The popularity of k-means is in large part a consequence of its simplicity and efficiency. In this paper we are inspired by these appealing properties of k-means in the development of a clustering algorithm which accepts the notion of "positively" and "negatively" labelled data. The goal is to discover the cluster structure of both positive and negative data in a manner which allows for the discrimination between the two sets. The usefulness of this idea is demonstrated practically on the problem of face recognition, where the task of learning the scope of a person's appearance should be done in a manner which allows this face to be differentiated from others.

CVJun 10, 2013
Discriminative extended canonical correlation analysis for pattern set matching

Ognjen Arandjelovic

In this paper we address the problem of matching sets of vectors embedded in the same input space. We propose an approach which is motivated by canonical correlation analysis (CCA), a statistical technique which has proven successful in a wide variety of pattern recognition problems. Like CCA when applied to the matching of sets, our extended canonical correlation analysis (E-CCA) aims to extract the most similar modes of variability within two sets. Our first major contribution is the formulation of a principled framework for robust inference of such modes from data in the presence of uncertainty associated with noise and sampling randomness. E-CCA retains the efficiency and closed form computability of CCA, but unlike it, does not possess free parameters which cannot be inferred directly from data (inherent data dimensionality, and the number of canonical correlations used for set similarity computation). Our second major contribution is to show that in contrast to CCA, E-CCA is readily adapted to match sets in a discriminative learning scheme which we call discriminative extended canonical correlation analysis (DE-CCA). Theoretical contributions of this paper are followed by an empirical evaluation of its premises on the task of face recognition from sets of rasterized appearance images. The results demonstrate that our approach, E-CCA, already outperforms both CCA and its quasi-discriminative counterpart constrained CCA (C-CCA), for all values of their free parameters. An even greater improvement is achieved with the discriminative variant, DE-CCA.

CVJun 7, 2013
Vesselness features and the inverse compositional AAM for robust face recognition using thermal IR

Reza Shoja Ghiass, Ognjen Arandjelovic, Hakim Bendada et al.

Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches which continue to struggle when applied in the real world. While inherently insensitive to visible spectrum illumination changes, IR images introduce specific challenges of their own, most notably sensitivity to factors which affect facial heat emission patterns, e.g. emotional state, ambient temperature, and alcohol intake. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high frequency detail which is an important cue for fitting any deformable model. We describe a novel method which addresses these challenges. To normalize for pose and facial expression changes we generate a synthetic frontal image of a face in a canonical, neutral facial expression from an image of the face in an arbitrary pose and facial expression. This is achieved by piecewise affine warping which follows active appearance model (AAM) fitting. This is the first publication which explores the use of an AAM on thermal IR images; we propose a pre-processing step which enhances detail in thermal images, making AAM convergence faster and more accurate. To overcome the problem of thermal IR image sensitivity to the pattern of facial temperature emissions we describe a representation based on reliable anatomical features. In contrast to previous approaches, our representation is not binary; rather, our method accounts for the reliability of the extracted features. This makes the proposed representation much more robust both to pose and scale changes. The effectiveness of the proposed approach is demonstrated on the largest public database of thermal IR images of faces on which it achieved 100% identification, significantly outperforming previous methods.

CVJun 7, 2013
Illumination-invariant face recognition from a single image across extreme pose using a dual dimension AAM ensemble in the thermal infrared spectrum

Reza Shoja Ghiass, Ognjen Arandjelovic, Hakim Bendada et al.

Over the course of the last decade, infrared (IR) and particularly thermal IR imaging based face recognition has emerged as a promising complement to conventional, visible spectrum based approaches which continue to struggle when applied in practice. While inherently insensitive to visible spectrum illumination changes, IR data introduces specific challenges of its own, most notably sensitivity to factors which affect facial heat emission patterns, e.g. emotional state, ambient temperature, and alcohol intake. In addition, facial expression and pose changes are more difficult to correct in IR images because they are less rich in high frequency detail which is an important cue for fitting any deformable model. In this paper we describe a novel method which addresses these major challenges. Specifically, when comparing two thermal IR images of faces, we mutually normalize their poses and facial expressions by using an active appearance model (AAM) to generate synthetic images of the two faces with a neutral facial expression and in the same view (the average of the two input views). This is achieved by piecewise affine warping which follows AAM fitting. A major contribution of our work is the use of an AAM ensemble in which each AAM is specialized to a particular range of poses and a particular region of the thermal IR face space. Combined with the contributions from our previous work which addressed the problem of reliable AAM fitting in the thermal IR spectrum, and the development of a person-specific representation robust to transient changes in the pattern of facial temperature emissions, the proposed ensemble framework accurately matches faces across the full range of yaw from frontal to profile, even in the presence of scale variation (e.g. due to the varying distance of a subject from the camera).

CVJun 7, 2013
Infrared face recognition: a literature review

Reza Shoja Ghiass, Ognjen Arandjelovic, Hakim Bendada et al.

Automatic face recognition (AFR) is an area with immense practical potential which includes a wide range of commercial and law enforcement applications, and it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in AFR continues to improve, benefiting from advances in a range of different fields including image processing, pattern recognition, computer graphics and physiology. However, systems based on visible spectrum images continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease their accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject.