João Paulo Costeira

h-index: 26

16 papers

532 citations

Novelty: 51%

AI Score: 46

Ranked #35,751 of 194,257 authors (top 18%); #12,718 in CV (top 22%)

16 Papers

5.0 | CV | Nov 10, 2023 | Code

2D Image head pose estimation via latent space regression under occlusion settings

José Celestino, Manuel Marques, Jacinto C. Nascimento et al.

Head orientation is a challenging Computer Vision problem that has been extensively researched and has a wide variety of applications. However, current state-of-the-art systems still underperform in the presence of occlusions and are unreliable for many task applications in such scenarios. This work proposes a novel deep learning approach for the problem of head pose estimation (HPE) under occlusions. The strategy is based on latent space regression as a fundamental key to better structure the problem for occluded scenarios. Our model surpasses several state-of-the-art methodologies for occluded HPE, and achieves similar accuracy for non-occluded scenarios. We demonstrate the usefulness of the proposed approach with: (i) two synthetically occluded versions of the BIWI and AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii) a real-life application to human-robot interaction scenarios where face occlusions often occur, specifically autonomous feeding from a robotic arm.

2.4 | OC | Jan 24, 2018

FLORIS and CLORIS: Hybrid Source and Network Localization Based on Ranges and Video

Beatriz Quintino Ferreira, João Gomes, Cláudia Soares et al.

We propose hybrid methods for localization in wireless sensor networks fusing noisy range measurements with angular information (extracted from video). Compared with conventional methods that rely on a single sensed variable, this may pave the way for improved localization accuracy and robustness. We address both the single-source and network (i.e., cooperative multiple-source) localization paradigms, solving them via optimization of a convex surrogate. The formulations for hybrid localization are unified in the sense that we propose a single nonlinear least-squares cost function, fusing both angular and range measurements. We then relax the problem to obtain an estimate of the optimal positions. This contrasts with other hybrid approaches that alternate the execution of localization algorithms for each type of measurement separately, to progressively refine the position estimates. Single-source localization uses a semidefinite relaxation to obtain a one-shot matrix solution from which the source position is derived via factorization. Network localization uses a different approach where sensor coordinates are retained as optimization variables, and the relaxed cost function is efficiently minimized using fast iterations based on Nesterov's optimal method. Further, an automated calibration procedure is developed to express range and angular information, obtained by different devices, possibly deployed at different locations, in a single consistent coordinate system. This drastically reduces the need for manual calibration that would otherwise negatively impact the practical usability of hybrid range/video localization systems. We develop and test, both in simulation and experimentally, the new hybrid localization algorithms, which not only overcome the limitations of previous fusing approaches but also compare favorably to state-of-the-art methods, outperforming them in some scenarios.

5.0 | CV | Sep 18, 2023

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin

Gabriel Moreira, Manuel Marques, João Paulo Costeira et al.

Recent research in representation learning has shown that hierarchical data lends itself to low-dimensional and highly informative representations in hyperbolic space. However, even if hyperbolic embeddings have gathered attention in image recognition, their optimization is prone to numerical hurdles. Further, it remains unclear which applications stand to benefit the most from the implicit bias imposed by hyperbolicity, when compared to traditional Euclidean features. In this paper, we focus on prototypical hyperbolic neural networks, in particular on the tendency of hyperbolic embeddings to converge to the boundary of the Poincaré ball in high dimensions and the effect this has on few-shot classification. We show that the best few-shot results are attained for hyperbolic embeddings at a common hyperbolic radius. In contrast to prior benchmark results, we demonstrate that better performance can be achieved by a fixed-radius encoder equipped with the Euclidean metric, regardless of the embedding dimension.

8.8 | LG | May 27

Spectral Guidance for Flexible and Efficient Control of Diffusion Models

Gabriel Moreira, Manuel Marques, João Paulo Costeira et al.

We introduce Spectral Guidance, a framework for controlling diffusion models by leveraging the intrinsic geometry of the generative process. As data is progressively corrupted by noise, only a small number of features remain informative for control. We characterize them as the singular functions of a conditional expectation operator and show that they can be learned via a self-supervised objective. Once recovered, this basis enables the projection of arbitrary guidance signals, such as labels, CLIP embeddings, or masks, directly onto the sampling trajectory. This approach allows for stable, high-fidelity control without retraining or denoiser backpropagation during sampling. Empirically, we improve conditional accuracy on CIFAR-10 by 37 percentage points over the strongest training-free baseline while offering $4\times$ faster sampling. Moreover, the same representations that support label and CLIP guidance also enable spatial control, such as mask-based guidance, without auxiliary models. Finally, our framework reveals a phase transition in the generative process, pinpointing the optimal time window for effective guidance.

2.5 | AI | Feb 25, 2022

Composing Complex and Hybrid AI Solutions

Peter Schüller, João Paulo Costeira, James Crowley et al.

Progress in several areas of computer science has been enabled by comfortable and efficient means of experimentation, clear interfaces, and interchangeable components, for example using OpenCV for computer vision or ROS for robotics. We describe an extension of the Acumos system towards enabling the above features for general AI applications. Originally, Acumos was created for telecommunication purposes, mainly for creating linear pipelines of machine learning components. Our extensions include support for more generic components with gRPC/Protobuf interfaces, automatic orchestration of graphically assembled solutions including control loops, sub-component topologies, and event-based communication, and provisions for assembling solutions which contain user interfaces and shared storage areas. We provide examples of deployable solutions and their interfaces. The framework is deployed at http://aiexp.ai4europe.eu/ and its source code is managed as an open source Eclipse project.

3.7 | CV | Mar 25, 2024

VICAN: Very Efficient Calibration Algorithm for Large Camera Networks

Gabriel Moreira, Manuel Marques, João Paulo Costeira et al.

The precise estimation of camera poses within large camera networks is a foundational problem in computer vision and robotics, with broad applications spanning autonomous navigation, surveillance, and augmented reality. In this paper, we introduce a novel methodology that extends state-of-the-art Pose Graph Optimization (PGO) techniques. Departing from the conventional PGO paradigm, which primarily relies on camera-camera edges, our approach centers on the introduction of a dynamic element - any rigid object free to move in the scene - whose pose can be reliably inferred from a single image. Specifically, we consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step. This shift not only offers a solution to the challenges encountered in directly estimating relative poses between cameras, particularly in adverse environments, but also leverages the inclusion of numerous object poses to ameliorate and integrate errors, resulting in accurate camera pose estimates. Though our framework retains compatibility with traditional PGO solvers, its efficacy benefits from a custom-tailored optimization scheme. To this end, we introduce an iterative primal-dual algorithm, capable of handling large graphs. Empirical benchmarks, conducted on a new dataset of simulated indoor environments, substantiate the efficacy and efficiency of our approach.

6.5 | CV | Sep 16, 2021 | Code

Rotation Averaging in a Split Second: A Primal-Dual Method and a Closed-Form for Cycle Graphs

Gabriel Moreira, Manuel Marques, João Paulo Costeira

A cornerstone of geometric reconstruction, rotation averaging seeks the set of absolute rotations that optimally explains a set of measured relative orientations between them. In spite of being an integral part of bundle adjustment and structure-from-motion, averaging rotations is both a non-convex and high-dimensional optimization problem. In this paper, we address it from a maximum likelihood estimation standpoint and make a twofold contribution. Firstly, we set forth a novel initialization-free primal-dual method which we show empirically to converge to the global optimum. Further, we derive what is, to our knowledge, the first optimal closed-form solution for rotation averaging in cycle graphs and contextualize this result within spectral graph theory. Our proposed methods achieve a significant gain both in precision and performance.

3.3 | CV | Apr 20, 2020

Unsupervised Vehicle Counting via Multiple Camera Domain Adaptation

Luca Ciampi, Carlos Santiago, Joao Paulo Costeira et al.

Monitoring vehicle flows in cities is crucial to improve the urban environment and quality of life of citizens. Images are the best sensing modality to perceive and assess the flow of vehicles in large areas. Current technologies for vehicle counting in images hinge on large quantities of annotated data, preventing their scalability to city-scale as new cameras are added to the system. This is a recurrent problem when dealing with physical systems and a key research area in Machine Learning and AI. We propose and discuss a new methodology to design image-based vehicle density estimators with few labeled data via multiple camera domain adaptations.

6.0 | CV | Nov 12, 2019

Pose Guided Attention for Multi-label Fashion Image Classification

Beatriz Quintino Ferreira, João P. Costeira, Ricardo G. Sousa et al.

We propose a compact framework with guided attention for multi-label classification in the fashion domain. Our visual semantic attention model (VSAM) is supervised by automatic pose extraction creating a discriminative feature space. VSAM outperforms the state of the art for an in-house dataset and performs on par with previous works on the DeepFashion dataset, even without using any landmark annotations. Additionally, we show that our semantic attention module brings robustness to large quantities of wrong annotations and provides more interpretable results.

0.9 | CV | Sep 5, 2017

Subspace Segmentation by Successive Approximations: A Method for Low-Rank and High-Rank Data with Missing Entries

João Carvalho, Manuel Marques, João P. Costeira

We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic subspace structure. Since we have a non-convex problem, we propose an iterative method to reconstruct the data and provide a sparse similarity affinity matrix. This method is robust to initialization and achieves greater reconstruction accuracy than current methods, which dramatically improves clustering performance. Extensive experiments with synthetic and real data show that our approach leads to significant improvements in the reconstruction and segmentation, outperforming current state of the art for both low and high-rank data.

22.4 | CV | Jul 29, 2017 | Code

FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras

Shanghang Zhang, Guanhang Wu, João P. Costeira et al.

In this paper, we develop deep spatio-temporal neural networks to sequentially count vehicles from low quality videos captured by city cameras (citycams). Citycam videos have low resolution, low frame rate, high occlusion and large perspective, making most existing methods lose their efficacy. To overcome limitations of existing methods and incorporate the temporal information of traffic video, we design a novel FCN-rLSTM network to jointly estimate vehicle density and vehicle count by connecting fully convolutional neural networks (FCN) with long short term memory networks (LSTM) in a residual learning fashion. Such a design leverages the strengths of FCN for pixel-level prediction and the strengths of LSTM for learning complex temporal dynamics. The residual learning connection reformulates the vehicle count regression as learning residual functions with reference to the sum of densities in each frame, which significantly accelerates the training of networks. To preserve feature map resolution, we propose a Hyper-Atrous combination to integrate atrous convolution in FCN and combine feature maps of different convolution layers. FCN-rLSTM enables refined feature representation and a novel end-to-end trainable mapping from pixels to vehicle count. We extensively evaluated the proposed method on different counting tasks with three datasets, with experimental results demonstrating its effectiveness and robustness. In particular, FCN-rLSTM reduces the mean absolute error (MAE) from 5.31 to 4.21 on TRANCOS, and reduces the MAE from 2.74 to 1.53 on WebCamT. The training process is accelerated by a factor of 5 on average.

9.7 | CV | Jul 13, 2017

Discriminative Optimization: Theory and Applications to Computer Vision Problems

Jayakorn Vongkulbhisal, Fernando De la Torre, João P. Costeira

Many computer vision problems are formulated as the optimization of a cost function. This approach faces two main challenges: (i) designing a cost function with a local optimum at an acceptable solution, and (ii) developing an efficient numerical method to search for one (or multiple) of these local optima. While designing such functions is feasible in the noiseless case, the stability and location of local optima are mostly unknown under noise, occlusion, or missing data. In practice, this can result in undesirable local optima or not having a local optimum in the expected place. On the other hand, numerical optimization algorithms in high-dimensional spaces are typically local and often rely on expensive first or second order information to guide the search. To overcome these limitations, this paper proposes Discriminative Optimization (DO), a method that learns search directions from data without the need of a cost function. Specifically, DO explicitly learns a sequence of updates in the search space that leads to stationary points that correspond to desired solutions. We provide a formal analysis of DO and illustrate its benefits in the problems of 3D point cloud registration, camera pose estimation, and image denoising. We show that DO performed comparably to or outperformed state-of-the-art algorithms in terms of accuracy, robustness to perturbations, and computational efficiency.

8.8 | LG | May 26, 2017 | Code

Multiple Source Domain Adaptation with Adversarial Training of Neural Networks

Han Zhao, Shanghang Zhang, Guanhang Wu et al.

While domain adaptation has been actively researched in recent years, most theoretical results and algorithms focus on the single-source-single-target adaptation setting. Naive application of such algorithms to the multiple source domain adaptation problem may lead to suboptimal solutions. As a step toward bridging the gap, we propose a new generalization bound for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances. Compared with existing bounds, the new bound does not require expert knowledge about the target distribution, nor the optimal combination rule for multisource domains. Interestingly, our theory also leads to an efficient learning strategy using adversarial neural networks: we show how to interpret it as learning feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task. To this end, we propose two models, both of which we call multisource domain adversarial networks (MDANs): the first model directly optimizes our bound, while the second model is a smoothed approximation of the first one, leading to a more data-efficient and task-adaptive model. The optimization tasks of both models are minimax saddle point problems that can be optimized by adversarial training. To demonstrate the effectiveness of MDANs, we conduct extensive experiments showing superior adaptation performance on three real-world datasets: sentiment analysis, digit classification, and vehicle counting.

1.7 | CV | Apr 28, 2017

Understanding People Flow in Transportation Hubs

João Carvalho, Manuel Marques, João P. Costeira

In this paper, we aim to monitor the flow of people in large public infrastructures. We propose an unsupervised methodology to cluster people flow patterns into the most typical and meaningful configurations. By processing 3D images from a network of depth cameras, we build a descriptor for the flow pattern. We define a data-irregularity measure that assesses how well each descriptor fits a data model. This allows us to rank flow patterns from highly distinctive (outliers) to very common ones. By discarding outliers, we obtain more reliable key configurations (classes). Synthetic experiments show that the proposed method is superior to standard clustering methods. We applied it in an operational scenario for 14 days in the X-ray screening area of an international airport. Results show that our methodology is able to successfully summarize the representative patterns for such a long observation period, providing relevant information for airport management. Beyond regular flows, our method identifies a set of rare events corresponding to uncommon activities (cleaning, special security and circulating staff).

15.9 | CV | Mar 17, 2017 | Code

Understanding Traffic Density from Large-Scale Web Camera Data

Shanghang Zhang, Guanhang Wu, João P. Costeira et al.

Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion and large perspective. To deeply understand traffic density, we explore both deep learning based and optimization based methods. To avoid individual vehicle detection and tracking, both methods map the image into a vehicle density map, one based on rank constrained regression and the other based on fully convolutional networks (FCN). The regression based method learns different weights for different blocks in the image to increase the degrees of freedom of the weights and embed perspective information. The FCN based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution, and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and draw insights from the optimization based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset, containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. The FCN based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline.

2.3 | MA | Jan 27, 2017

LocDyn: Robust Distributed Localization for Mobile Underwater Networks

Cláudia Soares, João Gomes, Beatriz Ferreira et al.

How to self-localize large teams of underwater nodes using only noisy range measurements? How to do it in a distributed way, and incorporating dynamics into the problem? How to reject outliers and produce trustworthy position estimates? The stringent acoustic communication channel and the accuracy needs of our geophysical survey application demand faster and more accurate localization methods. We approach dynamic localization as a MAP estimation problem where the prior encodes dynamics, and we devise a convex relaxation method that takes advantage of previous estimates at each measurement acquisition step. The algorithm converges at an optimal rate for first order methods. LocDyn is distributed: there is no fusion center responsible for processing acquired data and the same simple computations are performed for each node. LocDyn is accurate: experiments attest to a smaller positioning error than a comparable Kalman filter. LocDyn is robust: it rejects outlier noise, while the methods it is compared against succumb in terms of positioning error.