CV · Mar 19, 2025
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar et al.
Optical flow estimation is essential for video processing tasks, such as restoration and action recognition. The quality of videos is constantly increasing, with current standards reaching 8K resolution. However, optical flow methods are usually designed for low resolution and do not generalize to large inputs due to their rigid architectures. They adopt downscaling or input tiling to reduce the input size, causing a loss of details and global information. There is also a lack of optical flow benchmarks to judge the actual performance of existing methods on high-resolution samples. Previous works only conducted qualitative high-resolution evaluations on hand-picked samples. This paper fills this gap in optical flow estimation in two ways. We propose DPFlow, an adaptive optical flow architecture capable of generalizing up to 8K resolution inputs while trained with only low-resolution samples. We also introduce Kubric-NK, a new benchmark for evaluating optical flow methods with input resolutions ranging from 1K to 8K. Our high-resolution evaluation pushes the boundaries of existing methods and reveals new insights about their generalization capabilities. Extensive experimental results show that DPFlow achieves state-of-the-art results on the MPI-Sintel, KITTI 2015, Spring, and other high-resolution benchmarks.
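The downscaling workaround mentioned in the abstract can be illustrated with a minimal sketch. It assumes a flow field stored as an `(H, W, 2)` NumPy array and a hypothetical helper `upsample_flow`; it is not part of DPFlow. The key point is that flow estimated at reduced resolution has to be resized and its displacement values multiplied by the same factor, and that nearest-neighbour resizing cannot recover detail lost in the downscale.

```python
import numpy as np

def upsample_flow(flow_lowres: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of an (H, W, 2) flow field by an integer factor.

    Hypothetical helper for illustration only.
    """
    # Repeat each pixel 'factor' times along height and width.
    up = np.repeat(np.repeat(flow_lowres, factor, axis=0), factor, axis=1)
    # Displacements are measured in pixels, so they scale with the resolution.
    return up * factor

# Toy low-resolution flow: 1.5 px of horizontal motion everywhere.
flow_small = np.zeros((2, 3, 2), dtype=np.float32)
flow_small[..., 0] = 1.5

flow_full = upsample_flow(flow_small, factor=4)
print(flow_full.shape)  # (8, 12, 2)
```

Every 4x4 block of the upsampled field carries a single motion vector, which is the loss of detail the abstract attributes to downscaling.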
LG · Jun 18, 2025
Creating User-steerable Projections with Interactive Semantic Mapping
Artur André Oliveira, Mateus Espadoto, Roberto Hirata et al.
Dimensionality reduction (DR) techniques map high-dimensional data into lower-dimensional spaces. Yet, current DR techniques are not designed to explore semantic structure that is not directly available in the form of variables or class labels. We introduce a novel user-guided projection framework for image and text data that enables customizable, interpretable data visualizations via zero-shot classification with Multimodal Large Language Models (MLLMs). Users can steer projections dynamically via natural-language guiding prompts that specify high-level semantic relationships of interest which are not explicitly present in the data dimensions. We evaluate our method across several datasets and show that it not only enhances cluster separation, but also transforms DR into an interactive, user-driven process. Our approach bridges the gap between fully automated DR techniques and human-centered data exploration, offering a flexible and adaptive way to tailor projections to specific analytical needs.
LG · Nov 18, 2025
Knowledge Graphs as Structured Memory for Embedding Spaces: From Training Clusters to Explainable Inference
Artur A. Oliveira, Mateus Espadoto, Roberto M. Cesar et al.
We introduce Graph Memory (GM), a structured non-parametric framework that augments embedding-based inference with a compact, relational memory over region-level prototypes. Rather than treating each training instance in isolation, GM summarizes the embedding space into prototype nodes annotated with reliability indicators and connected by edges that encode geometric and contextual relations. This design unifies instance retrieval, prototype-based reasoning, and graph-based label propagation within a single inductive model that supports both efficient inference and faithful explanation. Experiments on synthetic and real datasets including breast histopathology (IDC) show that GM achieves accuracy competitive with $k$NN and Label Spreading while offering substantially better calibration and smoother decision boundaries, all with an order of magnitude fewer samples. By explicitly modeling reliability and relational structure, GM provides a principled bridge between local evidence and global consistency in non-parametric learning.
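As a rough illustration of prototype nodes annotated with a reliability indicator, the sketch below summarizes each region of an embedding space by its mean, majority label, and label purity. It assumes region assignments are already given (for example, from any clustering step), uses purity as a stand-in reliability score, and omits the graph edges entirely. The function names are hypothetical, and this is not the GM algorithm itself.

```python
import numpy as np

def build_memory(X, y, regions):
    """Summarise each region as (prototype, majority label, purity)."""
    protos, labels, purity = [], [], []
    for r in np.unique(regions):
        mask = regions == r
        classes, counts = np.unique(y[mask], return_counts=True)
        protos.append(X[mask].mean(axis=0))       # region centroid
        labels.append(classes[counts.argmax()])   # majority label
        purity.append(counts.max() / mask.sum())  # assumed reliability score
    return np.array(protos), np.array(labels), np.array(purity)

def predict(x, protos, labels, purity):
    """Return the label and reliability of the nearest prototype."""
    idx = np.linalg.norm(protos - x, axis=1).argmin()
    return labels[idx], purity[idx]

# Toy embeddings: region 0 is mixed (two points of class 0, one of class 1).
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.0]])
y = np.array([0, 0, 1, 1, 1])
regions = np.array([0, 0, 0, 1, 1])

memory = build_memory(X, y, regions)
label, reliability = predict(np.array([0.1, 0.0]), *memory)
```

Here the query falls in the mixed region, so the prediction comes with a reliability of 2/3 rather than 1, which is the kind of signal that can support calibrated, explainable inference.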
CV · Dec 12, 2021
Sidewalk Measurements from Satellite Images: Preliminary Findings
Maryam Hosseini, Iago B. Araujo, Hamed Yazdanpanah et al.
Large-scale analysis of pedestrian infrastructure, particularly sidewalks, is critical to human-centric urban planning and design. Benefiting from the rich dataset of planimetric features and high-resolution orthoimages provided through the New York City Open Data portal, we train a computer vision model to detect sidewalks, roads, and buildings from remote-sensing imagery and achieve 83% mIoU on a held-out test set. We apply shape analysis techniques to study different attributes of the extracted sidewalks. More specifically, we perform a tile-wise analysis of the width, angle, and curvature of sidewalks, which, aside from their general impact on the walkability and accessibility of urban areas, are known to play a significant role in the mobility of wheelchair users. The preliminary results are promising and suggest that the proposed approach could be adopted in different cities, giving researchers and practitioners a more vivid picture of the pedestrian realm.
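For readers unfamiliar with the mIoU figure quoted above, here is a minimal sketch of mean intersection-over-union on a single label map. The class indices and the choice to skip classes absent from both maps are assumptions; the paper's exact evaluation protocol (for example, aggregation over the whole test set) may differ.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Average per-class intersection-over-union between two label maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x3 maps with three hypothetical classes (0, 1, 2).
target = np.array([[0, 0, 1], [2, 2, 1]])
pred = np.array([[0, 1, 1], [2, 2, 1]])

score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18 (about 0.72).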
ML · Jul 5, 2021
Template-Based Graph Clustering
Mateus Riva, Florian Yger, Pietro Gori et al.
We propose a novel graph clustering method guided by additional information on the underlying structure of the clusters (or communities). The problem is formulated as the matching of a graph to a smaller template: the $n$ vertices of the observed graph (to be clustered) are matched to the $k$ vertices of a template graph, using its edges as support information, and the matching is relaxed onto the set of orthonormal matrices in order to find a $k$-dimensional embedding. With relevant priors that encode the density of the clusters and their relationships, our method outperforms classical methods, especially for challenging cases.
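One way to picture the relaxation is through a Frobenius-norm matching cost $\|A - P M P^\top\|_F^2$ with $P^\top P = I_k$, where $A$ is the observed $n \times n$ adjacency matrix, $M$ the $k \times k$ template, and $P$ an $n \times k$ matrix with orthonormal columns. This objective is an assumption made for illustration, since the abstract does not state the exact formulation, and the helper names below are hypothetical. The sketch only evaluates the cost for two candidate assignments and does not perform the optimization.

```python
import numpy as np

def matching_cost(A, M, P):
    """Squared Frobenius distance between the graph and the lifted template."""
    return float(np.linalg.norm(A - P @ M @ P.T, "fro") ** 2)

def normalised_assignment(labels, k):
    """One-hot assignment with unit-norm columns, so that P.T @ P = I_k.

    Assumes every cluster is non-empty.
    """
    Z = np.eye(k)[labels]              # (n, k) one-hot matrix
    return Z / np.sqrt(Z.sum(axis=0))  # divide each column by sqrt(cluster size)

# Observed graph: two dense blocks of two vertices each (self-loops included).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
# Template prior: two dense, mutually disconnected clusters (scaled by cluster size).
M = 2.0 * np.eye(2)

good = matching_cost(A, M, normalised_assignment(np.array([0, 0, 1, 1]), 2))
bad = matching_cost(A, M, normalised_assignment(np.array([0, 1, 0, 1]), 2))
```

The assignment that respects the block structure reaches a cost of (numerically) zero, while the mismatched one costs 8, so minimizing over orthonormal `P` favors embeddings aligned with the template.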
CV · Aug 2, 2019
A Structural Graph-Based Method for MRI Analysis
Larissa de O. Penteado, Mateus Riva, Roberto M. Cesar
The importance of imaging exams, such as Magnetic Resonance Imaging (MRI), for the diagnosis and follow-up of pediatric pathologies and for assessing the development of anatomical structures has been increasingly highlighted in recent times. Manual analysis of MRIs is time-consuming, subjective, and requires significant expertise, so automatic techniques are necessary. Most techniques focus on adult subjects, while pediatric MRI poses specific challenges: ongoing anatomical and histological changes related to the normal development of the organs, a reduced signal-to-noise ratio due to the smaller bodies, and motion artifacts and cooperation issues, especially in long exams. In many cases these challenges preclude common analysis methods developed for use in adults. Therefore, the development of a robust technique to aid in pediatric MRI analysis is necessary. This paper presents the current development of a new method based on the learning and matching of structural relational graphs (SRGs). The experiments were performed on liver MRI sequences of one patient from ICr-HC-FMUSP, and preliminary results showcased the viability of the project. Future experiments are expected to culminate in an application for pediatric liver substructure and brain tumor segmentation.
CV · Jun 28, 2017
A New Urban Objects Detection Framework Using Weakly Annotated Sets
Eric Keiji, Gabriel Ferreira, Claudio Silva et al.
Urban informatics explores data science methods to address urban issues in a data-intensive way. The large variety and quantity of available data should be exploited, but this brings important challenges. For instance, although powerful computer vision methods exist, they may require large annotated datasets. In this work we propose a novel approach to automatically creating an object recognition system with minimal manual annotation. The basic idea behind the method is to build large input datasets from online cameras available in large cities. An off-the-shelf weak classifier is used to detect an initial set of urban elements of interest (e.g., cars, pedestrians, bikes). This initial dataset undergoes a quality control procedure and is subsequently used to fine-tune a strong classifier. Quality control and comparative performance assessment are part of the pipeline. We evaluate the method for detecting cars in footage from monitoring cameras. Experimental results using real data show that, despite losing generality, the final detector provides better detection rates tailored to the selected cameras. The programmed robot gathered 770 video hours from 24 online city cameras (~300 GB), which were fed to the proposed system. The method nearly doubled the recall (93%) with respect to state-of-the-art methods using off-the-shelf algorithms.