Claudio T. Silva

CV
h-index: 29
19 papers
547 citations
Novelty: 36%
AI Score: 27

19 Papers

CV · Jun 28, 2022
Towards Global-Scale Crowd+AI Techniques to Map and Assess Sidewalks for People with Disabilities

Maryam Hosseini, Mikey Saugstad, Fabio Miranda et al. · mit, uw

There is a lack of data on the location, condition, and accessibility of sidewalks across the world, which not only impacts where and how people travel but also fundamentally limits interactive mapping tools and urban analytics. In this paper, we describe initial work in semi-automatically building a sidewalk network topology from satellite imagery using hierarchical multi-scale attention models, inferring surface materials from street-level images using active learning-based semantic segmentation, and assessing sidewalk condition and accessibility features using Crowd+AI. We close with a call to create a database of labeled satellite and streetscape scenes for sidewalks and sidewalk accessibility issues along with standardized benchmarks.

CY · May 25, 2022
Urban Rhapsody: Large-scale exploration of urban soundscapes

Joao Rulff, Fabio Miranda, Maryam Hosseini et al. · mit

Noise is one of the primary quality-of-life issues in urban environments. In addition to annoyance, noise negatively impacts public health and educational performance. While low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions, the amount of data they produce and the complexity of these data pose significant analytical challenges. One way to address these challenges is through machine listening techniques, which are used to extract features in attempts to classify the source of noise and understand temporal patterns of a city's noise situation. However, the overwhelming number of noise sources in the urban environment and the scarcity of labeled data make it nearly impossible to create classification models with large enough vocabularies that capture the true dynamism of urban soundscapes. In this paper, we first identify a set of requirements in the yet unexplored domain of urban soundscape exploration. To satisfy the requirements and tackle the identified challenges, we propose Urban Rhapsody, a framework that combines state-of-the-art audio representation, machine learning, and visual analytics to allow users to interactively create classification models, understand noise patterns of a city, and quickly retrieve and label audio excerpts in order to create a large high-precision annotated database of urban sound recordings. We demonstrate the tool's utility through case studies performed by domain experts using data generated over the five-year deployment of a one-of-a-kind sensor network in New York City.

CV · Sep 30, 2021 · Code
IntentVizor: Towards Generic Query Guided Interactive Video Summarization

Guande Wu, Jianzhe Lin, Claudio T. Silva

The goal of automatic video summarization is to create a short skim of the original long video while preserving the major content and events. There is growing interest in integrating user queries into video summarization, known as query-driven video summarization. This approach predicts a concise synopsis of the original video based on the user query, which is commonly represented by input text. However, two inherent problems exist in this query-driven approach. First, a text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit the summaries once they are produced, whereas we assume that the user's needs are subtle and must be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. The input queries that describe the user's needs are not limited to text but also include video snippets. We further represent these finer-grained multi-modality queries as user 'intent', which is interpretable, interactable, editable, and can better quantify the user's needs. In this paper, we use a set of the proposed intents to represent the user query and design a new interactive visual analytic interface. Users can interactively control and adjust these mixed-initiative intents to obtain a more satisfying summary through the interface. In addition, to improve summarization quality via video understanding, we propose novel Granularity-Scalable Ego-Graph Convolutional Networks (GSE-GCN). We conduct our experiments on two benchmark datasets. Comparisons with state-of-the-art methods verify the effectiveness of the proposed framework. Code and dataset are available at https://github.com/jnzs1836/intent-vizor.

SE · Sep 6, 2013 · Code
Enabling Reproducible Science with VisTrails

David Koop, Juliana Freire, Claudio T. Silva

With the increasing amount of data and use of computation in science, software has become an important component in many different domains. Computing is now being used more often and in more aspects of scientific work including data acquisition, simulation, analysis, and visualization. To ensure reproducibility, it is important to capture the different computational processes used as well as their executions. VisTrails is an open-source scientific workflow system for data analysis and visualization that seeks to address the problem of integrating varied tools as well as automatically documenting the methods and parameters employed. Growing from a specific project need to supporting a wide array of users required close collaborations in addition to new research ideas to design a usable and efficient system. The VisTrails project now includes standard software processes like unit testing and developer documentation while serving as a base for further research. In this paper, we describe how VisTrails has developed and how our efforts in structuring and advertising the system have contributed to its adoption in many domains.

LG · May 22, 2024
Exploring the Relationship Between Feature Attribution Methods and Model Performance

Priscylla Silva, Claudio T. Silva, Luis Gustavo Nonato

Machine learning and deep learning models are pivotal in educational contexts, particularly in predicting student success. Despite their widespread application, a significant gap persists in comprehending the factors influencing these models' predictions, especially in explainability within education. This work addresses this gap by employing nine distinct explanation methods and conducting a comprehensive analysis to explore the correlation between the agreement among these methods in generating explanations and the predictive model's performance. Applying Spearman's correlation, our findings reveal a very strong correlation between the model's performance and the agreement level observed among the explanation methods.
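As a rough illustration of the statistic named in this abstract, the sketch below computes Spearman's rank correlation in plain Python by ranking both series (ties receive their average rank) and taking the Pearson correlation of the ranks. The function names and the `agreement` and `accuracy` values are hypothetical and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-model values, for illustration only.
agreement = [0.42, 0.55, 0.61, 0.70, 0.88]  # agreement among explainers
accuracy = [0.71, 0.74, 0.73, 0.80, 0.91]   # predictive performance
rho = spearman(agreement, accuracy)
print(round(rho, 3))
```

Because the coefficient depends only on ranks, it captures any monotonic relationship between agreement and performance, not just a linear one.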

LG · Apr 25, 2024
T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients

Evandro S. Ortigossa, Fábio F. Dias, Brian Barr et al.

The development of machine learning applications has increased significantly in recent years, motivated by the remarkable ability of learning-powered systems to discover and generalize intricate patterns hidden in massive datasets. Modern learning models, while powerful, often exhibit a complexity level that renders them opaque black boxes, lacking transparency and hindering our understanding of their decision-making processes. Opacity challenges the practical application of machine learning, especially in critical domains requiring informed decisions. Explainable Artificial Intelligence (XAI) addresses that challenge, unraveling the complexity of black boxes by providing explanations. Feature attribution/importance XAI stands out for its ability to delineate the significance of input features in predictions. However, most attribution methods have limitations, such as instability, in which similar or even identical instances yield divergent explanations. This work introduces T-Explainer, a novel additive attribution explainer based on the Taylor expansion that offers desirable properties such as local accuracy and consistency. We demonstrate T-Explainer's effectiveness and stability over multiple runs in quantitative benchmark experiments against well-known attribution methods. Additionally, we provide several tools to evaluate and visualize explanations, turning T-Explainer into a comprehensive XAI framework.
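To make the idea of a Taylor-based additive attribution concrete, here is a minimal sketch of a generic first-order scheme: the gradient is estimated by central finite differences, and each feature receives the gradient component times its offset from a baseline. This is not T-Explainer's actual algorithm, which the abstract does not specify; `numerical_gradient`, `taylor_attributions`, the baseline, and the toy model are all assumptions made for illustration.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Estimate the gradient of f at x with central finite differences."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def taylor_attributions(f, x, baseline, eps=1e-5):
    """First-order attributions: phi_i = df/dx_i(x) * (x_i - baseline_i).

    By the first-order Taylor expansion around x, the attributions sum
    approximately to f(x) - f(baseline); the sum is exact for linear f.
    """
    grad = numerical_gradient(f, x, eps)
    return [g * (xi - bi) for g, xi, bi in zip(grad, x, baseline)]


# Hypothetical linear "model", used only to check the additive property.
def toy_model(v):
    return 2.0 * v[0] - 3.0 * v[1] + 0.5


x = [1.0, 2.0]
baseline = [0.0, 0.0]
phi = taylor_attributions(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

For a nonlinear model the sum of attributions only approximates the change in output, and the gap grows with the distance between the instance and the baseline.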

CV · Dec 12, 2021
Sidewalk Measurements from Satellite Images: Preliminary Findings

Maryam Hosseini, Iago B. Araujo, Hamed Yazdanpanah et al.

Large-scale analysis of pedestrian infrastructures, particularly sidewalks, is critical to human-centric urban planning and design. Benefiting from the rich data set of planimetric features and high-resolution orthoimages provided through the New York City Open Data portal, we train a computer vision model to detect sidewalks, roads, and buildings from remote-sensing imagery and achieve 83% mIoU on a held-out test set. We apply shape analysis techniques to study different attributes of the extracted sidewalks. More specifically, we perform a tile-wise analysis of the width, angle, and curvature of sidewalks, which, aside from their general impact on the walkability and accessibility of urban areas, are known to play a significant role in the mobility of wheelchair users. The preliminary results are promising, offering a glimpse of the potential of the proposed approach to be adopted in different cities, enabling researchers and practitioners to have a more vivid picture of the pedestrian realm.
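For readers unfamiliar with the mIoU metric reported above, the following sketch shows one common way to compute mean intersection-over-union from flattened per-pixel class labels. The class encoding (0 = sidewalk, 1 = road, 2 = building) and the tiny label lists are hypothetical; the paper's evaluation code may differ, for example in how it treats classes absent from a tile.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Hypothetical flattened label maps: 0 = sidewalk, 1 = road, 2 = building.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
print(miou)  # per-class IoU is 1/3, 2/3, 1/2, so the mean is 0.5
```

Averaging per class rather than per pixel keeps thin structures such as sidewalks from being swamped by large classes such as roads and buildings.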

CV · Sep 6, 2021
ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN

Guande Wu, Jianzhe Lin, Claudio T. Silva

Video summarization aims to simplify large-scale video browsing by generating concise, short summaries that differ from but well represent the original video. Due to the scarcity of video annotations, recent progress in video summarization concentrates on unsupervised methods, among which GAN-based methods are most prevalent. This type of method includes a summarizer and a discriminator. The summarizer's output is accepted as the final summary only if the video reconstructed from it cannot be distinguished from the original by the discriminator. The primary problems of these GAN-based methods are twofold. First, a summary produced this way is a low-redundancy subset of the original video that contains high-priority events and entities; this criterion alone is not sufficient. Second, the training of the GAN framework is not stable. This paper proposes a novel Entity relationship Aware video summarization method (ERA) to address the above problems. More specifically, we introduce an Adversarial Spatio Temporal network to construct the relationships among entities, which we think should also be given high priority in the summarization. The GAN training problem is solved by introducing the Wasserstein GAN and two newly proposed losses, the video patch loss and the score sum loss. In addition, the score sum loss can relieve the model's sensitivity to varying video lengths, which is an inherent problem for most current video analysis tasks. Our method substantially lifts the performance on the target benchmark datasets and exceeds CSNet, the current top-ranked state-of-the-art method on the leaderboard (2.1% F1 score increase on TVSum and 3.1% F1 score increase on SumMe). We hope our straightforward yet effective approach will shed some light on future research in unsupervised video summarization.

HC · Aug 31, 2020
Urban Mosaic: Visual Exploration of Streetscapes Using Large-Scale Image Data

Fabio Miranda, Maryam Hosseini, Marcos Lage et al.

Urban planning is increasingly data driven, yet the challenge of designing with data at a city scale and remaining sensitive to the impact at a human scale is as important today as it was for Jane Jacobs. We address this challenge with Urban Mosaic, a tool for exploring the urban fabric through a spatially and temporally dense data set of 7.7 million street-level images from New York City, captured over the period of a year. Working in collaboration with professional practitioners, we use Urban Mosaic to investigate questions of accessibility and mobility, and preservation and retrofitting. In doing so, we demonstrate how tools such as this might provide a bridge between the city and the street, by supporting activities such as visual comparison of geographically distant neighborhoods, and temporal analysis of unfolding urban development.

HC · Jul 21, 2020
Melody: Generating and Visualizing Machine Learning Model Summary to Understand Data and Classifiers Together

Gromit Yeuk-Yin Chan, Enrico Bertini, Luis Gustavo Nonato et al.

With the increasing sophistication of machine learning models, there is a growing trend of developing model explanation techniques that focus on only one instance (local explanation) to ensure faithfulness to the original model. While these techniques provide accurate model interpretability on various data primitives (e.g., tabular, image, or text), a holistic Explainable Artificial Intelligence (XAI) experience also requires a global explanation of the model and dataset to enable sensemaking at different granularities. Thus, there is vast potential in synergizing the model explanation and visual analytics approaches. In this paper, we present MELODY, an interactive algorithm to construct an optimal global overview of the model and data behavior by summarizing the local explanations using information theory. The result (i.e., an explanation summary) does not require additional learning models, restrictions on data primitives, or knowledge of machine learning from the users. We also design MELODY UI, an interactive visual analytics system to demonstrate how the explanation summary connects the dots in various XAI tasks from a global overview to local inspections. We present three usage scenarios regarding tabular, image, and text classifications to illustrate how to generalize model interpretability across different data. Our experiments show that our approach (1) provides a better explanation summary compared to a straightforward information-theoretic summarization and (2) achieves a significant speedup in the end-to-end data modeling pipeline.

HC · Jul 21, 2020
SUBPLEX: Towards a Better Understanding of Black Box Model Explanations at the Subpopulation Level

Jun Yuan, Gromit Yeuk-Yin Chan, Brian Barr et al.

Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts such as transport control, financial activities, and medical diagnosis. While current model interpretation methodologies focus on using locally linear functions to approximate the models or creating self-explanatory models that give explanations to each input instance, they do not focus on model interpretation at the subpopulation level, which is the understanding of model interpretations across different subset aggregations in a dataset. To address the challenges of providing explanations of an ML model across the whole dataset, we propose SUBPLEX, a visual analytics system to help users understand black-box model explanations with subpopulation visual analysis. SUBPLEX is designed through an iterative design process with machine learning researchers to address three usage scenarios of real-life machine learning tasks: model debugging, feature selection, and bias detection. The system applies novel subpopulation analysis on ML model explanations and interactive visualization to explore the explanations on a dataset with different levels of granularity. Based on the system, we conduct user evaluation to assess how understanding the interpretation at a subpopulation level influences the sense-making process of interpreting ML models from a user's perspective. Our results suggest that by providing model explanations for different groups of data, SUBPLEX encourages users to generate more ingenious ideas to enrich the interpretations. It also helps users to acquire a tight integration between programming workflow and visual analytics workflow. Last but not least, we summarize the considerations observed in applying visualization to machine learning interpretations.

SOC-PH · May 4, 2020
Learning Geo-Contextual Embeddings for Commuting Flow Prediction

Zhicheng Liu, Fabio Miranda, Weiting Xiong et al.

Predicting commuting flows based on infrastructure and land-use information is critical for urban planning and public policy development. However, it is a challenging task given the complex patterns of commuting flows. Conventional models, such as the gravity model, are mainly derived from physics principles and are limited in their predictive power in real-world scenarios where many factors need to be considered. Meanwhile, most existing machine learning-based methods ignore spatial correlations and fail to model the influence of nearby regions. To address these issues, we propose the Geo-contextual Multitask Embedding Learner (GMEL), a model that captures the spatial correlations from geographic contextual information for commuting flow prediction. Specifically, we first construct a geo-adjacency network containing the geographic contextual information. Then, an attention mechanism is proposed based on the framework of the graph attention network (GAT) to capture the spatial correlations and encode geographic contextual information into an embedding space. Two separate GATs are used to model supply and demand characteristics. A multitask learning framework is used to introduce stronger restrictions and enhance the effectiveness of the embedding representation. Finally, a gradient boosting machine is trained on the learned embeddings to predict commuting flows. We evaluate our model using real-world datasets from New York City, and the experimental results demonstrate the effectiveness of our proposal against the state of the art.
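For context on the conventional baseline mentioned above, this sketch implements the textbook form of the gravity model, in which the flow between two regions grows with their "masses" (for example, population or job counts) and decays with distance. The function name, parameter defaults, and input numbers are illustrative assumptions; the abstract does not state which gravity-model variant or calibration the paper compares against.

```python
def gravity_flow(origin_mass, dest_mass, distance,
                 k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Textbook gravity model: T_ij = k * m_i^alpha * m_j^beta / d_ij^gamma.

    k, alpha, beta and gamma are normally fitted to observed flows;
    the defaults here are placeholders, not calibrated values.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * origin_mass ** alpha * dest_mass ** beta / distance ** gamma


# Hypothetical regions: 1000 residents at the origin, 500 jobs at the destination.
near = gravity_flow(1000, 500, distance=10)  # 1000 * 500 / 10**2 = 5000.0
far = gravity_flow(1000, 500, distance=20)   # doubling distance quarters the flow
print(near, far)
```

The model uses only two masses and one distance per pair of regions, which illustrates why it cannot account for the many other factors and neighborhood effects the abstract highlights.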

CV · Mar 8, 2020
A Tracking System For Baseball Game Reconstruction

Nina Wiedemann, Carlos Dietrich, Claudio T. Silva

A baseball game is often seen as a series of contests between individuals. The duel between the pitcher and the batter, for example, is considered the engine that drives the sport. Pitchers use a variety of strategies to gain a competitive advantage against the batter, who does his best to figure out the ball trajectory and react in time for a hit. In this work, we propose a system that captures the movements of the pitcher, the batter, and the ball in a high level of detail, and discuss several ways in which this information may be processed to compute interesting statistics. We demonstrate on a large database of videos that our methods achieve results comparable to those of previous systems, while operating solely on video material. In addition, state-of-the-art AI techniques are incorporated to augment the amount of information that is made available to players, coaches, teams, and fans.

HC · Aug 2, 2019
FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System

Bowen Yu, Claudio T. Silva

Dataflow visualization systems enable flexible visual data exploration by allowing the user to construct a dataflow diagram that composes query and visualization modules to specify system functionality. However, learning to use dataflow diagrams presents overhead that often discourages the user. In this work, we design FlowSense, a natural language interface for dataflow visualization systems that utilizes state-of-the-art natural language processing techniques to assist dataflow diagram construction. FlowSense employs a semantic parser with special utterance tagging and special utterance placeholders to generalize to different datasets and dataflow diagrams. It explicitly presents recognized dataset and diagram special utterances to the user for dataflow context awareness. With FlowSense, the user can expand and adjust dataflow diagrams more conveniently via plain English. We apply FlowSense to the VisFlow subset-flow visualization system to enhance its usability. We evaluate FlowSense with one case study involving domain experts on a real-world data analysis problem and with a formal user study.

GR · Jul 22, 2019
Motion Browser: Visualizing and Understanding Complex Upper Limb Movement Under Obstetrical Brachial Plexus Injuries

Gromit Yeuk-Yin Chan, Luis Gustavo Nonato, Alice Chu et al.

The brachial plexus is a complex network of peripheral nerves that enables sensing from and control of the movements of the arms and hand. Nowadays, the coordination between the muscles to generate simple movements is still not well understood, hindering the knowledge of how to best treat patients with this type of peripheral nerve injury. To acquire enough information for medical data analysis, physicians conduct motion analysis assessments with patients to produce a rich dataset of electromyographic signals from multiple muscles recorded with joint movements during real-world tasks. However, tools for the analysis and visualization of the data in a succinct and interpretable manner are currently not available. Without the ability to integrate, compare, and compute multiple data sources in one platform, physicians can only compute simple statistical values to describe a patient's behavior vaguely, which limits the possibility of answering clinical questions and generating hypotheses for research. To address this challenge, we have developed Motion Browser, an interactive visual analytics system that provides an efficient framework to extract and compare muscle activity patterns from the patient's limbs, along with coordinated views that help users analyze muscle signals, motion data, and video information to address different tasks. The system was developed as a result of a collaborative endeavor between computer scientists and orthopedic surgery and rehabilitation physicians. We present case studies showing that physicians can utilize the information displayed to understand how individuals coordinate their muscles, initiate appropriate treatment, and generate new hypotheses for future research.

GR · Jul 9, 2019
Shadow Accrual Maps: Efficient Accumulation of City-Scale Shadows Over Time

Fabio Miranda, Harish Doraiswamy, Marcos Lage et al.

Large-scale shadows from buildings in a city play an important role in determining the environmental quality of public spaces. They can be both beneficial, such as for pedestrians during summer, and detrimental, by impacting vegetation and by blocking direct sunlight. Determining the effects of shadows requires the accumulation of shadows over time across different periods in a year. In this paper, we propose a simple yet efficient class of approaches that use the properties of sun movement to track the changing position of shadows within a fixed time interval. We use this approach to extend two commonly used shadowing techniques, shadow maps and ray tracing, and demonstrate the efficiency of our approach. Our technique is used to develop an interactive visual analysis system, Shadow Profiler, targeted at city planners and architects, that allows them to test the impact of shadows for different development scenarios. We validate the usefulness of this system through case studies set in Manhattan, a dense borough of New York City.

CV · Apr 8, 2019
Quantifying the presence of graffiti in urban environments

Eric K. Tokuda, Claudio T. Silva, Roberto M. Cesar-Jr

Graffiti is a common phenomenon in urban scenarios. Unlike urban art, graffiti tagging is an act of vandalism, and many local governments are putting great effort into combating it. The graffiti map of a region can be a very useful resource because it may allow one to combat vandalism in locations with a high level of graffiti and also to clean up saturated regions to discourage future acts. There is currently no automatic way of obtaining a graffiti map of a region; it is obtained by manual inspection by the police or by popular participation. We therefore describe ongoing work in which we propose an automatic way of obtaining a graffiti map of a neighbourhood. It consists of the systematic collection of street view images, followed by the identification of graffiti tags in the collected dataset and, finally, the calculation of the proposed graffiti level of that location. We validate the proposed method by evaluating the geographical distribution of graffiti in a city known to have a high concentration of graffiti: Sao Paulo, Brazil.

CV · Nov 12, 2018
A new approach for pedestrian density estimation using moving sensors and computer vision

Eric K. Tokuda, Yitzchak Lockerman, Gabriel B. A. Ferreira et al.

An understanding of pedestrian dynamics is indispensable for numerous urban applications, including the design of transportation networks and planning for business development. Pedestrian counting often requires utilizing manual or technical means to count individuals in each location of interest. However, such methods do not scale to the size of a city, and we propose a new approach to fill this gap. In this project, we used a large, dense dataset of images of New York City along with computer vision techniques to construct a spatio-temporal map of relative person density. Due to the limitations of state-of-the-art computer vision methods, such automatic detection of persons is inherently subject to errors. We model these errors as a probabilistic process, for which we provide theoretical analysis and thorough numerical simulations. We demonstrate that, within our assumptions, our methodology can supply a reasonable estimate of person densities and provide theoretical bounds for the resulting error.

CV · Nov 6, 2018
Automatic identification of graffiti tagging from urban images (original title: Identificação automática de pichação a partir de imagens urbanas)

Eric K. Tokuda, Claudio T. Silva, Roberto M. Cesar-Jr

Graffiti tagging is a common issue in large cities, and local authorities are working to combat it. The tagging map of a city can be a useful tool, as it may help to clean up highly saturated regions and discourage future acts in the neighbourhood. Currently, there is no way of obtaining a tagging map of a region automatically; manual inspection or crowd participation is required. In this work, we describe work in progress on an automatic way to obtain a tagging map of a city or region. It is based on the use of street view images and on the detection of graffiti tags in those images.