HC · Feb 29, 2024
ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality
Guande Wu, Jing Qian, Sonia Castelo et al.
Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text length and semantic content for augmented reality. Developed out of a formative study that included seven users and three experts, our system combines a customized error calibration model with a few-shot prompt to integrate the syntactic, lexical, elaborative, and content simplification techniques, and generate simplified AR text for head-worn displays. Results from a 16-user empirical study showed that ARTiST lightens the cognitive load and improves performance significantly over both unmodified text and text modified via traditional methods. Our work constitutes a step towards automating the optimization of batch text data for readability and performance in augmented reality.
IR · Feb 10, 2021
Auctus: A Dataset Search Engine for Data Augmentation
Sonia Castelo, Rémi Rampin, Aécio Santos et al.
The large volumes of structured data currently available, from Web tables to open-data portals and enterprise data, open up new opportunities for progress in answering many important scientific, societal, and business questions. However, finding relevant data is difficult. While search engines have addressed this problem for Web documents, there are many new challenges involved in supporting the discovery of structured data. We demonstrate how the Auctus dataset search engine addresses some of these challenges. We describe the system architecture and how users can explore datasets through a rich set of queries. We also present case studies which show how Auctus supports data augmentation to improve machine learning models as well as to enrich analytics.
HC · Dec 14, 2020
A Visual Mining Approach to Improved Multiple-Instance Learning
Sonia Castelo, Moacir Ponti, Rosane Minghim
Multiple-instance learning (MIL) is a paradigm of machine learning that aims to classify a set (bag) of objects (instances), assigning labels only to the bags. This problem is often addressed by selecting an instance to represent each bag, transforming a MIL problem into standard supervised learning. Visualization can be a useful tool to assess learning scenarios by incorporating the users' knowledge into the classification process. Considering that multiple-instance learning is a paradigm that cannot be handled by current visualization techniques, we propose a multiscale tree-based visualization called MILTree to support MIL problems. The first level of the tree represents the bags, and the second level represents the instances belonging to each bag, allowing users to understand the MIL datasets in an intuitive way. In addition, we propose two new instance selection methods for MIL, which help users improve the model even further. Our methods can handle both binary and multiclass scenarios. In our experiments, SVM was used to build the classifiers. With support of the MILTree layout, the initial classification model was updated by changing the training set, which is composed of the prototype instances. Experimental results validate the effectiveness of our approach, showing that visual mining by MILTree can support exploring and improving models in MIL scenarios and that our instance selection methods outperform the currently available alternatives in most cases.
HC · Sep 1, 2020
Towards Evaluating Exploratory Model Building Process with AutoML Systems
Sungsoo Ray Hong, Sonia Castelo, Vito D'Orazio et al.
The use of Automated Machine Learning (AutoML) systems is highly open-ended and exploratory. While rigorously evaluating how end-users interact with AutoML is crucial, establishing a robust evaluation methodology for such exploratory systems is challenging. First, AutoML is complex, including multiple sub-components that support a variety of sub-tasks for synthesizing ML pipelines, such as data preparation, problem specification, and model generation, making it difficult to yield insights that tell us which components were successful or not. Second, because the usage pattern of AutoML is highly exploratory, it is not possible to rely solely on widely used task efficiency and effectiveness metrics as success metrics. To tackle the challenges in evaluation, we propose an evaluation methodology that (1) guides AutoML builders to divide their AutoML system into multiple sub-system components, and (2) helps them reason about each component through visualization of end-users' behavioral patterns and attitudinal data. We conducted a study to understand when, how, and why applying our methodology can help builders better understand their systems and end-users. We recruited 3 teams of professional AutoML builders. The teams prepared their own systems and let 41 end-users use the systems. Using our methodology, we visualized end-users' behavioral and attitudinal data and distributed the results to the teams. We analyzed the results in two directions: (1) what types of novel insights the AutoML builders learned from end-users, and (2) how the evaluation methodology helped the builders to understand workflows and the effectiveness of their systems. Our findings suggest new insights explaining future design opportunities in the AutoML domain as well as how using our methodology helped the builders to determine insights and let them draw concrete directions for improving their systems.
HC · May 1, 2020
PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines
Jorge Piazentin Ono, Sonia Castelo, Roque Lopez et al.
In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to search and generate end-to-end learning pipelines. While these techniques facilitate the creation of models for real-world applications, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive, it is difficult for their developers to debug these systems. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem or class of problems. In this paper, we present the PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of machine learning (ML) pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be used together with common data science tools to enable a rich set of analyses of the ML pipelines and provide insights about the algorithms that generated them. We demonstrate the utility of our tool through several use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.
LG · Jul 5, 2019
Visus: An Interactive System for Automatic Machine Learning Model Building and Curation
Aécio Santos, Sonia Castelo, Cristian Felix et al.
While the demand for machine learning (ML) applications is booming, there is a scarcity of data scientists capable of building such models. Automatic machine learning (AutoML) approaches have been proposed that help with this problem by synthesizing end-to-end ML data processing pipelines. However, these follow a best-effort approach and a user in the loop is necessary to curate and refine the derived pipelines. Since domain experts often have little or no expertise in machine learning, easy-to-use interactive interfaces that guide them throughout the model building process are necessary. In this paper, we present Visus, a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. We describe the framework used to ground our design choices and a usage scenario enabled by Visus. Finally, we discuss the feedback received in user testing sessions with domain experts.
CL · May 2, 2019
A Topic-Agnostic Approach for Identifying Fake News Pages
Sonia Castelo, Thais Almeida, Anas Elghafari et al.
Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how it is propagated, and how to counter its effect, it is necessary to first identify it. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approaches comes from the dynamic nature of news: as new political events are covered, topics and discourse constantly change, and thus a classifier trained using content from articles published at a given time is likely to become ineffective in the future. To address this challenge, we propose a topic-agnostic (TAG) classification strategy that uses linguistic and web-markup features to identify fake news pages. We report experimental results using multiple data sets which show that our approach attains high accuracy in the identification of fake news, even as topics evolve over time.