Arnab Nandi

AI
h-index: 13
8 papers
1,042 citations
Novelty: 46%
AI Score: 45

8 Papers

LG Mar 1, 2023
Cross-Modal Entity Matching for Visually Rich Documents

Ritesh Sarkhel, Arnab Nandi

Visually rich documents (e.g., leaflets, banners, magazine articles) are physical or digital documents that utilize visual cues to augment their semantics. Information contained in these documents is ad hoc and often incomplete. Existing works that enable structured querying on these documents do not take this into account. This makes it difficult to contextualize the information retrieved from querying these documents and to gather actionable insights from them. We propose Juno, a cross-modal entity matching framework, to address this limitation. It augments heterogeneous documents with supplementary information by matching a text span in the document with semantically similar tuples from an external database. Our main contribution is a deep neural network with attention that goes beyond traditional keyword-based matching and finds matching tuples by aligning text spans and relational tuples in a multimodal encoding space without any prior knowledge about the document type or the underlying schema. Exhaustive experiments on multiple real-world datasets show that Juno generalizes to heterogeneous documents with diverse layouts and formats. It outperforms state-of-the-art baselines by more than 6 F1 points with up to 60% fewer human-labeled samples. Our experiments further show that Juno is a computationally robust framework. We can train it only once and then adapt it dynamically for multiple resource-constrained environments without sacrificing its downstream performance. This makes it suitable for on-device deployment on various edge devices. To the best of our knowledge, ours is the first work that investigates the information incompleteness of visually rich documents and proposes a generalizable, performant, and computationally robust framework to address it in an end-to-end way.
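As a rough illustration of matching a text span against database tuples in a shared encoding space, the sketch below ranks tuples by cosine similarity to a span vector. It is not Juno's model: the abstract describes a deep neural network with attention, whereas here the vectors are assumed to be given, and the names `cosine`, `top_matches`, and the tuple keys are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_matches(span_vec, tuple_vecs, k=1):
    """Return the keys of the k tuples most similar to the span vector.

    span_vec:   embedding of a text span (assumed to come from some encoder)
    tuple_vecs: dict mapping a tuple key to its embedding in the same space
    """
    scored = [(cosine(span_vec, vec), key) for key, vec in tuple_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


# Toy, hand-made 2-D vectors standing in for learned encodings (an assumption).
tuple_vecs = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
print(top_matches([0.9, 0.1], tuple_vecs, k=2))
```

Nearest-neighbor lookup of this kind only covers the final matching step; learning an encoding space in which spans and tuples align is the part the paper contributes.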

DB Mar 4
Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities

Jean-Daniel Fekete, Yifan Hu, Dominik Moritz et al.

The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. In the AI era, data analysis increasingly involves large-scale, heterogeneous, and multimodal data that is predominantly unstructured, as well as foundation models such as LLMs and VLMs, which introduce additional uncertainty into analytical processes. These shifts expose persistent challenges for human-data interactive systems, including perceptually misaligned latency, scalability constraints, limitations of existing interaction and exploration paradigms, and growing uncertainty regarding the reliability and interpretability of AI-generated insights. Responding to these challenges requires moving beyond conventional efficiency and scalability metrics, redefining the roles of humans and machines in analytical workflows, and incorporating cognitive, perceptual, and design principles into every level of the human-data interaction stack. This paper investigates the challenges introduced by recent advances in AI and examines how these developments are reshaping the ways users engage with data, while outlining limitations and open research directions for building human-centered AI systems for interactive data analysis in the AI era.

21.4 HC Apr 27
Vega-Video: Integrating Video into the Grammar of Graphics

Dominik Winecki, Arnab Nandi

Video data is increasingly used alongside conventional data for interactive data exploration, necessitating interfaces for exploring and presenting mixed-modality data. However, integrating video into visualizations remains difficult due to its distinct paradigms and inherent performance challenges. We identify three classes of video data visualization - synchronization, annotation, and transformation - and integrate them into the Vega declarative grammar. We show that these abstractions enable high-performance implementation. To reconcile Vega's instantaneous dataflow with video player state, we introduce a split-signal architecture that preserves declarative semantics while masking video update delays. We detect continuous scrubbing interactions at compile time to apply encoding-aware optimizations that improve responsiveness by up to 4x. We also repurpose VOD protocols to transform videos in real time, delivering sub-200ms updates even on multi-hour-long compilations. These contributions enable seamless integration of conventional and video data visualization.

CL Mar 30, 2024
Noise-Aware Training of Layout-Aware Language Models

Ritesh Sarkhel, Xiaoqi Ren, Lauro Beltrao Costa et al.

A visually rich document (VRD) utilizes visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities from a document requires a large number of instances of the target document type annotated in both the textual and visual modalities. This is an expensive bottleneck in enterprise scenarios, where we want to train custom extractors for thousands of different document types in a scalable way. Pre-training an extractor model on unlabeled instances of the target document type, followed by a fine-tuning step on human-labeled instances, does not work in these scenarios, as it exceeds the maximum allowable training time allocated for the extractor. We address this scenario by proposing a Noise-Aware Training method, or NAT, in this paper. Instead of acquiring expensive human-labeled documents, NAT utilizes weakly labeled documents to train an extractor in a scalable way. To avoid degradation in the model's quality due to noisy, weakly labeled samples, NAT estimates the confidence of each training sample and incorporates it as an uncertainty measure during training. We train multiple state-of-the-art extractor models using NAT. Experiments on a number of publicly available and in-house datasets show that NAT-trained models are not only robust in performance, outperforming a transfer-learning baseline by up to 6% in terms of macro-F1 score, but also more label-efficient, reducing the amount of human effort required to obtain comparable performance by up to 73%.
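One common way to fold a per-sample confidence into training is to weight each sample's loss by that confidence. The sketch below shows this for a negative log-likelihood loss. The abstract does not say how NAT estimates or applies confidence, so the weighting scheme and the name `confidence_weighted_nll` are assumptions made for illustration only.

```python
import math


def confidence_weighted_nll(probs, labels, confidences):
    """Confidence-weighted average negative log-likelihood.

    probs:       per-sample lists of predicted class probabilities
    labels:      weak-label class index for each sample
    confidences: weight in [0, 1] per sample (hypothetical reliability estimate)
    """
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0
    loss = 0.0
    for p, y, c in zip(probs, labels, confidences):
        # Clamp the probability to avoid log(0) on a confidently wrong prediction.
        loss += c * -math.log(max(p[y], 1e-12))
    return loss / total_weight


# The second sample's weak label disagrees with the model, but its
# confidence of 0.0 removes it from the loss entirely.
probs = [[0.5, 0.5], [0.9, 0.1]]
labels = [0, 1]
print(confidence_weighted_nll(probs, labels, [1.0, 0.0]))
```

With uniform confidences this reduces to the ordinary average loss, so the weights only change behavior where the weak labels are judged unreliable.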

CL Feb 18, 2020
Interpretable Multi-Headed Attention for Abstractive Summarization at Controllable Lengths

Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi et al.

Abstractive summarization at controllable lengths is a challenging task in natural language processing. It is even more challenging for domains where limited training data is available or scenarios in which the length of the summary is not known beforehand. At the same time, when it comes to trusting machine-generated summaries, explaining how a summary was constructed in human-understandable terms may be critical. We propose Multi-level Summarizer (MLS), a supervised method to construct abstractive summaries of a text document at controllable lengths. The key enabler of our method is an interpretable multi-headed attention mechanism that computes an attention distribution over an input document using an array of timestep-independent semantic kernels. Each kernel optimizes a human-interpretable syntactic or semantic property. Exhaustive experiments on two low-resource datasets in the English language show that MLS outperforms strong baselines by up to 14.70% in the METEOR score. Human evaluation of the summaries also suggests that they capture the key concepts of the document at various length budgets.

AI Apr 23, 2018
Discovery of Driving Patterns by Trajectory Segmentation

Sobhan Moosavi, Arnab Nandi, Rajiv Ramnath

Telematics data is becoming increasingly available due to the ubiquity of devices that collect data during drives for purposes such as usage-based insurance (UBI), fleet management, and navigation of connected vehicles. Consequently, a variety of data-analytic applications that extract valuable insights from the data have become feasible. In this paper, we address the especially challenging problem of discovering behavior-based driving patterns from only externally observable phenomena (e.g., a vehicle's speed). We present a trajectory segmentation approach capable of discovering driving patterns as separate segments, based on the behavior of drivers. This segmentation approach includes a novel transformation of trajectories along with a dynamic programming approach for segmentation. We apply the segmentation approach to a real-world, rich dataset of personal car trajectories provided by a major insurance company based in Columbus, Ohio. Analysis and preliminary results show the applicability of the approach for finding significant driving patterns.
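To make the dynamic programming step concrete, here is a minimal sketch that splits a sequence of speed readings into k contiguous segments while minimizing the total within-segment squared deviation from each segment's mean. This is a generic textbook formulation, not the paper's method: the abstract does not specify the trajectory transformation or the cost function, and the names `segment_cost` and `segment` are hypothetical.

```python
def segment_cost(prefix, prefix_sq, i, j):
    """Sum of squared deviations from the mean for values[i:j] (j > i)."""
    n = j - i
    total = prefix[j] - prefix[i]
    total_sq = prefix_sq[j] - prefix_sq[i]
    return total_sq - total * total / n


def segment(values, k):
    """Split values into k contiguous segments with minimal total cost.

    Returns a list of (start, end) index pairs, with end exclusive.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums let each segment cost be computed in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                if best[m - 1][i] == inf:
                    continue
                cost = best[m - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    back[m][j] = i
    # Walk the back-pointers from the end to recover segment boundaries.
    bounds, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


# Toy speed readings (made-up values) with three visibly distinct regimes.
speeds = [30, 31, 29, 60, 61, 59, 10, 11]
print(segment(speeds, 3))  # [(0, 3), (3, 6), (6, 8)]
```

The triple loop runs in O(k * n^2) time, which is fine for a sketch; the number of segments k is fixed here, whereas a practical system would also need a way to choose it.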

AI Oct 13, 2017
Characterizing Driving Context from Driver Behavior

Sobhan Moosavi, Behrooz Omidvar-Tehrani, R. Bruce Craig et al.

Because of the increasing availability of spatiotemporal data, a variety of data-analytic applications have become possible. Characterizing driving context, where context may be thought of as a combination of location and time, is a new and challenging application. An example of such a characterization is finding the correlation between driving behavior and traffic conditions. This contextual information enables analysts to validate observation-based hypotheses about the driving of an individual. In this paper, we present DriveContext, a novel framework that finds the characteristics of a context by extracting significant driving patterns (e.g., a slow-down) and then identifying the set of potential causes behind these patterns (e.g., traffic congestion). Our experimental results confirm the feasibility of the framework in identifying meaningful driving patterns, with improvements over the state of the art. We also demonstrate, through real-world examples, how the framework derives interesting characteristics for different contexts.

HC Mar 31, 2016
Graphical Perception in Animated Bar Charts

Eugene Wu, Lilong Jiang, Larry Xu et al.

Interactive visual applications create animations that encode changes in the data. For example, cross-filtering dynamically updates linked visualizations based on the user's continuous brushing actions. The animated effects resulting from these interactions depend both on how interaction (e.g., brushing speed) controls properties of the animation such as frame rate, and on how the data being explored dictates the data encoded in the animation. Past work has found that frame rate matters to general perception; however, a critical question is which of these animation and data properties affect the perceptual accuracy of judgment tasks, and to what extent. Although graphical perception has been well studied for static data visualizations, it is ripe for exploration in the animated setting. We designed two animated judgment tasks involving a target bar in an animated bar chart and empirically evaluated the effects of 2 animation properties (highlighting of the target bar and frame rate) as well as 3 data properties that affect the target bar's value throughout the animation. In short, we find that the rate and timing of animation changes are more easily detected at larger values; that encodings such as color are easier to detect than shapes; and that timing is important: earlier changes were harder to perceive than later changes in the animation. Our results are an initial step toward understanding perceptual accuracy for animated data visualizations, both for presentations and ultimately as part of interactive applications.