STULL: Unbiased Online Sampling for Visual Exploration of Large Spatiotemporal Data
The paper addresses incorrect user interpretations caused by biased sampling in visual analytics and offers a solution for interactive exploration of large datasets, though the contribution is incremental because it builds on existing sampling techniques.
The paper tackles biased online sampling in visual analytics of large spatiotemporal data. It proposes an unbiased approach that gives every qualifying data point the same selection probability. Within the same computational time, this approach yields samples that are at least 50% more accurate in representing the data's spatial distribution than a state-of-the-art spatial online sampling technique, and approximate visualizations that look closer to the exact ones.
Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are often biased, as most researchers have primarily focused on reducing computational latency. Biased sampling approaches select data with unequal probabilities and produce results that do not match the exact data distribution, leading end users to incorrect interpretations. In this paper, we propose a novel approach to perform unbiased online sampling of large spatiotemporal data. The proposed approach gives every point that satisfies a user's multidimensional query the same probability of selection. To achieve unbiased sampling for accurate, representative interactive visualizations, we design a novel data index and an associated sample retrieval plan. Our proposed sampling approach is suitable for a wide variety of visual analytics tasks, e.g., tasks that run aggregate queries on spatiotemporal data. Extensive experiments confirm the superiority of our approach over a state-of-the-art spatial online sampling technique, demonstrating that within the same computational time, data samples generated by our approach are at least 50% more accurate in representing the actual spatial distribution of the data and yield approximate visualizations that look closer to the exact ones.
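The abstract states the unbiasedness property (every point satisfying the query has the same selection probability) but not how the index or retrieval plan achieves it. The minimal sketch below therefore illustrates only the property, not STULL itself. It swaps in classic reservoir sampling (Algorithm R) over a full scan of the data. This technique is unbiased, but it lacks the interactive latency the paper targets. The sketch contrasts it with a hypothetical per-grid-cell quota sampler as an example of a biased scheme. The `Point` type, the box query format, and all function names are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float  # spatial coordinate (e.g., longitude)
    y: float  # spatial coordinate (e.g., latitude)
    t: float  # timestamp


# Hypothetical query format: an axis-aligned box (xmin, xmax, ymin, ymax, tmin, tmax).
Query = tuple[float, float, float, float, float, float]


def qualifies(p: Point, q: Query) -> bool:
    """True if p satisfies the spatiotemporal range query q."""
    xmin, xmax, ymin, ymax, tmin, tmax = q
    return xmin <= p.x <= xmax and ymin <= p.y <= ymax and tmin <= p.t <= tmax


def unbiased_sample(points, q: Query, k: int, rng: random.Random) -> list[Point]:
    """Reservoir sampling (Algorithm R) restricted to qualifying points.

    Assumption: a linear scan stands in for the paper's index-based retrieval.
    Each of the n qualifying points ends up in the sample with probability
    min(k, n) / n, regardless of where it lies in space or time.
    """
    reservoir: list[Point] = []
    seen = 0
    for p in points:
        if not qualifies(p, q):
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append(p)
        else:
            j = rng.randrange(seen)  # uniform over 0 .. seen - 1
            if j < k:
                reservoir[j] = p
    return reservoir


def per_cell_sample(points, q: Query, k_per_cell: int, cell: float,
                    rng: random.Random) -> list[Point]:
    """Hypothetical biased scheme: up to k_per_cell points from each grid cell.

    A point in a cell holding m qualifying points is kept with probability
    min(k_per_cell, m) / m, so points in sparse cells are over-represented.
    """
    cells: dict[tuple[int, int], list[Point]] = {}
    for p in points:
        if qualifies(p, q):
            key = (int(p.x // cell), int(p.y // cell))
            cells.setdefault(key, []).append(p)
    sample: list[Point] = []
    for members in cells.values():
        sample.extend(rng.sample(members, min(k_per_cell, len(members))))
    return sample


def dense_fraction(sample: list[Point]) -> float:
    """Share of points in the dense region x < 1 (a simple aggregate)."""
    return sum(p.x < 1.0 for p in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: 900 points in grid cell (0, 0), 100 points in cell (1, 0).
    dense = [Point(rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9), rng.uniform(1, 9))
             for _ in range(900)]
    sparse = [Point(rng.uniform(1.1, 1.9), rng.uniform(0.1, 0.9), rng.uniform(1, 9))
              for _ in range(100)]
    data = dense + sparse
    q_all: Query = (0.0, 2.0, 0.0, 1.0, 0.0, 10.0)

    print("exact:   ", dense_fraction(data))  # 0.9
    print("unbiased:", dense_fraction(unbiased_sample(data, q_all, 100, rng)))
    print("per-cell:", dense_fraction(per_cell_sample(data, q_all, 50, 1.0, rng)))  # 0.5
```

On this synthetic data, the reservoir sample's dense-region share stays close to the exact 0.9, while the fixed per-cell quota always reports 0.5. Distortion of this kind leads readers to misjudge a density map or an aggregate. The per-cell sampler is only an illustration of bias and is not a description of the baseline the paper compares against.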