DBSep 13, 2021
Augmenting Decision Making via Interactive What-If AnalysisSneha Gathani, Madelon Hulsebos, James Gale et al.
The fundamental goal of business data analysis is to improve business decisions using data. Business users often make decisions to achieve key performance indicators (KPIs) such as increasing customer retention or sales, or decreasing costs. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses. This involves considering multitudes of combinations and scenarios and performing slicing, dicing, and transformations on the data accordingly, e.g., analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets combined with the cognitive limitations of humans makes it challenging to carry over multiple hypotheses, even for simple datasets. Therefore mentally performing such analyses is hard. Existing commercial tools either provide partial solutions or fail to cater to business users altogether. Here we argue for four functionalities to enable business users to interactively learn and reason about the relationships between sets of data attributes thereby facilitating data-driven decision making. We implement these functionalities in SystemD, an interactive visual data analysis system enabling business users to experiment with the data by asking what-if questions. We evaluate the system through three business use cases: marketing mix modeling, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Users find the SystemD functionalities highly useful for quick testing and validation of their hypotheses around their KPIs of interest, addressing their unmet analysis needs. The feedback also suggests that the UX design can be enhanced to further improve the understandability of these functionalities.
DBSep 11, 2021
Making Table Understanding Work in PracticeMadelon Hulsebos, Sneha Gathani, James Gale et al.
Understanding the semantics of tables at scale is crucial for tasks like data integration, preparation, and search. Table understanding methods aim at detecting a table's topic, semantic column types, column relations, or entities. With the rise of deep learning, powerful models have been developed for these tasks with excellent accuracy on benchmarks. However, we observe that there exists a gap between the performance of these models on these benchmarks and their applicability in practice. In this paper, we address the question: what do we need for these models to work in practice? We discuss three challenges of deploying table understanding models and propose a framework to address them. These challenges include 1) difficulty in customizing models to specific domains, 2) lack of training data for typical database tables often found in enterprises, and 3) lack of confidence in the inferences made by models. We present SigmaTyper which implements this framework for the semantic column type detection task. SigmaTyper encapsulates a hybrid model trained on GitTables and integrates a lightweight human-in-the-loop approach to customize the model. Lastly, we highlight avenues for future research that further close the gap towards making table understanding effective in practice.
DBDec 1, 2020
Sigma Worksheet: Interactive Construction of OLAP QueriesJames Gale, Max Seiden, Gretchen Atwood et al.
The new generation of cloud data warehouses (CDWs) brings large amounts of data and compute power closer to users in enterprises. The ability to directly access the warehouse data, interactively analyze and explore it at scale can empower users to improve their decision making cycles. However, existing tools for analyzing data in CDWs are either limited in ad-hoc transformations or difficult to use for business users, the largest user segment in enterprises. Here we introduce Sigma Worksheet, a new interactive system that enables users to easily perform ad-hoc visual analysis of data in CDWs at scale. For this, Sigma Worksheet provides an accessible spreadsheet-like interface for data analysis through direct manipulation. Sigma Worksheet dynamically constructs matching SQL queries from user interactions on this familiar interface, building on the versatility and expressivity of SQL. Sigma Worksheet executes constructed queries directly on CDWs, leveraging the superior characteristics of the new generation CDWs, including scalability. To evaluate Sigma Worksheet, we first demonstrate its expressivity through two real life use cases, cohort analysis and sessionization. We then measure the performance of the Worksheet generated queries with a set of experiments using the TPC-H benchmark. Results show the performance of our compiled SQL queries is comparable to that of the reference queries of the benchmark. Finally, to assess the usefulness of Sigma Worksheet in deployment, we elicit feedback through a 100-person survey followed by a semi-structured interview study with 70 participants. We find that Sigma Worksheet is easier to use and learn, improving the productivity of users. Our findings also suggest Sigma Worksheet can further improve user experience by providing guidance to users at various steps of data analysis.