HCNov 30, 2023
Investigating Collaborative Data Practices: a Case Study on Artificial Intelligence for Healthcare ResearchRafael Henkin, Elizabeth Remfry, Duncan J. Reynolds et al.
Developing artificial intelligence (AI) tools for healthcare is a collaborative effort, bringing data scientists, clinicians, patients and other disciplines together. In this paper, we explore the collaborative data practices of research consortia tasked with applying AI tools to understand and manage multiple long-term conditions in the UK. Through an inductive thematic analysis of 13 semi-structured interviews with participants of these consortia, we aimed to understand how collaboration happens based on the tools used, communication processes and settings, as well as the conditions and obstacles for collaborative work. Our findings reveal the adaptation of tools that are used for sharing knowledge and the tailoring of information based on the audience, particularly those from a clinical or patient perspective. Limitations on the ability to do this were also found to be imposed by the use of electronic healthcare records and access to datasets. We identified meetings as the key setting for facilitating exchanges between disciplines and allowing for the blending and creation of knowledge. Finally, we bring to light the conditions needed to facilitate collaboration and discuss how some of the challenges may be navigated in future work.
LGDec 2, 2024
Exploring Long-Term Prediction of Type 2 Diabetes Microvascular ComplicationsElizabeth Remfry, Rafael Henkin, Michael R Barnes et al.
Electronic healthcare records (EHR) contain a huge wealth of data that can support the prediction of clinical outcomes. EHR data is often stored and analysed using clinical codes (ICD10, SNOMED), however these can differ across registries and healthcare providers. Integrating data across systems involves mapping between different clinical ontologies requiring domain expertise, and at times resulting in data loss. To overcome this, code-agnostic models have been proposed. We assess the effectiveness of a code-agnostic representation approach on the task of long-term microvascular complication prediction for individuals living with Type 2 Diabetes. Our method encodes individual EHRs as text using fine-tuned, pretrained clinical language models. Leveraging large-scale EHR data from the UK, we employ a multi-label approach to simultaneously predict the risk of microvascular complications across 1-, 5-, and 10-year windows. We demonstrate that a code-agnostic approach outperforms a code-based model and illustrate that performance is better with longer prediction windows but is biased to the first occurring complication. Overall, we highlight that context length is vitally important for model performance. This study highlights the possibility of including data from across different clinical ontologies and is a starting point for generalisable clinical models.
HCNov 28, 2019
Words of Estimative Correlation: Studying Verbalizations of ScatterplotsRafael Henkin, Cagatay Turkay
Natural language and visualization are being increasingly deployed together for supporting data analysis in different ways, from multimodal interaction to enriched data summaries and insights. Yet, researchers still lack systematic knowledge on how viewers verbalize their interpretations of visualizations, and how they interpret verbalizations of visualizations in such contexts. We describe two studies aimed at identifying characteristics of data and charts that are relevant in such tasks. The first study asks participants to verbalize what they see in scatterplots that depict various levels of correlations. The second study then asks participants to choose visualizations that match a given verbal description of correlation. We extract key concepts from responses, organize them in a taxonomy and analyze the categorized responses. We observe that participants use a wide range of vocabulary across all scatterplots, but particular concepts are preferred for higher levels of correlation. A comparison between the studies reveals the ambiguity of some of the concepts. We discuss how the results could inform the design of multimodal representations aligned with the data and analytical tasks, and present a research roadmap to deepen the understanding about visualizations and natural language.