Nguyen Phan


2 Papers

CL · Apr 27, 2023
ViMQ: A Vietnamese Medical Question Dataset for Healthcare Dialogue System Development

Ta Duc Huy, Nguyen Anh Tu, Tran Hoang Vu et al.

Existing medical text datasets usually take the form of question-and-answer pairs that support the task of natural language generation, but they lack composite annotations of medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for both tasks are in the medical domain and can facilitate the development of task-oriented healthcare chatbots with better comprehension of patients' queries. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substantially improves performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ

CG · Apr 15
Interactive Exploration of Large-scale Streamlines of Vector Fields via a Curve Segment Neighborhood Graph

Nguyen Phan, Brian Kim, Adeel Zafar et al.

Streamlines have been widely used to represent and analyze various steady vector fields. To sufficiently represent important features in complex vector fields (such as flow fields), a large number of streamlines are required. Due to the lack of a rigorous definition of features or patterns in streamlines, user interaction and exploration are required to achieve effective interpretation. Existing approaches based on clustering or pattern search, while valuable for specific analysis tasks, often face challenges in supporting interactive and level-of-detail exploration of large-scale curve-based data, particularly when real-time parameter adjustment and iterative refinement are needed. To address this, we design and implement an interactive web-based system. Our system utilizes a Curve Segment Neighborhood Graph (CSNG) to encode the neighboring relationships between curve segments. CSNG enables us to adapt a fast community detection algorithm to identify coherent flow structures and spatial groupings in the streamlines interactively. CSNG also supports multi-level exploration through an enhanced force-directed layout. Furthermore, our system integrates an adjacency matrix representation to reveal detailed inter-relations among segments. To achieve real-time performance within a web browser, our system employs matrix compression for memory-efficient CSNG storage, together with parallel processing. We have applied our system to analyze and interpret complex patterns in several streamline datasets. Our experiments show that we achieve real-time performance on datasets with hundreds of thousands of segments.
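The abstract does not say how CSNG neighbors are defined, which compression scheme is used, or which community detection algorithm is adapted, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that streamlines are cut into fixed-length segments, that two segments are neighbors when any of their sample points lie within a radius of each other, that the graph is compressed in CSR (compressed sparse row) form, and it uses plain label propagation as a stand-in for the paper's community detection step. All function names (`split_into_segments`, `build_neighborhood`, `to_csr`, `label_propagation`) are hypothetical.

```python
import math
from collections import Counter


def split_into_segments(streamlines, seg_len):
    """Cut each streamline (a list of 3D points) into consecutive segments of
    up to seg_len points; adjacent segments share one endpoint.
    Assumes seg_len >= 2. Returns the segments and, for each segment,
    the index of the streamline it came from."""
    segments, owner = [], []
    for sid, line in enumerate(streamlines):
        for start in range(0, len(line) - 1, seg_len - 1):
            segments.append(line[start:start + seg_len])
            owner.append(sid)
    return segments, owner


def segment_distance(a, b):
    """Minimum Euclidean distance between the sample points of two segments."""
    return min(math.dist(p, q) for p in a for q in b)


def build_neighborhood(segments, radius):
    """Brute-force O(n^2) neighbor search (assumed criterion): two segments
    are linked when any of their sample points lie within `radius`."""
    adj = [[] for _ in segments]
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if segment_distance(segments[i], segments[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def to_csr(adj):
    """Compress adjacency lists into CSR arrays: the neighbors of v are
    indices[indptr[v]:indptr[v + 1]]."""
    indptr, indices = [0], []
    for nbrs in adj:
        indices.extend(sorted(nbrs))
        indptr.append(len(indices))
    return indptr, indices


def label_propagation(indptr, indices, max_iter=20):
    """Asynchronous label propagation (stand-in community detection): each
    segment adopts the most frequent label among its neighbors, breaking
    ties by the smallest label, until no label changes."""
    n = len(indptr) - 1
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for v in range(n):
            nbrs = indices[indptr[v]:indptr[v + 1]]
            if not nbrs:
                continue  # an isolated segment keeps its own label
            counts = Counter(labels[u] for u in nbrs)
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


# Usage: two bundles of parallel streamlines, about 10 units apart.
streamlines = [[(x, y, 0.0) for x in range(5)] for y in (0.0, 0.1, 10.0, 10.1)]
segments, owner = split_into_segments(streamlines, seg_len=3)
indptr, indices = to_csr(build_neighborhood(segments, radius=0.5))
labels = label_propagation(indptr, indices)
print(labels)  # [1, 1, 1, 1, 5, 5, 5, 5]
```

The CSR arrays store only the nonzero entries of the adjacency matrix, so memory grows with the number of segments plus edges rather than quadratically, which is the kind of saving a browser-based tool would need. In a real system the brute-force neighbor search would likely be replaced by a spatial index, and label propagation by whichever community detection algorithm the authors actually adapt.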