CL · Nov 7, 2023
Evaluating multiple large language models in pediatric ophthalmology
Jason Holmes, Rui Peng, Yiwei Li et al.
IMPORTANCE The response effectiveness of different large language models (LLMs) and various individuals, including medical students, graduate students, and practicing physicians, in pediatric ophthalmology consultations has not yet been clearly established. OBJECTIVE To design a 100-question exam based on pediatric ophthalmology, evaluate the performance of LLMs in highly specialized scenarios, and compare it with the performance of medical students and physicians at different levels. DESIGN, SETTING, AND PARTICIPANTS In this survey study, three LLMs, namely ChatGPT (GPT-3.5), GPT-4, and PaLM2, were assessed alongside three human cohorts (medical students, postgraduate students, and attending physicians) on their ability to answer questions related to pediatric ophthalmology. The study was conducted by administering questionnaires in the form of test papers through the LLM network interface, with the participation of volunteers. MAIN OUTCOMES AND MEASURES Mean scores of LLMs and humans on 100 multiple-choice questions, as well as the answer stability, correlation, and response confidence of each LLM. RESULTS GPT-4 performed comparably to attending physicians, while ChatGPT (GPT-3.5) and PaLM2 outperformed medical students but slightly trailed postgraduate students. Furthermore, GPT-4 exhibited greater stability and confidence when responding to inquiries than ChatGPT (GPT-3.5) and PaLM2. CONCLUSIONS AND RELEVANCE Our results underscore the potential for LLMs to provide medical assistance in pediatric ophthalmology and suggest significant capacity to guide the education of medical students.
CL · Nov 7, 2023
Evaluating Large Language Models in Ophthalmology
Jason Holmes, Shuyuan Ye, Yiwei Li et al.
Purpose: The performance of three different large language models (LLMs) (GPT-3.5, GPT-4, and PaLM2) in answering ophthalmology professional questions was evaluated and compared with that of three different professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to three different LLMs (GPT-3.5, GPT-4, and PaLM2) and to participants at three different professional levels (medical undergraduates, medical masters, and attending physicians). The performance of the LLMs was comprehensively evaluated and compared with that of the human groups in terms of average score, stability, and confidence. Results: Each LLM outperformed undergraduates in general, with GPT-3.5 and PaLM2 slightly below the master's level, while GPT-4 showed a level comparable to that of attending physicians. In addition, GPT-4 showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2. Conclusion: Our study shows that LLMs, as represented by GPT-4, perform well in the field of ophthalmology. With further improvements, LLMs will bring unexpected benefits in medical education and clinical decision making in the near future.
CV · Jan 22, 2024
Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNE
Hong Wei, James Xiao, Yichao Zhang et al.
Image segmentation is a clustering task whereby each pixel is assigned a cluster label. Remote sensing data usually consists of multiple bands of spectral images in which there exist semantically meaningful land cover subregions, co-registered with other source data such as LIDAR (LIght Detection And Ranging) data, where available. This suggests that, in order to account for spatial correlation between pixels, a feature vector associated with each pixel may be a vectorized tensor representing the multiple bands and a local patch as appropriate. Similarly, multiple types of texture features based on a pixel's local patch would also be beneficial for encoding local statistical information and spatial variations, without the need to label a large amount of ground truth pixel-wise and then train a supervised model, which is sometimes impractical. In this work, a new semi-supervised segmentation approach is proposed that requires labelling only a small number of pixels. Initially, over all pixels, an image data matrix is created in a high dimensional feature space. Then, t-SNE projects the high dimensional data onto a 3D embedding. By using radial basis functions as input features, which use the labelled data samples as centres, to pair with the output class labels, a modified canonical correlation analysis algorithm, referred to as RBF-CCA, is introduced, which learns the associated projection matrix from the small labelled data set. The associated canonical variables, obtained for the full image, are then clustered by the k-means algorithm. The proposed semi-supervised RBF-CCA algorithm has been implemented on several remotely sensed multispectral images, demonstrating excellent segmentation results.
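The pipeline described in this abstract (RBF features centred on labelled samples, CCA against class labels, then k-means on the canonical variables) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything specific here is an assumption: synthetic 3D points stand in for the t-SNE embedding of pixel features, the CCA is a standard regularised linear CCA rather than the paper's modified variant, and the function names, `gamma`, `reg`, and the choice to initialise k-means from the labelled class means are all hypothetical.

```python
import numpy as np

def rbf_features(X, centres, gamma):
    """Gaussian RBF activations of each row of X with respect to each centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_cca_projection(Phi, Y, n_components, reg=1e-3):
    """Regularised linear CCA between Phi and Y (assumed stand-in for RBF-CCA).

    Returns the column means of Phi and the projection matrix for Phi.
    """
    mu = Phi.mean(axis=0)
    P = Phi - mu
    Q = Y - Y.mean(axis=0)
    n = P.shape[0]
    Cpp = P.T @ P / n + reg * np.eye(P.shape[1])
    Cqq = Q.T @ Q / n + reg * np.eye(Q.shape[1])
    Cpq = P.T @ Q / n
    # Whiten both views with Cholesky factors, then take the SVD.
    Lp = np.linalg.cholesky(Cpp)
    Lq = np.linalg.cholesky(Cqq)
    M = np.linalg.solve(Lp, Cpq) @ np.linalg.inv(Lq).T
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    W = np.linalg.solve(Lp.T, U[:, :n_components])
    return mu, W

def kmeans(Z, init_centroids, n_iter=50):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = init_centroids.copy()
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = Z[labels == k].mean(axis=0)
    return labels

# Synthetic stand-in for a 3D t-SNE embedding: three well-separated blobs.
rng = np.random.default_rng(0)
blob_centres = np.array([[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
true = np.repeat(np.arange(3), 100)
X = blob_centres[true] + 0.5 * rng.standard_normal((300, 3))

# Label only five points per class; these also serve as the RBF centres.
labelled = np.concatenate([np.flatnonzero(true == k)[:5] for k in range(3)])
Y = np.eye(3)[true[labelled]]  # one-hot class labels

gamma = 0.1  # hypothetical kernel width
mu, W = fit_cca_projection(rbf_features(X[labelled], X[labelled], gamma), Y, 2)

# Canonical variables for every point, then k-means on them.
Z = (rbf_features(X, X[labelled], gamma) - mu) @ W
init = np.stack([Z[labelled][true[labelled] == k].mean(axis=0) for k in range(3)])
pred = kmeans(Z, init)

accuracy = float((pred == true).mean())
print(f"agreement with ground truth: {accuracy:.2f}")
```

Because the k-means centroids start from the labelled class means, cluster index k lines up with class k, which makes the agreement score directly readable. With a random initialisation, the clusters would need to be matched to classes before scoring.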
IR · Feb 28, 2020
NewsStand CoronaViz: A Map Query Interface for Spatio-Temporal and Spatio-Textual Monitoring of Disease Spread
John Kastner, Hanan Samet, Hong Wei
With the rapid continuing spread of COVID-19, it is clearly important to be able to track the progress of the virus over time in order to be better prepared to anticipate its emergence and spread in new regions, as well as declines in its presence in regions, thereby leading to or justifying "reopening" decisions. There are many applications and web sites that monitor officially released numbers of cases, which are likely to be the most accurate methods for tracking the progress of the virus; however, they will not necessarily paint a complete picture. To begin filling any gaps in official reports, we have developed the NewsStand CoronaViz web application (https://coronaviz.umiacs.io), which runs on desktops and mobile devices and allows users to explore the geographic spread of discussions about the virus through analysis of keyword prevalence in geotagged news articles and tweets, in relation to the real spread of the virus as measured by confirmed case numbers reported by the appropriate authorities. NewsStand CoronaViz users have access to dynamic variants of the disease-related variables corresponding to the numbers of confirmed cases, active cases, deaths, and recoveries (where they are provided) via a map query interface. The interface can step forward and backward in time using a variety of temporal window sizes (day, week, month, or combinations thereof), together with user-defined varying spatial window sizes specified by direct manipulation actions (e.g., pan, zoom, and hover) or textually (e.g., by the name of the containing country, state or province, or county, or by textually-specified spatially-adjacent combinations thereof), and finally by the amount of spatio-temporally-varying news and tweet volume involving COVID-19.
CL · Dec 19, 2018
Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks
Raphael Tang, Gefei Yang, Hong Wei et al.
Voice-enabled commercial products are ubiquitous, typically enabled by lightweight on-device keyword spotting (KWS) and full automatic speech recognition (ASR) in the cloud. ASR systems require significant computational resources in training and for inference, not to mention copious amounts of annotated speech data. KWS systems, on the other hand, are less resource-intensive but have limited capabilities. On the Comcast Xfinity X1 entertainment platform, we explore a middle ground between ASR and KWS: We introduce a novel, resource-efficient neural network for voice query recognition that is much more accurate than state-of-the-art CNNs for KWS, yet can be easily trained and deployed with limited resources. On an evaluation dataset representing the top 200 voice queries, we achieve a low false alarm rate of 1% and a query error rate of 6%. Our model performs inference 8.24x faster than the current ASR system.