LGMar 24, 2023
How Does Attention Work in Vision Transformers? A Visual Analytics AttemptYiran Li, Junpeng Wang, Xin Dai et al.
Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attentions are then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which one is more important? How strong are individual patches attending to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify what heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. Examining the attention strengths and patterns of the important heads, we answer why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution that deepens the understanding of ViTs from head importance, head attention strength, and head attention pattern.
SIAug 10, 2024
A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor AugmentationYiran Li, Gongyao Guo, Jieming Shi et al. · utoronto
Attributed networks containing entity-specific information in node attributes are ubiquitous in modeling social networks, e-commerce, bioinformatics, etc. Their inherent network topology ranges from simple graphs to hypergraphs with high-order interactions and multiplex graphs with separate layers. An important graph mining task is node clustering, aiming to partition the nodes of an attributed network into k disjoint clusters such that intra-cluster nodes are closely connected and share similar attributes, while inter-cluster nodes are far apart and dissimilar. It is highly challenging to capture multi-hop connections via nodes or attributes for effective clustering on multiple types of attributed networks. In this paper, we first present AHCKA as an efficient approach to attributed hypergraph clustering (AHC). AHCKA includes a carefully-crafted K-nearest neighbor augmentation strategy for the optimized exploitation of attribute information on hypergraphs, a joint hypergraph random walk model to devise an effective AHC objective, and an efficient solver with speedup techniques for the objective optimization. The proposed techniques are extensible to various types of attributed networks, and thus, we develop ANCKA as a versatile attributed network clustering framework, capable of attributed graph clustering (AGC), attributed multiplex graph clustering (AMGC), and AHC. Moreover, we devise ANCKA with algorithmic designs tailored for GPU acceleration to boost efficiency. We have conducted extensive experiments to compare our methods with 19 competitors on 8 attributed hypergraphs, 16 competitors on 6 attributed graphs, and 16 competitors on 3 attributed multiplex graphs, all demonstrating the superb clustering quality and efficiency of our methods.
CVNov 2, 2023
Visual Analytics for Efficient Image Exploration and User-Guided Image CaptioningYiran Li, Junpeng Wang, Prince Aboagye et al.
Recent advancements in pre-trained large-scale language-image models have ushered in a new era of visual comprehension, offering a significant leap forward. These breakthroughs have proven particularly instrumental in addressing long-standing challenges that were previously daunting. Leveraging these innovative techniques, this paper tackles two well-known issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of potential data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions automatically generated from language-image models for an image dataset, we gain deeper insights into the semantic underpinnings of the visual contents, unearthing data biases that may be entrenched within the dataset. On the other hand, by depicting the association between visual contents and textual captions, we expose the weaknesses of pre-trained language-image models in their captioning capability and propose an interactive interface to steer caption generation. The two parts have been coalesced into a coordinated visual analytics system, fostering mutual enrichment of visual and textual elements. We validate the effectiveness of the system with domain practitioners through concrete case studies with large-scale image datasets.
IVJun 13, 2022
Acceleration of cerebral blood flow and arterial transit time maps estimation from multiple post-labeling delay arterial spin-labeled MRI via deep learningYiran Li, Ze Wang
Purpose: Arterial spin labeling (ASL) perfusion imaging indicates direct and absolute measurement of cerebral blood flow (CBF). Arterial transit time (ATT) is a related physiological parameter reflecting the duration for the labeled spins to reach the brain region of interest. Multiple post-labeling delay (PLDs) can provide robust measures of both CBF and ATT, allowing for optimization of regional CBF modeling based on ATT. The prolonged acquisition time can potentially reduce the quality and accuracy of the CBF and ATT estimation. We proposed a novel network to significantly reduce the number of PLDs with higher signal-to-noise ratio (SNR). Method: CBF and ATT estimations were performed for one PLD and two PLDs sepa-rately. Each model was trained independently to learn the nonlinear transformation from perfusion weighted image (PWI) to CBF and ATT images. Results: Both one-PLD and two-PLD models outperformed the conventional method visually on CBF and two-PLD model showed more accurate structure on ATT estima-tion. The proposed method significantly reduces the number of PLDs from 6 to 2 on ATT and even to single PLD on CBF without sacrificing the SNR. Conclusion: It is feasible to generate CBF and ATT maps with reduced PLDs using deep learning with high quality.
CVMar 6, 2023
Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural NetworksYiran Li, Junpeng Wang, Takanori Fujiwara et al.
Adversarial attacks on a convolutional neural network (CNN) -- injecting human-imperceptible perturbations into an input image -- could fool a high-performance CNN into making incorrect predictions. The success of adversarial attacks raises serious concerns about the robustness of CNNs, and prevents them from being used in safety-critical applications, such as medical diagnosis and autonomous driving. Our work introduces a visual analytics approach to understanding adversarial attacks by answering two questions: (1) which neurons are more vulnerable to attacks and (2) which image features do these vulnerable neurons capture during the prediction? For the first question, we introduce multiple perturbation-based measures to break down the attacking magnitude into individual CNN neurons and rank the neurons by their vulnerability levels. For the second, we identify image features (e.g., cat ears) that highly stimulate a user-selected neuron to augment and validate the neuron's responsibility. Furthermore, we support an interactive exploration of a large number of neurons by aiding with hierarchical clustering based on the neurons' roles in the prediction. To this end, a visual analytics system is designed to incorporate visual reasoning for interpreting adversarial attacks. We validate the effectiveness of our system through multiple case studies as well as feedback from domain experts.
CLApr 14
AlphaEval: Evaluating Agents in ProductionPengrui Lu, Bingyu Xu, Wenjun Zhang et al.
The rapid deployment of AI agents in commercial settings has outpaced the development of evaluation methodologies that reflect production realities. Existing benchmarks measure agent capabilities through retrospectively curated tasks with well-specified requirements and deterministic metrics -- conditions that diverge fundamentally from production environments where requirements contain implicit constraints, inputs are heterogeneous multi-modal documents with information fragmented across sources, tasks demand undeclared domain expertise, outputs are long-horizon professional deliverables, and success is judged by domain experts whose standards evolve over time. We present AlphaEval, a production-grounded benchmark of 94 tasks sourced from seven companies deploying AI agents in their core business, spanning six O*NET (Occupational Information Network) domains. Unlike model-centric benchmarks, AlphaEval evaluates complete agent products -- Claude Code, Codex, etc. -- as commercial systems, capturing performance variations invisible to model-level evaluation. Our evaluation framework covers multiple paradigms (LLM-as-a-Judge, reference-driven metrics, formal verification, rubric-based assessment, automated UI testing, etc.), with individual domains composing multiple paradigms. Beyond the benchmark itself, we contribute a requirement-to-benchmark construction framework -- a systematic methodology that transforms authentic production requirements into executable evaluation tasks in minimal time. This framework standardizes the entire pipeline from requirement to evaluation, providing a reproducible, modular process that any organization can adopt to construct production-grounded benchmarks for their own domains.
LGMar 1, 2022
A Domain-Theoretic Framework for Robustness Analysis of Neural NetworksCan Zhou, Razin A. Shaikh, Yiran Li et al.
A domain-theoretic framework is presented for validated robustness analysis of neural networks. First, global robustness of a general class of networks is analyzed. Then, using the fact that Edalat's domain-theoretic L-derivative coincides with Clarke's generalized gradient, the framework is extended for attack-agnostic local robustness analysis. The proposed framework is ideal for designing algorithms which are correct by construction. This claim is exemplified by developing a validated algorithm for estimation of Lipschitz constant of feedforward regressors. The completeness of the algorithm is proved over differentiable networks, and also over general position ReLU networks. Computability results are obtained within the framework of effectively given domains. Using the proposed domain model, differentiable and non-differentiable networks can be analyzed uniformly. The validated algorithm is implemented using arbitrary-precision interval arithmetic, and the results of some experiments are presented. The software implementation is truly validated, as it handles floating-point errors as well.
AIMar 27
Xpertbench: Expert Level Tasks with Rubrics-Based EvaluationXue Liu, Xin Ma, Yuxin Ma et al.
As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks suffer from narrow domain coverage, reliance on generalist tasks, or self-evaluation biases. To bridge this gap, we present XpertBench, a high-fidelity benchmark engineered to assess LLMs across authentic professional domains. XpertBench consists of 1,346 meticulously curated tasks across 80 categories, spanning finance, healthcare, legal services, education, and dual-track research (STEM and Humanities). These tasks are derived from over 1,000 submissions by domain experts--including researchers from elite institutions and practitioners with extensive clinical or industrial experience--ensuring superior ecological validity. Each task uses detailed rubrics with mostly 15-40 weighted checkpoints to assess professional rigor. To facilitate scalable yet human-aligned assessment, we introduce ShotJudge, a novel evaluation paradigm that employs LLM judges calibrated with expert few-shot exemplars to mitigate self-rewarding biases. Our empirical evaluation of state-of-the-art LLMs reveals a pronounced performance ceiling: even leading models achieve a peak success rate of only ~66%, with a mean score around 55%. Models also exhibit domain-specific divergence, showing non-overlapping strengths in quantitative reasoning versus linguistic synthesis.. These findings underscore a significant "expert-gap" in current AI systems and establish XpertBench as a critical instrument for navigating the transition from general-purpose assistants to specialized professional collaborators.
CVSep 9, 2024
Memoryless Multimodal Anomaly Detection via Student-Teacher Network and Signed Distance LearningZhongbin Sun, Xiaolong Li, Yiran Li et al.
Unsupervised anomaly detection is a challenging computer vision task, in which 2D-based anomaly detection methods have been extensively studied. However, multimodal anomaly detection based on RGB images and 3D point clouds requires further investigation. The existing methods are mainly inspired by memory bank based methods commonly used in 2D-based anomaly detection, which may cost extra memory for storing mutimodal features. In present study, a novel memoryless method MDSS is proposed for multimodal anomaly detection, which employs a light-weighted student-teacher network and a signed distance function to learn from RGB images and 3D point clouds respectively, and complements the anomaly information from the two modalities. Specifically, a student-teacher network is trained with normal RGB images and masks generated from point clouds by a dynamic loss, and the anomaly score map could be obtained from the discrepancy between the output of student and teacher. Furthermore, the signed distance function learns from normal point clouds to predict the signed distances between points and surface, and the obtained signed distances are used to generate anomaly score map. Subsequently, the anomaly score maps are aligned to generate the final anomaly score map for detection. The experimental results indicate that MDSS is comparable but more stable than the SOTA memory bank based method Shape-guided, and furthermore performs better than other baseline methods.
SIMay 4
Quantum Hypergraph PartitioningYiran Li, Y. Batuhan Yilmaz, Michael Silver et al.
Hypergraph partitioning is a fundamental optimization problem with applications in data management and other domains involving higher-order relations. In this paper, we study balanced hypergraph partitioning from the perspective of quantum optimization. We formalize balanced $k$-way hypergraph partitioning with general hyperedge cut functions, and derive corresponding binary optimization formulations targeted at quantum optimization methods in both the two-way and multi-way settings. Our discussion highlights which cut functions admit Quadratic Unconstrained Binary Optimization (QUBO) encodings and which instead lead to higher-order binary objectives or rational forms. As a preliminary empirical validation, we focus on balanced two-way partitioning with the all-or-nothing cut on 3-uniform hypergraphs, where a direct QUBO is available, and evaluate simulated Quantum Approximate Optimization Algorithm (QAOA) and Simulated Annealing (SA) on small instances against exact solutions. The results show that the formulation is effective on small hypergraphs and that the balance-penalty weight plays a critical role in trading off cut quality and balance.
CVApr 21
LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and ResultsXiang Chen, Hao Li, Jiangxin Dong et al.
This paper presents a review for the LoViF Challenge on Real-World All-in-One Image Restoration. The challenge aimed to advance research on real-world all-in-one image restoration under diverse real-world degradation conditions, including blur, low-light, haze, rain, and snow. It provided a unified benchmark to evaluate the robustness and generalization ability of restoration models across multiple degradation categories within a common framework. The competition attracted 124 registered participants and received 9 valid final submissions with corresponding fact sheets, significantly contributing to the progress of real-world all-in-one image restoration. This report provides a detailed analysis of the submitted methods and corresponding results, emphasizing recent progress in unified real-world image restoration. The analysis highlights effective approaches and establishes a benchmark for future research in real-world low-level vision.
LGDec 12, 2024
RingFormer: A Ring-Enhanced Graph Transformer for Organic Solar Cell Property PredictionZhihao Ding, Ting Zhang, Yiran Li et al. · utoronto
Organic Solar Cells (OSCs) are a promising technology for sustainable energy production. However, the identification of molecules with desired OSC properties typically involves laborious experimental research. To accelerate progress in the field, it is crucial to develop machine learning models capable of accurately predicting the properties of OSC molecules. While graph representation learning has demonstrated success in molecular property prediction, it remains underexplored for OSC-specific tasks. Existing methods fail to capture the unique structural features of OSC molecules, particularly the intricate ring systems that critically influence OSC properties, leading to suboptimal performance. To fill the gap, we present RingFormer, a novel graph transformer framework specially designed to capture both atom and ring level structural patterns in OSC molecules. RingFormer constructs a hierarchical graph that integrates atomic and ring structures and employs a combination of local message passing and global attention mechanisms to generate expressive graph representations for accurate OSC property prediction. We evaluate RingFormer's effectiveness on five curated OSC molecule datasets through extensive experiments. The results demonstrate that RingFormer consistently outperforms existing methods, achieving a 22.77% relative improvement over the nearest competitor on the CEPDB dataset.
IVMar 5
ICHOR: A Robust Representation Learning Approach for ASL CBF Maps with Self-Supervised Masked AutoencodersXavier Beltran-Urbano, Yiran Li, Xinglin Zeng et al.
Arterial spin labeling (ASL) perfusion MRI allows direct quantification of regional cerebral blood flow (CBF) without exogenous contrast, enabling noninvasive measurements that can be repeated without constraints imposed by contrast injection. ASL is increasingly acquired in research studies and clinical MRI protocols. Building on successes in structural imaging, recent efforts have implemented deep learning based methods to improve image quality, enable automated quality control, and derive robust quantitative and predictive biomarkers with ASL derived CBF. However, progress has been limited by variable image quality, substantial inter-site, vendor and protocol differences, and limited availability of labeled datasets needed to train models that generalize across cohorts. To address these challenges, we introduce ICHOR, a self supervised pre-training approach for ASL CBF maps that learns transferable representations using 3D masked autoencoders. ICHOR is pretrained via masked image modeling using a Vision Transformer backbone and can be used as a general-purpose encoder for downstream ASL tasks. For pre-training, we curated one of the largest ASL datasets to date, comprising 11,405 ASL CBF scans from 14 studies spanning multiple sites and acquisition protocols. We evaluated the pre-trained ICHOR encoder on three downstream diagnostic classification tasks and one ASL CBF map quality prediction regression task. Across all evaluations, ICHOR outperformed existing neuroimaging self-supervised pre-training methods adapted to ASL. Pre-trained weights and code will be made publicly available.
CVAug 11, 2025
Information Bottleneck-based Causal Attention for Multi-label Medical Image RecognitionXiaoxiao Cui, Yiran Li, Kai He et al.
Multi-label classification (MLC) of medical images aims to identify multiple diseases and holds significant clinical potential. A critical step is to learn class-specific features for accurate diagnosis and improved interpretability effectively. However, current works focus primarily on causal attention to learn class-specific features, yet they struggle to interpret the true cause due to the inadvertent attention to class-irrelevant features. To address this challenge, we propose a new structural causal model (SCM) that treats class-specific attention as a mixture of causal, spurious, and noisy factors, and a novel Information Bottleneck-based Causal Attention (IBCA) that is capable of learning the discriminative class-specific attention for MLC of medical images. Specifically, we propose learning Gaussian mixture multi-label spatial attention to filter out class-irrelevant information and capture each class-specific attention pattern. Then a contrastive enhancement-based causal intervention is proposed to gradually mitigate the spurious attention and reduce noise information by aligning multi-head attention with the Gaussian mixture multi-label spatial. Quantitative and ablation results on Endo and MuReD show that IBCA outperforms all methods. Compared to the second-best results for each metric, IBCA achieves improvements of 6.35\% in CR, 7.72\% in OR, and 5.02\% in mAP for MuReD, 1.47\% in CR, and 1.65\% in CF1, and 1.42\% in mAP for Endo.
LGJun 6, 2024
GNNAnatomy: Rethinking Model-Level Explanations for Graph Neural NetworksHsiao-Ying Lu, Yiran Li, Ujwal Pratap Krishna Kaluvakolanu Thyagarajan et al.
Graph Neural Networks (GNNs) achieve outstanding performance across graph-based tasks but remain difficult to interpret. In this paper, we revisit foundational assumptions underlying model-level explanation methods for GNNs, namely: (1) maximizing classification confidence yields representative explanations, (2) a single explanation suffices for an entire class of graphs, and (3) explanations are inherently trustworthy. We identify pitfalls resulting from these assumptions: methods that optimize for classification confidence may overlook partially learned patterns; topological diversity across graph subsets within the same class is often underrepresented; and explanations alone offer limited support for building user trust when applied to new datasets or models. This paper introduces GNNAnatomy, a distillation-based method designed to generate explanations while avoiding these pitfalls. GNNAnatomy first characterizes graph topology using graphlets, a set of fundamental substructures. We then train a transparent multilayer perceptron surrogate to directly approximate GNN predictions based on the graphlet representations. By analyzing the weights assigned to each graphlet, we identify the most discriminative topologies, which serve as GNN explanations. To account for structural diversity within a class, GNNAnatomy generates explanations at the required granularity through an interface that supports human-AI teaming. This interface helps users identify subsets of graphs where distinct critical substructures drive class differentiation, enabling multi-grained explanations. Additionally, by enabling exploration and linking explanations back to input graphs, the interface fosters greater transparency and trust. We evaluate GNNAnatomy on both synthetic and real-world datasets through quantitative metrics and qualitative comparisons with state-of-the-art model-level explainable GNN methods.
SIJan 8, 2024
A Visual Analytics Design for Connecting Healthcare Team Communication to Patient OutcomesHsiao-Ying Lu, Yiran Li, Kwan-Liu Ma
Communication among healthcare professionals (HCPs) is crucial for the quality of patient treatment. Surrounding each patient's treatment, communication among HCPs can be examined as temporal networks, constructed from Electronic Health Record (EHR) access logs. This paper introduces a visual analytics system designed to study the effectiveness and efficiency of temporal communication networks mediated by the EHR system. We present a method that associates network measures with patient survival outcomes and devises effectiveness metrics based on these associations. To analyze communication efficiency, we extract the latencies and frequencies of EHR accesses. Our visual analytics system is designed to assist in inspecting and understanding the composed communication effectiveness metrics and to enable the exploration of communication efficiency by encoding latencies and frequencies in an information flow diagram. We demonstrate and evaluate our system through multiple case studies and an expert review.
LGMay 23, 2023
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient InferenceRuiqi Sun, Siwei Ye, Jie Zhao et al.
The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power consumption, especially when the hardware processor needs to support and execute different neural networks. In this study, we introduce NeuralMatrix, which elastically transforms the computations of entire DNNs into linear matrix operations. This transformation allows seamless execution of various DNN models all with matrix operations and paves the way for running versatile DNN models with a single General Matrix Multiplication (GEMM) accelerator.Extensive experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models, achieving 2.17-38.72 times computation efficiency (i.e., throughput per power) compared to CPUs, GPUs, and SoC platforms. This level of efficiency is usually only attainable with the accelerator designed for a specific neural network.
HCAug 28, 2021
A Visual Analytics System for Water Distribution System OptimizationYiran Li, Erin Musabandesu, Takanori Fujiwara et al.
The optimization of water distribution systems (WDSs) is vital to minimize energy costs required for their operations. A principal approach taken by researchers is identifying an optimal scheme for water pump controls through examining computational simulations of WDSs. However, due to a large number of possible control combinations and the complexity of WDS simulations, it remains non-trivial to identify the best pump controls by reviewing the simulation results. To address this problem, we design a visual analytics system that helps understand relationships between simulation inputs and outputs towards better optimization. Our system incorporates interpretable machine learning as well as multiple linked visualizations to capture essential input-output relationships from complex WDS simulations. We demonstrate our system's effectiveness through a practical case study and evaluate its usability through expert reviews. Our results show that our system can lessen the burden of analysis and assist in determining optimal operating schemes.
HCMar 6, 2021
ChartStory: Automated Partitioning, Layout, and Captioning of Charts into Comic-Style NarrativesJian Zhao, Shenyu Xu, Senthil Chandrasegaran et al.
Visual data storytelling is gaining importance as a means of presenting data-driven information or analysis results, especially to the general public. This has resulted in design principles being proposed for data-driven storytelling, and new authoring tools being created to aid such storytelling. However, data analysts typically lack sufficient background in design and storytelling to make effective use of these principles and authoring tools. To assist this process, we present ChartStory for crafting data stories from a collection of user-created charts, using a style akin to comic panels to imply the underlying sequence and logic of data-driven narratives. Our approach is to operationalize established design principles into an advanced pipeline which characterizes charts by their properties and similarity, and recommends ways to partition, layout, and caption story pieces to serve a narrative. ChartStory also augments this pipeline with intuitive user interactions for visual refinement of generated data comics. We extensively and holistically evaluate ChartStory via a trio of studies. We first assess how the tool supports data comic creation in comparison to a manual baseline tool. Data comics from this study are subsequently compared and evaluated to ChartStory's automated recommendations by a team of narrative visualization practitioners. This is followed by a pair of interview studies with data scientists using their own datasets and charts who provide an additional assessment of the system. We find that ChartStory provides cogent recommendations for narrative generation, resulting in data comics that compare favorably to manually-created ones.
CVMar 1, 2021
Maximal function pooling with applicationsWojciech Czaja, Weilin Li, Yiran Li et al.
Inspired by the Hardy-Littlewood maximal function, we propose a novel pooling strategy which is called maxfun pooling. It is presented both as a viable alternative to some of the most popular pooling functions, such as max pooling and average pooling, and as a way of interpolating between these two algorithms. We demonstrate the features of maxfun pooling with two applications: first in the context of convolutional sparse coding, and then for image classification.
IVMay 15, 2020
A Learning-from-noise Dilated Wide Activation Network for denoising Arterial Spin Labeling (ASL) Perfusion ImagesDanfeng Xie, Yiran Li, Hanlu Yang et al.
Arterial spin labeling (ASL) perfusion MRI provides a non-invasive way to quantify cerebral blood flow (CBF) but it still suffers from a low signal-to-noise-ratio (SNR). Using deep machine learning (DL), several groups have shown encouraging denoising results. Interestingly, the improvement was obtained when the deep neural network was trained using noise-contaminated surrogate reference because of the lack of golden standard high quality ASL CBF images. More strikingly, the output of these DL ASL networks (ASLDN) showed even higher SNR than the surrogate reference. This phenomenon indicates a learning-from-noise capability of deep networks for ASL CBF image denoising, which can be further enhanced by network optimization. In this study, we proposed a new ASLDN to test whether similar or even better ASL CBF image quality can be achieved in the case of highly noisy training reference. Different experiments were performed to validate the learning-from-noise hypothesis. The results showed that the learning-from-noise strategy produced better output quality than ASLDN trained with relatively high SNR reference.
CVApr 15, 2020
Learning Furniture Compatibility with Graph Neural NetworksLuisa F. Polania, Mauricio Flores, Yiran Li et al.
We propose a graph neural network (GNN) approach to the problem of predicting the stylistic compatibility of a set of furniture items from images. While most existing results are based on siamese networks which evaluate pairwise compatibility between items, the proposed GNN architecture exploits relational information among groups of items. We present two GNN models, both of which comprise a deep CNN that extracts a feature representation for each image, a gated recurrent unit (GRU) network that models interactions between the furniture items in a set, and an aggregation function that calculates the compatibility score. In the first model, a generalized contrastive loss function that promotes the generation of clustered embeddings for items belonging to the same furniture set is introduced. Also, in the first model, the edge function between nodes in the GRU and the aggregation function are fixed in order to limit model complexity and allow training on smaller datasets; in the second model, the edge function and aggregation function are learned directly from the data. We demonstrate state-of-the art accuracy for compatibility prediction and "fill in the blank" tasks on the Bonn and Singapore furniture datasets. We further introduce a new dataset, called the Target Furniture Collections dataset, which contains over 6000 furniture items that have been hand-curated by stylists to make up 1632 compatible sets. We also demonstrate superior prediction accuracy on this dataset.
MED-PHFeb 18, 2020
A Visual Analytics System for Multi-model Comparison on Clinical Data PredictionsYiran Li, Takanori Fujiwara, Yong K. Choi et al.
There is a growing trend of applying machine learning methods to medical datasets in order to predict patients' future status. Although some of these methods achieve high performance, challenges still exist in comparing and evaluating different models through their interpretable information. Such analytics can help clinicians improve evidence-based medical decision making. In this work, we develop a visual analytics system that compares multiple models' prediction criteria and evaluates their consistency. With our system, users can generate knowledge on different models' inner criteria and how confidently we can rely on each model's prediction for a certain patient. Through a case study of a publicly available clinical dataset, we demonstrate the effectiveness of our visual analytics system to assist clinicians and researchers in comparing and quantitatively evaluating different machine learning methods.
MED-PHFeb 18, 2020
Comparative Visual Analytics for Assessing Medical Records with Sequence EmbeddingRongchen Guo, Takanori Fujiwara, Yiran Li et al.
Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.