Daniel Haehn

CV
h-index: 13
19 papers
167 citations
Novelty: 34%
AI Score: 51

19 Papers

CV · Aug 30, 2023 · Code
MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

Jianning Li, Zongwei Zhou, Jiancheng Yang et al.

Prior to the deep learning era, shape was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging are predominantly diverging from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is seen from numerous shape-related publications in premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, called MedShapeNet, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes that are paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality, and 3D printing. As examples, we present use cases in the fields of classification of brain tumors, facial and skull reconstructions, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback

HC · Feb 19, 2023
AutoDOViz: Human-Centered Automation for Decision Optimization

Daniel Karl I. Weidele, Shazia Afzal, Abel N. Valente et al.

We present AutoDOViz, an interactive user interface for automated decision optimization (AutoDO) using reinforcement learning (RL). Decision optimization (DO) has classically been practiced by dedicated DO researchers, where experts need to spend long periods of time fine-tuning a solution through trial and error. AutoML pipeline search has sought to make it easier for a data scientist to find the best machine learning pipeline by leveraging automation to search and tune the solution. More recently, these advances have been applied to the domain of AutoDO, with a similar goal of finding the best reinforcement learning pipeline through algorithm selection and parameter tuning. However, decision optimization requires significantly more complex problem specification than an ML problem. AutoDOViz seeks to lower the barrier of entry for data scientists in problem specification for reinforcement learning problems, leverage the benefits of AutoDO algorithms for RL pipeline search, and finally, create visualizations and policy insights in order to facilitate the typical interactive nature when communicating problem formulation and solution proposals between DO experts and domain experts. In this paper, we report our findings from semi-structured expert interviews with DO practitioners as well as business consultants, leading to design requirements for human-centered automation for DO with RL. We evaluate a system implementation with data scientists and find that they are significantly more open to engaging in DO after using our proposed solution. AutoDOViz further increases trust in RL agent models and makes the automated training and evaluation process more comprehensible. As shown for other automation in ML tasks, we also conclude that automation of RL for DO can benefit from the user, and vice versa, when the interface promotes human-in-the-loop.

CV · Nov 18, 2023 · Code
Lesion Search with Self-supervised Learning

Kristin Qi, Jiali Cheng, Daniel Haehn

Content-based image retrieval (CBIR) with self-supervised learning (SSL) accelerates clinicians' interpretation of similar images without manual annotations. We develop a CBIR system based on the contrastive learning framework SimCLR and incorporate generalized-mean (GeM) pooling followed by L2 normalization to classify lesion types and retrieve similar images before clinicians' analysis. Results show improved performance. We additionally build an open-source application for image analysis and retrieval. The application is easy to integrate, relieving manual efforts and suggesting the potential to support clinicians' everyday activities.
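
The pooling step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default exponent `p=3.0`, and the plain-list data layout are assumptions chosen for illustration. GeM pooling reduces each channel's spatial activations to one value, and L2 normalization makes the dot product of two embeddings equal their cosine similarity, which is convenient for retrieval.

```python
import math


def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one channel's spatial activations.

    p = 1 gives average pooling; large p approaches max pooling.
    The default p = 3.0 is an assumed value, not taken from the paper.
    """
    clamped = [max(v, eps) for v in activations]  # keep values positive
    mean_pow = sum(v ** p for v in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def embed(channels, p=3.0):
    """Turn per-channel activation lists into a unit-length descriptor."""
    return l2_normalize([gem_pool(c, p) for c in channels])


def cosine_similarity(a, b):
    """For unit-length vectors the dot product is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

In a retrieval setting, the query descriptor would be compared against stored descriptors with `cosine_similarity`, and the highest-scoring images returned.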

CV · May 13, 2024 · Code
Boostlet.js: Image processing plugins for the web via JavaScript injection

Edward Gaibor, Shruti Varade, Rohini Deshmukh et al.

Can web-based image processing and visualization tools easily integrate into existing websites without significant time and effort? Our Boostlet.js library addresses this challenge by providing an open-source, JavaScript-based web framework to enable additional image processing functionalities. Boostlet examples include kernel filtering, image captioning, data visualization, segmentation, and web-optimized machine-learning models. To achieve this, Boostlet.js uses a browser bookmark to inject a user-friendly plugin selection tool called PowerBoost into any host website. Boostlet also provides on-site access to a standard API independent of any visualization framework for pixel data and scene manipulation. Web-based Boostlets provide a modular architecture and client-side processing capabilities to apply advanced image-processing techniques using consumer-level hardware. The code is open-source and available.

CV · Nov 15, 2024 · Code
Melanoma Detection with Uncertainty Quantification

SangHyuk Kim, Edward Gaibor, Brian Matejek et al.

Early detection of melanoma is crucial for improving survival rates. Current detection tools typically rely on data-driven machine learning methods but often overlook the full integration of multiple datasets. We combine publicly available datasets to enhance data diversity, allowing numerous experiments to train and evaluate various classifiers. We then calibrate them to minimize misdiagnoses by incorporating uncertainty quantification. Our experiments on benchmark datasets show accuracies of up to 93.2% before and 97.8% after applying uncertainty-based rejection, leading to a reduction in misdiagnoses by over 40.5%. Our code and data are publicly available, and a web-based interface for quick melanoma detection of user-supplied images is also provided.
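
The idea of uncertainty-based rejection can be sketched as follows. This is a hedged illustration, not the paper's method: the uncertainty measure (one minus the larger class probability), the threshold parameter `max_uncertainty`, and the function name are all assumptions. The sketch shows how deferring uncertain cases trades coverage for accuracy on the cases that remain.

```python
def accuracy_with_rejection(probs, labels, max_uncertainty):
    """Accuracy and coverage after rejecting uncertain binary predictions.

    probs  : predicted probability of the positive class, one per sample
    labels : ground-truth labels (0 or 1)
    Uncertainty is assumed to be 1 - max(p, 1 - p); the paper may use a
    different measure.
    """
    kept = 0
    correct = 0
    for p, y in zip(probs, labels):
        uncertainty = 1.0 - max(p, 1.0 - p)
        if uncertainty > max_uncertainty:
            continue  # reject: defer this case instead of predicting
        kept += 1
        correct += int((p >= 0.5) == bool(y))
    coverage = kept / len(labels)
    accuracy = correct / kept if kept else float("nan")
    return accuracy, coverage
```

Lowering `max_uncertainty` rejects more samples, so reported accuracy should always be read together with the coverage it was measured at.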

CV · Nov 18, 2025 · Code
MRI Plane Orientation Detection using a Context-Aware 2.5D Model

SangHyuk Kim, Daniel Haehn, Sumientra Rampersad

Humans can easily identify anatomical planes (axial, coronal, and sagittal) on a 2D MRI slice, but automated systems struggle with this task. Missing plane orientation metadata can complicate analysis, increase domain shift when merging heterogeneous datasets, and reduce accuracy of diagnostic classifiers. This study develops a classifier that accurately generates plane orientation metadata. We adopt a 2.5D context-aware model that leverages multi-slice information to avoid ambiguity from isolated slices and enable robust feature learning. We train the 2.5D model on both 3D slice sequences and static 2D images. While our 2D reference model achieves 98.74% accuracy, our 2.5D method raises this to 99.49%, reducing errors by 60%, highlighting the importance of 2.5D context. We validate the utility of our generated metadata in a brain tumor detection task. A gated strategy selectively uses metadata-enhanced predictions based on uncertainty scores, boosting accuracy from 97.0% with an image-only model to 98.0%, reducing misdiagnoses by 33.3%. We integrate our plane orientation model into an interactive web application and provide it open-source.
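
The error-reduction figures in this abstract follow from the reported accuracies by simple arithmetic, sketched below. The function name is illustrative. An accuracy gain is converted into a relative reduction of the error rate: going from 98.74% to 99.49% shrinks the error from 1.26% to 0.51%, a reduction of about 59.5% (stated as 60% in the abstract), and going from 97.0% to 98.0% shrinks it from 3.0% to 2.0%, a reduction of one third.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction of the error rate, in percent.

    Both accuracies are given in percent (0 to 100).
    """
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before * 100.0


# Plane orientation: 2D reference model vs. 2.5D model
plane_gain = relative_error_reduction(98.74, 99.49)

# Tumor detection: image-only model vs. gated metadata-enhanced model
tumor_gain = relative_error_reduction(97.0, 98.0)
```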

CL · Mar 30, 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts

Youxiang Zhu, Ruochen Li, Danqing Wang et al.

Long-context large language models (LLMs) are prone to being distracted by irrelevant contexts. The reason for distraction remains poorly understood. In this paper, we first identify the contextual heads, a special group of attention heads that control the overall attention of the LLM. Then, we demonstrate that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts and can be mitigated by increasing attention to these contexts. We further identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts without explicitly specifying which context is relevant. We comprehensively evaluate the effect of focus directions on various long-context tasks and find that focus directions can help mitigate the poor task alignment of long-context LLMs. We believe our findings could promote further research on long-context LLM alignment.

CV · Feb 28, 2025
Generalization of CNNs on Relational Reasoning with Bar Charts

Zhenxing Cui, Lu Chen, Yunhai Wang et al.

This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task: estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings. Otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars. Yet, humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.

CY · Apr 12
Measuring Changes in Instructor Class Design and Student Learning After the Release of Large Language Models (LLMs)

Amanda Potasznik, Daniel Haehn

Student use of Generative AI (GenAI) products in completing their classwork, with or without their professors' knowledge and/or approval, has resulted in substantial shifts in higher education. While GenAI use is widespread, its impact on student study methods, faculty course development, grade reporting, and overall learning is not well documented. This is a mixed-methods, multi-course study using retrospective quantitative analysis, instructor surveys, and anonymous student surveys at a university in the New England region of the United States. This research seeks to identify and document patterns in student and faculty perceptions of, and experiences in, the use of LLMs as a learning tool inside and outside of the university classroom. Alongside quantitative and thematic analysis of both faculty and student survey responses, historical grade data as reported to the university registrar is used to triangulate the phenomenon of learning achievement in pre- and post-LLM eras. It is hoped that this research can serve as a pilot study for a broader set of institutions. Results from this study can inform GenAI policy for professors, universities, and other educational institutions that are trying to maximize student learning in the age of AI.

LG · Jul 5, 2025
A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime

Shuning Jiang, Wei-Lun Chao, Daniel Haehn et al.

We present a data-domain sampling regime for quantifying CNNs' graphical perception behaviors. This regime lets us evaluate CNNs' ratio estimation ability in bar charts from three perspectives: sensitivity to training-test distribution discrepancies, stability to limited samples, and expertise relative to human observers. After analyzing 16 million trials from 800 CNN models and 6,825 trials from 113 human participants, we arrived at a simple and actionable conclusion: CNNs can outperform humans, and their biases simply depend on the training-test distance. We show evidence of this simple, elegant behavior of the machines when they interpret visualization images. osf.io/gfqc3 provides registration, the code for our sampling regime, and experimental results.

CV · Mar 22, 2024
Web-based Melanoma Detection

SangHyuk Kim, Edward Gaibor, Daniel Haehn

Melanoma is the most aggressive form of skin cancer, and early detection can significantly increase survival rates and prevent cancer spread. However, developing reliable automated detection techniques is difficult due to the lack of standardized datasets and evaluation methods. This study introduces a unified melanoma classification approach that supports 54 combinations of 11 datasets and 24 state-of-the-art deep learning architectures. It enables a fair comparison of 1,296 experiments and results in a lightweight model deployable to the web-based MeshNet architecture named Mela-D. This approach can run up to 33x faster by reducing parameters 24x while yielding 88.8% accuracy, comparable with ResNet50, on previously unseen images. This allows efficient and accurate melanoma detection in real-world settings that can run on consumer-level hardware.

CV · Mar 5
Meta-D: Metadata-Aware Architectures for Brain Tumor Analysis and Missing-Modality Segmentation

SangHyuk Kim, Daniel Haehn, Sumientra Rampersad

We present Meta-D, an architecture that explicitly leverages categorical scanner metadata such as MRI sequence and plane orientation to guide feature extraction for brain tumor analysis. We aim to improve the performance of medical image deep learning pipelines by integrating explicit metadata to stabilize feature representations. We first evaluate this in 2D tumor detection, where injecting sequence (e.g., T1, T2) and plane (e.g., axial) metadata dynamically modulates convolutional features, yielding an absolute increase of up to 2.62% in F1-score over image-only baselines. Because metadata grounds feature extraction when data are available, we hypothesize it can serve as a robust anchor when data are missing. We apply this to 3D missing-modality tumor segmentation. Our Transformer Maximizer utilizes metadata-based cross-attention to isolate and route available modalities, ensuring the network focuses on valid slices. This targeted attention improves brain tumor segmentation Dice scores by up to 5.12% under extreme modality scarcity while reducing model parameters by 24.1%.

CV · Oct 17, 2025
Designing a Convolutional Neural Network for High-Accuracy Oral Cavity Squamous Cell Carcinoma (OCSCC) Detection

Vishal Manikanden, Aniketh Bandlamudi, Daniel Haehn

Oral Cavity Squamous Cell Carcinoma (OCSCC) is the most common type of head and neck cancer. Due to the subtle nature of its early stages, deep and hidden areas of development, and slow growth, OCSCC often goes undetected, leading to preventable deaths. However, properly trained Convolutional Neural Networks (CNNs), with their precise image segmentation techniques and ability to apply kernel matrices to modify the RGB values of images for accurate image pattern recognition, would be an effective means for early detection of OCSCC. Pairing this neural network with image capturing and processing hardware would allow increased efficacy in OCSCC detection. The aim of our project is to develop a Convolutional Neural Network trained to recognize OCSCC, as well as to design a physical hardware system to capture and process detailed images, in order to determine the image quality required for accurate predictions. A CNN was trained on 4293 training images consisting of benign and malignant tumors, as well as negative samples, and was evaluated for its precision, recall, and Mean Average Precision (mAP) in its predictions of OCSCC. A testing dataset of randomly assorted images of cancerous, non-cancerous, and negative images was chosen, and each image was altered to represent 5 common resolutions. This test data set was thoroughly analyzed by the CNN and predictions were scored on the basis of accuracy. The designed enhancement hardware was used to capture detailed images, and its impact was scored. An application was developed to facilitate the testing process and bring open access to the CNN. Images of increasing resolution resulted in higher-accuracy predictions on a logarithmic scale, demonstrating the diminishing returns of higher pixel counts.

CV · Apr 5, 2025
Evaluating Graphical Perception with Multimodal LLMs

Rami Huu Nguyen, Kenichi Maeda, Mahsa Geshvadi et al.

Multimodal Large Language Models (MLLMs) have progressed remarkably in analyzing and understanding images. Despite these advancements, accurately regressing values in charts remains an underexplored area for MLLMs. For visualization, how do MLLMs perform when applied to graphical perception tasks? Our paper investigates this question by reproducing Cleveland and McGill's seminal 1984 experiment and comparing it against human task performance. Our study primarily evaluates fine-tuned and pretrained models and zero-shot prompting to determine if they closely match human graphical perception. Our findings highlight that MLLMs outperform human task performance in some cases but not in others. We highlight the results of all experiments to foster an understanding of where MLLMs succeed and fail when applied to data visualization.

HC · May 16, 2020
FiberStars: Visual Comparison of Diffusion Tractography Data between Multiple Subjects

Loraine Franke, Daniel Karl I. Weidele, Fan Zhang et al.

Tractography from high-dimensional diffusion magnetic resonance imaging (dMRI) data enables analysis of the brain's structural connectivity. Recent dMRI studies aim to compare connectivity patterns across subject groups and disease populations to understand subtle abnormalities in the brain's white matter connectivity and distributions of biologically sensitive dMRI-derived metrics. Existing software products focus solely on the anatomy, are not intuitive, or restrict the comparison of multiple subjects. In this paper, we present the design and implementation of FiberStars, a visual analysis tool for tractography data that allows the interactive visualization of brain fiber clusters combining existing 3D anatomy with compact 2D visualizations. With FiberStars, researchers can analyze and compare multiple subjects in large collections of brain fibers using different views. To evaluate the usability of our software, we performed a quantitative user study. We asked domain experts and non-experts to find patterns in a tractography dataset with either FiberStars or an existing dMRI exploration tool. Our results show that participants using FiberStars can navigate extensive collections of tractography faster and more accurately. All our research, software, and results are available openly.

IV · Apr 26, 2020
TRAKO: Efficient Transmission of Tractography Data for Visualization

Daniel Haehn, Loraine Franke, Fan Zhang et al.

Fiber tracking produces large tractography datasets that are tens of gigabytes in size, consisting of millions of streamlines. Such vast amounts of data require formats that allow for efficient storage, transfer, and visualization. We present TRAKO, a new data format based on the GL Transmission Format (glTF) that enables immediate graphical and hardware-accelerated processing. We integrate a state-of-the-art compression technique for vertices, streamlines, and attached scalar and property data. We then compare TRAKO to existing tractography storage methods and provide a detailed evaluation on eight datasets. TRAKO can achieve data reductions of over 28x without loss of statistical significance when used to replicate analysis from previously published studies.

CV · Dec 14, 2018
Fast Mitochondria Detection for Connectomics

Vincent Casser, Kai Kang, Hanspeter Pfister et al.

High-resolution connectomics data allows for the identification of dysfunctional mitochondria, which are linked to a variety of diseases such as autism or bipolar disorder. However, manual analysis is not feasible since datasets can be petabytes in size. We present a fully automatic mitochondria detector based on a modified U-Net architecture that yields high accuracy and fast processing times. We evaluate our method on multiple real-world connectomics datasets, including an improved version of the EPFL mitochondria benchmark. Our results show a Jaccard index of up to 0.90 with inference times lower than 16 ms for a 512x512 px image tile. This speed is faster than the acquisition speed of modern electron microscopes, enabling mitochondria detection in real time. Our detector ranks first for real-time detection when compared to previous works, and data, results, and code are openly available.
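
For readers unfamiliar with the metric reported in this abstract, the Jaccard index (intersection over union) of two binary masks can be computed as in the sketch below. The function name and the flat-list mask representation are illustrative assumptions; the convention of returning 1.0 for two empty masks is also a choice made here, not one taken from the paper.

```python
def jaccard_index(pred, truth):
    """Intersection over union of two binary masks.

    pred, truth: equally long flat sequences of 0/1 (or False/True) values,
    e.g. a flattened segmentation tile.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Two empty masks agree perfectly; treat that case as a score of 1.0.
    return intersection / union if union else 1.0
```

A score of 0.90 therefore means that the overlap between predicted and annotated mitochondria pixels covers 90% of their combined area.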

CV · Apr 4, 2017
Guided Proofreading of Automatic Segmentations for Connectomics

Daniel Haehn, Verena Kaynig, James Tompkin et al.

Automatic cell image segmentation methods in connectomics produce merge and split errors, which require correction through proofreading. Previous research has identified the visual search for these errors as the bottleneck in interactive proofreading. To aid error correction, we develop two classifiers that automatically recommend candidate merges and splits to the user. These classifiers use a convolutional neural network (CNN) that has been trained with errors in automatic segmentations against expert-labeled ground truth. Our classifiers detect potentially-erroneous regions by considering a large context region around a segmentation boundary. Corrections can then be performed by a user with yes/no decisions, which reduces variation of information 7.5x faster than previous proofreading methods. We also present a fully-automatic mode that uses a probability threshold to make merge/split decisions. Extensive experiments using the automatic approach and comparing performance of novice and expert users demonstrate that our method performs favorably against state-of-the-art proofreading methods on different connectomics datasets.
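
Variation of information (VI), the metric this abstract uses to measure proofreading progress, compares two labelings of the same pixels. The sketch below uses the standard definition VI = 2·H(A,B) − H(A) − H(B) in bits; the function name and the flat label lists are illustrative assumptions, and the paper's exact evaluation protocol may differ. VI is 0 when the two segmentations agree up to relabeling and grows with merge and split errors.

```python
import math
from collections import Counter


def variation_of_information(seg_a, seg_b):
    """Variation of information (in bits) between two segmentations.

    seg_a, seg_b: equally long flat sequences of segment labels,
    one label per pixel.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must cover the same pixels")
    n = len(seg_a)

    def entropy(counts):
        # Shannon entropy of the label distribution given by the counts.
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    h_a = entropy(Counter(seg_a))
    h_b = entropy(Counter(seg_b))
    h_ab = entropy(Counter(zip(seg_a, seg_b)))  # joint label distribution
    return 2.0 * h_ab - h_a - h_b
```

For example, merging two equally sized ground-truth segments into one yields a VI of exactly 1 bit, and a correct split decision by the user would bring it back to 0.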

CV · Oct 27, 2016
Icon: An Interactive Approach to Train Deep Neural Networks for Segmentation of Neuronal Structures

Felix Gonda, Verena Kaynig, Ray Thouis et al.

We present an interactive approach to train a deep neural network pixel classifier for the segmentation of neuronal structures. An interactive training scheme reduces the extremely tedious manual annotation task that is typically required for deep networks to perform well on image segmentation problems. Our proposed method employs a feedback loop that captures sparse annotations using a graphical user interface, trains a deep neural network based on recent and past annotations, and displays the prediction output to users in almost real-time. Our implementation of the algorithm also allows multiple users to provide annotations in parallel and receive feedback from the same classifier. Quick feedback on classifier performance in an interactive setting enables users to identify and label examples that are more important than others for segmentation purposes. Our experiments show that an interactively-trained pixel classifier produces better region segmentation results on Electron Microscopy (EM) images than those generated by a network of the same architecture trained offline on exhaustive ground-truth labels.