Xiaoyi Jiang

LG
h-index: 17
29 papers
652 citations
Novelty: 47%
AI Score: 56

29 Papers

CL · Oct 19, 2022
TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Stefan Hegselmann, Alejandro Buendia, Hunter Lang et al. · mit

We study the application of large language models to zero-shot and few-shot classification of tabular data. We prompt the large language model with a serialization of the tabular data to a natural-language string, together with a short description of the classification problem. In the few-shot setting, we fine-tune the large language model using some labeled examples. We evaluate several serialization methods including templates, table-to-text models, and large language models. Despite its simplicity, we find that this technique outperforms prior deep-learning-based tabular classification methods on several benchmark datasets. In most cases, even zero-shot classification obtains non-trivial performance, illustrating the method's ability to exploit prior knowledge encoded in large language models. Unlike many deep learning methods for tabular datasets, this approach is also competitive with strong traditional baselines like gradient-boosted trees, especially in the very-few-shot setting.
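The template-style serialization mentioned in this abstract can be illustrated with a minimal sketch. The sentence template ("The <column> is <value>."), the function names, and the example row and question are assumptions for illustration, not the exact prompt format used by the authors.

```python
def serialize_row(row: dict[str, object]) -> str:
    """Turn one table row into one short sentence per column (assumed template)."""
    return " ".join(f"The {column} is {value}." for column, value in row.items())


def build_prompt(row: dict[str, object], question: str) -> str:
    """Combine the serialized row with a short description of the task."""
    return f"{serialize_row(row)}\n{question}\nAnswer:"


# Hypothetical row and task description, for illustration only.
example_row = {"age": 42, "occupation": "teacher", "hours per week": 40}
prompt = build_prompt(
    example_row,
    "Does this person earn more than 50000 dollars per year? Yes or no?",
)
print(prompt)
```

The resulting string could be passed to a language model as-is for zero-shot use, or paired with a label to form a few-shot fine-tuning example.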

LG · Apr 17 · Code
Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction

Jingyuan Li, Xiaoyi Jiang, Fukang Wen et al. · microsoft-research

Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix as a single object -- via concrete scores, clean-data predictions ($x_0$-parameterization), or denoising distributions -- rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. Since a CTMC is fundamentally a Poisson process fully determined by these two quantities, decomposing along this structure is closer to first principles and naturally leads to our formulation. We propose Neural CTMC, which separately parameterizes the reverse process through an exit rate (when to jump) and a jump distribution (where to jump) using two dedicated network heads. We show that the evidence lower bound (ELBO) differs from a path-space KL divergence between the true and learned reverse processes by a $\theta$-independent constant, so that the training objective is fully governed by the exit rate and jump distribution we parameterize. Moreover, this KL factorizes into a Poisson KL for timing and a categorical KL for direction. We further show that the tractable conditional surrogate preserves the gradients and minimizers of the corresponding marginal reverse-process objective under standard regularity assumptions. Our theoretical framework also covers masked and GIDD-style noise schedules. Empirically, while the uniform forward process has been explored in prior work, our model is, to the best of our knowledge, the first pure-uniform method to outperform mask-based methods on the OpenWebText dataset. To facilitate reproducibility, we release our pretrained weights at https://huggingface.co/Jiangxy1117/Neural-CTMC.
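The split into jump timing and jump direction is the classical CTMC decomposition: the outgoing rates of a state sum to an exit rate, and dividing each rate by that sum gives a jump distribution. The sketch below shows this with a fixed toy rate table; the function names and the rate values are assumptions, and nothing here reproduces the paper's network heads or training objective.

```python
import random


def decompose_rates(rates: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Split outgoing rates R(x, y), y != x, into an exit rate and a jump distribution."""
    exit_rate = sum(rates.values())
    jump_dist = {state: rate / exit_rate for state, rate in rates.items()}
    return exit_rate, jump_dist


def sample_jump(rates: dict[str, float], rng: random.Random) -> tuple[float, str]:
    """Sample when the chain leaves the current state and where it goes."""
    exit_rate, jump_dist = decompose_rates(rates)
    holding_time = rng.expovariate(exit_rate)  # timing: exponential waiting time
    states = list(jump_dist)
    weights = [jump_dist[s] for s in states]
    next_state = rng.choices(states, weights=weights)[0]  # direction: categorical draw
    return holding_time, next_state


# Toy outgoing rates from some current state (hypothetical values).
toy_rates = {"b": 1.0, "c": 3.0}
exit_rate, jump_dist = decompose_rates(toy_rates)
holding_time, next_state = sample_jump(toy_rates, random.Random(0))
```

In a learned model, the fixed table would be replaced by two predicted quantities per state and time: a positive scalar for the exit rate and a normalized vector for the jump distribution.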

CV · Jun 2, 2022
A Bhattacharyya Coefficient-Based Framework for Noise Model-Aware Random Walker Image Segmentation

Dominik Drees, Florian Eilers, Ang Bian et al.

One well-established method of interactive image segmentation is the random walker algorithm. Considerable research on this family of segmentation methods has been continuously conducted in recent years with numerous applications. These methods commonly use a simple Gaussian weight function that depends on a parameter which strongly influences the segmentation performance. In this work we propose a general framework for deriving weight functions based on probabilistic modeling. This framework can be concretized to cope with virtually any well-defined noise model. It eliminates the critical parameter and thus avoids time-consuming parameter search. We derive the specific weight functions for common noise types and show their superior performance on synthetic data as well as different biomedical image data (MRI images from the NYU fastMRI dataset, larvae images acquired with the FIM technique). Our framework can also be used in multiple other applications, e.g., the graph cut algorithm and its extensions.
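For context, the Gaussian weight function that the abstract refers to is commonly written as $w_{ij} = \exp(-\beta (g_i - g_j)^2)$ for neighboring pixel intensities $g_i$ and $g_j$. The sketch below shows how strongly the free parameter $\beta$ changes the edge weight for the same intensity difference; the function name and the sample values are assumptions, and the paper's noise-model-based weights are not reproduced here.

```python
import math


def gaussian_weight(intensity_a: float, intensity_b: float, beta: float) -> float:
    """Classical random walker edge weight between two neighboring pixels."""
    return math.exp(-beta * (intensity_a - intensity_b) ** 2)


# The same intensity step yields very different weights depending on beta.
for beta in (1.0, 10.0, 100.0):
    weight = gaussian_weight(0.2, 0.8, beta)
    print(f"beta={beta:6.1f}  weight={weight:.3e}")
```

This sensitivity is why a parameter search over $\beta$ is usually needed in practice.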

CV · Sep 11, 2023
Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer

Puxun Tu, Hongfei Ye, Haochen Shi et al.

Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope, heavily reliant on the skill of the ophthalmologist. While existing PCS guidance systems extract valuable information from surgical microscopic videos to enhance intraoperative proficiency, they suffer from non-phase-specific guidance, leading to redundant visual information. In this study, our major contribution is the development of a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase. Leveraging the inherent quasi-standardized nature of PCS procedures, we propose a two-stage surgical microscopic video recognition network. In the first stage, we implement a multi-task learning structure to segment the surgical limbus region and extract limbus region-focused spatial features for each frame. In the second stage, we propose the long-short spatiotemporal aggregation transformer (LS-SAT) network to model local fine-grained and global temporal relationships, and combine the extracted spatial features to recognize the current surgical phase. Additionally, we collaborate closely with ophthalmologists to design AR visual cues by utilizing techniques such as limbus ellipse fitting and regional restricted normal cross-correlation rotation computation. We evaluated the network on publicly available and in-house datasets, with comparison results demonstrating its superior performance compared to related works. Ablation results further validated the effectiveness of the limbus region-focused spatial feature extractor and the combination of temporal features. Furthermore, the developed system was evaluated in a clinical setup, with results indicating remarkable accuracy and real-time performance, underscoring its potential for clinical applications.

LG · Jun 16, 2023
Building Blocks for a Complex-Valued Transformer Architecture

Florian Eilers, Xiaoyi Jiang

Most deep learning pipelines are built on real-valued operations to deal with real-valued inputs such as images, speech or music signals. However, many applications naturally make use of complex-valued signals or images, such as MRI or remote sensing. Additionally, the Fourier transform of signals is complex-valued and has numerous applications. We aim to make deep learning directly applicable to these complex-valued signals without using projections into $\mathbb{R}^2$. Thus, we add to the recent developments of complex-valued neural networks by presenting building blocks to transfer the transformer architecture to the complex domain. We present multiple versions of a complex-valued Scaled Dot-Product Attention mechanism as well as a complex-valued layer normalization. We test on a classification and a sequence generation task on the MusicNet dataset and show improved robustness to overfitting while maintaining on-par performance when compared to the real-valued transformer architecture.

CV · Sep 21, 2022
Kernel-Based Generalized Median Computation for Consensus Learning

Andreas Nienkötter, Xiaoyi Jiang

Computing a consensus object from a set of given objects is a core problem in machine learning and pattern recognition. One popular approach is to formulate it as an optimization problem using the generalized median. Previous methods like the Prototype and Distance-Preserving Embedding methods transform objects into a vector space, solve the generalized median problem in this space, and inversely transform back into the original space. Both of these methods have been successfully applied to a wide range of object domains, where the generalized median problem has inherent high computational complexity (typically $\mathcal{NP}$-hard) and therefore approximate solutions are required. Previously, explicit embedding methods were used in the computation, which often do not reflect the spatial relationship between objects exactly. In this work we introduce a kernel-based generalized median framework that is applicable to both positive definite and indefinite kernels. This framework computes the relationship between objects and their generalized median in kernel space, without the need for an explicit embedding. We show that the spatial relationship between objects is more accurately represented in kernel space than in an explicit vector space using easy-to-compute kernels, and demonstrate superior performance of generalized median computation on datasets of three different domains. A software toolbox resulting from our work is made publicly available to encourage other researchers to explore the generalized median computation and applications.

LG · May 5
Covariance-Aware Goodness for Scalable Forward-Forward Learning

Xiaoyi Jiang, Bashir M. Al-Hashimi, Kai Xu

The Forward-Forward algorithm eliminates global gradient flow and the storage of full network activations. However, in convolutional settings, existing BP-free FF methods significantly under-perform backpropagation on complex benchmarks such as ImageNet-100 and Tiny-ImageNet. We identify this gap as a structural bottleneck in goodness extraction: the standard sum-of-squares formulation collapses feature volumes into channel-wise activation energies, which omits critical second-order dependencies. To address this, we propose a framework centered on three key components. First, Bi-axis Covariance Goodness (BiCovG) explicitly augments the standard goodness function with structured second-order information along two axes: cross-channel projections that model inter-feature covariance, and nested multi-scale aggregation that encodes spatial correlation statistics. This provides a tractable approximation to covariance-aware goodness without the prohibitive $O(C^2)$ complexity of explicit matrix estimation. Second, a lightweight Logistic Fusion module aggregates layer-wise predictions, amplifying the contribution of deeper representations. Third, the Feature Alignment Layer (FAL) introduces a zero-initialized correction at block boundaries to mitigate representation misalignment in deep locally trained networks. By introducing these three components, we effectively double the depth of viable Forward-Forward learning, extending robust layer utilization from shallow baselines to 16-layer architectures like VGG-16. The resulting BP-free model achieves 73.01% on ImageNet-100 and 50.30% on Tiny-ImageNet. As a practical extension, Hybrid Goodness Blocks control the scope of gradient propagation via configurable block sizes, further narrowing the ImageNet-100 gap to 3.6% and matching BP on Tiny-ImageNet, while still reducing peak memory by approximately 50% relative to BP.
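To make the stated bottleneck concrete, the sketch below computes the standard sum-of-squares goodness for two tiny feature volumes that have identical channel energies but different cross-channel structure. It is only an illustration of what the sum-of-squares score cannot see; the function names and toy values are assumptions, and BiCovG itself is not implemented here.

```python
def channel_energies(features: list[list[float]]) -> list[float]:
    """Per-channel activation energy; each channel is a flat list of activations."""
    return [sum(a * a for a in channel) for channel in features]


def sum_of_squares_goodness(features: list[list[float]]) -> float:
    """Standard Forward-Forward goodness: total squared activation."""
    return sum(channel_energies(features))


def channel_inner_product(features: list[list[float]], i: int, j: int) -> float:
    """A simple second-order statistic between two channels."""
    return sum(a * b for a, b in zip(features[i], features[j]))


# Two channels with two spatial positions each (toy values).
correlated = [[1.0, -1.0], [1.0, -1.0]]
decorrelated = [[1.0, -1.0], [1.0, 1.0]]

print(sum_of_squares_goodness(correlated), sum_of_squares_goodness(decorrelated))
print(channel_inner_product(correlated, 0, 1), channel_inner_product(decorrelated, 0, 1))
```

Both volumes receive the same goodness even though their channels relate to each other differently, which is the kind of second-order information the abstract says is lost.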

CV · May 13, 2025 · Code
Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections

Xiao Ni, Carsten Kuehnel, Xiaoyi Jiang

Rapid advances in deep learning for computer vision have driven the adoption of RGB camera-based adaptive traffic light systems to improve traffic safety and pedestrian comfort. However, these systems often overlook the needs of people with mobility restrictions. Moreover, the use of RGB cameras presents significant challenges, including limited detection performance under adverse weather or low-visibility conditions, as well as heightened privacy concerns. To address these issues, we propose a fully automated, thermal detector-based traffic light system that dynamically adjusts signal durations for individuals with walking impairments or mobility burden and triggers the auditory signal for visually impaired individuals, thereby advancing towards barrier-free intersections for all users. To this end, we build the thermal dataset for people with mobility restrictions (TD4PWMR), designed to capture diverse pedestrian scenarios, particularly focusing on individuals with mobility aids or mobility burden under varying environmental conditions, such as different lighting, weather, and crowded urban settings. While thermal imaging offers advantages in terms of privacy and robustness to adverse conditions, it also introduces inherent hurdles for object detection due to the lack of color and fine texture details and the generally lower resolution of thermal images. To overcome these limitations, we develop YOLO-Thermal, a novel variant of the YOLO architecture that integrates advanced feature extraction and attention mechanisms for enhanced detection accuracy and robustness in thermal imaging. Experiments demonstrate that the proposed thermal detector outperforms existing detectors, while the proposed traffic light system effectively advances barrier-free intersections. The source code and dataset are available at https://github.com/leon2014dresden/YOLO-THERMAL.

LG · Mar 13, 2024 · Code
DeepCSHAP: Utilizing Shapley Values to Explain Deep Complex-Valued Neural Networks

Florian Eilers, Xiaoyi Jiang

Deep Neural Networks are widely used in academia as well as corporate and public applications, including safety-critical applications such as health care and autonomous driving. The ability to explain their output is critical for safety reasons as well as acceptance among users. A multitude of methods have been proposed to explain real-valued neural networks. Recently, complex-valued neural networks have emerged as a new class of neural networks dealing with complex-valued input data without the necessity of projecting them onto $\mathbb{R}^2$. This brings up the need to develop explanation algorithms for this kind of neural network. In this paper we provide these developments. While we focus on adapting the widely used DeepSHAP algorithm to the complex domain, we also present versions of four gradient-based explanation methods suitable for use in complex-valued neural networks. We evaluate the explanation quality of all presented algorithms and provide all of them as an open-source library adaptable to most recent complex-valued neural network architectures.

SE · Feb 13, 2020 · Code
The PHOTON Wizard -- Towards Educational Machine Learning Code Generators

Ramona Leenings, Nils Ralf Winter, Kelvin Sarink et al.

Despite the tremendous efforts to democratize machine learning, especially in applied science, its application is still often hampered by a lack of coding skills. As we consider programmatic understanding key to building effective and efficient machine learning solutions, we argue for a novel educational approach that builds upon the accessibility and acceptance of graphical user interfaces to convey programming skills to an applied-science target group. We outline a proof-of-concept, open-source web application, the PHOTON Wizard, which dynamically translates GUI interactions into valid source code for the Python machine learning framework PHOTON. Thereby, users possessing theoretical machine learning knowledge gain key insights into the model development workflow as well as an intuitive understanding of custom implementations. Specifically, the PHOTON Wizard integrates the concept of Educational Machine Learning Code Generators to teach users how to write code for designing, training, optimizing and evaluating custom machine learning pipelines.

LG · Feb 13, 2018 · Code
Barista - a Graphical Tool for Designing and Training Deep Neural Networks

Soeren Klemm, Aaron Scherzinger, Dominik Drees et al.

In recent years, the importance of deep learning has significantly increased in pattern recognition, computer vision, and artificial intelligence research, as well as in industry. However, despite the existence of multiple deep learning frameworks, there is a lack of comprehensible and easy-to-use high-level tools for the design, training, and testing of deep neural networks (DNNs). In this paper, we introduce Barista, an open-source graphical high-level interface for the Caffe deep learning framework. While Caffe is one of the most popular frameworks for training DNNs, editing prototxt files in order to specify the net architecture and hyperparameters can become a cumbersome and error-prone task. Instead, Barista offers a fully graphical user interface with a graph-based net topology editor and provides an end-to-end training facility for DNNs, which allows researchers to focus on solving their problems without having to write code, edit text files, or manually parse logged data.

LG · Nov 15, 2025
Cross-view Joint Learning for Mixed-Missing Multi-view Unsupervised Feature Selection

Zongxin Shen, Yanyong Huang, Dongjie Wang et al.

Incomplete multi-view unsupervised feature selection (IMUFS), which aims to identify representative features from unlabeled multi-view data containing missing values, has received growing attention in recent years. Despite their promising performance, existing methods face three key challenges: 1) by focusing solely on the view-missing problem, they are not well-suited to the more prevalent mixed-missing scenario in practice, where some samples lack entire views while others lack only some features within views; 2) insufficient utilization of consistency and diversity across views limits the effectiveness of feature selection; and 3) the lack of theoretical analysis makes it unclear how feature selection and data imputation interact during the joint learning process. To address these challenges, we propose CLIM-FS, a novel IMUFS method designed for the mixed-missing problem. Specifically, we integrate the imputation of both missing views and variables into a feature selection model based on nonnegative orthogonal matrix factorization, enabling the joint learning of feature selection and adaptive data imputation. Furthermore, we fully leverage consensus cluster structure and cross-view local geometrical structure to enhance the synergistic learning process. We also provide a theoretical analysis to clarify the underlying collaborative mechanism of CLIM-FS. Experimental results on eight real-world multi-view datasets demonstrate that CLIM-FS outperforms state-of-the-art methods.

LG · Feb 6
Exploring Sparsity and Smoothness of Arbitrary $\ell_p$ Norms in Adversarial Attacks

Christof Duhme, Florian Eilers, Xiaoyi Jiang

Adversarial attacks against deep neural networks are commonly constructed under $\ell_p$ norm constraints, most often using $p=1$, $p=2$ or $p=\infty$, and potentially regularized for specific demands such as sparsity or smoothness. These choices are typically made without a systematic investigation of how the norm parameter $p$ influences the structural and perceptual properties of adversarial perturbations. In this work, we study how the choice of $p$ affects sparsity and smoothness of adversarial attacks generated under $\ell_p$ norm constraints for values of $p \in [1,2]$. To enable a quantitative analysis, we adopt two established sparsity measures from the literature and introduce three smoothness measures. In particular, we propose a general framework for deriving smoothness measures based on smoothing operations and additionally introduce a smoothness measure based on first-order Taylor approximations. Using these measures, we conduct a comprehensive empirical evaluation across multiple real-world image datasets and a diverse set of model architectures, including both convolutional and transformer-based networks. We show that the choice of $\ell_1$ or $\ell_2$ is suboptimal in most cases and the optimal $p$ value is dependent on the specific task. In our experiments, using $\ell_p$ norms with $p\in [1.3, 1.5]$ yields the best trade-off between sparse and smooth attacks. These findings highlight the importance of principled norm selection when designing and evaluating adversarial attacks.
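As a small worked illustration of why $p$ matters, the sketch below evaluates the $\ell_p$ norm of a sparse and a dense perturbation for several values of $p \in [1,2]$. The function name and the toy perturbations are assumptions; the paper's sparsity and smoothness measures are not reproduced.

```python
def lp_norm(perturbation: list[float], p: float) -> float:
    """l_p norm of a flat perturbation vector, defined here for p >= 1."""
    if p < 1.0:
        raise ValueError("p must be at least 1")
    return sum(abs(v) ** p for v in perturbation) ** (1.0 / p)


# Toy perturbations: one concentrated on a single entry, one spread evenly.
sparse = [1.0, 0.0, 0.0, 0.0]
dense = [0.5, 0.5, 0.5, 0.5]

for p in (1.0, 1.4, 2.0):
    print(f"p={p}: sparse={lp_norm(sparse, p):.3f}  dense={lp_norm(dense, p):.3f}")
```

Under $\ell_2$ the two perturbations cost the same, while under $\ell_1$ the dense one is twice as expensive, so a budget in a smaller $p$ favors sparser perturbations; intermediate values of $p$ sit between these two regimes.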

LG · Feb 6
Perturbing the Phase: Analyzing Adversarial Robustness of Complex-Valued Neural Networks

Florian Eilers, Christof Duhme, Xiaoyi Jiang

Complex-valued neural networks (CVNNs) are rising in popularity for all kinds of applications. To safely use CVNNs in practice, analyzing their robustness against outliers is crucial. One well-known technique to understand the behavior of deep neural networks is to investigate their behavior under adversarial attacks, which can be seen as worst-case minimal perturbations. We design Phase Attacks, a kind of attack specifically targeting the phase information of complex-valued inputs. Additionally, we derive complex-valued versions of commonly used adversarial attacks. We show that in some scenarios CVNNs are more robust than real-valued neural networks (RVNNs) and that both are very susceptible to phase changes, with the Phase Attacks decreasing the model performance more than equally strong regular attacks, which can attack both phase and magnitude.
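A phase-only perturbation can be sketched by rotating a complex input in the complex plane, which changes its phase and leaves its magnitude untouched. The function name and sample values are assumptions; this does not reproduce how the Phase Attacks choose the perturbation adversarially.

```python
import cmath


def perturb_phase(z: complex, delta: float) -> complex:
    """Rotate z by delta radians: the phase changes, the magnitude does not."""
    return z * cmath.exp(1j * delta)


z = 3.0 + 4.0j
w = perturb_phase(z, 0.3)

print(abs(z), abs(w))                  # magnitudes agree up to rounding
print(cmath.phase(w) - cmath.phase(z))  # phase shift of about 0.3 radians
```

An attack in this spirit would search over the per-element shifts `delta` to maximize the model's loss, in contrast to a regular attack that may alter both phase and magnitude.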

LG · Dec 17, 2025
Joint Learning of Unsupervised Multi-view Feature and Instance Co-selection with Cross-view Imputation

Yuxin Cai, Yanyong Huang, Jinyuan Chang et al.

Feature and instance co-selection, which aims to reduce both feature dimensionality and sample size by identifying the most informative features and instances, has attracted considerable attention in recent years. However, when dealing with unlabeled incomplete multi-view data, where some samples are missing in certain views, existing methods typically first impute the missing data and then concatenate all views into a single dataset for subsequent co-selection. Such a strategy treats co-selection and missing data imputation as two independent processes, overlooking potential interactions between them. The inter-sample relationships gleaned from co-selection can aid imputation, which in turn enhances co-selection performance. Additionally, simply merging multi-view data fails to capture the complementary information among views, ultimately limiting co-selection effectiveness. To address these issues, we propose a novel co-selection method, termed Joint learning of Unsupervised multI-view feature and instance Co-selection with cross-viEw imputation (JUICE). JUICE first reconstructs incomplete multi-view data using available observations, bringing missing data recovery and feature and instance co-selection together in a unified framework. Then, JUICE leverages cross-view neighborhood information to learn inter-sample relationships and further refine the imputation of missing values during reconstruction. This enables the selection of more representative features and instances. Extensive experiments demonstrate that JUICE outperforms state-of-the-art methods.

CL · Feb 23, 2024
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Stefan Hegselmann, Shannon Zejiang Shen, Florian Gierse et al. · mit

Patients often face difficulties in understanding their hospitalizations, while healthcare workers have limited resources to provide explanations. In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries. To this end, we release (i) a rigorous labeling protocol for errors in medical texts and (ii) a publicly available dataset of annotated hallucinations in 100 doctor-written and 100 generated summaries. We show that fine-tuning on hallucination-free data effectively reduces hallucinations from 2.60 to 1.55 per summary for Llama 2, while preserving relevant information. We observe a similar effect on GPT-4 (0.70 to 0.40), when the few-shot examples are hallucination-free. We also conduct a qualitative evaluation using hallucination-free and improved training data. We find that common quantitative metrics do not correlate well with faithfulness and quality. Finally, we test GPT-4 for automatic hallucination detection, which clearly outperforms common baselines.

LG · Feb 6
Transformer-based Parameter Fitting of Models derived from Bloch-McConnell Equations for CEST MRI Analysis

Christof Duhme, Chris Lippe, Verena Hoerr et al.

Chemical exchange saturation transfer (CEST) MRI is a non-invasive imaging modality for detecting metabolites. It offers higher resolution and sensitivity compared to conventional magnetic resonance spectroscopy (MRS). However, quantification of CEST data is challenging because the measured signal results from a complex interplay of many physiological variables. Here, we introduce a transformer-based neural network to fit parameters such as metabolite concentrations, exchange and relaxation rates of a physical model derived from Bloch-McConnell equations to in-vitro CEST spectra. We show that our neural network, trained in a self-supervised manner, clearly outperforms the solution of a classical gradient-based solver.

LG · Feb 5
Clifford Kolmogorov-Arnold Networks

Matthias Wolff, Francesco Alesiani, Christof Duhme et al.

We introduce the Clifford Kolmogorov-Arnold Network (ClKAN), a flexible and efficient architecture for function approximation in arbitrary Clifford algebra spaces. We propose the use of Randomized Quasi-Monte Carlo grid generation as a solution to the exponential scaling associated with higher-dimensional algebras. Our ClKAN also introduces new batch normalization strategies to deal with variable domain input. ClKAN finds application in scientific discovery and engineering, and is validated in synthetic and physics-inspired tasks.

LG · Feb 2
SNAP: A Self-Consistent Agreement Principle with Application to Robust Computation

Xiaoyi Jiang, Andreas Nienkötter

We introduce SNAP (Self-coNsistent Agreement Principle), a self-supervised framework for robust computation based on mutual agreement. Based on an Agreement-Reliability Hypothesis, SNAP assigns weights that quantify agreement, emphasizing trustworthy items and downweighting outliers without supervision or prior knowledge. A key result is the Exponential Suppression of Outlier Weights, ensuring that outliers contribute negligibly to computations, even in high-dimensional settings. We study properties of the SNAP weighting scheme and show its practical benefits on vector averaging and subspace estimation. In particular, we demonstrate that non-iterative SNAP outperforms the iterative Weiszfeld algorithm and two variants of multivariate median of means. SNAP thus provides a flexible, easy-to-use, broadly applicable approach to robust computation.

LG · Feb 4, 2025
CVKAN: Complex-Valued Kolmogorov-Arnold Networks

Matthias Wolff, Florian Eilers, Xiaoyi Jiang

In this work we propose CVKAN, a complex-valued Kolmogorov-Arnold Network (KAN), to join the intrinsic interpretability of KANs and the advantages of Complex-Valued Neural Networks (CVNNs). We show how to transfer a KAN and the necessary associated mechanisms into the complex domain. To confirm that CVKAN meets expectations, we conduct experiments on symbolic complex-valued function fitting and physically meaningful formulae as well as on a more realistic dataset from knot theory. Our proposed CVKAN is more stable and performs on par or better than real-valued KANs while requiring fewer parameters and a shallower network architecture, making it more explainable.

LG · Oct 20, 2025
Model Metamers Reveal Invariances in Graph Neural Networks

Wei Xu, Xiaoyi Jiang, Lixiang Xu et al.

In recent years, deep neural networks have been extensively employed in perceptual systems to learn representations endowed with invariances, aiming to emulate the invariance mechanisms observed in the human brain. However, studies in the visual and auditory domains have confirmed that significant gaps remain between the invariance properties of artificial neural networks and those of humans. To investigate the invariance behavior within graph neural networks (GNNs), we introduce a model "metamers" generation technique. By optimizing input graphs such that their internal node activations match those of a reference graph, we obtain graphs that are equivalent in the model's representation space, yet differ significantly in both structure and node features. Our theoretical analysis focuses on two aspects: the local metamer dimension for a single node and the activation-induced volume change of the metamer manifold. Utilizing this approach, we uncover extreme levels of representational invariance across several classic GNN architectures. Although targeted modifications to model architecture and training strategies can partially mitigate this excessive invariance, they fail to fundamentally bridge the gap to human-like invariance. Finally, we quantify the deviation between metamer graphs and their original counterparts, revealing unique failure modes of current GNNs and providing a complementary benchmark for model evaluation.

LG · Mar 7, 2025
Robustness of Generalized Median Computation for Consensus Learning in Arbitrary Spaces

Andreas Nienkötter, Sandro Vega-Pons, Xiaoyi Jiang

Robustness in terms of outliers is an important topic and has been formally studied for a variety of problems in machine learning and computer vision. Generalized median computation is a special instance of consensus learning and a common approach to finding prototypes. Related research can be found in numerous problem domains with a broad range of applications. So far, however, robustness of the generalized median has only been studied in a few specific spaces. To our knowledge, there is no robustness characterization in a general setting, i.e. for arbitrary spaces. We address this open issue in our work. A breakdown point of $\geq 0.5$ is proved for the generalized median with metric distance functions in general. We also study the detailed behavior in case of outliers from different perspectives. In addition, we present robustness results for weighted generalized median computation and non-metric distance functions. Given the importance of robustness, our work contributes to closing a gap in the literature. The presented results have general impact and applicability, e.g. providing deeper understanding of generalized median computation and practical guidance to avoid non-robust computation.
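The generalized median minimizes the sum of distances to all given objects. A simple way to see its robustness is the set median, a common restriction in which the candidate must be one of the input objects. The sketch below is an illustration under that restriction, with hypothetical helper names and toy data; it is not the paper's proof or algorithm.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sum_of_distances(candidate: T, objects: Sequence[T], distance: Callable[[T, T], float]) -> float:
    """Objective minimized by the generalized median."""
    return sum(distance(candidate, obj) for obj in objects)


def set_median(objects: Sequence[T], distance: Callable[[T, T], float]) -> T:
    """Input object with the smallest sum of distances to all inputs."""
    return min(objects, key=lambda c: sum_of_distances(c, objects, distance))


def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


# Works for any objects with a distance function: strings here, numbers below.
words = ["cat", "cat", "car", "zzz"]  # "zzz" plays the role of an outlier
values = [1.0, 2.0, 3.0, 1000.0]      # 1000.0 plays the role of an outlier

word_median = set_median(words, hamming)
value_median = set_median(values, lambda a, b: abs(a - b))
value_mean = sum(values) / len(values)
```

With these inputs the median stays among the inliers, whereas the arithmetic mean of `values` is pulled far toward the outlier.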

CV · Jun 27, 2024
Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Lennart Bruns, Lucas Lamparter, Milos Galic et al.

In this paper, we address the problem of estimating the migration direction of cells based on a single image. A solution to this problem lays the foundation for a variety of applications that were previously not possible. To our knowledge, there is only one related work that employs a classification CNN with four classes (quadrants). However, this approach does not allow for detailed directional resolution. We tackle the single image estimation problem using deep circular regression, with a particular focus on cycle-sensitive methods. On two common datasets, we achieve a mean estimation error of $\sim\!17^\circ$, representing a significant improvement over previous work, which reported estimation errors of $30^\circ$ and $34^\circ$, respectively.
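Angles wrap around at $360^\circ$, so an error measure for direction estimates has to treat $350^\circ$ and $10^\circ$ as close. The sketch below shows one standard way to compute such a wrap-around error in degrees; the function name is an assumption, and the paper's specific cycle-sensitive losses are not reproduced.

```python
def circular_error_deg(predicted: float, target: float) -> float:
    """Smallest absolute angular difference between two directions, in degrees."""
    diff = (predicted - target) % 360.0
    return min(diff, 360.0 - diff)


# A naive absolute difference would report 340 degrees for this pair.
print(circular_error_deg(350.0, 10.0))
print(circular_error_deg(10.0, 350.0))
```

Averaging this quantity over a test set gives a mean estimation error of the kind the abstract reports.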

CV · Dec 18, 2021
A Streaming Volumetric Image Generation Framework for Development and Evaluation of Out-of-Core Methods

Dominik Drees, Xiaoyi Jiang

Advances in 3D imaging technology in recent years have allowed for increasingly high resolution volumetric images of large specimens. The resulting datasets of hundreds of gigabytes in size call for new scalable and memory-efficient approaches in the field of image processing, where some progress has been made already. At the same time, quantitative evaluation of these new methods is difficult both in terms of the availability of specific data sizes and in the generation of associated ground truth data. In this paper we present an algorithmic framework that can be used to efficiently generate test (and ground truth) volume data, optionally even in a streaming fashion. As the proposed nested sweeps algorithm is fast, it can be used to generate test data on demand. We analyze the asymptotic run time of the presented algorithm and compare it experimentally to alternative approaches as well as a hypothetical best-case baseline method. In a case study, the framework is applied to the popular VascuSynth software for vascular image generation, making it capable of efficiently producing larger-than-main-memory volumes, which is demonstrated by generating a trillion voxel (1TB) image. Implementations of the presented framework are available online in the form of the modified version of VascuSynth and the code used for the experimental evaluation. In addition, the test data generation procedure has been integrated into the popular volume rendering and processing framework Voreen.

CVMar 17, 2021
Hierarchical Random Walker Segmentation for Large Volumetric Biomedical Images

Dominik Drees, Florian Eilers, Xiaoyi Jiang

The random walker method for image segmentation is a popular tool for semi-automatic image segmentation, especially in the biomedical field. However, its linear asymptotic run time and memory requirements make application to 3D datasets of increasing sizes impractical. We propose a hierarchical framework that, to the best of our knowledge, is the first attempt to overcome these restrictions for the random walker algorithm and achieves sublinear run time and constant memory complexity. The goal of this framework is -- rather than improving the segmentation quality compared to the baseline method -- to make interactive segmentation on out-of-core datasets possible. The method is evaluated quantitatively on synthetic data and the CT-ORG dataset, where the expected improvements in algorithm run time while maintaining high segmentation quality are confirmed. The incremental (i.e., interaction update) run time is demonstrated to be in seconds on a standard PC even for volumes of hundreds of gigabytes in size. In a small case study, the applicability to large real-world data from current biomedical research is demonstrated. An implementation of the presented method is publicly available in version 5.2 of the widely used volume rendering and processing software Voreen (https://www.uni-muenster.de/Voreen/).

CVFeb 5, 2021
Scalable Robust Graph and Feature Extraction for Arbitrary Vessel Networks in Large Volumetric Datasets

Dominik Drees, Aaron Scherzinger, René Hägerling et al.

Recent advances in 3D imaging technologies provide novel insights to researchers and reveal ever finer detail of examined specimens, especially in the biomedical domain, but also impose huge challenges regarding scalability for automated analysis algorithms due to rapidly increasing dataset sizes. In particular, existing research towards automated vessel network analysis does not consider memory requirements of proposed algorithms and often generates a large number of spurious branches for structures consisting of many voxels. Additionally, these algorithms very often have further restrictions, such as a limitation to tree topologies or a reliance on the properties of specific image modalities. We present a scalable pipeline (in terms of computational cost, required main memory and robustness) that extracts an annotated abstract graph representation from the foreground segmentation of vessel networks of arbitrary topology and vessel shape. Only a single, dimensionless, a-priori determinable parameter is required. By careful engineering of individual pipeline stages and a novel iterative refinement scheme we are, for the first time, able to analyze the topology of volumes of roughly 1 TB on commodity hardware. An implementation of the presented pipeline is publicly available in version 5.1 of the volume rendering and processing engine Voreen (https://www.uni-muenster.de/Voreen/).

LGFeb 13, 2020
PHOTONAI -- A Python API for Rapid Machine Learning Model Development

Ramona Leenings, Nils Ralf Winter, Lucas Plagwitz et al.

PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer, and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data-modality-specific algorithms to the community and enhance machine learning in the areas of the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in a few lines of code. Source code is publicly available on GitHub, while examples and documentation can be found at www.photon-ai.com.

NCDec 13, 2019
Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression

Claas Flint, Micah Cearns, Nils Opel et al.

We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of $N = 1,868$ MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset, which yielded an accuracy of $61\,\%$. Next, we mimicked the process by which researchers would draw samples of various sizes ($N = 4$ to $N = 150$) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes ($N = 20$), we observe accuracies of up to $95\,\%$. For medium sample sizes ($N = 100$), accuracies of up to $75\,\%$ were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
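
The protective effect of larger test sets can be made concrete with a small simulation. The sketch below is a deliberate simplification and not the resampling procedure of the study: it assumes a classifier with a fixed true accuracy (the $61\,\%$ figure is borrowed from the abstract purely as an example value) and models only the sampling noise of the test set. All function names are hypothetical.

```python
import random


def observed_accuracies(true_acc, n_test, n_studies, seed=0):
    """Simulate n_studies independent test sets of size n_test.

    Each test sample is classified correctly with probability true_acc,
    so the observed accuracy of a study is a scaled binomial draw.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_studies):
        correct = sum(rng.random() < true_acc for _ in range(n_test))
        accuracies.append(correct / n_test)
    return accuracies


def spread(accuracies):
    """Return the lowest and highest observed accuracy across studies."""
    return min(accuracies), max(accuracies)


if __name__ == "__main__":
    for n_test in (20, 100, 1000):
        low, high = spread(observed_accuracies(0.61, n_test, n_studies=1000))
        print(f"n_test={n_test}: observed accuracy between {low:.2f} and {high:.2f}")
```

Under these assumptions, the range of observed accuracies narrows as the test set grows, even though the underlying classifier never changes. Small test sets can therefore produce individual results far above the true accuracy by chance alone.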

NCNov 24, 2019
Biological sex classification with structural MRI data shows increased misclassification in transgender women

Claas Flint, Katharina Förster, Sophie A. Koser et al.

Transgender individuals (TIs) show brain structural alterations that differ from their biological sex as well as their perceived gender. To substantiate evidence that the brain structure of TIs differs from male and female, we use a combined multivariate and univariate approach. Gray matter segments resulting from voxel-based morphometry preprocessing of $N = 1753$ cisgender (CG) healthy participants were used to train ($N=1402$) and validate (20 % hold-out; $N = 351$) a support-vector machine classifying the biological sex. As a second validation, we classified $N = 1104$ patients with depression. A third validation was performed using the matched CG sample of the transgender women (TWs) application-sample. Subsequently, the classifier was applied to $N = 26$ TWs. Finally, we compared brain volumes of CG-men, women and TW-pre/post treatment (cross-sex hormone treatment) in a univariate analysis controlling for sexual orientation, age and total brain volume. The application of our biological sex classifier to the transgender sample resulted in a significantly lower true positive rate (TPR) (TPR-male = 56.0 %). The TPR did not differ between CG-individuals with (TPR-male = 86.9 %) and without depression (TPR-male = 88.5 %). The univariate analysis of the transgender application-sample revealed that TW-pre/post treatment show brain structural differences from CG-women and CG-men in the putamen and insula, as well as the whole-brain analysis. Our results support the hypothesis that brain structure in TW differs from brain structure of their biological sex (male) as well as their perceived gender (female). This finding substantiates evidence that TIs show specific brain structural alterations leading to a different pattern of brain structure than CG-individuals.