IVJul 1, 2025
Tunable Wavelet Unit based Convolutional Neural Network in Optical Coherence Tomography Analysis Enhancement for Classifying Type of Epiretinal Membrane SurgeryAn Le, Nehal Mehta, William Freeman et al.
In this study, we developed deep learning-based method to classify the type of surgery performed for epiretinal membrane (ERM) removal, either internal limiting membrane (ILM) removal or ERM-alone removal. Our model, based on the ResNet18 convolutional neural network (CNN) architecture, utilizes postoperative optical coherence tomography (OCT) center scans as inputs. We evaluated the model using both original scans and scans preprocessed with energy crop and wavelet denoising, achieving 72% accuracy on preprocessed inputs, outperforming the 66% accuracy achieved on original scans. To further improve accuracy, we integrated tunable wavelet units with two key adaptations: Orthogonal Lattice-based Wavelet Units (OrthLatt-UwU) and Perfect Reconstruction Relaxation-based Wavelet Units (PR-Relax-UwU). These units allowed the model to automatically adjust filter coefficients during training and were incorporated into downsampling, stride-two convolution, and pooling layers, enhancing its ability to distinguish between ERM-ILM removal and ERM-alone removal, with OrthLattUwU boosting accuracy to 76% and PR-Relax-UwU increasing performance to 78%. Performance comparisons showed that our AI model outperformed a trained human grader, who achieved only 50% accuracy in classifying the removal surgery types from postoperative OCT scans. These findings highlight the potential of CNN based models to improve clinical decision-making by providing more accurate and reliable classifications. To the best of our knowledge, this is the first work to employ tunable wavelets for classifying different types of ERM removal surgery.
IVFeb 10, 2025
Universal Vessel Segmentation for Multi-Modality Retinal ImagesBo Wen, Anna Heinke, Akshay Agnihotri et al.
We identify two major limitations in the existing studies on retinal vessel segmentation: (1) Most existing works are restricted to one modality, i.e., the Color Fundus (CF). However, multi-modality retinal images are used every day in the study of the retina and diagnosis of retinal diseases, and the study of vessel segmentation on other modalities is scarce; (2) Even though a few works extended their experiments to new modalities such as the Multi-Color Scanning Laser Ophthalmoscopy (MC), these works still require fine-tuning a separate model for the new modality. The fine-tuning will require extra training data, which is difficult to acquire. In this work, we present a novel universal vessel segmentation model (URVSM) for multi-modality retinal images. In addition to performing the study on a much wider range of image modalities, we also propose a universal model to segment the vessels in all these commonly used modalities. While being much more versatile compared with existing methods, our universal model also demonstrates comparable performance to the state-of-the-art fine-tuned methods. To the best of our knowledge, this is the first work that achieves modality-agnostic retinal vessel segmentation and the first to study retinal vessel segmentation in several novel modalities.
LGSep 5, 2025
Beyond I-Con: Exploring New Dimension of Distance Measures in Representation LearningJasmine Shone, Shaden Alshammari, Mark Hamilton et al.
The Information Contrastive (I-Con) framework revealed that over 23 representation learning methods implicitly minimize KL divergence between data and learned distributions that encode similarities between data points. However, a KL-based loss may be misaligned with the true objective, and properties of KL divergence such as asymmetry and unboundedness may create optimization challenges. We present Beyond I-Con, a framework that enables systematic discovery of novel loss functions by exploring alternative statistical divergences and similarity kernels. Key findings: (1) on unsupervised clustering of DINO-ViT embeddings, we achieve state-of-the-art results by modifying the PMI algorithm to use total variation (TV) distance; (2) on supervised contrastive learning, we outperform the standard approach by using TV and a distance-based similarity kernel instead of KL and an angular kernel; (3) on dimensionality reduction, we achieve superior qualitative results and better performance on downstream tasks than SNE by replacing KL with a bounded f-divergence. Our results highlight the importance of considering divergence and similarity kernel choices in representation learning optimization.
GRJun 18, 2025
Graphics4Science: Computer Graphics for Scientific ImpactsPeter Yichen Chen, Minghao Guo, Hanspeter Pfister et al.
Computer graphics, often associated with films, games, and visual effects, has long been a powerful tool for addressing scientific challenges--from its origins in 3D visualization for medical imaging to its role in modern computational modeling and simulation. This course explores the deep and evolving relationship between computer graphics and science, highlighting past achievements, ongoing contributions, and open questions that remain. We show how core methods, such as geometric reasoning and physical modeling, provide inductive biases that help address challenges in both fields, especially in data-scarce settings. To that end, we aim to reframe graphics as a modeling language for science by bridging vocabulary gaps between the two communities. Designed for both newcomers and experts, Graphics4Science invites the graphics community to engage with science, tackle high-impact problems where graphics expertise can make a difference, and contribute to the future of scientific discovery. Additional details are available on the course website: https://graphics4science.github.io
CVJul 9, 2015
Understanding Intra-Class Knowledge Inside CNNDonglai Wei, Bolei Zhou, Antonio Torrabla et al.
Convolutional Neural Network (CNN) has been successful in image recognition tasks, and recent works shed lights on how CNN separates different classes with the learned inter-class knowledge through visualization. In this work, we instead visualize the intra-class knowledge inside CNN to better understand how an object class is represented in the fully-connected layers. To invert the intra-class knowledge into more interpretable images, we propose a non-parametric patch prior upon previous CNN visualization models. With it, we show how different "styles" of templates for an object class are organized by CNN in terms of location and content, and represented in a hierarchical and ensemble way. Moreover, such intra-class knowledge can be used in many interesting applications, e.g. style-based image retrieval and style-based object completion.