CV Mar 30
Fisheye3R: Adapting Unified 3D Feed-Forward Foundation Models to Fisheye Lenses
Ruxiao Duan, Erin Hong, Dongxu Zhao et al.
Feed-forward foundation models for multi-view 3-dimensional (3D) reconstruction have been trained on large-scale datasets of perspective images; when tested on wide field-of-view images, e.g., from a fisheye camera, their performance degrades. Their error arises from changes in spatial positions of pixels due to a non-linear projection model that maps 3D points onto the 2D image plane. While one may surmise that training on fisheye images would resolve this problem, there are far fewer fisheye images with ground truth than perspective images, which limits generalization. To enable inference on imagery exhibiting high radial distortion, we propose Fisheye3R, a novel adaptation framework that extends these multi-view 3D reconstruction foundation models to natively accommodate fisheye inputs without performance regression on perspective images. To address the scarcity of fisheye images and ground truth, we introduce flexible learning schemes that support self-supervised adaptation using only unlabeled perspective images and supervised adaptation without any fisheye training data. Extensive experiments across three foundation models, including VGGT, $\pi^3$, and MapAnything, demonstrate that our approach consistently improves camera pose, depth, point map, and field-of-view estimation on fisheye images.
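The shift in pixel positions mentioned in the abstract can be illustrated with a small sketch. The abstract does not name a specific fisheye model, so the equidistant model (r = f·θ) used here is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math

def project_perspective(point, focal):
    """Pinhole projection: image radius r = f * tan(theta)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def project_equidistant(point, focal):
    """Equidistant fisheye projection: r = f * theta (assumed model)."""
    x, y, z = point
    rho = math.hypot(x, y)          # distance from the optical axis
    if rho == 0.0:
        return (0.0, 0.0)           # point on the axis maps to the center
    theta = math.atan2(rho, z)      # angle between the ray and the axis
    r = focal * theta
    return (r * x / rho, r * y / rho)

# A point 45 degrees off-axis lands at different image positions.
point = (1.0, 0.0, 1.0)
print(project_perspective(point, focal=1.0))
print(project_equidistant(point, focal=1.0))
```

Near the optical axis the two projections almost coincide; the gap grows with the off-axis angle, which is why a model trained only on perspective images sees increasingly unfamiliar pixel layouts toward the image border.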
IV Mar 16, 2025
Fourier-Based 3D Multistage Transformer for Aberration Correction in Multicellular Specimens
Thayer Alshaabi, Daniel E. Milkie, Gaoxiang Liu et al.
High-resolution tissue imaging is often compromised by sample-induced optical aberrations that degrade resolution and contrast. While wavefront sensor-based adaptive optics (AO) can measure these aberrations, such hardware solutions are typically complex, expensive to implement, and slow when serially mapping spatially varying aberrations across large fields of view. Here, we introduce AOViFT (Adaptive Optical Vision Fourier Transformer) -- a machine learning-based aberration sensing framework built around a 3D multistage Vision Transformer that operates on Fourier domain embeddings. AOViFT infers aberrations and restores diffraction-limited performance in puncta-labeled specimens with substantially reduced computational cost, training time, and memory footprint compared to conventional architectures or real-space networks. We validated AOViFT on live gene-edited zebrafish embryos, demonstrating its ability to correct spatially varying aberrations using either a deformable mirror or post-acquisition deconvolution. By eliminating the need for the guide star and wavefront sensing hardware and simplifying the experimental workflow, AOViFT lowers technical barriers for high-resolution volumetric microscopy across diverse biological samples.
LG Jun 3, 2021
Double Descent Optimization Pattern and Aliasing: Caveats of Noisy Labels
Florian Dubost, Erin Hong, Max Pike et al.
Optimization plays a key role in the training of deep neural networks. Deciding when to stop training can have a substantial impact on the performance of the network during inference. Under certain conditions, the generalization error can display a double descent pattern during training: the learning curve is non-monotonic and seemingly diverges before converging again after additional epochs. This optimization pattern can cause early stopping procedures to halt training before the second convergence and consequently select a suboptimal set of parameters for the network, with worse performance during inference. In this work, in addition to confirming that double descent occurs with small datasets and noisy labels as evidenced by others, we show that noisy labels must be present both in the training and generalization sets to observe a double descent pattern. We also show that the learning rate has an influence on double descent, and study how different optimizers and optimizer parameters influence the appearance of double descent. Finally, we show that increasing the learning rate can create an aliasing effect that masks the double descent pattern without suppressing it. We study this phenomenon through extensive experiments on variants of CIFAR-10 and show that the findings translate to a real-world application: the forecast of seizure events in epileptic patients from continuous electroencephalographic recordings.
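The interaction between early stopping and double descent can be sketched with a toy example. The validation errors below are made-up numbers shaped like a double descent curve, not results from the paper, and the patience-based rule is one common early stopping variant assumed here for illustration.

```python
def early_stopping_epoch(val_errors, patience):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once `patience` epochs pass without a new best error.
    """
    best_epoch = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < val_errors[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_errors) - 1

# Hypothetical curve: first descent, temporary rise, second descent.
val_errors = [0.90, 0.60, 0.45, 0.50, 0.58, 0.62, 0.55, 0.44, 0.38, 0.36]

print(early_stopping_epoch(val_errors, patience=2))   # stops inside the bump
print(early_stopping_epoch(val_errors, patience=10))  # reaches the second descent
```

With a short patience, the rule keeps the parameters from the first minimum (error 0.45) and never sees the lower error reached after the second descent (0.36).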
CV Nov 28, 2020
Semi-Supervised Learning for Sparsely-Labeled Sequential Data: Application to Healthcare Video Processing
Florian Dubost, Erin Hong, Nandita Bhaskhar et al.
Labeled data is a critical resource for training and evaluating machine learning models. However, many real-life datasets are only partially labeled. We propose a semi-supervised machine learning training strategy to improve event detection performance on sequential data, such as video recordings, when only sparse labels are available, such as event start times without their corresponding end times. Our method uses noisy guesses of the events' end times to train event detection models. Depending on how conservative these guesses are, mislabeled samples may be introduced into the training set. We further propose a mathematical model for explaining and estimating the evolution of the classification performance for increasingly noisy end time estimates. We show that neural networks can improve their detection performance by leveraging more training data with less conservative approximations despite the higher proportion of incorrect labels. We adapt sequential versions of CIFAR-10 and MNIST, and use the Berkeley MHAD and HMDB51 video datasets to empirically evaluate our method, and find that our risk-tolerant strategy outperforms conservative estimates by 3.5 points of mean average precision for CIFAR, 30 points for MNIST, 3 points for MHAD, and 14 points for HMDB51. Then, we leverage the proposed training strategy to tackle a real-life application: processing continuous video recordings of epilepsy patients, and show that our method outperforms baseline labeling methods by 17 points of average precision, and reaches a classification performance similar to that of fully supervised models. We share part of the code for this article.
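The trade-off between conservative and risk-tolerant end time guesses can be sketched as follows. This is not the authors' code or their mathematical model: the fixed guessed duration, the frame counts, and the function names are all assumptions made for illustration.

```python
def label_frames(num_frames, starts, guessed_duration):
    """Mark frames as positive from each start time for a guessed duration."""
    labels = [0] * num_frames
    for start in starts:
        for t in range(start, min(start + guessed_duration, num_frames)):
            labels[t] = 1
    return labels

def label_stats(labels, true_labels):
    """Return (number of positive labels, number of wrongly positive labels)."""
    positives = sum(labels)
    wrong = sum(1 for a, b in zip(labels, true_labels) if a == 1 and b == 0)
    return positives, wrong

# Hypothetical ground truth: events as (start, end) with end exclusive.
num_frames = 20
true_events = [(2, 6), (12, 15)]
true_labels = [0] * num_frames
for start, end in true_events:
    for t in range(start, end):
        true_labels[t] = 1

# Only the start times are assumed to be known at training time.
starts = [start for start, _ in true_events]

conservative = label_frames(num_frames, starts, guessed_duration=2)
risk_tolerant = label_frames(num_frames, starts, guessed_duration=5)
print(label_stats(conservative, true_labels))
print(label_stats(risk_tolerant, true_labels))
```

The short guess yields few positive samples with no label errors, while the long guess yields more positive samples at the cost of some mislabeled frames, which is the regime the abstract reports as beneficial for neural networks.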