NCSep 27, 2022
Adapting Brain-Like Neural Networks for Modeling Cortical Visual ProsthesesJacob Granley, Alexander Riedel, Michael Beyeler
Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons. Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge. We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system. To investigate the feasibility of adapting brain-like CNNs for modeling visual prostheses, we developed a proof-of-concept model to predict the perceptions resulting from electrical stimulation. We show that a neurologically-inspired decoding of CNN activations produces qualitatively accurate phosphenes, comparable to phosphenes reported by real patients. Overall, this is an essential first step towards building brain-like models of electrical stimulation, which may not just improve the quality of vision provided by cortical prostheses but could also further our understanding of the neural code of vision.
NCJun 16, 2023
Human-in-the-Loop Optimization for Deep Stimulus Encoding in Visual ProsthesesJacob Granley, Tristan Fauvel, Matthew Chalk et al.
Neuroprostheses show potential in restoring lost sensory function and enhancing human capabilities, but the sensations produced by current devices often seem unnatural or distorted. Exact placement of implants and differences in individual perception lead to significant variations in stimulus response, making personalized stimulus optimization a key challenge. Bayesian optimization could be used to optimize patient-specific stimulation parameters with limited noisy observations, but is not feasible for high-dimensional stimuli. Alternatively, deep learning models can optimize stimulus encoding strategies, but typically assume perfect knowledge of patient-specific variations. Here we propose a novel, practically feasible approach that overcomes both of these fundamental limitations. First, a deep encoder network is trained to produce optimal stimuli for any individual patient by inverting a forward model mapping electrical stimuli to visual percepts. Second, a preferential Bayesian optimization strategy utilizes this encoder to optimize patient-specific parameters for a new patient, using a minimal number of pairwise comparisons between candidate stimuli. We demonstrate the viability of this approach on a novel, state-of-the-art visual prosthesis model. We show that our approach quickly learns a personalized stimulus encoder, leads to dramatic improvements in the quality of restored vision, and is robust to noisy patient feedback and misspecifications in the underlying forward model. Overall, our results suggest that combining the strengths of deep learning and Bayesian optimization could significantly improve the perceptual experience of patients fitted with visual prostheses and may prove a viable solution for a range of neuroprosthetic technologies.
CVMar 10, 2022
Deep Learning-Based Perceptual Stimulus Encoder for Bionic VisionLucas Relic, Bowen Zhang, Yi-Lin Tuan et al.
Retinal implants have the potential to treat incurable blindness, yet the quality of the artificial vision they produce is still rudimentary. An outstanding challenge is identifying electrode activation patterns that lead to intelligible visual percepts (phosphenes). Here we propose a PSE based on CNN that is trained in an end-to-end fashion to predict the electrode activation patterns required to produce a desired visual percept. We demonstrate the effectiveness of the encoder on MNIST using a psychophysically validated phosphene model tailored to individual retinal implant users. The present work constitutes an essential first step towards improving the quality of the artificial vision provided by retinal implants.
CVMay 13
SymbolSight: Minimizing Inter-Symbol Interference for Reading with Prosthetic VisionJasmine Lesner, Michael Beyeler
Retinal prostheses restore limited visual perception, but low spatial resolution and temporal persistence make reading difficult. In sequential letter presentation, the afterimage of one symbol can interfere with perception of the next, leading to systematic recognition errors. Rather than relying on future hardware improvements, we investigate whether optimizing the visual symbols themselves can mitigate this temporal interference. We present SymbolSight, a computational framework that selects symbol-to-letter mappings to minimize confusion among frequently adjacent letters. Using simulated prosthetic vision (SPV) and a neural proxy observer, we estimate pairwise symbol confusability and optimize assignments using language-specific bigram statistics. Across simulations in Arabic, Bulgarian, and English, the resulting heterogeneous symbol sets reduced predicted confusion by a median factor of 22 relative to native alphabets. These results suggest that standard typography is poorly matched to serial, low-bandwidth prosthetic vision and demonstrate how computational modeling can narrow the design space of visual encodings, identifying high-potential candidates for future psychophysical and clinical evaluation rather than predicting present-day clinical reading performance directly.
LGMay 26, 2022
Hybrid Neural Autoencoders for Stimulus Encoding in Visual and Other Sensory NeuroprosthesesJacob Granley, Lucas Relic, Michael Beyeler
Sensory neuroprostheses are emerging as a promising technology to restore lost sensory function or augment human capabilities. However, sensations elicited by current devices often appear artificial and distorted. Although current models can predict the neural or perceptual response to an electrical stimulus, an optimal stimulation strategy solves the inverse problem: what is the required stimulus to produce a desired response? Here, we frame this as an end-to-end optimization problem, where a deep neural network stimulus encoder is trained to invert a known and fixed forward model that approximates the underlying biological system. As a proof of concept, we demonstrate the effectiveness of this Hybrid Neural Autoencoder (HNA) in visual neuroprostheses. We find that HNA produces high-fidelity patient-specific stimuli representing handwritten digits and segmented images of everyday objects, and significantly outperforms conventional encoding strategies across all simulated patients. Overall this is an important step towards the long-standing challenge of restoring high-quality vision to people living with incurable blindness and may prove a promising solution for a variety of neuroprosthetic technologies.
NIMay 10
Network-Adaptive Cloud Processing for Visual NeuroprosthesesJiayi Liu, Yilin Wang, Michael Beyeler
Cloud-based machine learning is increasingly explored as a preprocessing strategy for next-generation visual neuroprostheses, where advanced scene understanding may exceed the computational and energy constraints of battery-powered visual processing units. Offloading computation to remote servers enables the use of state-of-the-art vision models, but also introduces sensitivity to network latency, jitter, and packet loss, which can disrupt the temporal consistency of the delivered neural stimulus. In this work, we examine the feasibility of cloud-assisted visual preprocessing for artificial vision by framing remote inference as a perceptually constrained systems problem. We present a network-adaptive cloud-assisted pipeline in which real-time round-trip-time feedback is used to dynamically modulate image resolution, compression, and transmission rate, explicitly prioritizing temporal continuity under adverse network conditions. PIDNet is used as a fixed real-time semantic segmentation backbone, allowing us to isolate how network-adaptive input encoding affects communication delay, inference time, and perceptual fidelity. Results show that adaptive visual encoding substantially reduces end-to-end latency during network congestion, with only modest degradation of global scene structure, while boundary precision degrades more sharply. Together, these findings delineate operating regimes in which cloud-assisted preprocessing may remain viable for future visual neuroprostheses and underscore the importance of network-aware adaptation for maintaining perceptual stability.
HCMar 18
Actionable Guidance Outperforms Map and Compass Cues in Demanding Immersive VR WayfindingApurv Varshney, Lily M. Turkstra, Jiaxin Su et al.
Navigation aids are central to immersive virtual reality (VR) experiences that involve physical locomotion. Their effectiveness depends not only on how much spatial information they provide, but also on how directly that information supports movement decisions. We compared three common guidance techniques for immersive VR wayfinding: a directional arrow, a minimap, and a compass. In a controlled room-scale VR study with 42 participants completing 1008 trials, participants navigated to target landmarks in a time-pressured maze with reduced visibility and forced route replanning. Across behavioral and eye-tracking measures, arrow guidance produced the strongest navigation performance, minimap guidance yielded intermediate performance, and compass cues performed worst, suggesting that during immersive locomotion users benefit from guidance that can be interpreted rapidly while moving. These results suggest that in demanding immersive locomotion tasks, interfaces that translate spatial information directly into actionable movement cues can outperform richer but more interpretive spatial representations. Our findings highlight the importance of designing XR navigation interfaces that minimize the cognitive translation between spatial information and movement decisions.
CVMay 20, 2022
Efficient visual object representation using a biologically plausible spike-latency code and winner-take-all inhibitionMelani Sanchez-Garcia, Tushar Chauhan, Benoit R. Cottereau et al.
Deep neural networks have surpassed human performance in key visual challenges such as object recognition, but require a large amount of energy, computation, and memory. In contrast, spiking neural networks (SNNs) have the potential to improve both the efficiency and biological plausibility of object recognition systems. Here we present a SNN model that uses spike-latency coding and winner-take-all inhibition (WTA-I) to efficiently represent visual stimuli from the Fashion MNIST dataset. Stimuli were preprocessed with center-surround receptive fields and then fed to a layer of spiking neurons whose synaptic weights were updated using spike-timing-dependent-plasticity (STDP). We investigate how the quality of the represented objects changes under different WTA-I schemes and demonstrate that a network of 150 spiking neurons can efficiently represent objects with as little as 40 spikes. Studying how core object recognition may be implemented using biologically plausible learning rules in SNNs may not only further our understanding of the brain, but also lead to novel and efficient artificial vision systems.
LGJan 31, 2025
Evaluating Deep Human-in-the-Loop Optimization for Retinal Implants Using Sighted ParticipantsEirini Schoinas, Adyah Rastogi, Anissa Carter et al.
Human-in-the-loop optimization (HILO) is a promising approach for personalizing visual prostheses by iteratively refining stimulus parameters based on user feedback. Previous work demonstrated HILO's efficacy in simulation, but its performance with human participants remains untested. Here we evaluate HILO using sighted participants viewing simulated prosthetic vision to assess its ability to optimize stimulation strategies under realistic conditions. Participants selected between phosphenes generated by competing encoders to iteratively refine a deep stimulus encoder (DSE). We tested HILO in three conditions: standard optimization, threshold misspecifications, and out-of-distribution parameter sampling. Participants consistently preferred HILO-generated stimuli over both a naive encoder and the DSE alone, with log odds favoring HILO across all conditions. We also observed key differences between human and simulated decision-making, highlighting the importance of validating optimization strategies with human participants. These findings support HILO as a viable approach for adapting visual prostheses to individuals. Clinical relevance: Validating HILO with sighted participants viewing simulated prosthetic vision is an important step toward personalized calibration of future visual prostheses.
SEDec 5, 2025
Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulationMara Downing, Matthew Peng, Jacob Granley et al.
Objective: Machine learning (ML) models are increasingly used to generate electrical stimulation patterns in neuroprosthetic devices such as visual prostheses. While these models promise precise and personalized control, they also introduce new safety risks when model outputs are delivered directly to neural tissue. We propose a systematic, quantitative approach to detect and characterize unsafe stimulation patterns in ML-driven neurostimulation systems. Approach: We adapt an automated software testing technique known as coverage-guided fuzzing to the domain of neural stimulation. Here, fuzzing performs stress testing by perturbing model inputs and tracking whether resulting stimulation violates biophysical limits on charge density, instantaneous current, or electrode co-activation. The framework treats encoders as black boxes and steers exploration with coverage metrics that quantify how broadly test cases span the space of possible outputs and violation types. Main results: Applied to deep stimulus encoders for the retina and cortex, the method systematically reveals diverse stimulation regimes that exceed established safety limits. Two violation-output coverage metrics identify the highest number and diversity of unsafe outputs, enabling interpretable comparisons across architectures and training strategies. Significance: Violation-focused fuzzing reframes safety assessment as an empirical, reproducible process. By transforming safety from a training heuristic into a measurable property of the deployed model, it establishes a foundation for evidence-based benchmarking, regulatory readiness, and ethical assurance in next-generation neural interfaces.
LGMay 29, 2025
BIRD: Behavior Induction via Representation-structure DistillationGalen Pogoncheff, Michael Beyeler
Human-aligned deep learning models exhibit behaviors consistent with human values, such as robustness, fairness, and honesty. Transferring these behavioral properties to models trained on different tasks or data distributions remains challenging: aligned behavior is easily forgotten during fine-tuning, and collecting task-specific data that preserves this behavior can be prohibitively costly. We introduce BIRD (Behavior Induction via Representation-structure Distillation), a flexible framework for transferring aligned behavior by matching the internal representation structure of a student model to that of a teacher. Applied to out-of-distribution robustness in image classification, BIRD outperforms fine-tuning, transfer learning, and continual learning methods, improving robust accuracy by up to 16% over the next strongest baseline. It remains effective even when the teacher is trained on a much simpler dataset and is $25 \times$ smaller than the student. In a large-scale study of over 400 teacher-student pairs, we show that three interpretable and computable properties of the teacher's representations (i.e., task relevance, behavioral relevance, and complementary knowledge) explain up to 85% of the variance in transfer success. These insights offer practical guidance for teacher selection and design. BIRD turns small, well-aligned models into scalable alignment seeds, removing a key bottleneck in deploying safe AI systems in the wild.
NCMay 18, 2023
Explaining V1 Properties with a Biologically Constrained Deep Learning ArchitectureGalen Pogoncheff, Jacob Granley, Michael Beyeler
Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence.
HCSep 10, 2021
A Systematic Review of Extended Reality (XR) for Understanding and Augmenting Vision LossJustin Kasowski, Byron A. Johnson, Ryan Neydavood et al.
Over the past decade, extended reality (XR) has emerged as an assistive technology not only to augment residual vision of people losing their sight but also to study the rudimentary vision restored to blind people by a visual neuroprosthesis. To make the best use of these emerging technologies, it is valuable and timely to understand the state of this research and identify any shortcomings that are present. Here we present a systematic literature review of 227 publications from 106 different venues assessing the potential of XR technology to further visual accessibility. In contrast to other reviews, we sample studies from multiple scientific disciplines, focus on augmentation of a person's residual vision, and require studies to feature a quantitative evaluation with appropriate end users. We summarize prominent findings from different XR research areas, show how the landscape has changed over the last decade, and identify scientific gaps in the literature. Specifically, we highlight the need for real-world validation, the broadening of end-user participation, and a more nuanced understanding of the suitability and usability of different XR-based accessibility aids. By broadening end-user participation to early stages of the design process and shifting the focus from behavioral performance to qualitative assessments of usability, future research has the potential to develop XR technologies that may not only allow for studying vision loss, but also enable novel visual accessibility aids with the potential to impact the lives of millions of people living with vision loss.
IVJul 9, 2021
U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated RetinaShuyun Tang, Ziming Qi, Jacob Granley et al.
Fundus photography has routinely been used to document the presence and severity of retinal degenerative diseases such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy (DR) in clinical practice, for which the fovea and optic disc (OD) are important retinal landmarks. However, the occurrence of lesions, drusen, and other retinal abnormalities during retinal degeneration severely complicates automatic landmark detection and segmentation. Here we propose HBA-U-Net: a U-Net backbone enriched with hierarchical bottleneck attention. The network consists of a novel bottleneck attention block that combines and refines self-attention, channel attention, and relative-position attention to highlight retinal abnormalities that may be important for fovea and OD segmentation in the degenerated retina. HBA-U-Net achieved state-of-the-art results on fovea detection across datasets and eye conditions (ADAM: Euclidean Distance (ED) of 25.4 pixels, REFUGE: 32.5 pixels, IDRiD: 32.1 pixels), on OD segmentation for AMD (ADAM: Dice Coefficient (DC) of 0.947), and on OD detection for DR (IDRiD: ED of 20.5 pixels). Our results suggest that HBA-U-Net may be well suited for landmark detection in the presence of a variety of retinal degenerative diseases.
HCFeb 21, 2021
Towards Immersive Virtual Reality Simulations of Bionic VisionJustin Kasowski, Nathan Wu, Michael Beyeler
Bionic vision is a rapidly advancing field aimed at developing visual neuroprostheses ('bionic eyes') to restore useful vision to people who are blind. However, a major outstanding challenge is predicting what people 'see' when they use their devices. The limited field of view of current devices necessitates head movements to scan the scene, which is difficult to simulate on a computer screen. In addition, many computational models of bionic vision lack biological realism. To address these challenges, we propose to embed biologically realistic models of simulated prosthetic vision (SPV) in immersive virtual reality (VR) so that sighted subjects can act as 'virtual patients' in real-world tasks.
CVJan 30, 2021
Deep Learning--Based Scene Simplification for Bionic VisionNicole Han, Sudhanshu Srivastava, Aiwen Xu et al.
Retinal degenerative diseases cause profound visual impairment in more than 10 million people worldwide, and retinal prostheses are being developed to restore vision to these individuals. Analogous to cochlear implants, these devices electrically stimulate surviving retinal cells to evoke visual percepts (phosphenes). However, the quality of current prosthetic vision is still rudimentary. Rather than aiming to restore "natural" vision, there is potential merit in borrowing state-of-the-art computer vision algorithms as image processing techniques to maximize the usefulness of prosthetic vision. Here we combine deep learning--based scene simplification strategies with a psychophysically validated computational model of the retina to generate realistic predictions of simulated prosthetic vision, and measure their ability to support scene understanding of sighted subjects (virtual patients) in a variety of outdoor scenarios. We show that object segmentation may better support scene understanding than models based on visual saliency and monocular depth estimation. In addition, we highlight the importance of basing theoretical predictions on biologically realistic models of phosphene shape. Overall, this work has the potential to drastically improve the utility of prosthetic vision for people blinded from retinal degenerative diseases.