CVSep 21, 2025
Towards Generalized Synapse Detection Across Invertebrate SpeciesSamia Mohinta, Daniel Franco-Barranco, Shi Yan Lee et al.
Behavioural differences across organisms, whether healthy or pathological, are closely tied to the structure of their neural circuits. Yet, the fine-scale synaptic changes that give rise to these variations remain poorly understood, in part due to persistent challenges in detecting synapses reliably and at scale. Volume electron microscopy (EM) offers the resolution required to capture synaptic architecture, but automated detection remains difficult due to sparse annotations, morphological variability, and cross-dataset domain shifts. To address this, we make three key contributions. First, we curate a diverse EM benchmark spanning four datasets across two invertebrate species: adult and larval Drosophila melanogaster, and Megaphragma viggianii (micro-WASP). Second, we propose SimpSyn, a single-stage Residual U-Net trained to predict dual-channel spherical masks around pre- and post-synaptic sites, designed to prioritize training and inference speeds and annotation efficiency over architectural complexity. Third, we benchmark SimpSyn against Buhmann et al.'s Synful [1], a state-of-the-art multi-task model that jointly infers synaptic pairs. Despite its simplicity, SimpSyn consistently outperforms Synful in F1-score across all volumes for synaptic site detection. While generalization across datasets remains limited, SimpSyn achieves competitive performance when trained on the combined cohort. Finally, ablations reveal that simple post-processing strategies - such as local peak detection and distance-based filtering - yield strong performance without complex test-time heuristics. Taken together, our results suggest that lightweight models, when aligned with task structure, offer a practical and scalable solution for synapse detection in large-scale connectomic pipelines.
LGMay 20, 2020
Distance-based Positive and Unlabeled Learning for RankingHayden S. Helm, Amitabh Basu, Avanti Athreya et al.
Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate analytically, in simulation, and in real data examples that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is quite general, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic.
MLMay 9, 2017
Semiparametric spectral modeling of the Drosophila connectomeCarey E. Priebe, Youngser Park, Minh Tang et al.
We present semiparametric spectral modeling of the complete larval Drosophila mushroom body connectome. Motivated by a thorough exploratory data analysis of the network via Gaussian mixture modeling (GMM) in the adjacency spectral embedding (ASE) representation space, we introduce the latent structure model (LSM) for network modeling and inference. LSM is a generalization of the stochastic block model (SBM) and a special case of the random dot product graph (RDPG) latent position model, and is amenable to semiparametric GMM in the ASE representation space. The resulting connectome code derived via semiparametric GMM composed with ASE captures latent connectome structure and elucidates biologically relevant neuronal properties.
CVMar 8, 2015
TED: A Tolerant Edit Distance for Segmentation EvaluationJan Funke, Francesc Moreno-Noguer, Albert Cardona et al.
In this paper, we present a novel error measure to compare a segmentation against ground truth. This measure, which we call Tolerant Edit Distance (TED), is motivated by two observations: (1) Some errors, like small boundary shifts, are tolerable in practice. Which errors are tolerable is application dependent and should be a parameter of the measure. (2) Non-tolerable errors have to be corrected manually. The time needed to do so should be reflected by the error measure. Using integer linear programming, the TED finds the minimal weighted sum of split and merge errors exceeding a given tolerance criterion, and thus provides a time-to-fix estimate. In contrast to commonly used measures like Rand index or variation of information, the TED (1) does not count small, but tolerable, differences, (2) provides intuitive numbers, (3) gives a time-to-fix estimate, and (4) can localize and classify the type of errors. By supporting both isotropic and anisotropic volumes and having a flexible tolerance criterion, the TED can be adapted to different requirements. On example segmentations for 3D neuron segmentation, we demonstrate that the TED is capable of counting topological errors, while ignoring small boundary shifts.