IM Nov 7, 2022
Monte Carlo Techniques for Addressing Large Errors and Missing Data in Simulation-based Inference
Bingjie Wang, Joel Leja, Ashley Villar et al.
Upcoming astronomical surveys will observe billions of galaxies across cosmic time, providing a unique opportunity to map the many pathways of galaxy assembly to an incredibly high resolution. However, the huge amount of data also poses an immediate computational challenge: current tools for inferring parameters from the light of galaxies take $\gtrsim 10$ hours per fit. This is prohibitively expensive. Simulation-based Inference (SBI) is a promising solution. However, it requires simulated data with identical characteristics to the observed data, whereas real astronomical surveys are often highly heterogeneous, with missing observations and variable uncertainties determined by sky and telescope conditions. Here we present a Monte Carlo technique for treating out-of-distribution measurement errors and missing data using standard SBI tools. We show that out-of-distribution measurement errors can be approximated by using standard SBI evaluations, and that missing data can be marginalized over using SBI evaluations over nearby data realizations in the training set. While these techniques slow the inference process from $\sim 1$ sec to $\sim 1.5$ min per object, this is still significantly faster than standard approaches while also dramatically expanding the applicability of SBI. This expanded regime has broad implications for future applications to astronomical surveys.
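The missing-data marginalization described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: `toy_posterior` is a hypothetical stand-in for a trained SBI density estimator, the nearest-neighbor search over observed entries is one plausible reading of "nearby data realizations in the training set", and all names and parameters (`k`, `n_draws`) are invented for the example.

```python
import math
import random

def nearest_neighbors(obs, training_set, k):
    """Return the k training vectors closest to obs, using only observed (non-None) entries."""
    observed = [i for i, v in enumerate(obs) if v is not None]

    def dist(row):
        return math.sqrt(sum((row[i] - obs[i]) ** 2 for i in observed))

    return sorted(training_set, key=dist)[:k]

def marginalize_missing(obs, training_set, sample_posterior, k=5, n_draws=20, rng=None):
    """Pool posterior samples over Monte Carlo fills of the missing entries."""
    rng = rng or random.Random(0)
    neighbors = nearest_neighbors(obs, training_set, k)
    pooled = []
    for _ in range(n_draws):
        donor = rng.choice(neighbors)  # borrow missing values from a nearby training example
        filled = [donor[i] if v is None else v for i, v in enumerate(obs)]
        pooled.extend(sample_posterior(filled, rng))  # one standard SBI evaluation per fill
    return pooled

def toy_posterior(data, rng, n=10):
    """Hypothetical stand-in for an SBI posterior: samples scattered around the data mean."""
    center = sum(data) / len(data)
    return [rng.gauss(center, 0.1) for _ in range(n)]

# Toy training set of three-band "photometry"; the third band of the observation is missing.
training = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [5.0, 5.0, 5.0], [0.9, 1.9, 3.2]]
samples = marginalize_missing([1.0, 2.0, None], training, toy_posterior, k=3, n_draws=20)
```

Each fill costs one ordinary posterior evaluation, which is consistent with the abstract's point that inference becomes slower per object while still relying only on standard SBI tools.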
IV Aug 26, 2024
BreakNet: Discontinuity-Resilient Multi-Scale Transformer Segmentation of Retinal Layers
Razieh Ganjee, Bingjie Wang, Lingyun Wang et al.
Visible light optical coherence tomography (vis-OCT) is gaining traction for retinal imaging due to its high resolution and functional capabilities. However, the significant absorption of hemoglobin in the visible light range leads to pronounced shadow artifacts from retinal blood vessels, posing challenges for accurate layer segmentation. In this study, we present BreakNet, a multi-scale Transformer-based segmentation model designed to address boundary discontinuities caused by these shadow artifacts. BreakNet utilizes hierarchical Transformer and convolutional blocks to extract multi-scale global and local feature maps, capturing essential contextual, textural, and edge characteristics. The model incorporates decoder blocks that expand pathways to enhance the extraction of fine details and semantic information, ensuring precise segmentation. Evaluated on rodent retinal images acquired with prototype vis-OCT, BreakNet demonstrated superior performance over state-of-the-art segmentation models, such as TCCT-BP and U-Net, even when faced with limited-quality ground truth data. Our findings indicate that BreakNet has the potential to significantly improve retinal quantification and analysis.
CV May 12, 2024
Meta-Learned Modality-Weighted Knowledge Distillation for Robust Multi-Modal Learning with Missing Data
Hu Wang, Salma Hassan, Yuyuan Liu et al.
In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Addressing this challenge, we propose a novel approach called Meta-learned Modality-weighted Knowledge Distillation (MetaKD), which enables multi-modal models to maintain high accuracy even when key modalities are missing. MetaKD adaptively estimates the importance weight of each modality through a meta-learning process. These learned importance weights guide a pairwise modality-weighted knowledge distillation process, allowing high-importance modalities to transfer knowledge to lower-importance ones, resulting in robust performance despite missing inputs. Unlike previous methods in the field, which are often task-specific and require significant modifications, our approach is designed to work in multiple tasks (e.g., segmentation and classification) with minimal adaptation. Experimental results on five prevalent datasets, including three Brain Tumor Segmentation datasets (BraTS2018, BraTS2019 and BraTS2020), the Alzheimer's Disease Neuroimaging Initiative (ADNI) classification dataset and the Audiovision-MNIST classification dataset, demonstrate the proposed model is able to outperform the compared models by a large margin. The code is available at https://github.com/billhhh/MetaKD.
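As a rough illustration of the pairwise modality-weighted distillation idea, the sketch below computes a loss in which each higher-importance modality acts as a teacher for every lower-importance one. The abstract does not give the loss form, so the squared-error term, the scaling by the teacher's weight, the function name, and the modality names are all assumptions made for this example. In MetaKD the weights are meta-learned, whereas here they are fixed by hand.

```python
def pairwise_weighted_kd_loss(features, weights):
    """Sum weighted squared feature distances from higher- to lower-importance modalities.

    features: dict mapping modality name -> list of floats (toy feature vector)
    weights:  dict mapping modality name -> importance weight (fixed by hand here)
    """
    loss = 0.0
    names = list(features)
    for teacher in names:
        for student in names:
            # Knowledge flows only from a more important modality to a less important one.
            if weights[teacher] > weights[student]:
                sq = sum((t - s) ** 2 for t, s in zip(features[teacher], features[student]))
                loss += weights[teacher] * sq  # assumed: scale by the teacher's importance
    return loss

# Hypothetical modality names and toy values, for illustration only.
features = {"t1": [1.0, 0.0], "flair": [0.0, 0.0], "t2": [1.0, 1.0]}
weights = {"t1": 0.5, "flair": 0.2, "t2": 0.3}
loss = pairwise_weighted_kd_loss(features, weights)
```

With equal weights no modality outranks another, so this particular formulation returns zero; a real implementation would need to decide how to handle ties.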
CV Jun 16, 2025
Video Individual Counting With Implicit One-to-Many Matching
Xuhui Zhu, Jing Xu, Bingjie Wang et al.
Video Individual Counting (VIC) is a recently introduced task that aims to estimate pedestrian flux from a video. It extends conventional Video Crowd Counting (VCC) beyond the per-frame pedestrian count. In contrast to VCC, which only learns to count repeated pedestrian patterns across frames, the key problem of VIC is how to identify co-existent pedestrians between frames, which turns out to be a correspondence problem. Existing VIC approaches, however, mainly follow a one-to-one (O2O) matching strategy where the same pedestrian must be exactly matched between frames, leading to sensitivity to appearance variations or missing detections. In this work, we show that the O2O matching could be relaxed to a one-to-many (O2M) matching problem, which better fits the problem nature of VIC and can leverage the social grouping behavior of walking pedestrians. We therefore introduce OMAN, a simple but effective VIC model with implicit One-to-Many mAtchiNg, featuring an implicit context generator and a one-to-many pairwise matcher. Experiments on the SenseCrowd and CroHD benchmarks show that OMAN achieves state-of-the-art performance. Code is available at https://github.com/tiny-smart/OMAN.
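To make the relaxed correspondence idea concrete, here is a small Python sketch that uses positions only. It is not OMAN, which relies on learned features and an implicit context generator. It assumes the decomposition commonly used for this task (total count = first-frame count plus per-pair inflow) and treats a pedestrian as co-existent if any previous-frame point lies within a hypothetical `radius`, so several current points may share one previous match.

```python
import math

def inflow_one_to_many(prev_pts, curr_pts, radius):
    """Count current-frame points that have no previous-frame point within radius."""
    inflow = 0
    for c in curr_pts:
        # Relaxed matching: any nearby previous point is enough, and a previous
        # point may vouch for several current points (no exclusive assignment).
        if not any(math.dist(c, p) <= radius for p in prev_pts):
            inflow += 1
    return inflow

def count_individuals(frames, radius):
    """Estimate unique individuals as first-frame count plus inflow over frame pairs."""
    total = len(frames[0])
    for prev, curr in zip(frames, frames[1:]):
        total += inflow_one_to_many(prev, curr, radius)
    return total

# Toy head positions per frame: two pedestrians walk right, a third enters in frame 2.
frames = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.2, 0.0), (1.1, 0.1), (9.0, 9.0)],
    [(0.4, 0.0), (9.1, 9.0)],
]
total = count_individuals(frames, radius=0.5)
```

In this toy sequence only the pedestrian appearing at (9.0, 9.0) counts as inflow, so the estimate is three individuals. A strict one-to-one assignment would instead require each previous point to be used at most once, which is what makes it sensitive to missed detections.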
IV Nov 17, 2024
Freqformer: Frequency-Domain Transformer for 3-D Reconstruction and Quantification of Human Retinal Vasculature
Lingyun Wang, Bingjie Wang, Jay Chhablani et al.
Objective: To achieve accurate 3-D reconstruction and quantitative analysis of human retinal vasculature from a single optical coherence tomography angiography (OCTA) scan. Methods: We introduce Freqformer, a novel Transformer-based model featuring a dual-branch architecture that integrates a Transformer layer for capturing global spatial context with a complex-valued frequency-domain module designed for adaptive frequency enhancement. Freqformer was trained using single depth-plane OCTA images, utilizing volumetrically merged OCTA as the ground truth. Performance was evaluated quantitatively through 2-D and 3-D image quality metrics. 2-D networks and their 3-D counterparts were compared to assess the differences between enhancing a volume slice by slice and enhancing it by 3-D patches. Furthermore, 3-D quantitative vascular metrics were computed to quantify human retinal vasculature. Results: Freqformer substantially outperformed existing convolutional neural networks and Transformer-based methods, achieving superior image metrics. Importantly, the enhanced OCTA volumes show strong correlation with the merged volumes on vascular segment count, density, length, and flow index, further underscoring the model's reliability for quantitative vascular analysis. 3-D counterparts did not yield additional gains in image metrics or downstream 3-D vascular quantification but incurred nearly an order-of-magnitude longer inference time, supporting our 2-D slice-wise enhancement strategy. Additionally, Freqformer showed excellent generalization capability on larger field-of-view scans, surpassing the quality of conventional volumetric merging methods. Conclusion: Freqformer reliably generates high-definition 3-D retinal microvasculature from single-scan OCTA, enabling precise vascular quantification comparable to standard volumetric merging methods.
MED-PH May 15, 2024
Fully Automated OCT-based Tissue Screening System
Shaohua Pi, Razieh Ganjee, Lingyun Wang et al.
This study introduces a groundbreaking optical coherence tomography (OCT) imaging system dedicated to high-throughput screening applications using ex vivo tissue culture. Leveraging OCT's non-invasive, high-resolution capabilities, the system is equipped with a custom-designed motorized platform and tissue detection ability for automated, successive imaging across samples. Transformer-based deep learning segmentation algorithms further ensure robust, consistent, and efficient readouts that meet the standards for screening assays. Validated using retinal explant cultures from a mouse model of retinal degeneration, the system provides robust, rapid, reliable, unbiased, and comprehensive readouts of tissue response to treatments. This fully automated OCT-based system marks a significant advancement in tissue screening, promising to transform drug discovery as well as other relevant research fields.
AI Oct 28, 2025
BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data
Bingsen Qiu, Zijian Liu, Xiao Liu et al.
Building training-ready multi-hop question answering (QA) datasets that truly stress a model's retrieval and reasoning abilities remains highly challenging. While a few recent evaluation datasets capture the characteristics of hard-to-search but easy-to-verify problems -- requiring the integration of ambiguous, indirect, and cross-domain cues -- these data resources remain scarce and are mostly designed for evaluation, making them unsuitable for supervised fine-tuning (SFT) or reinforcement learning (RL). Meanwhile, manually curating non-trivially retrievable questions -- where answers cannot be found through a single direct query but instead require multi-hop reasoning over oblique and loosely connected evidence -- incurs prohibitive human costs and fails to scale, creating a critical data bottleneck for training high-capability retrieval-and-reasoning agents. To address this, we present an automated framework for generating high-difficulty, training-ready multi-hop questions from semi-structured knowledge sources. The system (i) grows diverse, logically labeled evidence clusters through Natural Language Inference (NLI)-based relation typing and diversity-aware expansion; (ii) applies reverse question construction to compose oblique cues so that isolated signals are underinformative but their combination uniquely identifies the target entity; and (iii) enforces quality with a two-step evaluation pipeline that combines multi-model consensus filtering with structured constraint decomposition and evidence-based matching. The result is a scalable process that yields complex, retrieval-resistant yet verifiable questions suitable for SFT/RL training as well as challenging evaluation, substantially reducing human curation effort while preserving the difficulty profile of strong evaluation benchmarks.