CV | Mar 4, 2022 | Code
UVCGAN: UNet Vision Transformer cycle-consistent GAN for unpaired image-to-image translation
Dmitrii Torbunov, Yi Huang, Haiwang Yu et al.
Unpaired image-to-image translation has broad applications in art, design, and scientific simulations. One early breakthrough was CycleGAN that emphasizes one-to-one mappings between two unpaired image domains via generative-adversarial networks (GAN) coupled with the cycle-consistency constraint, while more recent works promote one-to-many mapping to boost diversity of the translated images. Motivated by scientific simulation and one-to-one needs, this work revisits the classic CycleGAN framework and boosts its performance to outperform more contemporary models without relaxing the cycle-consistency constraint. To achieve this, we equip the generator with a Vision Transformer (ViT) and employ necessary training and regularization techniques. Compared to previous best-performing models, our model performs better and retains a strong correlation between the original and translated image. An accompanying ablation study shows that both the gradient penalty and self-supervised pre-training are crucial to the improvement. To promote reproducibility and open science, the source code, hyperparameter configurations, and pre-trained model are available at https://github.com/LS4GAN/uvcgan.
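The cycle-consistency constraint mentioned in this abstract can be illustrated with a minimal sketch. It assumes toy affine functions stand in for the two generators and that images are flat lists of floats; all names here (`toy_gen_ab`, `toy_gen_ba`, `toy_identity`, `cycle_consistency_loss`) are hypothetical and are not taken from the UVCGAN code base.

```python
from typing import Callable, List

Image = List[float]  # a flattened toy "image" (assumption for illustration)
Generator = Callable[[Image], Image]


def mean_abs_error(a: Image, b: Image) -> float:
    """Mean absolute (L1) difference between two equally sized images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    image_a: Image, image_b: Image, gen_ab: Generator, gen_ba: Generator
) -> float:
    """L1 cycle loss: translate A->B->A and B->A->B, then compare to the inputs."""
    reconstructed_a = gen_ba(gen_ab(image_a))
    reconstructed_b = gen_ab(gen_ba(image_b))
    return mean_abs_error(reconstructed_a, image_a) + mean_abs_error(
        reconstructed_b, image_b
    )


# Hypothetical toy generators: gen_ba is the exact inverse of gen_ab.
def toy_gen_ab(x: Image) -> Image:
    return [2.0 * v + 1.0 for v in x]


def toy_gen_ba(y: Image) -> Image:
    return [(v - 1.0) / 2.0 for v in y]


# A deliberately wrong "inverse" that leaves its input unchanged.
def toy_identity(y: Image) -> Image:
    return list(y)


image_a = [0.0, 0.25, 0.5, 1.0]
image_b = [1.0, 1.5, 2.0, 3.0]

loss_inverse = cycle_consistency_loss(image_a, image_b, toy_gen_ab, toy_gen_ba)
loss_mismatch = cycle_consistency_loss(image_a, image_b, toy_gen_ab, toy_identity)
```

When the two mappings invert each other, the loss vanishes; when they do not, the loss is positive, which is the signal a cycle-consistent GAN uses to push its generators toward a one-to-one mapping.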
CV | Mar 28, 2023 | Code
UVCGAN v2: An Improved Cycle-Consistent GAN for Unpaired Image-to-Image Translation
Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng et al.
An unpaired image-to-image (I2I) translation technique seeks to find a mapping between two domains of data in a fully unsupervised manner. While initial solutions to the I2I problem were provided by generative adversarial neural networks (GANs), diffusion models (DMs) currently hold the state-of-the-art status on the I2I translation benchmarks in terms of Fréchet inception distance (FID). Yet, DMs suffer from limitations, such as not using data from the source domain during training or maintaining consistency between the source and translated images only via simple pixel-wise errors. This work improves the recent UVCGAN model and equips it with modern advancements in model architectures and training procedures. The resulting revised model significantly outperforms other advanced GAN- and DM-based competitors on a variety of benchmarks. In the case of Male-to-Female translation of CelebA, the model achieves more than 40% improvement in FID score compared to the state-of-the-art results. This work also demonstrates the ineffectiveness of the pixel-wise I2I translation faithfulness metrics and suggests their revision. The code and trained models are available at https://github.com/LS4GAN/uvcgan2.
HEP-EX | Apr 25, 2023
Unpaired Image Translation to Mitigate Domain Shift in Liquid Argon Time Projection Chamber Detector Responses
Yi Huang, Dmitrii Torbunov, Brett Viren et al.
Deep learning algorithms often are trained and deployed on different datasets. Any systematic difference between the training and test datasets may degrade the algorithm's performance, which is known as the domain shift problem. This issue is prevalent in many scientific domains where algorithms are trained on simulated data but applied to real-world datasets. Typically, the domain shift problem is solved through various domain adaptation methods. However, these methods are often tailored for a specific downstream task and may not easily generalize to different tasks. This work explores the feasibility of using an alternative way to solve the domain shift problem that is not specific to any downstream algorithm. The proposed approach relies on modern unpaired image-to-image (UI2I) translation techniques, designed to find translations between different image domains in a fully unsupervised fashion. In this study, the approach is applied to a domain shift problem commonly encountered in Liquid Argon Time Projection Chamber (LArTPC) detector research when seeking a way to translate samples between two differently distributed detector datasets deterministically. This translation allows for mapping real-world data into the simulated data domain where the downstream algorithms can be run with much less domain-shift-related degradation. Conversely, using the translation from the simulated data in a real-world domain can increase the realism of the simulated dataset and reduce the magnitude of any systematic uncertainties. We adapted several UI2I translation algorithms to work on scientific data and demonstrated the viability of these techniques for solving the domain shift problem with LArTPC detector data. To facilitate further development of domain adaptation techniques for scientific datasets, the "Simple Liquid-Argon Track Samples" dataset used in this study also is published.
CV | Dec 3, 2024 | Code
EvRT-DETR: Latent Space Adaptation of Image Detectors for Event-based Vision
Dmitrii Torbunov, Yihui Ren, Animesh Ghose et al.
Event-based cameras (EBCs) have emerged as a bio-inspired alternative to traditional cameras, offering advantages in power efficiency, temporal resolution, and high dynamic range. However, the development of image analysis methods for EBCs is challenging due to the sparse and asynchronous nature of the data. This work addresses the problem of object detection for EBCs. The current approaches to EBC object detection focus on constructing complex data representations and rely on specialized architectures. We introduce I2EvDet (Image-to-Event Detection), a novel adaptation framework that bridges mainstream object detection with temporal event data processing. First, we demonstrate that a Real-Time DEtection TRansformer, or RT-DETR, a state-of-the-art natural image detector, trained on a simple image-like representation of the EBC data achieves performance comparable to specialized EBC methods. Next, as part of our framework, we develop an efficient adaptation technique that transforms image-based detectors into event-based detection models by modifying their frozen latent representation space through minimal architectural additions. The resulting EvRT-DETR model reaches state-of-the-art performance on the standard benchmark datasets Gen1 (mAP +2.3) and 1Mpx/Gen4 (mAP +1.4). These results demonstrate a fundamentally new approach to EBC object detection through principled adaptation of mainstream architectures, offering an efficient alternative with potential applications to other temporal visual domains. The code is available at: https://github.com/realtime-intelligence/evrt-detr
LG | Jan 23
Parameter Inference and Uncertainty Quantification with Diffusion Models: Extending CDI to 2D Spatial Conditioning
Dmitrii Torbunov, Yihui Ren, Lijun Wu et al.
Uncertainty quantification is critical in scientific inverse problems to distinguish identifiable parameters from those that remain ambiguous given available measurements. The Conditional Diffusion Model-based Inverse Problem Solver (CDI) has previously demonstrated effective probabilistic inference for one-dimensional temporal signals, but its applicability to higher-dimensional spatial data remains unexplored. We extend CDI to two-dimensional spatial conditioning, enabling probabilistic parameter inference directly from spatial observations. We validate this extension on convergent beam electron diffraction (CBED) parameter inference - a challenging multi-parameter inverse problem in materials characterization where sample geometry, electronic structure, and thermal properties must be extracted from 2D diffraction patterns. Using simulated CBED data with ground-truth parameters, we demonstrate that CDI produces well-calibrated posterior distributions that accurately reflect measurement constraints: tight distributions for well-determined quantities and appropriately broad distributions for ambiguous parameters. In contrast, standard regression methods - while appearing accurate on aggregate metrics - mask this underlying uncertainty by predicting training set means for poorly constrained parameters. Our results confirm that CDI successfully extends from temporal to spatial domains, providing the genuine uncertainty information required for robust scientific inference.
CV | Sep 26, 2025 | Code
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
Arman Akbari, Jian Gao, Yifei Zou et al.
Engineering design operates through hierarchical abstraction from system specifications to component implementations, requiring visual understanding coupled with mathematical reasoning at each level. While Multi-modal Large Language Models (MLLMs) excel at natural image tasks, their ability to extract mathematical models from technical diagrams remains unexplored. We present CircuitSense, a comprehensive benchmark evaluating circuit understanding across this hierarchy through 8,006+ problems spanning component-level schematics to system-level block diagrams. Our benchmark uniquely examines the complete engineering workflow: Perception, Analysis, and Design, with a particular emphasis on the critical but underexplored capability of deriving symbolic equations from visual inputs. We introduce a hierarchical synthetic generation pipeline consisting of a grid-based schematic generator and a block diagram generator with auto-derived symbolic equation labels. Comprehensive evaluation of six state-of-the-art MLLMs, including both closed-source and open-source models, reveals fundamental limitations in visual-to-mathematical reasoning. Closed-source models achieve over 85% accuracy on perception tasks involving component recognition and topology identification, yet their performance on symbolic derivation and analytical reasoning falls below 19%, exposing a critical gap between visual parsing and symbolic reasoning. Models with stronger symbolic reasoning capabilities consistently achieve higher design task accuracy, confirming the fundamental role of mathematical understanding in circuit synthesis and establishing symbolic reasoning as the key metric for engineering competence.
CV | Dec 4, 2025
IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction
Dmitrii Torbunov, Onur Okuducu, Yi Huang et al.
Continuous video monitoring in surveillance, robotics, and wearable systems faces a fundamental power constraint: conventional RGB cameras consume substantial energy through fixed-rate capture. Event cameras offer sparse, motion-driven sensing with low power consumption, but produce asynchronous event streams rather than RGB video. We propose a hybrid capture paradigm that records sparse RGB keyframes alongside continuous event streams, then reconstructs full RGB video offline -- reducing capture power consumption while maintaining standard video output for downstream applications. We introduce the Image and Event to Video (IE2Video) task: reconstructing RGB video sequences from a single initial frame and subsequent event camera data. We investigate two architectural strategies: adapting an autoregressive model (HyperE2VID) for RGB generation, and injecting event representations into a pretrained text-to-video diffusion model (LTX) via learned encoders and low-rank adaptation. Our experiments demonstrate that the diffusion-based approach achieves 33% better perceptual quality than the autoregressive baseline (0.283 vs 0.422 LPIPS). We validate our approach across three event camera datasets (BS-ERGB, HS-ERGB far/close) at varying sequence lengths (32-128 frames), demonstrating robust cross-dataset generalization with strong performance on unseen capture configurations.
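The 33% figure quoted in this abstract is consistent with a relative reduction in LPIPS, where lower is better. The short check below reproduces that arithmetic from the two reported scores; the helper name `relative_improvement` is an assumption introduced only for illustration.

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Relative reduction of a lower-is-better metric, as a fraction of the baseline."""
    return (baseline - candidate) / baseline


# LPIPS scores as reported in the abstract (lower is better).
lpips_autoregressive = 0.422
lpips_diffusion = 0.283

improvement = relative_improvement(lpips_autoregressive, lpips_diffusion)
improvement_percent = round(100 * improvement)  # (0.422 - 0.283) / 0.422 is about 0.329
```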
AI | Feb 2
AutoSizer: Automatic Sizing of Analog and Mixed-Signal Circuits via Large Language Model (LLM) Agents
Xi Yu, Dmitrii Torbunov, Soumyajit Mandal et al.
The design of Analog and Mixed-Signal (AMS) integrated circuits remains heavily reliant on expert knowledge, with transistor sizing a major bottleneck due to nonlinear behavior, high-dimensional design spaces, and strict performance constraints. Existing Electronic Design Automation (EDA) methods typically frame sizing as static black-box optimization, resulting in inefficient and less robust solutions. Although Large Language Models (LLMs) exhibit strong reasoning abilities, they are not suited for precise numerical optimization in AMS sizing. To address this gap, we propose AutoSizer, a reflective LLM-driven meta-optimization framework that unifies circuit understanding, adaptive search-space construction, and optimization orchestration in a closed loop. It employs a two-loop optimization framework, with an inner loop for circuit sizing and an outer loop that analyzes optimization dynamics and constraints to iteratively refine the search space from simulation feedback. We further introduce AMS-SizingBench, an open benchmark comprising 24 diverse AMS circuits in SKY130 CMOS technology, designed to evaluate adaptive optimization policies under realistic simulator-based constraints. AutoSizer experimentally achieves higher solution quality, faster convergence, and higher success rate across varying circuit difficulties, outperforming both traditional optimization methods and existing LLM-based agents.
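The two-loop structure described in this abstract can be sketched generically. The example below is not AutoSizer: the outer loop here is a simple shrink-around-the-best heuristic rather than an LLM agent, the inner loop is plain random search, and `toy_cost` is a hypothetical stand-in for a circuit simulator. All names and parameter values are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Bounds = List[Tuple[float, float]]
Result = Tuple[float, Tuple[float, ...]]  # (cost, design point)


def toy_cost(width: float, length: float) -> float:
    """Hypothetical stand-in for a simulator; minimum at width=3.0, length=0.5."""
    return (width - 3.0) ** 2 + (length - 0.5) ** 2


def inner_loop(bounds: Bounds, n_samples: int, rng: random.Random) -> Result:
    """Inner loop: random search for the lowest-cost point inside the bounds."""
    best = None
    for _ in range(n_samples):
        point = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        cost = toy_cost(*point)
        if best is None or cost < best[0]:
            best = (cost, point)
    return best


def refine_bounds(bounds: Bounds, center: Tuple[float, ...], shrink: float = 0.5) -> Bounds:
    """Outer-loop step: shrink each range around the best point, clipped to the old range."""
    refined = []
    for (lo, hi), c in zip(bounds, center):
        half_width = (hi - lo) * shrink / 2.0
        refined.append((max(lo, c - half_width), min(hi, c + half_width)))
    return refined


def two_loop_search(bounds: Bounds, rounds: int = 5, n_samples: int = 50, seed: int = 0):
    """Alternate inner search and search-space refinement; track the best cost so far."""
    rng = random.Random(seed)
    best = None
    history = []
    for _ in range(rounds):
        candidate = inner_loop(bounds, n_samples, rng)
        if best is None or candidate[0] < best[0]:
            best = candidate
        history.append(best[0])
        bounds = refine_bounds(bounds, best[1])
    return best, history


initial_bounds = [(0.0, 10.0), (0.0, 2.0)]
best_result, cost_history = two_loop_search(initial_bounds)
```

The design choice worth noting is the separation of concerns: the inner loop only optimizes within a fixed space, while the outer loop only decides what that space should be, so either part can be swapped out independently.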
AI | Nov 15, 2024
Diffusion Model-based Parameter Estimation in Dynamic Power Systems
Feiqin Zhu, Dmitrii Torbunov, Zhongjing Jiang et al.
Parameter estimation, which represents a classical inverse problem, is often ill-posed as different parameter combinations can yield identical outputs. This non-uniqueness poses a critical barrier to accurate and unique identification. This work introduces a novel parameter estimation framework to address such limits: the Joint Conditional Diffusion Model-based Inverse Problem Solver (JCDI). By leveraging the stochasticity of diffusion models, JCDI produces possible solutions revealing underlying distributions. Joint conditioning on multiple observations further narrows the posterior distributions of non-identifiable parameters. For a challenging task in dynamic power systems, composite load model parameterization, JCDI achieves a 58.6% reduction in parameter estimation error compared to the single-condition model. It also accurately replicates the system's dynamic responses under various electrical faults, with root mean square errors below 4 × 10^-3, outperforming existing deep-reinforcement-learning and supervised learning approaches. Given its data-driven nature, JCDI provides a universal framework for parameter estimation while effectively mitigating the non-uniqueness challenge across scientific domains.