CV | Mar 28, 2024 | Code
ILPO-NET: Network for the invariant recognition of arbitrary volumetric patterns in 3D
Dmitrii Zhemchuzhnikov, Sergei Grudinin
Effective recognition of spatial patterns and learning their hierarchy is crucial in modern spatial data analysis. Volumetric data applications seek techniques ensuring invariance not only to shifts but also to pattern rotations. While traditional methods can readily achieve translational invariance, rotational invariance poses multiple challenges and remains an active area of research. Here, we present ILPO-Net (Invariant to Local Patterns Orientation Network), a novel approach that handles arbitrarily shaped patterns with a convolutional operation that is inherently invariant to local spatial pattern orientations, using Wigner matrix expansions. Our architecture seamlessly integrates the new convolution operator and, when benchmarked on diverse volumetric datasets such as MedMNIST and CATH, demonstrates superior performance over the baselines with significantly reduced parameter counts (up to 1000 times fewer in the case of MedMNIST). Beyond these demonstrations, ILPO-Net's rotational invariance paves the way for other applications across multiple disciplines. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/ILPONet.
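To make the idea of orientation-invariant pattern matching concrete, here is a minimal sketch of a much simpler discrete analogue. It is not the ILPO-Net operator, which works on the continuous rotation group through Wigner matrix expansions. The sketch takes the maximum correlation over the 24 axis-aligned rotations of a cube. The sparse dictionary representation and the names `cube_rotations`, `rotate`, and `invariant_response` are assumptions made for illustration only.

```python
from itertools import permutations, product


def cube_rotations():
    """Return the 24 proper rotations of the cube as (perm, signs) pairs.

    Each pair encodes a signed permutation matrix; only those with
    determinant +1 (rotations, not reflections) are kept.
    """
    rots = []
    for perm in permutations(range(3)):
        inversions = sum(
            perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3)
        )
        for signs in product((1, -1), repeat=3):
            det = (-1) ** inversions * signs[0] * signs[1] * signs[2]
            if det == 1:
                rots.append((perm, signs))
    return rots


def rotate(pattern, rot):
    """Rotate a sparse pattern {(x, y, z): value} about the origin."""
    perm, signs = rot
    return {
        tuple(signs[i] * v[perm[i]] for i in range(3)): w
        for v, w in pattern.items()
    }


def invariant_response(kernel, patch):
    """Maximum correlation between the patch and any rotated kernel copy."""
    best = float("-inf")
    for rot in cube_rotations():
        rotated = rotate(kernel, rot)
        score = sum(w * patch.get(v, 0) for v, w in rotated.items())
        best = max(best, score)
    return best


# Hypothetical toy kernel: four weighted voxels around the origin.
kernel = {(0, 0, 0): 1, (1, 0, 0): 2, (0, 1, 0): -1, (0, 0, 1): 3}
responses = [invariant_response(kernel, rotate(kernel, r)) for r in cube_rotations()]
```

Because the 24 rotations form a group, every rotated copy of the pattern yields the same response. The cost grows with the number of sampled rotations and the invariance holds only for those 24 orientations, which is the limitation an analytical treatment on SO(3) avoids.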
CV | Apr 24, 2024 | Code
On the Fourier analysis in the SO(3) space: EquiLoPO Network
Dmitrii Zhemchuzhnikov, Sergei Grudinin
Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches use either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes the EquiLoPO Network, a novel equivariant neural network architecture that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters. Our key innovations are a group convolutional operation leveraging irreducible representations as the Fourier basis and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we propose a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms the state of the art. This work suggests the benefits of true rotational equivariance on SO(3) and of unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at https://gricad-gitlab.univ-grenoble-alpes.fr/GruLab/ILPO/-/tree/main/EquiLoPO.
GR | Feb 18
CADReasoner: Iterative Program Editing for CAD Reverse Engineering
Soslan Kabisov, Vsevolod Kirichuk, Andrey Volkov et al.
Computer-Aided Design (CAD) powers modern engineering, yet producing high-quality parts still demands substantial expert effort. Many AI systems tackle CAD reverse engineering, but most are single-pass and miss fine geometric details. In contrast, human engineers compare the input shape with the reconstruction and iteratively modify the design based on remaining discrepancies. Agent-based methods mimic this loop with frozen VLMs, but weak 3D grounding of current foundation models limits reliability and efficiency. We introduce CADReasoner, a model trained to iteratively refine its prediction using geometric discrepancy between the input and the predicted shape. The model outputs a runnable CadQuery Python program whose rendered mesh is fed back at the next step. CADReasoner fuses multi-view renders and point clouds as complementary modalities. To bridge the realism gap, we propose a scan-simulation protocol applied during both training and evaluation. Across DeepCAD, Fusion 360, and MCB benchmarks, CADReasoner attains state-of-the-art results on clean and scan-sim tracks.
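The abstract does not say which measure of geometric discrepancy drives the refinement. As one illustration, a symmetric Chamfer distance between two point clouds is a common choice for comparing an input shape with a reconstruction. The sketch below is an assumption for illustration, not the authors' metric, and the function name and toy point clouds are hypothetical.

```python
from math import dist


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    Each cloud is a list of (x, y, z) tuples. For every point, the squared
    distance to its nearest neighbour in the other cloud is averaged, and
    the two directions are summed. Brute force, O(len(a) * len(b)).
    """
    def one_way(src, dst):
        return sum(min(dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical toy clouds: the prediction misplaces one point by 0.5 along z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
discrepancy = chamfer_distance(target, predicted)
```

In an iterative setting, a value like `discrepancy` could be recomputed after each program edit, with refinement stopping once it falls below a chosen tolerance.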
CV | May 28, 2025
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov et al.
Computer-Aided Design (CAD) plays a central role in engineering and manufacturing, making it possible to create precise and editable 3D models. Using a variety of sensor or user-provided data as inputs for CAD reconstruction can democratize access to design applications. However, existing methods typically focus on a single input modality, such as point clouds, images, or text, which limits their generalizability and robustness. Leveraging recent advances in vision-language models (VLM), we propose a multi-modal CAD reconstruction model that simultaneously processes all three input modalities. Inspired by large language model (LLM) training paradigms, we adopt a two-stage pipeline: supervised fine-tuning (SFT) on large-scale procedurally generated data, followed by reinforcement learning (RL) fine-tuning using online feedback obtained programmatically. Furthermore, we are the first to explore RL fine-tuning of LLMs for CAD tasks, demonstrating that online RL algorithms such as Group Relative Policy Optimization (GRPO) outperform offline alternatives. On the DeepCAD benchmark, our SFT model outperforms existing single-modal approaches in all three input modalities simultaneously. More importantly, after RL fine-tuning, cadrille sets a new state of the art on three challenging datasets, including a real-world one.
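The core of GRPO is that each sampled output is scored relative to the other outputs drawn for the same input, so no separate value network is needed. A minimal sketch of that group-relative advantage step follows. The function name, the reward values, and the use of the population standard deviation are assumptions for illustration, not details taken from the cadrille training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of samples for the same prompt.

    Each advantage is the reward minus the group mean, divided by the
    group standard deviation (eps guards against a zero-variance group).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four CAD programs sampled for one input,
# e.g. a programmatic score of how well each reconstruction matches.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive positive advantages and are reinforced, while those below the mean are pushed down.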
QM | Jul 26, 2021
6DCNN with roto-translational convolution filters for volumetric data processing
Dmitrii Zhemchuzhnikov, Ilia Igashov, Sergei Grudinin
In this work, we introduce the 6D Convolutional Neural Network (6DCNN), designed to tackle the problem of detecting relative positions and orientations of local patterns when processing three-dimensional volumetric data. 6DCNN also includes SE(3)-equivariant message-passing and nonlinear activation operations constructed in the Fourier space. Working in the Fourier space allows us to significantly reduce the computational complexity of our operations. We demonstrate the properties of the 6D convolution and its efficiency in the recognition of spatial patterns. We also assess the 6DCNN model on several datasets from the recent CASP protein structure prediction challenges. Here, 6DCNN improves over the baseline architecture and also outperforms the state of the art.