4 Papers

59.2 CV Mar 25
A-SelecT: Automatic Timestep Selection for Diffusion Transformer Representation Learning

Changyu Liu, James Chenhao Liang, Wenhao Yang et al.

Diffusion models have significantly reshaped the field of generative artificial intelligence and are now increasingly explored for their capacity in discriminative representation learning. Diffusion Transformer (DiT) has recently gained attention as an alternative to conventional U-Net-based diffusion models, offering a promising avenue for downstream discriminative tasks via generative pre-training. However, its training efficiency and representational capacity remain largely constrained by inadequate timestep searching and insufficient exploitation of DiT-specific feature representations. To address this, we introduce Automatically Selected Timestep (A-SelecT), which dynamically pinpoints DiT's most information-rich timestep from the selected transformer feature in a single run, eliminating the need for both computationally intensive exhaustive timestep searching and suboptimal discriminative feature selection. Extensive experiments on classification and segmentation benchmarks demonstrate that DiT, empowered by A-SelecT, surpasses all prior diffusion-based approaches efficiently and effectively.

33.6 RO May 21
Non-Contact Vibration-Based Damage Detection of Civil Structures Using a Cost-Effective Autonomous UAV

Javier Becerril, Maximiliano Vargas, Jennifer Herrera et al.

This paper presents a non-contact approach for vibration-based structural damage detection using an autonomous, custom-built, cost-effective unmanned aerial vehicle (UAV). Vibration signals are extracted from video recordings through vision-based motion tracking to identify shifts in natural frequencies indicative of structural degradation. A laboratory-scale frame structure is evaluated under healthy and simulated-damage conditions. The proposed system is validated through an experimental study involving two smartphones, a USB camera, and a custom-built low-cost UAV equipped with an onboard camera and an autonomous alignment system for operation in GPS-denied environments. The displacement time history is extracted, analyzed in the frequency domain, and compared to reference measurements from contact accelerometers and a finite element model. Experimental results show that all platforms successfully capture the fundamental frequency and its shift due to damage. Although the UAV exhibits slightly higher errors (up to 5.7%) due to platform-induced disturbances and sensing limitations, it reliably detects damage-induced frequency changes. Compared to commercial UAV systems, the proposed platform achieves comparable inspection performance at significantly lower cost. These results demonstrate that low-cost autonomous UAVs provide a practical, flexible, and scalable solution for structural health monitoring, particularly in scenarios where contact-based sensing is impractical. The findings also support the potential deployment of multiple cooperative UAVs to further enhance inspection coverage and robustness.
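The frequency-domain step described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the traces are synthetic sinusoids, the 30 Hz frame rate and the 5.0 Hz / 4.5 Hz frequencies are assumed values chosen for illustration, and a naive DFT is used only to keep the example dependency-free (an FFT routine would normally be used instead). All function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(displacement, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC spectral peak.

    Uses a naive O(n^2) DFT over the positive-frequency bins.
    """
    n = len(displacement)
    mean = sum(displacement) / n
    centered = [x - mean for x in displacement]  # remove the DC offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def relative_shift_percent(f_healthy, f_damaged):
    """Percentage drop of the fundamental frequency relative to the healthy state."""
    return 100.0 * (f_healthy - f_damaged) / f_healthy


def synthetic_trace(freq_hz, sample_rate_hz, n):
    """Stand-in for a tracked displacement time history (assumed, not measured)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


FS = 30.0  # assumed camera frame rate in Hz
N = 300    # 10 s of video, giving a 0.1 Hz frequency resolution

f_h = dominant_frequency(synthetic_trace(5.0, FS, N), FS)
f_d = dominant_frequency(synthetic_trace(4.5, FS, N), FS)
shift = relative_shift_percent(f_h, f_d)
print(f_h, f_d, shift)
```

The frequency resolution is the frame rate divided by the number of frames, so a longer recording is what allows small damage-induced shifts to be resolved.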

29.9 CV May 18
Patch-MoE Mamba: A Patch-Ordered Mixture-of-Experts State Space Architecture for Medical Image Segmentation

Diego Adame, Fabian Vazquez, Jose A. Nunez et al.

CNN- and Transformer-based architectures have achieved strong performance in medical image segmentation, but CNNs are limited in modeling long-range dependencies, while Transformers often suffer from quadratic computational and memory complexity. State space models, especially Mamba-based networks, offer an efficient alternative with linear sequence complexity. However, existing Mamba segmentation models still face two limitations: pixel-wise directional scanning can disrupt local 2D spatial structure, and simple summation-based fusion of scan directions cannot adapt well to diverse object sizes, shapes, and boundaries. To address these issues, we propose Patch-MoE Mamba, a patch-ordered mixture-of-experts state space architecture for medical image segmentation. It introduces a hierarchical patch-ordered scanning mechanism that preserves local spatial neighborhoods while capturing multi-scale context, and an MoE-based directional fusion module that adaptively combines multiple Mamba scanner outputs using four directional experts, a learnable concatenation expert, and residual directional aggregation. Experiments on five public polyp segmentation benchmarks and the ISIC 2017/2018 skin lesion segmentation datasets demonstrate the effectiveness and generality of Patch-MoE Mamba.
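To make the contrast between pixel-wise and patch-ordered scanning concrete, the sketch below builds both traversal orders for a small grid. It is a single-level illustration only: the abstract does not specify the hierarchical ordering, so the patch size, the row-major order inside each patch, and the function names are all assumptions rather than the paper's design.

```python
def raster_order(height, width):
    """Pixel-wise row-major scan: one full row after another."""
    return [(r, c) for r in range(height) for c in range(width)]


def patch_order(height, width, patch):
    """Visit every pixel of one patch before moving on to the next patch.

    Patches are visited row-major, and pixels inside a patch are also
    visited row-major (an assumed, single-level ordering).
    """
    order = []
    for pr in range(0, height, patch):
        for pc in range(0, width, patch):
            for r in range(pr, min(pr + patch, height)):
                for c in range(pc, min(pc + patch, width)):
                    order.append((r, c))
    return order


def sequence_gap(order, a, b):
    """Distance between two pixels in the flattened scan sequence."""
    return abs(order.index(a) - order.index(b))


raster = raster_order(4, 4)
patched = patch_order(4, 4, patch=2)

# Vertically adjacent pixels (0, 0) and (1, 0):
gap_raster = sequence_gap(raster, (0, 0), (1, 0))
gap_patched = sequence_gap(patched, (0, 0), (1, 0))
print(gap_raster, gap_patched)
```

In the raster scan, vertically adjacent pixels are a full row apart in the sequence, whereas the patch-ordered scan keeps them within the same short run, which is the sense in which local neighborhoods are preserved.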

LG Dec 12, 2025
DFedReweighting: A Unified Framework for Objective-Oriented Reweighting in Decentralized Federated Learning

Kaichuang Zhang, Wei Yin, Jinghao Yang et al.

Decentralized federated learning (DFL) has recently emerged as a promising paradigm that enables multiple clients to collaboratively train a machine learning model through iterative rounds of local training, communication, and aggregation without relying on a central server, which introduces potential vulnerabilities in conventional federated learning. Nevertheless, DFL systems continue to face a range of challenges, including fairness and robustness. To address these challenges, we propose DFedReweighting, a unified aggregation framework designed to achieve diverse objectives in DFL systems via an objective-oriented reweighting aggregation at the final step of each learning round. Specifically, the framework first computes preliminary weights based on a target performance metric obtained from an auxiliary dataset constructed using local data. These weights are then refined using a customized reweighting strategy, resulting in the final aggregation weights. Our theoretical analysis demonstrates that an appropriate combination of the target performance metric and the customized reweighting strategy ensures linear convergence. Experimental results consistently show that our proposed framework significantly improves fairness and robustness against Byzantine attacks in diverse scenarios. Provided that an appropriate target performance metric and customized reweighting strategy are selected, our framework can achieve a wide range of desired learning objectives.
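The two-stage weighting described in this abstract (preliminary weights from a metric, then refinement, then aggregation) can be sketched as follows. This is an illustrative assumption, not the authors' algorithm: accuracy on the auxiliary data stands in for the target performance metric, a temperature softmax stands in for the customized reweighting strategy, and models are flat parameter lists. All names and numbers are hypothetical.

```python
import math


def preliminary_weights(metric_scores):
    """Normalize per-neighbor metric scores (assumed: auxiliary-set accuracy)."""
    total = sum(metric_scores)
    if total == 0:
        return [1.0 / len(metric_scores)] * len(metric_scores)
    return [s / total for s in metric_scores]


def refine_weights(weights, temperature=0.1):
    """Hypothetical reweighting strategy: softmax sharpening of the weights.

    A lower temperature pushes more mass toward the best-scoring models.
    """
    top = max(weights)
    exps = [math.exp((w - top) / temperature) for w in weights]
    norm = sum(exps)
    return [e / norm for e in exps]


def aggregate(models, weights):
    """Weighted average of flat parameter vectors."""
    dim = len(models[0])
    return [sum(w * m[j] for w, m in zip(weights, models)) for j in range(dim)]


# Three received models; the third mimics a Byzantine update (made-up values).
models = [[1.0, 1.0], [1.2, 0.8], [100.0, -100.0]]
scores = [0.90, 0.85, 0.10]  # assumed accuracies on the local auxiliary dataset

prelim = preliminary_weights(scores)
final = refine_weights(prelim)
merged = aggregate(models, final)
print(prelim, final, merged)
```

With these made-up scores, the refinement step shrinks the weight of the poorly performing model further than plain normalization does, which is the kind of effect a robustness-oriented strategy would aim for.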