7.0 | SD | May 8 | Code
TARNet: A Temporal-Aware Multi-Scale Architecture for Closed-Set Speaker Identification
Yassin Terraf, Youssef Iraqi
Closed-set speaker identification aims to assign a speech utterance to one of a predefined set of enrolled speakers and requires robust modeling of speaker-specific characteristics across multiple temporal scales. While recent deep learning approaches have achieved strong performance, many existing architectures provide limited mechanisms for modeling temporal dependencies across different time scales, which can restrict the effective use of complementary short-, mid-, and long-term speaker characteristics. In this paper, we propose TARNet, a lightweight Temporal-Aware Representation Network for closed-set speaker identification. TARNet explicitly models temporal information at multiple time scales using a multi-stage temporal encoder with stage-specific dilation configurations. The resulting multi-scale representations are fused and aggregated via an Attentive Statistics Pooling (ASP) module to produce a discriminative utterance-level speaker embedding. Experiments on the VoxCeleb1 and LibriSpeech datasets show that TARNet outperforms state-of-the-art methods while maintaining competitive computational complexity, making it suitable for practical speaker identification systems. The code is publicly available at https://github.com/YassinTERRAF/TARNet.
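To make the pooling step concrete, the following is a minimal NumPy sketch of attentive statistics pooling in its commonly described form: a scalar attention score per frame, a softmax over time, then a weighted mean and weighted standard deviation concatenated into one vector. It is not taken from the TARNet repository. The function name `attentive_stats_pooling`, the parameters `W`, `b`, `v`, and the toy dimensions are all assumptions chosen for illustration, and the random weights stand in for parameters that would normally be learned.

```python
import numpy as np

def attentive_stats_pooling(frames, W, b, v):
    """Pool frame-level features (T, D) into one utterance vector (2 * D,).

    W (D, H), b (H,), and v (H,) are hypothetical attention parameters;
    in a trained model they would be learned, here they are supplied by the caller.
    """
    # One scalar attention score per frame.
    scores = np.tanh(frames @ W + b) @ v              # shape (T,)
    # Numerically stable softmax over the time axis.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()   # shape (T,), sums to 1
    # Attention-weighted first and second order statistics.
    mean = weights @ frames                           # shape (D,)
    var = weights @ (frames ** 2) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))           # guard against tiny negatives
    return np.concatenate([mean, std])

# Toy input: 50 frames of 8-dimensional features (illustrative sizes only).
rng = np.random.default_rng(0)
T, D, H = 50, 8, 4
frames = rng.normal(size=(T, D))
W = rng.normal(size=(D, H))
b = np.zeros(H)
v = rng.normal(size=H)
embedding = attentive_stats_pooling(frames, W, b, v)
```

When `v` is all zeros, every frame receives the same weight and the result reduces to the plain mean and standard deviation over time, which is a convenient sanity check for this kind of pooling layer.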
CV | Jul 1, 2025 | Code
Instant Particle Size Distribution Measurement Using CNNs Trained on Synthetic Data
Yasser El Jarida, Youssef Iraqi, Loubna Mekouar
Accurate particle size distribution (PSD) measurement is important in industries such as mining, pharmaceuticals, and fertilizer manufacturing, where it significantly influences product quality and operational efficiency. Traditional PSD methods like sieve analysis and laser diffraction are manual, time-consuming, and limited by particle overlap. Recent developments in convolutional neural networks (CNNs) enable automated, real-time PSD estimation directly from particle images. In this work, we present a CNN-based methodology trained on realistic synthetic particle imagery generated using Blender's advanced rendering capabilities. Synthetic datasets generated with this method can replicate various industrial scenarios by systematically varying particle shapes, textures, lighting, and spatial arrangements so that they closely resemble actual configurations. We evaluated three CNN-based architectures, ResNet-50, InceptionV3, and EfficientNet-B0, for predicting the critical PSD parameters (d10, d50, d90). The three models reached comparable accuracy, with EfficientNet-B0 offering the best computational efficiency, which makes it suitable for real-time industrial deployment. This approach shows the effectiveness of realistic synthetic data for robust CNN training and offers significant potential for automated industrial PSD monitoring. The code is released at https://github.com/YasserElj/Synthetic-Granular-Gen
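For readers unfamiliar with the regression targets, d10, d50, and d90 are the diameters below which 10%, 50%, and 90% of the cumulative distribution lies. The sketch below shows one way such labels could be computed from a list of particle diameters, for example the diameters known for a synthetic scene. It is an assumption-laden illustration, not the authors' pipeline: the function name `psd_percentiles`, the linear interpolation of the cumulative curve, and the optional weighting scheme are all hypothetical choices.

```python
import numpy as np

def psd_percentiles(diameters, weights=None):
    """Return d10, d50, d90 from particle diameters.

    weights=None gives a number-weighted distribution (an assumption);
    passing e.g. diameters**3 approximates a volume-weighted one.
    Weights are expected to be strictly positive.
    """
    d = np.asarray(diameters, dtype=float)
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    # Sort particles by size and build the normalized cumulative distribution.
    order = np.argsort(d)
    d, w = d[order], w[order]
    cumulative = np.cumsum(w) / w.sum()
    # Interpolate the diameter at each target cumulative fraction.
    targets = (("d10", 0.1), ("d50", 0.5), ("d90", 0.9))
    return {name: float(np.interp(q, cumulative, d)) for name, q in targets}

# Toy example: ten particles with diameters 1..10 (arbitrary units).
sizes = np.arange(1, 11)
by_number = psd_percentiles(sizes)
by_volume = psd_percentiles(sizes, weights=sizes.astype(float) ** 3)
```

Volume weighting shifts all three values toward the larger particles, which is why the weighting convention should be fixed before comparing predicted and measured PSD parameters.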