Aritra Islam Saswato

2 Papers

12.5 · RO · Mar 11
STM32-Based Smart Waste Bin for Hygienic Disposal Using Embedded Sensing and Automated Control

Mohammed Aman Bhuiyan, Aritra Islam Saswato, Md. Misbah Khan et al.

The increasing demand for hygienic and contactless solutions in public and private environments has encouraged the development of automated systems for everyday applications. This paper presents the design and implementation of a motion-sensing automatic waste bin using an STM32 microcontroller, ultrasonic sensors, and a servo motor. The system detects user presence through ultrasonic sensing and automatically opens the bin lid using a servo motor controlled by the microcontroller. An additional ultrasonic sensor monitors the internal waste level of the bin, while an OLED display provides real-time feedback on system status. The proposed system offers a low-cost, reliable, and easily deployable solution for touch-free waste disposal. Experimental evaluation demonstrates fast response time, stable sensing performance, and smooth mechanical operation. The system can be deployed in homes, educational institutions, hospitals, and public facilities to improve hygiene and user convenience.
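The control logic described above (one ultrasonic sensor triggers the lid, a second measures the fill level) can be illustrated with a minimal sketch. This is not the authors' firmware. It assumes an HC-SR04-style sensor that reports a round-trip echo time in microseconds and a speed of sound of about 343 m/s. The presence threshold, bin depth and servo angles are hypothetical values chosen only for illustration. It is written in Python for readability; on the STM32 the same arithmetic would run in C on timer-captured echo widths.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # ~343 m/s at room temperature (assumption)

# Hypothetical parameters, not taken from the paper
PRESENCE_THRESHOLD_CM = 20.0  # open the lid if something is this close
BIN_DEPTH_CM = 40.0           # distance from inner sensor to the empty bin floor
LID_OPEN_ANGLE = 90           # servo angle for an open lid (degrees)
LID_CLOSED_ANGLE = 0          # servo angle for a closed lid (degrees)


def echo_to_distance_cm(echo_us: float) -> float:
    """Convert a round-trip echo time (microseconds) to a one-way distance in cm."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def lid_angle(user_echo_us: float) -> int:
    """Return the servo angle: open when the outer sensor detects a nearby user."""
    distance = echo_to_distance_cm(user_echo_us)
    return LID_OPEN_ANGLE if distance <= PRESENCE_THRESHOLD_CM else LID_CLOSED_ANGLE


def fill_percent(level_echo_us: float) -> float:
    """Estimate fill level from the inner sensor's distance to the waste surface."""
    distance = echo_to_distance_cm(level_echo_us)
    filled = (BIN_DEPTH_CM - distance) / BIN_DEPTH_CM * 100.0
    # Clamp so sensor noise cannot report values outside 0-100 %
    return max(0.0, min(100.0, filled))


if __name__ == "__main__":
    print(lid_angle(583))      # object about 10 cm away, so the lid opens
    print(fill_percent(1166))  # about 20 cm of free space in a 40 cm bin, about half full
```

A real implementation would also debounce the presence reading and keep the lid open for a short hold time. Both of these are left out here to keep the mapping from echo time to action easy to follow.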

17.3 · SD · May 6
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization

Mohammed Aman Bhuiyan, Md Sazzad Hossain Adib, Samiul Basir Bhuiyan et al.

Automatic Speech Recognition (ASR) and speaker diarization in Bangla remain challenging due to long-form recordings, diverse acoustic conditions, and significant speaker variability. This work addresses these two core tasks in Bangla spoken language understanding by developing robust systems for long-form ASR and speaker diarization. For ASR (Problem 1), we fine-tune the tugstugi bengaliai regional asr whisper medium model on a custom-curated dataset of approximately 15,000 chunked and aligned Bangla audio segments. We use full-weight training with extensive data augmentation, including noise injection, reverb simulation, echo, clipping distortion, and pitch/time perturbation. For speaker diarization (Problem 2), we fine-tune the pyannote/segmentation-3.0 model with PyTorch Lightning on the competition's annotated diarization dataset. We then swap the fine-tuned segmentation backbone into the pyannote/speaker-diarization-community-1 pipeline while retaining its pretrained speaker embedding and clustering components. On the test set, our ASR system achieves a Word Error Rate (WER) of 0.2441 and our diarization system achieves a Diarization Error Rate (DER) of 0.2392, both notable improvements over the respective pretrained baselines. We describe our complete pipeline for both tasks, including data preprocessing, text normalization, audio augmentation, training strategies, inference optimization, and post-processing.
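The WER figure quoted above follows the usual definition: WER = (S + D + I) / N. Here S, D and I are the word substitutions, deletions and insertions in a minimum edit alignment, and N is the number of reference words. The sketch below computes that standard metric with a word-level Levenshtein distance. It assumes whitespace tokenization and no further normalization. The competition's official scorer may apply its own Bangla text normalization first, which is not modeled here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words and
    # the first j hypothesis words (classic Levenshtein DP, two rolling rows).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words
    print(word_error_rate("আমি ভাত খাই", "আমি ভাত খেলাম"))
```

Because insertions are counted against a fixed reference length N, WER can exceed 1.0 for very noisy hypotheses. This is one reason long-form transcripts are usually chunked and aligned before scoring.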