Santhosh Malarvannan

IV
h-index5
3papers
16citations
Novelty48%
AI Score30

3 Papers

MMJul 26, 2024Code
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention

Joe Dhanith P R, Shravan Venkatraman, Vigya Sharma et al.

Multimodal emotion recognition (MER) aims to infer human affect by jointly modeling audio and visual cues; however, existing approaches often struggle with temporal misalignment, weakly discriminative feature representations, and suboptimal fusion of heterogeneous modalities. To address these challenges, we propose AVT-CA, an Audio-Video Transformer architecture with cross attention for robust emotion recognition. The proposed model introduces a hierarchical video feature representation that combines channel attention, spatial attention, and local feature extraction to emphasize emotionally salient regions while suppressing irrelevant information. These refined visual features are integrated with audio representations through an intermediate transformer-based fusion mechanism that captures interlinked temporal dependencies across modalities. Furthermore, a cross-attention module selectively reinforces mutually consistent audio-visual cues, enabling effective feature selection and noise-aware fusion. Extensive experiments on three benchmark datasets, CMU-MOSEI, RAVDESS, and CREMA-D, demonstrate that AVT-CA consistently outperforms state-of-the-art baselines, achieving significant improvements in both accuracy and F1-score. Our source code is publicly available at https://github.com/shravan-18/AVTCA.

IVJul 3, 2024
Exploiting Precision Mapping and Component-Specific Feature Enhancement for Breast Cancer Segmentation and Identification

Pandiyaraju V, Shravan Venkatraman, Pavan Kumar S et al.

Breast cancer is one of the leading causes of death globally, and thus there is an urgent need for early and accurate diagnostic techniques. Although ultrasound imaging is a widely used technique for breast cancer screening, it faces challenges such as poor boundary delineation caused by variations in tumor morphology and reduced diagnostic accuracy due to inconsistent image quality. To address these challenges, we propose novel Deep Learning (DL) frameworks for breast lesion segmentation and classification. We introduce a precision mapping mechanism (PMM) for a precision mapping and attention-driven LinkNet (PMAD-LinkNet) segmentation framework that dynamically adapts spatial mappings through morphological variation analysis, enabling precise pixel-level refinement of tumor boundaries. Subsequently, we introduce a component-specific feature enhancement module (CSFEM) for a component-specific feature-enhanced classifier (CSFEC-Net). Through a multi-level attention approach, the CSFEM magnifies distinguishing features of benign, malignant, and normal tissues. The proposed frameworks are evaluated against existing literature and a diverse set of state-of-the-art Convolutional Neural Network (CNN) architectures. The obtained results show that our segmentation model achieves an accuracy of 98.1%, an IoU of 96.9%, and a Dice Coefficient of 97.2%. For the classification model, an accuracy of 99.2% is achieved with F1-score, precision, and recall values of 99.1%, 99.3%, and 99.1%, respectively.

IVNov 16, 2024
A Novel Adaptive Hybrid Focal-Entropy Loss for Enhancing Diabetic Retinopathy Detection Using Convolutional Neural Networks

Santhosh Malarvannan, Pandiyaraju V, Shravan Venkatraman et al.

Diabetic retinopathy is a leading cause of blindness around the world and demands precise AI-based diagnostic tools. Traditional loss functions in multi-class classification, such as Categorical Cross-Entropy (CCE), are very common but break down with class imbalance, especially in cases with inherently challenging or overlapping classes, which leads to biased and less sensitive models. Since a heavy imbalance exists in the number of examples for higher severity stage 4 diabetic retinopathy, etc., classes compared to those very early stages like class 0, achieving class balance is key. For this purpose, we propose the Adaptive Hybrid Focal-Entropy Loss which combines the ideas of focal loss and entropy loss with adaptive weighting in order to focus on minority classes and highlight the challenging samples. The state-of-the art models applied for diabetic retinopathy detection with AHFE revealed good performance improvements, indicating the top performances of ResNet50 at 99.79%, DenseNet121 at 98.86%, Xception at 98.92%, MobileNetV2 at 97.84%, and InceptionV3 at 93.62% accuracy. This sheds light into how AHFE promotes enhancement in AI-driven diagnostics for complex and imbalanced medical datasets.