3 Papers

CLDec 19, 2025
Advances and Challenges in Semantic Textual Similarity: A Comprehensive Survey

Lokendra Kumar, Neelesh S. Upadhye, Kannan Piedy

Semantic Textual Similarity (STS) research has expanded rapidly since 2021, driven by advances in transformer architectures, contrastive learning, and domain-specific techniques. This survey reviews progress across six key areas: transformer-based models, contrastive learning, domain-focused solutions, multi-modal methods, graph-based approaches, and knowledge-enhanced techniques. Recent transformer models such as FarSSiBERT and DeBERTa-v3 have achieved remarkable accuracy, while contrastive methods like AspectCSE have established new benchmarks. Domain-adapted models, including CXR-BERT for medical texts and Financial-STS for finance, demonstrate how STS can be effectively customized for specialized fields. Moreover, multi-modal, graph-based, and knowledge-integrated models further enhance semantic understanding and representation. By organizing and analyzing these developments, the survey provides valuable insights into current methods, practical applications, and remaining challenges. It aims to guide researchers and practitioners alike in navigating rapid advancements, highlighting emerging trends and future opportunities in the evolving field of STS.

13.9LGMar 30
Rethinking Attention Output Projection: Structured Hadamard Transforms for Efficient Transformers

Shubham Aggarwal, Lokendra Kumar

The dense output projection in multi head attention scales quadratically with model dimension, contributing significantly to parameter count, memory footprint, and inference cost. We propose replacing this projection with a fixed, parameter free Walsh Hadamard Transform (WHT) followed by a diagonal affine transformation. This approach eliminates approximately 25 percent of attention parameters per block while maintaining global cross-head interaction through an orthogonal, norm-preserving transformation. Our results demonstrate that WHT augmented models exhibit a steeper validation loss curve relative to training FLOPs compared to dense baselines, suggesting superior compute utilization during training. Crucially, we show that efficiency gains including reduced memory footprint and increased throughput grow monotonically with model size, batch size, and sequence length. We evaluate performance across both prefill and decoding stages, finding that the structured transform consistently outperforms dense projections as complexity increases. Our findings indicate that replacing dense projections with structured transforms allows for more compute-efficient architectures that achieve lower loss than dense models at an equivalent training budget.

10.9CVMar 20
Hyper-Connections for Adaptive Multi-Modal MRI Brain Tumor Segmentation

Lokendra Kumar, Shubham Aggarwal

We present the first study of Hyper-Connections (HC) for volumetric multi-modal brain tumor segmentation, integrating them as a drop-in replacement for fixed residual connections across five architectures: nnU-Net, SwinUNETR, VT-UNet, U-Net, and U-Netpp. Dynamic HC consistently improves all 3D models on the BraTS 2021 dataset, yielding up to +1.03 percent mean Dice gain with negligible parameter overhead. Gains are most pronounced in the Enhancing Tumor sub-region, reflecting improved fine-grained boundary delineation. Modality ablation further reveals that HC-equipped models develop sharper sensitivity toward clinically dominant sequences, specifically T1ce for Tumor Core and Enhancing Tumor, and FLAIR for Whole Tumor, a behavior absent in fixed-connection baselines and consistent across all architectures. In 2D settings, improvements are smaller and configuration-sensitive, suggesting that volumetric spatial context amplifies the benefit of adaptive aggregation. These results establish HC as a simple, efficient, and broadly applicable mechanism for multi-modal feature fusion in medical image segmentation.