Jitin Singla

h-index: 11

Papers: 6

Citations: 517

Novelty: 25%

AI Score: 40

Ranked #76,355 of 194,257 authors (top 39%); #14,594 in CL (top 47%)

6 Papers

2.0 | LG | Sep 28, 2023

On Learning with LAD

C. A. Jothishwaran, Biplav Srivastava, Jitin Singla et al.

The logical analysis of data (LAD) is a technique that yields two-class classifiers based on Boolean functions with a disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers, or binary rules, do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) of LAD models whose hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
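A minimal sketch of the kind of hypothesis the abstract describes: a two-class rule written as a DNF over Boolean features. It reads "cubic monomial" as a conjunction of at most three literals, which is an assumption, and the rule itself is a made-up example rather than one taken from the paper.

```python
from typing import List, Sequence, Tuple

# A literal is (feature_index, required_value); a monomial is a conjunction
# of literals; a DNF is a disjunction of monomials.
Literal = Tuple[int, bool]
Monomial = List[Literal]


def monomial_holds(monomial: Monomial, x: Sequence[bool]) -> bool:
    """True when every literal in the conjunction agrees with the input."""
    return all(x[index] == value for index, value in monomial)


def dnf_predict(dnf: List[Monomial], x: Sequence[bool]) -> bool:
    """Positive class when at least one monomial fires."""
    return any(monomial_holds(m, x) for m in dnf)


# Hypothetical rule: (x0 AND NOT x1 AND x2) OR (x1 AND x3)
example_dnf: List[Monomial] = [
    [(0, True), (1, False), (2, True)],
    [(1, True), (3, True)],
]

# Assumed reading of "cubic": no monomial uses more than three literals.
assert all(len(m) <= 3 for m in example_dnf)

print(dnf_predict(example_dnf, [True, False, True, False]))
```

Bounding both the number of monomials and their degree keeps the hypothesis set small, which is the setting in which the abstract estimates the VC dimension.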

4.2 | CL | Mar 25

Samasāmayik: A Parallel Dataset for Hindi-Sanskrit Machine Translation

N J Karthika, Keerthana Suryanarayanan, Jahanvi Purohit et al.

We release Samasāmayik, a novel, meticulously curated, large-scale Hindi-Sanskrit corpus comprising 92,196 parallel sentences. Unlike most data available in Sanskrit, which focuses on classical-era text and poetry, this corpus aggregates data from diverse sources covering contemporary materials, including spoken tutorials, children's magazines, radio conversations, and instruction materials. We benchmark this new dataset by fine-tuning three complementary models (ByT5, NLLB, and IndicTrans-v2) to demonstrate its utility. Our experiments show that models trained on the Samasāmayik corpus achieve significant performance gains on in-domain test data while achieving comparable performance on other widely used test sets, establishing a strong new performance baseline for contemporary Hindi-Sanskrit translation. Furthermore, a comparative analysis against existing corpora reveals minimal semantic and lexical overlap, confirming the novelty and non-redundancy of our dataset as a robust new resource for low-resource Indic language MT.
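One simple way to quantify lexical overlap between two corpora is the Jaccard similarity of their token vocabularies. The sketch below is illustrative only: the abstract does not say which overlap measure was used, and the whitespace tokenization and function names here are assumptions.

```python
from typing import Iterable, Set


def vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Collect the set of whitespace-separated tokens (assumed tokenization)."""
    return {token for sentence in sentences for token in sentence.split()}


def jaccard_overlap(corpus_a: Iterable[str], corpus_b: Iterable[str]) -> float:
    """Shared vocabulary divided by combined vocabulary; 0.0 if both are empty."""
    vocab_a, vocab_b = vocabulary(corpus_a), vocabulary(corpus_b)
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Toy placeholder sentences, not drawn from any real corpus.
print(jaccard_overlap(["a b c"], ["b c d"]))
```

A value near 0 would indicate that the two corpora share little vocabulary; semantic overlap needs a different measure, such as embedding similarity.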

5.7 | LG | Mar 18

Pathology-Aware Multi-View Contrastive Learning for Patient-Independent ECG Reconstruction

Youssef Youssef, Jitin Singla

Reconstructing a 12-lead electrocardiogram (ECG) from a reduced lead set is an ill-posed inverse problem due to anatomical variability. Standard deep learning methods often ignore the underlying cardiac pathology, losing vital morphology in the precordial leads. We propose Pathology-Aware Multi-View Contrastive Learning, a framework that regularizes the latent space through a pathological manifold. Our architecture integrates high-fidelity time-domain waveforms with pathology-aware embeddings learned via supervised contrastive alignment. By maximizing mutual information between latent representations and clinical labels, the framework learns to filter anatomical "nuisance" variables. On the PTB-XL dataset, our method achieves an approximately 76% reduction in RMSE compared to a state-of-the-art model in the patient-independent setting. Cross-dataset evaluation on the PTB Diagnostic Database confirms superior generalization, bridging the gap between hardware portability and diagnostic-grade reconstruction.
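For readers unfamiliar with the metric, the sketch below shows how RMSE and a relative reduction in RMSE are computed. The numbers are placeholders chosen to reproduce a 76% reduction; they are not results from the paper, and the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length signals."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("signals must be non-empty and of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(predicted))


def relative_reduction(baseline_rmse: float, new_rmse: float) -> float:
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Placeholder values: a baseline RMSE of 1.0 and a new RMSE of 0.24.
print(f"{relative_reduction(1.0, 0.24):.0%}")
```

A 76% reduction means the new error is roughly a quarter of the baseline error.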

1.7 | CL | May 23, 2023 | Code

Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation

Ayush Maheshwari, Ashim Gupta, Amrith Krishna et al.

We release Sāmayik, a dataset of around 53,000 parallel English-Sanskrit sentences written in contemporary prose. Sanskrit is a classical language that is still in use and has a rich documented heritage. However, due to the limited availability of digitized content, it remains a low-resource language. Existing Sanskrit corpora, whether monolingual or bilingual, have predominantly focused on poetry and offer limited coverage of contemporary written materials. Sāmayik is curated from a diverse range of domains, including language instruction material, textual teaching pedagogy, and online tutorials, among others. It stands out as a unique resource that specifically caters to the contemporary usage of Sanskrit, with a primary emphasis on prose writing. Translation models trained on our dataset demonstrate statistically significant improvements when translating out-of-domain contemporary corpora, outperforming models trained on older classical-era poetry datasets. Finally, we also release benchmark models by adapting four multilingual pre-trained models for translating between English and Sanskrit: three have not previously been exposed to Sanskrit, while one is a multilingual pre-trained translation model that covers both English and Sanskrit. The dataset and source code are available at https://github.com/ayushbits/saamayik.

5.7 | CV | Jun 28

FiRe: Frequency Reparameterization as a Preconditioner for Periodic Implicit Neural Representations

Harinandan Shukla, Rajarshi Verma, Jitin Singla

Periodic Implicit Neural Representations (INRs) such as SIREN and FINER assign every neuron the same global frequency, spending the representational budget inefficiently when local signal content varies. We introduce FiRe (Frequency Reparameterization), which accelerates optimization by reparameterizing the per-neuron frequency of periodic INRs without changing their underlying activation function. FiRe gives each neuron a bounded, input-dependent frequency via a separate low-rank gating path and is applicable to any periodic activation function. The gate acts as an implicit preconditioner that improves optimization conditioning at initialization, as analyzed via the Neural Tangent Kernel (NTK). This better-conditioned initialization makes optimization converge faster, and the high-frequency content of the reconstruction tracks the target more closely at a fixed computational budget. On 2D image fitting, FiRe increases PSNR over a parameter-matched baseline (up to +1 dB at short training budgets), with gains that vary with resolution and diminish at full convergence. We characterize how performance depends on resolution, rank, and training budget, and give an NTK account that predicts these trends.
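To put the reported "+1 dB" in context, the sketch below computes PSNR from mean squared error and converts a PSNR gain into the equivalent MSE ratio. It uses the standard PSNR definition and assumes pixel values normalized to [0, 1]; the function names are illustrative and not taken from the paper.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """MSE multiplier that corresponds to a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


# A +1 dB PSNR gain corresponds to an MSE roughly 0.79 times the baseline.
print(round(mse_ratio_for_gain(1.0), 3))
```

In other words, a 1 dB improvement is about a 21% reduction in mean squared error.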

3.6 | CV | Sep 23, 2025

YOLO-LAN: Precise Polyp Detection via Optimized Loss, Augmentations and Negatives

Siddharth Gupta, Jitin Singla

Colorectal cancer (CRC), a lethal disease, begins with abnormal proliferations of mucosal cells, called polyps, on the inner wall of the colon. When left undetected, polyps can become malignant tumors. Colonoscopy is the standard procedure for detecting polyps, as it enables direct visualization and removal of suspicious lesions. Manual detection by colonoscopy can be inconsistent and is subject to oversight. Deep-learning-based object detection therefore offers a route to more accurate, real-time diagnosis during colonoscopy. In this work, we propose YOLO-LAN, a YOLO-based polyp detection pipeline trained using M2IoU loss, versatile data augmentations, and negative data to replicate real clinical situations. Our pipeline outperformed existing methods on the Kvasir-seg and BKAI-IGH NeoPolyp datasets, achieving mAP$_{50}$ of 0.9619 and mAP$_{50:95}$ of 0.8599 with YOLOv12, and mAP$_{50}$ of 0.9540 and mAP$_{50:95}$ of 0.8487 with YOLOv8, on the Kvasir-seg dataset. The most notable increase is in the mAP$_{50:95}$ score, which reflects the precision of polyp localization. We demonstrate robustness across polyp sizes and precise location detection, making the pipeline clinically relevant to AI-assisted colorectal screening.
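The mAP$_{50}$ and mAP$_{50:95}$ scores above rest on the intersection-over-union (IoU) between predicted and ground-truth boxes. The sketch below shows plain box IoU and the ten IoU thresholds (0.50 to 0.95 in steps of 0.05) that the COCO-style mAP$_{50:95}$ averages over. It is not the M2IoU loss named in the abstract, and the box coordinates are made-up examples.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# IoU thresholds averaged by COCO-style mAP50:95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS: List[float] = [t / 100 for t in range(50, 100, 5)]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    overlap: Box = (max(a[0], b[0]), max(a[1], b[1]),
                    min(a[2], b[2]), min(a[3], b[3]))
    intersection = box_area(overlap)
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


def thresholds_passed(iou_value: float) -> int:
    """How many of the ten thresholds a detection with this IoU satisfies."""
    return sum(iou_value >= t for t in IOU_THRESHOLDS)


# Made-up boxes: a 2x2 prediction shifted by one unit from a 2x2 target.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection that only just clears IoU 0.50 counts toward mAP$_{50}$ but fails most of the stricter thresholds, which is why mAP$_{50:95}$ is the more sensitive indicator of localization precision.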