Parteek Kumar

CL
h-index: 23
4 papers
7 citations
Novelty: 35%
AI Score: 38

4 Papers

4.3 CL Jun 2
CourseTimeQA: A Lecture-Video Benchmark and a Latency-Constrained Cross-Modal Fusion Method for Timestamped QA

Vsevolod Kovalev, Parteek Kumar

We study timestamped question answering over educational lecture videos under a single-GPU latency/memory budget. Given a natural-language query, the system retrieves relevant timestamped segments and synthesizes a grounded answer. We present CourseTimeQA (52.3 h, 902 queries across six courses) and a lightweight, latency-constrained cross-modal retriever (CrossFusion-RAG) that combines frozen encoders, a learned 512->768 vision projection, shallow query-agnostic cross-attention over ASR and frames with a temporal-consistency regularizer, and a small cross-attentive reranker. On CourseTimeQA, CrossFusion-RAG improves nDCG@10 by 0.10 and MRR by 0.08 over a strong BLIP-2 retriever while achieving approximately 1.55 s median end-to-end latency on a single A100. Closest comparators (zero-shot CLIP multi-frame pooling; CLIP + cross-encoder reranker + MMR; learned late-fusion gating; text-only hybrid with cross-encoder reranking and its MMR variant; caption-augmented text retrieval; non-learned temporal smoothing) are evaluated under matched hardware and indexing. We report robustness across ASR noise (WER quartiles), diagnostics for temporal localization, and full training/tuning details to support reproducible comparison.
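The two ranking metrics quoted above, nDCG@10 and MRR, can be illustrated with a minimal sketch. It is not the paper's evaluation code. It assumes each query's result list is given as relevance grades in ranked order, and it computes the ideal ordering from the retrieved list only, which is a simplification. The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k relevance grades."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """nDCG@k; the ideal ranking is the same grades sorted in descending order.

    Simplification (assumption): relevant items missing from the list are ignored.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mean_reciprocal_rank(ranked_lists):
    """Mean over queries of 1 / rank of the first relevant result (0 if none)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for relevances in ranked_lists:
        for rank, rel in enumerate(relevances, start=1):
            if rel > 0:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    # Two toy queries: relevance of each retrieved segment, in ranked order.
    runs = [[0, 1, 0], [1, 0, 0]]
    print("nDCG@10 per query:", [ndcg_at_k(r) for r in runs])
    print("MRR:", mean_reciprocal_rank(runs))
```

An absolute improvement such as the reported +0.10 nDCG@10 is the difference between two such averages computed over the same query set.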

CL Sep 10, 2013 Code
Implementation of NLization Framework for Verbs, Pronouns and Determiners with EUGENE

Harinder Singh, Parteek Kumar

The UNL system was designed and implemented in 1999 by the UNDL Foundation, a nonprofit organization in Geneva. UNL applications are software tools that allow end users to accomplish natural language tasks such as translating, summarizing, retrieving, or extracting information. Two major web-based applications exist. The first is the Interactive ANalyzer (IAN), a natural language analysis system that represents natural language sentences as semantic networks in the UNL format. The second is the dEep-to-sUrface GENErator (EUGENE), an open-source interactive NLizer that generates natural language sentences from semantic networks represented in the UNL format. This paper focuses on the NLization framework with EUGENE, using the UNL system for the task of machine translation. Throughout the NLization process, EUGENE takes UNL input and delivers natural language output without any human intervention. It is language-independent and must be parametrized to the natural language in question through a dictionary and a grammar, provided as separate interpretable files. The paper explains how UNL input is syntactically and semantically analyzed with the UNL-NL T-Grammar for the NLization of UNL sentences involving verbs, pronouns, and determiners in Punjabi.

SD May 4, 2025
Probing Audio-Generation Capabilities of Text-Based Language Models

Arjun Prasaath Anbazhagan, Parteek Kumar, Ujjwal Kaur et al.

How does the textual representation of audio relate to what Large Language Models (LLMs) learn about the audio world? This research investigates the extent to which LLMs can be prompted to generate audio, despite being trained primarily on textual data. We employ a three-tier approach, progressively increasing the complexity of audio generation: 1) Musical Notes, 2) Environmental Sounds, and 3) Human Speech. To bridge the gap between text and audio, we leverage code as an intermediary, prompting LLMs to generate code that, when executed, produces the desired audio output. To evaluate the quality and accuracy of the generated audio, we employ FAD and CLAP scores. Our findings reveal that while LLMs can generate basic audio features, their performance deteriorates as the complexity of the audio increases. This suggests that while LLMs possess a latent understanding of the auditory world, their ability to translate this understanding into tangible audio output remains rudimentary. Further research into techniques that enhance the quality and diversity of LLM-generated audio could improve the performance of text-based LLMs in generating audio.
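To make the "code as an intermediary" idea concrete, here is a minimal sketch of the kind of program an LLM might be prompted to emit for the simplest tier (a musical note). It is an illustration only, not output from the paper's experiments. The function names, the 16 kHz sample rate, and the default note (A4 at 440 Hz) are assumptions.

```python
import io
import math
import struct
import wave


def synthesize_note(freq_hz=440.0, duration_s=0.5, sample_rate=16000, amplitude=0.5):
    """Return 16-bit PCM samples of a pure sine tone (assumed defaults: A4, 0.5 s)."""
    n_samples = int(duration_s * sample_rate)
    peak = amplitude * 32767  # scale into the signed 16-bit range
    return [
        int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]


def write_wav(target, samples, sample_rate=16000):
    """Write mono 16-bit samples to a path or a binary file-like object."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))


if __name__ == "__main__":
    # Render the note into an in-memory buffer; pass a filename to save to disk.
    buffer = io.BytesIO()
    write_wav(buffer, synthesize_note())
    print("WAV bytes written:", len(buffer.getvalue()))
```

Executing such a program yields an audio file that can then be scored against reference audio, which is where metrics such as FAD and CLAP come in.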

CV Nov 30, 2024
Learner Attentiveness and Engagement Analysis in Online Education Using Computer Vision

Sharva Gogawale, Madhura Deshpande, Parteek Kumar et al.

In recent times, online education and the usage of video-conferencing platforms have experienced massive growth. Due to the limited scope of a virtual classroom, it may become difficult for instructors to analyze learners' attention and comprehension in real time while teaching. In the digital mode of education, it would be beneficial for instructors to have an automated feedback mechanism to be informed regarding learners' attentiveness at any given time. This research presents a novel computer vision-based approach to analyze and quantify learners' attentiveness, engagement, and other affective states within online learning scenarios. This work presents the development of a multiclass multioutput classification method using convolutional neural networks on a publicly available dataset - DAiSEE. A machine learning-based algorithm is developed on top of the classification model that outputs a comprehensive attentiveness index of the learners. Furthermore, an end-to-end pipeline is proposed through which learners' live video feed is processed, providing detailed attentiveness analytics of the learners to the instructors. By comparing the experimental outcomes of the proposed method against those of previous methods, it is demonstrated that the proposed method exhibits better attentiveness detection than state-of-the-art methods. The proposed system is a comprehensive, practical, and real-time solution that is deployable and easy to use. The experimental results also demonstrate the system's efficiency in gauging learners' attentiveness.
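The abstract does not specify how the attentiveness index is computed, so the following is only a hypothetical sketch of how per-state predictions could be folded into a single score. It assumes the four DAiSEE affective states (boredom, engagement, confusion, frustration), each labelled at levels 0 to 3, and swaps in a fixed weighted sum in place of the paper's machine learning-based algorithm. The weights and function name are invented for illustration.

```python
# The four affective states annotated in DAiSEE, each at levels 0..3.
DAISEE_STATES = ("boredom", "engagement", "confusion", "frustration")

# Hypothetical weights (assumption, not from the paper): engagement raises
# the index, while the other three states lower it.
HYPOTHETICAL_WEIGHTS = {
    "boredom": -0.5,
    "engagement": 1.0,
    "confusion": -0.25,
    "frustration": -0.25,
}


def attentiveness_index(levels, weights=HYPOTHETICAL_WEIGHTS, max_level=3):
    """Map predicted per-state levels to a score in [0, 1].

    `levels` is a dict such as {"boredom": 0, "engagement": 3, ...}, e.g. the
    per-output argmax of a multiclass multioutput classifier.
    """
    raw = sum(weights[s] * levels[s] / max_level for s in DAISEE_STATES)
    lowest = sum(min(w, 0.0) for w in weights.values())   # worst possible raw score
    highest = sum(max(w, 0.0) for w in weights.values())  # best possible raw score
    return (raw - lowest) / (highest - lowest)


if __name__ == "__main__":
    frame_prediction = {"boredom": 1, "engagement": 2, "confusion": 0, "frustration": 0}
    print("attentiveness index:", attentiveness_index(frame_prediction))
```

In a live pipeline, a score like this would be computed per frame or per clip and aggregated over time before being shown to the instructor.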