Sakshi Singh

7.8 CR Jun 5

AMD-FCG: An Enhanced Function Call Graph Dataset with Integrated Topological Features for Malware Detection and Classification

Parthajit Borah, Sakshi Singh, D. K. Bhattacharyya et al.

Because malware exhibits complex structure and behavior, detecting it has been a significant challenge in cybersecurity and in the related services of daily life. A reliable and adaptive solution to this problem is therefore crucial. Among the detection methods developed over the years, one of the most reliable is to study and analyze the structural and behavioral patterns of malware. For sophisticated malware, these patterns can be obtained with the help of Function Call Graphs (FCGs). However, covering the many malware families effectively requires a sufficiently large dataset for the system to operate on. To ensure the accuracy and robustness of the system, the dataset should comprise samples of different malware and a benign application for secure execution of the detection process. This paper introduces AMD-FCG, an enhanced Function Call Graph dataset integrated with topological features of malware. The framework enhances the detection procedure, streamlining the workflow for cybersecurity professionals and eliminating the need for dynamic analysis and extensive processing. It can therefore be used to develop and deploy more efficient and innovative malware detection systems.
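To make the idea of topological features on a Function Call Graph concrete, here is a minimal Python sketch. It is not the AMD-FCG pipeline: the feature set (node and edge counts, density, maximum in/out degree, leaf-function count), the function name `topological_features`, and the toy graph are all assumptions chosen for illustration, since the abstract does not list the features the dataset actually contains.

```python
from collections import defaultdict


def topological_features(edges):
    """Compute a few simple topological features of a directed call graph.

    `edges` is an iterable of (caller, callee) pairs. The feature set here
    is illustrative only, not the one used by AMD-FCG.
    """
    unique_edges = set(edges)          # ignore repeated calls between the same pair
    nodes = set()
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for caller, callee in unique_edges:
        nodes.update((caller, callee))
        out_deg[caller] += 1
        in_deg[callee] += 1

    n, m = len(nodes), len(unique_edges)
    # Density of a directed graph without self-loops: m / (n * (n - 1)).
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "num_nodes": n,
        "num_edges": m,
        "density": density,
        "max_in_degree": max((in_deg[v] for v in nodes), default=0),
        "max_out_degree": max((out_deg[v] for v in nodes), default=0),
        # Functions that call nothing else in the graph.
        "num_leaf_functions": sum(1 for v in nodes if out_deg[v] == 0),
    }


# Hypothetical toy call graph (function names are made up).
toy_fcg = [
    ("main", "parse"),
    ("main", "run"),
    ("run", "decrypt"),
    ("run", "send"),
    ("parse", "send"),
]
features = topological_features(toy_fcg)
```

A feature dictionary of this kind, computed per sample, could serve as a fixed-length input row for a conventional classifier, which is one plausible way such a dataset might be consumed.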

2.7 CL Apr 27, 2025

Sample-Efficient Language Model for Hinglish Conversational AI

Sakshi Singh, Abhinav Prakash, Aakriti Shah et al.

This paper presents our process for developing a sample-efficient language model for a conversational Hinglish chatbot. Hinglish, a code-mixed language that combines Hindi and English, presents a unique computational challenge due to inconsistent spelling, lack of standardization, and limited quality of conversational data. This work evaluates multiple pre-trained cross-lingual language models, including Gemma3-4B and Qwen2.5-7B, and employs fine-tuning techniques to improve performance on Hinglish conversational tasks. The proposed approach integrates synthetically generated dialogues with insights from existing Hinglish datasets to address data scarcity. Experimental results demonstrate that models with fewer parameters, when appropriately fine-tuned on high-quality code-mixed data, can achieve competitive performance for Hinglish conversation generation while maintaining computational efficiency.

2 Papers