63.2 AI Jun 2
GTBench: A Curriculum-Grounded Benchmark for Evaluating LLMs as Mathematical Research Assistants in Graph Theory
Noujoud Nader, Ibrahem Aljabea, Patrick Diehl et al.
Large language models (LLMs) are increasingly used as self-study assistants in technical disciplines, yet their reliability as mathematical reasoning assistants remains poorly understood. We introduce GTBench, a curriculum-grounded benchmark for evaluating LLMs as mathematical research assistants in graph theory, comprising 63 problems organized into three groups of increasing difficulty: undergraduate definitions and basic properties (Group 1), algorithm tracing and structural reasoning (Group 2), and graduate-level proof construction (Group 3). Problems are sourced from verified academic materials including Diestel's Graph Theory. We evaluate five frontier models -- GPT-5, Claude Sonnet 4.6, Gemini 2.5 Flash-Lite, Llama 3.3 70B, and Mistral Large 3 -- under zero-shot and chain-of-thought prompting, using exact-match and LLM-as-judge evaluation for Groups 1 and 2, and a hybrid human expert and LLM-as-judge protocol for Group 3. Our results reveal a pronounced performance hierarchy: GPT-5 approaches ceiling on Group 1 (95.8% zero-shot) and maintains meaningful accuracy on graduate proofs (82%), while all other models degrade substantially with difficulty, with Llama achieving 0% under human evaluation on Group 3 zero-shot. Failure mode analysis shows that "correct algorithm, wrong execution" errors dominate Groups 1 and 2, while Group 3 additionally surfaces incomplete-reasoning failures and reveals systematic disagreement between human evaluators and the automated judge, particularly on verbose or near-complete proofs (kappa = 0.48-0.83 across human pairs). GTBench provides the first curriculum-grounded evaluation framework for graph-theoretic reasoning in LLMs, with direct implications for the governance of AI tools in mathematical education and scientific research.
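The agreement range quoted above (kappa = 0.48-0.83) can be made concrete with a small calculation. The following is a minimal sketch, assuming the reported statistic is Cohen's kappa for a pair of raters (the abstract does not say which variant is used); the function name and the grades are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades for eight proofs (1 = accepted, 0 = rejected).
human_1 = [1, 1, 0, 1, 0, 0, 1, 1]
human_2 = [1, 0, 0, 1, 0, 1, 1, 1]
print(round(cohen_kappa(human_1, human_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two raters who agree on 6 of 8 items can still score well below 0.75.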
CR Nov 27, 2023
Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis
Anantaa Kotal, Lavanya Elluri, Deepti Gupta et al.
Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.
1.9 CR May 22
Microbenchmarking Cloud Cryptographic Workloads for Privacy-Preserving Healthcare IoT
Jeremiah L. Webb, Laxima Niure Kandel, Deepti Gupta et al.
Cryptographic operations are an essential component of cloud security architectures, yet their performance across different cloud services, hardware architectures, and programming language implementations has not been comprehensively characterized. In particular, healthcare IoT devices are highly vulnerable and frequently targeted, yet the cryptographic performance trade-offs in their cloud security architectures remain poorly understood. This research presents an extensive microbenchmark study evaluating the performance of core cryptographic workloads, including SHA HMAC generation, AES encryption and decryption, Elliptic Curve Cryptography (ECC) signature generation and verification, and RSA encryption and decryption, across Function as a Service (FaaS) integrated with Key Management Services (KMS) from Amazon Web Services (AWS) and Microsoft Azure. We evaluate FaaS platforms using Elastic Compute Cloud (EC2) instances and Azure Virtual Machines, specifically using burst-optimized instance types to analyze performance under typical cloud workload patterns. The benchmark encompasses a multi-dimensional analysis spanning two CPU architectures (x86-64 and Arm64), six widely adopted programming languages (Rust, Go, Python, Java, C#, and TypeScript), multiple memory allocation configurations, and diverse instance types to capture the interplay between these factors. This study identifies optimal configurations for cryptographic workloads in FaaS environments, improving performance and cost efficiency while enabling secure and timely data protection for healthcare IoT applications.
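To illustrate what a single microbenchmark of this kind measures, here is a minimal local sketch that times HMAC-SHA256 with Python's standard library. It is not the study's harness: the function name, payload size, iteration count, and the choice of the median as the summary statistic are all assumptions, and no cloud KMS or FaaS runtime is involved.

```python
import hashlib
import hmac
import os
import statistics
import time


def bench_hmac_sha256(payload_size=4096, iterations=2000):
    """Return the median latency, in microseconds, of one HMAC-SHA256 call."""
    key = os.urandom(32)               # random 256-bit key (assumed size)
    payload = os.urandom(payload_size)
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        hmac.new(key, payload, hashlib.sha256).digest()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    # The median is less sensitive than the mean to scheduler noise.
    return statistics.median(samples)


if __name__ == "__main__":
    for size in (256, 4096, 65536):
        print(f"{size:>6} bytes: {bench_hmac_sha256(size):.2f} us")
```

Repeating the same loop per language, architecture, and memory setting is what turns one timing into the multi-dimensional comparison the abstract describes.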
94.6 NI Apr 10 Code
Policy-Aware Edge LLM-RAG Framework for Internet of Battlefield Things Mission Orchestration
Om Solanki, Lopamudra Praharaj, Deepti Gupta et al.
Large Language Models (LLMs) offer a promising interface for intent-driven control of autonomous cyber-physical systems, but their direct use in mission-critical Internet of Battlefield Things (IoBT) environments raises significant safety, reliability, and policy-compliance concerns. This paper presents Policy-Aware Large Language Model Retrieval-Augmented Generation (referred to as PA-LLM-RAG), an edge-deployed LLM orchestration framework for IoBT mission control that integrates retrieval-augmented reasoning and independent command verification. The proposed PA-LLM-RAG framework combines a lightweight retrieval module that grounds decisions in operational policies and telemetry with a locally hosted LLM for mission planning and a secondary JudgeLLM for validating user-generated commands prior to execution. To evaluate PA-LLM-RAG, we implement a simulated IoBT environment using RoboDK and assess four open-source LLMs across controlled mission scenarios of increasing complexity, including baseline operations, threat detection, coverage recovery, multi-event coordination, and policy-violation requests. Experimental results demonstrate that the framework effectively detects policy-violating commands while maintaining low-latency responses suitable for edge deployment. Gemma-2B achieves the highest overall reliability, with 4.17 s latency and a 100% success rate. The findings highlight a clear tradeoff between reasoning capacity and responsiveness across models and show that combining deterministic safeguards with JudgeLLM verification significantly improves reliability in LLM-driven IoBT orchestration.
CY Apr 1, 2025
Towards Adaptive AI Governance: Comparative Insights from the U.S., EU, and Asia
Vikram Kulothungan, Deepti Gupta
Artificial intelligence (AI) trends vary significantly across global regions, shaping the trajectory of innovation, regulation, and societal impact. This variation influences how different regions approach AI development, balancing technological progress with ethical and regulatory considerations. This study conducts a comparative analysis of AI trends in the United States (US), the European Union (EU), and Asia, focusing on three key dimensions: generative AI, ethical oversight, and industrial applications. The US prioritizes market-driven innovation with minimal regulatory constraints, the EU enforces a precautionary risk-based framework emphasizing ethical safeguards, and Asia employs state-guided AI strategies that balance rapid deployment with regulatory oversight. Although these approaches reflect different economic models and policy priorities, their divergence poses challenges to international collaboration, regulatory harmonization, and the development of global AI standards. To address these challenges, this paper synthesizes regional strengths to propose an adaptive AI governance framework that integrates risk-tiered oversight, innovation accelerators, and strategic alignment mechanisms. By bridging governance gaps, this study offers actionable insights for fostering responsible AI development while ensuring a balance between technological progress, ethical imperatives, and regulatory coherence.
68.0 DC Mar 13
LLM-HPC++: Evaluating LLM-Generated Modern C++ and MPI+OpenMP Codes for Scalable Mandelbrot Set Computation
Patrick Diehl, Noujoud Nader, Deepti Gupta
Parallel programming remains one of the most challenging aspects of High-Performance Computing (HPC), requiring deep knowledge of synchronization, communication, and memory models. While modern C++ standards and frameworks like OpenMP and MPI have simplified parallelism, mastering these paradigms is still complex. Recently, Large Language Models (LLMs) have shown promise in automating code generation, but their effectiveness in producing correct and efficient HPC code is not well understood. In this work, we systematically evaluate leading LLMs including ChatGPT 4 and 5, Claude, and LLaMA on the task of generating C++ implementations of the Mandelbrot set using shared-memory, directive-based, and distributed-memory paradigms. Each generated program is compiled and executed with GCC 11.5.0 to assess its correctness, robustness, and scalability. Results show that ChatGPT-4 and ChatGPT-5 achieve strong syntactic precision and scalable performance.
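For readers unfamiliar with the workload, the kernel the models are asked to parallelize is the escape-time iteration z -> z*z + c. The sketch below shows it in Python for brevity rather than the C++ evaluated in the paper; the function names, the plotted region, and the iteration limit are illustrative assumptions.

```python
def escape_time(c, max_iter=100):
    """Number of iterations before |z| exceeds 2, or max_iter if it never does."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_row(y, width, height, max_iter=100):
    """Escape times for one image row over the region [-2, 1] x [-1.5, 1.5]."""
    imag = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_time(complex(-2.0 + 3.0 * x / (width - 1), imag), max_iter)
        for x in range(width)
    ]


def mandelbrot(width=60, height=30, max_iter=100):
    """Full image as a list of rows; each row is independent of the others."""
    return [mandelbrot_row(y, width, height, max_iter) for y in range(height)]
```

Because every pixel (and therefore every row) is computed independently, the outer loop is the natural place to distribute work across threads or MPI ranks, although the uneven cost per row makes load balancing part of the problem.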
CR Apr 30, 2024
PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification
Leon Garza, Lavanya Elluri, Anantaa Kotal et al.
Data protection and privacy are becoming increasingly crucial in the digital era. Numerous companies depend on third-party vendors and service providers to carry out critical functions within their operations, encompassing tasks such as data handling and storage. However, this reliance introduces potential vulnerabilities, as these vendors' security measures and practices may not always align with the standards expected by regulatory bodies. Businesses are required, often under the penalty of law, to ensure compliance with the evolving regulatory rules. Interpreting and implementing these regulations poses challenges due to their complexity. Regulatory documents are extensive, demanding significant effort for interpretation, while vendor-drafted privacy policies often lack the detail required for full legal compliance, leading to ambiguity. To ensure a concise interpretation of the regulatory requirements and compliance of organizational privacy policy with said regulations, we propose a Large Language Model (LLM) and Semantic Web based approach for privacy compliance. In this paper, we develop the novel Privacy Policy Compliance Verification Knowledge Graph, PrivComp-KG. It is designed to efficiently store and retrieve comprehensive information concerning privacy policies, regulatory frameworks, and domain-specific knowledge pertaining to the legal landscape of privacy. Using Retrieval Augmented Generation, we match the relevant sections of a privacy policy with the corresponding regulatory rules. This information about individual privacy policies is populated into the PrivComp-KG. Combining this with the domain context and rules, the PrivComp-KG can be queried to check each vendor's privacy policy for compliance with the relevant policy regulations. We demonstrate the relevance of the PrivComp-KG by verifying compliance of privacy policy documents for various organizations.
CL Nov 27, 2025
Modeling Romanized Hindi and Bengali: Dataset Creation and Multilingual LLM Integration
Kanchon Gharami, Quazi Sarwar Muhtaseem, Deepti Gupta et al.
The development of robust transliteration techniques to enhance the effectiveness of transforming Romanized scripts into native scripts is crucial for Natural Language Processing tasks, including sentiment analysis, speech recognition, information retrieval, and intelligent personal assistants. Despite significant advancements, state-of-the-art multilingual models still face challenges in handling Romanized script, where the Roman alphabet is adopted to represent the phonetic structure of diverse languages. Within the South Asian context, where the use of Romanized script for Indo-Aryan languages is widespread across social media and digital communication platforms, such usage continues to pose significant challenges for cutting-edge multilingual models. While a limited number of transliteration datasets and models are available for Indo-Aryan languages, they generally lack sufficient diversity in pronunciation and spelling variations, adequate code-mixed data for large language model (LLM) training, and low-resource adaptation. To address this research gap, we introduce a novel transliteration dataset for two popular Indo-Aryan languages, Hindi and Bengali, which are ranked as the 3rd and 7th most spoken languages worldwide. Our dataset comprises nearly 1.8 million Hindi and 1 million Bengali transliteration pairs. In addition, we pre-train a custom multilingual seq2seq LLM based on the Marian architecture using the developed dataset. Experimental results demonstrate significant improvements over existing relevant models in terms of BLEU and CER metrics.
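Of the two metrics named above, CER (character error rate) is the simpler to state: the edit distance between the model output and the reference, divided by the reference length. The following is a minimal sketch under the assumption that characters are counted as Unicode code points; the paper may normalize or segment text differently, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance, computed one row of the DP table at a time."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: the hypothesis drops the final vowel sign.
print(character_error_rate("नमस्ते", "नमस्त"))
```

Note that for Indic scripts a single visible character can span several code points, so a grapheme-based count would give a different denominator than the code-point count used here.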
SE Aug 22, 2025
LLM-GUARD: Large Language Model-Based Detection and Repair of Bugs and Security Vulnerabilities in C++ and Python
Akshay Mhatre, Noujoud Nader, Patrick Diehl et al.
Large Language Models (LLMs) such as ChatGPT-4, Claude 3, and LLaMA 4 are increasingly embedded in software/application development, supporting tasks from code generation to debugging. Yet, their real-world effectiveness in detecting diverse software bugs, particularly complex, security-relevant vulnerabilities, remains underexplored. This study presents a systematic, empirical evaluation of these three leading LLMs using a benchmark of foundational programming errors, classic security flaws, and advanced, production-grade bugs in C++ and Python. The dataset integrates real code from SEED Labs, OpenSSL (via the Suresoft GLaDOS database), and PyBugHive, validated through local compilation and testing pipelines. A novel multi-stage, context-aware prompting protocol simulates realistic debugging scenarios, while a graded rubric measures detection accuracy, reasoning depth, and remediation quality. Our results show that all models excel at identifying syntactic and semantic issues in well-scoped code, making them promising for educational use and as first-pass reviewers in automated code auditing. Performance diminishes in scenarios involving complex security vulnerabilities and large-scale production code, with ChatGPT-4 and Claude 3 generally providing more nuanced contextual analyses than LLaMA 4. This highlights both the promise and the present constraints of LLMs in serving as reliable code analysis tools.
CY Apr 1, 2025
AI Regulation and Capitalist Growth: Balancing Innovation, Ethics, and Global Governance
Vikram Kulothungan, Priya Ranjani Mohan, Deepti Gupta
Artificial Intelligence (AI) is increasingly central to economic growth, promising new efficiencies and markets. This economic significance has sparked debate over AI regulation: do rules and oversight bolster long-term growth by building trust and safeguarding the public, or do they constrain innovation and free enterprise? This paper examines the balance between AI regulation and capitalist ideals, focusing on how different approaches to AI data privacy can impact innovation in AI-driven applications. The central question is whether AI regulation enhances or inhibits growth in a capitalist economy. Our analysis synthesizes historical precedents, the current U.S. regulatory landscape, economic projections, legal challenges, and case studies of recent AI policies. We argue that carefully calibrated AI data privacy regulations, balancing innovation incentives with the public interest, can foster sustainable growth by building trust and ensuring responsible data use, while excessive regulation may risk stifling innovation and entrenching incumbents.
DB Jan 22, 2024
Declarative Privacy-Preserving Inference Queries
Hong Guan, Ansh Tiwari, Summer Gautier et al.
Detecting inference queries running over personal attributes and protecting such queries from leaking individual information requires tremendous effort from practitioners. To tackle this problem, we propose an end-to-end workflow for automating privacy-preserving inference queries including the detection of subqueries that involve AI/ML model inferences on sensitive attributes. Our proposed novel declarative privacy-preserving workflow allows users to specify "what private information to protect" rather than "how to protect". Under the hood, the system automatically chooses privacy-preserving plans and hyper-parameters.
CR Nov 24, 2021
Hierarchical Federated Learning based Anomaly Detection using Digital Twins for Smart Healthcare
Deepti Gupta, Olumide Kayode, Smriti Bhatt et al.
The Internet of Medical Things (IoMT) is becoming ubiquitous with a proliferation of smart medical devices and applications used in smart hospitals, smart-home based care, and nursing homes. It utilizes smart medical devices and cloud computing services along with core Internet of Things (IoT) technologies to sense patients' vital body parameters, monitor health conditions, and generate multivariate data to support just-in-time health services. This large amount of data is mostly analyzed on centralized servers. Anomaly Detection (AD) in a centralized healthcare ecosystem is often plagued by significant delays in response time and high performance overhead. Moreover, there are inherent privacy issues associated with sending patients' personal health data to a centralized server, which may also expose the AD model to security threats such as data poisoning. To overcome these issues with centralized AD models, we propose a Federated Learning (FL) based AD model that utilizes edge cloudlets to run AD models locally without sharing patients' data. Since existing FL approaches perform aggregation on a single server, which restricts the scope of FL, we introduce a hierarchical FL that allows aggregation at different levels, enabling multi-party collaboration. We introduce a novel disease-based grouping mechanism in which different AD models are grouped based on specific types of diseases. Furthermore, we develop a new Federated Time Distributed (FedTimeDis) Long Short-Term Memory (LSTM) approach to train the AD model. We present a Remote Patient Monitoring (RPM) use case to demonstrate our model and illustrate a proof-of-concept implementation using Digital Twin (DT) and edge cloudlets.
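The idea of aggregating at more than one level can be sketched with plain weighted averaging. The example below is not the paper's FedTimeDis LSTM method: it substitutes a FedAvg-style sample-weighted mean over flat parameter lists, and the group names, parameter values, and sample counts are hypothetical.

```python
def weighted_average(models, weights):
    """Element-wise weighted mean of equally sized parameter lists."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    size = len(models[0])
    return [
        sum(w * model[i] for model, w in zip(models, weights)) / total
        for i in range(size)
    ]


def hierarchical_aggregate(groups):
    """Two-level aggregation: clients -> disease group -> global model.

    groups maps a group name to a list of (parameters, sample_count) pairs.
    """
    group_models, group_sizes = [], []
    for clients in groups.values():
        params = [p for p, _ in clients]
        counts = [c for _, c in clients]
        # Level 1: each group (e.g. an edge cloudlet) averages its own clients.
        group_models.append(weighted_average(params, counts))
        group_sizes.append(sum(counts))
    # Level 2: group models are averaged, weighted by group sample totals.
    return weighted_average(group_models, group_sizes)


# Hypothetical two-parameter models from three clients in two disease groups.
example_groups = {
    "cardiac": [([1.0, 2.0], 10), ([3.0, 4.0], 30)],
    "diabetes": [([5.0, 6.0], 60)],
}
print(hierarchical_aggregate(example_groups))
```

With sample-count weights at both levels, the result equals a single flat weighted average; the hierarchy matters in practice because the intermediate group models can be kept and used on their own, and raw client updates never need to reach the top-level server.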
LG Jun 22, 2021
Detecting Anomalous User Behavior in Remote Patient Monitoring
Deepti Gupta, Maanak Gupta, Smriti Bhatt et al.
The growth in Remote Patient Monitoring (RPM) services using wearable and non-wearable Internet of Medical Things (IoMT) devices promises to improve the quality of diagnosis and facilitate timely treatment for a gamut of medical conditions. At the same time, the proliferation of IoMT devices increases the potential for malicious activities that can lead to catastrophic results, including theft of personal information, data breaches, and compromised medical devices, putting human lives at risk. IoMT devices generate a tremendous amount of data that reflects user behavior patterns, including both personal and day-to-day social activities along with daily routine health monitoring. In this context, anomalies can arise for various reasons, including unexpected user behavior, faulty sensors, or abnormal values from malicious or compromised devices. To address this problem, there is a pressing need for a framework that secures the smart health care infrastructure by identifying and mitigating anomalies. In this paper, we present an anomaly detection model for RPM utilizing IoMT and smart home devices. We propose Hidden Markov Model (HMM) based anomaly detection that analyzes normal user behavior in the context of RPM, comprising both smart home and smart health devices, and identifies anomalous user behavior. We design a testbed with multiple IoMT devices and home sensors to collect data and train the HMM on network and user behavioral data. The proposed HMM-based anomaly detection model achieves over 98% accuracy in identifying anomalies in the context of RPM.
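One common way to use an HMM for anomaly detection is to score each observation sequence by its likelihood under a model of normal behavior and flag sequences that score unusually low. The sketch below implements the scaled forward algorithm for that purpose; the abstract does not describe the paper's scoring rule, and the two-state model, its probabilities, and the event encoding are all hypothetical.

```python
import math


def forward_log_likelihood(observations, start, transition, emission):
    """Scaled forward algorithm: returns log P(observations | HMM)."""
    if not observations:
        raise ValueError("observations must be non-empty")
    n_states = len(start)
    alpha = [start[s] * emission[s][observations[0]] for s in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[p] * transition[p][s] for p in range(n_states))
                * emission[s][observations[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_likelihood


# Hypothetical model: state 0 = routine activity, state 1 = unusual activity;
# symbol 0 = expected device event, symbol 1 = unexpected device event.
START = [0.9, 0.1]
TRANSITION = [[0.95, 0.05], [0.30, 0.70]]
EMISSION = [[0.9, 0.1], [0.2, 0.8]]

routine_day = [0, 0, 0, 0, 0, 0]
odd_day = [1, 1, 1, 1, 1, 1]
for name, seq in (("routine", routine_day), ("odd", odd_day)):
    score = forward_log_likelihood(seq, START, TRANSITION, EMISSION) / len(seq)
    print(name, round(score, 3))
```

Dividing by the sequence length gives a per-event score that can be compared across sequences of different lengths; the threshold below which a sequence counts as anomalous would have to be chosen from held-out normal data.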
CR Mar 28, 2021
Game Theory Based Privacy Preserving Approach for Collaborative Deep Learning in IoT
Deepti Gupta, Smriti Bhatt, Paras Bhatt et al.
The exponential growth of the Internet of Things (IoT) has become a transformative force in creating innovative smart devices and connected domains, including smart homes, healthcare, transportation, and manufacturing. With billions of IoT devices, a huge amount of data is continuously being generated, transmitted, and stored at various points in the IoT architecture. Deep learning is widely used in IoT applications to extract useful insights from IoT data. However, IoT users have security and privacy concerns and prefer not to share their personal data with third-party applications or stakeholders. To address user privacy concerns, Collaborative Deep Learning (CDL), which enables multiple IoT devices to train their models locally on edge gateways, has been widely employed in data-driven applications. In this chapter, we first discuss different types of deep learning approaches and how these approaches can be employed in the IoT domain. We present a privacy-preserving collaborative deep learning approach for IoT devices in which each device can benefit from other devices in the system. This learning approach is analyzed from the behavioral perspective of mobile edge devices using a game-theoretic model. We analyze the Nash Equilibrium in an N-player static game model. We further present a novel fair collaboration strategy among edge IoT devices, using a cluster-based approach to solve the CDL game, which enforces cooperation among mobile edge devices. We also present implementation details and evaluation analysis in a real-world smart home deployment.
CR Feb 1, 2021
Intelligent Network Layer for Cyber-Physical Systems Security
Raj Chaganti, Deepti Gupta, Naga Vemprala
Cyber-Physical Systems (CPS) have made tremendous progress in recent years and have disrupted many technical fields, such as smart industries, smart health, and smart transportation, helping national economies flourish. However, CPS security remains a concern for wide adoption, owing to the high number of devices connecting to the internet, and traditional security solutions may not be suitable for protecting against advanced, application-specific attacks. This paper presents a programmable device network layer architecture to combat attacks and enable efficient network monitoring in heterogeneous CPS applications. We use Industrial Control Systems (ICS) to discuss the existing issues, highlighting the importance of an advanced network layer for CPS. The programmable data plane language P4 is introduced to detect the well-known HELLO Flood attack with minimal effort at the network level and to illustrate potential security solutions.
CR Jul 30, 2020
Learner's Dilemma: IoT Devices Training Strategies in Collaborative Deep Learning
Deepti Gupta, Olumide Kayode, Smriti Bhatt et al.
With the growth of the Internet of Things (IoT) and mobile edge computing, billions of smart devices are interconnected to develop applications used in various domains, including smart homes, healthcare, and smart manufacturing. Deep learning has been extensively utilized in various IoT applications, which require a huge amount of data for model training. Due to privacy requirements, smart IoT devices do not release data to a remote third party for their use. To overcome this problem, a collaborative approach to deep learning, also known as Collaborative Deep Learning (CDL), has been widely employed in data-driven applications. This approach enables multiple edge IoT devices to train their models locally on mobile edge devices. In this paper, we address the IoT device training problem in CDL by analyzing the behavior of mobile edge devices using a game-theoretic model, where each mobile edge device aims to maximize the accuracy of its local model while limiting the overhead of participating in CDL. We analyze the Nash Equilibrium in an N-player static game model. We further present a novel cluster-based fair strategy that approximately solves the CDL game and enforces cooperation among mobile edge devices. Our experimental results and evaluation analysis in a real-world smart home deployment show that 80% of mobile edge devices are ready to cooperate in CDL, while 20% of them do not train their local models collaboratively.