Nidhi Rastogi

h-index11

25papers

464citations

Novelty33%

AI Score51

Ranked #18,369 of 194,257 authors (top 9%)#327 in CR (top 5%)

25 Papers

13.5CRApr 8, 2022Code

CyNER: A Python Library for Cybersecurity Named Entity Recognition

Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park et al.

Open Cyber threat intelligence (OpenCTI) information is available in an unstructured format from heterogeneous sources on the Internet. We present CyNER, an open-source python library for cybersecurity named entity recognition (NER). CyNER combines transformer-based models for extracting cybersecurity-related entities, heuristics for extracting different indicators of compromise, and publicly available NER models for generic entity types. We provide models trained on a diverse corpus that users can readily use. Events are described as classes in previous research - MALOnt2.0 (Christian et al., 2021) and MALOnt (Rastogi et al., 2020) and together extract a wide range of malware attack details from a threat intelligence corpus. The user can combine predictions from multiple different approaches to suit their needs. The library is made publicly available.

2.3CRSep 11, 2024Code

R+R: Revisiting Static Feature-Based Android Malware Detection using Machine Learning

Md Tanvirul Alam, Dipkamal Bhusal, Nidhi Rastogi

Static feature-based Android malware detection using machine learning (ML) remains critical due to its scalability and efficiency. However, existing approaches often overlook security-critical reproducibility concerns, such as dataset duplication, inadequate hyperparameter tuning, and variance from random initialization. This can significantly compromise the practical effectiveness of these systems. In this paper, we systematically investigate these challenges by proposing a more rigorous methodology for model selection and evaluation. Using two widely used datasets, Drebin and APIGraph, we evaluate six ML models of varying complexity under both offline and continuous active learning settings. Our analysis demonstrates that, contrary to popular belief, well-tuned, simpler models, particularly tree-based methods like XGBoost, consistently outperform more complex neural networks, especially when duplicates are removed. To promote transparency and reproducibility, we open-source our codebase, which is extensible for integrating new models and datasets, facilitating reproducible security research.

22.4CRNov 1, 2022Code

Looking Beyond IoCs: Automatically Extracting Attack Patterns from External CTI

Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park et al.

Public and commercial organizations extensively share cyberthreat intelligence (CTI) to prepare systems to defend against existing and emerging cyberattacks. However, traditional CTI has primarily focused on tracking known threat indicators such as IP addresses and domain names, which may not provide long-term value in defending against evolving attacks. To address this challenge, we propose to use more robust threat intelligence signals called attack patterns. LADDER is a knowledge extraction framework that can extract text-based attack patterns from CTI reports at scale. The framework characterizes attack patterns by capturing the phases of an attack in Android and enterprise networks and systematically maps them to the MITRE ATT\&CK pattern framework. LADDER can be used by security analysts to determine the presence of attack vectors related to existing and emerging threats, enabling them to prepare defenses proactively. We also present several use cases to demonstrate the application of LADDER in real-world scenarios. Finally, we provide a new, open-access benchmark malware dataset to train future cyberthreat intelligence models.

12.0CRNov 3, 2025Code

AthenaBench: A Dynamic Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Md Tanvirul Alam, Dipkamal Bhusal, Salman Ahmad et al.

Large Language Models (LLMs) have demonstrated strong capabilities in natural language reasoning, yet their application to Cyber Threat Intelligence (CTI) remains limited. CTI analysis involves distilling large volumes of unstructured reports into actionable knowledge, a process where LLMs could substantially reduce analyst workload. CTIBench introduced a comprehensive benchmark for evaluating LLMs across multiple CTI tasks. In this work, we extend CTIBench by developing AthenaBench, an enhanced benchmark that includes an improved dataset creation pipeline, duplicate removal, refined evaluation metrics, and a new task focused on risk mitigation strategies. We evaluate twelve LLMs, including state-of-the-art proprietary models such as GPT-5 and Gemini-2.5 Pro, alongside seven open-source models from the LLaMA and Qwen families. While proprietary LLMs achieve stronger results overall, their performance remains subpar on reasoning-intensive tasks, such as threat actor attribution and risk mitigation, with open-source models trailing even further behind. These findings highlight fundamental limitations in the reasoning capabilities of current LLMs and underscore the need for models explicitly tailored to CTI workflows and automation.

14.4LGOct 30, 2025Code

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Md Tanvirul Alam, Nidhi Rastogi

Mathematical reasoning is a central challenge for large language models (LLMs), requiring not only correct answers but also faithful reasoning processes. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising approach for enhancing such capabilities; however, its ability to foster genuine reasoning remains unclear. We investigate RLVR on two combinatorial problems with fully verifiable solutions: \emph{Activity Scheduling} and the \emph{Longest Increasing Subsequence}, using carefully curated datasets with unique optima. Across multiple reward designs, we find that RLVR improves evaluation metrics but often by reinforcing superficial heuristics rather than acquiring new reasoning strategies. These findings highlight the limits of RLVR generalization, emphasizing the importance of benchmarks that disentangle genuine mathematical reasoning from shortcut exploitation and provide faithful measures of progress. Code available at https://github.com/xashru/rlvr-seq-generalization.

2.9CRMar 4, 2022

Adversarial Patterns: Building Robust Android Malware Classifiers

Dipkamal Bhusal, Nidhi Rastogi

Machine learning models are increasingly being adopted across various fields, such as medicine, business, autonomous vehicles, and cybersecurity, to analyze vast amounts of data, detect patterns, and make predictions or recommendations. In the field of cybersecurity, these models have made significant improvements in malware detection. However, despite their ability to understand complex patterns from unstructured data, these models are susceptible to adversarial attacks that perform slight modifications in malware samples, leading to misclassification from malignant to benign. Numerous defense approaches have been proposed to either detect such adversarial attacks or improve model robustness. These approaches have resulted in a multitude of attack and defense techniques and the emergence of a field known as `adversarial machine learning.' In this survey paper, we provide a comprehensive review of adversarial machine learning in the context of Android malware classifiers. Android is the most widely used operating system globally and is an easy target for malicious agents. The paper first presents an extensive background on Android malware classifiers, followed by an examination of the latest advancements in adversarial attacks and defenses. Finally, the paper provides guidelines for designing robust malware classifiers and outlines research directions for the future.

7.0CRMar 1, 2022

Explaining RADAR features for detecting spoofing attacks in Connected Autonomous Vehicles

Nidhi Rastogi, Sara Rampazzi, Michael Clifford et al.

Connected autonomous vehicles (CAVs) are anticipated to have built-in AI systems for defending against cyberattacks. Machine learning (ML) models form the basis of many such AI systems. These models are notorious for acting like black boxes, transforming inputs into solutions with great accuracy, but no explanations support their decisions. Explanations are needed to communicate model performance, make decisions transparent, and establish trust in the models with stakeholders. Explanations can also indicate when humans must take control, for instance, when the ML model makes low confidence decisions or offers multiple or ambiguous alternatives. Explanations also provide evidence for post-incident forensic analysis. Research on explainable ML to security problems is limited, and more so concerning CAVs. This paper surfaces a critical yet under-researched sensor data \textit{uncertainty} problem for training ML attack detection models, especially in highly mobile and risk-averse platforms such as autonomous vehicles. We present a model that explains \textit{certainty} and \textit{uncertainty} in sensor input -- a missing characteristic in data collection. We hypothesize that model explanation is inaccurate for a given system without explainable input data quality. We estimate \textit{uncertainty} and mass functions for features in radar sensor data and incorporate them into the training model through experimental evaluation. The mass function allows the classifier to categorize all spoofed inputs accurately with an incorrect class label.

6.4CROct 31, 2025

Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation

Arnabh Borah, Md Tanvirul Alam, Nidhi Rastogi

Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often limits trust, particularly in decisions that require domain-specific cybersecurity knowledge. Because security threats evolve rapidly, LLMs must not only recall historical incidents but also adapt to emerging vulnerabilities and attack patterns. Retrieval-Augmented Generation (RAG) has demonstrated effectiveness in general LLM applications, but its potential for cybersecurity remains underexplored. In this work, we introduce a RAG-based framework designed to contextualize cybersecurity data and enhance LLM accuracy in knowledge retention and temporal reasoning. Using external datasets and the Llama-3-8B-Instruct model, we evaluate baseline RAG, an optimized hybrid retrieval approach, and conduct a comparative analysis across multiple performance metrics. Our findings highlight the promise of hybrid retrieval in strengthening the adaptability and reliability of LLMs for cybersecurity tasks.

14.7CRJun 30, 2024Code

Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models

Romy Fieblinger, Md Tanvirul Alam, Nidhi Rastogi

Cyber threats are constantly evolving. Extracting actionable insights from unstructured Cyber Threat Intelligence (CTI) data is essential to guide cybersecurity decisions. Increasingly, organizations like Microsoft, Trend Micro, and CrowdStrike are using generative AI to facilitate CTI extraction. This paper addresses the challenge of automating the extraction of actionable CTI using advancements in Large Language Models (LLMs) and Knowledge Graphs (KGs). We explore the application of state-of-the-art open-source LLMs, including the Llama 2 series, Mistral 7B Instruct, and Zephyr for extracting meaningful triples from CTI texts. Our methodology evaluates techniques such as prompt engineering, the guidance framework, and fine-tuning to optimize information extraction and structuring. The extracted data is then utilized to construct a KG, offering a structured and queryable representation of threat intelligence. Experimental results demonstrate the effectiveness of our approach in extracting relevant information, with guidance and fine-tuning showing superior performance over prompt engineering. However, while our methods prove effective in small-scale tests, applying LLMs to large-scale data for KG construction and Link Prediction presents ongoing challenges.

6.6CRFeb 10, 2021Code

Malware Knowledge Graph Generation

Sharmishtha Dutta, Nidhi Rastogi, Destin Yee et al.

Cyber threat and attack intelligence information are available in non-standard format from heterogeneous sources. Comprehending them and utilizing them for threat intelligence extraction requires engaging security experts. Knowledge graphs enable converting this unstructured information from heterogeneous sources into a structured representation of data and factual knowledge for several downstream tasks such as predicting missing information and future threat trends. Existing large-scale knowledge graphs mainly focus on general classes of entities and relationships between them. Open-source knowledge graphs for the security domain do not exist. To fill this gap, we've built \textsf{TINKER} - a knowledge graph for threat intelligence (\textbf{T}hreat \textbf{IN}telligence \textbf{K}nowl\textbf{E}dge g\textbf{R}aph). \textsf{TINKER} is generated using RDF triples describing entities and relations from tokenized unstructured natural language text from 83 threat reports published between 2006-2021. We built \textsf{TINKER} using classes and properties defined by open-source malware ontology and using hand-annotated RDF triples. We also discuss ongoing research and challenges faced while creating \textsf{TINKER}.

16.0CRFeb 10, 2021Code

TINKER: A framework for Open source Cyberthreat Intelligence

Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki et al.

Threat intelligence on malware attacks and campaigns is increasingly being shared with other security experts for a cost or for free. Other security analysts use this intelligence to inform them of indicators of compromise, attack techniques, and preventative actions. Security analysts prepare threat analysis reports after investigating an attack, an emerging cyber threat, or a recently discovered vulnerability. Collectively known as cyber threat intelligence (CTI), the reports are typically in an unstructured format and, therefore, challenging to integrate seamlessly into existing intrusion detection systems. This paper proposes a framework that uses the aggregated CTI for analysis and defense at scale. The information is extracted and stored in a structured format using knowledge graphs such that the semantics of the threat intelligence can be preserved and shared at scale with other security analysts. Specifically, we propose the first semi-supervised open-source knowledge graph-based framework, TINKER, to capture cyber threat information and its context. Following TINKER, we generate a Cyberthreat Intelligence Knowledge Graph (CTI-KG) and demonstrate the usage using different use cases.

15.4CRJun 20, 2020Code

MALOnt: An Ontology for Malware Threat Intelligence

Nidhi Rastogi, Sharmishtha Dutta, Mohammed J. Zaki et al.

Malware threat intelligence uncovers deep information about malware, threat actors, and their tactics, Indicators of Compromise(IoC), and vulnerabilities in different platforms from scattered threat sources. This collective information can guide decision making in cyber defense applications utilized by security operation centers(SoCs). In this paper, we introduce an open-source malware ontology - MALOnt that allows the structured extraction of information and knowledge graph generation, especially for threat intelligence. The knowledge graph that uses MALOnt is instantiated from a corpus comprising hundreds of annotated malware threat reports. The knowledge graph enables the analysis, detection, classification, and attribution of cyber threats caused by malware. We also demonstrate the annotation process using MALOnt on exemplar threat intelligence reports. A work in progress, this research is part of a larger effort towards auto-generation of knowledge graphs (KGs)for gathering malware threat intelligence from heterogeneous online resources.

13.4LGJan 23, 2024

MORPH: Towards Automated Concept Drift Adaptation for Malware Detection

Md Tanvirul Alam, Romy Fieblinger, Ashim Mahara et al.

Concept drift is a significant challenge for malware detection, as the performance of trained machine learning models degrades over time, rendering them impractical. While prior research in malware concept drift adaptation has primarily focused on active learning, which involves selecting representative samples to update the model, self-training has emerged as a promising approach to mitigate concept drift. Self-training involves retraining the model using pseudo labels to adapt to shifting data distributions. In this research, we propose MORPH -- an effective pseudo-label-based concept drift adaptation method specifically designed for neural networks. Through extensive experimental analysis of Android and Windows malware datasets, we demonstrate the efficacy of our approach in mitigating the impact of concept drift. Our method offers the advantage of reducing annotation efforts when combined with active learning. Furthermore, our method significantly improves over existing works in automated concept drift adaptation for malware detection.

8.5CRApr 12, 2024Code

PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis

Dipkamal Bhusal, Md Tanvirul Alam, Monish K. Veerabhadran et al.

Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions. This susceptibility, combined with the black-box nature of such networks, limits their adoption in critical applications like autonomous driving. Feature-attribution-based explanation methods provide relevance of input features for model predictions on input samples, thus explaining model decisions. However, we observe that both model predictions and feature attributions for input samples are sensitive to noise. We develop a practical method for this characteristic of model prediction and feature attribution to detect adversarial samples. Our method, PASA, requires the computation of two test statistics using model prediction and feature attribution and can reliably detect adversarial samples using thresholds learned from benign samples. We validate our lightweight approach by evaluating the performance of PASA on varying strengths of FGSM, PGD, BIM, and CW attacks on multiple image and non-image datasets. On average, we outperform state-of-the-art statistical unsupervised adversarial detectors on CIFAR-10 and ImageNet by 14\% and 35\% ROC-AUC scores, respectively. Moreover, our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.

6.2CVSep 23, 2025

Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence

Sanish Suwal, Dipkamal Bhusal, Michael Clifford et al.

Prior works have shown that neural networks can be heavily pruned while preserving performance, but the impact of pruning on model interpretability remains unclear. In this work, we investigate how magnitude-based pruning followed by fine-tuning affects both low-level saliency maps and high-level concept representations. Using a ResNet-18 trained on ImageNette, we compare post-hoc explanations from Vanilla Gradients (VG) and Integrated Gradients (IG) across pruning levels, evaluating sparsity and faithfulness. We further apply CRAFT-based concept extraction to track changes in semantic coherence of learned concepts. Our results show that light-to-moderate pruning improves saliency-map focus and faithfulness while retaining distinct, semantically meaningful concepts. In contrast, aggressive pruning merges heterogeneous features, reducing saliency map sparsity and concept coherence despite maintaining accuracy. These findings suggest that while pruning can shape internal representations toward more human-aligned attention patterns, excessive pruning undermines interpretability.

6.2CVNov 25, 2025

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

Md Tanvirul Alam, Saksham Aggarwal, Justin Yang Chae et al.

We present Sphinx, a synthetic environment for visual perception and reasoning that targets core cognitive primitives. Sphinx procedurally generates puzzles using motifs, tiles, charts, icons, and geometric primitives, each paired with verifiable ground-truth solutions, enabling both precise evaluation and large-scale dataset construction. The benchmark covers 25 task types spanning symmetry detection, geometric transformations, spatial reasoning, chart interpretation, and sequence prediction. Evaluating recent large vision-language models (LVLMs) shows that even state-of-the-art GPT-5 attains only 51.1% accuracy, well below human performance. Finally, we demonstrate that reinforcement learning with verifiable rewards (RLVR) substantially improves model accuracy on these tasks and yields gains on external visual reasoning benchmarks, highlighting its promise for advancing multimodal reasoning.

3.6CVOct 5, 2025

Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks

Ayushi Mehrotra, Derek Peng, Dipkamal Bhusal et al.

Adversarial patch attacks pose a practical threat to deep learning models by forcing targeted misclassifications through localized perturbations, often realized in the physical world. Existing defenses typically assume prior knowledge of patch size or location, limiting their applicability. In this work, we propose a patch-agnostic defense that leverages concept-based explanations to identify and suppress the most influential concept activation vectors, thereby neutralizing patch effects without explicit detection. Evaluated on Imagenette with a ResNet-50, our method achieves higher robust and clean accuracy than the state-of-the-art PatchCleanser, while maintaining strong performance across varying patch sizes and locations. Our results highlight the promise of combining interpretability with robustness and suggest concept-driven defenses as a scalable strategy for securing machine learning models against adversarial patch attacks.

12.0CRMar 3, 2025

Too Much to Trust? Measuring the Security and Cognitive Impacts of Explainability in AI-Driven SOCs

Nidhi Rastogi, Shirid Pant, Devang Dhanuka et al.

Explainable AI (XAI) holds significant promise for enhancing the transparency and trustworthiness of AI-driven threat detection in Security Operations Centers (SOCs). However, identifying the appropriate level and format of explanation, particularly in environments that demand rapid decision-making under high-stakes conditions, remains a complex and underexplored challenge. To address this gap, we conducted a three-month mixed-methods study combining an online survey (N1=248) with in-depth interviews (N2=24) to examine (1) how SOC analysts conceptualize AI-generated explanations and (2) which types of explanations are perceived as actionable and trustworthy across different analyst roles. Our findings reveal that participants were consistently willing to accept XAI outputs, even in cases of lower predictive accuracy, when explanations were perceived as relevant and evidence-backed. Analysts repeatedly emphasized the importance of understanding the rationale behind AI decisions, expressing a strong preference for contextual depth over a mere presentation of outcomes on dashboards. Building on these insights, this study re-evaluates current explanation methods within security contexts and demonstrates that role-aware, context-rich XAI designs aligned with SOC workflows can substantially improve practical utility. Such tailored explainability enhances analyst comprehension, increases triage efficiency, and supports more confident responses to evolving threats.

28.5CRJun 11, 2024Code

CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Md Tanvirul Alam, Dipkamal Bhusal, Le Nguyen et al.

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) have shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.

10.7CRSep 3, 2021

Ontology-driven Knowledge Graph for Android Malware

Ryan Christian, Sharmishtha Dutta, Youngja Park et al.

We present MalONT2.0 -- an ontology for malware threat intelligence \cite{rastogi2020malont}. New classes (attack patterns, infrastructural resources to enable attacks, malware analysis to incorporate static analysis, and dynamic analysis of binaries) and relations have been added following a broadened scope of core competency questions. MalONT2.0 allows researchers to extensively capture all requisite classes and relations that gather semantic and syntactic characteristics of an android malware attack. This ontology forms the basis for the malware threat intelligence knowledge graph, MalKG, which we exemplify using three different, non-overlapping demonstrations. Malware features have been extracted from CTI reports on android threat intelligence shared on the Internet and written in the form of unstructured text. Some of these sources are blogs, threat intelligence reports, tweets, and news articles. The smallest unit of information that captures malware features is written as triples comprising head and tail entities, each connected with a relation. In the poster and demonstration, we discuss MalONT2.0, MalKG, as well as the dynamically growing knowledge graph, TINKER.

10.7CRFeb 10, 2021

DANTE: Predicting Insider Threat using LSTM on system logs

Nidhi Rastogi, Qicheng Ma

Insider threat is one of the most pernicious threat vectors to information and communication technologies (ICT)across the world due to the elevated level of trust and access that an insider is afforded. This type of threat can stem from both malicious users with a motive as well as negligent users who inadvertently reveal details about trade secrets, company information, or even access information to malignant players. In this paper, we propose a novel approach that uses system logs to detect insider behavior using a special recurrent neural network (RNN) model. Ground truth is established using DANTE and used as the baseline for identifying anomalous behavior. For this, system logs are modeled as a natural language sequence and patterns are extracted from these sequences. We create workflows of sequences of actions that follow a natural language logic and control flow. These flows are assigned various categories of behaviors - malignant or benign. Any deviation from these sequences indicates the presence of a threat. We further classify threats into one of the five categories provided in the CERT insider threat dataset. Through experimental evaluation, we show that the proposed model can achieve 99% prediction accuracy.

14.5AIMar 31, 2020

Personal Health Knowledge Graphs for Patients

Nidhi Rastogi, Mohammed J. Zaki

Existing patient data analytics platforms fail to incorporate information that has context, is personal, and topical to patients. For a recommendation system to give a suitable response to a query or to derive meaningful insights from patient data, it should consider personal information about the patient's health history, including but not limited to their preferences, locations, and life choices that are currently applicable to them. In this review paper, we critique existing literature in this space and also discuss the various research challenges that come with designing, building, and operationalizing a personal health knowledge graph (PHKG) for patients.

2.7CRApr 27, 2019

Exploring Information Centrality for Intrusion Detection in Large Networks

Nidhi Rastogi

Modern networked systems are constantly under threat from systemic attacks. There has been a massive upsurge in the number of devices connected to a network as well as the associated traffic volume. This has intensified the need to better understand all possible attack vectors during system design and implementation. Further, it has increased the need to mine large data sets, analyzing which has become a daunting task. It is critical to scale monitoring infrastructures to match this need, but a difficult goal for the small and medium organization. Hence, there is a need to propose novel approaches that address the big data problem in security. Information Centrality (IC) labels network nodes with better vantage points for detecting network-based anomalies as central nodes and uses them for detecting a category of attacks called systemic attacks. The main idea is that since these central nodes already see a lot of information flowing through the network, they are in a good position to detect anomalies before other nodes. This research first dives into the importance of using graphs in understanding the topology and information flow. We then introduce the usage of information centrality, a centrality-based index, to reduce data collection in existing communication networks. Using IC-identified central nodes can accelerate outlier detection when armed with a suitable anomaly detection technique. We also come up with a more efficient way to compute Information centrality for large networks. Finally, we demonstrate that central nodes detect anomalous behavior much faster than other non-central nodes, given the anomalous behavior is systemic in nature.

2.5CRJan 24, 2017

Graph Analytics for anomaly detection in homogeneous wireless networks - A Simulation Approach

Nidhi Rastogi, James Hendler

In the Internet of Things (IoT) devices are exposed to various kinds of attacks when connected to the Internet. An attack detection mechanism that understands the limitations of these severely resource-constrained devices is necessary. This is important since current approaches are either customized for wireless networks or for the conventional Internet with heavy data transmission. Also, the detection mechanism need not always be as sophisticated. Simply signaling that an attack is taking place may be enough in some situations, for example in NIDS using anomaly detection. In graph networks, central nodes are the nodes that bear the most influence in the network. The purpose of this research is to explore experimentally the relationship between the behavior of central nodes and anomaly detection when an attack spreads through a network. As a result, we propose a novel anomaly detection approach using this unique methodology which has been unexplored so far in communication networks. Also, in the experiment, we identify presence of an attack originating and propagating throughout a network of IoT using our methodology.

6.3CRJan 24, 2017

WhatsApp security and role of metadata in preserving privacy

Nidhi Rastogi, James Hendler

WhatsApp messenger is arguably the most popular mobile app available on all smart-phones. Over one billion people worldwide for free messaging, calling, and media sharing use it. In April 2016, WhatsApp switched to a default end-to-end encrypted service. This means that all messages (SMS), phone calls, videos, audios, and any other form of information exchanged cannot be read by any unauthorized entity since WhatsApp. In this paper we analyze the WhatsApp messaging platform and critique its security architecture along with a focus on its privacy preservation mechanisms. We report that the Signal Protocol, which forms the basis of WhatsApp end-to-end encryption, does offer protection against forward secrecy, and MITM to a large extent. Finally, we argue that simply encrypting the end-to-end channel cannot preserve privacy. The metadata can reveal just enough information to show connections between people, their patterns, and personal information. This paper elaborates on the security architecture of WhatsApp and performs an analysis on the various protocols used. This enlightens us on the status quo of the app security and what further measures can be used to fill existing gaps without compromising the usability. We start by describing the following (i) important concepts that need to be understood to properly understand security, (ii) the security architecture, (iii) security evaluation, (iv) followed by a summary of our work. Some of the important concepts that we cover in this paper before evaluating the architecture are - end-to-end encryption (E2EE), signal protocol, and curve25519. The description of the security architecture covers key management, end-to-end encryption in WhatsApp, Authentication Mechanism, Message Exchange, and finally the security evaluation. We then cover importance of metadata and role it plays in conserving privacy with respect to whatsapp.