Salil Kanhere

CR
10papers
112citations
Novelty36%
AI Score41

10 Papers

CRMar 21, 2022
PublicCheck: Public Integrity Verification for Services of Run-time Deep Models

Shuo Wang, Sharif Abuadbba, Sidharth Agarwal et al.

Existing integrity verification approaches for deep models are designed for private verification (i.e., assuming the service provider is honest, with white-box access to model parameters). However, private verification approaches do not allow model users to verify the model at run-time. Instead, they must trust the service provider, who may tamper with the verification results. In contrast, a public verification approach that considers the possibility of dishonest service providers can benefit a wider range of users. In this paper, we propose PublicCheck, a practical public integrity verification solution for services of run-time deep models. PublicCheck considers dishonest service providers, and overcomes public verification challenges of being lightweight, providing anti-counterfeiting protection, and having fingerprinting samples that appear smooth. To capture and fingerprint the inherent prediction behaviors of a run-time model, PublicCheck generates smoothly transformed and augmented encysted samples that are enclosed around the model's decision boundary while ensuring that the verification queries are indistinguishable from normal queries. PublicCheck is also applicable when knowledge of the target model is limited (e.g., with no knowledge of gradients or model parameters). A thorough evaluation of PublicCheck demonstrates the strong capability for model integrity breach detection (100% detection accuracy with less than 10 black-box API queries) against various model integrity attacks and model compression attacks. PublicCheck also demonstrates the smooth appearance, feasibility, and efficiency of generating a plethora of encysted samples for fingerprinting.

41.2CRMay 29
GETA: Generalized Encrypted Traffic Analysis

Ransika Gunasekara, Rahat Masood, Salil Kanhere

Traditional traffic analysis is being fundamentally challenged by the rapid adoption of encryption, tunnelling, and privacy-preserving protocols, which increasingly obscure packet payloads and limit the usefulness of Deep Packet Inspection (DPI). Although machine learning has advanced encrypted traffic analysis, existing approaches often remain tied to protocol-specific header features, depend on large labelled datasets, and degrade when deployed across heterogeneous network environments. We present GETA, a protocol-agnostic framework for encrypted traffic analysis that models network flows as multivariate time series using only traffic metadata, thereby avoiding reliance on packet payloads or header semantics. GETA combines meta-learning, embedding refinement, and self-attention to support few-shot adaptation to previously unseen domains with minimal labelled data. Across nine public datasets spanning application identification, VPN traffic classification, IoT device fingerprinting, and attack detection, GETA consistently outperforms state-of-the-art baselines. These results show that GETA offers a practical and generalisable foundation for robust traffic analysis in modern encrypted networks.

CLMar 15, 2022
TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing

Roelien C. Timmer, David Liebowitz, Surya Nepal et al.

Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (TSM), which uses topic modelling to represent files in the repository and semantic matching in an embedding vector space to compare honeyfile text and topic words robustly. We also present a honeyfile corpus created with different Natural Language Processing (NLP) methods. Experiments show that TSM is effective in inter-corpus comparisons and is a promising tool to measure the enticement of honeyfiles. TSM is the first measure to use NLP techniques to quantify the enticement of honeyfile content that compares the essential topical content of local contexts to honeyfiles and is robust to paraphrasing.

CRSep 4, 2022
PhishClone: Measuring the Efficacy of Cloning Evasion Attacks

Arthur Wong, Alsharif Abuadbba, Mahathir Almashor et al.

Web-based phishing accounts for over 90% of data breaches, and most web-browsers and security vendors rely on machine-learning (ML) models as mitigation. Despite this, links posted regularly on anti-phishing aggregators such as PhishTank and VirusTotal are shown to easily bypass existing detectors. Prior art suggests that automated website cloning, with light mutations, is gaining traction with attackers. This has limited exposure in current literature and leads to sub-optimal ML-based countermeasures. The work herein conducts the first empirical study that compiles and evaluates a variety of state-of-the-art cloning techniques in wide circulation. We collected 13,394 samples and found 8,566 confirmed phishing pages targeting 4 popular websites using 7 distinct cloning mechanisms. These samples were replicated with malicious code removed within a controlled platform fortified with precautions that prevent accidental access. We then reported our sites to VirusTotal and other platforms, with regular polling of results for 7 days, to ascertain the efficacy of each cloning technique. Results show that no security vendor detected our clones, proving the urgent need for more effective detectors. Finally, we posit 4 recommendations to aid web developers and ML-based defences to alleviate the risks of cloning attacks.

41.1CLMar 31
TriageSim: A Conversational Emergency Triage Simulation Framework from Structured Electronic Health Records

Dipankar Srirag, Quoc Dung Nguyen, Aditya Joshi et al.

Research in emergency triage is restricted to structured electronic health records (EHR) due to regulatory constraints on nurse-patient interactions. We introduce TriageSim, a simulation framework for generating persona-conditioned triage conversations from structured records. TriageSim enables multi-turn nurse-patient interactions with explicit control over disfluency and decision behaviour, producing a corpus of ~800 synthetic transcripts and corresponding audio. We use a combination of automated analysis for linguistic, behavioural and acoustic fidelity alongside manual evaluation for medical fidelity using a random subset of 50 conversations. The utility of the generated corpus is examined via conversational triage classification. We observe modest agreement for acuity levels across three modalities: generated synthetic text, ASR transcripts, and direct audio inputs. The code, persona schemata and triage policy prompts for TriageSim will be available upon acceptance.

CRFeb 19, 2022
Device Identification in Blockchain-Based Internet of Things

Ali Dorri, Clemence Roulin, Shantanu Pal et al.

In recent years blockchain technology has received tremendous attention. Blockchain users are known by a changeable Public Key (PK) that introduces a level of anonymity, however, studies have shown that anonymized transactions can be linked to deanonymize the users. Most of the existing studies on user de-anonymization focus on monetary applications, however, blockchain has received extensive attention in non-monetary applications like IoT. In this paper we study the impact of de-anonymization on IoT-based blockchain. We populate a blockchain with data of smart home devices and then apply machine learning algorithms in an attempt to classify transactions to a particular device that in turn risks the privacy of the users. Two types of attack models are defined: (i) informed attacks: where attackers know the type of devices installed in a smart home, and (ii) blind attacks: where attackers do not have this information. We show that machine learning algorithms can successfully classify the transactions with 90% accuracy. To enhance the anonymity of the users, we introduce multiple obfuscation methods which include combining multiple packets into a transaction, merging ledgers of multiple devices, and delaying transactions. The implementation results show that these obfuscation methods significantly reduce the attack success rates to 20% to 30% and thus enhance user privacy.

CRAug 26, 2021
Blockchain in Supply Chain: Opportunities and Design Considerations

Gowri Sankar Ramachandran, Sidra Malik, Shantanu Pal et al.

Supply chain applications operate in a multi-stakeholder setting, demanding trust, provenance, and transparency. Blockchain technology provides mechanisms to establish a decentralized infrastructure involving multiple stakeholders. Such mechanisms make the blockchain technology ideal for multi-stakeholder supply chain applications. This chapter introduces the characteristics and requirements of the supply chain and explains how blockchain technology can meet the demands of supply chain applications. In particular, this chapter discusses how data and trust management can be established using blockchain technology. The importance of scalability and interoperability in a blockchain-based supply chain is highlighted to help the stakeholders make an informed decision. The chapter concludes by underscoring the design challenges and open opportunities in the blockchain-based supply chain domain.

CRApr 27, 2021
PrivChain: Provenance and Privacy Preservation in Blockchain enabled Supply Chains

Sidra Malik, Volkan Dedeoglu, Salil Kanhere et al.

Blockchain offers traceability and transparency to supply chain event data and hence can help overcome many challenges in supply chain management such as: data integrity, provenance and traceability. However, data privacy concerns such as the protection of trade secrets have hindered adoption of blockchain technology. Although consortium blockchains only allow authorised supply chain entities to read/write to the ledger, privacy preservation of trade secrets cannot be ascertained. In this work, we propose a privacy-preservation framework, PrivChain, to protect sensitive data on blockchain using zero knowledge proofs. PrivChain provides provenance and traceability without revealing any sensitive information to end-consumers or supply chain entities. Its novelty stems from: a) its ability to allow data owners to protect trade related information and instead provide proofs on the data, and b) an integrated incentive mechanism for entities providing valid proofs over provenance data. In particular, PrivChain uses Zero Knowledge Range Proofs (ZKRPs), an efficient variant of ZKPs, to provide origin information without disclosing the exact location of a supply chain product. Furthermore, the framework allows to compute proofs and commitments off-line, decoupling the computational overhead from blockchain. The proof verification process and incentive payment initiation are automated using blockchain transactions, smart contracts, and events. A proof of concept implementation on Hyperledger Fabric reveals a minimal overhead of using PrivChain for blockchain enabled supply chains.

CRFeb 23, 2019
Blockchain And The Future of the Internet: A Comprehensive Review

Fakhar ul Hassan, Anwaar Ali, Mohamed Rahouti et al.

Blockchain is challenging the status quo of the central trust infrastructure currently prevalent in the Internet towards a design principle that is underscored by decentralization, transparency, and trusted auditability. In ideal terms, blockchain advocates a decentralized, transparent, and more democratic version of the Internet. Essentially being a trusted and decentralized database, blockchain finds its applications in fields as varied as the energy sector, forestry, fisheries, mining, material recycling, air pollution monitoring, supply chain management, and their associated operations. In this paper, we present a survey of blockchain-based network applications. Our goal is to cover the evolution of blockchain-based systems that are trying to bring in a renaissance in the existing, mostly centralized, space of network applications. While re-imagining the space with blockchain, we highlight various common challenges, pitfalls, and shortcomings that can occur. Our aim is to make this work as a guiding reference manual for someone interested in shifting towards a blockchain-based solution for one's existing use case or automating one from the ground up.

CRDec 21, 2018
On the Activity Privacy of Blockchain for IoT

Ali Dorri, Clemence Roulin, Raja Jurdak et al.

Security is one of the fundamental challenges in the Internet of Things (IoT) due to the heterogeneity and resource constraints of the IoT devices. Device classification methods are employed to enhance the security of IoT by detecting unregistered devices or traffic patterns. In recent years, blockchain has received tremendous attention as a distributed trustless platform to enhance the security of IoT. Conventional device identification methods are not directly applicable in blockchain-based IoT as network layer packets are not stored in the blockchain. Moreover, the transactions are broadcast and thus have no destination IP address and contain a public key as the user identity, and are stored permanently in blockchain which can be read by any entity in the network. We show that device identification in blockchain introduces privacy risks as the malicious nodes can identify users' activity pattern by analyzing the temporal pattern of their transactions in the blockchain. We study the likelihood of classifying IoT devices by analyzing their information stored in the blockchain, which to the best of our knowledge, is the first work of its kind. We use a smart home as a representative IoT scenario. First, a blockchain is populated according to a real-world smart home traffic dataset. We then apply machine learning algorithms on the data stored in the blockchain to analyze the success rate of device classification, modeling both an informed and a blind attacker. Our results demonstrate success rates over 90\% in classifying devices. We propose three timestamp obfuscation methods, namely combining multiple packets into a single transaction, merging ledgers of multiple devices, and randomly delaying transactions, to reduce the success rate in classifying devices. The proposed timestamp obfuscation methods can reduce the classification success rates to as low as 20%.