Abraham Itzhak Weinberg

h-index10

13papers

315citations

Novelty38%

AI Score50

Ranked #20,413 of 194,257 authors (top 11%)#385 in CR (top 6%)

13 Papers

10.4LGJun 7, 2022Code

SubStrat: A Subset-Based Strategy for Faster AutoML

Teddy Lazebnik, Amit Somech, Abraham Itzhak Weinberg

Automated machine learning (AutoML) frameworks have become important tools in the data scientists' arsenal, as they dramatically reduce the manual work devoted to the construction of ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection and hyper parameters tuning steps - and finally output an optimal pipeline in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, therefore the overall AutoML running times become increasingly high. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size, rather than configuration space. It wraps existing AutoML tools, and instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data subset which preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset, and finally, it refines the resulted pipeline by executing a restricted, much shorter, AutoML process on the large dataset. Our experimental results, performed on two popular AutoML frameworks, Auto-Sklearn and TPOT, show that SubStrat reduces their running times by 79% (on average), with less than 2% average loss in the accuracy of the resulted ML pipeline.

5.7CRApr 27

ARCANE: Cross-Campaign Attacker Re-identification via Passive Beacon Telemetry -- A Bayesian Network Framework for Longitudinal Cyber Attribution

Abraham Itzhak Weinberg

Current cyber attribution approaches typically operate on a per-incident basis, leaving open whether aggregating evidence across campaigns improves adversary identification. We investigate whether cross-campaign attribution reduces ambiguity or whether structural limits persist under longitudinal data. We model adversary fingerprints as multi-dimensional feature vectors encoding behavioral, infrastructural, and temporal characteristics derived from covert beacon interactions. We introduce ARCANE (Attacker Re-identification via Cross-campaign Attribution Network), a probabilistic framework that aggregates passive telemetry across campaigns and organizations to construct persistent adversary fingerprints. These fingerprints are updated using a Bayesian belief network that integrates new evidence over time. A time-decayed confidence metric captures accumulated similarity across campaigns. Evaluation on a synthetic dataset of multiple threat profiles shows that intra-actor similarity consistently exceeds inter-actor similarity. However, separation between distinct actors remains limited due to shared operational practices among sophisticated adversaries. Results indicate that cross-campaign aggregation alone does not resolve attribution ambiguity. Performance is constrained by a structural ceiling in feature space, where inter-actor similarity remains high even without evasion. Attribution accuracy remains stable under increasing evasion, suggesting the main limitation is feature indistinguishability rather than adversarial adaptation. These findings highlight the need for additional signal classes, such as targeting patterns, temporal coordination, and infrastructure relationships, to improve attribution reliability.

5.8CRMay 4

PHANTOM: Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry

Abraham Itzhak Weinberg

Honeytokens, decoy digital assets planted to detect and attribute unauthorised access, are a well-established primitive in cyber deception. Existing generation tools produce static, template-based tokens that lack organisational specificity and are identifiable by statistical, syntactic, and semantic analysis. We introduce PHANTOM (Polymorphic Honeytoken Adaptation with Narrative-Tailored Organisational Mimicry), a framework that generates contextually convincing honeytokens by encoding organisation-specific knowledge: domain names, service naming conventions, technology-stack idioms, and realistic secret-value distributions, into a multi-component generation pipeline. We formalise honeytoken quality through a four-component Believability Score that captures syntactic validity, semantic coherence, statistical plausibility, and human acceptance. We use this metric to evaluate PHANTOM across 8 token types and 4 organisational contexts against a template-based baseline. PHANTOM achieves B = 0.778 +/- 0.057 versus B = 0.576 +/- 0.058 for templates (Delta = +0.203, t = 14.07, p < 0.001, Cohen's d = 3.52). Human-evaluator acceptance rises from 6.2% to 100%, and detection resistance (DR = 1 - Pd) improves from 0.609 to 0.870 across three simulated scanner models (regex, entropy analysis, and ML classifier), each with p < 0.001. The semantic coherence gap (Delta Sc = +0.309, d = 4.52) is the dominant driver, confirming the hypothesis that organisational context is the critical missing ingredient in current approaches. All results are reproduced without external API calls, making the pipeline fully deployable in air-gapped environments.

7.8CRMay 13

CLOUDBURST: Cloud-Layer Observations Using Beacons for Unified Real-time Surveillance and Threat Attribution

Abraham Itzhak Weinberg

Modern cloud-native environments present a fundamentally different exfiltration threat surface than traditional file-based scenarios. Attackers targeting AWS, GCP, Azure, and OCI steal S3 presigned URLs, container images, Kubernetes secrets, Terraform state modules, and IAM role tokens -- artefacts that existing honeytoken and beacon frameworks do not address. We present \textbf{CLOUDBURST}, the first formal taxonomy and measurement framework for cloud-native passive beacons, comprising six vector classes across four major cloud providers. We introduce the \textit{Cloud Attribution Score} (CAS), a four-component metric that explicitly models ephemeral infrastructure penalty ($E_p$), IAM coverage depth ($I_c$), and multi-cloud correlation bonus ($M_b$) -- dimensions absent from all prior attribution quality metrics. Experiments across $21$ deployed beacons, $205$ simulated callbacks, and three attacker sophistication levels yield four principal findings. First, IAM Canary Roles achieve the highest CAS (mean $0.450$) and Detection Resistance (DR $= 0.873$), making them the most deployable vector. Second, S3 Presigned URLs achieve the highest detection resistance (DR $= 0.890$), surviving all three cloud-native scanner models (AWS Macie, Checkov/tfsec, Prisma Cloud/Wiz). Third, ephemeral infrastructure churn degrades CAS from $\approx 0.79$ at deployment to $\approx 0.18$--$0.22$ at $48$ hours for all vectors ($p < 0.001$), establishing the first quantitative model of attribution decay in containerised environments. Fourth, Serverless Function Triggers exhibit the worst detection resistance (DR $= 0.611$) due to their explicit outbound HTTP callback pattern, motivating covert callback channel design as future work. No significant CAS difference is observed across cloud providers ($H = 1.99$, $p = 0.57$), confirming that CLOUDBURST is provider-agnostic in its effectiveness.

2.3CRAug 8, 2024

The Role and Applications of Airport Digital Twin in Cyberattack Protection during the Generative AI Era

Abraham Itzhak Weinberg

In recent years, the threat facing airports from growing and increasingly sophisticated cyberattacks has become evident. Airports are considered a strategic national asset, so protecting them from attacks, specifically cyberattacks, is a crucial mission. One way to increase airports' security is by using Digital Twins (DTs). This paper shows and demonstrates how DTs can enhance the security mission. The integration of DTs with Generative AI (GenAI) algorithms can lead to synergy and new frontiers in fighting cyberattacks. The paper exemplifies ways to model cyberattack scenarios using simulations and generate synthetic data for testing defenses. It also discusses how DTs can be used as a crucial tool for vulnerability assessment by identifying weaknesses, prioritizing, and accelerating remediations in case of cyberattacks. Moreover, the paper demonstrates approaches for anomaly detection and threat hunting using Machine Learning (ML) and GenAI algorithms. Additionally, the paper provides impact prediction and recovery coordination methods that can be used by DT operators and stakeholders. It also introduces ways to harness the human factor by integrating training and simulation algorithms with Explainable AI (XAI) into the DT platforms. Lastly, the paper offers future applications and technologies that can be utilized in DT environments.

2.6LGMar 27, 2024

Quantum Algorithms: A New Frontier in Financial Crime Prevention

Abraham Itzhak Weinberg, Alessio Faccia

Financial crimes fast proliferation and sophistication require novel approaches that provide robust and effective solutions. This paper explores the potential of quantum algorithms in combating financial crimes. It highlights the advantages of quantum computing by examining traditional and Machine Learning (ML) techniques alongside quantum approaches. The study showcases advanced methodologies such as Quantum Machine Learning (QML) and Quantum Artificial Intelligence (QAI) as powerful solutions for detecting and preventing financial crimes, including money laundering, financial crime detection, cryptocurrency attacks, and market manipulation. These quantum approaches leverage the inherent computational capabilities of quantum computers to overcome limitations faced by classical methods. Furthermore, the paper illustrates how quantum computing can support enhanced financial risk management analysis. Financial institutions can improve their ability to identify and mitigate risks, leading to more robust risk management strategies by exploiting the quantum advantage. This research underscores the transformative impact of quantum algorithms on financial risk management. By embracing quantum technologies, organisations can enhance their capabilities to combat evolving threats and ensure the integrity and stability of financial systems.

9.6AIMar 17, 2024

Causality from Bottom to Top: A Survey

Abraham Itzhak Weinberg, Cristiano Premebida, Diego Resende Faria

Causality has become a fundamental approach for explaining the relationships between events, phenomena, and outcomes in various fields of study. It has invaded various fields and applications, such as medicine, healthcare, economics, finance, fraud detection, cybersecurity, education, public policy, recommender systems, anomaly detection, robotics, control, sociology, marketing, and advertising. In this paper, we survey its development over the past five decades, shedding light on the differences between causality and other approaches, as well as the preconditions for using it. Furthermore, the paper illustrates how causality interacts with new approaches such as Artificial Intelligence (AI), Generative AI (GAI), Machine and Deep Learning, Reinforcement Learning (RL), and Fuzzy Logic. We study the impact of causality on various fields, its contribution, and its interaction with state-of-the-art approaches. Additionally, the paper exemplifies the trustworthiness and explainability of causality models. We offer several ways to evaluate causality models and discuss future directions.

2.3CRNov 19, 2024

Transforming Triple-Entry Accounting with Machine Learning: A Path to Enhanced Transparency Through Analytics

Abraham Itzhak Weinberg, Alessio Faccia

Triple Entry (TE) is an accounting method that utilizes three accounts or 'entries' to record each transaction, rather than the conventional double-entry bookkeeping system. Existing studies have found that TE accounting, with its additional layer of verification and disclosure of inter-organizational relationships, could help improve transparency in complex financial and supply chain transactions such as blockchain. Machine learning (ML) presents a promising avenue to augment the transparency advantages of TE accounting. By automating some of the data collection and analysis needed for TE bookkeeping, ML techniques have the potential to make this more transparent accounting method scalable for large organizations with complex international supply chains, further enhancing the visibility and trustworthiness of financial reporting. By leveraging ML algorithms, anomalies within distributed ledger data can be swiftly identified, flagging potential instances of fraud or errors. Furthermore, by delving into transaction relationships over time, ML can untangle intricate webs of transactions, shedding light on obscured dealings and adding an investigative dimension. This paper aims to demonstrate the interaction between TE and ML and how they can leverage transparency levels.

1.9CLFeb 26, 2024

Generating Effective Ensembles for Sentiment Analysis

Itay Etelis, Avi Rosenfeld, Abraham Itzhak Weinberg et al.

In recent years, transformer models have revolutionized Natural Language Processing (NLP), achieving exceptional results across various tasks, including Sentiment Analysis (SA). As such, current state-of-the-art approaches for SA predominantly rely on transformer models alone, achieving impressive accuracy levels on benchmark datasets. In this paper, we show that the key for further improving the accuracy of such ensembles for SA is to include not only transformers, but also traditional NLP models, despite the inferiority of the latter compared to transformer models. However, as we empirically show, this necessitates a change in how the ensemble is constructed, specifically relying on the Hierarchical Ensemble Construction (HEC) algorithm we present. Our empirical studies across eight canonical SA datasets reveal that ensembles incorporating a mix of model types, structured via HEC, significantly outperform traditional ensembles. Finally, we provide a comparative analysis of the performance of the HEC and GPT-4, demonstrating that while GPT-4 closely approaches state-of-the-art SA methods, it remains outperformed by our proposed ensemble strategy.

7.0CRMay 18

Subtle Injection for Ground-truth Inference of LLM Training Data

Abraham Itzhak Weinberg

As large language models (LLMs) are increasingly trained on scraped web corpora without authorisation, content owners require forensic methods to prove that their documents were included in a model's training set. We propose \textbf{SIGIL} (\textbf{S}ubtle \textbf{I}njection for \textbf{G}round-truth \textbf{I}nference of \textbf{L}LM training data), a framework that embeds imperceptible \emph{canary sequences} into protected text and code such that any LLM trained on those documents exhibits statistically detectable behavioural signatures when probed with targeted queries. SIGIL defines five canary strategies -- lexical-rare, lexical-phrase, syntactic, semantic, and code-pattern -- and a \emph{Membership Inference Score} (MIS) grounded in the Neyman-Pearson hypothesis testing framework with formal false-positive rate (FPR) control. Simulator parameters are calibrated against the empirical membership inference literature, yielding realistic heterogeneous results across $36{,}000$ trials: overall AUC $= 0.892$, rising from $0.831$ at $0.1\%$ injection to $0.947$ at $10\%$. Detection rates range from $33\%$ to $78\%$ across model-size and injection-rate conditions. Code Pattern canaries achieve the highest AUC ($0.903$, Cohen's $d = 1.84$); Syntactic Structure the lowest ($0.875$, $d = 1.63$). All four experimental factors -- injection rate, model size, canary strategy, and mixture ratio -- have significant independent effects on MIS ($p < 0.001$). SIGIL maintains AUC $> 0.86$ even under $100\%$ paraphrasing ($\text{AUC} = 0.864$), confirming robustness through semantic leakage that survives surface-level rewriting.

1.2MANov 24, 2025

Trust-Based Social Learning for Communication (TSLEC) Protocol Evolution in Multi-Agent Reinforcement Learning

Abraham Itzhak Weinberg

Emergent communication in multi-agent systems typically occurs through independent learning, resulting in slow convergence and potentially suboptimal protocols. We introduce TSLEC (Trust-Based Social Learning with Emergent Communication), a framework where agents explicitly teach successful strategies to peers, with knowledge transfer modulated by learned trust relationships. Through experiments with 100 episodes across 30 random seeds, we demonstrate that trust-based social learning reduces episodes-to-convergence by 23.9% (p < 0.001, Cohen's d = 1.98) compared to independent emergence, while producing compositional protocols (C = 0.38) that remain robust under dynamic objectives (Phi > 0.867 decoding accuracy). Trust scores strongly correlate with teaching quality (r = 0.743, p < 0.001), enabling effective knowledge filtering. Our results establish that explicit social learning fundamentally accelerates emergent communication in multi-agent coordination.

3.4SEOct 22, 2025

A Framework for the Adoption and Integration of Generative AI in Midsize Organizations and Enterprises (FAIGMOE)

Abraham Itzhak Weinberg

Generative Artificial Intelligence (GenAI) presents transformative opportunities for organizations, yet both midsize organizations and larger enterprises face distinctive adoption challenges. Midsize organizations encounter resource constraints and limited AI expertise, while enterprises struggle with organizational complexity and coordination challenges. Existing technology adoption frameworks, including TAM (Technology Acceptance Model), TOE (Technology Organization Environment), and DOI (Diffusion of Innovations) theory, lack the specificity required for GenAI implementation across these diverse contexts, creating a critical gap in adoption literature. This paper introduces FAIGMOE (Framework for the Adoption and Integration of Generative AI in Midsize Organizations and Enterprises), a conceptual framework addressing the unique needs of both organizational types. FAIGMOE synthesizes technology adoption theory, organizational change management, and innovation diffusion perspectives into four interconnected phases: Strategic Assessment, Planning and Use Case Development, Implementation and Integration, and Operationalization and Optimization. Each phase provides scalable guidance on readiness assessment, strategic alignment, risk governance, technical architecture, and change management adaptable to organizational scale and complexity. The framework incorporates GenAI specific considerations including prompt engineering, model orchestration, and hallucination management that distinguish it from generic technology adoption frameworks. As a perspective contribution, FAIGMOE provides the first comprehensive conceptual framework explicitly addressing GenAI adoption across midsize and enterprise organizations, offering actionable implementation protocols, assessment instruments, and governance templates requiring empirical validation through future research.

3.6IRApr 9, 2025

Human-Oriented Image Retrieval System (HORSE): A Neuro-Symbolic Approach to Optimizing Retrieval of Previewed Images

Abraham Itzhak Weinberg

Image retrieval remains a challenging task due to the complex interaction between human visual perception, memory, and computational processes. Current image search engines often struggle to efficiently retrieve images based on natural language descriptions, as they rely on time-consuming preprocessing, tagging, and machine learning pipelines. This paper introduces the Human-Oriented Retrieval Search Engine for Images (HORSE), a novel approach that leverages neuro-symbolic indexing to improve image retrieval by focusing on human-oriented indexing. By integrating cognitive science insights with advanced computational techniques, HORSE enhances the retrieval process, making it more aligned with how humans perceive, store, and recall visual information. The neuro-symbolic framework combines the strengths of neural networks and symbolic reasoning, mitigating their individual limitations. The proposed system optimizes image retrieval, offering a more intuitive and efficient solution for users. We discuss the design and implementation of HORSE, highlight its potential applications in fields such as design error detection and knowledge management, and suggest future directions for research to further refine the system's metrics and capabilities.