Brian Weber

CR
h-index45
9papers
371citations
Novelty27%
AI Score26

9 Papers

CRMar 14, 2022
Toward the Detection of Polyglot Files

Luke Koch, Sean Oesch, Mary Adkisson et al.

Standardized file formats play a key role in the development and use of computer software. However, it is possible to abuse standardized file formats by creating a file that is valid in multiple file formats. The resulting polyglot (many languages) file can confound file format identification, allowing elements of the file to evade analysis.This is especially problematic for malware detection systems that rely on file format identification for feature extraction. File format identification processes that depend on file signatures can be easily evaded thanks to flexibility in the format specifications of certain file formats. Although work has been done to identify file formats using more comprehensive methods than file signatures, accurate identification of polyglot files remains an open problem. Since malware detection systems routinely perform file format-specific feature extraction, polyglot files need to be filtered out prior to ingestion by these systems. Otherwise, malicious content could pass through undetected. To address the problem of polyglot detection we assembled a data set using the mitra tool. We then evaluated the performance of the most commonly used file identification tool, file. Finally, we demonstrated the accuracy, precision, recall and F1 score of a range of machine and deep learning models. Malconv2 and Catboost demonstrated the highest recall on our data set with 95.16% and 95.45%, respectively. These models can be incorporated into a malware detector's file processing pipeline to filter out potentially malicious polyglots before file format-dependent feature extraction takes place.

LGJan 24, 2025
Humanity's Last Exam

Long Phan, Alice Gatti, Ziwen Han et al. · amazon-science, apple-ml

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90\% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.

CRJul 1, 2024
On the Abuse and Detection of Polyglot Files

Luke Koch, Sean Oesch, Amul Chaulagain et al.

A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding $30$ polyglot samples and $15$ attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on adversary techniques. We then trained a machine learning detection solution, PolyConv, using this data set. PolyConv achieves a precision-recall area-under-curve score of $0.999$ with an F1 score of $99.20$% for polyglot detection and $99.47$% for file-format identification, significantly outperforming all other tools tested. We developed a content disarmament and reconstruction tool, ImSan, that successfully sanitized $100$% of the tested image-based polyglots, which were the most common type found via the survey. Our work provides concrete tools and suggestions to enable defenders to better defend themselves against polyglot files, as well as directions for future work to create more robust file specifications and methods of disarmament.

MES-HALLDec 21, 2023
Data needs and challenges for quantum dot devices automation

Justyna P. Zwolak, Jacob M. Taylor, Reed W. Andrews et al.

Gate-defined quantum dots are a promising candidate system for realizing scalable, coupled qubit systems and serving as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for, which hinders the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows sufficiently to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. This meeting report outlines current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present insights and ideas put forward by the quantum dot community on how to overcome them. We aim to provide guidance and inspiration to researchers invested in automation efforts.

MES-HALLDec 18, 2023
QDA$^2$: A principled approach to automatically annotating charge stability diagrams

Brian Weber, Justyna P. Zwolak

Gate-defined semiconductor quantum dot (QD) arrays are a promising platform for quantum computing. However, presently, the large configuration spaces and inherent noise make tuning of QD devices a nontrivial task and with the increasing number of QD qubits, the human-driven experimental control becomes unfeasible. Recently, researchers working with QD systems have begun putting considerable effort into automating device control, with a particular focus on machine-learning-driven methods. Yet, the reported performance statistics vary substantially in both the meaning and the type of devices used for testing. While systematic benchmarking of the proposed tuning methods is necessary for developing reliable and scalable tuning approaches, the lack of openly available standardized datasets of experimental data makes such testing impossible. The QD auto-annotator -- a classical algorithm for automatic interpretation and labeling of experimentally acquired data -- is a critical step toward rectifying this. QD auto-annotator leverages the principles of geometry to produce state labels for experimental double-QD charge stability diagrams and is a first step towards building a large public repository of labeled QD data.

CRApr 12, 2024
The Path To Autonomous Cyber Defense

Sean Oesch, Phillipe Austria, Amul Chaulagain et al.

Defenders are overwhelmed by the number and scale of attacks against their networks.This problem will only be exacerbated as attackers leverage artificial intelligence to automate their workflows. We propose a path to autonomous cyber agents able to augment defenders by automating critical steps in the cyber defense life cycle.

HCNov 30, 2021
A Mathematical Framework for Evaluation of SOAR Tools with Limited Survey Data

Savannah Norem, Ashley E Rice, Samantha Erwin et al.

Security operation centers (SOCs) all over the world are tasked with reacting to cybersecurity alerts ranging in severity. Security Orchestration, Automation, and Response (SOAR) tools streamline cybersecurity alert responses by SOC operators. SOAR tool adoption is expensive both in effort and finances. Hence, it is crucial to limit adoption to those most worthwhile; yet no research evaluating or comparing SOAR tools exists. The goal of this work is to evaluate several SOAR tools using specific criteria pertaining to their usability. SOC operators were asked to first complete a survey about what SOAR tool aspects are most important. Operators were then assigned a set of SOAR tools for which they viewed demonstration and overview videos, and then operators completed a second survey wherein they were tasked with evaluating each of the tools on the aspects from the first survey. In addition, operators provided an overall rating to each of their assigned tools, and provided a ranking of their tools in order of preference. Due to time constraints on SOC operators for thorough testing, we provide a systematic method of downselecting a large pool of SOAR tools to a select few that merit next-step hands-on evaluation by SOC operators. Furthermore, the analyses conducted in this survey help to inform future development of SOAR tools to ensure that the appropriate functions are available for use in a SOC.

CRDec 16, 2020
Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection

Robert A. Bridges, Sean Oesch, Miki E. Verma et al.

In this paper, we present a scientific evaluation of four prominent malware detection tools to assist an organization with two primary questions: To what extent do ML-based tools accurately classify previously- and never-before-seen files? Is it worth purchasing a network-level malware detector? To identify weaknesses, we tested each tool against 3,536 total files (2,554 or 72\% malicious, 982 or 28\% benign) of a variety of file types, including hundreds of malicious zero-days, polyglots, and APT-style files, delivered on multiple protocols. We present statistical results on detection time and accuracy, consider complementary analysis (using multiple tools together), and provide two novel applications of the recent cost-benefit evaluation procedure of Iannacone \& Bridges. While the ML-based tools are more effective at detecting zero-day files and executables, the signature-based tool may still be an overall better option. Both network-based tools provide substantial (simulated) savings when paired with either host tool, yet both show poor detection rates on protocols other than HTTP or SMTP. Our results show that all four tools have near-perfect precision but alarmingly low recall, especially on file types other than executables and office files -- 37% of malware tested, including all polyglot files, were undetected. Priorities for researchers and takeaways for end users are given.

CRNov 12, 2018
The SFS Summer Research Study at UMBC: Project-Based Learning Inspires Cybersecurity Students

Alan Sherman, Enis Golaszewski, Edward LaFemina et al.

May 30-June 2, 2017, Scholarship for Service (SFS) scholars at the University of Maryland, Baltimore County (UMBC) analyzed the security of a targeted aspect of the UMBC computer systems. During this hands-on study, with complete access to source code, students identified vulnerabilities, devised and implemented exploits, and suggested mitigations. As part of a pioneering program at UMBC to extend SFS scholarships to community colleges, the study helped initiate six students from two nearby community colleges, who transferred to UMBC in fall 2017 to complete their four-year degrees in computer science and information systems. The study examined the security of a set of "NetAdmin" custom scripts that enable UMBC faculty and staff to open the UMBC firewall to allow external access to machines they control for research purposes. Students discovered vulnerabilities stemming from weak architectural design, record overflow, and failure to sanitize inputs properly. For example, they implemented a record-overflow and code-injection exploit that exfiltrated the vital API key of the UMBC firewall. This report summarizes student activities and findings, and reflects on lessons learned for students, educators, and system administrators. Our students found the collaborative experience inspirational, students and educators appreciated the authentic case study, and IT administrators gained access to future employees and received free recommendations for improving the security of their systems. We hope that other universities can benefit from our motivational and educational strategy of teaming educators and system administrators to engage students in active project-based learning centering on focused questions about their university computer systems.