LG ML · Sep 7, 2020

Black Box to White Box: Discover Model Characteristics Based on Strategic Probing

Josh Kalin, Matthew Ciolino, David Noever, Gerry Dozier

arXiv:2009.03136v1 · 5.89 citations

Originality: Incremental advance

AI Analysis

This paper addresses model transparency for security and auditing purposes, but the advance is incremental because it builds on existing probing techniques.

This paper tackles the problem of discovering model characteristics (architecture and training dataset) through strategic probing, using a structured set of input probes and model outputs to train a deep classifier. It demonstrates that datasets in image and text domains are distinguishable, but text transformer outputs show diversity, indicating further research is needed for architecture attribution in text.

In machine learning, white-box adversarial attacks rely on knowledge of the target model's attributes. This work focuses on discovering two distinct pieces of model information: the underlying architecture and the primary training dataset. In the process described in this paper, a structured set of input probes and the corresponding model outputs become the training data for a deep classifier. Two subdomains of machine learning are explored: image-based classifiers and text transformers with GPT-2. For image classification, the focus is on commonly deployed architectures and datasets available in popular public libraries. For text generation, a single transformer architecture at multiple parameter sizes is fine-tuned on different datasets. The datasets explored in both the image and text domains are distinguishable from one another. Diversity in text transformer outputs implies that further research is needed to successfully attribute architecture in the text domain.
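
The probing idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The black-box models are simulated as small linear functions, the labels `dataset_a` and `dataset_b` are hypothetical, and a nearest-centroid classifier is swapped in for the paper's deep classifier to keep the example dependency-free. The point is only the data flow: a fixed probe set is sent to each model, the outputs form a fingerprint, and a classifier learns to map fingerprints to the training dataset.

```python
import random

# Fixed, structured probe set: the same inputs are sent to every black-box model.
PROBES = [[x / 4.0, 1.0 - x / 4.0] for x in range(5)]

def make_black_box(dataset_label, rng):
    """Hypothetical stand-in for a deployed model trained on `dataset_label`.
    Each toy 'dataset' biases the weights differently; callers see outputs only."""
    base = {"dataset_a": (1.0, -1.0), "dataset_b": (-1.0, 1.0)}[dataset_label]
    w = [b + rng.gauss(0.0, 0.1) for b in base]
    return lambda x: w[0] * x[0] + w[1] * x[1]

def fingerprint(model):
    """Query the model with every probe and collect the outputs as a feature vector."""
    return [model(p) for p in PROBES]

def fit_centroids(fingerprints, labels):
    """Nearest-centroid 'training': average the fingerprints of each label."""
    sums, counts = {}, {}
    for f, y in zip(fingerprints, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, f):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], f)),
    )

rng = random.Random(0)

# Build the attribution training set: (probe outputs, known dataset label).
train_labels = ["dataset_a", "dataset_b"] * 20
train = [fingerprint(make_black_box(y, rng)) for y in train_labels]
centroids = fit_centroids(train, train_labels)

# Attribute previously unseen black-box models from their probe outputs alone.
test_labels = ["dataset_a", "dataset_b"] * 5
test = [fingerprint(make_black_box(y, rng)) for y in test_labels]
accuracy = sum(
    predict(centroids, f) == y for f, y in zip(test, test_labels)
) / len(test_labels)
print(accuracy)
```

In this toy setting the two classes are separated by construction, so attribution is easy. Real image classifiers and GPT-2 variants would presumably need far richer probes and a learned classifier, as the abstract describes.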
