Stolen Subwords: Importance of Vocabularies for Machine Translation Model Stealing
This work addresses model security for NLP practitioners, but it is incremental as it builds on existing knowledge distillation and stealing techniques.
The study investigated the role of subword vocabularies in machine translation model stealing, finding that vocabulary choice has minimal impact on local model performance, and demonstrated that the victim's vocabulary can be extracted with gray-box access.
In learning-based functionality stealing, the attacker tries to build a local model from the victim's outputs. The attacker has to make choices regarding the local model's architecture, its optimization method and, specifically for NLP models, its subword vocabulary, such as BPE. On the machine translation task, we explore (1) whether the choice of vocabulary plays a role in model stealing scenarios and (2) whether it is possible to extract the victim's vocabulary. We find that the vocabulary itself does not have a large effect on the local model's performance. Given gray-box model access, it is possible to recover the victim's vocabulary by collecting its outputs (detokenized subwords on the output). The finding that vocabulary choice has only a minimal effect is also relevant more broadly to black-box knowledge distillation.
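The vocabulary-collection idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that gray-box access means the attacker can observe the victim's output as a sequence of subword units, and it uses the `@@` continuation marker only as an illustrative convention. The names `toy_victim`, `TOY_OUTPUTS` and `collect_vocabulary` are hypothetical, and the toy victim stands in for a real translation service.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical stand-in for a victim model with gray-box access:
# it returns the translation as a list of subword units.
# The "@@" continuation marker is an assumed convention, not taken from the paper.
TOY_OUTPUTS = {
    "guten morgen": ["good", "mor@@", "ning"],
    "guten abend": ["good", "even@@", "ing"],
}


def toy_victim(source: str) -> List[str]:
    """Return the subword-segmented output for a source sentence."""
    return TOY_OUTPUTS.get(source, [])


def collect_vocabulary(
    queries: Iterable[str],
    victim: Callable[[str], List[str]],
) -> Counter:
    """Query the victim and count every subword unit seen in its outputs."""
    counts: Counter = Counter()
    for query in queries:
        counts.update(victim(query))
    return counts


# The observed vocabulary grows with the number and diversity of queries.
observed = collect_vocabulary(TOY_OUTPUTS.keys(), toy_victim)
print(sorted(observed))
```

Under these assumptions, the attacker only ever sees subwords that the victim actually produces, so the recovered set is a lower bound on the victim's output-side vocabulary.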