Philip A. Whittington

7.3 DS Dec 19, 2024

Philip Whittington, Gregor Bachmann, Tiago Pimentel

In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $δ$ symbols by either finding a vocabulary directly (direct tokenisation), or selecting a sequence of merge operations (bottom-up tokenisation).
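To make the bottom-up variant concrete, the following minimal Python sketch checks whether a given sequence of merge operations compresses a dataset to at most $δ$ symbols. It only verifies a candidate sequence; it does not search for one, which is the part the abstract shows to be hard. The function names (`apply_merge`, `compressed_length`, `meets_budget`) are hypothetical, and the choice to apply each merge left to right over non-overlapping occurrences, in the style of byte-pair encoding, is an assumption that the abstract does not spell out.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[str, str]  # hypothetical representation: (left token, right token)


def apply_merge(tokens: Sequence[str], pair: Merge) -> List[str]:
    """Replace left-to-right, non-overlapping occurrences of `pair` with one token."""
    left, right = pair
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            merged.append(left + right)  # the merged pair becomes a single symbol
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def compressed_length(dataset: Sequence[str], merges: Sequence[Merge]) -> int:
    """Total number of symbols after applying `merges`, in order, to every string."""
    total = 0
    for text in dataset:
        tokens = list(text)  # assumption: start from individual characters
        for pair in merges:
            tokens = apply_merge(tokens, pair)
        total += len(tokens)
    return total


def meets_budget(dataset: Sequence[str], merges: Sequence[Merge], delta: int) -> bool:
    """Decision check: does this merge sequence reach at most `delta` symbols?"""
    return compressed_length(dataset, merges) <= delta
```

For example, with the dataset `["abab", "abc"]` and the merges `[("a", "b"), ("ab", "ab")]`, the first string collapses to the single symbol `abab` and the second to `ab`, `c`, giving three symbols in total.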
