Tyler Bell

h-index: 13
Papers: 2

2 Papers

CL | May 22, 2024 | Code
Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models

Tyler Bell, Avinash Mudireddy, Ivan Johnson-Eversoll et al.

We prove a new asymptotic equipartition property for the perplexity of long texts generated by a language model and present supporting experimental evidence from open-source models. Specifically, we show that the logarithmic perplexity of any large text generated by a language model must asymptotically converge to the average entropy of its token distributions. This defines a "typical set" that all long synthetic texts generated by a language model must belong to. We refine the concept of "typical set" to include only grammatically correct texts. We then show that this refined typical set is a vanishingly small subset of all possible grammatically correct texts for a very general definition of grammar. This means that language models are strongly constrained in the range of their possible behaviors and outputs. We make no simplifying assumptions (such as stationarity) about the statistics of language model outputs, and therefore our results are directly applicable to practical real-world models without any approximations. We discuss possible applications of the typical set concept to problems such as detecting synthetic texts and membership inference in training datasets.
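The convergence claim in this abstract can be illustrated numerically. The sketch below is not the authors' code or experimental setup: it uses a hypothetical toy generator (`next_token_dist`, invented here) whose next-token distribution depends on the previous token and on the position, so it is deliberately non-stationary. It samples a long text and compares the log-perplexity, taken here as the average of -log p(x_t | context), with the average entropy of the distributions the tokens were drawn from.

```python
import math
import random


def next_token_dist(prev_token, step, vocab_size):
    """Hypothetical toy 'language model' (an assumption for illustration).

    The distribution depends on the previous token and on the position,
    so the generated sequence is not stationary.
    """
    weights = [1.0 + ((prev_token + 1) * (k + 1) + step) % 7
               for k in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_and_measure(n_tokens, vocab_size=8, seed=0):
    """Sample a text and return (log_perplexity, average_entropy) in nats."""
    rng = random.Random(seed)
    prev = 0
    neg_log_lik = 0.0   # running sum of -log p(x_t | context)
    entropy_sum = 0.0   # running sum of H(p(. | context))
    for step in range(n_tokens):
        probs = next_token_dist(prev, step, vocab_size)
        token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        neg_log_lik -= math.log(probs[token])
        entropy_sum -= sum(p * math.log(p) for p in probs)
        prev = token
    return neg_log_lik / n_tokens, entropy_sum / n_tokens


if __name__ == "__main__":
    for n in (100, 1_000, 20_000):
        log_ppl, avg_h = sample_and_measure(n)
        print(f"n={n:>6}  log-perplexity={log_ppl:.4f}  "
              f"avg entropy={avg_h:.4f}  gap={abs(log_ppl - avg_h):.4f}")
```

In this toy setting the gap between the two quantities tends to shrink as the text grows, which is the behavior the abstract describes. The sketch says nothing about real models, whose distributions would come from the model's own softmax outputs.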

IV | Sep 2, 2020
Depth Range Reduction for 3D Range Geometry Compression

Matthew G. Finley, Tyler Bell

Three-dimensional (3D) shape measurement devices and techniques are being rapidly adopted within a variety of industries and applications. As acquiring 3D range data becomes faster and more accurate, it becomes more challenging to efficiently store, transmit, or stream this data. One prevailing approach to compressing 3D range data is to encode it within the color channels of regular 2D images. This paper presents a novel method for reducing the depth range of a 3D geometry such that it can be stored within a 2D image using lower encoding frequencies (or fewer encoding periods). This allows smaller compressed file sizes to be achieved without a proportional increase in reconstruction errors. Further, because the proposed method is applied prior to encoding, it is readily compatible with a variety of existing image-based 3D range geometry compression methods.