CL · AI · Mar 29, 2025

LangVAE and LangSpace: Building and Probing for Language Model VAEs

Danilo S. Carvalho, Yingji Zhang, Harriet Unsworth, André Freitas

arXiv:2505.00004v1 · 2 citations · h-index: 7 · EMNLP

Originality: Incremental advance

AI Analysis

This work provides a modular system that researchers can use to build and analyze textual representations. It appears incremental, however, because it builds on existing VAE and LLM methods.

The authors tackled the problem of constructing and analyzing variational autoencoders (VAEs) on top of pre-trained large language models (LLMs) to create compact and semantically disentangled representations, resulting in a flexible and scalable framework with tools for probing and experimentation.
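A VAE built this way places a Gaussian latent bottleneck between a pre-trained encoder and a pre-trained decoder. The following minimal, framework-agnostic sketch is not LangVAE's API, and all function names in it are illustrative. It shows the two standard ingredients of such a bottleneck in plain Python: reparameterised sampling and the closed-form KL term of the ELBO. It assumes a diagonal Gaussian posterior and a standard normal prior.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def negative_elbo(recon_nll: float, mu: Sequence[float],
                  logvar: Sequence[float]) -> float:
    """Negative ELBO: decoder reconstruction NLL plus the KL regulariser."""
    return recon_nll + kl_to_standard_normal(mu, logvar)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.5, -1.0], [0.0, -2.0]  # hypothetical encoder outputs
    z = reparameterize(mu, logvar, rng)
    print(len(z), kl_to_standard_normal(mu, logvar))
```

In a real model, `mu` and `logvar` would come from projecting the encoder's hidden states, and `recon_nll` from the decoder's token-level loss. Both of those steps are assumed here and omitted.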

We present LangVAE, a novel framework for modular construction of variational autoencoders (VAEs) on top of pre-trained large language models (LLMs). Such language model VAEs can encode the knowledge of their pre-trained components into more compact and semantically disentangled representations. The representations obtained in this way can be analysed with the LangVAE companion framework: LangSpace, which implements a collection of probing methods, such as vector traversal and interpolation, disentanglement measures, and cluster visualisations. LangVAE and LangSpace offer a flexible, efficient and scalable way of building and analysing textual representations, with simple integration for models available on the HuggingFace Hub. Additionally, we conducted a set of experiments with different encoder and decoder combinations, as well as annotated inputs, revealing a wide range of interactions across architectural families and sizes w.r.t. generalisation and disentanglement. Our findings demonstrate a promising framework for systematising the experimentation and understanding of textual representations.
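Two of the probing methods named in the abstract, interpolation and vector traversal, operate directly on latent vectors. The following hypothetical, library-free illustration does not reproduce LangSpace's API, and its helper names are illustrative. It assumes simple linear interpolation. Each resulting point would then be passed to the VAE decoder to generate the corresponding sentence.

```python
from typing import List, Sequence


def interpolate(z_a: Sequence[float], z_b: Sequence[float],
                steps: int) -> List[List[float]]:
    """Linearly interpolate from z_a to z_b, including both endpoints."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [[a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)]
            for t in range(steps)]


def traverse(z: Sequence[float], dim: int,
             values: Sequence[float]) -> List[List[float]]:
    """Vary one latent dimension over `values`, keeping all others fixed."""
    points = []
    for v in values:
        point = list(z)  # copy so the input vector is never mutated
        point[dim] = v
        points.append(point)
    return points


if __name__ == "__main__":
    # Hypothetical latent codes for two encoded sentences.
    path = interpolate([0.0, 0.0], [1.0, 2.0], steps=5)
    sweep = traverse([0.1, 0.2, 0.3], dim=1, values=[-2.0, 0.0, 2.0])
    print(path)
    print(sweep)
```

Traversal isolates a single dimension, which is why it is commonly used to inspect disentanglement. Interpolation instead shows how smoothly the decoder's outputs change between two encoded inputs.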
