LG | May 22, 2025

Omni TM-AE: A Scalable and Interpretable Embedding Model Using the Full Tsetlin Machine State Space

arXiv:2505.16386v1 | 1 citation | h-index: 33
Originality: Incremental advance
AI Analysis

This paper addresses the need for interpretable and reusable embeddings in NLP and offers a scalable alternative to black-box models, though the advance appears incremental because it builds on existing Tsetlin Machine methods.

The paper tackles the problem of balancing performance, scalability, and interpretability in NLP embedding models. It introduces Omni TM-AE, which uses the Tsetlin Machine's full state space to create reusable, interpretable embeddings. Experiments show that it performs competitively with, or surpasses, mainstream models on semantic similarity, sentiment classification, and document clustering.

The increasing complexity of large-scale language models has amplified concerns regarding their interpretability and reusability. While traditional embedding models like Word2Vec and GloVe offer scalability, they lack transparency and often behave as black boxes. Conversely, interpretable models such as the Tsetlin Machine (TM) have shown promise in constructing explainable learning systems, though they previously faced limitations in scalability and reusability. In this paper, we introduce Omni Tsetlin Machine AutoEncoder (Omni TM-AE), a novel embedding model that fully exploits the information contained in the TM's state matrix, including literals previously excluded from clause formation. This method enables the construction of reusable, interpretable embeddings through a single training phase. Extensive experiments across semantic similarity, sentiment classification, and document clustering tasks show that Omni TM-AE performs competitively with and often surpasses mainstream embedding models. These results demonstrate that it is possible to balance performance, scalability, and interpretability in modern Natural Language Processing (NLP) systems without resorting to opaque architectures.
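The abstract's central idea, using the whole state matrix instead of only the literals included in clauses, can be illustrated with a toy sketch. This is not the paper's algorithm. The state values, the `N_STATES` threshold, the rescaling to [0, 1], and the use of cosine similarity are all assumptions made here for illustration. The sketch only shows why a binary include/exclude view can hide differences that the raw automaton states retain.

```python
import math

# Hypothetical: each Tsetlin automaton has 2 * N_STATES states.
# States 1..N_STATES mean "exclude", N_STATES+1..2*N_STATES mean "include".
N_STATES = 100

# Hypothetical toy state matrix: one row per target word, one column per
# literal (context feature). All values are invented for illustration.
STATE = {
    "cat": [180, 150, 95, 40],
    "dog": [175, 140, 99, 35],
    "car": [30, 60, 90, 190],
}


def included_literals(states):
    """Binary view: 1 if the literal's state is in the include half, else 0."""
    return [1 if s > N_STATES else 0 for s in states]


def full_state_vector(states):
    """Full-state view: keep every literal's state, rescaled to [0, 1]."""
    return [(s - 1) / (2 * N_STATES - 1) for s in states]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    for a, b in [("cat", "dog"), ("cat", "car")]:
        binary = cosine(included_literals(STATE[a]), included_literals(STATE[b]))
        full = cosine(full_state_vector(STATE[a]), full_state_vector(STATE[b]))
        print(f"{a}-{b}: binary={binary:.3f} full={full:.3f}")
```

In this toy data the binary view makes "cat" and "dog" identical and "cat" and "car" orthogonal, while the full-state view gives graded similarities that also reflect the excluded literals.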

