AIDec 31, 2025

GenZ: Foundational models as latent variable generators within traditional statistical models

arXiv:2512.24834v13.3h-index: 8

Originality Highly original

AI Analysis

This addresses the problem of improving prediction accuracy in specific domains like real estate and recommendations by making foundational models more adaptable to dataset-specific patterns.

The paper tackles the problem of foundational models failing to capture dataset-specific patterns for prediction tasks by introducing GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. The approach achieves 12% median relative error on house price prediction (vs. 38% for a GPT-5 baseline) and predicts collaborative filtering representations with 0.59 cosine similarity from semantic descriptions alone.

We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge, they often fail to capture dataset-specific patterns critical for prediction tasks. Our approach addresses this by discovering semantic feature descriptions through an iterative process that contrasts groups of items identified via statistical modeling errors, rather than relying solely on the foundational model's domain understanding. We formulate this as a generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters. The method prompts a frozen foundational model to classify items based on discovered features, treating these judgments as noisy observations of latent binary features that predict real-valued targets through learned statistical relationships. We demonstrate the approach on two domains: house price prediction (hedonic regression) and cold-start collaborative filtering for movie recommendations. On house prices, our model achieves 12\% median relative error using discovered semantic features from multimodal listing data, substantially outperforming a GPT-5 baseline (38\% error) that relies on the LLM's general domain knowledge. For Netflix movie embeddings, our model predicts collaborative filtering representations with 0.59 cosine similarity purely from semantic descriptions -- matching the performance that would require approximately 4000 user ratings through traditional collaborative filtering. The discovered features reveal dataset-specific patterns (e.g., architectural details predicting local housing markets, franchise membership predicting user preferences) that diverge from the model's domain knowledge alone.

View on arXiv PDF

Similar