4 Papers

CL Mar 2
Semantic Novelty Trajectories in 80,000 Books: A Cross-Corpus Embedding Analysis

Fred Zimmerman

I apply Schmidhuber's compression progress theory of interestingness at corpus scale, analyzing semantic novelty trajectories in more than 80,000 books spanning two centuries of English-language publishing. Using sentence-transformer paragraph embeddings and a running-centroid novelty measure, I compare 28,730 pre-1920 Project Gutenberg books (PG-19) against 52,796 modern English books (Books3, approximately 1990-2010). There are four principal findings. First, mean paragraph-level novelty is roughly 10% higher in modern books (0.503 vs. 0.459). Second, trajectory circuitousness -- the ratio of cumulative path length to net displacement in embedding space -- is 67% higher in the modern corpus. Third, convergent narrative curves, in which novelty declines toward a settled semantic register, are 2.3x more common in pre-1920 literature. Fourth, novelty is essentially uncorrelated with reader quality ratings (r = -0.002), suggesting that interestingness in Schmidhuber's sense is structurally independent of perceived literary merit. Clustering paragraph-level trajectories via PAA-16 representations (piecewise aggregate approximation with 16 segments) reveals eight distinct narrative-shape archetypes whose distribution shifts substantially between eras. All analysis code and an interactive exploration toolkit are publicly available at https://bigfivekiller.online/novelty_hub.
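The three quantities named in this abstract (running-centroid novelty, circuitousness, and PAA-16) can be illustrated with a minimal sketch. The abstract does not give exact definitions, so the code below rests on assumptions: novelty is taken to be the cosine distance between a paragraph embedding and the mean of all preceding embeddings, path length and displacement use Euclidean distance, and the function names are hypothetical. It is not the released analysis code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def running_centroid_novelty(embeddings: Sequence[Vector]) -> List[float]:
    """Assumed definition: novelty of paragraph i is its cosine distance
    to the centroid of paragraphs 0..i-1 (the first paragraph has no score)."""
    if not embeddings:
        return []
    novelty = []
    running_sum = list(embeddings[0])
    for i in range(1, len(embeddings)):
        centroid = [s / i for s in running_sum]
        novelty.append(cosine_distance(embeddings[i], centroid))
        running_sum = [s + x for s, x in zip(running_sum, embeddings[i])]
    return novelty


def circuitousness(embeddings: Sequence[Vector]) -> float:
    """Cumulative path length divided by net displacement (Euclidean, assumed)."""
    path = sum(math.dist(a, b) for a, b in zip(embeddings, embeddings[1:]))
    net = math.dist(embeddings[0], embeddings[-1])
    return path / net if net > 0 else math.inf


def paa(series: Sequence[float], segments: int = 16) -> List[float]:
    """Piecewise aggregate approximation: the mean of each of `segments` chunks."""
    n = len(series)
    if n < segments:
        raise ValueError("series must have at least `segments` points")
    out = []
    for k in range(segments):
        chunk = series[k * n // segments:(k + 1) * n // segments]
        out.append(sum(chunk) / len(chunk))
    return out
```

Under these assumptions, a book becomes a novelty series via `running_centroid_novelty`, and `paa(series, 16)` reduces that series to a fixed-length 16-value shape that can be clustered regardless of book length.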

70.1 CL Apr 1
Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics

Fred Zimmerman, Hilmar AI

We test whether authors have characteristic "fingerprints" in the information-theoretic novelty curves of their published works. Working with two corpora -- Books3 (52,796 books, 759 qualifying authors) and PG-19 (28,439 books, 1,821 qualifying authors) -- we find that authorial voice leaves measurable traces in how novelty unfolds across a text. The signal is multi-scale: at book level, scalar dynamics (mean novelty, speed, volume, circuitousness) identify 43% of authors significantly above chance; at chapter level, SAX motif patterns in sliding windows achieve attribution at 30x above chance, far exceeding the scalar features that dominate at book level. These signals are complementary, not redundant. We show that the fingerprint is partly confounded with genre but persists within genre for approximately one-quarter of authors. Classical authors (Twain, Austen, Kipling) show fingerprints comparable in strength to those of modern authors, suggesting the phenomenon is not an artifact of contemporary publishing conventions.
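The chapter-level signal relies on SAX (Symbolic Aggregate approXimation) motifs in sliding windows. A minimal sketch of one common SAX formulation follows: z-normalize each window, reduce it with piecewise aggregate approximation, then map each segment mean to a letter using equiprobable Gaussian breakpoints. The window length, segment count, alphabet size, and function names are illustrative assumptions, not the paper's configuration.

```python
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, mean, pstdev
from typing import List, Sequence


def znorm(window: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance scaling; a constant window maps to all zeros."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Mean of each of `segments` roughly equal chunks (needs len >= segments)."""
    n = len(series)
    return [mean(series[k * n // segments:(k + 1) * n // segments])
            for k in range(segments)]


def sax_word(window: Sequence[float], segments: int = 4,
             alphabet: str = "abcd") -> str:
    """Convert one window of a novelty curve into a SAX word."""
    # Breakpoints split the standard normal into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / len(alphabet))
            for k in range(1, len(alphabet))]
    return "".join(alphabet[bisect_right(cuts, v)]
                   for v in paa(znorm(window), segments))


def motif_counts(series: Sequence[float], window: int = 8,
                 segments: int = 4, alphabet: str = "abcd") -> Counter:
    """Count SAX words over every sliding window of the series."""
    return Counter(sax_word(series[i:i + window], segments, alphabet)
                   for i in range(len(series) - window + 1))
```

The resulting `Counter` is a bag-of-motifs profile for a chapter. Comparing such profiles across authors is one plausible way to build the attribution features described above, though the abstract does not specify the classifier.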

CY Feb 16
Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing

Fred Zimmerman

We present a system for autonomous book ideation that replaces human focus groups with synthetic reader panels -- diverse collections of LLM-instantiated reader personas that evaluate book concepts through structured tournament competitions. Each persona is defined by demographic attributes (age group, gender, income, education, reading level), behavioral patterns (books per year, genre preferences, discovery methods, price sensitivity), and consistency parameters. Panels are composed per imprint to reflect target demographics, with diversity constraints ensuring representation across age, reading level, and genre affinity. Book concepts compete in single-elimination, double-elimination, round-robin, or Swiss-system tournaments, judged against weighted criteria including market appeal, originality, and execution potential. To reject low-quality LLM evaluations, we implement five automated anti-slop checks (repetitive phrasing, generic framing, circular reasoning, score clustering, audience mismatch). We report results from deployment within a multi-imprint publishing operation managing six active imprints and 609 titles in distribution. Three case studies -- a 270-evaluator panel for a children's literacy novel and two 5-person expert panels, one for a military memoir and one for a naval strategy monograph -- demonstrate that synthetic panels produce actionable demographic segmentation, identify structural content issues invisible to homogeneous reviewers, and enable tournament filtering that eliminates low-quality concepts while raising the share of high-quality concepts among survivors from 15% to 62% of the evaluated pool.
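The tournament mechanism can be sketched for the single-elimination case. In the sketch below, each persona is a plain function returning per-criterion scores, standing in for an LLM call; the criterion keys, the bye rule for odd brackets, and the `min_spread` threshold in the score-clustering check are all hypothetical choices made for illustration, not details taken from the system described.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

Scores = Dict[str, float]
Judge = Callable[[str], Scores]  # stand-in for an LLM persona evaluation


def weighted_score(scores: Scores, weights: Dict[str, float]) -> float:
    """Weighted average of criterion scores, normalized by total weight."""
    total = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total


def panel_score(concept: str, panel: Sequence[Judge],
                weights: Dict[str, float]) -> float:
    """Mean weighted score of a concept across all personas in the panel."""
    return mean(weighted_score(judge(concept), weights) for judge in panel)


def single_elimination(concepts: Sequence[str], panel: Sequence[Judge],
                       weights: Dict[str, float]) -> str:
    """Pair concepts off round by round; the higher panel score advances.
    An unpaired concept receives a bye (an assumed rule)."""
    bracket: List[str] = list(concepts)
    while len(bracket) > 1:
        next_round = []
        for i in range(0, len(bracket) - 1, 2):
            a, b = bracket[i], bracket[i + 1]
            a_wins = panel_score(a, panel, weights) >= panel_score(b, panel, weights)
            next_round.append(a if a_wins else b)
        if len(bracket) % 2 == 1:
            next_round.append(bracket[-1])
        bracket = next_round
    return bracket[0]


def scores_are_clustered(scores: Sequence[float], min_spread: float = 0.5) -> bool:
    """Hypothetical score-clustering check: flag an evaluator whose scores
    barely vary across concepts (population standard deviation too small)."""
    return pstdev(scores) < min_spread
```

In a fuller version, evaluations flagged by a check such as `scores_are_clustered` would presumably be discarded or regenerated before a match is decided; the abstract does not say which.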

SE Oct 23, 2025
AI-Driven Development of a Publishing Imprint: Xynapse Traces

Fred Zimmerman

Xynapse Traces is an experimental publishing imprint created by combining human and algorithmic methods, using a configuration-driven architecture and a multi-model AI integration framework. The system achieved a 90% reduction in time-to-market (from a typical 6-12 months to 2-4 weeks) and an 80% cost reduction compared to traditional imprint development, while publishing 52 books in its first year and maintaining 99% citation accuracy and 100% validation success after initial corrections. Key technical innovations include a continuous ideation pipeline with tournament-style evaluation, a novel codex design for transcriptive meditation practice, comprehensive automation spanning ideation through production and distribution, and publisher personas that define and guide the imprint's mission. The system also integrates automated verification with human oversight, so that gains in speed do not compromise publishing standards. This effort has implications for the future of book publishing, suggesting new paradigms for human-AI collaboration that broaden access to sophisticated publishing capabilities and make previously unviable niche markets accessible.