Juan M. Huerta

2 Papers

CL · May 8
WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

Juan M. Huerta

The LLM Wiki pattern, which compiles domain knowledge into a persistent artifact and serves it to LLMs via KV cache inference, promises context access at sub-second latency with zero retrieval failures. Realizing this promise requires closing the compilation gap: LLM compilation must distill raw documents into a wiki without catastrophically discarding critical facts. We characterize this gap across 17 RepLiQA domains (6,800 questions). We observe that full-context KV cache inference outperforms RAG on curated knowledge (4.38 vs. 4.08 out of 5, with 7.3× faster TTFT) but degrades below RAG at scale due to attention dilution, and that blind compilation fails entirely (2.14 to 2.32 vs. 3.46, with 53 to 60% catastrophic failure rates). To close the compilation gap, we propose WiCER (Wiki-memory Compile, Evaluate, Refine), an iterative algorithm inspired by counterexample-guided abstraction refinement (CEGAR). WiCER evaluates compiled wikis against diagnostic probes, identifies dropped facts, and forces their preservation in subsequent compilations. One to two iterations recover 80% of the lost quality (mean 3.24 vs. 3.47 for raw full context across the 15 topics with baselines) and reduce catastrophic failures by 55% relative. An ablation across all 17 topics confirms that targeted diagnosis (+0.95), not generic pinning (+0.16), drives the gains. All code and benchmarks are released for reproducible research.
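The compile, evaluate, refine loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' released implementation: `Probe`, `evaluate`, `toy_compile`, and `wicer_loop` are hypothetical names, the LLM compiler is replaced by a toy function that keeps only a fixed budget of facts, and probe evaluation is simplified to a string-containment check rather than a graded answer-quality score. The sketch only shows the CEGAR-style structure: failed probes act as counterexamples, and the facts they implicate are pinned for the next compilation.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Probe:
    # Hypothetical diagnostic probe: a question plus the fact needed to answer it.
    question: str
    required_fact: str


def evaluate(wiki: str, probes: List[Probe]) -> List[Probe]:
    """Return the probes the compiled wiki fails (the counterexamples).

    Simplification: a probe passes if its required fact appears verbatim in the wiki.
    """
    return [p for p in probes if p.required_fact not in wiki]


def toy_compile(documents: List[str], pinned: Set[str], budget: int = 2) -> str:
    """Hypothetical stand-in for an LLM compiler.

    Keeps only the first `budget` facts (a lossy compilation), plus any fact
    that earlier refinement rounds have pinned.
    """
    head = documents[:budget]
    extra = [d for d in documents[budget:] if d in pinned]
    return "\n".join(head + extra)


def wicer_loop(
    documents: List[str],
    probes: List[Probe],
    compile_fn: Callable[[List[str], Set[str]], str],
    max_iters: int = 2,
) -> Tuple[str, Set[str]]:
    """Compile, evaluate against probes, pin dropped facts, and recompile."""
    pinned: Set[str] = set()
    wiki = compile_fn(documents, pinned)
    for _ in range(max_iters):
        failures = evaluate(wiki, probes)
        if not failures:
            break  # every probe passes: no counterexamples left
        # Targeted refinement: pin only the facts implicated by failed probes.
        pinned.update(p.required_fact for p in failures)
        wiki = compile_fn(documents, pinned)
    return wiki, pinned


# Usage example with toy data.
docs = ["Fact A.", "Fact B.", "Fact C.", "Fact D."]
probes = [Probe("q1", "Fact A."), Probe("q3", "Fact C."), Probe("q4", "Fact D.")]
wiki, pinned = wicer_loop(docs, probes, toy_compile)
print(wiki.splitlines())
print(sorted(pinned))
```

In this toy run the first compilation drops "Fact C." and "Fact D.". The two failed probes pin exactly those facts, and one refinement round recovers them. Pinning only the facts behind failed probes, rather than pinning content indiscriminately, mirrors the abstract's distinction between targeted diagnosis and generic pinning.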

CY · Sep 9, 2015
Accelerating News Integration in Automatic Knowledge Extraction Ecosystems: an API-first Outlook

Juan M. Huerta, Clancy Childs

Leveraging Application Programming Interfaces (APIs) has been widely acknowledged as a valuable approach to software and system design, one that has accelerated product and service development by decoupling interface design from service implementation details. Many organizations in the news and journalism industry have adopted and promoted this API-oriented approach. In the first part of this paper, we survey the most significant recent work on traditional news and journalistic open APIs and how these APIs have been influenced by, and have in turn impacted, the news product landscape. In the second part, we identify two disruptive technology trends that we believe will affect the role and value of news and journalism products in the future: API-first development methodologies, and the growing role of automatic knowledge extraction and analytic services supported by news content. We anticipate that these two driving forces will create a new wave of adoption, open collaboration, standardization, and overall progress in the adoption of news content by knowledge platforms. We also provide a brief overview of our experience in this area at Dow Jones.