Nicholas S. Kersting

LG
h-index3
3papers
2citations
Novelty53%
AI Score41

3 Papers

LGApr 30, 2024Code
Harmonic LLMs are Trustworthy

Nicholas S. Kersting, Mohammad Rahman, Suchismitha Vedala et al.

We introduce an intuitive method to test the robustness (stability and explainability) of any black-box LLM in real-time via its local deviation from harmoniticity, denoted as $γ$. To the best of our knowledge this is the first completely model-agnostic and unsupervised method of measuring the robustness of any given response from an LLM, based upon the model itself conforming to a purely mathematical standard. To show general application and immediacy of results, we measure $γ$ in 10 popular LLMs (ChatGPT, Claude-2.1, Claude3.0, GPT-4, GPT-4o, Smaug-72B, Mixtral-8x7B, Llama2-7B, Mistral-7B and MPT-7B) across thousands of queries in three objective domains: WebQA, ProgrammingQA, and TruthfulQA. Across all models and domains tested, human annotation confirms that $γ\to 0$ indicates trustworthiness, and conversely searching higher values of $γ$ easily exposes examples of hallucination, a fact that enables efficient adversarial prompt generation through stochastic gradient ascent in $γ$. The low-$γ$ leaders among the models in the respective domains are GPT-4o, GPT-4, and Smaug-72B, providing evidence that mid-size open-source models can win out against large commercial models.

26.3CLMay 6
Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh et al.

We introduce the **Concept Field** of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $ζ$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direct probabilistic reading. We support the computation with the introduction of a **Vector Sequence Database (VSDB)** that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach on two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal Regulations, and novelty detection over Project Gutenberg. Using controlled LLM-generated rewrites, Concept Fields achieve strong selective classification performance under a grounded / ungrounded / unsure triage policy, which unlike retrieval-centric baselines have similar coverage-risk behavior across both domains, supporting a probability-based interpretation that transfers across domains. We also sketch how divergence and curl of the Concept Field, computed on dense clusters, surface qualitatively meaningful semantic patterns (logic sources, sinks, and implicit topics), which we offer as hypothesis-generating rather than as a quantitative result. Concept Fields provide a fast, lightweight, and interpretable signal for groundedness and novelty, complementary to LLM-as-judge and white-box detectors.

LGApr 29, 2024
Harmonic Machine Learning Models are Robust

Nicholas S. Kersting, Yi Li, Aman Mohanty et al.

We introduce Harmonic Robustness, a powerful and intuitive method to test the robustness of any machine-learning model either during training or in black-box real-time inference monitoring without ground-truth labels. It is based on functional deviation from the harmonic mean value property, indicating instability and lack of explainability. We show implementation examples in low-dimensional trees and feedforward NNs, where the method reliably identifies overfitting, as well as in more complex high-dimensional models such as ResNet-50 and Vision Transformer where it efficiently measures adversarial vulnerability across image classes.