The Model Is Not the Product: A Dual-Pillar Architecture for Local-First Psychological Coaching
For developers of emotionally oriented AI applications, this work shows that a dual-pillar architecture (a local runtime plus a memory corpus) can work around the limited context window of LLMs while preserving privacy. The results are incremental, however, because the approach combines known techniques (RAG and local models).
Psych LM is an iOS app for psychological coaching. It uses a local-first, retrieval-augmented architecture that pairs an on-device language model with a structured memory corpus to maintain persistent context across sessions. It demonstrates that complex, context-aware interaction is feasible on mobile devices when architectural control is prioritized over model size.
Existing language model applications struggle to meet the demand for emotionally oriented support, primarily because they cannot maintain deep, persistent context across sessions. This report introduces Psych LM, an iOS application that validates the thesis that, for such applications, the surrounding architecture is paramount. Psych LM runs a local, on-device language model within a purpose-built, local-first runtime designed for behavioral and life-coaching applications. The system achieves the practical effect of a near-infinite context window through an automated, user-inspectable memory corpus: it converts conversations into structured memory cards (facts, goals, and events) and dynamically injects the relevant cards into the prompt via semantic and vector search. The system can therefore be characterized as an active-learning, retrieval-augmented generation architecture that runs on-device. This report makes four primary contributions: a local-first design in which privacy is a core property; a detailed description of the memory corpus that provides persistent context for key user information; a deterministic orchestration layer that provides a stable behavioral spine independent of the model's internal state; and a benchmark framework focused on evaluating the integrated system's reliability under realistic operating conditions. The R&D process confirms that complex, context-aware interaction can be reliably achieved under the strict constraints of a mobile environment by prioritizing architectural control and resource management over model size alone.
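The memory-corpus mechanism described above (structured cards retrieved by similarity and injected into the prompt) can be illustrated with a minimal sketch. This is not the Psych LM implementation: the names `MemoryCard`, `MemoryCorpus`, and `build_prompt` are hypothetical, the prompt layout is an assumption, and a toy bag-of-words vector stands in for whatever embedding model the app actually uses. Python is used here only for brevity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical card schema: the abstract names only the kinds (facts, goals, events).
@dataclass(frozen=True)
class MemoryCard:
    kind: str  # "fact", "goal", or "event"
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class MemoryCorpus:
    """In-memory card store with similarity search (assumed design)."""

    def __init__(self) -> None:
        self._cards: list[tuple[MemoryCard, Counter]] = []

    def add(self, card: MemoryCard) -> None:
        # Embed once at insertion time so search only embeds the query.
        self._cards.append((card, embed(card.text)))

    def search(self, query: str, k: int = 2) -> list[MemoryCard]:
        q = embed(query)
        scored = [(cosine(q, vec), card) for card, vec in self._cards]
        # Drop cards with no overlap, then keep the k most similar.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [card for _, card in scored[:k]]


def build_prompt(corpus: MemoryCorpus, user_message: str, k: int = 2) -> str:
    """Inject the retrieved cards ahead of the user's message (assumed layout)."""
    cards = corpus.search(user_message, k)
    memory = "\n".join(f"- [{c.kind}] {c.text}" for c in cards)
    return f"Relevant memory:\n{memory}\n\nUser: {user_message}"


corpus = MemoryCorpus()
corpus.add(MemoryCard("goal", "Run a marathon in October"))
corpus.add(MemoryCard("fact", "Works night shifts as a nurse"))
corpus.add(MemoryCard("event", "Argued with sister about the holiday plans"))

prompt = build_prompt(corpus, "I skipped my marathon training run again")
```

In this sketch only the marathon goal shares terms with the message, so it is the only card injected. The model's prompt stays small regardless of how many cards the corpus holds, which is the sense in which retrieval substitutes for a longer context window.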