Andrew Kiruluta

LG
h-index3
32papers
38citations
Novelty58%
AI Score53

32 Papers

LGApr 17
Data-Driven Variational Basis Learning Beyond Neural Networks: A Non-Neural Framework for Adaptive Basis Discovery

Andrew Kiruluta

Classical representation systems such as Fourier series, wavelets, and fixed dictionaries provide analytically tractable basis expansions, but they are not intrinsically adapted to the empirical structure of modern high-dimensional data. Neural networks overcome this limitation by learning features from data, yet they do so through layered nonlinear parameterizations that often sacrifice interpretability, explicit control over basis structure, and mathematical transparency. In this manuscript we develop a non-neural alternative that learns basis functions directly from data through variational optimization. The proposed framework, termed Data Driven Variational Basis Learning (DVBL), treats basis atoms as primary optimization variables and learns them jointly with sample-specific coefficients and, when appropriate, a latent linear evolution operator. This yields a data-adaptive basis expansion that remains explicit, interpretable, and amenable to rigorous analysis. We formulate the model, establish existence of minimizers, prove blockwise descent properties for an alternating minimization algorithm, give conditions for coefficient recovery and basis identifiability, and show how manifold and dynamical regularization can be integrated without invoking neural architectures. We also discuss the conceptual novelty of the framework relative to classical dictionary learning, spectral methods, Koopman operator methods, and deep representation learning.

CLMar 22
Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

Andrew Kiruluta

Large language models deliver strong generative performance but at the cost of massive parameter counts, memory use, and decoding latency. Prior work has shown that pruning and structured sparsity can preserve accuracy under substantial compression, while prompt-compression methods reduce latency by removing redundant input tokens. However, these two directions remain largely separate. Most model-compression methods are static and optimized offline, and they do not exploit the fact that different prompts and decoding steps activate different latent computational pathways. Prompt-compression methods reduce sequence length, but they do not adapt the executed model subnetwork. We propose a unified compressed-sensing-guided framework for dynamic LLM execution. Random measurement operators probe latent model usage, sparse recovery estimates task-conditioned and token-adaptive support sets, and the recovered supports are compiled into hardware-efficient sparse execution paths over blocks, attention heads, channels, and feed-forward substructures. The framework introduces five key contributions: task-conditioned measurements, so different prompts induce different sparse supports; token-adaptive recovery, so active substructures are re-estimated during decoding; formal sample-complexity bounds under restricted isometry or mutual incoherence assumptions; compile-to-hardware constraints that restrict recovery to GPU-efficient structures; and a joint objective that unifies prompt compression with model reduction. Together, these components recast LLM inference as a measurement-and-recovery problem with explicit approximation guarantees and deployment-oriented speedup constraints.

LGMar 13
From Gradients to Riccati Geometry: Kalman World Models for Single-Pass Learning

Andrew Kiruluta

Backpropagation dominates modern machine learning, yet it is not the only principled method for optimizing dynamical systems. We propose Kalman World Models (KWM), a class of learned state-space models trained via recursive Bayesian filtering rather than reverse-mode automatic differentiation. Instead of gradient descent updates, we replace parameter learning with Kalman-style gain adaptation. Training becomes online filtering; error signals become innovations. We further extend this framework to transformer-based large language models (LLMs), where internal activations are treated as latent dynamical states corrected via innovation terms. This yields a gradient-free training and adaptation paradigm grounded in control theory. We derive stability conditions, analyze computational complexity, and provide empirical results on sequence modeling tasks demonstrating competitive performance with improved robustness and continual adaptation properties.

CVAug 20, 2024
Hierarchical Attention Diffusion Networks with Object Priors for Video Change Detection

Andrew Kiruluta, Eric Lundy, Andreas Lemos

We present a unified change detection pipeline that combines instance level masking, multi\-scale attention within a denoising diffusion model, and per pixel semantic classification, all refined via SSIM to match human perception. By first isolating only temporally novel objects with Mask R\-CNN, then guiding diffusion updates through hierarchical cross attention to object and global contexts, and finally categorizing each pixel into one of C change types, our method delivers detailed, interpretable multi\-class maps. It outperforms traditional differencing, Siamese CNNs, and GAN\-based detectors by 10\-25 points in F1 and IoU on both synthetic and real world benchmarks, marking a new state of the art in remote sensing change detection.

LGJan 2
Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs

Andrew Kiruluta

We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as implicit optimization or meta-learning, we formulate task- and context-specific learning as the sequential inference of a low-dimensional latent adaptation state governed by a linearized state-space model. Under Gaussian assumptions, adaptation follows a Kalman recursion with closed-form updates for both the posterior mean and covariance. This perspective elevates epistemic uncertainty to an explicit dynamical variable. We show that inference-time learning is driven by covariance collapse, i.e., rapid contraction of posterior uncertainty induced by informative tokens, which typically precedes convergence of the posterior mean. Using observability conditions on token-level Jacobians, we establish stability of the Bayesian filter, prove exponential covariance contraction rates, and derive mean-square error bounds. Gradient descent, natural-gradient methods, and meta-learning updates arise as singular, noise-free limits of the filtering dynamics, positioning optimization-based adaptation as a degenerate approximation of Bayesian inference. The resulting theory provides a unified probabilistic account of in-context learning, parameter-efficient adaptation, and test-time learning without parameter updates. It yields explicit guarantees on stability and sample efficiency, offers a principled interpretation of prompt informativeness via information accumulation, and clarifies the role of uncertainty dynamics absent from existing accounts. Minimal illustrative experiments corroborate the qualitative predictions of the theory.

CLNov 2, 2025
Spectral Neuro-Symbolic Reasoning II: Semantic Node Merging, Entailment Filtering, and Knowledge Graph Alignment

Andrew Kiruluta, Priscilla Burity

This report extends the Spectral Neuro-Symbolic Reasoning (Spectral NSR) framework by introducing three semantically grounded enhancements: (1) transformer-based node merging using contextual embeddings (e.g., Sentence-BERT, SimCSE) to reduce redundancy, (2) sentence-level entailment validation with pretrained NLI classifiers (e.g., RoBERTa, DeBERTa) to improve edge quality, and (3) alignment with external knowledge graphs (e.g., ConceptNet, Wikidata) to augment missing context. These modifications enhance graph fidelity while preserving the core spectral reasoning pipeline. Experimental results on ProofWriter, EntailmentBank, and CLUTRR benchmarks show consistent accuracy gains (up to +3.8\%), improved generalization to adversarial cases, and reduced inference noise. The novelty lies in performing semantic and symbolic refinement entirely upstream of the spectral inference stage, enabling efficient, interpretable, and scalable reasoning without relying on quadratic attention mechanisms. In summary, this work extends the Spectral NSR framework with modular, semantically grounded preprocessing steps that improve graph quality without altering the core spectral reasoning engine. The result is a more robust, interpretable, and scalable reasoning system suitable for deployment in open-domain and real-world settings.

LGAug 19, 2024
Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique

Andrew Kiruluta, Andreas Lemos

This paper presents a novel hybrid approach that integrates linear programming (LP) within the loss function of an unsupervised machine learning model. By leveraging the strengths of both optimization techniques and machine learning, this method introduces a robust framework for solving complex optimization problems where traditional methods may fall short. The proposed approach encapsulates the constraints and objectives of a linear programming problem directly into the loss function, guiding the learning process to adhere to these constraints while optimizing the desired outcomes. This technique not only preserves the interpretability of linear programming but also benefits from the flexibility and adaptability of machine learning, making it particularly well-suited for unsupervised or semi-supervised learning scenarios.

LGApr 8, 2025
Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention

Andrew Kiruluta, Priscilla Burity, Samantha Williams

Transformer architectures, underpinned by the self-attention mechanism, have achieved state-of-the-art results across numerous natural language processing (NLP) tasks by effectively modeling long-range dependencies. However, the computational complexity of self-attention, scaling quadratically with input sequence length, presents significant challenges for processing very long sequences or operating under resource constraints. This paper introduces the Learnable Multi-Scale Wavelet Transformer (LMWT), a novel architecture that replaces the standard dot-product self-attention with a learnable multi-scale Haar wavelet transform module. Leveraging the intrinsic multi-resolution properties of wavelets, the LMWT efficiently captures both local details and global context. Crucially, the parameters of the wavelet transform, including scale-specific coefficients, are learned end-to-end during training, allowing the model to adapt its decomposition strategy to the data and task. We present the detailed mathematical formulation of the learnable Haar wavelet module and its integration into the transformer framework, supplemented by an architectural diagram. We conduct a comprehensive experimental evaluation on a standard machine translation benchmark (WMT16 En-De), comparing the LMWT against a baseline self-attention transformer using metrics like BLEU score, perplexity, and token accuracy. Furthermore, we analyze the computational complexity, highlighting the linear scaling of our approach, discuss its novelty in the context of related work, and explore the interpretability offered by visualizing the learned Haar coefficients. Our results indicate that the LMWT achieves competitive performance while offering substantial computational advantages, positioning it as a promising and novel alternative for efficient sequence modeling.

CVApr 4, 2025
A Hybrid Wavelet-Fourier Method for Next-Generation Conditional Diffusion Models

Andrew Kiruluta, Andreas Lemos

We present a novel generative modeling framework,Wavelet-Fourier-Diffusion, which adapts the diffusion paradigm to hybrid frequency representations in order to synthesize high-quality, high-fidelity images with improved spatial localization. In contrast to conventional diffusion models that rely exclusively on additive noise in pixel space, our approach leverages a multi-transform that combines wavelet sub-band decomposition with partial Fourier steps. This strategy progressively degrades and then reconstructs images in a hybrid spectral domain during the forward and reverse diffusion processes. By supplementing traditional Fourier-based analysis with the spatial localization capabilities of wavelets, our model can capture both global structures and fine-grained features more effectively. We further extend the approach to conditional image generation by integrating embeddings or conditional features via cross-attention. Experimental evaluations on CIFAR-10, CelebA-HQ, and a conditional ImageNet subset illustrate that our method achieves competitive or superior performance relative to baseline diffusion models and state-of-the-art GANs, as measured by Fréchet Inception Distance (FID) and Inception Score (IS). We also show how the hybrid frequency-based representation improves control over global coherence and fine texture synthesis, paving the way for new directions in multi-scale generative modeling.

AIFeb 14, 2025
A Self-Supervised Reinforcement Learning Approach for Fine-Tuning Large Language Models Using Cross-Attention Signals

Andrew Kiruluta, Andreas Lemos, Priscilla Burity

We propose a novel reinforcement learning framework for post training large language models that does not rely on human in the loop feedback. Instead, our approach uses cross attention signals within the model itself to derive a self supervised reward, thereby guiding iterative fine tuning of the model policy. By analyzing how the model attends to the input prompt during generation, we construct measures of prompt coverage, focus, and coherence. We then use these measures to rank or score candidate responses, providing a reward signal that encourages the model to produce well aligned, on topic text. In empirical comparisons against standard policy gradient methods and RL fine tuning with synthetic preference models, our method shows significant gains in prompt relevance and consistency over a non RL baseline. While it does not yet match the performance of fully human supervised RLHF systems, it highlights an important direction for scaling alignment with minimal human labeling. We provide a detailed analysis, discuss potential limitations, and outline future work for combining cross-attention based signals with smaller amounts of human feedback.

CVApr 21, 2025
Spectral Dictionary Learning for Generative Image Modeling

Andrew Kiruluta

We propose a novel spectral generative model for image synthesis that departs radically from the common variational, adversarial, and diffusion paradigms. In our approach, images, after being flattened into one-dimensional signals, are reconstructed as linear combinations of a set of learned spectral basis functions, where each basis is explicitly parameterized in terms of frequency, phase, and amplitude. The model jointly learns a global spectral dictionary with time-varying modulations and per-image mixing coefficients that quantify the contributions of each spectral component. Subsequently, a simple probabilistic model is fitted to these mixing coefficients, enabling the deterministic generation of new images by sampling from the latent space. This framework leverages deterministic dictionary learning, offering a highly interpretable and physically meaningful representation compared to methods relying on stochastic inference or adversarial training. Moreover, the incorporation of frequency-domain loss functions, computed via the short-time Fourier transform (STFT), ensures that the synthesized images capture both global structure and fine-grained spectral details, such as texture and edge information. Experimental evaluations on the CIFAR-10 benchmark demonstrate that our approach not only achieves competitive performance in terms of reconstruction quality and perceptual fidelity but also offers improved training stability and computational efficiency. This new type of generative model opens up promising avenues for controlled synthesis, as the learned spectral dictionary affords a direct handle on the intrinsic frequency content of the images, thus providing enhanced interpretability and potential for novel applications in image manipulation and analysis.

LGMar 16, 2025
State Fourier Diffusion Language Model (SFDLM): A Scalable, Novel Iterative Approach to Language Modeling

Andrew Kiruluta, Andreas Lemos

In recent years, diffusion based methods have emerged as a powerful paradigm for generative modeling. Although discrete diffusion for natural language processing has been explored to a lesser extent, it shows promise for tasks requiring iterative denoising of token based data. In standard approaches to text generation, transformers dominate, but their reliance on self attention often incurs high computational costs. This paper introduces a fully diffusion driven discrete text generation model built without any transformer or large convolution modules. Instead, the model integrates structured state space dynamics in the time domain with a novel Complex Fourier Multi Layer Perceptron module that operates in the frequency domain. The forward noising process randomly samples the vocabulary to replace tokens with a controlled probability, while the learned reverse model systematically reverts corrupted sequences toward their original states. By composing local state space updates with global Fourier based mixing, the approach effectively captures both short and long range dependencies.

CVJun 22, 2025
From Pixels and Words to Waves: A Unified Framework for Spectral Dictionary vLLMs

Andrew Kiruluta, Priscilla Burity

Vision-language models (VLMs) unify computer vision and natural language processing in a single architecture capable of interpreting and describing images. Most state-of-the-art systems rely on two computationally intensive components: convolutions in the vision encoder and quadratic self-attention for multimodal fusion. This work removes both by introducing a spectral dictionary token mixer, which represents each image patch or wordpiece as a sparse combination of learnable frequency atoms. Our 1.1B-parameter prototype, SDict-VLM, achieves BLEU-4 of 39.2, CIDEr of 127.5, and SPICE of 27.0 on MS-COCO captioning, along with 50.3 percent accuracy on VQAv2. These results close approximately 85 percent of the performance gap to BLIP-2 while using 60 percent fewer parameters, 2.3 times less peak GPU memory, and 2.2 times faster inference than PaLI-3. To our knowledge, this is the first VLM to eliminate both convolutions and self-attention while matching mid-scale transformer baselines. In addition to its O(L log L) complexity, the shared frequency dictionary enables transparent cross-modal alignment and offers a tunable trade-off between accuracy and compute, paving the way for efficient and interpretable VLMs.

CLMay 9, 2025
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition

Andrew Kiruluta, Eric Lundy, Priscilla Burity

Existing sequence to sequence models for structured language tasks rely heavily on the dot product self attention mechanism, which incurs quadratic complexity in both computation and memory for input length N. We introduce the Graph Wavelet Transformer (GWT), a novel architecture that replaces this bottleneck with a learnable, multi scale wavelet transform defined over an explicit graph Laplacian derived from syntactic or semantic parses. Our analysis shows that multi scale spectral decomposition offers an interpretable, efficient, and expressive alternative to quadratic self attention for graph structured sequence modeling.

LGMay 9, 2025
Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons

Andrew Kiruluta, Preethi Raju, Priscilla Burity

We present a novel non attention based architecture for large language models (LLMs) that efficiently handles very long context windows, on the order of hundreds of thousands to potentially millions of tokens. Unlike traditional Transformer designs, which suffer from quadratic memory and computation overload due to the nature of the self attention mechanism, our model avoids token to token attention entirely. Instead, it combines the following complementary components: State Space blocks (inspired by S4) that learn continuous time convolution kernels and scale near linearly with sequence length, Multi Resolution Convolution layers that capture local context at different dilation levels, a lightweight Recurrent Supervisor to maintain a global hidden state across sequential chunks, and Retrieval Augmented External Memory that stores and retrieves high-level chunk embeddings without reintroducing quadratic operations.

CLApr 29, 2025
From Attention to Atoms: Spectral Dictionary Learning for Fast, Interpretable Language Models

Andrew Kiruluta

We propose a novel spectral generative modeling framework for natural language processing that jointly learns a global time varying Fourier dictionary and per token mixing coefficients, replacing the ubiquitous self attention mechanism in transformer architectures. By enforcing reconstruction losses in both the time domain (embedding reconstruction) and the frequency domain (via Short Time Fourier Transform magnitude matching) alongside a standard language modeling objective, and fitting a Gaussian Mixture Model (GMM) prior over the learned mixing vectors, our approach achieves competitive perplexity and generation quality on standard benchmarks such as WikiText2 and Penn Treebank. In contrast to the quadratic computation complexity of self attention, our method operates with linear complexity, delivering substantial efficiency gains. We demonstrate that spectral dictionary models can achieve competitive performance compared to transformer baselines while significantly reducing inference latency and memory footprint, offering a compelling alternative for scalable language modeling.

CVApr 16, 2025
Wavelet-based Variational Autoencoders for High-Resolution Image Generation

Andrew Kiruluta

Variational Autoencoders (VAEs) are powerful generative models capable of learning compact latent representations. However, conventional VAEs often generate relatively blurry images due to their assumption of an isotropic Gaussian latent space and constraints in capturing high-frequency details. In this paper, we explore a novel wavelet-based approach (Wavelet-VAE) in which the latent space is constructed using multi-scale Haar wavelet coefficients. We propose a comprehensive method to encode the image features into multi-scale detail and approximation coefficients and introduce a learnable noise parameter to maintain stochasticity. We thoroughly discuss how to reformulate the reparameterization trick, address the KL divergence term, and integrate wavelet sparsity principles into the training objective. Our experimental evaluation on CIFAR-10 and other high-resolution datasets demonstrates that the Wavelet-VAE improves visual fidelity and recovers higher-resolution details compared to conventional VAEs. We conclude with a discussion of advantages, potential limitations, and future research directions for wavelet-based generative modeling.

LGJan 13
Spectral Generative Flow Models: A Physics-Inspired Replacement for Vectorized Large Language Models

Andrew Kiruluta

We introduce Spectral Generative Flow Models (SGFMs), a physics-inspired alternative to transformer-based large language models. Instead of representing text or video as sequences of discrete tokens processed by attention, SGFMs treat generation as the evolution of a continuous field governed by constrained stochastic dynamics in a multiscale wavelet basis. This formulation replaces global attention with local operators, spectral projections, and Navier--Stokes-like transport, yielding a generative mechanism grounded in continuity, geometry, and physical structure. Our framework provides three key innovations: (i) a field-theoretic ontology in which text and video are unified as trajectories of a stochastic partial differential equation; (ii) a wavelet-domain representation that induces sparsity, scale separation, and computational efficiency; and (iii) a constrained stochastic flow that enforces stability, coherence, and uncertainty propagation. Together, these components define a generative architecture that departs fundamentally from autoregressive modeling and diffusion-based approaches. SGFMs offer a principled path toward long-range coherence, multimodal generality, and physically structured inductive bias in next-generation generative models.

QUANT-PHNov 26, 2025
Quantum Circuit Reasoning Models: A Variational Framework for Differentiable Logical Inference

Andrew Kiruluta

This report introduces a novel class of reasoning architectures, termed Quantum Circuit Reasoning Models (QCRM), which extend the concept of Variational Quantum Circuits (VQC) from energy minimization and classification tasks to structured logical inference and reasoning. We posit that fundamental quantum mechanical operations, superposition, entanglement, interference, and measurement, naturally map to essential reasoning primitives such as hypothesis branching, constraint propagation, consistency enforcement, and decision making. The resulting framework combines quantum-inspired computation with differentiable optimization, enabling reasoning to emerge as a process of amplitude evolution and interference-driven selection of self-consistent states. We develop the mathematical foundation of QCRM, define its parameterized circuit architecture, and show how logical rules can be encoded as unitary transformations over proposition-qubit states. We further formalize a training objective grounded in classical gradient descent over circuit parameters and discuss simulation-based implementations on classical hardware. Finally, we propose the Quantum Reasoning Layer (QRL) as a differentiable hybrid component for composable reasoning models applicable to scientific, biomedical, and chemical inference domains.

AISep 7, 2025
From Eigenmodes to Proofs: Integrating Graph Spectral Operators with Symbolic Interpretable Reasoning

Andrew Kiruluta, Priscilla Burity

We introduce Spectral NSR, a fully spectral neuro-symbolic reasoning framework that embeds logical rules as spectral templates and performs inference directly in the graph spectral domain. By leveraging graph signal processing (GSP) and frequency-selective filters grounded in the Laplacian eigenstructure of knowledge graphs, the architecture unifies the interpretability of symbolic reasoning with the scalability and adaptability of spectral learning. Beyond the core formulation, we incorporate a comprehensive set of extensions, including dynamic graph and basis learning, rational and diffusion filters for sharper spectral selectivity, mixture-of-spectral-experts for modular specialization, proof-guided training with spectral curricula, and uncertainty quantification for calibrated confidence. Additional enhancements such as large language model coupling, co-spectral transfer alignment, adversarial robustness, efficient GPU kernels, generalized Laplacians, and causal interventions further expand the versatility of the framework. Empirical evaluation on state-of-the-art reasoning benchmarks such as ProofWriter and CLUTRR demonstrates that Spectral NSR achieves superior accuracy, faster inference, improved robustness to adversarial perturbations, and higher interpretability compared to leading baselines including transformers, message-passing neural networks, and neuro-symbolic logic programming systems. Spectral attribution and proof-band agreement analyses confirm that model decisions align closely with symbolic proof structures, while transfer experiments validate effective domain adaptation through co-spectral alignment. These results establish Spectral NSR as a scalable and principled foundation for the next generation of reasoning systems, offering transparency, robustness, and generalization beyond conventional approaches.

AIAug 19, 2025
A Fully Spectral Neuro-Symbolic Reasoning Architecture with Graph Signal Processing as the Computational Backbone

Andrew Kiruluta

We propose a fully spectral, neuro\-symbolic reasoning architecture that leverages Graph Signal Processing (GSP) as the primary computational backbone for integrating symbolic logic and neural inference. Unlike conventional reasoning models that treat spectral graph methods as peripheral components, our approach formulates the entire reasoning pipeline in the graph spectral domain. Logical entities and relationships are encoded as graph signals, processed via learnable spectral filters that control multi-scale information propagation, and mapped into symbolic predicates for rule-based inference. We present a complete mathematical framework for spectral reasoning, including graph Fourier transforms, band-selective attention, and spectral rule grounding. Experiments on benchmark reasoning datasets (ProofWriter, EntailmentBank, bAbI, CLUTRR, and ARC-Challenge) demonstrate improvements in logical consistency, interpretability, and computational efficiency over state\-of\-the\-art neuro\-symbolic models. Our results suggest that GSP provides a mathematically grounded and computationally efficient substrate for robust and interpretable reasoning systems.

AIAug 7, 2025
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents

Andrew Kiruluta

We propose a hybrid architecture that integrates decision tree-based symbolic reasoning with the generative capabilities of large language models (LLMs) within a coordinated multi-agent framework. Unlike prior approaches that loosely couple symbolic and neural modules, our design embeds decision trees and random forests as callable oracles within a unified reasoning system. Tree-based modules enable interpretable rule inference and causal logic, while LLM agents handle abductive reasoning, generalization, and interactive planning. A central orchestrator maintains belief state consistency and mediates communication across agents and external tools, enabling reasoning over both structured and unstructured inputs. The system achieves strong performance on reasoning benchmarks. On \textit{ProofWriter}, it improves entailment consistency by +7.2\% through logic-grounded tree validation. On GSM8k, it achieves +5.3\% accuracy gains in multistep mathematical problems via symbolic augmentation. On \textit{ARC}, it boosts abstraction accuracy by +6.0\% through integration of symbolic oracles. Applications in clinical decision support and scientific discovery show how the system encodes domain rules symbolically while leveraging LLMs for contextual inference and hypothesis generation. This architecture offers a robust, interpretable, and extensible solution for general-purpose neuro-symbolic reasoning.

LGAug 5, 2025
Quantum Spectral Reasoning: A Non-Neural Architecture for Interpretable Machine Learning

Andrew Kiruluta

We propose a novel machine learning architecture that departs from conventional neural network paradigms by leveraging quantum spectral methods, specifically Pade approximants and the Lanczos algorithm, for interpretable signal analysis and symbolic reasoning. The core innovation of our approach lies in its ability to transform raw time-domain signals into sparse, physically meaningful spectral representations without the use of backpropagation, high-dimensional embeddings, or data-intensive black-box models. Through rational spectral approximation, the system extracts resonant structures that are then mapped into symbolic predicates via a kernel projection function, enabling logical inference through a rule-based reasoning engine. This architecture bridges mathematical physics, sparse approximation theory, and symbolic artificial intelligence, offering a transparent and physically grounded alternative to deep learning models. We develop the full mathematical formalism underlying each stage of the pipeline, provide a modular algorithmic implementation, and demonstrate the system's effectiveness through comparative evaluations on time-series anomaly detection, symbolic classification, and hybrid reasoning tasks. Our results show that this spectral-symbolic architecture achieves competitive accuracy while maintaining interpretability and data efficiency, suggesting a promising new direction for physically-informed, reasoning-capable machine learning.

LGJul 27, 2025
Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals

Andrew Kiruluta, Andreas Lemos, Priscilla Burity

We present a fully non neural learning framework based on Graph Laplacian Wavelet Transforms (GLWT). Unlike traditional architectures that rely on convolutional, recurrent, or attention based neural networks, our model operates purely in the graph spectral domain using structured multiscale filtering, nonlinear shrinkage, and symbolic logic over wavelet coefficients. Signals defined on graph nodes are decomposed via GLWT, modulated with interpretable nonlinearities, and recombined for downstream tasks such as denoising and token classification. The system supports compositional reasoning through a symbolic domain-specific language (DSL) over graph wavelet activations. Experiments on synthetic graph denoising and linguistic token graphs demonstrate competitive performance against lightweight GNNs with far greater transparency and efficiency. This work proposes a principled, interpretable, and resource-efficient alternative to deep neural architectures for learning on graphs.

LGJul 27, 2025
Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning

Andrew Kiruluta, Andreas Lemos, Priscilla Burity

Traditional machine learning models, particularly neural networks, are rooted in finite-dimensional parameter spaces and nonlinear function approximations. This report explores an alternative formulation where learning tasks are expressed as sampling and computation in infinite dimensional Hilbert spaces, leveraging tools from functional analysis, signal processing, and spectral theory. We review foundational concepts such as Reproducing Kernel Hilbert Spaces (RKHS), spectral operator learning, and wavelet-domain representations. We present a rigorous mathematical formulation of learning in Hilbert spaces, highlight recent models based on scattering transforms and Koopman operators, and discuss advantages and limitations relative to conventional neural architectures. The report concludes by outlining directions for scalable and interpretable machine learning grounded in Hilbertian signal processing.

LGJul 18, 2025
Wavelet Logic Machines: Learning and Reasoning in the Spectral Domain Without Neural Networks

Andrew Kiruluta

We introduce a fully spectral learning framework that eliminates traditional neural layers by operating entirely in the wavelet domain. The model applies learnable nonlinear transformations, including soft-thresholding and gain-phase modulation, directly to wavelet coefficients. It also includes a differentiable wavelet basis selection mechanism, enabling adaptive processing using families such as Haar, Daubechies, and Biorthogonal wavelets. Implemented in PyTorch with full 3D support, the model maintains a spectral pipeline without spatial convolutions or attention. On synthetic 3D denoising and natural language tasks from the GLUE benchmark, including SST-2 sentiment classification, the model achieves 89.3 percent accuracy, close to a 4-layer Transformer baseline (90.1 percent), while using 72 percent fewer parameters and 58 percent less peak memory. Faster early convergence is observed due to spectral sparsity priors. In contrast to the quadratic complexity of self-attention and large matrix multiplications in Transformers, our approach uses linear-time wavelet transforms and pointwise nonlinearities, significantly reducing inference cost. This yields a compact, interpretable, and efficient alternative to neural models. Our results support the viability of principled spectral learning in both vision and language tasks, offering new directions for model design without overparameterized architectures.

CVJun 30, 2025
CS-VLM: Compressed Sensing Attention for Efficient Vision-Language Representation Learning

Andrew Kiruluta, Preethi Raju, Priscilla Burity

Vision-Language Models (vLLMs) have emerged as powerful architectures for joint reasoning over visual and textual inputs, enabling breakthroughs in image captioning, cross modal retrieval, and multimodal dialogue. However, as these models scale to longer video sequences and richer language descriptions, the quadratic complexity of the standard attention mechanism presents a fundamental computational bottleneck. This challenge is exacerbated in vLLMs, where attention must be computed not only within modalities but also across them, leading to prohibitive memory and latency costs. In this work, we introduce the Compressed Sensing Attention Transformer (CSAT), a novel architecture that reimagines attention computation through the lens of compressed sensing. By projecting high dimensional key and value representations into a lower-dimensional subspace via random measurement matrices and reconstructing the attention outputs using sparse recovery algorithms, CSAT significantly reduces attention complexity while maintaining semantic fidelity. Applied to vLLMs, CSAT exploits the inherent compressibility of both visual and textual representations especially evident in video, where temporal redundancy is high, and in language, where cross-modal grounding is often sparse. In contrast to LLMs, which must often model entangled symbolic dependencies, vLLMs benefit from structured sparsity in alignment and scene composition, making them particularly well-suited to compressed attention. We provide a formal mathematical treatment of CSAT, demonstrate its integration into vision language pipelines, and validate its performance on standard benchmarks, highlighting its promise as a scalable, interpretable, and resource efficient solution for next generation multimodal transformers.

CLJun 8, 2025
History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM

Andrew Kiruluta, Andreas Lemos, Priscilla Burity

We present CAGSR-vLLM-MTC, an extension of our Self-Supervised Cross-Attention-Guided Reinforcement (CAGSR) framework, now implemented on the high-performance vLLM runtime, to address both multi-turn dialogue and chain-of-thought reasoning. Building upon our original single-turn approach, we first instrumented vLLM's C++/CUDA kernels to asynchronously capture per-layer, per-head cross-attention weights during generation. We then generalized our self-supervised reward function to accumulate attention signals over entire conversation histories and intermediate chain-of-thought steps. We discuss practical trade-offs, including an entropy-based clamping mechanism to prevent attention collapse on early context, and outline future directions for multi-party dialogues and hierarchical reasoning.

LGJun 1, 2025
Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention

Andrew Kiruluta

We propose an innovative, learnable two-sided short-time Laplace transform (STLT) mechanism to supplant the traditional self attention in transformer-based LLMs. Our STLT introduces trainable parameters for each Laplace node, enabling end-to-end learning of decay rates , oscillatory frequencies, and window bandwidth T. This flexibility allows the model to dynamically adapt token relevance half lives and frequency responses during training. By selecting S learnable nodes and leveraging fast recursive convolution, we achieve an effective complexity of in time and memory. We further incorporate an efficient FFT-based computation of the relevance matrix and an adaptive node allocation mechanism to dynamically adjust the number of active Laplace nodes. Empirical results on language modeling (WikiText\-103, Project Gutenberg), machine translation (WMT'14 En\-De), and long document question answering (NarrativeQA) demonstrate that our learnable STLT achieves perplexities and scores on par with or better than existing efficient transformers while naturally extending to context lengths exceeding 100k tokens or more limited only by available hardware. Ablation studies confirm the importance of learnable parameters and adaptive node allocation. The proposed approach combines interpretability, through explicit decay and frequency parameters, with scalability and robustness, offering a pathway towards ultra-long-sequence language modeling without the computational bottleneck of self-attention.

LGMar 4, 2025
FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation

Andrew Kiruluta, Eric Lundy, Andreas Lemos

We present FourierNAT, a novel non-autoregressive Transformer (NAT) architecture that employs Fourier-based mixing in the decoder to generate output sequences in parallel. While traditional NAT approaches often face challenges with capturing global dependencies, our method leverages a discrete Fourier transform to mix token embeddings across the entire sequence dimension, coupled with learned frequency-domain gating. This allows the model to efficiently propagate context without explicit autoregressive steps. Empirically, FourierNAT achieves competitive results against leading NAT baselines on standard benchmarks like WMT machine translation and CNN/DailyMail summarization, providing significant speed advantages over autoregressive Transformers. We further demonstrate that learned frequency-domain parameters allow the model to adaptively focus on long-range or short-range dependencies, partially mitigating the well-known coherence gaps in one-pass NAT generation. Overall, FourierNAT highlights the potential of integrating spectral-domain operations to accelerate and improve parallel text generation. This approach can potentially provide great computational and time savings in inference tasks LLMs.

CLNov 25, 2021
Beyond Self Attention: A Subquadratic Fourier Wavelet Transformer with Multi Modal Fusion

Andrew Kiruluta, Andreas Lemos, Eric Lundy

We revisit the use of spectral techniques to replaces the attention mechanism in Transformers through Fourier Transform based token mixing, and present a comprehensive and novel reformulation of this technique in next generation transformer models. We provide expanded literature context, detailed mathematical formulations of Fourier mixing and causal masking, and introduce a novel MultiDomain Fourier Wavelet Attention(MDFWA) that integrates frequency and time localized transforms to capture both global and local dependencies efficiently. We derive the complexity bounds, gradient formulas, and show that MDFWA achieves sub quadratic time and memory cost while improving expressive power. We validate our design on an abstractive summarization task using PubMed dataset, by enhancing the proposed approach with learned frequency bases, adaptive scale selection, and multi-modal extensions.

CVDec 15, 2017
Reducing Deep Network Complexity via Sparse Hierarchical Fourier Interaction Networks

Andrew Kiruluta, Samantha Williams

This paper presents a Sparse Hierarchical Fourier Interaction Networks, an architectural building block that unifies three complementary principles of frequency domain modeling: A hierarchical patch wise Fourier transform that affords simultaneous access to local detail and global context; A learnable, differentiable top K masking mechanism which retains only the most informative spectral coefficients, thereby exploiting the natural compressibility of visual and linguistic signals.