Bridging KAN and MLP: MJKAN, a Hybrid Architecture with Both Efficiency and Expressiveness
This work is aimed at machine learning researchers and practitioners who find Kolmogorov-Arnold Networks (KANs) computationally expensive and weak on general classification tasks. It offers a flexible hybrid layer, though the contribution is incremental because it builds on existing KAN and MLP methods.
The paper tackles the practical limitations of Kolmogorov-Arnold Networks (KANs), namely high computational cost and performance deficits, by proposing MJKAN, a hybrid architecture that integrates a FiLM-like mechanism with Radial Basis Function activations. In function regression, MJKAN significantly outperforms MLPs, and its approximation quality improves as the number of basis functions increases. In image and text classification, it is competitive with MLPs but requires careful tuning of the basis size to prevent overfitting.
Kolmogorov-Arnold Networks (KANs) have garnered attention for replacing fixed activation functions with learnable univariate functions, but they exhibit practical limitations, including high computational costs and performance deficits in general classification tasks. In this paper, we propose the Modulation Joint KAN (MJKAN), a novel neural network layer designed to overcome these challenges. MJKAN integrates a FiLM (Feature-wise Linear Modulation)-like mechanism with Radial Basis Function (RBF) activations, creating a hybrid architecture that combines the non-linear expressive power of KANs with the efficiency of Multilayer Perceptrons (MLPs). We empirically validated MJKAN's performance across a diverse set of benchmarks, including function regression, image classification (MNIST, CIFAR-10/100), and natural language processing (AG News, SMS Spam). The results demonstrate that MJKAN achieves superior approximation capabilities in function regression tasks, significantly outperforming MLPs, with performance improving as the number of basis functions increases. Conversely, in image and text classification, its performance was competitive with MLPs but revealed a critical dependency on the number of basis functions. We found that a smaller basis size was crucial for better generalization, highlighting that the model's capacity must be carefully tuned to the complexity of the data to prevent overfitting. In conclusion, MJKAN offers a flexible architecture that inherits the theoretical advantages of KANs while improving computational efficiency and practical viability.
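To make the idea of "FiLM-like modulation with RBF activations" more concrete, the following is a minimal NumPy sketch of one way such a layer could look. The abstract does not give the layer equations, so everything here is an assumption for illustration: the class name `FiLMRBFLayer`, the Gaussian basis with evenly spaced centers, the width heuristic, and the specific form in which each input is scaled and shifted by RBF-derived coefficients before being summed per output unit. It should not be read as the authors' implementation.

```python
import numpy as np


def rbf_features(x, centers, width):
    """Gaussian RBF expansion: (batch, d_in) -> (batch, d_in, K)."""
    diff = (x[..., None] - centers) / width
    return np.exp(-diff ** 2)


class FiLMRBFLayer:
    """Hypothetical FiLM-style layer driven by RBF features (illustrative only)."""

    def __init__(self, d_in, d_out, num_basis, rng, low=-1.0, high=1.0):
        # Assumption: fixed, evenly spaced centers on [low, high].
        self.centers = np.linspace(low, high, num_basis)
        self.width = (high - low) / max(num_basis - 1, 1)
        scale = 1.0 / np.sqrt(d_in * num_basis)
        # One set of basis weights for the scale (gamma) and one for the shift (beta).
        self.w_gamma = rng.normal(0.0, scale, (d_in, num_basis, d_out))
        self.w_beta = rng.normal(0.0, scale, (d_in, num_basis, d_out))
        self.bias = np.zeros(d_out)

    def __call__(self, x):
        phi = rbf_features(x, self.centers, self.width)       # (B, d_in, K)
        gamma = np.einsum("bik,iko->bio", phi, self.w_gamma)  # (B, d_in, d_out)
        beta = np.einsum("bik,iko->bio", phi, self.w_beta)    # (B, d_in, d_out)
        # FiLM-like step: scale and shift each input, then sum over inputs.
        return (gamma * x[..., None] + beta).sum(axis=1) + self.bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = FiLMRBFLayer(d_in=3, d_out=2, num_basis=5, rng=rng)
    x = rng.uniform(-1.0, 1.0, size=(4, 3))
    print(layer(x).shape)
```

In this sketch, `num_basis` plays the role of the basis size discussed in the abstract: the number of weights in the layer grows linearly with it, which is consistent with the reported trade-off between approximation power in regression and overfitting risk in classification.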