LLM-FS-Agent: A Deliberative Role-based Large Language Model Architecture for Transparent Feature Selection
This work addresses transparent feature selection for machine learning practitioners in domains such as cybersecurity and represents an incremental improvement over existing LLM-based approaches.
The paper tackles interpretable feature selection in high-dimensional data by introducing LLM-FS-Agent, a multi-agent architecture that orchestrates a deliberative debate among LLM agents to evaluate feature relevance. Experimental results on a cybersecurity dataset show that it achieves superior or comparable classification performance while reducing downstream training time by an average of 46%.
High-dimensional data remains a pervasive challenge in machine learning, often undermining model interpretability and computational efficiency. While Large Language Models (LLMs) have shown promise for dimensionality reduction through feature selection, existing LLM-based approaches frequently lack structured reasoning and transparent justification for their decisions. This paper introduces LLM-FS-Agent, a novel multi-agent architecture designed for interpretable and robust feature selection. The system orchestrates a deliberative "debate" among multiple LLM agents, each assigned a specific role, so that they collectively evaluate feature relevance and generate detailed justifications. We evaluate LLM-FS-Agent in the cybersecurity domain using the CIC-DIAD 2024 IoT intrusion detection dataset and compare its performance against strong baselines, including LLM-Select and traditional methods such as PCA. Experimental results demonstrate that LLM-FS-Agent consistently achieves superior or comparable classification performance while reducing downstream training time by an average of 46%, a statistically significant improvement for XGBoost (p = 0.028). These findings indicate that the proposed deliberative architecture enhances both decision transparency and computational efficiency, establishing LLM-FS-Agent as a practical and reliable solution for real-world applications.
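The abstract does not specify the agents' roles, prompts, or how their verdicts are combined, so the following is only a minimal sketch of the general pattern of a role-based deliberative loop, under stated assumptions: each agent reads the debate transcript so far and returns a relevance score with a justification, the final round's scores are averaged, and the top-k features are kept. The LLM calls are replaced by keyword-matching stubs so the example runs offline. All names here (`Opinion`, `deliberate`, `select_features`, `make_stub_agent`, the roles "domain_expert" and "skeptic", and the feature names) are hypothetical and are not taken from LLM-FS-Agent or from the CIC-DIAD 2024 dataset.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Opinion:
    role: str            # hypothetical agent role, e.g. "skeptic"
    score: float         # relevance score in [0, 1]
    justification: str   # human-readable reason, kept for transparency


# Hypothetical agent signature: (feature name, description, transcript) -> Opinion.
# In a real system this would wrap an LLM call with a role-specific prompt.
Agent = Callable[[str, str, list], Opinion]


def deliberate(feature: str, description: str, agents: list, rounds: int = 2) -> list:
    """Run a fixed number of debate rounds; every agent sees all earlier opinions."""
    transcript: list = []
    for _ in range(rounds):
        for agent in agents:
            transcript.append(agent(feature, description, transcript))
    return transcript


def select_features(features: dict, agents: list, k: int, rounds: int = 2):
    """Score each feature by the mean of its final-round scores and keep the top k."""
    results = {}
    for name, desc in features.items():
        transcript = deliberate(name, desc, agents, rounds)
        final_round = transcript[-len(agents):]  # opinions from the last round only
        results[name] = (mean(o.score for o in final_round), transcript)
    ranked = sorted(results, key=lambda n: results[n][0], reverse=True)
    return ranked[:k], results


def make_stub_agent(role: str, keywords: set, base: float) -> Agent:
    """Offline stand-in for an LLM agent (an assumption, not the paper's agents)."""
    def agent(feature: str, description: str, transcript: list) -> Opinion:
        hit = any(w in description.lower() for w in keywords)
        score = base + (0.5 if hit else 0.0)
        # Move halfway toward the mean of earlier opinions, a crude proxy
        # for being persuaded by the other participants in the debate.
        if transcript:
            score = (score + mean(o.score for o in transcript)) / 2
        return Opinion(role, min(score, 1.0), f"{role}: keyword match={hit}")
    return agent


# Hypothetical IoT flow features with short natural-language descriptions.
features = {
    "flow_duration": "duration of the network flow",
    "src_port": "source port number",
    "pkt_rate": "packets per second in the flow",
    "row_id": "row index",
}
agents = [
    make_stub_agent("domain_expert", {"flow", "packets"}, base=0.3),
    make_stub_agent("skeptic", {"packets"}, base=0.2),
]
selected, results = select_features(features, agents, k=2)
print(selected)
for name in selected:
    for opinion in results[name][1]:
        print(name, opinion.justification, round(opinion.score, 3))
```

Keeping the full transcript per feature, rather than only a final score, is what makes the selection auditable: every kept or dropped feature can be traced back to the justifications each role gave. Averaging the last round is one simple aggregation choice; a designated judge agent or majority vote would be equally plausible readings of the abstract.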