Feature Encoding in Quantum Machine Learning: A Survey and Practical Guidelines
For practitioners in quantum machine learning, this work offers actionable guidelines for selecting encoding methods on noisy intermediate-scale quantum devices, addressing a critical bottleneck in the field.
This survey provides a systematic review of feature encoding methods in quantum machine learning, introducing a three-axis cost-expressivity-robustness taxonomy and a five-regime decision framework. The key finding is that for gate error rates at or above 10^-3, shallow angle-based encodings outperform amplitude encoding in practice, despite amplitude encoding's exponential qubit advantage.
The encoding of classical data into quantum states constitutes the primary performance bottleneck in Quantum Machine Learning (QML) on Noisy Intermediate-Scale Quantum (NISQ) devices. No existing framework jointly characterises resource cost, expressivity, and noise robustness, nor provides actionable selection guidelines for practitioners. This survey addresses that gap through a systematic review of 66 primary works (2017-2026) assembled via a PRISMA-adapted protocol across five academic databases. Four principal contributions are made. First, a three-axis cost-expressivity-robustness taxonomy classifies all major encoding families (basis, angle, dense-angle, amplitude, data re-uploading, and IQP) along independently measurable axes. Second, closed-form depth-fidelity bounds under NISQ decoherence channels identify the critical gate-error rate p* ~ 10^-3 below which amplitude encoding is viable. Third, a unified treatment of Fourier expressivity, barren-plateau onset, and quantum kernel concentration as functions of the encoding circuit provides the first joint trainability analysis. Fourth, a five-regime decision framework maps the tuple (D, n, p, tau), denoting feature dimension, qubit budget, error rate, and task type, to a hardware-grounded encoding recommendation. The central finding is that for p >= 10^-3, shallow angle-based encodings consistently outperform amplitude encoding in practice, despite the latter's exponential qubit advantage.
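As a rough illustration of why a gate-error rate of order 10^-3 acts as a threshold, the sketch below estimates how many noisy gates an encoding circuit can afford before its fidelity falls below a target. It is not the survey's closed-form bound: it assumes independent gate errors, so that a circuit of G gates runs error-free with probability (1 - p)^G. The function names and the target fidelity of 0.9 are hypothetical choices made for this example.

```python
import math


def survival_fidelity(gate_count: int, error_rate: float) -> float:
    """Probability that no gate fails, assuming independent errors.

    Assumption: a simple (1 - p)^G heuristic, not the survey's bound.
    """
    return (1.0 - error_rate) ** gate_count


def max_gates(error_rate: float, target_fidelity: float) -> int:
    """Largest gate count G with (1 - p)^G >= target_fidelity."""
    return math.floor(math.log(target_fidelity) / math.log(1.0 - error_rate))


if __name__ == "__main__":
    # Hypothetical target fidelity of 0.9 for the encoding stage alone.
    for p in (1e-2, 1e-3, 1e-4):
        print(f"p = {p:g}: gate budget = {max_gates(p, 0.9)}")
```

Under this heuristic the gate budget scales roughly as 1/p, so an encoding whose gate count grows with the feature dimension D, as generic amplitude state preparation does, exhausts the budget at modest D once p reaches 10^-3, whereas a single layer of rotations in an angle encoding has constant depth.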