ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signals
This work addresses the need for flexible processing of industrial signals such as acoustic and vibration data, though it appears incremental because it builds on existing foundation model concepts.
The authors tackle general machine signal modeling at arbitrary sampling rates by proposing ECHO, a foundation model that combines a band-split architecture with frequency positional embeddings and reports state-of-the-art performance in anomaly detection and fault classification across several datasets.
Pre-trained foundation models have demonstrated remarkable success in audio, vision, and language, yet their potential for general machine signal modeling with arbitrary sampling rates (covering acoustic, vibration, and other industrial sensor data) remains under-explored. In this work, we propose ECHO, a novel foundation model that integrates an advanced band-split architecture with frequency positional embeddings, enabling spectral localization across arbitrary sampling configurations. Moreover, the model uses sliding patches to support inputs of variable length without padding or cropping, producing a concise embedding that retains both temporal and spectral fidelity and extends naturally to streaming scenarios. We evaluate our method on a range of machine signal datasets, including the DCASE Task 2 challenges (2020-2025) and widely used industrial signal corpora. Experimental results demonstrate consistent state-of-the-art performance in machine signal anomaly detection and fault classification, confirming the effectiveness and generalization capability of the proposed model. ECHO is open-sourced at https://github.com/yucongzh/ECHO.
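The two ideas named in the abstract, frequency positional embeddings over spectral bands and sliding patches over time, can be illustrated with a minimal NumPy sketch. This is not the ECHO implementation. The function names, the fixed-width band layout, the sinusoidal form of the embedding, the `max_hz` constant, and the end-aligned last patch are all assumptions chosen for illustration. The point it shows is that an embedding computed from a band's absolute center frequency in Hz is the same at any sampling rate, and that patch positions can cover a signal of any length without padding.

```python
import numpy as np


def band_centers_hz(sample_rate, n_fft, bins_per_band):
    """Center frequency (Hz) of each fixed-width band of a one-sided spectrum.

    Hypothetical band layout: consecutive groups of `bins_per_band` FFT bins.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = sample_rate / n_fft
    edges = list(range(0, n_bins, bins_per_band)) + [n_bins]
    return np.array(
        [(lo + hi - 1) / 2 * bin_hz for lo, hi in zip(edges[:-1], edges[1:])]
    )


def freq_pos_embedding(center_hz, dim, max_hz=100_000.0):
    """Sinusoidal embedding of absolute frequency (assumed form, dim must be even).

    Because the input is a frequency in Hz rather than a band index, the same
    physical frequency maps to the same vector at every sampling rate.
    """
    center_hz = np.asarray(center_hz, dtype=np.float64)
    half = dim // 2
    wavelengths = max_hz ** (np.arange(half) / half)  # geometric scale, 1 .. max_hz
    angles = center_hz[:, None] / wavelengths[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


def sliding_patch_starts(n_frames, patch, hop):
    """Start indices of time patches covering all frames without padding.

    Assumption: if the hop grid leaves trailing frames uncovered, one extra
    patch is aligned to the end, overlapping its predecessor.
    """
    if n_frames < patch:
        raise ValueError("signal is shorter than one patch")
    starts = list(range(0, n_frames - patch + 1, hop))
    if starts[-1] + patch < n_frames:
        starts.append(n_frames - patch)
    return starts


# Two sampling rates with the same bin spacing (31.25 Hz per bin).
emb_16k = freq_pos_embedding(band_centers_hz(16_000, 512, 16), dim=8)
emb_32k = freq_pos_embedding(band_centers_hz(32_000, 1024, 16), dim=8)

# Bands below 8 kHz share center frequencies, hence share embeddings.
shared = np.allclose(emb_16k[:16], emb_32k[:16])

# Patch positions for two different input lengths.
starts_short = sliding_patch_starts(10, patch=4, hop=3)
starts_long = sliding_patch_starts(11, patch=4, hop=3)
```

In this sketch the 32 kHz signal simply contributes additional bands above 8 kHz, while the bands it shares with the 16 kHz signal receive identical positional vectors. A real model would feed each band patch plus its embedding to an encoder and pool over patches; that part is omitted here.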