Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
This work addresses a specific challenge in NILM for residential energy monitoring, offering an incremental improvement over existing transformer methods for small datasets.
The paper tackles the problem of training transformers on small datasets for Non-Intrusive Load Monitoring (NILM), where standard attention mechanisms over-smooth and prioritize intra-token relationships, reducing performance. The proposed architecture with inter-token relation enhancement and dynamic temperature tuning outperforms the original transformer and state-of-the-art models by 10-15% in F1 score on the REDD dataset.
Transformers have demonstrated exceptional performance across various domains due to their self-attention mechanism, which captures complex relationships in data. However, training on smaller datasets poses challenges, as standard attention mechanisms can over-smooth attention scores and overly prioritize intra-token relationships, reducing the capture of meaningful inter-token dependencies critical for tasks like Non-Intrusive Load Monitoring (NILM). To address this, we propose a novel transformer architecture with two key innovations: inter-token relation enhancement and dynamic temperature tuning. The inter-token relation enhancement mechanism removes diagonal entries in the similarity matrix to improve attention focus on inter-token relations. The dynamic temperature tuning mechanism uses a learnable temperature parameter to adapt attention sharpness during training, preventing over-smoothing and enhancing sensitivity to token relationships. We validate our method on the REDD dataset and show that it outperforms the original transformer and state-of-the-art models by 10-15% in F1 score across various appliance types, demonstrating its efficacy for training on smaller datasets.
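The two mechanisms described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the paper's implementation, and the abstract does not give the exact formulation. The sketch assumes that "removing diagonal entries" means masking each token's self-similarity before the softmax, and that the temperature divides the similarity scores. The function name `inter_token_attention` and its arguments are hypothetical. The temperature is passed as a plain scalar here, whereas in the proposed model it would be a parameter learned during training.

```python
import numpy as np

def inter_token_attention(q, k, v, temperature):
    """Single-head attention with the diagonal masked and a temperature scale.

    q, k, v: arrays of shape (n_tokens, d), with n_tokens >= 2.
    temperature: positive scalar (assumed learnable in the real model).
    """
    scores = q @ k.T                   # (n_tokens, n_tokens) similarity matrix
    np.fill_diagonal(scores, -np.inf)  # assumption: drop each token's self-similarity
    scores = scores / temperature      # smaller temperature gives sharper attention
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy inputs: 6 tokens with 8 features each (illustrative values only).
rng = np.random.default_rng(0)
n_tokens, d = 6, 8
q = rng.normal(size=(n_tokens, d))
k = rng.normal(size=(n_tokens, d))
v = rng.normal(size=(n_tokens, d))

out, attn = inter_token_attention(q, k, v, temperature=np.sqrt(d))
_, attn_sharp = inter_token_attention(q, k, v, temperature=0.5)
```

With the diagonal masked, every row of `attn` assigns zero weight to the token itself and distributes all attention over the other tokens. Lowering the temperature concentrates each row on its strongest inter-token match, which is the behavior a learned temperature could exploit to counter over-smoothing.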