CVNov 29, 2022
Lightweight Structure-Aware Attention for Visual UnderstandingHeeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez et al.
Attention operator has been widely used as a basic brick in visual understanding since it provides some flexibility through its adjustable kernels. However, this operator suffers from inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy, and (2) the complexity in computation and memory is quadratic in the sequence length. In this paper, we propose a novel attention operator, called Lightweight Structure-aware Attention (LiSA), which has a better representation power with log-linear complexity. Our operator transforms the attention kernels to be more discriminative by learning structural patterns. These structural patterns are encoded by exploiting a set of relative position embeddings (RPEs) as multiplicative weights, thereby improving the representation power of the attention kernels. Additionally, the RPEs are approximated to obtain log-linear complexity. Our experiments and analyses demonstrate that the proposed operator outperforms self-attention and other existing operators, achieving state-of-the-art results on ImageNet-1K and other downstream tasks such as video action recognition on Kinetics-400, object detection \& instance segmentation on COCO, and semantic segmentation on ADE-20K.
CVAug 31, 2020
iLGaCo: Incremental Learning of Gait Covariate FactorsZihao Mu, Francisco M. Castro, Manuel J. Marin-Jimenez et al.
Gait is a popular biometric pattern used for identifying people based on their way of walking. Traditionally, gait recognition approaches based on deep learning are trained using the whole training dataset. In fact, if new data (classes, view-points, walking conditions, etc.) need to be included, it is necessary to re-train again the model with old and new data samples. In this paper, we propose iLGaCo, the first incremental learning approach of covariate factors for gait recognition, where the deep model can be updated with new information without re-training it from scratch by using the whole dataset. Instead, our approach performs a shorter training process with the new data and a small subset of previous samples. This way, our model learns new information while retaining previous knowledge. We evaluate iLGaCo on CASIA-B dataset in two incremental ways: adding new view-points and adding new walking conditions. In both cases, our results are close to the classical `training-from-scratch' approach, obtaining a marginal drop in accuracy ranging from 0.2% to 1.2%, what shows the efficacy of our approach. In addition, the comparison of iLGaCo with other incremental learning methods, such as LwF and iCarl, shows a significant improvement in accuracy, between 6% and 15% depending on the experiment.