Diana-Nicoleta Grigore

CV
h-index52
3papers
5citations
Novelty52%
AI Score38

3 Papers

CVApr 14, 2024Code
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers

Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo et al.

Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on six data sets from various domains (natural, medical and satellite images) and tasks (classification and segmentation). The empirical results confirm the superiority of our approach over state-of-the-art competitors. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline. We release our code at https://github.com/dianagrigore/WeCoLoRA.

CVAug 5, 2025
SlotMatch: Distilling Object-Centric Representations for Unsupervised Video Segmentation

Diana-Nicoleta Grigore, Neelu Madan, Andreas Mogelmose et al.

Unsupervised video segmentation is a challenging computer vision task, especially due to the lack of supervisory signals coupled with the complexity of visual scenes. To overcome this challenge, state-of-the-art models based on slot attention often have to rely on large and computationally expensive neural architectures. To this end, we propose a simple knowledge distillation framework that effectively transfers object-centric representations to a lightweight student. The proposed framework, called SlotMatch, aligns corresponding teacher and student slots via the cosine similarity, requiring no additional distillation objectives or auxiliary supervision. The simplicity of SlotMatch is confirmed via theoretical and empirical evidence, both indicating that integrating additional losses is redundant. We conduct experiments on three datasets to compare the state-of-the-art teacher model, SlotContrast, with our distilled student. The results show that our student based on SlotMatch matches and even outperforms its teacher, while using 3.6x less parameters and running up to 2.7x faster. Moreover, our student surpasses all other state-of-the-art unsupervised video segmentation models.

CVFeb 14, 2022
Discriminability-enforcing loss to improve representation learning

Florinel-Alin Croitoru, Diana-Nicoleta Grigore, Radu Tudor Ionescu

During the training process, deep neural networks implicitly learn to represent the input data samples through a hierarchy of features, where the size of the hierarchy is determined by the number of layers. In this paper, we focus on enforcing the discriminative power of the high-level representations, that are typically learned by the deeper layers (closer to the output). To this end, we introduce a new loss term inspired by the Gini impurity, which is aimed at minimizing the entropy (increasing the discriminative power) of individual high-level features with respect to the class labels. Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes. As such, we introduce another loss term to minimize the Kullback-Leibler divergence between the two distributions. We conduct experiments on two image classification data sets (CIFAR-100 and Caltech 101), considering multiple neural architectures ranging from convolutional networks (ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone, without increasing the inference time at all.