CVJun 16, 2023Code
Group Orthogonalization Regularization For Vision Models Adaptation and RobustnessYoav Kurtz, Noga Bar, Raja Giryes
As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.
LGJan 12, 2025
ZOQO: Zero-Order Quantized OptimizationNoga Bar, Raja Giryes
The increasing computational and memory demands in deep learning present significant challenges, especially in resource-constrained environments. We introduce a zero-order quantized optimization (ZOQO) method designed for training models with quantized parameters and operations. Our approach leverages zero-order approximations of the gradient sign and adapts the learning process to maintain the parameters' quantization without the need for full-precision gradient calculations. We demonstrate the effectiveness of ZOQO through experiments in fine-tuning of large language models and black-box adversarial attacks. Despite the limitations of zero-order and quantized operations training, our method achieves competitive performance compared to full-precision methods, highlighting its potential for low-resource environments.
LGJun 3, 2024
Diverse Subset Selection via Norm-Based Sampling and OrthogonalityNoga Bar, Raja Giryes
Large annotated datasets are crucial for the success of deep neural networks, but labeling data can be prohibitively expensive in domains such as medical imaging. This work tackles the subset selection problem: selecting a small set of the most informative examples from a large unlabeled pool for annotation. We propose a simple and effective method that combines feature norms, randomization, and orthogonality (via the Gram-Schmidt process) to select diverse and informative samples. Feature norms serve as a proxy for informativeness, while randomization and orthogonalization reduce redundancy and encourage coverage of the feature space. Extensive experiments on image and text benchmarks, including CIFAR-10/100, Tiny ImageNet, ImageNet, OrganAMNIST, and Yelp, show that our method consistently improves subset selection performance, both as a standalone approach and when integrated with existing techniques.
LGMay 27, 2023
Pruning at Initialization -- A Sketching PerspectiveNoga Bar, Raja Giryes
The lottery ticket hypothesis (LTH) has increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at initialization is equivalent to the sketching problem introduced for efficient matrix multiplication. This gives us tools to analyze the LTH problem and gain insights into it. Specifically, using the mask found at initialization, we bound the approximation error of the pruned linear model at the end of training. We theoretically justify previous empirical evidence that the search for sparse networks may be data independent. By using the sketching perspective, we suggest a generic improvement to existing algorithms for pruning at initialization, which we show to be beneficial in the data-independent case.
LGFeb 24, 2021
Multiplicative Reweighting for Robust Neural Network OptimizationNoga Bar, Tomer Koren, Raja Giryes
Neural networks are widespread due to their powerful performance. Yet, they degrade in the presence of noisy labels at training time. Inspired by the setting of learning with expert advice, where multiplicative weights (MW) updates were recently shown to be robust to moderate data corruptions in expert advice, we propose to use MW for reweighting examples during neural networks optimization. We theoretically establish the convergence of our method when used with gradient descent and prove its advantages in 1d cases. We then validate empirically our findings for the general case by showing that MW improves neural networks' accuracy in the presence of label noise on CIFAR-10, CIFAR-100 and Clothing1M. We also show the impact of our approach on adversarial robustness.