LG · May 28
Improving Relative Representations with Learned Anchors and Whitened Inner Products
Oscar Thorsted Svendsen, Nikolaj Holst Jakobsen, Fabian Mager et al.
Independently trained neural models typically converge to incompatible latent representations, creating a fundamental barrier to highly modular AI systems. While Relative Representations (RR) address this by mapping absolute coordinates to a shared space defined by similarities to common anchor points, traditional implementations rely on randomly sampled anchors and cosine similarity, which frequently fail to capture the anisotropic geometries of modern architectures like Transformers. In this work, we propose a robust framework for cross-model communication based on two improvements. We learn anchors as robust semantic prototypes and utilize a geometry-aware similarity metric which preserves discriminative magnitude information and is invariant to affine shifts. Our approach demonstrates significant gains in performance and consistency across vision and language tasks. Notably, it enables nearly lossless information transfer and stable zero-shot communication even between highly heterogeneous architectures, such as small language models of varying scales.
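The baseline idea of a relative representation can be sketched in a few lines. This is a minimal illustration of the traditional cosine-similarity variant the abstract describes, not the authors' learned-anchor or whitened method; the 2-D toy vectors, the rotation used as a stand-in for a second model's latent space, and all function names are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def relative_representation(x, anchors):
    """Describe x by its similarity to each anchor instead of by absolute coordinates."""
    return [cosine(x, a) for a in anchors]

def rotate(v, theta):
    """Rotate a 2-D vector; a toy stand-in for a second model's latent space."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Hypothetical anchors and sample as embedded by "model A".
anchors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_a = [2.0, 1.0]

# "Model B" embeds the same anchors and sample in a rotated coordinate system.
theta = 0.7
anchors_b = [rotate(a, theta) for a in anchors_a]
x_b = rotate(x_a, theta)

rel_a = relative_representation(x_a, anchors_a)
rel_b = relative_representation(x_b, anchors_b)
print(rel_a)
print(rel_b)  # matches rel_a up to floating-point error
```

Cosine similarity is unchanged by rotations and rescaling, which is why the two relative representations agree here. It also discards vector magnitude and is not invariant to translations, which are the limitations the abstract's geometry-aware metric is said to address.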
IV · Apr 25, 2022
Analysing the Influence of Attack Configurations on the Reconstruction of Medical Images in Federated Learning
Mads Emil Dahlgaard, Morten Wehlast Jørgensen, Niels Asp Fuglsang et al.
The idea of federated learning is to train deep neural network models collaboratively and share them with multiple participants without exposing their private training data to each other. This is highly attractive in the medical domain, where patient records are privacy-sensitive. However, a recently proposed method called Deep Leakage from Gradients enables attackers to reconstruct data from shared gradients. This study shows how easily images can be reconstructed under different data initialization schemes and distance measures. We show how data and model architecture influence the optimal choice of initialization scheme and distance measure when working with single images. We demonstrate that the choice of initialization scheme and distance measure can significantly increase convergence speed and reconstruction quality. Furthermore, we find that the optimal attack configuration depends largely on the nature of the target image distribution and the complexity of the model architecture.
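To make "distance measure" concrete: a gradient-matching attack compares the gradient produced by a dummy input with the shared gradient and minimizes the gap. The sketch below shows two candidate measures, squared Euclidean and cosine distance. The abstract does not say which measures the study compares, so these two, the function names, and the toy gradient values are assumptions chosen only for illustration.

```python
import math

def squared_l2_distance(g1, g2):
    """Squared Euclidean distance between two flattened gradients."""
    return sum((a - b) ** 2 for a, b in zip(g1, g2))

def cosine_distance(g1, g2):
    """One minus cosine similarity; ignores gradient magnitude."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    return 1.0 - dot / (n1 * n2)

# Hypothetical flattened gradients: the dummy gradient points in the same
# direction as the shared one but has twice the magnitude.
shared_gradient = [0.5, -1.0, 2.0]
dummy_gradient = [1.0, -2.0, 4.0]

l2 = squared_l2_distance(shared_gradient, dummy_gradient)
cos = cosine_distance(shared_gradient, dummy_gradient)
print(l2)   # penalizes the magnitude mismatch
print(cos)  # approximately zero: the directions already agree
```

The two measures give the attacker different optimization landscapes, which is one plausible reason the best choice could depend on the data and the architecture.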
LG · May 12
On What We Can Learn from Low-Resolution Data
Theresa Dahl Frehr, Niels Henrik Pontoppidan, Hiba Nassar et al.
Artificial intelligence systems typically rely on large, centrally collected datasets, a premise that does not hold in many real-world domains such as healthcare and public institutions. In these settings, data sharing is often constrained by storage, privacy, or resource limitations. For example, small wearable devices may lack the bandwidth or energy capacity needed to store and transmit high-resolution data, leading to aggregation during data collection and thus a loss of information. As a result, datasets collected from different sources may consist of a mixture of high- and low-resolution samples. Despite the prevalence of this setting, it remains unclear how informative low-resolution data is when models are ultimately evaluated on high-resolution inputs. We provide a theoretical analysis based on the Kullback-Leibler divergence that characterises how the influence of a datapoint changes with resolution, and derive bounds that relate the relative contribution of high- and low-resolution observations to the information lost under downsampling. To support this analysis, we empirically demonstrate, using both a vision transformer and a convolutional neural network, that adding low-resolution data to the training set consistently improves performance when high-resolution data is scarce.
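The link between aggregation and information loss can be illustrated with a standard property of the Kullback-Leibler divergence: merging bins (a form of downsampling) can never increase it. The sketch below demonstrates that general property on toy histograms. It is not the bound derived in the paper, and the distributions, the `coarsen` helper, and the merge factor are assumptions for the example.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def coarsen(dist, factor):
    """Aggregate neighbouring bins, mimicking low-resolution data collection."""
    return [sum(dist[i:i + factor]) for i in range(0, len(dist), factor)]

# Hypothetical high-resolution histograms over four bins.
p_high = [0.1, 0.4, 0.2, 0.3]
q_high = [0.25, 0.25, 0.25, 0.25]

kl_high = kl_divergence(p_high, q_high)
kl_low = kl_divergence(coarsen(p_high, 2), coarsen(q_high, 2))

print(kl_high)  # positive: the distributions differ at full resolution
print(kl_low)   # 0.0: after merging pairs of bins they are indistinguishable
```

In this toy case the low-resolution view loses all of the distinguishing information. In general the loss is partial, which leaves room for low-resolution samples to remain informative.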
ML · Mar 12, 2021
Machine Learning Assisted Orthonormal Basis Selection for Functional Data Analysis
Rani Basna, Hiba Nassar, Krzysztof Podgórski
In implementations of functional data methods, the effect of the initial choice of an orthonormal basis has received little attention. Typically, several standard bases such as Fourier, wavelets, and splines are considered for transforming observed functional data, and a choice is made without any formal criterion indicating which basis is preferable for the initial transformation of the data into functions. To address this issue, we propose a strictly data-driven method of orthogonal basis selection. The method uses recently introduced orthogonal spline bases called splinets, obtained by efficient orthogonalization of B-splines. The algorithm learns from the data, in the machine learning style, to place knots efficiently. The optimality criterion is based on the average (per functional data point) mean square error and is used both in the learning algorithms and in comparison studies. The latter indicate an efficiency gain that is particularly evident for sparse functional data and, to a lesser degree, in analyses of responses to complex physical systems.
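The optimality criterion, average mean square error per functional data point after projection onto an orthonormal basis, can be sketched as follows. Plain Gram-Schmidt is swapped in here for the splinet orthogonalization, which is a different and more specialized procedure. The discretized "hat" vectors standing in for B-splines, the sample curves, and all function names are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize vectors (modified Gram-Schmidt), dropping dependent ones."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > 1e-12:
            basis.append([ri / norm for ri in r])
    return basis

def projection_mse(sample, basis):
    """Mean square error between a sample and its projection onto the basis."""
    approx = [0.0] * len(sample)
    for b in basis:
        c = dot(sample, b)
        approx = [a + c * bi for a, bi in zip(approx, b)]
    return sum((s - a) ** 2 for s, a in zip(sample, approx)) / len(sample)

def average_mse(samples, basis):
    """Average, over functional data points, of the per-sample MSE."""
    return sum(projection_mse(s, basis) for s in samples) / len(samples)

# Hypothetical local "hat" functions on a 5-point grid, standing in for B-splines.
hats = [[1.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0, 0.0]]
basis = gram_schmidt(hats)

# One curve inside the span of the hats, one entirely outside it.
samples = [[1.0, 2.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0, 1.0]]
print(average_mse(samples, basis))
```

A knot-placement procedure would compare this score across candidate bases and keep the one with the lowest average error; the search itself is not shown.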
CR · Oct 29, 2020
Minimal Model Structure Analysis for Input Reconstruction in Federated Learning
Jia Qian, Hiba Nassar, Lars Kai Hansen
Federated learning (FL) is a distributed machine learning (ML) framework in which every worker holds a complete copy of the global model and its own data. Training occurs locally, so training data is never transmitted directly. However, recent work (Zhu et al., 2019) demonstrated that the input data of a neural network can be reconstructed using only knowledge of that network's gradients, which breaks the promise of FL and compromises user privacy. In this work, we aim to further explore the theoretical limits of reconstruction and to speed up and stabilize the reconstruction procedure. We show that a single input can be reconstructed in analytical form, regardless of network depth, using a fully-connected neural network with one hidden node. We then generalize this result to a gradient averaged over batches of size $B$. In this case, the full batch can be reconstructed if the number of hidden units exceeds $B$. For a convolutional neural network (CNN), the number of kernels required in the convolutional layers is determined by multiple factors, e.g., padding, kernel size, and stride. We require the number of kernels $h \geq \left(\frac{d}{d'}\right)^2 C$, where $d$ is the input width, $d'$ is the output width after the convolutional layer, and $C$ is the number of input channels. We validate our observations and demonstrate the improvements using biomedical data (fMRI, WBC) and benchmark data (MNIST, Kuzushiji-MNIST, CIFAR100, ImageNet, and face images).
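The single-input result and the kernel bound can both be illustrated with a short sketch. The reconstruction below relies on a well-known identity for a layer with a bias term: if $z = w \cdot x + b$, then $\partial L/\partial w = (\partial L/\partial z)\,x$ and $\partial L/\partial b = \partial L/\partial z$, so $x$ is the ratio of the two. The presence of a bias, the squared loss, the toy values, and the function names are assumptions; the abstract does not spell out the authors' exact derivation. The output-width formula is the standard one for convolutions.

```python
def gradients_one_hidden_node(x, w, b, y):
    """Gradients of the loss (z - y)^2 with z = w.x + b, for one hidden node."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    dz = 2.0 * (z - y)               # dL/dz
    grad_w = [dz * xi for xi in x]   # dL/dw = dL/dz * x
    grad_b = dz                      # dL/db = dL/dz
    return grad_w, grad_b

def reconstruct_input(grad_w, grad_b):
    """Recover x from shared gradients; requires a nonzero bias gradient."""
    if grad_b == 0:
        raise ValueError("bias gradient is zero; input cannot be recovered")
    return [g / grad_b for g in grad_w]

def conv_output_width(d, kernel, padding, stride):
    """Standard output width of a convolution on an input of width d."""
    return (d + 2 * padding - kernel) // stride + 1

def min_kernels(d, d_out, channels):
    """Smallest integer h with h >= (d / d_out)^2 * C, the bound in the abstract."""
    return -(-channels * d * d // (d_out * d_out))  # ceiling division

# Hypothetical private input and model parameters.
x_private = [0.2, -1.5, 3.0]
grad_w, grad_b = gradients_one_hidden_node(x_private, [0.1, 0.4, -0.3], 0.05, 1.0)
x_rec = reconstruct_input(grad_w, grad_b)
print(x_rec)  # equals x_private up to floating-point error

# Example configuration: 32-wide, 3-channel input; 3x3 kernel, padding 1, stride 2.
d_out = conv_output_width(32, kernel=3, padding=1, stride=2)
print(d_out, min_kernels(32, d_out, channels=3))
```

Intuitively, the bound says the first convolutional layer must emit at least as many values ($h \cdot d'^2$) as the input contains ($C \cdot d^2$), so more aggressive striding demands more kernels.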