44.6PRMay 20
Approximate FKG inequalities for phase-bound spin systems, with applications to central limit theorems for exponential random graphsSatyaki Mukherjee, Vilas Winstein
The Fortuin-Kasteleyn-Ginibre (FKG) inequality is an invaluable tool in monotone spin systems satisfying the FKG lattice condition, which provides positive correlations for all coordinate-wise increasing functions of spins. This inequality has numerous applications and plays an integral role in the proof of various central limit theorems (CLTs), including recent work on ferromagnetic exponential random graph models (ERGMs) wherein a Hamiltonian tilt promotes the presence of small subgraphs like triangles. However, the FKG lattice condition fails to hold when confining a spin system to a particular phase in the low-temperature regime of parameters. Thus it is not a priori clear if each phase internally has positive correlations for increasing functions, or if the positive correlations in the overall model (which is a mixture of phases) arise primarily from the global choice of phase. In this article, we show that the individual phases in ERGMs do indeed satisfy an approximate form of the FKG inequality internally. We use this to finish the proof of various CLTs within each individual phase in the phase-coexistence regime, answering a question posed by Bianchi, Collet, and Magnanini. We present the FKG inequality for ERGMs as a consequence of a more general result which holds under certain inputs related to metastable mixing; we expect this general result to be widely applicable, and we devote a section to spelling out the details of its application to a class of generalized higher-order ferromagnetic Curie-Weiss models where the necessary inputs are relatively transparent.
LGSep 5, 2023
Representation Learning Dynamics of Self-Supervised ModelsPascal Esser, Satyaki Mukherjee, Debarghya Ghoshdastidar
Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However current theoretical analysis of SSL is mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural networks based models but, so far, are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dymanics of multivariate regression to SSL leads to learning trivial scalar representations that demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights, and derive the exact (network width independent) learning dynamics of the SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite width approximation of SSL models significantly deviate from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings, and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
LGDec 2, 2022
Improved Representation Learning Through Tensorized AutoencodersPascal Mattia Esser, Satyaki Mukherjee, Mahalakshmi Sabanayagam et al.
The central question in representation learning is what constitutes a good or meaningful representation. In this work we argue that if we consider data with inherent cluster structures, where clusters can be characterized through different means and covariances, those data structures should be represented in the embedding as well. While Autoencoders (AE) are widely used in practice for unsupervised representation learning, they do not fulfil the above condition on the embedding as they obtain a single representation of the data. To overcome this we propose a meta-algorithm that can be used to extend an arbitrary AE architecture to a tensorized version (TAE) that allows for learning cluster-specific embeddings while simultaneously learning the cluster assignment. For the linear setting we prove that TAE can recover the principle components of the different clusters in contrast to principle component of the entire data recovered by a standard AE. We validated this on planted models and for general, non-linear and convolutional AEs we empirically illustrate that tensorizing the AE is beneficial in clustering and de-noising tasks.
LGFeb 24, 2023
Wasserstein Projection Pursuit of Non-Gaussian SignalsSatyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar
We consider the general dimensionality reduction problem of locating in a high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach -- we search for mutually orthogonal unit directions which maximise the 2-Wasserstein distance of the empirical distribution of data-projections along these directions from a standard Gaussian. Under a generative model, where there is a underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.
LGFeb 4, 2025
Recovering Imbalanced Clusters via Gradient-Based Projection PursuitMartin Eppert, Satyaki Mukherjee, Debarghya Ghoshdastidar
Projection Pursuit is a classic exploratory technique for finding interesting projections of a dataset. We propose a method for recovering projections containing either Imbalanced Clusters or a Bernoulli-Rademacher distribution using a gradient-based technique to optimize the projection index. As sample complexity is a major limiting factor in Projection Pursuit, we analyze our algorithm's sample complexity within a Planted Vector setting where we can observe that Imbalanced Clusters can be recovered more easily than balanced ones. Additionally, we give a generalized result that works for a variety of data distributions and projection indices. We compare these results to computational lower bounds in the Low-Degree-Polynomial Framework. Finally, we experimentally evaluate our method's applicability to real-world data using FashionMNIST and the Human Activity Recognition Dataset, where our algorithm outperforms others when only a few samples are available.