ITMay 27, 2022
Group-invariant max filteringJameson Cahill, Joseph W. Iverson, Dustin G. Mixon et al.
Given a real inner product space $V$ and a group $G$ of linear isometries, we construct a family of $G$-invariant real-valued functions on $V$ that we call max filters. In the case where $V=\mathbb{R}^d$ and $G$ is finite, a suitable max filter bank separates orbits, and is even bilipschitz in the quotient metric. In the case where $V=L^2(\mathbb{R}^d)$ and $G$ is the group of translation operators, a max filter exhibits stability to diffeomorphic distortion like that of the scattering transform introduced by Mallat. We establish that max filters are well suited for various classification tasks, both in theory and in practice.
LGNov 28, 2022
Sketch-and-solve approaches to k-means clustering by semidefinite programmingCharles Clum, Dustin G. Mixon, Soledad Villar et al.
We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.
65.0FAMar 24
Approximation theorems in bilipschitz invariant theoryJameson Cahill, Joseph W. Iverson, Dustin G. Mixon et al.
Bilipschitz invariant theory concerns low-distortion embeddings of orbit spaces into Euclidean space. To date, embeddings with the smallest-possible distortion are known for only a few cases, to include: (a) planar rotations, (b) real phase retrieval, and (c) finite reflection groups. Here, we prove that for all three of these cases, the smallest possible distortion is nearly achieved by a composition of a "max filter bank" with a linear transformation. Our proof amounts to a two-step process: first, we show it suffices to demonstrate a certain inclusion of Lipschitz function spaces, and second, we prove that inclusion, using fundamentally different approaches for the three cases. We also show that these cases interact differently with a few related function spaces, which suggests that a unified treatment would be nontrivial.
30.4LGMar 21
Neural collapse in the orthoplex regimeJames Alcala, Rayna Andreeva, Vladimir A. Kobzar et al.
When training a neural network for classification, the feature vectors of the training set are known to collapse to the vertices of a regular simplex, provided the dimension $d$ of the feature space and the number $n$ of classes satisfies $n\leq d+1$. This phenomenon is known as neural collapse. For other applications like language models, one instead takes $n\gg d$. Here, the neural collapse phenomenon still occurs, but with different emergent geometric figures. We characterize these geometric figures in the orthoplex regime where $d+2\leq n\leq 2d$. The techniques in our analysis primarily involve Radon's theorem and convexity.
MLDec 5, 2025
BalLOT: Balanced $k$-means clustering with optimal transportWenyan Luo, Dustin G. Mixon
We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
LGMar 18, 2025
On the clustering behavior of sliding windowsBoris Alexeev, Wenyan Luo, Dustin G. Mixon et al.
Things can go spectacularly wrong when clustering timeseries data that has been preprocessed with a sliding window. We highlight three surprising failures that emerge depending on how the window size compares with the timeseries length. In addition to computational examples, we present theoretical explanations for each of these failure modes.
LGNov 23, 2020
Neural collapse with unconstrained featuresDustin G. Mixon, Hans Parshall, Jianzong Pi
Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple "unconstrained features model" in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk.
LGAug 10, 2020
Lie PCA: Density estimation for symmetric manifoldsJameson Cahill, Dustin G. Mixon, Hans Parshall
We introduce an extension to local principal component analysis for learning symmetric manifolds. In particular, we use a spectral method to approximate the Lie algebra corresponding to the symmetry group of the underlying manifold. We derive the sample complexity of our method for a variety of manifolds before applying it to various data sets for improved density estimation.
ITAug 10, 2020
Sketching semidefinite programs for faster clusteringDustin G. Mixon, Kaiying Xie
Many clustering problems enjoy solutions by semidefinite programming. Theoretical results in this vein frequently consider data with a planted clustering and a notion of signal strength such that the semidefinite program exactly recovers the planted clustering when the signal strength is sufficiently large. In practice, semidefinite programs are notoriously slow, and so speedups are welcome. In this paper, we show how to sketch a popular semidefinite relaxation of a graph clustering problem known as minimum bisection, and our analysis supports a meta-claim that the clustering task is less computationally burdensome when there is more signal.
MLDec 6, 2018
SqueezeFit: Label-aware dimensionality reduction by semidefinite programmingCulver McWhirter, Dustin G. Mixon, Soledad Villar
Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data.
LGMar 25, 2018
SUNLayer: Stable denoising with generative networksRuhui Jin, Dustin G. Mixon, Soledad Villar
Deep neural networks are often used to implement powerful generative models for real-world data. Notable applications include image denoising, as well as other classical inverse problems like compressed sensing and super-resolution. To provide a rigorous but simplified analysis of generative models, in this work, we introduce an elegant theoretical framework based on spherical harmonics, namely \textbf{SUNLayer}. Our theoretical framework identifies explicit conditions on activation functions that guarantee denoising under local optimization. Numerical experiments examine the theoretical properties on commonly used activation functions and demonstrate their stable denoising performance.
MLOct 3, 2017
Monte Carlo approximation certificates for k-means clusteringDustin G. Mixon, Soledad Villar
Efficient algorithms for $k$-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect $k$-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for $k$-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of $k$-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the $k$-means objective, and being sub-linear, our algorithm is faster than $k$-means++ when the number of data points is large. We illustrate the performance of our algorithm with both numerical experiments and a performance guarantee: If the data points are drawn independently from any mixture of two Gaussians over $\mathbb{R}^m$ with identity covariance, then with probability $1-O(1/m)$, our $\operatorname{poly}(m)$-time algorithm produces a 3-approximation certificate with 99% confidence.
MLFeb 22, 2016
Clustering subgaussian mixtures by semidefinite programmingDustin G. Mixon, Soledad Villar, Rachel Ward
We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context of subgaussian mixture models. We also study the fundamental limits of estimating Gaussian centers by k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution.
ITSep 26, 2015
Probably certifiably correct k-means clusteringTakayuki Iguchi, Dustin G. Mixon, Jesse Peterson et al.
Recently, Bandeira [arXiv:1509.00824] introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei's semidefinite relaxation of k-means is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.
LGJul 15, 2015
Learning Boolean functions with concentrated spectraDustin G. Mixon, Jesse Peterson
This paper discusses the theory and application of learning Boolean functions that are concentrated in the Fourier domain. We first estimate the VC dimension of this function class in order to establish a small sample complexity of learning in this case. Next, we propose a computationally efficient method of empirical risk minimization, and we apply this method to the MNIST database of handwritten digits. These results demonstrate the effectiveness of our model for modern classification tasks. We conclude with a list of open problems for future investigation.
ITMay 18, 2015
On the tightness of an SDP relaxation of k-meansTakayuki Iguchi, Dustin G. Mixon, Jesse Peterson et al.
Recently, Awasthi et al. introduced an SDP relaxation of the $k$-means problem in $\mathbb R^m$. In this work, we consider a random model for the data points in which $k$ balls of unit radius are deterministically distributed throughout $\mathbb R^m$, and then in each ball, $n$ points are drawn according to a common rotationally invariant probability distribution. For any fixed ball configuration and probability distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these planted clusters with probability $1-e^{-Ω(n)}$ provided the distance between any two of the ball centers is $>2+ε$, where $ε$ is an explicit function of the configuration of the ball centers, and can be arbitrarily small when $m$ is large.
LGApr 11, 2014
Compressive classification and the rare eclipse problemAfonso S. Bandeira, Dustin G. Mixon, Benjamin Recht
This paper addresses the fundamental question of when convex sets remain disjoint after random projection. We provide an analysis using ideas from high-dimensional convex geometry. For ellipsoids, we provide a bound in terms of the distance between these ellipsoids and simple functions of their polynomial coefficients. As an application, this theorem provides bounds for compressive classification of convex sets. Rather than assuming that the data to be classified is sparse, our results show that the data can be acquired via very few measurements yet will remain linearly separable. We demonstrate the feasibility of this approach in the context of hyperspectral imaging.
STOct 8, 2012
Group Model Selection Using Marginal Correlations: The Good, the Bad and the UglyWaheed U. Bajwa, Dustin G. Mixon
Group model selection is the problem of determining a small subset of groups of predictors (e.g., the expression data of genes) that are responsible for majority of the variation in a response variable (e.g., the malignancy of a tumor). This paper focuses on group model selection in high-dimensional linear models, in which the number of predictors far exceeds the number of samples of the response variable. Existing works on high-dimensional group model selection either require the number of samples of the response variable to be significantly larger than the total number of predictors contributing to the response or impose restrictive statistical priors on the predictors and/or nonzero regression coefficients. This paper provides comprehensive understanding of a low-complexity approach to group model selection that avoids some of these limitations. The proposed approach, termed Group Thresholding (GroTh), is based on thresholding of marginal correlations of groups of predictors with the response variable and is reminiscent of existing thresholding-based approaches in the literature. The most important contribution of the paper in this regard is relating the performance of GroTh to a polynomial-time verifiable property of the predictors for the general case of arbitrary (random or deterministic) predictors and arbitrary nonzero regression coefficients.