Dustin G. Mixon

h-index: 27

17 papers

2,791 citations

Novelty: 52%

AI Score: 46

Ranked #39,486 of 194,257 authors (top 20%) · #9,217 in LG (top 23%)

17 Papers

9.2 · CO · Jul 16

Short spherical $t$-design curves

Emily J. King, Dustin G. Mixon

We study the minimum arclength of spherical $t$-design curves, i.e., closed rectifiable curves on $S^d$ whose normalized arclength measure exactly integrates every polynomial of degree at most $t$. We prove an explicit spectral lower bound that is sharp for $t=1$ in all spheres and for $t=2$ in every odd-dimensional sphere, yielding the first exact optimality results for spherical $t$-design curves with $t>1$. For even-dimensional spheres, we construct $2$-design curves whose lengths asymptotically match the lower bound as $d\to\infty$, and in $S^2$, we use numerical optimization and the calculus of variations to derive a candidate for the shortest $2$-design curve.
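The definition can be checked numerically on a simple example. The sketch below is an illustration of the definition only, not one of the paper's constructions: it averages low-degree monomials along a great circle on $S^2$ and compares them with the averages under the uniform measure on the sphere (0 for each coordinate, 1/3 for each squared coordinate). The function name `curve_average` and the sample count are assumptions made for this example.

```python
import numpy as np

def curve_average(f, num_samples=100_000):
    """Average f along the great circle (cos s, sin s, 0), which has constant speed,
    so equispaced parameters sample the normalized arclength measure."""
    s = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    pts = np.stack([np.cos(s), np.sin(s), np.zeros_like(s)], axis=1)
    return float(np.mean(f(pts)))

# Degree 1: the uniform measure on S^2 gives each coordinate mean 0.
deg1 = [curve_average(lambda p, i=i: p[:, i]) for i in range(3)]

# Degree 2: the uniform measure on S^2 gives each squared coordinate mean 1/3.
deg2 = [curve_average(lambda p, i=i: p[:, i] ** 2) for i in range(3)]

print("coordinate means:", deg1)
print("squared-coordinate means:", deg2)
```

The coordinate means vanish, so the great circle integrates every polynomial of degree at most 1 correctly, while the squared-coordinate means are 1/2, 1/2, and 0 rather than 1/3, so it is not a $2$-design curve.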

23.5 · LG · Oct 9, 2023

Generalized Neural Collapse for a Large Number of Classes

Jiachen Jiang, Jinxin Zhou, Peng Wang et al. · deepmind

Neural collapse provides an elegant mathematical characterization of learned last layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insights but also motivate new techniques for improving practical deep models. However, most of the existing empirical and theoretical studies in neural collapse focus on the case where the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to cases where the number of classes is much larger than the dimension of the feature space, which occur broadly in language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon, where the minimum one-vs-rest margin is maximized. We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study to show that generalized neural collapse provably occurs under the unconstrained features model with a spherical constraint, under certain technical conditions on the feature dimension and the number of classes.

11.7 · IT · May 27, 2022

Group-invariant max filtering

Jameson Cahill, Joseph W. Iverson, Dustin G. Mixon et al.

Given a real inner product space $V$ and a group $G$ of linear isometries, we construct a family of $G$-invariant real-valued functions on $V$ that we call max filters. In the case where $V=\mathbb{R}^d$ and $G$ is finite, a suitable max filter bank separates orbits, and is even bilipschitz in the quotient metric. In the case where $V=L^2(\mathbb{R}^d)$ and $G$ is the group of translation operators, a max filter exhibits stability to diffeomorphic distortion like that of the scattering transform introduced by Mallat. We establish that max filters are well suited for various classification tasks, both in theory and in practice.
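A minimal sketch of the idea, assuming a max filter with template $z$ takes the form $x \mapsto \max_{g\in G}\langle gx, z\rangle$ and taking $G$ to be the cyclic shifts of $\mathbb{R}^d$. Both the formula and the function name `max_filter_cyclic` are assumptions made for illustration; the abstract does not state the construction explicitly.

```python
import numpy as np

def max_filter_cyclic(x, template):
    """Assumed form of a max filter: the maximum of <g x, template>
    over all cyclic shifts g of the coordinates of x."""
    d = len(x)
    return max(float(np.dot(np.roll(x, s), template)) for s in range(d))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
z = rng.standard_normal(8)  # hypothetical template

value = max_filter_cyclic(x, z)
shifted_value = max_filter_cyclic(np.roll(x, 3), z)

print(value, shifted_value)
```

Shifting the input only permutes the set of inner products being maximized, so the two printed values agree, which is the $G$-invariance the abstract describes.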

3.3 · LG · Nov 28, 2022 · Code

Sketch-and-solve approaches to k-means clustering by semidefinite programming

Charles Clum, Dustin G. Mixon, Soledad Villar et al.

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated, we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.
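To see why a semidefinite relaxation yields a lower bound, it helps to check that every clustering corresponds to a feasible point with matching objective. The sketch below assumes the relaxation is stated in the common form: minimize $\tfrac12\operatorname{Tr}(DX)$ subject to $\operatorname{Tr}(X)=k$, $X\mathbf{1}=\mathbf{1}$, $X\ge 0$ entrywise, and $X\succeq 0$, where $D$ holds squared pairwise distances. Normalization conventions vary between papers, and the helper names are hypothetical. No SDP is solved here, and this is not the sketch-and-solve method itself.

```python
import numpy as np

def kmeans_value(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for t in np.unique(labels):
        cluster = points[labels == t]
        total += np.sum((cluster - cluster.mean(axis=0)) ** 2)
    return float(total)

def cluster_matrix(labels):
    """X = sum over clusters of (1/|C_t|) * indicator * indicator^T."""
    n = len(labels)
    X = np.zeros((n, n))
    for t in np.unique(labels):
        indicator = (labels == t).astype(float)
        X += np.outer(indicator, indicator) / indicator.sum()
    return X

rng = np.random.default_rng(0)
points = rng.standard_normal((12, 2))
labels = np.repeat([0, 1, 2], 4)  # an arbitrary clustering with k = 3

diffs = points[:, None, :] - points[None, :, :]
D = np.sum(diffs ** 2, axis=2)    # squared pairwise distances
X = cluster_matrix(labels)
sdp_objective = 0.5 * np.trace(D @ X)

print(sdp_objective, kmeans_value(points, labels))
```

Because every clustering gives a feasible $X$ with the same objective value, the optimal value of the relaxation can be no larger than the optimal k-means value.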

1.9 · FA · Mar 24

Approximation theorems in bilipschitz invariant theory

Jameson Cahill, Joseph W. Iverson, Dustin G. Mixon et al.

Bilipschitz invariant theory concerns low-distortion embeddings of orbit spaces into Euclidean space. To date, embeddings with the smallest possible distortion are known in only a few cases, including (a) planar rotations, (b) real phase retrieval, and (c) finite reflection groups. Here, we prove that for all three of these cases, the smallest possible distortion is nearly achieved by a composition of a "max filter bank" with a linear transformation. Our proof amounts to a two-step process: first, we show it suffices to demonstrate a certain inclusion of Lipschitz function spaces, and second, we prove that inclusion, using fundamentally different approaches for the three cases. We also show that these cases interact differently with a few related function spaces, which suggests that a unified treatment would be nontrivial.

4.5 · ML · Dec 5, 2025

BalLOT: Balanced $k$-means clustering with optimal transport

Wenyan Luo, Dustin G. Mixon

We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it delivers a fast and effective solution to this problem. We establish this with a variety of numerical experiments before proving several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recoveries of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.

27.7 · LG · Nov 23, 2020

Neural collapse with unconstrained features

Dustin G. Mixon, Hans Parshall, Jianzong Pi

Neural collapse is an emergent phenomenon in deep learning that was recently discovered by Papyan, Han and Donoho. We propose a simple "unconstrained features model" in which neural collapse also emerges empirically. By studying this model, we provide some explanation for the emergence of neural collapse in terms of the landscape of empirical risk.
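The abstract does not spell out the geometry involved. As background, one of the properties Papyan, Han and Donoho report is that centered class means approach a simplex equiangular tight frame. The sketch below builds such a configuration and inspects its Gram matrix; it is not the unconstrained features model itself, and the function name `simplex_etf` is an assumption made for this example.

```python
import numpy as np

def simplex_etf(k):
    """Rows are k unit vectors in R^k with pairwise inner product -1/(k-1).
    Built by centering the standard basis and rescaling to unit norm."""
    centered = np.eye(k) - np.ones((k, k)) / k
    return centered * np.sqrt(k / (k - 1))

k = 4
means = simplex_etf(k)
gram = means @ means.T

print(np.round(gram, 3))
```

The diagonal entries equal 1 and every off-diagonal entry equals $-1/(k-1)$, the most negative common inner product that $k$ unit vectors can share.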

4.2 · LG · Aug 10, 2020

Lie PCA: Density estimation for symmetric manifolds

Jameson Cahill, Dustin G. Mixon, Hans Parshall

We introduce an extension to local principal component analysis for learning symmetric manifolds. In particular, we use a spectral method to approximate the Lie algebra corresponding to the symmetry group of the underlying manifold. We derive the sample complexity of our method for a variety of manifolds before applying it to various data sets for improved density estimation.

5.1 · IT · Aug 10, 2020

Sketching semidefinite programs for faster clustering

Dustin G. Mixon, Kaiying Xie

Many clustering problems enjoy solutions by semidefinite programming. Theoretical results in this vein frequently consider data with a planted clustering and a notion of signal strength such that the semidefinite program exactly recovers the planted clustering when the signal strength is sufficiently large. In practice, semidefinite programs are notoriously slow, and so speedups are welcome. In this paper, we show how to sketch a popular semidefinite relaxation of a graph clustering problem known as minimum bisection, and our analysis supports a meta-claim that the clustering task is less computationally burdensome when there is more signal.

6.1 · ML · Dec 6, 2018

SqueezeFit: Label-aware dimensionality reduction by semidefinite programming

Culver McWhirter, Dustin G. Mixon, Soledad Villar

Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data.

9.4 · LG · Mar 25, 2018

SUNLayer: Stable denoising with generative networks

Ruhui Jin, Dustin G. Mixon, Soledad Villar

Deep neural networks are often used to implement powerful generative models for real-world data. Notable applications include image denoising, as well as other classical inverse problems like compressed sensing and super-resolution. To provide a rigorous but simplified analysis of generative models, in this work, we introduce an elegant theoretical framework based on spherical harmonics, namely SUNLayer. Our theoretical framework identifies explicit conditions on activation functions that guarantee denoising under local optimization. Numerical experiments examine the theoretical properties on commonly used activation functions and demonstrate their stable denoising performance.

4.8 · ML · Oct 3, 2017

Monte Carlo approximation certificates for k-means clustering

Dustin G. Mixon, Soledad Villar

Efficient algorithms for $k$-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect $k$-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for $k$-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of $k$-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the $k$-means objective, and being sub-linear, our algorithm is faster than $k$-means++ when the number of data points is large. We illustrate the performance of our algorithm with both numerical experiments and a performance guarantee: If the data points are drawn independently from any mixture of two Gaussians over $\mathbb{R}^m$ with identity covariance, then with probability $1-O(1/m)$, our $\operatorname{poly}(m)$-time algorithm produces a 3-approximation certificate with 99% confidence.

19.1 · ML · Feb 22, 2016

Clustering subgaussian mixtures by semidefinite programming

Dustin G. Mixon, Soledad Villar, Rachel Ward

We introduce a model-free relax-and-round algorithm for k-means clustering based on a semidefinite relaxation due to Peng and Wei. The algorithm interprets the SDP output as a denoised version of the original data and then rounds this output to a hard clustering. We provide a generic method for proving performance guarantees for this algorithm, and we analyze the algorithm in the context of subgaussian mixture models. We also study the fundamental limits of estimating Gaussian centers by k-means clustering in order to compare our approximation guarantee to the theoretically optimal k-means clustering solution.

11.3 · IT · Sep 26, 2015

Probably certifiably correct k-means clustering

Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson et al.

Recently, Bandeira [arXiv:1509.00824] introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei's semidefinite relaxation of k-means is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.

1.1 · LG · Jul 15, 2015

Learning Boolean functions with concentrated spectra

Dustin G. Mixon, Jesse Peterson

This paper discusses the theory and application of learning Boolean functions that are concentrated in the Fourier domain. We first estimate the VC dimension of this function class in order to establish a small sample complexity of learning in this case. Next, we propose a computationally efficient method of empirical risk minimization, and we apply this method to the MNIST database of handwritten digits. These results demonstrate the effectiveness of our model for modern classification tasks. We conclude with a list of open problems for future investigation.
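For readers unfamiliar with the Fourier domain of Boolean functions, the sketch below computes the standard Fourier (Walsh) coefficients $\hat f(S)=\mathbb{E}_x\big[f(x)\prod_{i\in S}x_i\big]$ of a function $f\colon\{-1,1\}^n\to\{-1,1\}$ by brute force. It illustrates what a concentrated spectrum means and is not the paper's learning method; the 3-bit majority function is an example chosen here.

```python
from itertools import combinations, product
import math

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} where f_hat(S) is the average of
    f(x) * prod_{i in S} x_i over all x in {-1, 1}^n."""
    inputs = list(product([-1, 1], repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total = sum(f(x) * math.prod(x[i] for i in S) for x in inputs)
            coeffs[S] = total / len(inputs)
    return coeffs

def majority(x):
    """Majority vote of an odd number of +/-1 bits."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(majority, 3)
energy = sum(c ** 2 for c in coeffs.values())

for S, c in coeffs.items():
    print(S, c)
```

Only four of the eight coefficients are nonzero (the three singletons and the full set), and their squares sum to 1 by Parseval's identity, so all of the spectral energy sits on a small family of sets.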

10.8 · IT · May 18, 2015

On the tightness of an SDP relaxation of k-means

Takayuki Iguchi, Dustin G. Mixon, Jesse Peterson et al.

Recently, Awasthi et al. introduced an SDP relaxation of the $k$-means problem in $\mathbb R^m$. In this work, we consider a random model for the data points in which $k$ balls of unit radius are deterministically distributed throughout $\mathbb R^m$, and then in each ball, $n$ points are drawn according to a common rotationally invariant probability distribution. For any fixed ball configuration and probability distribution, we prove that the SDP relaxation of the $k$-means problem exactly recovers these planted clusters with probability $1-e^{-\Omega(n)}$ provided the distance between any two of the ball centers is $>2+\epsilon$, where $\epsilon$ is an explicit function of the configuration of the ball centers, and can be arbitrarily small when $m$ is large.
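The random model described above can be simulated in a few lines. The sketch below assumes the uniform distribution on the unit ball as the common rotationally invariant distribution, which is only one admissible choice; the function name and the example centers are likewise assumptions made for illustration.

```python
import numpy as np

def sample_ball_model(centers, n, rng):
    """Draw n points around each center, uniformly from the unit ball
    (an assumed choice of rotationally invariant distribution)."""
    k, m = centers.shape
    directions = rng.standard_normal((k, n, m))
    directions /= np.linalg.norm(directions, axis=2, keepdims=True)
    radii = rng.random((k, n, 1)) ** (1.0 / m)  # uniform-in-volume radii
    points = centers[:, None, :] + radii * directions
    labels = np.repeat(np.arange(k), n)
    return points.reshape(k * n, m), labels

rng = np.random.default_rng(0)
# Hypothetical centers in R^2 with pairwise distances greater than 2.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
points, labels = sample_ball_model(centers, n=50, rng=rng)

dist_to_center = np.linalg.norm(points - centers[labels], axis=1)
print(points.shape, dist_to_center.max())
```

The returned `labels` record the planted clusters, so the output of any clustering method can be compared against them.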

8.0 · LG · Apr 11, 2014

Compressive classification and the rare eclipse problem

Afonso S. Bandeira, Dustin G. Mixon, Benjamin Recht

This paper addresses the fundamental question of when convex sets remain disjoint after random projection. We provide an analysis using ideas from high-dimensional convex geometry. For ellipsoids, we provide a bound in terms of the distance between these ellipsoids and simple functions of their polynomial coefficients. As an application, this theorem provides bounds for compressive classification of convex sets. Rather than assuming that the data to be classified is sparse, our results show that the data can be acquired via very few measurements yet will remain linearly separable. We demonstrate the feasibility of this approach in the context of hyperspectral imaging.