Yoav Freund

LG
h-index6
24papers
314citations
Novelty49%
AI Score28

24 Papers

LGJun 27, 2023
Effective resistance in metric spaces

Robi Bhattacharjee, Alexander Cloninger, Yoav Freund et al.

Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. One attractive application of ER is to point clouds, i.e. graphs whose vertices correspond to IID samples from a distribution over a metric space. Unfortunately, it was shown that the ER between any two points converges to a trivial quantity that holds no information about the graph's structure as the size of the sample increases to infinity. In this study, we show that this trivial solution can be circumvented by considering a region-based ER between pairs of small regions rather than pairs of points and by scaling the edge weights appropriately with respect to the underlying density in each region. By keeping the regions fixed, we show analytically that the region-based ER converges to a non-trivial limit as the number of points increases to infinity. Namely the ER on a metric space. We support our theoretical findings with numerical experiments.

LGMar 5, 2023
Active learning using region-based sampling

Sanjoy Dasgupta, Yoav Freund

We present a general-purpose active learning scheme for data in metric spaces. The algorithm maintains a collection of neighborhoods of different sizes and uses label queries to identify those that have a strong bias towards one particular label; when two such neighborhoods intersect and have different labels, the region of overlap is treated as a ``known unknown'' and is a target of future active queries. We give label complexity bounds for this method that do not rely on assumptions about the data and we instantiate them in several cases of interest.

CVApr 8, 2024
Towards Explainable Automated Neuroanatomy

Kui Qian, Litao Qiao, Beth Friedman et al.

We present a novel method for quantifying the microscopic structure of brain tissue. It is based on the automated recognition of interpretable features obtained by analyzing the shapes of cells. This contrasts with prevailing methods of brain anatomical analysis in two ways. First, contemporary methods use gray-scale values derived from smoothed version of the anatomical images, which dissipated valuable information from the texture of the images. Second, contemporary analysis uses the output of black-box Convolutional Neural Networks, while our system makes decisions based on interpretable features obtained by analyzing the shapes of individual cells. An important benefit of this open-box approach is that the anatomist can understand and correct the decisions made by the computer. Our proposed system can accurately localize and identify existing brain structures. This can be used to align and coregistar brains and will facilitate connectomic studies for reverse engineering of brain circuitry.

LGFeb 28, 2022
Structure from Voltage

Robi Bhattacharjee, Alex Cloninger, Yoav Freund et al.

Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigen-vectors of the graph Laplacian. Graph laplacians are used to find low dimensional structures in high dimensional data. Here too, ER based analysis has advantages over eign-vector based methods. Unfortunately Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by using scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.

LGOct 5, 2021
When is the Convergence Time of Langevin Algorithms Dimension Independent? A Composite Optimization Viewpoint

Yoav Freund, Yi-An Ma, Tong Zhang

There has been a surge of works bridging MCMC sampling and optimization, with a specific focus on translating non-asymptotic convergence guarantees for optimization problems into the analysis of Langevin algorithms in MCMC sampling. A conspicuous distinction between the convergence analysis of Langevin sampling and that of optimization is that all known convergence rates for Langevin algorithms depend on the dimensionality of the problem, whereas the convergence rates for optimization are dimension-free for convex problems. Whether a dimension independent convergence rate can be achieved by Langevin algorithm is thus a long-standing open problem. This paper provides an affirmative answer to this problem for large classes of either Lipschitz or smooth convex problems with normal priors. By viewing Langevin algorithm as composite optimization, we develop a new analysis technique that leads to dimension independent convergence rates for such problems.

LGAug 23, 2021
StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm

Andreas Oslandsbotn, Zeljko Kereta, Valeriya Naumova et al.

Kernel ridge regression (KRR) is a popular scheme for non-linear non-parametric learning. However, existing implementations of KRR require that all the data is stored in the main memory, which severely limits the use of KRR in contexts where data size far exceeds the memory size. Such applications are increasingly common in data mining, bioinformatics, and control. A powerful paradigm for computing on data sets that are too large for memory is the streaming model of computation, where we process one data sample at a time, discarding each sample before moving on to the next one. In this paper, we propose StreaMRAK - a streaming version of KRR. StreaMRAK improves on existing KRR schemes by dividing the problem into several levels of resolution, which allows continual refinement to the predictions. The algorithm reduces the memory requirement by continuously and efficiently integrating new samples into the training model. With a novel sub-sampling scheme, StreaMRAK reduces memory and computational complexities by creating a sketch of the original data, where the sub-sampling density is adapted to the bandwidth of the kernel and the local dimensionality of the data. We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum. The results show that the proposed algorithm is fast and accurate.

LGJun 20, 2021
Optimal Online Learning using Potential Functions

Yoav Freund

We study a family of potential functions for online learning. We show that if the potential function has strictly positive derivatives of order 1-4 then the min-max optimal strategy for the adversary is Brownian motion. Using that fact we analyze different potential functions and show that the Normal-Hedge potential provides the tightest upper bounds on the cumulative regret of the top ε-percentile.

SDMar 2, 2021
Audio scene monitoring using redundant ad-hoc microphone array networks

Peter Gerstoft, Yihan Hu, Michael J. Bianco et al.

We present a system for localizing sound sources in a room with several ad-hoc microphone arrays. Each circular array performs direction of arrival (DOA) estimation independently using commercial software. The DOAs are fed to a fusion center, concatenated, and used to perform the localization based on two proposed methods, which require only few labeled source locations (anchor points) for training. The first proposed method is based on principal component analysis (PCA) of the observed DOA and does not require any knowledge of anchor points. The array cluster can then perform localization on a manifold defined by the PCA of concatenated DOAs over time. The second proposed method performs localization using an affine transformation between the DOA vectors and the room manifold. The PCA has fewer requirements on the training sequence, but is less robust to missing DOAs from one of the arrays. The methods are demonstrated with five IoT 8-microphone circular arrays, placed at unspecified fixed locations in an office. Both the PCA and the affine method can easily map out a rectangle based on a few anchor points with similar accuracy. The proposed methods provide a step towards monitoring activities in a smart home and require little installation effort as the array locations are not needed.

LGJul 15, 2020
Experimental Design for Bathymetry Editing

Julaiti Alafate, Yoav Freund, David T. Sandwell et al.

We describe an application of machine learning to a real-world computer assisted labeling task. Our experimental results expose significant deviations from the IID assumption commonly used in machine learning. These results suggest that the common random split of all data into training and testing can often lead to poor performance.

LGMay 29, 2019
An adaptive nearest neighbor rule for classification

Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund et al.

We introduce a variant of the $k$-nearest neighbor classifier in which $k$ is chosen adaptively for each query, rather than supplied as a parameter. The choice of $k$ depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger $k$ for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, $k$-NN with an optimal choice of $k$. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the `advantage' which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest.

LGJan 25, 2019
Faster Boosting with Smaller Memory

Julaiti Alafate, Yoav Freund

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires that the memory size is sufficient to hold a 2-3 multiple of the training set size. This paper presents an alternative approach to implementing the boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when the memory size is small. This is achieved using a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100 speedup over XGBoost when the training data is too large to fit in memory.

LGMay 19, 2018
Tell Me Something New: A New Framework for Asynchronous Parallel Learning

Julaiti Alafate, Yoav Freund

We present a novel approach for parallel computation in the context of machine learning that we call "Tell Me Something New" (TMSN). This approach involves a set of independent workers that use broadcast to update each other when they observe "something new". TMSN does not require synchronization or a head node and is highly resilient against failing machines or laggards. We demonstrate the utility of TMSN by applying it to learning boosted trees. We show that our implementation is 10 times faster than XGBoost and LightGBM on the splice-site prediction problem.

CVMar 9, 2018
Robust Landmark Detection for Alignment of Mouse Brain Section Images

Yuncong Chen, David Kleinfeld, Martyn Goulding et al.

Brightfield and fluorescent imaging of whole brain sections are funda- mental tools of research in mouse brain study. As sectioning and imaging become more efficient, there is an increasing need to automate the post-processing of sec- tions for alignment and three dimensional visualization. There is a further need to facilitate the development of a digital atlas, i.e. a brain-wide map annotated with cell type and tract tracing data, which would allow the automatic registra- tion of images stacks to a common coordinate system. Currently, registration of slices requires manual identification of landmarks. In this work we describe the first steps in developing a semi-automated system to construct a histology at- las of mouse brainstem that combines atlas-guided annotation, landmark-based registration and atlas generation in an iterative framework. We describe an unsu- pervised approach for identifying and matching region and boundary landmarks, based on modelling texture. Experiments show that the detected landmarks corre- spond well with brain structures, and matching is robust under distortion. These results will serve as the basis for registration and atlas building.

CVFeb 28, 2017
The Active Atlas: Combining 3D Anatomical Models with Texture Detectors

Yuncong Chen, Lauren McElvain, Alex Tolpygo et al.

While modern imaging technologies such as fMRI have opened exciting new possibilities for studying the brain in vivo, histological sections remain the best way to study the anatomy of the brain at the level of single neurons. The histological atlas changed little since 1909 and localizing brain regions is a still a labor intensive process performed only by experienced neuro-anatomists. Existing digital atlases such as the Allen Brain atlas are limited to low resolution images which cannot identify the detailed structure of the neurons. We have developed a digital atlas methodology that combines information about the 3D organization of the brain and the detailed texture of neurons in different structures. Using the methodology we developed an atlas for the mouse brainstem and mid-brain, two regions for which there are currently no good atlases. Our atlas is "active" in that it can be used to automatically align a histological stack to the atlas, thus reducing the work of the neuroanatomist.

LGMay 28, 2016
Muffled Semi-Supervised Learning

Akshay Balsubramani, Yoav Freund

We explore a novel approach to semi-supervised learning. This approach is contrary to the common approach in that the unlabeled examples serve to "muffle," rather than enhance, the guidance provided by the labeled examples. We provide several variants of the basic algorithm and show experimentally that they can achieve significantly higher AUC than boosted trees, random forests and logistic regression when unlabeled examples are available.

LGApr 12, 2016
Typical Stability

Raef Bassily, Yoav Freund

In this paper, we introduce a notion of algorithmic stability called typical stability. When our goal is to release real-valued queries (statistics) computed over a dataset, this notion does not require the queries to be of bounded sensitivity -- a condition that is generally assumed under differential privacy [DMNS06, Dwork06] when used as a notion of algorithmic stability [DFHPRR15a, DFHPRR15b, BNSSSU16] -- nor does it require the samples in the dataset to be independent -- a condition that is usually assumed when generalization-error guarantees are sought. Instead, typical stability requires the output of the query, when computed on a dataset drawn from the underlying distribution, to be concentrated around its expected value with respect to that distribution. We discuss the implications of typical stability on the generalization error (i.e., the difference between the value of the query computed on the dataset and the expected value of the query with respect to the true data distribution). We show that typical stability can control generalization error in adaptive data analysis even when the samples in the dataset are not necessarily independent and when queries to be computed are not necessarily of bounded-sensitivity as long as the results of the queries over the dataset (i.e., the computed statistics) follow a distribution with a "light" tail. Examples of such queries include, but not limited to, subgaussian and subexponential queries. We also discuss the composition guarantees of typical stability and prove composition theorems that characterize the degradation of the parameters of typical stability under $k$-fold adaptive composition. We also give simple noise-addition algorithms that achieve this notion. These algorithms are similar to their differentially private counterparts, however, the added noise is calibrated differently.

LGOct 1, 2015
Optimal Binary Classifier Aggregation for General Losses

Akshay Balsubramani, Yoav Freund

We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification error. The result is a family of semi-supervised ensemble aggregation algorithms which are as efficient as linear learning by convex optimization, but are minimax optimal without any relaxations. Their decision rules take a form familiar in decision theory -- applying sigmoid functions to a notion of ensemble margin -- without the assumptions typically made in margin-based learning.

LGJun 18, 2015
Scalable Semi-Supervised Aggregation of Classifiers

Akshay Balsubramani, Yoav Freund

We present and empirically evaluate an efficient algorithm that learns to aggregate the predictions of an ensemble of binary classifiers. The algorithm uses the structure of the ensemble predictions on unlabeled data to yield significant performance improvements. It does this without making assumptions on the structure or origin of the ensemble, without parameters, and as scalably as linear learning. We empirically demonstrate these performance gains with random forests.

LGMar 5, 2015
Optimally Combining Classifiers Using Unlabeled Data

Akshay Balsubramani, Yoav Freund

We develop a worst-case analysis of aggregation of classifier ensembles for binary classification. The task of predicting to minimize error is formulated as a game played over a given set of unlabeled data (a transductive setting), where prior label information is encoded as constraints on the game. The minimax solution of this game identifies cases where a weighted combination of the classifiers can perform significantly better than any single classifier.

LGJan 15, 2015
PAC-Bayes with Minimax for Confidence-Rated Transduction

Akshay Balsubramani, Yoav Freund

We consider using an ensemble of binary classifiers for transductive prediction, when unlabeled test data are known in advance. We derive minimax optimal rules for confidence-rated prediction in this setting. By using PAC-Bayes analysis on these rules, we obtain data-dependent performance guarantees without distributional assumptions on the data. Our analysis techniques are readily extended to a setting in which the predictor is allowed to abstain.

LGJan 15, 2015
The Fast Convergence of Incremental PCA

Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund

We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both.

LGSep 9, 2014
Non-Convex Boosting Overcomes Random Label Noise

Sunsern Cheamanunkul, Evan Ettinger, Yoav Freund

The sensitivity of Adaboost to random label noise is a well-studied problem. LogitBoost, BrownBoost and RobustBoost are boosting algorithms claimed to be less sensitive to noise than AdaBoost. We present the results of experiments evaluating these algorithms on both synthetic and real datasets. We compare the performance on each of datasets when the labels are corrupted by different levels of independent label noise. In presence of random label noise, we found that BrownBoost and RobustBoost perform significantly better than AdaBoost and LogitBoost, while the difference between each pair of algorithms is insignificant. We provide an explanation for the difference based on the margin distributions of the algorithms.

HCSep 9, 2014
Co-adaptation in a Handwriting Recognition System

Sunsern Cheamanunkul, Yoav Freund

Handwriting is a natural and versatile method for human-computer interaction, especially on small mobile devices such as smart phones. However, as handwriting varies significantly from person to person, it is difficult to design handwriting recognizers that perform well for all users. A natural solution is to use machine learning to adapt the recognizer to the user. One complicating factor is that, as the computer adapts to the user, the user also adapts to the computer and probably changes their handwriting. This paper investigates the dynamics of co-adaptation, a process in which both the computer and the user are adapting their behaviors in order to improve the speed and accuracy of the communication through handwriting. We devised an information-theoretic framework for quantifying the efficiency of a handwriting system where the system includes both the user and the computer. Using this framework, we analyzed data collected from an adaptive handwriting recognition system and characterized the impact of machine adaptation and of human adaptation. We found that both machine adaptation and human adaptation have significant impact on the input rate and must be considered together in order to improve the efficiency of the system as a whole.

LGMar 15, 2012
An Online Learning-based Framework for Tracking

Kamalika Chaudhuri, Yoav Freund, Daniel Hsu

We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, these solutions can be very sensitive to model mismatches. In this paper, motivated by online learning, we introduce a new framework for tracking. We provide an efficient tracking algorithm for this framework. We provide experimental results comparing our algorithm to the Bayesian algorithm on simulated data. Our experiments show that when there are slight model mismatches, our algorithm outperforms the Bayesian algorithm.