ML, Nov 23, 2025
Transforming Conditional Density Estimation Into a Single Nonparametric Regression Task
Alexander G. Reisach, Olivier Collier, Alex Luedtke et al.
We propose a way of transforming the problem of conditional density estimation into a single nonparametric regression task via the introduction of auxiliary samples. This allows leveraging regression methods that work well in high dimensions, such as neural networks and decision trees. Our main theoretical result characterizes and establishes the convergence of our estimator to the true conditional density in the data limit. We develop condensité, a method that implements this approach. We demonstrate the benefit of the auxiliary samples on synthetic data and showcase that condensité can achieve good out-of-the-box results. We evaluate our method on a large population survey dataset and on a satellite imaging dataset. In both cases, we find that condensité matches or outperforms the state of the art and yields conditional densities in line with established findings in the literature on each dataset. Our contribution opens up new possibilities for regression-based conditional density estimation and the empirical results indicate strong promise for applied research.
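To make the idea of regression with auxiliary samples concrete, here is a minimal sketch of one generic construction. It is an assumption for illustration and not necessarily the construction used by condensité: each observation $(x_i, y_i)$ is paired with auxiliary points $y'$ drawn uniformly over the response range, and the kernel value $K_h(y_i - y')$ is regressed on the input $(x_i, y')$. The toy data, the bandwidth `H`, the number of auxiliary samples `M`, and the histogram regressor are all hypothetical choices.

```python
import math
import random

random.seed(0)

# Toy data (assumed): X ~ Uniform(0, 1), Y = X + Gaussian noise with sd 0.1.
n = 2000
xs = [random.random() for _ in range(n)]
ys = [x + random.gauss(0.0, 0.1) for x in xs]

H = 0.05                 # kernel bandwidth (assumed)
M = 20                   # auxiliary samples per observation (assumed)
Y_LO, Y_HI = -0.5, 1.5   # range the auxiliary samples are drawn from (assumed)
BINS_X, BINS_Y = 10, 40  # resolution of the histogram regressor (assumed)


def gauss_kernel(u, h=H):
    """Gaussian kernel K_h(u), which integrates to one."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def cell(x, y_aux):
    """Map a regression input (x, y_aux) to its histogram cell."""
    i = min(int(x * BINS_X), BINS_X - 1)
    j = min(int((y_aux - Y_LO) / (Y_HI - Y_LO) * BINS_Y), BINS_Y - 1)
    return i, j


# One regression dataset: inputs (x_i, y_aux), targets K_h(y_i - y_aux).
sums, counts = {}, {}
for x, y in zip(xs, ys):
    for _ in range(M):
        y_aux = random.uniform(Y_LO, Y_HI)
        key = cell(x, y_aux)
        sums[key] = sums.get(key, 0.0) + gauss_kernel(y - y_aux)
        counts[key] = counts.get(key, 0) + 1


def density_estimate(x, y):
    """Regressogram prediction at (x, y): the mean target in that cell."""
    key = cell(x, y)
    return sums[key] / counts[key] if key in counts else 0.0


near_mode = density_estimate(0.5, 0.5)  # y close to the conditional mean
in_tail = density_estimate(0.5, 1.2)    # y far in the tail
```

The regression function of this dataset is $E[K_h(Y - y') \mid X = x]$, a kernel-smoothed version of the conditional density at $y'$, so any regressor fitted to it (here a simple histogram average, but in principle a neural network or a tree ensemble) yields a density estimate.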
ST, Oct 17, 2013
Minimax rates in permutation estimation for feature matching
Olivier Collier, Arnak S. Dalalyan
The problem of matching two sets of features appears in various tasks of computer vision and can often be formalized as a problem of permutation estimation. We address this problem from a statistical point of view and provide a theoretical analysis of the accuracy of several natural estimators. To this end, the minimax rate of separation is investigated and its expression is obtained as a function of the sample size, noise level and dimension. We consider the cases of homoscedastic and heteroscedastic noise and establish, in each case, tight upper bounds on the separation distance of several estimators. These upper bounds are shown to be unimprovable in both the homoscedastic and heteroscedastic settings. Interestingly, these bounds demonstrate that a phase transition occurs when the dimension $d$ of the features is of the order of the logarithm of the number of features $n$. For $d=O(\log n)$, the rate is dimension-free and equals $\sigma(\log n)^{1/2}$, where $\sigma$ is the noise level. In contrast, when $d$ is larger than $c\log n$ for some constant $c>0$, the minimax rate increases with $d$ and is of the order $\sigma(d\log n)^{1/4}$. We also discuss the computational aspects of the estimators and provide empirical evidence of their consistency on synthetic data. Finally, we show that our results extend to more general matching criteria.
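The two regimes of the rate and the matching problem itself can be illustrated with a short sketch. The function `rate_order` combines the two expressions from the abstract with all constants dropped (taking $c = 1$ is an assumption). The estimator shown, which minimizes the sum of squared distances by brute force, is one natural choice picked for illustration; the abstract does not specify which estimators are analyzed, and brute force is only feasible for very small $n$.

```python
import itertools
import math
import random

random.seed(0)


def rate_order(n, d, sigma):
    """Order of the separation rate quoted in the abstract, constants dropped.

    The maximum equals sigma * sqrt(log n) when d <= log n and
    sigma * (d * log n) ** 0.25 when d >= log n.
    """
    log_n = math.log(n)
    return sigma * max(math.sqrt(log_n), (d * log_n) ** 0.25)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def least_squares_permutation(X, Y):
    """Permutation pi minimizing sum_i ||X[i] - Y[pi[i]]||^2 (exhaustive search)."""
    n = len(X)
    return min(
        itertools.permutations(range(n)),
        key=lambda pi: sum(sq_dist(X[i], Y[pi[i]]) for i in range(n)),
    )


def noisy(v, sigma):
    """Copy of v with independent Gaussian noise of level sigma per coordinate."""
    return [t + sigma * random.gauss(0.0, 1.0) for t in v]


# Synthetic example (all values hypothetical): n features in dimension d.
n, d, sigma = 6, 3, 0.001
theta = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
true_pi = list(range(n))
random.shuffle(true_pi)

X = [noisy(theta[i], sigma) for i in range(n)]
Y = [None] * n
for i in range(n):
    Y[true_pi[i]] = noisy(theta[i], sigma)  # Y[true_pi[i]] matches X[i]

estimated_pi = least_squares_permutation(X, Y)
```

For larger $n$, the same least-squares criterion is a linear assignment problem and can be solved in polynomial time, for example with the Hungarian algorithm, instead of enumerating all $n!$ permutations.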