Bill Moran

h-index: 33

3 papers

14 citations

Novelty: 30%

AI Score: 17

Ranked #191,066 of 194,257 authors (top 98%); #3,343 in ML (top 99%)

3 Papers

1.5 | ML | Nov 25, 2015

Maximum Likelihood Estimation for Single Linkage Hierarchical Clustering

Dekang Zhu, Dan P. Guralnik, Xuezhi Wang et al.

We derive a statistical model for estimation of a dendrogram from single linkage hierarchical clustering (SLHC) that takes account of uncertainty through noise or corruption in the measurements of separation of data. Our focus is only on estimating the hierarchy of partitions afforded by the dendrogram, rather than the heights in the latter. The concept of estimating this "dendrogram structure" is introduced, and an approximate maximum likelihood estimator (MLE) for the dendrogram structure is described. These ideas are illustrated by a simple Monte Carlo simulation that, at least for small data sets, suggests the method outperforms SLHC in the presence of noise.

4.0 | ML | Nov 24, 2015

Statistical Properties of the Single Linkage Hierarchical Clustering Estimator

Dekang Zhu, Dan P. Guralnik, Xuezhi Wang et al.

Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis but few authors take account of uncertainty in the distance data. We incorporate a statistical model of the uncertainty through corruption or noise in the pairwise distances and investigate the problem of estimating the HC as unknown parameters from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that under fairly reasonable conditions on the probability distribution governing measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) with some of the information contained in the data ignored. At the same time, we show that direct evaluation of SLHC on maximum likelihood estimation (MLE) of pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to perform better than SLHC in getting the correct HC results for the ground truth metric.
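As a rough illustration of the "SLHC on the MLE of pairwise distances" idea in the abstract above, the sketch below averages repeated noisy measurements of each distance and then runs single linkage on the averages. It is not the authors' estimator. The i.i.d. Gaussian noise model (under which the sample mean is the MLE of each distance), the four-point toy metric, and the function names `single_linkage_partitions` and `averaged_distances` are all assumptions made for this example.

```python
import random


def single_linkage_partitions(dist):
    """Return the sequence of partitions produced by single linkage.

    dist maps frozenset({i, j}) -> distance. Merge heights are discarded,
    so the result is only the nested hierarchy of partitions.
    """
    points = sorted({p for pair in dist for p in pair})
    parent = {p: p for p in points}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    partitions = []
    # Kruskal-style merging: single linkage joins clusters in order of
    # increasing pairwise distance.
    for pair in sorted(dist, key=dist.get):
        a, b = (find(p) for p in pair)
        if a != b:
            parent[a] = b
            clusters = {}
            for p in points:
                clusters.setdefault(find(p), []).append(p)
            partitions.append(sorted(sorted(c) for c in clusters.values()))
    return partitions


def averaged_distances(samples):
    """Per-pair sample mean: the MLE under the ASSUMED Gaussian noise."""
    return {pair: sum(v) / len(v) for pair, v in samples.items()}


# Hypothetical ground-truth distances on four points (toy data).
true_dist = {
    frozenset(p): d
    for p, d in {
        (0, 1): 1.0, (2, 3): 2.0,
        (0, 2): 5.0, (0, 3): 5.0, (1, 2): 5.0, (1, 3): 5.0,
    }.items()
}

rng = random.Random(0)
# 50 noisy measurements per pair, noise standard deviation 0.5 (assumed).
samples = {
    pair: [d + rng.gauss(0, 0.5) for _ in range(50)]
    for pair, d in true_dist.items()
}

truth = single_linkage_partitions(true_dist)
estimate = single_linkage_partitions(averaged_distances(samples))
```

Comparing `estimate` with `truth` as lists of partitions checks only the hierarchy, not the merge heights, which matches the "dendrogram structure" focus of the companion paper listed above.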

1.2 | SY | Oct 12, 2014

Bounds on Multiple Sensor Fusion

Bill Moran, Fred Cohen, Zengfu Wang et al.

We consider the problem of fusing measurements from multiple sensors, where the sensing regions overlap and data are non-negative, possibly resulting from a count of indistinguishable discrete entities. Because of overlaps, it is, in general, impossible to fuse this information to arrive at an accurate estimate of the overall amount or count of material present in the union of the sensing regions. Here we study the range of overall values consistent with the data. Posed as a linear programming problem, this leads to interesting questions associated with the geometry of the sensor regions, specifically, the arrangement of their non-empty intersections. We define a computational tool called the fusion polytope and derive a condition for this to be in the positive orthant, thus simplifying calculations. We show that, in two dimensions, inflated tiling schemes based on rectangular regions fail to satisfy this condition, whereas inflated tiling schemes based on hexagons do.
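The "range of overall values consistent with the data" can be seen in the smallest possible case: two sensors whose regions share one overlap. The sketch below is a hand-solved instance of that linear program, not the paper's fusion polytope construction. The three-cell layout, the variable names, and the integer brute-force check are assumptions made for illustration.

```python
def total_range_two_sensors(m_a, m_b):
    """Range of totals consistent with two overlapping non-negative readings.

    Cells: A-only (x_a), overlap (x_ab), B-only (x_b), all >= 0, with
    m_a = x_a + x_ab and m_b = x_ab + x_b. The total is m_a + m_b - x_ab,
    and feasibility forces 0 <= x_ab <= min(m_a, m_b).
    """
    if m_a < 0 or m_b < 0:
        raise ValueError("readings must be non-negative")
    return max(m_a, m_b), m_a + m_b


def feasible_totals(m_a, m_b):
    """Enumerate totals over integer overlap counts (brute-force check)."""
    totals = []
    for x_ab in range(min(m_a, m_b) + 1):
        x_a, x_b = m_a - x_ab, m_b - x_ab
        totals.append(x_a + x_ab + x_b)
    return totals


# Hypothetical readings: sensor A counts 3, sensor B counts 5.
low, high = total_range_two_sensors(3, 5)
totals = feasible_totals(3, 5)
```

The lower bound corresponds to the overlap holding as much as the readings allow, and the upper bound to an empty overlap. With more sensors the overlaps interact, which is where the general linear program becomes necessary.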