CG Apr 30, 2018
Simple Distances for Trajectories via Landmarks
Jeff M. Phillips, Pingfan Tang
We develop a new class of distances for objects including lines, hyperplanes, and trajectories, based on the distance to a set of landmarks. These distances easily and interpretably map objects to a Euclidean space, are simple to compute, and perform well in data analysis tasks. For trajectories, they match and in some cases significantly outperform all other state-of-the-art metrics, can be used effortlessly in k-means clustering, and can be plugged directly into approximate nearest neighbor approaches, which immediately outperform the best recent advances in trajectory similarity search by several orders of magnitude. These distances do not require a geometry-distorting dual (common in the line or halfspace case) or complicated alignment (common in the trajectory case). We show reasonable and often simple conditions under which these distances are metrics.
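To make the landmark idea concrete, here is a minimal sketch, assuming trajectories are 2D polylines and that each trajectory is mapped to the vector of its (unsigned) point-to-polyline distances from a fixed list of landmark points. Comparing two such vectors with a root-mean-square (Euclidean distance scaled by 1/√n) is an assumption here; the scaling does not change nearest-neighbor rankings. All function names are hypothetical illustrations, not the authors' code.

```python
import math

def point_segment_dist(q, a, b):
    """Euclidean distance from point q to the segment ab."""
    (qx, qy), (ax, ay), (bx, by) = q, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0:  # degenerate segment: a == b
        return math.hypot(qx - ax, qy - ay)
    # Project q onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((qx - ax) * dx + (qy - ay) * dy) / len2))
    return math.hypot(qx - (ax + t * dx), qy - (ay + t * dy))

def point_traj_dist(q, traj):
    """Distance from landmark q to a polyline given as a list of (x, y) vertices."""
    if len(traj) == 1:
        return math.hypot(q[0] - traj[0][0], q[1] - traj[0][1])
    return min(point_segment_dist(q, traj[i], traj[i + 1]) for i in range(len(traj) - 1))

def landmark_vector(traj, landmarks):
    """Map a trajectory to a point in R^n: one coordinate per landmark."""
    return [point_traj_dist(q, traj) for q in landmarks]

def landmark_distance(t1, t2, landmarks):
    """RMS difference of the two landmark vectors (assumed 1/sqrt(n) scaling)."""
    v1, v2 = landmark_vector(t1, landmarks), landmark_vector(t2, landmarks)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)) / len(landmarks))
```

For two parallel unit segments one unit apart, with one landmark below and one above, `landmark_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)], [(0.5, -1), (0.5, 2)])` returns `1.0`. Because every trajectory becomes an ordinary vector, the outputs of `landmark_vector` can be passed directly to any off-the-shelf k-means or approximate nearest neighbor library.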
LG Sep 5, 2016
The Robustness of Estimator Composition
Pingfan Tang, Jeff M. Phillips
We formalize notions of robustness for composite estimators via the notion of a breakdown point. A composite estimator successively applies two (or more) estimators: on data decomposed into disjoint parts, it applies the first estimator on each part, then the second estimator on the outputs of the first estimator, and so on if the composition involves more than two estimators. Informally, the breakdown point is the minimum fraction of data points which, if significantly modified, will also significantly modify the output of the estimator, so a large breakdown point is typically desirable. Our main result shows that, under mild conditions on the individual estimators, the breakdown point of the composite estimator is the product of the breakdown points of the individual estimators. We also demonstrate several scenarios, ranging from regression to statistical testing, where this analysis is easy to apply, is useful for understanding worst-case robustness, and offers strong insight into the associated data analysis.
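As a rough illustration of the product rule, consider composing the median with itself (a median of medians). The median has breakdown point 1/2, so the product suggests roughly 1/4 for the composite. The sketch below assumes equal-sized disjoint parts and uses a hypothetical outlier value of `1e9`; the helper names are illustrative only. Corrupting a majority of points in a majority of parts (9 of 25 points) drags the output to the outlier, while 10 corrupted points placed less adversarially do not. The finite-sample fraction 9/25 exceeds 1/4 only because of rounding; it approaches 1/4 as the parts grow larger and more numerous.

```python
import statistics

def composite_median(parts):
    """First estimator: median of each part. Second estimator: median of those medians."""
    return statistics.median([statistics.median(p) for p in parts])

def make_parts(m, k):
    """m disjoint parts of k clean values each, all lying in [0, 1)."""
    return [[(i * k + j) / (m * k) for j in range(k)] for i in range(m)]

def corrupt(parts, n_parts, n_points, value=1e9):
    """Replace the first n_points values in each of the first n_parts parts with an outlier."""
    out = [list(p) for p in parts]
    for i in range(n_parts):
        for j in range(n_points):
            out[i][j] = value
    return out

clean = make_parts(5, 5)  # 25 points in 5 parts
print(composite_median(clean))                # clean output, inside [0, 1)
print(composite_median(corrupt(clean, 3, 3)))  # 9/25 corrupted: majority of a majority -> 1e9
print(composite_median(corrupt(clean, 5, 2)))  # 10/25 spread thinly: output stays in [0, 1)
print(composite_median(corrupt(clean, 2, 5)))  # 10/25 in two whole parts: output stays in [0, 1)
```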