Matrix sketching for supervised classification with imbalanced classes
The paper addresses a domain-specific issue, supervised classification on datasets with imbalanced classes, but the contribution appears incremental because it applies an existing technique to a new problem.
The paper tackles the poor classification performance caused by imbalanced classes by proposing matrix sketching as a tool for rebalancing class sizes, but it provides no concrete results or numbers.
Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated by a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. As a result, numerical operations on large data sets become faster. Sketching algorithms generally use random projections to compress the original data set, and this stochastic generation process makes them amenable to statistical analysis. The statistical properties of sketching algorithms have been widely studied in the context of multiple linear regression. In this paper we propose matrix sketching as a tool for rebalancing class sizes in supervised classification with imbalanced classes. It is well known that class imbalance may lead to poor classification performance, especially for the minority class.
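As an illustration of the idea, the following is a minimal sketch of a Gaussian random-projection sketch applied to the majority class so that its row count matches the minority class. It is not taken from the paper: the function name `gaussian_sketch`, the synthetic data, and the choice of a Gaussian sketching matrix with variance 1/k are assumptions made for this example, and the paper may use a different sketching scheme. With this scaling the Gram matrix of the sketch, B^T B, equals A^T A in expectation, which is the sense in which B preserves second-order structure of A.

```python
import numpy as np

def gaussian_sketch(A, k, rng):
    """Return B = S @ A, where S is a k x n matrix with i.i.d. N(0, 1/k) entries.

    Hypothetical helper for illustration. With this scaling,
    E[B.T @ B] = A.T @ A, so B approximates the Gram matrix of A.
    """
    n = A.shape[0]
    S = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(k, n))
    return S @ A

rng = np.random.default_rng(0)

# Synthetic imbalanced data (assumed for the example): 1000 majority rows, 50 minority rows.
X_major = rng.normal(size=(1000, 5))
X_minor = rng.normal(loc=1.0, size=(50, 5))

# Compress the majority class to the minority class size.
B = gaussian_sketch(X_major, k=X_minor.shape[0], rng=rng)

# Relative error between the Gram matrices of the original and the sketch.
gram_A = X_major.T @ X_major
gram_B = B.T @ B
rel_err = np.linalg.norm(gram_B - gram_A) / np.linalg.norm(gram_A)

print(B.shape, X_minor.shape)
print(f"relative Gram error: {rel_err:.3f}")
```

After this step both classes have the same number of rows, so a classifier could be trained on `B` and `X_minor` together. Note that the rows of `B` are random linear combinations of the original observations rather than a subsample of them, and that this scaling targets the Gram matrix, not the class mean.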