Cluster, Classify, Regress: A General Method For Learning Discontinuous Functions
This method addresses the challenge of modeling discontinuous functions in machine learning. It is an incremental approach that combines existing techniques.
The paper tackles the problem of learning highly nonlinear and discontinuous functions in supervised learning by proposing a three-stage method: clustering input-output pairs, classifying based on cluster labels, and performing separate regressions per class. The method is demonstrated on toy problems, including a plasma fusion simulation example, but no concrete performance numbers are provided.
This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. It has not yet been proposed to combine these three fundamental building blocks of machine learning in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak.
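The three stages described above can be sketched on a one-dimensional toy problem. This is a minimal illustration, not the paper's implementation: the abstract does not say which clustering, classification, or regression algorithms are used, so the choices here (k-means on the joint input-output pairs, a 1-nearest-neighbour classifier on the input, and a least-squares line per class) are assumptions. The piecewise-linear `target` function and the helper names `kmeans`, `classify`, `fit_line`, and `predict` are likewise hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Stage (i): Lloyd's k-means on joint (input, output) tuples; needs k >= 2."""
    # Deterministic initialisation (an assumption): spread the initial
    # centroids over the points sorted by their output value.
    ordered = sorted(points, key=lambda p: p[-1])
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def classify(x, train_x, train_labels):
    """Stage (ii): 1-nearest-neighbour classifier from input to cluster label."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_labels[nearest]

def fit_line(xs, ys):
    """Stage (iii): ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def target(x):
    # Hypothetical discontinuous function with a jump at x = 0.5.
    return 1 + 2 * x if x < 0.5 else 8 - x

xs = [i / 39 for i in range(40)]
ys = [target(x) for x in xs]

# (i) Cluster the input-output pairs.
labels = kmeans(list(zip(xs, ys)), k=2)

# (ii) Label each training input with the classifier's own prediction.
clf_labels = [classify(x, xs, labels) for x in xs]

# (iii) Fit one regression per class on the points the classifier assigns to it.
models = {}
for j in set(clf_labels):
    sub = [(x, y) for x, y, l in zip(xs, ys, clf_labels) if l == j]
    models[j] = fit_line([x for x, _ in sub], [y for _, y in sub])

def predict(x):
    """Route a new input through the classifier, then through its regressor."""
    slope, intercept = models[classify(x, xs, labels)]
    return slope * x + intercept
```

With this setup, `predict(0.25)` and `predict(0.75)` should return values close to 1.5 and 7.25, one from each branch of `target`. Clustering on the joint pairs rather than on the inputs alone is what separates the two branches here: the jump in the output keeps them far apart even where the inputs are adjacent.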