DXML: Distributed Extreme Multilabel Classification
This work addresses scalability challenges in extreme multilabel classification for applications such as product ranking and recommendation, though the contribution appears incremental: it builds on existing methods and focuses mainly on implementation.
The authors tackled extreme multilabel classification for large-scale ranking and recommendation by proposing a scalable hybrid distributed and shared-memory implementation. In experiments, it was relatively faster to train and test on some large datasets and sometimes produced relatively small models.
As a big data application, extreme multilabel classification has emerged as an important research topic, with applications in the ranking and recommendation of products and items. A scalable hybrid distributed and shared-memory implementation of extreme classification for large-scale ranking and recommendation is proposed. The implementation combines message passing with MPI across nodes and multithreading with OpenMP within each node. Expressions for the communication latency and communication volume are derived, and the parallelism of the shared-memory part is derived using the work-span model. These analyses shed light on the expected scalability of similar extreme classification methods. Experiments show that the implementation is relatively faster to train and test on some large datasets and, in some cases, yields a relatively small model.
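The abstract names two analytical tools without stating the paper's formulas. Below is a minimal sketch of each, based on textbook models rather than the paper's derivations. For the work-span model, it assumes a hypothetical one-vs-rest workload in which each label's classifier is trained independently inside a parallel loop. For communication, it assumes the standard alpha-beta (latency-bandwidth) cost of a binomial-tree gather of per-node results. The names `one_vs_rest_work_span`, `tree_gather_time`, `alpha`, and `beta` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkSpan:
    work: float  # T_1: total cost when run on a single thread
    span: float  # T_inf: cost of the longest chain of dependent steps

    @property
    def parallelism(self) -> float:
        # Work-span parallelism T_1 / T_inf: an upper bound on useful speedup.
        return self.work / self.span

    def brent_bound(self, p: int) -> float:
        # Greedy-scheduler upper bound on running time with p threads:
        # T_p <= T_1 / p + T_inf
        return self.work / p + self.span


def one_vs_rest_work_span(num_labels: int, cost_per_label: float) -> WorkSpan:
    """Hypothetical workload (an assumption, not the paper's method):
    one independent binary classifier per label, run as a parallel loop.
    Work adds up over labels; the span is one classifier plus a
    log-depth fork/join tree, counted as one unit per level."""
    work = num_labels * cost_per_label
    span = cost_per_label + math.ceil(math.log2(num_labels))
    return WorkSpan(work, span)


def tree_gather_time(num_nodes: int, bytes_per_node: float,
                     alpha: float, beta: float) -> float:
    """Alpha-beta cost of a binomial-tree gather (an assumed pattern).
    Latency term: alpha * ceil(log2 P) message rounds.
    Volume term: beta * (P - 1) * m bytes arriving at the root."""
    rounds = math.ceil(math.log2(num_nodes)) if num_nodes > 1 else 0
    return alpha * rounds + beta * (num_nodes - 1) * bytes_per_node


if __name__ == "__main__":
    ws = one_vs_rest_work_span(num_labels=1_000_000, cost_per_label=1_000.0)
    print("parallelism:", ws.parallelism)
    print("Brent bound, 64 threads:", ws.brent_bound(64))
    print("gather time, 16 nodes:",
          tree_gather_time(16, bytes_per_node=1e6, alpha=1e-6, beta=1e-9))
```

In this toy model, parallelism is huge because labels are independent, so on-node speedup is bounded by thread count rather than by the span. Across nodes, the volume term grows linearly in the number of nodes, while the latency term grows only logarithmically. This is the kind of trade-off that a latency and volume analysis is meant to expose.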