CL LG ML Jun 2, 2016

Stochastic Structured Prediction under Bandit Feedback

Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler

arXiv:1606.00739v29.331 citations

Originality: Incremental advance

AI Analysis

This work addresses a learning scenario for structured prediction in NLP, but it appears incremental as it builds on existing methods with specific optimizations.

The paper studies stochastic structured prediction under bandit feedback, where the learner predicts an output structure and receives only partial feedback on that prediction. It finds that a non-convex objective for pairwise preference learning gives the best results in both convergence speed and task performance.

Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
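The protocol described in the abstract (receive an input, predict a structure, observe only the task loss of that prediction) can be illustrated with a minimal sketch. It assumes a Gibbs (log-linear) model over a tiny, explicitly enumerated output set and a score-function gradient estimate of the expected loss; this is an illustration of a generic stochastic first-order update under bandit feedback, not the paper's pairwise preference objective and not necessarily the authors' exact algorithm. The names `gibbs_probs`, `bandit_step`, and `run_demo`, the one-hot features, the 0/1 loss, and the learning rate are all hypothetical choices made for the example.

```python
import numpy as np

def gibbs_probs(w, feats):
    """Log-linear distribution over candidate outputs: p(y) proportional to exp(w . phi(y))."""
    scores = feats @ w
    scores -= scores.max()          # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def bandit_step(w, feats, loss_fn, lr, rng):
    """One bandit iteration: sample an output, observe only its loss, update w."""
    p = gibbs_probs(w, feats)
    y = rng.choice(len(p), p=p)     # predict by sampling from the model
    loss = loss_fn(y)               # partial feedback: loss of the sampled output only
    # Score-function estimate of the expected-loss gradient:
    # loss * (phi(y) - E_p[phi])
    grad = loss * (feats[y] - p @ feats)
    return w - lr * grad, y, loss

def run_demo(steps=2000, seed=0):
    """Toy setup (assumed): 4 candidate outputs, one-hot features, 0/1 loss."""
    rng = np.random.default_rng(seed)
    feats = np.eye(4)
    gold = 2                        # the learner never sees this index directly
    loss_fn = lambda y: 0.0 if y == gold else 1.0
    w = np.zeros(4)
    for _ in range(steps):
        w, _, _ = bandit_step(w, feats, loss_fn, lr=0.5, rng=rng)
    return gibbs_probs(w, feats)

if __name__ == "__main__":
    print(run_demo())
```

In a realistic setting the output space is exponential, so the expectation `p @ feats` would itself have to be estimated or computed with dynamic programming rather than by enumeration as in this toy.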
