CL · LG · ML · Oct 28, 2019

Exploring Kernel Functions in the Softmax Layer for Contextual Word Classification

arXiv:1910.12554v1646 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses contextual word classification in NLP, but it appears incremental: it explores kernel variations without reporting a clear breakthrough result.

The paper tackles contextual word classification by replacing the inner product in the softmax layer with kernel functions, and reports a wide range of performance across kernel settings on language modeling and machine translation tasks.

Prominently used in support vector machines and logistic regression, kernel functions (kernels) can implicitly map data points into high-dimensional spaces and make it easier to learn complex decision boundaries. In this work, by replacing the inner product function in the softmax layer, we explore the use of kernels for contextual word classification. To compare the individual kernels, experiments are conducted on standard language modeling and machine translation tasks. We observe a wide range of performance across different kernel settings. Extending the results, we look at the gradient properties, investigate various mixture strategies, and examine the disambiguation abilities.
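
The core idea in the abstract, swapping the inner product in the softmax layer for a kernel, can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`linear_kernel`, `rbf_kernel`, `polynomial_kernel`, `kernel_softmax`), the hyperparameters (`gamma`, `degree`, `c`), and the choice to use the kernel value directly as the logit are all assumptions made for this example.

```python
import numpy as np

def linear_kernel(h, W):
    # Standard softmax logit: inner product of the context vector h
    # with every word vector (one row of W per vocabulary entry).
    return W @ h

def rbf_kernel(h, W, gamma=0.5):
    # Gaussian (RBF) kernel: exp(-gamma * ||w - h||^2) per word vector w.
    # gamma is a hypothetical hyperparameter chosen for illustration.
    sq_dist = np.sum((W - h) ** 2, axis=1)
    return np.exp(-gamma * sq_dist)

def polynomial_kernel(h, W, degree=2, c=1.0):
    # Polynomial kernel: (w . h + c)^degree per word vector w.
    return (W @ h + c) ** degree

def kernel_softmax(h, W, kernel=linear_kernel, **kernel_args):
    # Assumption: the kernel value replaces the inner product as the logit,
    # and the usual softmax normalization is applied on top.
    logits = kernel(h, W, **kernel_args)
    logits = logits - np.max(logits)  # shift for numerical stability
    exp = np.exp(logits)
    return exp / np.sum(exp)

# Toy example: a vocabulary of 5 words with 8-dimensional embeddings.
rng = np.random.default_rng(0)
h = rng.normal(size=8)        # context vector from the network
W = rng.normal(size=(5, 8))   # output word embeddings

for name, kernel in [("linear", linear_kernel),
                     ("rbf", rbf_kernel),
                     ("polynomial", polynomial_kernel)]:
    probs = kernel_softmax(h, W, kernel=kernel)
    print(name, probs.round(3))
```

With `linear_kernel` the function reduces to the ordinary softmax layer, so the other kernels can be compared against it as a baseline on the same `h` and `W`.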

