Julian Hitschler

h-index: 5

Papers: 3

Citations: 514

Novelty: 38%

AI Score: 26

Ranked #160,293 of 194,257 authors (top 83%); #27,585 in CL (top 90%)

3 Papers

4.9 | ML | Jun 12, 2018

Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

Artem Sokolov, Julian Hitschler, Mayumi Ohta et al.

Stochastic zeroth-order (SZO), or gradient-free, optimization makes it possible to optimize arbitrary functions by relying only on function evaluations under parameter perturbations. However, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns, as in structured prediction applications, this factor can be reduced to the expected number of active features over input-output pairs. We give a general proof that applies sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and present an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirms our theoretical results.
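The idea of perturbing only active features can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the function name `sparse_szo_step`, the smoothing parameter `mu`, the learning rate `lr`, and the toy quadratic loss are all assumptions made for illustration. The sketch uses a generic two-point finite-difference estimator whose random perturbation is nonzero only on the active coordinates.

```python
import random


def sparse_szo_step(w, loss, active, mu=1e-3, lr=0.05, rng=random):
    """One two-point zeroth-order update restricted to the active coordinates.

    Hypothetical helper for illustration; only function evaluations of
    `loss` are used, never its gradient.
    """
    # Gaussian perturbation, nonzero only on the active feature indices.
    u = {i: rng.gauss(0.0, 1.0) for i in active}
    w_pert = list(w)
    for i, ui in u.items():
        w_pert[i] += mu * ui
    # Finite-difference estimate of the directional derivative along u.
    slope = (loss(w_pert) - loss(w)) / mu
    # Step against the estimated gradient; inactive coordinates are untouched.
    new_w = list(w)
    for i, ui in u.items():
        new_w[i] -= lr * slope * ui
    return new_w


# Toy usage (assumed setup): 5 parameters, of which only 2 are active.
active = [1, 3]
toy_loss = lambda v: sum((v[i] - 1.0) ** 2 for i in active)
w = [0.0] * 5
rng = random.Random(0)
for _ in range(200):
    w = sparse_szo_step(w, toy_loss, active, rng=rng)
print(toy_loss(w))
```

Because the perturbation has as many nonzero entries as there are active features, the variance of the estimate scales with that count rather than with the full parameter dimension, which is the intuition behind the reduced factor described above.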

18.8 | CL | Mar 13, 2017 | Code

Nematus: a Toolkit for Neural Machine Translation

Rico Sennrich, Orhan Firat, Kyunghyun Cho et al.

We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

18.4 | CL | Jan 15, 2016

Multimodal Pivots for Image Caption Translation

Julian Hitschler, Shigehiko Schamoni, Stefan Riezler

We present an approach to improve statistical machine translation of image descriptions by multimodal pivots defined in visual space. The key idea is to perform image retrieval over a database of images that are captioned in the target language, and use the captions of the most similar images for crosslingual reranking of translation outputs. Our approach does not depend on the availability of large amounts of in-domain parallel data, but only relies on available large datasets of monolingually captioned images, and on state-of-the-art convolutional neural networks to compute image similarities. Our experimental evaluation shows improvements of 1 BLEU point over strong baselines.
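The retrieve-then-rerank idea can be sketched roughly as follows. This is a simplified illustration, not the authors' system: the function names, the cosine similarity over toy feature vectors (standing in for CNN image features), the word-overlap score, and the interpolation `weight` are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_captions(query_vec, database, k=2):
    """Return target-language captions of the k most similar images."""
    ranked = sorted(database, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    return [caption for _, caption in ranked[:k]]


def rerank(hypotheses, pivot_captions, weight=0.5):
    """Rerank (text, mt_score) pairs by mixing in overlap with pivot captions."""
    pivot_words = {w for c in pivot_captions for w in c.lower().split()}

    def score(hyp):
        text, mt_score = hyp
        words = text.lower().split()
        # Fraction of hypothesis words that also occur in the pivot captions.
        overlap = sum(w in pivot_words for w in words) / len(words) if words else 0.0
        return (1 - weight) * mt_score + weight * overlap

    return sorted(hypotheses, key=score, reverse=True)


# Toy usage (assumed data): image vectors paired with target-language captions.
database = [
    ([1.0, 0.0], "a dog runs on the beach"),
    ([0.0, 1.0], "a man rides a bike"),
    ([0.9, 0.1], "dog playing in sand"),
]
pivots = retrieve_captions([1.0, 0.0], database, k=2)
hypotheses = [("a dog runs on the bank", 0.60), ("a dog runs on the beach", 0.55)]
print(rerank(hypotheses, pivots)[0][0])
```

In this toy setup, only monolingual captions of visually similar images are needed to prefer one translation hypothesis over another, which mirrors the abstract's point about not requiring in-domain parallel data.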