Craig Stewart

CL
7 papers
5,927 citations
Novelty 48%
AI Score 28

7 Papers

CL Oct 29, 2020
Unbabel's Participation in the WMT20 Metrics Shared Task

Ricardo Rei, Craig Stewart, Catarina Farinha et al.

We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate in the segment-level, document-level and system-level tracks on all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: we train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.

CL Sep 18, 2020
COMET: A Neural Framework for MT Evaluation

Ricardo Rei, Craig Stewart, Ana C Farinha et al.

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling, resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.

HC Aug 12, 2020
Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week's Activities

Ahmed Alamri, Mohammad Alshehri, Alexandra I. Cristea et al.

While Massive Open Online Course (MOOC) platforms provide knowledge in a new and unique way, the very high number of dropouts is a significant drawback. Several features are considered to contribute towards learner attrition or lack of interest, which may lead to disengagement or total dropout. The jury is still out on which factors are the most appropriate predictors. However, the literature agrees that early prediction is vital to allow for a timely intervention. Whilst feature-rich predictors may have the best chance for high accuracy, they may be unwieldy. This study aims to predict learner dropout early on, from the first week, by comparing several machine-learning approaches, including Random Forest, Adaptive Boost, XGBoost and GradientBoost Classifiers. The results show promising accuracies (82%-94%) using as few as two features. We show that the accuracies obtained outperform state-of-the-art approaches, even when the latter deploy several features.

HC Aug 12, 2020
Is MOOC Learning Different for Dropouts? A Visually-Driven, Multi-granularity Explanatory ML Approach

Ahmed Alamri, Zhongtian Sun, Alexandra I. Cristea et al.

Millions of people have enrolled, and continue to enrol (especially in the Covid-19 pandemic world), in MOOCs. However, the retention rate of learners is notoriously low. The majority of the research work on this issue focuses on predicting the dropout rate, but very few studies use explainable learning patterns as part of this analysis. Yet visual representation of learning patterns could provide deeper insights into learners' behaviour across different courses, whilst numerical analyses can -- and arguably, should -- be used to confirm the latter. Thus, this paper proposes and compares different granularity visualisations for learning patterns (based on clickstream data) for both course completers and non-completers. In the large-scale MOOCs we analysed, across various domains, our fine-grained, fish-eye visualisation approach showed that non-completers are more likely to jump forward in their learning sessions, often on a 'catch-up' path, whilst completers exhibit linear behaviour. For coarser, bird's-eye granularity visualisation, we observed learners' transitions between types of learning activity, obtaining typed transition graphs. The results, backed up by statistical significance analysis and machine learning, provide insights for course instructors to maintain engagement of learners by adapting the course design based not just on 'dry' predicted values, but on the explainable, visually viable paths extracted.

CL Apr 1, 2019
Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation

Nikolai Vogler, Craig Stewart, Graham Neubig

Simultaneous interpretation, the translation of speech from one language to another in real time, is an inherently difficult and strenuous task. One of the greatest challenges faced by interpreters is the accurate translation of difficult terminology like proper names, numbers, or other entities. Intelligent computer-assisted interpreting (CAI) tools that could analyze the spoken word and detect terms likely to be untranslated by an interpreter could reduce translation error and improve interpreter performance. In this paper, we propose a task of predicting which terminology simultaneous interpreters will leave untranslated, and examine methods that perform this task using supervised sequence taggers. We describe a number of task-specific features explicitly designed to indicate when an interpreter may struggle with translating a word. Experimental results on a newly annotated version of the NAIST Simultaneous Translation Corpus (Shimizu et al., 2014) indicate the promise of our proposed method.

CL Feb 25, 2019
Improving Robustness of Machine Translation with Synthetic Noise

Vaibhav Vaibhav, Sumeet Singh, Craig Stewart et al.

Modern Machine Translation (MT) systems perform consistently well on clean, in-domain text. However, most human-generated text, particularly in the realm of social media, is full of typos, slang, dialect, idiolect and other noise which can have a disastrous impact on the accuracy of output translation. In this paper, we leverage the Machine Translation of Noisy Text (MTNT) dataset to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. By synthesizing noise in this manner, we are ultimately able to make a vanilla MT system resilient to naturally occurring noise and partially mitigate the resulting loss in accuracy.

CL May 10, 2018
Automatic Estimation of Simultaneous Interpreter Performance

Craig Stewart, Nikolai Vogler, Junjie Hu et al.

Simultaneous interpretation, the translation of the spoken word in real time, is both highly challenging and physically demanding. Methods to predict interpreter confidence and the adequacy of the interpreted message have a number of potential applications, such as in computer-assisted interpretation interfaces or pedagogical tools. We propose the task of predicting simultaneous interpreter performance by building on existing methodology for quality estimation (QE) of machine translation output. In experiments over five settings in three language pairs, we extend a QE pipeline to estimate interpreter performance (as approximated by the METEOR evaluation metric) and propose novel features reflecting interpretation strategy and evaluation measures that further improve prediction accuracy.