Vivek Kalyan

CL
h-index2
3papers
Novelty28%
AI Score33

3 Papers

CLOct 29, 2021Code
Handshakes AI Research at CASE 2021 Task 1: Exploring different approaches for multilingual tasks

Vivek Kalyan, Paul Tan, Shaun Tan et al.

The aim of the CASE 2021 Shared Task 1 (Hürriyetoğlu et al., 2021) was to detect and classify socio-political and crisis event information at document, sentence, cross-sentence, and token levels in a multilingual setting, with each of these subtasks being evaluated separately in each test language. Our submission contained entries in all of the subtasks, and the scores obtained validated our research finding: That the multilingual aspect of the tasks should be embraced, so that modeling and training regimes use the multilingual nature of the tasks to their mutual benefit, rather than trying to tackle the different languages separately. Our code is available at https://github.com/HandshakesByDC/case2021/

CLJul 27, 2021Code
Red Dragon AI at TextGraphs 2021 Shared Task: Multi-Hop Inference Explanation Regeneration by Matching Expert Ratings

Vivek Kalyan, Sam Witteveen, Martin Andrews

Creating explanations for answers to science questions is a challenging task that requires multi-hop inference over a large set of fact sentences. This year, to refocus the Textgraphs Shared Task on the problem of gathering relevant statements (rather than solely finding a single 'correct path'), the WorldTree dataset was augmented with expert ratings of 'relevance' of statements to each overall explanation. Our system, which achieved second place on the Shared Task leaderboard, combines initial statement retrieval; language models trained to predict the relevance scores; and ensembling of a number of the resulting rankings. Our code implementation is made available at https://github.com/mdda/worldtree_corpus/tree/textgraphs_2021

CLOct 28, 2025
Reinforcement Learning for Long-Horizon Multi-Turn Search Agents

Vivek Kalyan, Martin Andrews

Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinforcement Learning (RL) can push capabilities significantly further by learning from experience. Through experiments on a legal document search benchmark, we show that our RL-trained 14 Billion parameter model outperforms frontier class models (85% vs 78% accuracy). In addition, we explore turn-restricted regimes, during training and at test-time, that show these agents achieve better results if allowed to operate over longer multi-turn horizons.