97.3HOJun 4
Benchmarks in LeipzigAndrei Balakin, Miklós Bóna, Marie-Charlotte Brandenburg et al.
Between April 1 and May 15, 2026, a group of 49 mathematicians compiled a dataset of research-level mathematics questions with known answers. Most of the work was done during the 3-day workshop *Benchmarks in Leipzig* with 35 participants at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany. We present the resulting collection of 100 questions. We evaluated these questions in three stages: a single attempt by five state-of-the-art LLMs, followed by a 20-runs-per-model evaluation with three of these models, and finally a 3-run attempt with two heavy-thinking models. After Stage 1, 41 questions remained completely unsolved; after Stage 2, this count dropped to 16; and we concluded Stage 3 with only 2 unsolved questions. This demonstrates that the mathematical reasoning capabilities of LLMs are becoming impressive.
SENov 15, 2017
Programming Bots by Synthesizing Natural Language Expressions into API InvocationsShayan Zamanirad, Boualem Benatallah, Moshe Chai Barukh et al.
At present, bots are still in their preliminary stages of development. Many are relatively simple, or developed ad-hoc for a very specific use-case. For this reason, they are typically programmed manually, or utilize machine-learning classifiers to interpret a fixed set of user utterances. In reality, real world conversations with humans require support for dynamically capturing users expressions. Moreover, bots will derive immeasurable value by programming them to invoke APIs for their results. Today, within the Web and Mobile development community, complex applications are being stringed together with a few lines of code -- all made possible by APIs. Yet, developers today are not as empowered to program bots in much the same way. To overcome this, we introduce BotBase, a bot programming platform that dynamically synthesizes natural language user expressions into API invocations. Our solution is two faceted: Firstly, we construct an API knowledge graph to encode and evolve APIs; secondly, leveraging the above we apply techniques in NLP, ML and Entity Recognition to perform the required synthesis from natural language user expressions into API calls.
IRSep 15, 2017
Algorithms and Architecture for Real-time Recommendations at News UKDion Bailey, Tom Pajak, Daoud Clarke et al.
Recommendation systems are recognised as being hugely important in industry, and the area is now well understood. At News UK, there is a requirement to be able to quickly generate recommendations for users on news items as they are published. However, little has been published about systems that can generate recommendations in response to changes in recommendable items and user behaviour in a very short space of time. In this paper we describe a new algorithm for updating collaborative filtering models incrementally, and demonstrate its effectiveness on clickstream data from The Times. We also describe the architecture that allows recommendations to be generated on the fly, and how we have made each component scalable. The system is currently being used in production at News UK.
IRAug 9, 2017
Ephemeral Context to Support Robust and Diverse Music RecommendationsPavel Kucherbaev, Nava Tintarev, Carlos Rodriguez
While prior work on context-based music recommendation focused on fixed set of contexts (e.g. walking, driving, jogging), we propose to use multiple sensors and external data sources to describe momentary (ephemeral) context in a rich way with a very large number of possible states (e.g. jogging fast along in downtown of Sydney under a heavy rain at night being tired and angry). With our approach, we address the problems which current approaches face: 1) a limited ability to infer context from missing or faulty sensor data; 2) an inability to use contextual information to support novel content discovery.