Yuki Nakayama

h-index: 12
Papers: 2

2 Papers

CL, Mar 21, 2024
RakutenAI-7B: Extending Large Language Models for Japanese

Rakuten Group, Aaron Levine, Connie Huang et al.

We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

CL, Dec 17, 2025
Rakuten Data Release: A Large-Scale and Long-Term Reviews Corpus for Hotel Domain

Yuki Nakayama, Koki Hikichi, Yun Ching Liu et al.

This paper presents a large-scale corpus of Rakuten Travel Reviews. Our collection contains 7.29 million customer reviews spanning 16 years, from 2009 to 2024. Each record in the dataset contains the review text, the accommodation's response to it, an anonymized reviewer ID, review date, accommodation ID, plan ID, plan title, room type, room name, purpose, accompanying group, and user ratings for six aspect categories, as well as an overall score. We present statistical information about our corpus and provide insights into factors driving data drift between 2019 and 2024 using statistical approaches.