Kazuhide Nakata

LG
h-index: 12
12 papers
334 citations
Novelty: 47%
AI Score: 46

12 Papers

LG | Nov 30, 2023 | Code
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami et al.

Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often used to identify the top-k promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff in the subsequent online policy deployment. To address this issue, we draw inspiration from portfolio evaluation in finance and develop a new metric, called SharpeRatio@k, which measures the risk-return tradeoff of policy portfolios formed by an OPE estimator under varying online evaluation budgets (k). We validate our metric in two example scenarios, demonstrating its ability to effectively distinguish between low-risk and high-risk estimators and to accurately identify the most efficient one. Efficiency of an estimator is characterized by its capability to form the most advantageous policy portfolios, maximizing returns while minimizing risks during online deployment, a nuance that existing metrics typically overlook. To facilitate a quick, accurate, and consistent evaluation of OPE via SharpeRatio@k, we have also integrated this metric into an open-source software, SCOPE-RL (https://github.com/hakuhodo-technologies/scope-rl). Employing SharpeRatio@k and SCOPE-RL, we conduct comprehensive benchmarking experiments on various estimators and RL tasks, focusing on their risk-return tradeoff. These experiments offer several interesting directions and suggestions for future OPE research.
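The abstract above does not spell out the formula, so the following is only a minimal sketch of how a Sharpe-ratio-style metric over a top-k policy portfolio could be computed. It assumes the return is the best true value among the k policies an estimator ranks highest, measured against the behavior policy's value, and the risk is the standard deviation of the true values in that portfolio. The function name `sharpe_ratio_at_k` and its arguments are hypothetical and are not the SCOPE-RL API.

```python
import numpy as np

def sharpe_ratio_at_k(estimated_values, true_values, behavior_value, k):
    """Risk-return score of the top-k portfolio chosen by an OPE estimator.

    Assumed definition (illustrative, not taken from the abstract):
        (best true value in top-k - behavior policy value) / std of true values in top-k
    """
    estimated_values = np.asarray(estimated_values, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    # Rank candidate policies by the estimator's output, highest first.
    top_k = np.argsort(-estimated_values)[:k]
    portfolio = true_values[top_k]
    risk = portfolio.std()  # population standard deviation of the portfolio
    if risk == 0:
        return float("nan")  # undefined when the portfolio has no spread (e.g. k = 1)
    return (portfolio.max() - behavior_value) / risk

# Four candidate policies: the estimator ranks policy 0 first and policy 1 second.
score = sharpe_ratio_at_k(
    estimated_values=[1.0, 0.9, 0.2, 0.1],
    true_values=[2.0, 1.0, 3.0, 0.5],
    behavior_value=1.0,
    k=2,
)
print(score)
```

Under these assumptions, an estimator that places a very poor policy inside its top k is penalized through the denominator even when its best pick is good, which is the risk-return distinction the abstract describes.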

LG | Nov 30, 2023 | Code
SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami et al.

This paper introduces SCOPE-RL, a comprehensive open-source Python library designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and off-policy selection (OPS). Unlike most existing libraries that focus solely on either policy learning or evaluation, SCOPE-RL seamlessly integrates these two key aspects, facilitating flexible and complete implementations of both offline RL and OPE processes. SCOPE-RL puts particular emphasis on its OPE modules, offering a range of OPE estimators and robust evaluation-of-OPE protocols. This approach enables more in-depth and reliable OPE compared to other packages. For instance, SCOPE-RL enhances OPE by estimating the entire reward distribution under a policy rather than its mere point-wise expected value. Additionally, SCOPE-RL provides a more thorough evaluation-of-OPE by presenting the risk-return tradeoff in OPE results, extending beyond mere accuracy evaluations in existing OPE literature. SCOPE-RL is designed with user accessibility in mind. Its user-friendly APIs, comprehensive documentation, and a variety of easy-to-follow examples assist researchers and practitioners in efficiently implementing and experimenting with various offline RL methods and OPE estimators, tailored to their specific problem contexts. The documentation of SCOPE-RL is available at https://scope-rl.readthedocs.io/en/latest/.

IR | Jul 14, 2023
An IPW-based Unbiased Ranking Metric in Two-sided Markets

Keisho Oh, Naoki Nishimura, Minje Sung et al.

In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both sides. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position biases in two-sided markets. We prove that the proposed estimator is unbiased for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines, especially when handling rare items, which are less frequently observed in the training data.
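The abstract does not give the estimator's formula. As a rough illustration of the idea only, the sketch below assumes that a match is observed only when both sides examine each other, that the two examinations are independent, and that the joint propensity is therefore the product of the two per-side position propensities. The function name and all arguments are hypothetical.

```python
import numpy as np

def two_sided_ipw_metric(matches, sender_propensity, receiver_propensity, gains):
    """IPW-weighted ranking metric with a propensity for each side of the market.

    Assumption (not from the abstract): examination on the two sides is
    independent, so each observed match is reweighted by
    1 / (sender_propensity * receiver_propensity).
    """
    matches = np.asarray(matches, dtype=float)
    joint = np.asarray(sender_propensity, dtype=float) * np.asarray(receiver_propensity, dtype=float)
    gains = np.asarray(gains, dtype=float)
    # Matches observed at rarely examined positions receive larger weights.
    return float(np.mean(matches / joint * gains))

value = two_sided_ipw_metric(
    matches=[1, 0, 1],
    sender_propensity=[0.5, 0.5, 1.0],
    receiver_propensity=[0.5, 1.0, 0.25],
    gains=[1.0, 1.0, 1.0],
)
print(value)
```

A single-sided IPW metric is recovered by setting one side's propensities to 1.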

LG | Jul 6, 2024
Balancing Immediate Revenue and Future Off-Policy Evaluation in Coupon Allocation

Naoki Nishimura, Ken Kobayashi, Kazuhide Nakata

Coupon allocation drives customer purchases and boosts revenue. However, it presents a fundamental trade-off between exploiting the current optimal policy to maximize immediate revenue and exploring alternative policies to collect data for future policy improvement via off-policy evaluation (OPE). To balance this trade-off, we propose a novel approach that combines a model-based revenue maximization policy and a randomized exploration policy for data collection. Our framework enables flexible adjustment of the mixture ratio between these two policies to optimize the balance between short-term revenue and future policy improvement. We formulate the problem of determining the optimal mixture ratio as multi-objective optimization, enabling quantitative evaluation of this trade-off. We empirically verified the effectiveness of the proposed mixed policy using synthetic data. Our main contributions are: (1) Demonstrating a mixed policy combining deterministic and probabilistic policies, flexibly adjusting the data collection vs. revenue trade-off. (2) Formulating the optimal mixture ratio problem as multi-objective optimization, enabling quantitative evaluation of this trade-off.
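To make the mixed policy concrete, here is a minimal sketch, assuming the exploration policy is uniform over coupon actions (the abstract says only "randomized") and that the revenue-maximizing policy deterministically picks one action. The names `mixed_policy` and `mix_ratio` are illustrative, not the authors' notation.

```python
import numpy as np

def mixed_policy(greedy_action, n_actions, mix_ratio):
    """Action probabilities of a mixture of a deterministic and a uniform policy.

    With probability (1 - mix_ratio) the greedy (revenue-maximizing) action is
    taken; with probability mix_ratio an action is drawn uniformly at random.
    Uniform exploration is an assumption made for this sketch.
    """
    probs = np.full(n_actions, mix_ratio / n_actions)
    probs[greedy_action] += 1.0 - mix_ratio
    return probs

rng = np.random.default_rng(0)
probs = mixed_policy(greedy_action=1, n_actions=4, mix_ratio=0.2)
action = rng.choice(len(probs), p=probs)
# Logging probs[action] alongside the action gives the propensity that
# importance-weighted OPE estimators would need later.
print(probs, action, probs[action])
```

In this sketch every action keeps probability of at least mix_ratio / n_actions, so a larger mix_ratio trades immediate revenue for better-covered logged data.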

CV | Dec 14, 2025
Content-Aware Ad Banner Layout Generation with Two-Stage Chain-of-Thought in Vision Language Models

Kei Yoshitake, Kento Hosono, Ken Kobayashi et al.

In this paper, we propose a method for generating layouts for image-based advertisements by leveraging a Vision-Language Model (VLM). Conventional advertisement layout techniques have predominantly relied on saliency mapping to detect salient regions within a background image, but such approaches often fail to fully account for the image's detailed composition and semantic content. To overcome this limitation, our method harnesses a VLM to recognize the products and other elements depicted in the background and to inform the placement of text and logos. The proposed layout-generation pipeline consists of two steps. In the first step, the VLM analyzes the image to identify object types and their spatial relationships, then produces a text-based "placement plan" based on this analysis. In the second step, that plan is rendered into the final layout by generating HTML-format code. We validated the effectiveness of our approach through evaluation experiments, conducting both quantitative and qualitative comparisons against existing methods. The results demonstrate that by explicitly considering the background image's content, our method produces noticeably higher-quality advertisement layouts.

LG | Oct 23, 2025
Hierarchical Time Series Forecasting with Robust Reconciliation

Shuhei Aikawa, Aru Suzuki, Kei Yoshitake et al.

This paper focuses on forecasting hierarchical time-series data, where each higher-level observation equals the sum of its corresponding lower-level time series. In such contexts, the forecast values should be coherent, meaning that the forecast value of each parent series exactly matches the sum of the forecast values of its child series. Existing hierarchical forecasting methods typically generate base forecasts independently for each series and then apply a reconciliation procedure to adjust them so that the resulting forecast values are coherent across the hierarchy. These methods generally derive an optimal reconciliation, using a covariance matrix of the forecast error. In practice, however, the true covariance matrix is unknown and has to be estimated from finite samples in advance. This gap between the true and estimated covariance matrix may degrade forecast performance. To address this issue, we propose a robust optimization framework for hierarchical reconciliation that accounts for uncertainty in the estimated covariance matrix. We first introduce an uncertainty set for the estimated covariance matrix and formulate a reconciliation problem that minimizes the worst-case expected squared error over this uncertainty set. We show that our problem can be cast as a semidefinite optimization problem. Numerical experiments demonstrate that the proposed robust reconciliation method achieved better forecast performance than existing hierarchical forecasting methods, which indicates the effectiveness of integrating uncertainty into the reconciliation process.
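For context, the conventional (non-robust) reconciliation step that the abstract contrasts against can be sketched as a weighted least-squares projection onto the coherent subspace. This is the standard generalized least-squares form, not the robust semidefinite formulation the paper proposes; the summing matrix `S`, the error covariance `W`, and the function name are illustrative.

```python
import numpy as np

def reconcile(base_forecasts, S, W=None):
    """Project base forecasts onto the set of coherent forecasts.

    Computes S (S^T W^-1 S)^-1 S^T W^-1 y_hat, where S maps bottom-level
    series to all series and W is a covariance matrix of base forecast errors.
    W = identity (the default) gives ordinary least-squares reconciliation.
    """
    y_hat = np.asarray(base_forecasts, dtype=float)
    S = np.asarray(S, dtype=float)
    if W is None:
        W = np.eye(S.shape[0])
    W_inv = np.linalg.inv(np.asarray(W, dtype=float))
    # Bottom-level forecasts implied by all base forecasts.
    bottom = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv @ y_hat)
    # Summing them up again yields coherent forecasts at every level.
    return S @ bottom

# Hierarchy: total = A + B. Rows of S are (total, A, B).
S = [[1, 1], [1, 0], [0, 1]]
coherent = reconcile([10.0, 4.0, 5.0], S)
print(coherent)  # the first entry equals the sum of the other two
```

The result depends on `W`, which in practice is an estimate; that dependence is the source of uncertainty the robust formulation in the abstract targets.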

LG | Jun 12, 2025
Interior-Point Vanishing Problem in Semidefinite Relaxations for Neural Network Verification

Ryota Ueda, Takami Sato, Ken Kobayashi et al.

Semidefinite programming (SDP) relaxation has emerged as a promising approach for neural network verification, offering tighter bounds than other convex relaxation methods for deep neural networks (DNNs) with ReLU activations. However, we identify a critical limitation in the SDP relaxation when applied to deep networks: interior-point vanishing, which leads to the loss of strict feasibility -- a crucial condition for the numerical stability and optimality of SDP. Through rigorous theoretical and empirical analysis, we demonstrate that as the depth of DNNs increases, the strict feasibility is likely to be lost, creating a fundamental barrier to scaling SDP-based verification. To address the interior-point vanishing, we design and investigate five solutions to enhance the feasibility conditions of the verification problem. Our methods can successfully solve 88% of the problems that could not be solved by existing methods, accounting for 41% of the total. Our analysis also reveals that the valid constraints on the lower and upper bounds of each ReLU unit, traditionally inherited from prior work without solid justification, are not only unhelpful but even harmful to the problem's feasibility. This work provides valuable insights into the fundamental challenges of SDP-based DNN verification and offers practical solutions to improve its applicability to deeper neural networks, contributing to the development of more reliable and secure systems with DNNs.

QUANT-PH | Oct 14, 2021
Solving Large Break Minimization Problems in a Mirrored Double Round-robin Tournament Using Quantum Annealing

Michiya Kuramata, Ryota Katsuki, Kazuhide Nakata

Quantum annealing (QA) has gained considerable attention because it can be applied to combinatorial optimization problems, which have numerous applications in logistics, scheduling, and finance. In recent years, research on solving practical combinatorial optimization problems using quantum annealers has accelerated. However, researchers struggle to find practical combinatorial optimization problems for which quantum annealers outperform other mathematical optimization solvers. Moreover, only a few studies compare the performance of quantum annealers with that of the most sophisticated mathematical optimization solvers, such as Gurobi and CPLEX. In our study, we show that QA outperforms such solvers on the break minimization problem in a mirrored double round-robin tournament (MDRRT). We also explain why QA performs well on problems with sparse interactions between variables and no constraints. In this process, we demonstrate that the break minimization problem in an MDRRT can be expressed as a 4-regular graph. Through computational experiments, we solve this problem using our QA approach, run on the latest quantum annealer, D-Wave Advantage, and two integer programming approaches, run on the sophisticated mathematical optimization solver Gurobi. Further, we compare the quality of the solutions and the computational time. QA was able to determine the exact solution in 0.05 seconds for problems with 20 teams, which is a practical size. In the case of 36 teams, it took 84.8 s for the integer programming method to reach the objective function value that the quantum annealer obtained in 0.05 s. These results not only present the break minimization problem in an MDRRT as an example of applying QA to practical optimization problems, but also contribute to finding problems that can be effectively solved by QA.

TR | Jun 6, 2021
Online Trading Models with Deep Reinforcement Learning in the Forex Market Considering Transaction Costs

Koya Ishikawa, Kazuhide Nakata

In recent years, a wide range of investment models have been created using artificial intelligence. Automatic trading by artificial intelligence can expand the range of trading methods, such as by conferring the ability to operate 24 hours a day and the ability to trade with high frequency. Automatic trading can also be expected to trade with more information than is available to humans if it can sufficiently consider past data. In this paper, we propose an investment agent based on a deep reinforcement learning model, which is an artificial intelligence model. The model considers the transaction costs involved in actual trading and creates a framework for trading over a long period of time so that it can make a large profit on a single trade. In doing so, it can maximize the profit while keeping transaction costs low. In addition, in consideration of actual operations, we use online learning so that the system can continue to learn by constantly updating the latest online data instead of learning with static data. This makes it possible to trade in non-stationary financial markets by always incorporating current market trend information.

LG | Jun 5, 2021
Constrained Generalized Additive 2 Model with Consideration of High-Order Interactions

Akihisa Watanabe, Michiya Kuramata, Kaito Majima et al.

In recent years, machine learning and AI have been introduced in many industrial fields. In fields such as finance, medicine, and autonomous driving, where the inference results of a model may have serious consequences, high interpretability as well as prediction accuracy is required. In this study, we propose CGA2M+, which is based on the Generalized Additive 2 Model (GA2M) and differs from it in two major ways. The first is the introduction of monotonicity. Imposing monotonicity on some functions based on an analyst's knowledge is expected to improve not only interpretability but also generalization performance. The second is the introduction of a higher-order term: given that GA2M considers only second-order interactions, we aim to balance interpretability and prediction accuracy by introducing a higher-order term that can capture higher-order interactions. In this way, we can improve prediction performance without compromising interpretability by applying learning innovation. Numerical experiments showed that the proposed model has high predictive performance and interpretability. Furthermore, we confirmed that generalization performance is improved by introducing monotonicity.

CL | Dec 18, 2020
Technical Progress Analysis Using a Dynamic Topic Model for Technical Terms to Revise Patent Classification Codes

Mana Iwata, Yoshiro Matsuda, Yoshimasa Utsumi et al.

Japanese patents are assigned a patent classification code, FI (File Index), that is unique to Japan. FI is a subdivision of the IPC, an international patent classification code, that is related to Japanese technology. FIs are revised to keep up with technological developments. These revisions have already established more than 30,000 new FIs since 2006. However, these revisions require considerable time and effort. Moreover, these revisions are not automated and are thus inefficient. Therefore, using machine learning to assist in the revision of patent classification codes (FI) will lead to improved accuracy and efficiency. This study analyzes patent documents from this new perspective of assisting in the revision of patent classification codes with machine learning. To analyze time-series changes in patents, we used the dynamic topic model (DTM), which is an extension of latent Dirichlet allocation (LDA). Also, unlike English, the Japanese language requires morphological analysis. Patents contain many technical words that are not used in everyday life, so morphological analysis using a common dictionary is not sufficient. Therefore, we used a technique for extracting technical terms from text. After extracting technical terms, we applied DTM to them. In this study, we determined the technological progress of the lighting class F21 over 14 years and compared it with the actual revision of patent classification codes. In other words, we extracted technical terms from Japanese patents and applied DTM to determine the progress of Japanese technology. Then, we analyzed the results from the new perspective of revising patent classification codes with machine learning. As a result, we found that technologies whose topics were on the rise were judged to be new technologies.

ML | Sep 9, 2019
Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback

Yuta Saito, Suguru Yaginuma, Yuta Nishino et al.

Recommender systems widely use implicit feedback such as click data because of its general availability. Although the presence of clicks signals the users' preference to some extent, the lack of such clicks does not necessarily indicate a negative response from the users, as it is possible that the users were not exposed to the items (positive-unlabeled problem). This leads to a difficulty in predicting the users' preferences from implicit feedback. Previous studies addressed the positive-unlabeled problem by uniformly upweighting the loss for the positive feedback data or estimating the confidence of each data having relevance information via the EM-algorithm. However, these methods failed to address the missing-not-at-random problem in which popular or frequently recommended items are more likely to be clicked than other items even if a user does not have a considerable interest in them. To overcome these limitations, we first define an ideal loss function to be optimized to realize recommendations that maximize the relevance and propose an unbiased estimator for the ideal loss. Subsequently, we analyze the variance of the proposed unbiased estimator and further propose a clipped estimator that includes the unbiased estimator as a special case. We demonstrate that the clipped estimator is expected to improve the performance of the recommender system, by considering the bias-variance trade-off. We conduct semi-synthetic and real-world experiments and demonstrate that the proposed method largely outperforms the baselines. In particular, the proposed method works better for rare items that are less frequently observed in the training data. The findings indicate that the proposed method can better achieve the objective of recommending items with the highest relevance.
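The estimator is described only in words above, so the following is a hedged sketch of one way a propensity-weighted loss with clipping could look. It assumes log loss as the per-pair loss and assumes that clipping means replacing each exposure propensity with max(propensity, clip); both choices, and the function name, are illustrative rather than taken from the abstract.

```python
import numpy as np

def clipped_ips_loss(clicks, scores, propensities, clip=0.1):
    """Propensity-weighted log loss for click data with clipped propensities.

    Each user-item pair contributes
        (Y / theta) * loss_if_relevant + (1 - Y / theta) * loss_if_irrelevant,
    where Y is the click indicator and theta = max(propensity, clip).
    With clip = 0 the propensities are used as given (the unclipped case).
    """
    clicks = np.asarray(clicks, dtype=float)
    # Keep predicted relevance away from 0 and 1 so the logs stay finite.
    scores = np.clip(np.asarray(scores, dtype=float), 1e-7, 1 - 1e-7)
    theta = np.maximum(np.asarray(propensities, dtype=float), clip)
    weight = clicks / theta
    loss = weight * -np.log(scores) + (1.0 - weight) * -np.log(1.0 - scores)
    return float(loss.mean())

# A click on a rarely exposed item (propensity 0.05) is upweighted,
# but clipping at 0.1 caps the weight at 10 instead of 20.
print(clipped_ips_loss(clicks=[1, 0], scores=[0.8, 0.3], propensities=[0.05, 0.6], clip=0.1))
```

A larger clip value lowers the variance of the weights at the cost of bias, which is the bias-variance trade-off the abstract refers to.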