Kosuke Nakanishi

h-index3

4papers

1citation

Novelty53%

AI Score49

Ranked #46,576 of 201,326 authors (top 23%)#10,608 in LG (top 25%)

4 Papers

QUANT-PHApr 22

Hamiltonian simulation for 3D elastic wave equations in homogeneous elastic media

Kosuke Nakanishi, Hiroshi Yano, Yuki Sato

We present an explicit quantum circuit construction for Hamiltonian simulation of a first-order velocity--stress formulation of the three-dimensional elastic wave equation in homogeneous isotropic media. Previous studies have shown how elastic wave equations can be cast into forms amenable to Hamiltonian simulation, but they typically rely on black box Hamiltonian access assumptions, making gate complexity estimation difficult. Starting from the first-order velocity--stress formulation, we discretize the system by finite differences, transform it into Schrödinger form, and exploit the separation between the component register and the spatial register to decompose the Hamiltonian into structured tensor product terms. This yields explicit implementations of first-order and second-order Trotter formulas for the resulting time evolution operator. We derive corresponding error bounds and constant sensitive qubit and CNOT complexity estimates in terms of the discretization parameter, simulation time, target accuracy, and material parameters. Numerical experiments validate the proposed framework through comparisons with the exact time evolution and reconstructed physical fields.

LGJun 20, 2025Code

Off-Policy Actor-Critic for Adversarial Observation Robustness: Virtual Alternative Training via Symmetric Policy Evaluation

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui et al.

Recently, robust reinforcement learning (RL) methods designed to handle adversarial input observations have received significant attention, motivated by RL's inherent vulnerabilities. While existing approaches have demonstrated reasonable success, addressing worst-case scenarios over long time horizons requires both minimizing the agent's cumulative rewards for adversaries and training agents to counteract them through alternating learning. However, this process introduces mutual dependencies between the agent and the adversary, making interactions with the environment inefficient and hindering the development of off-policy methods. In this work, we propose a novel off-policy method that eliminates the need for additional environmental interactions by reformulating adversarial learning as a soft-constrained optimization problem. Our approach is theoretically supported by the symmetric property of policy evaluation between the agent and the adversary. The implementation is available at https://github.com/nakanakakosuke/VALT_SAC.

LGMay 9

A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets

Akihiro Kubo, Kosuke Nakanishi, Shin Ishii

Preference-conditioned multi-objective reinforcement learning aims to learn a single policy that captures trade-offs across preferences, but under nonlinear scalarization the uniqueness and continuity of the preference-to-solution correspondence remain unclear. We study this problem in tabular multi-objective Markov decision processes (MDPs) using smooth Tchebycheff scalarization as a monotone utility. Under mild interior conditions on the preference set, we prove that each preference induces a unique Pareto-optimal return vector and that this vector depends Lipschitz-continuously on the preference, providing a principled foundation for preference sweeping toward dense Pareto-front coverage. To compute these targets, we formulate the problem over occupancy measures and derive Concave Mirror Descent Policy Iteration (CMDPI), which achieves an $O(1/k)$ objective-suboptimality rate. We further show that each update is equivalent to solving a Kullback-Leibler-regularized MDP with the previous policy as reference, yielding a policy-iteration interpretation and finite-iterate policy continuity across preferences. We instantiate the update as a deep actor-critic algorithm preserving previous-policy regularization. On eight MO-Gymnasium tasks, it achieves the best average hypervolume rank among recent baselines and strong expected-utility performance. Continuous-control experiments indicate gains beyond the discrete-action setting.

LGAug 31, 2024

Robust off-policy Reinforcement Learning via Soft Constrained Adversary

Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui et al.

Recently, robust reinforcement learning (RL) methods against input observation have garnered significant attention and undergone rapid evolution due to RL's potential vulnerability. Although these advanced methods have achieved reasonable success, there have been two limitations when considering adversary in terms of long-term horizons. First, the mutual dependency between the policy and its corresponding optimal adversary limits the development of off-policy RL algorithms; although obtaining optimal adversary should depend on the current policy, this has restricted applications to off-policy RL. Second, these methods generally assume perturbations based only on the $L_p$-norm, even when prior knowledge of the perturbation distribution in the environment is available. We here introduce another perspective on adversarial RL: an f-divergence constrained problem with the prior knowledge distribution. From this, we derive two typical attacks and their corresponding robust learning frameworks. The evaluation of robustness is conducted and the results demonstrate that our proposed methods achieve excellent performance in sample-efficient off-policy RL.