Yuichi Motai

LG
h-index: 6
4 papers
4 citations
Novelty: 49%
AI Score: 47

4 Papers

54.9 · NE · Jun 3
Multi-Column RBF Neural Network Using Adaptive and Non-Adaptive Particle Swarm Optimization

Ammar Hoori, Yuichi Motai

The radial basis function neural network (RBFN) trained with a gradient descent algorithm provides an effective fully connected structure in both shallow and deep networks. The error correction algorithm (ErrCor), a state-of-the-art gradient-based training method, selects optimal hidden units to improve accuracy. Alternatively, the particle swarm optimization algorithm (PSO), a population-based method, uses the swarm's collective experience to optimize RBFN parameters, offering global search and robustness to local minima. Adaptive PSO (APSO) has emerged as an improved variant of PSO that accelerates convergence by dynamically adjusting swarm parameters during optimization. Both ErrCor and PSO demonstrate improved results and competitive convergence. However, on large datasets these methods face scalability challenges such as excessive kernel computations and large hidden layer structures. A recent multi-column RBFN approach (MCRN) improves ErrCor performance by deploying small RBFNs in a parallel system. Inspired by MCRN's success, we propose two novel approaches to improve PSO performance: the multi-column RBFN with PSO (MC-PSO) and the multi-column RBFN with APSO (MC-APSO). These methods introduce parallel RBFN structures trained using evolutionary swarm methods. Each RBFN is independently trained on a specific spatial subset of the dataset using either the PSO or the APSO algorithm, so the resulting specialist RBFNs are tailored to their respective subsets. During testing, only the RBFNs whose subsets contain the test instance's neighbors contribute to the multi-column output. This specialization improves accuracy, while parallelism enhances speed. We evaluate the proposed methods on various benchmark datasets. MC-PSO and MC-APSO outperform ErrCor, PSO, APSO, and MCRN in terms of accuracy and recall, and they also achieve faster training and testing times in most experiments.
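The multi-column idea can be illustrated with a small, self-contained sketch. Everything below is an assumption for illustration, not the authors' implementation: 1-D regression inputs, a hypothetical list of `anchors` that partitions the training data by nearest anchor, a Gaussian RBFN with `h` hidden units per column, plain global-best PSO with constriction-style coefficients (APSO's parameter adaptation is omitted), and simple averaging over the columns that own the test instance's `k` nearest training neighbors.

```python
import math
import random

def rbf_output(params, x, h):
    """Evaluate a 1-D Gaussian RBFN; params = [c_1..c_h, s_1..s_h, w_1..w_h, bias]."""
    y = params[3 * h]
    for j in range(h):
        c, s, w = params[j], abs(params[h + j]) + 0.1, params[2 * h + j]  # width kept > 0
        y += w * math.exp(-((x - c) ** 2) / (2 * s * s))
    return y

def mse(params, xs, ys, h):
    return sum((rbf_output(params, x, h) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def pso_train(xs, ys, h=2, n_particles=20, iters=200, seed=0):
    """Global-best PSO over the flattened RBFN parameter vector (fitness = training MSE)."""
    rng = random.Random(seed)
    dim = 3 * h + 1
    lo, hi = min(xs), max(xs)  # assumes the subset is non-empty
    pos = []
    for _ in range(n_particles):
        p = [rng.uniform(lo, hi) for _ in range(h)]       # centers inside the subset
        p += [rng.uniform(0.2, 1.0) for _ in range(h)]    # widths
        p += [rng.uniform(-1, 1) for _ in range(h + 1)]   # output weights and bias
        pos.append(p)
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [mse(p, xs, ys, h) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w_in, c1, c2, vmax = 0.729, 1.49445, 1.49445, 1.0
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w_in * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])    # personal experience
                     + c2 * r2 * (gbest[d] - pos[i][d]))      # swarm experience
                vel[i][d] = max(-vmax, min(vmax, v))          # velocity clamping
                pos[i][d] += vel[i][d]
            f = mse(pos[i], xs, ys, h)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest

class MultiColumnRBFN:
    """One PSO-trained RBFN per spatial subset; only 'neighbor' columns vote at test time."""

    def __init__(self, anchors, h=2, k=3):
        self.anchors, self.h, self.k = anchors, h, k

    def _column(self, x):
        return min(range(len(self.anchors)), key=lambda j: abs(x - self.anchors[j]))

    def fit(self, xs, ys):
        self.train = [(x, self._column(x)) for x in xs]
        self.columns = []
        for j in range(len(self.anchors)):
            sub = [(x, y) for x, y in zip(xs, ys) if self._column(x) == j]
            self.columns.append(pso_train([p[0] for p in sub], [p[1] for p in sub], self.h, seed=j))
        return self

    def predict(self, x):
        nearest = sorted(self.train, key=lambda t: abs(t[0] - x))[: self.k]
        active = sorted({col for _, col in nearest})  # columns owning the neighbors
        return sum(rbf_output(self.columns[j], x, self.h) for j in active) / len(active)

# Usage: a step target, split into two columns around hypothetical anchors 1.5 and 4.5.
xs = [i * 0.25 for i in range(24)]
ys = [1.0 if x > 3 else 0.0 for x in xs]
model = MultiColumnRBFN(anchors=[1.5, 4.5]).fit(xs, ys)
print(round(model.predict(1.0), 3), round(model.predict(5.0), 3))
```

Each column only ever sees its own subset, so the columns can be trained independently (and in parallel), and a test point far from a subset never pays for evaluating that column's kernels.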

19.5 · RO · Jun 3
Uncertainty-Aware Adaptive Sensor Fusion for Autonomous Navigation

Simegnew Yihunie Alaba, Yuichi Motai

This work introduces a hybrid deep learning approach integrated with an Unscented Kalman Filter (UKF) to enhance pose estimation accuracy in Visual-Inertial Odometry (VIO) for autonomous navigation. The proposed model employs a Vision Transformer (ViT) network to capture temporal dependencies in inertial measurement unit (IMU) data and a Multiscale Convolutional Neural Network (MCNN) to learn optical-flow-based motion cues from visual data. An adaptive sensor fusion module dynamically weights IMU and visual features according to their estimated uncertainty, improving robustness across diverse and challenging environmental conditions. Additionally, a novel uncertainty-aware loss function explicitly incorporates prediction uncertainty into the learning process, enabling robust and accurate navigation under noisy, incomplete, or unreliable sensor inputs. Comprehensive evaluations on the KITTI dataset show that the proposed method significantly outperforms baseline approaches in terms of Absolute Trajectory Error (ATE) and Relative Pose Error (RPE). The lightweight, computationally efficient model processes data at 155 FPS on an NVIDIA A100 GPU, making it well suited for deployment in resource-constrained autonomous systems.
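Uncertainty-weighted fusion and an uncertainty-aware loss can be sketched with two standard building blocks, swapped in here for illustration because the abstract does not give the paper's exact formulas: inverse-variance weighting of per-modality features, and the heteroscedastic Gaussian negative log-likelihood with a predicted log-variance. The helper names and the single scalar log-variance per modality are hypothetical simplifications.

```python
import math

def inverse_variance_weights(log_vars):
    """Weight each modality by 1/sigma^2 (sigma^2 = exp(log_var)), normalized to sum to one."""
    precisions = [math.exp(-s) for s in log_vars]
    total = sum(precisions)
    return [p / total for p in precisions]

def fuse(features, log_vars):
    """Fuse equal-length per-modality feature vectors (e.g. IMU, visual) with inverse-variance weights."""
    weights = inverse_variance_weights(log_vars)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def gaussian_nll(pred, target, log_var):
    """Heteroscedastic Gaussian NLL (constant term dropped), averaged over outputs.

    A large predicted log-variance down-weights the squared residual,
    while the +0.5*log_var term penalizes claiming high uncertainty everywhere.
    """
    return sum(0.5 * math.exp(-s) * (y - p) ** 2 + 0.5 * s
               for p, y, s in zip(pred, target, log_var)) / len(pred)

# Usage: the visual branch reports 3x the IMU branch's variance, so it gets 1/4 of the weight.
imu_feat, vis_feat = [1.0, 0.0], [0.0, 1.0]
fused = fuse([imu_feat, vis_feat], [0.0, math.log(3.0)])
print(fused)
```

In a trained network the log-variances would be predicted per sample by each branch, so a degraded camera frame or a noisy IMU window automatically receives less weight.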

43.2 · LG · May 6
Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning

Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang et al.

In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on its estimates, whereas OE may identify a viable policy only after substantial online interaction that could otherwise have been used for fine-tuning. Second, and more importantly, it is often not possible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures that commit to a single deployed policy are impractical in many real-world settings. Moreover, the naive remedy of exhaustively fine-tuning all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters, then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach, thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines on various benchmarks.
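A minimal sketch of UCB-driven selection under an interaction budget, assuming plain UCB1 over the candidate set with each OPE estimate treated as `prior_weight` pseudo-episodes. This is not the paper's predictor of post-fine-tuning performance: `run_episode` is a hypothetical stand-in for one episode of online interaction (which would also be used to fine-tune the chosen policy), and the simulated returns are made up for the demo.

```python
import math
import random

def ucb_select_and_finetune(ope_estimates, run_episode, budget, c=1.0, prior_weight=1):
    """Spend `budget` online episodes across candidate policies with a UCB1 rule.

    Each OPE estimate seeds the running mean as `prior_weight` pseudo-episodes
    (an assumption), so unreliable OPE values are quickly overridden by real returns.
    """
    k = len(ope_estimates)
    counts = [prior_weight] * k
    means = list(ope_estimates)
    for _ in range(budget):
        total = sum(counts)
        scores = [means[i] + c * math.sqrt(math.log(total) / counts[i]) for i in range(k)]
        i = max(range(k), key=lambda j: scores[j])
        ret = run_episode(i)                     # interact (and, in practice, fine-tune) policy i
        counts[i] += 1
        means[i] += (ret - means[i]) / counts[i]  # incremental mean update
    best = max(range(k), key=lambda j: means[j])
    return best, counts, means

# Usage: OPE overrates policy 0; policy 1 is actually best online (simulated returns).
rng = random.Random(0)
true_means = [0.2, 0.8, 0.5]
best, counts, means = ucb_select_and_finetune(
    ope_estimates=[0.9, 0.3, 0.5],
    run_episode=lambda i: true_means[i] + rng.gauss(0, 0.1),
    budget=200,
)
print(best, counts)
```

Because the overoptimistic OPE value counts as a single pseudo-episode, it is corrected after a few online episodes, and the bulk of the budget then tends to flow to the policy that performs best online rather than being split evenly across all candidates.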

LG · Sep 29, 2025
Safe In-Context Reinforcement Learning

Amir Moeini, Minjae Kwon, Alper Kamil Bozkurt et al.

In-context reinforcement learning (ICRL) is an emerging RL paradigm in which the agent, after a pretraining procedure, can adapt to out-of-distribution test tasks without any parameter updates. The agent achieves this by continually expanding the input (i.e., the context) to its policy neural networks. For example, the input could be the agent's entire history of experience up to the current time step. The agent's performance improves as the input grows, without any parameter updates. In this work, we propose the first method that promotes safety during ICRL's adaptation process, within the framework of constrained Markov decision processes. In other words, during the parameter-update-free adaptation process, the agent not only maximizes the reward but also minimizes an additional cost function. We also demonstrate that our agent actively reacts to the cost threshold (i.e., the budget): with a higher cost budget, the agent behaves more aggressively, and with a lower cost budget, it behaves more conservatively.
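For reference, the textbook constrained MDP objective behind this framework can be written as follows, with reward r, cost c, discount factor γ, and cost budget b. This is the standard formulation and its Lagrangian, not necessarily the paper's exact training objective:

```latex
\max_{\pi}\; J_r(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_c(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le b

\mathcal{L}(\pi, \lambda) = J_r(\pi) - \lambda \big(J_c(\pi) - b\big), \qquad
\max_{\pi}\, \min_{\lambda \ge 0}\; \mathcal{L}(\pi, \lambda)
```

Raising b loosens the constraint, so a constraint-satisfying policy may trade more cost for reward; lowering b tightens it. This matches the aggressive-versus-conservative behavior the abstract reports for different budgets.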