AO-PH Sep 26, 2022
Deep generative model super-resolves spatially correlated multiregional climate data
Norihiro Oyama, Noriko N. Ishizaki, Satoshi Koide et al.
Super-resolving the coarse outputs of global climate simulations, termed downscaling, is crucial in making political and social decisions on systems requiring long-term climate change projections. Existing fast super-resolution techniques, however, have yet to preserve the spatially correlated nature of climatological data, which is particularly important when we address systems with spatial expanse, such as the development of transportation infrastructure. Herein, we show that adversarial network-based machine learning enables us to correctly reconstruct the inter-regional spatial correlations in downscaling with high magnification of up to fifty while maintaining pixel-wise statistical consistency. Direct comparison with the measured meteorological data of temperature and precipitation distributions reveals that integrating climatologically important physical information improves the downscaling performance, which prompts us to call this approach $π$SRGAN (Physics Informed Super-Resolution Generative Adversarial Network). The proposed method has potential application to the inter-regionally consistent assessment of the climate change impact. Additionally, we present the outcomes of another variant of the deep generative model-based downscaling approach in which the low-resolution precipitation field is substituted with the pressure field, referred to as $ψ$SRGAN (Precipitation Source Inaccessible SRGAN). Remarkably, this method demonstrates unexpectedly good downscaling performance for the precipitation field.
NE Jan 29
MolLIBRA: Genetic Molecular Optimization with Multi-Fingerprint Surrogates and Text-Molecule Aligned Critic
Masahi Okada, Kazuki Sakai, Hiroaki Yoshida et al.
We study sample-efficient molecular optimization under a limited budget of oracle evaluations. We propose MolLIBRA (MultimOdaLity and Language Integrated Bayesian and evolutionaRy optimizAtion), a genetic algorithm based framework that pre-ranks candidate molecules using multiple critics before oracle calls: (i) an ensemble of Gaussian process (GP) surrogates defined over multiple molecular fingerprints and (ii) a pretrained text-molecule aligned encoder CLAMP. The GP ensemble enables adaptive selection of task-appropriate fingerprints, while CLAMP provides a zero-shot scoring signal from task descriptions by measuring the similarity between molecular and text embeddings. On the Practical Molecular Optimization (PMO) benchmark with a budget of 1,000 evaluations (PMO-1K), MolLIBRA-L, our variant with a language-model-based candidate generator, attains the best Top-10 AUC on 14/22 tasks and the highest overall sum of Top-10 AUC across tasks among prior methods.
AO-PH May 12
Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies
Takuro Kutsuna, Noriko N. Ishizaki, Norihiro Oyama et al.
Physics-based climate projections using general circulation models are essential for assessing future risks, but their coarse resolution limits regional decision-making. Statistical downscaling can efficiently add detail, yet many methods treat variables independently, degrading inter-variable relationships that govern compound hazards such as heat stress, drought, and wildfire. Here we show that a diffusion-based multivariate generative framework, combined with bias correction, recovers degraded inter-variable correlations even under a 50$\times$ increase in linear resolution. When applied to five meteorological variables over Japan, the framework reduces inter-variable correlation errors by more than fourfold relative to existing baselines while improving both univariate and spatial accuracy, leading to more accurate detection of severe drought. These results demonstrate that multivariate generative downscaling improves the reliability of compound risk assessment under large resolution gaps.
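The abstract does not define how inter-variable correlation error is measured. As a rough illustration only, the sketch below assumes one plausible metric: the mean absolute difference between pairwise Pearson correlations in a reference dataset and in a downscaled one. The function names and the toy temperature and humidity values are hypothetical and are not taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_error(reference, downscaled):
    """Mean absolute difference of pairwise correlations (assumed metric).

    Both arguments map a variable name to a series of values at the
    same locations or time steps.
    """
    names = sorted(reference)
    errors = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r_ref = pearson(reference[a], reference[b])
            r_ds = pearson(downscaled[a], downscaled[b])
            errors.append(abs(r_ref - r_ds))
    return sum(errors) / len(errors)

# Toy data (hypothetical): humidity falls as temperature rises.
reference = {
    "temperature": [20.0, 22.0, 25.0, 27.0, 30.0],
    "humidity": [80.0, 75.0, 65.0, 60.0, 50.0],
}
# A downscaling that keeps each marginal distribution but scrambles
# the pairing between variables, as independent treatment might.
independent = {
    "temperature": [20.0, 22.0, 25.0, 27.0, 30.0],
    "humidity": [65.0, 80.0, 50.0, 75.0, 60.0],
}

error_perfect = correlation_error(reference, reference)
error_independent = correlation_error(reference, independent)
```

In this toy case each variable keeps exactly the same set of values, so a purely univariate check would report no problem, while the correlation error exposes the broken dependency.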
SI Sep 29, 2025
Data-Driven Discrete Geofence Design Using Binary Quadratic Programming
Keisuke Otaki, Akihisa Okada, Tadayoshi Matsumori et al.
Geofences have attracted significant attention in the design of spatial and virtual regions for managing and engaging spatiotemporal events. By using geofences to monitor human activity across their boundaries, content providers can create spatially triggered events that include notifications about points of interest within a geofence by pushing spatial information to the devices of users. Traditionally, geofences were hand-crafted by providers. In addition to the hand-crafted approach, recent advances in collecting human mobility data through mobile devices can accelerate the automatic and data-driven design of geofences, also known as the geofence design problem. Previous approaches assume circular shapes; thus, their flexibility is insufficient, and they can only handle geofence-based applications for large areas with coarse resolutions. A challenge with using circular geofences in urban and high-resolution areas is that they often overlap and fail to align with political district boundaries and road segments, such as one-way streets and median barriers. In this study, we address the problem of extracting arbitrary shapes as geofences from human mobility data to mitigate this problem. In our formulation, we cast the existing optimization problems for circular geofences to 0-1 integer programming problems to represent arbitrary shapes. Although 0-1 integer programming problems are computationally hard, formulating them as quadratic (unconstrained) binary optimization problems enables efficient approximation of optimal solutions, because this allows the use of specialized quadratic solvers, such as quantum annealing, and other state-of-the-art algorithms. We then develop and compare different formulation methods to extract discrete geofences. We confirm that our new modeling approach enables flexible geofence design.
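To make the idea of a geofence as a quadratic binary optimization problem concrete, here is a minimal sketch on a toy row of six grid cells. The objective is an assumption made for illustration and is not the paper's formulation: each selected cell earns its visit count minus a fixed cost, and every boundary between a selected and an unselected neighbour is penalized. The names `build_qubo`, `energy`, `brute_force`, and all numeric values are hypothetical. Exhaustive search stands in for the specialized solvers mentioned above and only works at this tiny scale.

```python
from itertools import product

def build_qubo(visits, cell_cost, boundary_weight):
    """Upper-triangular QUBO matrix for a 1-D row of cells (toy objective)."""
    n = len(visits)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Linear term on the diagonal: paying cell_cost to gain visits[i].
        q[i][i] += cell_cost - visits[i]
    for i in range(n - 1):
        j = i + 1
        # For binary x: (x_i - x_j)^2 = x_i + x_j - 2 * x_i * x_j,
        # which is 1 exactly when a fence boundary lies between i and j.
        q[i][i] += boundary_weight
        q[j][j] += boundary_weight
        q[i][j] -= 2.0 * boundary_weight
    return q

def energy(q, x):
    """Evaluate the sum over i <= j of q[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def brute_force(q):
    """Return the lowest-energy binary assignment by enumeration."""
    n = len(q)
    return min(product((0, 1), repeat=n), key=lambda x: energy(q, x))

# Hypothetical visit counts per cell, taken from imagined mobility data.
visits = [5, 4, 0, 0, 3, 1]
q = build_qubo(visits, cell_cost=2.0, boundary_weight=1.5)
best = brute_force(q)
best_energy = energy(q, best)
```

With these numbers the minimizer selects only the first two cells: the isolated busy cell further along the row is left out because fencing it would add more boundary penalty than it gains in visits.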
LG Jun 5, 2024
Predicting unobserved climate time series data at distant areas via spatial correlation using reservoir computing
Shihori Koyama, Daisuke Inoue, Hiroaki Yoshida et al.
Collecting time series data spatially distributed in many locations is often important for analyzing climate change and its impacts on ecosystems. However, comprehensive spatial data collection is not always feasible, requiring us to predict climate variables at some locations. This study focuses on the prediction of climatic elements, specifically near-surface temperature and pressure, at a target location apart from a data observation point. Our approach uses two prediction methods: reservoir computing (RC), known as a machine learning framework with low computational requirements, and vector autoregression models (VAR), recognized as a statistical method for analyzing time series data. Our results show that the accuracy of the predictions degrades with the distance between the observation and target locations. We quantitatively estimate the distance within which effective predictions are possible. We also find that in the context of climate data, geographical distance is associated with data correlation, and a strong data correlation significantly improves the prediction accuracy with RC. In particular, RC outperforms VAR in predicting highly correlated data within the predictive range. These findings suggest that machine learning-based methods can be used more effectively to predict climatic elements in remote locations by assessing the distance to them from the data observation point in advance. Our study on low-cost and accurate prediction of climate variables has significant value for climate change strategies.
LG Feb 18, 2022
SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions
Ripon K. Saha, Akira Ura, Sonal Mahajan et al.
Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML), by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines means that current AutoML techniques generate sub-optimal pipelines, or none at all, especially on large, complex datasets. In this work we propose an AutoML technique, SapientML, that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset. To combat the search space explosion of AutoML, SapientML employs a novel divide-and-conquer strategy realized as a three-stage program synthesis approach that reasons on successively smaller search spaces. The first stage uses a machine-learned model to predict a set of plausible ML components to constitute a pipeline. In the second stage, this is then refined into a small pool of viable concrete pipelines using syntactic constraints derived from the corpus and the machine-learned model. Dynamically evaluating these few pipelines, in the third stage, provides the best solution. We instantiate SapientML as part of a fully automated tool-chain that creates a cleaned, labeled learning corpus by mining Kaggle, learns from it, and uses the learned models to then synthesize pipelines for new predictive tasks. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets, including 10 new, large, real-world datasets from Kaggle, and against 3 state-of-the-art AutoML tools and 2 baselines. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.
SE Dec 21, 2021
Elixir: Effective object-oriented program repair
Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida et al.
This work is motivated by the pervasive use of method invocations in object-oriented (OO) programs, and indeed their prevalence in patches of OO-program bugs. We propose a generate-and-validate repair technique, called ELIXIR, designed to be able to generate such patches. ELIXIR aggressively uses method calls, on par with local variables, fields, or constants, to construct more expressive repair-expressions that go into synthesizing patches. The ensuing enlargement of the repair space, on account of the wider use of method calls, is effectively tackled by using a machine-learnt model to rank concrete repairs. The machine-learnt model relies on four features derived from the program context, i.e., the code surrounding the potential repair location, and the bug report. We implement ELIXIR and evaluate it on two datasets, the popular Defects4J dataset and a new dataset Bugs.jar created by us, and against 2 baseline versions of our technique, and 5 other techniques representing the state of the art in program repair. Our evaluation shows that ELIXIR is able to increase the number of correctly repaired bugs in Defects4J by 85% (from 14 to 26) and by 57% in Bugs.jar (from 14 to 22), while also significantly out-performing other state-of-the-art repair techniques including ACS, HD-Repair, NOPOL, PAR, and jGenProg.
LG Aug 21, 2021
Reservoir Computing with Diverse Timescales for Prediction of Multiscale Dynamics
Gouhei Tanaka, Tadayoshi Matsumori, Hiroaki Yoshida et al.
Machine learning approaches have recently been leveraged as a substitute or an aid for physical/mathematical modeling approaches to dynamical systems. To develop an efficient machine learning method dedicated to modeling and prediction of multiscale dynamics, we propose a reservoir computing (RC) model with diverse timescales by using a recurrent network of heterogeneous leaky integrator (LI) neurons. We evaluate computational performance of the proposed model in two time series prediction tasks related to four chaotic fast-slow dynamical systems. In a one-step-ahead prediction task where input data are provided only from the fast subsystem, we show that the proposed model yields better performance than the standard RC model with identical LI neurons. Our analysis reveals that the timescale required for producing each component of target multiscale dynamics is appropriately and flexibly selected from the reservoir dynamics by model training. In a long-term prediction task, we demonstrate that a closed-loop version of the proposed model can achieve longer-term predictions compared to the counterpart with identical LI neurons depending on the hyperparameter setting.
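The difference between identical and heterogeneous leaky integrator neurons can be sketched with the commonly used leaky update x_i(t+1) = (1 - a_i) x_i(t) + a_i tanh(...), where each neuron i has its own leak rate a_i. This is a minimal illustration under assumed settings, not the authors' model: the reservoir size, weight scaling, log-spaced leak rates, and sinusoidal input are all hypothetical, and the trained linear readout is omitted.

```python
import math
import random

def make_weights(n_res, n_in, seed=0):
    """Random recurrent and input weights (hypothetical scaling)."""
    rng = random.Random(seed)
    w_res = [[rng.uniform(-1.0, 1.0) / n_res for _ in range(n_res)]
             for _ in range(n_res)]
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
            for _ in range(n_res)]
    return w_res, w_in

def li_step(state, u, w_res, w_in, leak_rates):
    """One update of a reservoir of leaky integrator neurons.

    A leak rate near 1 gives a fast neuron; a leak rate near 0 gives a
    slow one. Setting all leak rates equal recovers the standard case
    with identical neurons.
    """
    new_state = []
    for i, (x_i, a_i) in enumerate(zip(state, leak_rates)):
        pre = sum(w_res[i][j] * state[j] for j in range(len(state)))
        pre += sum(w_in[i][k] * u[k] for k in range(len(u)))
        new_state.append((1.0 - a_i) * x_i + a_i * math.tanh(pre))
    return new_state

n_res = 8
w_res, w_in = make_weights(n_res, n_in=1)
# Leak rates log-spaced from 1.0 (fast) down to 0.01 (slow).
leak_rates = [10 ** (-2.0 * i / (n_res - 1)) for i in range(n_res)]

state = [0.0] * n_res
max_step = [0.0] * n_res  # largest one-step change seen per neuron
for t in range(200):
    u = [math.sin(0.3 * t)]  # toy driving signal
    new_state = li_step(state, u, w_res, w_in, leak_rates)
    for i in range(n_res):
        max_step[i] = max(max_step[i], abs(new_state[i] - state[i]))
    state = new_state
```

Because each state stays within [-1, 1], the per-step change of neuron i is bounded by 2 a_i, so the slowest neurons can only track slow components of the signal. A readout trained on the collected states can then draw on whichever timescales the target needs.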
OC Nov 16, 2020
Optimal Transport-based Coverage Control for Swarm Robot Systems: Generalization of the Voronoi Tessellation-based Method
Daisuke Inoue, Yuji Ito, Hiroaki Yoshida
Swarm robot systems, which consist of many cooperating mobile robots, have attracted attention for their environmental adaptability and fault tolerance advantages. One of the most important tasks for such systems is coverage control, in which robots autonomously deploy to approximate a given spatial distribution. In this study, we formulate a coverage control paradigm using the concept of optimal transport and propose a novel control technique, which we have termed the optimal transport-based coverage control (OTCC) method. The proposed OTCC, derived via the gradient flow of the cost function in the Kantorovich dual problem, is shown to cover a widely used existing control method as a special case. We also perform a Lyapunov stability analysis of the controlled system, and provide numerical calculations to show that the OTCC reproduces target distributions with better performance than the existing control method.
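For context, the Voronoi tessellation-based baseline named in the title can be sketched as a Lloyd-type iteration: each robot moves toward the centroid of the target mass that lies closest to it. The sketch below illustrates that baseline only, not the OTCC controller. It approximates the target distribution with random sample points, and the gain, robot count, and Gaussian target are assumed values.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(robots, p):
    """Index of the robot whose Voronoi cell contains point p."""
    return min(range(len(robots)), key=lambda i: sq_dist(robots[i], p))

def coverage_cost(robots, points):
    """Mean squared distance from each sample to its nearest robot."""
    return sum(sq_dist(robots[nearest(robots, p)], p) for p in points) / len(points)

def lloyd_step(robots, points, gain=0.5):
    """Move every robot part of the way toward its cell centroid."""
    sums = [[0.0, 0.0, 0] for _ in robots]
    for p in points:
        i = nearest(robots, p)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        sums[i][2] += 1
    moved = []
    for (x, y), (sx, sy, n) in zip(robots, sums):
        if n == 0:
            moved.append((x, y))  # empty cell: robot stays in place
        else:
            cx, cy = sx / n, sy / n
            moved.append((x + gain * (cx - x), y + gain * (cy - y)))
    return moved

rng = random.Random(1)
# Samples standing in for the target spatial distribution.
points = [(rng.gauss(0.5, 0.15), rng.gauss(0.5, 0.15)) for _ in range(300)]
robots = [(rng.random(), rng.random()) for _ in range(5)]

costs = [coverage_cost(robots, points)]
for _ in range(20):
    robots = lloyd_step(robots, points)
    costs.append(coverage_cost(robots, points))
```

The cost cannot increase from one step to the next: for a fixed assignment it is quadratic in each robot position with its minimum at the centroid, and reassigning samples to their nearest robot can only lower it further.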
OC Apr 16, 2020
Model Predictive Mean Field Games for Controlling Multi-Agent Systems
Daisuke Inoue, Yuji Ito, Takahito Kashiwabara et al.
When controlling multi-agent systems, the trade-off between performance and scalability is a major challenge. Here, we address this difficulty by using mean field games (MFGs), a framework that deduces the macroscopic dynamics describing the density profile of agents from their microscopic dynamics. To effectively use the MFG, we propose a model predictive MFG (MP-MFG), which estimates the agent population density profile using kernel density estimation and manages the input generation with model predictive control. The proposed MP-MFG generates control inputs by monitoring the agent population at each time step, and thus achieves higher robustness than the conventional MFG. Numerical results show that the MP-MFG outperforms the MFG when the agent model has modeling errors or the number of agents in the system is small.
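The density estimation step can be illustrated with a one-dimensional Gaussian kernel density estimate built from observed agent positions. This is a minimal sketch under assumptions: the abstract does not state the kernel, bandwidth, or dimension used, so the Gaussian kernel, the bandwidth of 0.1, the grid, and the agent positions below are all hypothetical. The model predictive control step is not shown.

```python
import math

def gaussian_kde_1d(samples, grid, bandwidth):
    """Estimate a 1-D density on `grid` from agent positions `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density

# Hypothetical agent positions observed at one time step.
agent_positions = [0.1, 0.15, 0.2, 0.8, 0.85]

step = 0.01
grid = [i * step for i in range(-300, 401)]  # covers [-3, 4]
density = gaussian_kde_1d(agent_positions, grid, bandwidth=0.1)

# A Riemann sum over the grid should be close to 1 for a valid density.
mass = sum(density) * step
```

With only five agents the estimate is still a smooth profile that integrates to one. In a receding-horizon scheme this estimate would be recomputed from fresh observations at every time step.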