Takuro Kutsuna

h-index7

13papers

21citations

Novelty55%

AI Score44

Ranked #71,886 of 201,018 authors (top 36%)#16,313 in LG (top 38%)

13 Papers

LGFeb 21, 2024Code

Accuracy-Preserving Calibration via Statistical Modeling on Probability Simplex

Yasushi Esaki, Akihiro Nakamura, Keisuke Kawano et al.

Classification models based on deep neural networks (DNNs) must be calibrated to measure the reliability of predictions. Some recent calibration methods have employed a probabilistic model on the probability simplex. However, these calibration methods cannot preserve the accuracy of pre-trained models, even those with a high classification accuracy. We propose an accuracy-preserving calibration method using the Concrete distribution as the probabilistic model on the probability simplex. We theoretically prove that a DNN model trained on cross-entropy loss has optimality as the parameter of the Concrete distribution. We also propose an efficient method that synthetically generates samples for training probabilistic models on the probability simplex. We demonstrate that the proposed method can outperform previous methods in accuracy-preserving calibration tasks using benchmarks. The code is available at https://github.com/ToyotaCRDL/SimplexTS.

LGApr 7, 2023

Supervised Contrastive Learning with Heterogeneous Similarity for Distribution Shifts

Takuro Kutsuna

Distribution shifts are problems where the distribution of data changes between training and testing, which can significantly degrade the performance of a model deployed in the real world. Recent studies suggest that one reason for the degradation is a type of overfitting, and that proper regularization can mitigate the degradation, especially when using highly representative models such as neural networks. In this paper, we propose a new regularization using the supervised contrastive learning to prevent such overfitting and to train models that do not degrade their performance under the distribution shifts. We extend the cosine similarity in contrastive loss to a more general similarity measure and propose to use different parameters in the measure when comparing a sample to a positive or negative example, which is analytically shown to act as a kind of margin in contrastive loss. Experiments on benchmark datasets that emulate distribution shifts, including subpopulation shift and domain generalization, demonstrate the advantage of the proposed method over existing regularization methods.

AO-PHMay 12

Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies

Takuro Kutsuna, Noriko N. Ishizaki, Norihiro Oyama et al.

Physics-based climate projections using general circulation models are essential for assessing future risks, but their coarse resolution limits regional decision-making. Statistical downscaling can efficiently add detail, yet many methods treat variables independently, degrading inter-variable relationships that govern compound hazards such as heat stress, drought, and wildfire. Here we show that a diffusion-based multivariate generative framework, combined with bias correction, recovers degraded inter-variable correlations even under a 50$\times$ increase in linear resolution. When applied to five meteorological variables over Japan, the framework reduces inter-variable correlation errors by more than fourfold relative to existing baselines while improving both univariate and spatial accuracy, leading to more accurate detection of severe drought. These results demonstrate that multivariate generative downscaling improves the reliability of compound risk assessment under large resolution gaps.

MLDec 25, 2025

Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models

Takuro Kutsuna

Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typically a standard normal distribution, into the target data distribution. As a result, the model must simultaneously represent the global structure of the distribution and its fine-scale local variations, which becomes difficult when these scales are strongly mismatched. This issue arises both in natural images, where coarse manifold-level structure and fine textures coexist, and in low-dimensional distributions with highly concentrated local structure. To address this issue, we propose Residual Prior Diffusion (RPD), a two-stage framework in which a coarse prior model first captures the large-scale structure of the data distribution, and a diffusion model is then trained to represent the residual between the prior and the target data distribution. We formulate RPD as an explicit probabilistic model with a tractable evidence lower bound, whose optimization reduces to the familiar objectives of noise prediction or velocity prediction. We further introduce auxiliary variables that leverage information from the prior model and theoretically analyze how they reduce the difficulty of the prediction problem in RPD. Experiments on synthetic datasets with fine-grained local structure show that standard diffusion models fail to capture local details, whereas RPD accurately captures fine-scale detail while preserving the large-scale structure of the distribution. On natural image generation tasks, RPD achieved generation quality that matched or exceeded that of representative diffusion-based baselines and it maintained strong performance even with a small number of inference steps.

LGMar 25, 2024Code

One-Shot Domain Incremental Learning

Yasushi Esaki, Satoshi Koide, Takuro Kutsuna

Domain incremental learning (DIL) has been discussed in previous studies on deep neural network models for classification. In DIL, we assume that samples on new domains are observed over time. The models must classify inputs on all domains. In practice, however, we may encounter a situation where we need to perform DIL under the constraint that the samples on the new domain are observed only infrequently. Therefore, in this study, we consider the extreme case where we have only one sample from the new domain, which we call one-shot DIL. We first empirically show that existing DIL methods do not work well in one-shot DIL. We have analyzed the reason for this failure through various investigations. According to our analysis, we clarify that the difficulty of one-shot DIL is caused by the statistics in the batch normalization layers. Therefore, we propose a technique regarding these statistics and demonstrate the effectiveness of our technique through experiments on open datasets. The code is available at https://github.com/ToyotaCRDL/OneShotDIL.

MLMar 9, 2023

StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space

Keisuke Kawano, Takuro Kutsuna, Ryoko Tokuhisa et al.

One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.

LGMay 23, 2025

CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots

Keisuke Kawano, Takuro Kutsuna, Naoki Hayashi et al.

In many real-world settings--e.g., single-cell RNA sequencing, mobility sensing, and environmental monitoring--data are observed only as temporally aggregated snapshots collected over finite time windows, often with noisy or uncertain timestamps, and without access to continuous trajectories. We study the problem of estimating continuous-time dynamics from such snapshots. We present Continuous-Time Optimal Transport Flow (CT-OT Flow), a two-stage framework that (i) infers high-resolution time labels by aligning neighboring intervals via partial optimal transport (POT) and (ii) reconstructs a continuous-time data distribution through temporal kernel smoothing, from which we sample pairs of nearby times to train standard ODE/SDE models. Our formulation explicitly accounts for snapshot aggregation and time-label uncertainty and uses practical accelerations (screening and mini-batch POT), making it applicable to large datasets. Across synthetic benchmarks and two real datasets (scRNA-seq and typhoon tracks), CT-OT Flow reduces distributional and trajectory errors compared with OT-CFM, [SF]$^{2}$M, TrajectoryNet, MFM, and ENOT.

NEMar 8, 2024

Linearly Constrained Weights: Reducing Activation Shift for Faster Training of Neural Networks

Takuro Kutsuna

In this paper, we first identify activation shift, a simple but remarkable phenomenon in a neural network in which the preactivation value of a neuron has non-zero mean that depends on the angle between the weight vector of the neuron and the mean of the activation vector in the previous layer. We then propose linearly constrained weights (LCW) to reduce the activation shift in both fully connected and convolutional layers. The impact of reducing the activation shift in a neural network is studied from the perspective of how the variance of variables in the network changes through layer operations in both forward and backward chains. We also discuss its relationship to the vanishing gradient problem. Experimental results show that LCW enables a deep feedforward network with sigmoid activation functions to be trained efficiently by resolving the vanishing gradient problem. Moreover, combined with batch normalization, LCW improves generalization performance of both feedforward and convolutional networks.

MLMay 20, 2025

An Asymptotic Equation Linking WAIC and WBIC in Singular Models

Naoki Hayashi, Takuro Kutsuna, Sawa Takamuku

In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for which conventional criteria such as the Akaike Information Criterion and the Bayesian Information Criterion are inapplicable due to the breakdown of normal approximations for the likelihood and posterior. To address this, the Widely Applicable Information Criterion (WAIC) and the Widely Applicable Bayesian Information Criterion (WBIC) have been proposed. Since WAIC and WBIC are computed using posterior distributions at different temperature settings, separate posterior sampling is generally required. In this paper, we theoretically derive an asymptotic equation that links WAIC and WBIC, despite their dependence on different posteriors. This equation yields an asymptotically unbiased expression of WAIC in terms of the posterior distribution used for WBIC. The result clarifies the structural relationship between these criteria within the framework of singular learning theory, and deepens understanding of their asymptotic behavior. This theoretical contribution provides a foundation for future developments in the computational efficiency of model selection in singular models.

LGJan 23, 2025

Exploring Variance Reduction in Importance Sampling for Efficient DNN Training

Takuro Kutsuna

Importance sampling is widely used to improve the efficiency of deep neural network (DNN) training by reducing the variance of gradient estimators. However, efficiently assessing the variance reduction relative to uniform sampling remains challenging due to computational overhead. This paper proposes a method for estimating variance reduction during DNN training using only minibatches sampled under importance sampling. By leveraging the proposed method, the paper also proposes an effective minibatch size to enable automatic learning rate adjustment. An absolute metric to quantify the efficiency of importance sampling is also introduced as well as an algorithm for real-time estimation of importance scores based on moving gradient statistics. Theoretical analysis and experiments on benchmark datasets demonstrated that the proposed algorithm consistently reduces variance, improves training efficiency, and enhances model accuracy compared with current importance-sampling approaches while maintaining minimal computational overhead.

MLDec 4, 2024

Generalized Diffusion Model with Adjusted Offset Noise

Takuro Kutsuna

Diffusion models have become fundamental tools for modeling data distributions in machine learning and have applications in image generation, drug discovery, and audio synthesis. Despite their success, these models face challenges when generating data with extreme brightness values, as evidenced by limitations in widely used frameworks like Stable Diffusion. Offset noise has been proposed as an empirical solution to this issue, yet its theoretical basis remains insufficiently explored. In this paper, we propose a generalized diffusion model that naturally incorporates additional noise within a rigorous probabilistic framework. Our approach modifies both the forward and reverse diffusion processes, enabling inputs to be diffused into Gaussian distributions with arbitrary mean structures. We derive a loss function based on the evidence lower bound, establishing its theoretical equivalence to offset noise with certain adjustments, while broadening its applicability. Experiments on synthetic datasets demonstrate that our model effectively addresses brightness-related challenges and outperforms conventional methods in high-dimensional scenarios.

LGFeb 2, 2024

Minimal Sufficient Views: A DNN model making predictions with more evidence has higher accuracy

Keisuke Kawano, Takuro Kutsuna, Keisuke Sano

Deep neural networks (DNNs) exhibit high performance in image recognition; however, the reasons for their strong generalization abilities remain unclear. A plausible hypothesis is that DNNs achieve robust and accurate predictions by identifying multiple pieces of evidence from images. Thus, to test this hypothesis, this study proposed minimal sufficient views (MSVs). MSVs is defined as a set of minimal regions within an input image that are sufficient to preserve the prediction of DNNs, thus representing the evidence discovered by the DNN. We empirically demonstrated a strong correlation between the number of MSVs (i.e., the number of pieces of evidence) and the generalization performance of the DNN models. Remarkably, this correlation was found to hold within a single DNN as well as between different DNNs, including convolutional and transformer models. This suggested that a DNN model that makes its prediction based on more evidence has a higher generalization performance. We proposed a metric based on MSVs for DNN model selection that did not require label information. Consequently, we empirically showed that the proposed metric was less dependent on the degree of overfitting, rendering it a more reliable indicator of model performance than existing metrics, such as average confidence.

LGJun 29, 2020

Neural Time Warping For Multiple Sequence Alignment

Keisuke Kawano, Takuro Kutsuna, Satoshi Koide

Multiple sequences alignment (MSA) is a traditional and challenging task for time-series analyses. The MSA problem is formulated as a discrete optimization problem and is typically solved by dynamic programming. However, the computational complexity increases exponentially with respect to the number of input sequences. In this paper, we propose neural time warping (NTW) that relaxes the original MSA to a continuous optimization and obtains the alignments using a neural network. The solution obtained by NTW is guaranteed to be a feasible solution for the original discrete optimization problem under mild conditions. Our experimental results show that NTW successfully aligns a hundred time-series and significantly outperforms existing methods for solving the MSA problem. In addition, we show a method for obtaining average time-series data as one of applications of NTW. Compared to the existing barycenters, the mean time series data retains the features of the input time-series data.