Masanari Kimura

LG
h-index5
15papers
134citations
Novelty33%
AI Score38

15 Papers

10.7LGApr 19, 2023Code
Information Geometrically Generalized Covariate Shift Adaptation

Masanari Kimura, Hideitsu Hino

Many machine learning methods assume that the training and test data follow the same distribution. However, in the real world, this assumption is very often violated. In particular, the phenomenon that the marginal distribution of the data changes is called covariate shift, one of the most important research topics in machine learning. We show that the well-known family of covariate shift adaptation methods is unified in the framework of information geometry. Furthermore, we show that parameter search for geometrically generalized covariate shift adaptation method can be achieved efficiently. Numerical experiments show that our generalization can achieve better performance than the existing methods it encompasses.

23.8LGFeb 10, 2024
Understanding Test-Time Augmentation

Masanari Kimura

Test-Time Augmentation (TTA) is a very powerful heuristic that takes advantage of data augmentation during testing to produce averaged output. Despite the experimental effectiveness of TTA, there is insufficient discussion of its theoretical aspects. In this paper, we aim to give theoretical guarantees for TTA and clarify its behavior.

17.0LGMar 15, 2024
A Short Survey on Importance Weighting for Machine Learning

Masanari Kimura, Hideitsu Hino

Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of the instance in some sense. The simplicity and usefulness of the idea has led to many applications of importance weighting. For example, it is known that supervised learning under an assumption about the difference between the training and test distributions, called distribution shift, can guarantee statistically desirable properties through importance weighting by their density ratio. This survey summarizes the broad applications of importance weighting in machine learning and related research.

1.7MLFeb 9
Information Geometry of Absorbing Markov-Chain and Discriminative Random Walks

Masanari Kimura

Discriminative Random Walks (DRWs) are a simple yet powerful tool for semi-supervised node classification, but their theoretical foundations remain fragmentary. We revisit DRWs through the lens of information geometry, treating the family of class-specific hitting-time laws on an absorbing Markov chain as a statistical manifold. Starting from a log-linear edge-weight model, we derive closed-form expressions for the hitting-time probability mass function, its full moment hierarchy, and the observed Fisher information. The Fisher matrix of each seed node turns out to be rank-one, taking the quotient by its null space yields a low-dimensional, globally flat manifold that captures all identifiable directions of the model. Leveraging the geometry, we introduce a sensitivity score for unlabeled nodes that bounds, and in one-dimensional cases attains, the maximal first-order change in DRW betweenness under unit Fisher perturbations. The score can lead to principled strategies for active label acquisition, edge re-weighting, and explanation.

4.5MLMay 22, 2025
Higher-Order Asymptotics of Test-Time Adaptation for Batch Normalization Statistics

Masanari Kimura

This study develops a higher-order asymptotic framework for test-time adaptation (TTA) of Batch Normalization (BN) statistics under distribution shift by integrating classical Edgeworth expansion and saddlepoint approximation techniques with a novel one-step M-estimation perspective. By analyzing the statistical discrepancy between training and test distributions, we derive an Edgeworth expansion for the normalized difference in BN means and obtain an optimal weighting parameter that minimizes the mean-squared error of the adapted statistic. Reinterpreting BN TTA as a one-step M-estimator allows us to derive higher-order local asymptotic normality results, which incorporate skewness and other higher moments into the estimator's behavior. Moreover, we quantify the trade-offs among bias, variance, and skewness in the adaptation process and establish a corresponding generalization bound on the model risk. The refined saddlepoint approximations further deliver uniformly accurate density and tail probability estimates for the BN TTA statistic. These theoretical insights provide a comprehensive understanding of how higher-order corrections and robust one-step updating can enhance the reliability and performance of BN layers in adapting to changing data distributions.

5.9ITMar 31, 2021Code
$α$-Geodesical Skew Divergence

Masanari Kimura, Hideitsu Hino

The asymmetric skew divergence smooths one of the distributions by mixing it, to a degree determined by the parameter $λ$, with the other distribution. Such divergence is an approximation of the KL divergence that does not require the target distribution to be absolutely continuous with respect to the source distribution. In this paper, an information geometric generalization of the skew divergence called the $α$-geodesical skew divergence is proposed, and its properties are studied.

1.2LGJul 8, 2020Code
Density Fixing: Simple yet Effective Regularization Method based on the Class Prior

Masanari Kimura, Ryohei Izawa

Machine learning models suffer from overfitting, which is caused by a lack of labeled data. To tackle this problem, we proposed a framework of regularization methods, called density-fixing, that can be used commonly for supervised and semi-supervised learning. Our proposed regularization method improves the generalization performance by forcing the model to approximate the class's prior distribution or the frequency of occurrence. This regularization term is naturally derived from the formula of maximum likelihood estimation and is theoretically justified. We further provide the several theoretical analyses of the proposed method including asymptotic behavior. Our experimental results on multiple benchmark datasets are sufficient to support our argument, and we suggest that this simple and effective regularization method is useful in real-world machine learning problems.

9.0MLJun 11, 2020
Why Mixup Improves the Model Performance

Masanari Kimura

Machine learning techniques are used in a wide range of domains. However, machine learning models often suffer from the problem of over-fitting. Many data augmentation methods have been proposed to tackle such a problem, and one of them is called mixup. Mixup is a recently proposed regularization procedure, which linearly interpolates a random pair of training examples. This regularization method works very well experimentally, but its theoretical guarantee is not adequately discussed. In this study, we aim to discover why mixup works well from the aspect of the statistical learning theory.

0.9CVOct 16, 2019
Large-Scale Landslides Detection from Satellite Images with Incomplete Labels

Masanari Kimura

Earthquakes and tropical cyclones cause the suffering of millions of people around the world every year. The resulting landslides exacerbate the effects of these disasters. Landslide detection is, therefore, a critical task for the protection of human life and livelihood in mountainous areas. To tackle this problem, we propose a combination of satellite technology and Deep Neural Networks (DNNs). We evaluate the performance of multiple DNN-based methods for landslide detection on actual satellite images of landslide damage. Our analysis demonstrates the potential for a meaningful social impact in terms of disasters and rescue.

2.7LGSep 12, 2019
New Perspective of Interpretability of Deep Neural Networks

Masanari Kimura, Masayuki Tanaka

Deep neural networks (DNNs) are known as black-box models. In other words, it is difficult to interpret the internal state of the model. Improving the interpretability of DNNs is one of the hot research topics. However, at present, the definition of interpretability for DNNs is vague, and the question of what is a highly explanatory model is still controversial. To address this issue, we provide the definition of the human predictability of the model, as a part of the interpretability of the DNNs. The human predictability proposed in this paper is defined by easiness to predict the change of the inference when perturbating the model of the DNNs. In addition, we introduce one example of high human-predictable DNNs. We discuss that our definition will help to the research of the interpretability of the DNNs considering various types of applications.

1.8CVMay 27, 2019
PNUNet: Anomaly Detection using Positive-and-Negative Noise based on Self-Training Procedure

Masanari Kimura

We propose the novel framework for anomaly detection in images. Our new framework, PNUNet, is based on many normal data and few anomalous data. We assume that some noises are added to the input images and learn to remove the noise. In addition, the proposed method achieves significant performance improvement by updating the noise assumed in the inputs using a self-training framework. The experimental results for the benchmark datasets show the usefulness of our new anomaly detection framework.

0.9CVMay 7, 2019
Intentional Attention Mask Transformation for Robust CNN Classification

Masanari Kimura, Masayuki Tanaka

Convolutional Neural Networks have achieved impressive results in various tasks, but interpreting the internal mechanism is a challenging problem. To tackle this problem, we exploit a multi-channel attention mechanism in feature space. Our network architecture allows us to obtain an attention mask for each feature while existing CNN visualization methods provide only a common attention mask for all features. We apply the proposed multi-channel attention mechanism to multi-attribute recognition task. We can obtain different attention mask for each feature and for each attribute. Those analyses give us deeper insight into the feature space of CNNs. Furthermore, our proposed attention mechanism naturally derives a method for improving the robustness of CNNs. From the observation of feature space based on the proposed attention mask, we demonstrate that we can obtain robust CNNs by intentionally emphasizing features that are important for attributes. The experimental results for the benchmark dataset show that the proposed method gives high human interpretability while accurately grasping the attributes of the data, and improves network robustness.

2.6CVApr 30, 2019
Interpretation of Feature Space using Multi-Channel Attentional Sub-Networks

Masanari Kimura, Masayuki Tanaka

Convolutional Neural Networks have achieved impressive results in various tasks, but interpreting the internal mechanism is a challenging problem. To tackle this problem, we exploit a multi-channel attention mechanism in feature space. Our network architecture allows us to obtain an attention mask for each feature while existing CNN visualization methods provide only a common attention mask for all features. We apply the proposed multi-channel attention mechanism to multi-attribute recognition task. We can obtain different attention mask for each feature and for each attribute. Those analyses give us deeper insight into the feature space of CNNs. The experimental results for the benchmark dataset show that the proposed method gives high interpretability to humans while accurately grasping the attributes of the data.

9.1CVJul 3, 2018
Anomaly Detection Using GANs for Visual Inspection in Noisy Training Data

Masanari Kimura, Takashi Yanagihara

The detection and the quantification of anomalies in image data are critical tasks in industrial scenes such as detecting micro scratches on product. In recent years, due to the difficulty of defining anomalies and the limit of correcting their labels, research on unsupervised anomaly detection using generative models has attracted attention. Generally, in those studies, only normal images are used for training to model the distribution of normal images. The model measures the anomalies in the target images by reproducing the most similar images and scoring image patches indicating their fit to the learned distribution. This approach is based on a strong presumption; the trained model should not be able to generate abnormal images. However, in reality, the model can generate abnormal images mainly due to noisy normal data which include small abnormal pixels, and such noise severely affects the accuracy of the model. Therefore, we propose a novel anomaly detection method to distort the distribution of the model with existing abnormal images. The proposed method detects pixel-level micro anomalies with a high accuracy from 1024x1024 high resolution images which are actually used in an industrial scene. In this paper, we share experimental results on open datasets, due to the confidentiality of the data.

0.8LGFeb 18, 2018Code
Node Centralities and Classification Performance for Characterizing Node Embedding Algorithms

Kento Nozawa, Masanari Kimura, Atsunori Kanemura

Embedding graph nodes into a vector space can allow the use of machine learning to e.g. predict node classes, but the study of node embedding algorithms is immature compared to the natural language processing field because of a diverse nature of graphs. We examine the performance of node embedding algorithms with respect to graph centrality measures that characterize diverse graphs, through systematic experiments with four node embedding algorithms, four or five graph centralities, and six datasets. Experimental results give insights into the properties of node embedding algorithms, which can be a basis for further research on this topic.