Kei Kobayashi

5papers

73citations

Novelty45%

AI Score27

Ranked #162,406 of 201,326 authors (top 81%)#2,568 in ML (top 73%)

5 Papers

LGSep 1, 2024

Benign Overfitting under Learning Rate Conditions for $α$ Sub-exponential Input

Kota Okudo, Kei Kobayashi

This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions, extending the analysis of maximum margin classifiers to $α$ sub-exponential distributions ($α\in (0, 2]$). This generalizes previous work focused on sub-gaussian inputs. We provide generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. Our results show that, under certain conditions on the dimensionality $p$ and the distance between the centers of the distributions, the misclassification error of the maximum margin classifier asymptotically approaches the noise level, the theoretical optimal value. Moreover, we derive an upper bound on the learning rate $β$ for benign overfitting to occur and show that as the tail heaviness of the input distribution $α$ increases, the upper bound on the learning rate decreases. These results demonstrate that benign overfitting persists even in settings with heavier-tailed inputs than previously studied, contributing to a deeper understanding of the phenomenon in more realistic data environments.

AIFeb 16, 2021

Representing Hierarchical Structure by Using Cone Embedding

Daisuke Takehara, Kei Kobayashi

Graph embedding is becoming an important method with applications in various areas, including social networks and knowledge graph completion. In particular, Poincaré embedding has been proposed to capture the hierarchical structure of graphs, and its effectiveness has been reported. However, most of the existing methods have isometric mappings in the embedding space, and the choice of the origin point can be arbitrary. This fact is not desirable when the distance from the origin is used as an indicator of hierarchy, as in the case of Poincaré embedding. In this paper, we propose cone embedding, embedding method in a metric cone, which solve these problems, and we gain further benefits: 1) we provide an indicator of hierarchical information that is both geometrically and intuitively natural to interpret, and 2) we can extract the hierarchical structure from a graph embedding output of other methods by learning additional one-dimensional parameters.

MLMar 1, 2020

Why is the Mahalanobis Distance Effective for Anomaly Detection?

Ryo Kamoi, Kei Kobayashi

The Mahalanobis distance-based confidence score, a recently proposed anomaly detection method for pre-trained neural classifiers, achieves state-of-the-art performance on both out-of-distribution (OoD) and adversarial examples detection. This work analyzes why this method exhibits such strong performance in practical settings while imposing an implausible assumption; namely, that class conditional distributions of pre-trained features have tied covariance. Although the Mahalanobis distance-based method is claimed to be motivated by classification prediction confidence, we find that its superior performance stems from information not useful for classification. This suggests that the reason the Mahalanobis confidence score works so well is mistaken, and makes use of different information from ODIN, another popular OoD detection method based on prediction confidence. This perspective motivates us to combine these two methods, and the combined detector exhibits improved performance and robustness. These findings provide insight into the behavior of neural classifiers in response to anomalous inputs.

MLNov 15, 2019

Likelihood Assignment for Out-of-Distribution Inputs in Deep Generative Models is Sensitive to Prior Distribution Choice

Ryo Kamoi, Kei Kobayashi

Recent work has shown that deep generative models assign higher likelihood to out-of-distribution inputs than to training data. We show that a factor underlying this phenomenon is a mismatch between the nature of the prior distribution and that of the data distribution, a problem found in widely used deep generative models such as VAEs and Glow. While a typical choice for a prior distribution is a standard Gaussian distribution, properties of distributions of real data sets may not be consistent with a unimodal prior distribution. This paper focuses on the relationship between the choice of a prior distribution and the likelihoods assigned to out-of-distribution inputs. We propose the use of a mixture distribution as a prior to make likelihoods assigned by deep generative models sensitive to out-of-distribution inputs. Furthermore, we explain the theoretical advantages of adopting a mixture distribution as the prior, and we present experimental results to support our claims. Finally, we demonstrate that a mixture prior lowers the out-of-distribution likelihood with respect to two pairs of real image data sets: Fashion-MNIST vs. MNIST and CIFAR10 vs. SVHN.

MLFeb 12, 2018

Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification

Tatsuhiro Aoshima, Kei Kobayashi, Mihoko Minami

Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.