LGSep 29, 2021
Breaking the curse of dimensionality with Isolation KernelKai Ming Ting, Takashi Washio, Ye Zhu et al.
The curse of dimensionality has been studied in different aspects. However, breaking the curse has been elusive. We show for the first time that it is possible to break the curse using the recently introduced Isolation Kernel. We show that only Isolation Kernel performs consistently well in indexed search, spectral & density peaks clustering, SVM classification and t-SNE visualization in both low and high dimensions, compared with distance, Gaussian and linear kernels. This is also supported by our theoretical analyses that Isolation Kernel is the only kernel that has the provable ability to break the curse, compared with existing metric-based Lipschitz continuous kernels.
LGSep 24, 2020
Isolation Distributional Kernel: A New Tool for Point & Group Anomaly DetectionKai Ming Ting, Bi-Cun Xu, Takashi Washio et al.
We introduce Isolation Distributional Kernel as a new way to measure the similarity between two distributions. Existing approaches based on kernel mean embedding, which convert a point kernel to a distributional kernel, have two key issues: the point kernel employed has a feature map with intractable dimensionality; and it is {\em data independent}. This paper shows that Isolation Distributional Kernel (IDK), which is based on a {\em data dependent} point kernel, addresses both key issues. We demonstrate IDK's efficacy and efficiency as a new tool for kernel based anomaly detection for both point and group anomalies. Without explicit learning, using IDK alone outperforms existing kernel based point anomaly detector OCSVM and other kernel mean embedding methods that rely on Gaussian kernel. For group anomaly detection,we introduce an IDK based detector called IDK$^2$. It reformulates the problem of group anomaly detection in input space into the problem of point anomaly detection in Hilbert space, without the need for learning. IDK$^2$ runs orders of magnitude faster than group anomaly detector OCSMM.We reveal for the first time that an effective kernel based anomaly detector based on kernel mean embedding must employ a characteristic kernel which is data dependent.
LGJul 2, 2019
Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel LearningKai Ming Ting, Jonathan R. Wells, Takashi Washio
Large scale online kernel learning aims to build an efficient and scalable kernel-based predictive model incrementally from a sequence of potentially infinite data points. A current key approach focuses on ways to produce an approximate finite-dimensional feature map, assuming that the kernel used has a feature map with intractable dimensionality---an assumption traditionally held in kernel-based methods. While this approach can deal with large scale datasets efficiently, this outcome is achieved by compromising predictive accuracy because of the approximation. We offer an alternative approach which overrides the assumption and puts the kernel used at the heart of the approach. It focuses on creating an exact, sparse and finite-dimensional feature map of a kernel called Isolation Kernel. Using this new approach, to achieve the above aim of large scale online kernel learning becomes extremely simple---simply use Isolation Kernel instead of a kernel having a feature map with intractable dimensionality. We show that, using Isolation Kernel, large scale online kernel learning can be achieved efficiently without sacrificing accuracy.
IRFeb 9, 2019
A new simple and effective measure for bag-of-word inter-document similarity measurementSunil Aryal, Kai Ming Ting, Takashi Washio et al.
To measure the similarity of two documents in the bag-of-words (BoW) vector representation, different term weighting schemes are used to improve the performance of cosine similarity---the most widely used inter-document similarity measure in text mining. In this paper, we identify the shortcomings of the underlying assumptions of term weighting in the inter-document similarity measurement task; and provide a more fit-to-the-purpose alternative. Based on this new assumption, we introduce a new simple but effective similarity measure which does not require explicit term weighting. The proposed measure employs a more nuanced probabilistic approach than those used in term weighting to measure the similarity of two documents w.r.t each term occurring in the two documents. Our empirical comparison with the existing similarity measures using different term weighting schemes shows that the new measure produces (i) better results in the binary BoW representation; and (ii) competitive and more consistent results in the term-frequency-based BoW representation.
LGDec 8, 2018
Learning Graph Representation via Formal Concept AnalysisYuka Yoneda, Mahito Sugiyama, Takashi Washio
We present a novel method that can learn a graph representation from multivariate data. In our representation, each node represents a cluster of data points and each edge represents the subset-superset relationship between clusters, which can be mutually overlapped. The key to our method is to use formal concept analysis (FCA), which can extract hierarchical relationships between clusters based on the algebraic closedness property. We empirically show that our method can effectively extract hierarchical structures of clusters compared to the baseline method.
AIFeb 19, 2018
Analysis of cause-effect inference by comparing regression errorsPatrick Blöbaum, Dominik Janzing, Takashi Washio et al.
We address the problem of inferring the causal direction between two variables by comparing the least-squares errors of the predictions in both possible directions. Under the assumption of an independence between the function relating cause and effect, the conditional noise distribution, and the distribution of the cause, we show that the errors are smaller in causal direction if both variables are equally scaled and the causal relation is close to deterministic. Based on this, we provide an easily applicable algorithm that only requires a regression in both possible causal directions and a comparison of the errors. The performance of the algorithm is compared with various related causal inference methods in different artificial and real-world data sets.
AIOct 11, 2016
Error Asymmetry in Causal and Anticausal RegressionPatrick Blöbaum, Takashi Washio, Shohei Shimizu
It is generally difficult to make any statements about the expected prediction error in an univariate setting without further knowledge about how the data were generated. Recent work showed that knowledge about the real underlying causal structure of a data generation process has implications for various machine learning settings. Assuming an additive noise and an independence between data generating mechanism and its input, we draw a novel connection between the intrinsic causal relationship of two variables and the expected prediction error. We formulate the theorem that the expected error of the true data generating function as prediction model is generally smaller when the effect is predicted from its cause and, on the contrary, greater when the cause is predicted from its effect. The theorem implies an asymmetry in the error depending on the prediction direction. This is further corroborated with empirical evaluations in artificial and real-world data sets.
MLAug 2, 2014
A Bayesian estimation approach to analyze non-Gaussian data-generating processes with latent classesNaoki Tanaka, Shohei Shimizu, Takashi Washio
A large amount of observational data has been accumulated in various fields in recent times, and there is a growing need to estimate the generating processes of these data. A linear non-Gaussian acyclic model (LiNGAM) based on the non-Gaussianity of external influences has been proposed to estimate the data-generating processes of variables. However, the results of the estimation can be biased if there are latent classes. In this paper, we first review LiNGAM, its extended model, as well as the estimation procedure for LiNGAM in a Bayesian framework. We then propose a new Bayesian estimation procedure that solves the problem.
MLJan 22, 2014
Causal Discovery in a Binary Exclusive-or Skew Acyclic Model: BExSAMTakanori Inazumi, Takashi Washio, Shohei Shimizu et al.
Discovering causal relations among observed variables in a given data set is a major objective in studies of statistics and artificial intelligence. Recently, some techniques to discover a unique causal model have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose an efficient new approach to deriving the unique causal model governing a given binary data set under skew distributions of external binary noises. Experimental evaluation shows excellent performance for both artificial and real world data sets.
MLJan 22, 2014
Identifiability of an Integer Modular Acyclic Additive Noise Model and its Causal Structure DiscoveryJoe Suzuki, Takanori Inazumi, Takashi Washio et al.
The notion of causality is used in many situations dealing with uncertainty. We consider the problem whether causality can be identified given data set generated by discrete random variables rather than continuous ones. In particular, for non-binary data, thus far it was only known that causality can be identified except rare cases. In this paper, we present necessary and sufficient condition for an integer modular acyclic additive noise (IMAN) of two variables. In addition, we relate bivariate and multivariate causal identifiability in a more explicit manner, and develop a practical algorithm to find the order of variables and their parent sets. We demonstrate its performance in applications to artificial data and real world body motion data with comparisons to conventional methods.
QUANT-PHJan 20, 2014
Anomaly detection in reconstructed quantum states using a machine-learning techniqueSatoshi Hara, Takafumi Ono, Ryo Okamoto et al.
The accurate detection of small deviations in given density matrices is important for quantum information processing. Here we propose a new method based on the concept of data mining. We demonstrate that the proposed method can more accurately detect small erroneous deviations in reconstructed density matrices, which contain intrinsic fluctuations due to the limited number of samples, than a naive method of checking the trace distance from the average of the given density matrices. This method has the potential to be a key tool in broad areas of physics where the detection of small deviations of quantum states reconstructed using a limited number of samples are essential.
MLMar 29, 2013
ParceLiNGAM: A causal ordering method robust against latent confoundersTatsuya Tashiro, Shohei Shimizu, Aapo Hyvarinen et al.
We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. The key idea is to detect latent confounders by testing independence between estimated external influences and find subsets (parcels) that include variables that are not affected by latent confounders. We demonstrate the effectiveness of our method using artificial data and simulated brain imaging data.
MLApr 9, 2012
Estimation of causal orders in a linear non-Gaussian acyclic model: a method robust against latent confoundersTatsuya Tashiro, Shohei Shimizu, Aapo Hyvarinen et al.
We consider to learn a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But, the estimation results could be distorted if some assumptions actually are violated. In this paper, we propose a new algorithm for learning causal orders that is robust against one typical violation of the model assumptions: latent confounders. We demonstrate the effectiveness of our method using artificial data.
MLMar 1, 2012
Learning a Common Substructure of Multiple Graphical Gaussian ModelsSatoshi Hara, Takashi Washio
Properties of data are frequently seen to vary depending on the sampled situations, which usually changes along a time evolution or owing to environmental effects. One way to analyze such data is to find invariances, or representative features kept constant over changes. The aim of this paper is to identify one such feature, namely interactions or dependencies among variables that are common across multiple datasets collected under different conditions. To that end, we propose a common substructure learning (CSSL) framework based on a graphical Gaussian model. We further present a simple learning algorithm based on the Dual Augmented Lagrangian and the Alternating Direction Method of Multipliers. We confirm the performance of CSSL over other existing techniques in finding unchanging dependency structures in multiple datasets through numerical simulations on synthetic data and through a real world application to anomaly detection in automobile sensors.
LGFeb 14, 2012
Discovering causal structures in binary exclusive-or skew acyclic modelsTakanori Inazumi, Takashi Washio, Shohei Shimizu et al.
Discovering causal relations among observed variables in a given data set is a main topic in studies of statistics and artificial intelligence. Recently, some techniques to discover an identifiable causal structure have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose a new approach to derive an identifiable causal structure governing the data based on skew Bernoulli distributions of external noise. Experimental evaluation shows excellent performance for both artificial and real world data sets.