Ken-ichi Kawarabayashi

CL
h-index61
27papers
3,617citations
Novelty58%
AI Score60

27 Papers

LGSep 29, 2022
Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis

Laurent Amsaleg, Oussama Chelly, Michael E. Houle et al.

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since their convergence generally requires sample sizes (that is, neighborhood sizes) on the order of hundreds of points, existing ID estimation methods may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper, we propose a local ID estimation strategy stable even for `tight' localities consisting of as few as 20 sample points. The estimator applies MLE techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.

CLFeb 3
Accelerating Scientific Research with Gemini: Case Studies and Common Techniques

David P. Woodruff, Vincent Cohen-Addad, Lalit Jain et al.

Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their ability to contribute to novel, expert-level mathematical discovery is less understood. We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models, specifically Google's Gemini-based models (in particular Gemini Deep Think and its advanced variants), to solve open problems, refute conjectures, and generate new proofs across diverse areas in theoretical computer science, as well as other areas such as economics, optimization, and physics. Based on these experiences, we extract common techniques for effective human-AI collaboration in theoretical research, such as iterative refinement, problem decomposition, and cross-disciplinary knowledge transfer. While the majority of our results stem from this interactive, conversational methodology, we also highlight specific instances that push beyond standard chat interfaces. These include deploying the model as a rigorous adversarial reviewer to detect subtle flaws in existing proofs, and embedding it within a "neuro-symbolic" loop that autonomously writes and executes code to verify complex derivations. Together, these examples highlight the potential of AI not just as a tool for automation, but as a versatile, genuine partner in the creative process of scientific discovery.

DSMar 15, 2022
Online Task Assignment Problems with Reusable Resources

Hanna Sumita, Shinji Ito, Kei Takemura et al.

We study online task assignment problem with reusable resources, motivated by practical applications such as ridesharing, crowdsourcing and job hiring. In the problem, we are given a set of offline vertices (agents), and, at each time, an online vertex (task) arrives randomly according to a known time-dependent distribution. Upon arrival, we assign the task to agents immediately and irrevocably. The goal of the problem is to maximize the expected total profit produced by completed tasks. The key features of our problem are (1) an agent is reusable, i.e., an agent comes back to the market after completing the assigned task, (2) an agent may reject the assigned task to stay the market, and (3) a task may accommodate multiple agents. The setting generalizes that of existing work in which an online task is assigned to one agent under (1). In this paper, we propose an online algorithm that is $1/2$-competitive for the above setting, which is tight. Moreover, when each agent can reject assigned tasks at most $Δ$ times, the algorithm is shown to have the competitive ratio $Δ/(3Δ-1)\geq 1/3$. We also evaluate our proposed algorithm with numerical experiments.

LGSep 19, 2023
A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings

Danushka Bollegala, Shuichi Otake, Tomoya Machide et al.

We propose a Neighbourhood-Aware Differential Privacy (NADP) mechanism considering the neighbourhood of a word in a pretrained static word embedding space to determine the minimal amount of noise required to guarantee a specified privacy level. We first construct a nearest neighbour graph over the words using their embeddings, and factorise it into a set of connected components (i.e. neighbourhoods). We then separately apply different levels of Gaussian noise to the words in each neighbourhood, determined by the set of words in that neighbourhood. Experiments show that our proposed NADP mechanism consistently outperforms multiple previously proposed DP mechanisms such as Laplacian, Gaussian, and Mahalanobis in multiple downstream tasks, while guaranteeing higher levels of privacy.

98.5COMar 25
The Four Color Theorem with Linearly Many Reducible Configurations and Near-Linear Time Coloring

Yuta Inoue, Ken-ichi Kawarabayashi, Atsuyuki Miyashita et al.

We give a near-linear time 4-coloring algorithm for planar graphs, improving on the previous quadratic time algorithm by Robertson et al. from 1996. Such an algorithm cannot be achieved by the known proofs of the Four Color Theorem (4CT). Technically speaking, we show the following significant generalization of the 4CT: every planar triangulation contains linearly many pairwise non-touching reducible configurations or pairwise non-crossing obstructing cycles of length at most 5 (which all allow for making effective 4-coloring reductions). The known proofs of the 4CT only show the existence of a single reducible configuration or obstructing cycle in the above statement. The existence is proved using the discharging method based on combinatorial curvature. It identifies reducible configurations in parts where the local neighborhood has positive combinatorial curvature. Our result significantly strengthens the known proofs of 4CT, showing that we can also find reductions in large ``flat" parts where the curvature is zero, and moreover, we can make reductions almost anywhere in a given planar graph. An interesting aspect of this is that such large flat parts are also found in large triangulations of any fixed surface. From a computational perspective, the old proofs allowed us to apply induction on a problem that is smaller by some additive constant. The inductive step took linear time, resulting in a quadratic total time. With our linear number of reducible configurations or obstructing cycles, we can reduce the problem size by a constant factor. Our inductive step takes $O(n\log n)$ time, yielding a 4-coloring in $O(n\log n)$ total time. In order to efficiently handle a linear number of reducible configurations, we need them to have certain robustness that could also be useful in other applications. All our reducible configurations are what is known as D-reducible.

90.3DSApr 3
Online Graph Coloring for $k$-Colorable Graphs

Ken-ichi Kawarabayashi, Hirotaka Yoneda, Masataka Yoneda

We study the problem of online graph coloring for $k$-colorable graphs. The best previously known deterministic algorithm uses $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors for general $k$ and $\widetilde{O}(n^{5/6})$ colors for $k = 4$, both given by Kierstead in 1998. In this paper, we finally break this barrier, achieving the first major improvement in nearly three decades. Our results are summarized as follows: (1) $k \geq 5$ case. We provide a deterministic online algorithm to color $k$-colorable graphs with $\widetilde{O}(n^{1-\frac{1}{k(k-1)/2}})$ colors, significantly improving the current upper bound of $\widetilde{O}(n^{1-\frac{1}{k!}})$ colors. Our algorithm also matches the best-known bound for $k = 4$ ($\widetilde{O}(n^{5/6})$ colors). (2) $k = 4$ case. We provide a deterministic online algorithm to color $4$-colorable graphs with $\widetilde{O}(n^{14/17})$ colors, improving the current upper bound of $\widetilde{O}(n^{5/6})$ colors. (3) $k = 2$ case. We show that for randomized algorithms, the upper bound is $1.034 \log_2 n + O(1)$ colors and the lower bound is $\frac{91}{96} \log_2 n - O(1)$ colors. This means that we close the gap to a factor of $1.09$. With our algorithm for the $k \geq 5$ case, we also obtain a deterministic online algorithm for graph coloring that achieves a competitive ratio of $O(\frac{n}{\log \log n})$, which improves the best-known result of $O(\frac{n \log \log \log n}{\log \log n})$ by Kierstead. For the bipartite graph case ($k = 2$), the limit of online deterministic algorithms is known: any deterministic algorithm requires $2 \log_2 n - O(1)$ colors. Our results imply that randomized algorithms can perform slightly better but still have a limit.

98.1COApr 3
A polynomial bound for the minimal excluded minors for a surface

Sarah Houdaigoui, Ken-ichi Kawarabayashi

As part of the graph minor project, Robertson and Seymour showed in 1990 that the class of graphs that can be embedded in a given surface can be characterized by a finite set of minimal excluded minors. However, their proof, because existential, provides no explicit information about these excluded minors. In 1993, Seymour established the first upper bound on the order of such minimal excluded minors. Very recently, Houdaigoui and Kawarabayashi improved this result by deriving a quasi-polynomial upper bound. Despite this progress, the gap between this bound and the known linear lower bound $Ω(g)$ (where $g$ denotes the genus) remains substantial. In particular, they conjectured that a polynomial upper bound should hold. In this paper, we confirm this conjecture by showing that the order of the minimal excluded minors for a surface of genus $g$ is $O(g^{8+\varepsilon})$ for every $\varepsilon >0$. This result significantly narrows the gap between the known lower and upper bounds, bringing the asymptotic behavior much closer to the conjectured optimum. Our approach introduces a new forbidden structure of minimal excluded minors. Let $G$ be a minimal excluded minor for a surface of Euler genus $g$. Houdaigoui and Kawarabayashi showed that $G$ contains $O(\log g)$ pairwise disjoint cycles that are contractible and nested in some embedding of $G$. We strengthen this result by proving a separator-based variant: for any contractible subgraph $H \subseteq G$ with a separator of size $s$ (with $H$ completely contained in one side), the subgraph $H$ contains $O(\log s)$ disjoint cycles that are contractible and nested in some embedding of $G$. This allows us to replace a genus-dependent bound with a separator-dependent one, which is the main new ingredient in deriving our polynomial bound.

85.7DSMay 8
EPTAS for Hard Graph Cut Problems for Dense Graphs

Kaisei Deguchi, Ken-ichi Kawarabayashi, Hiroaki Mori

Everywhere-$δ$-dense graphs are defined as graphs on $n$ vertices in which every vertex has degree at least $δn$ for some constant $δ> 0$. Approximation schemes are vital for handling NP-hard optimization problems, but for many graph cut problems, existing PTAS algorithms often suffer from running times of $n^{f(1/\varepsilon)}$. In this paper, we bring PTASs down to EPTASs for several fundamental minimization problems on everywhere-$Ω(1)$-dense graphs. Specifically, we present the first Efficient Polynomial-Time Approximation Scheme (EPTAS), running in time $f(1/\varepsilon)n^{O(1)}$, for the ConstrainedMinCut problem under a global constraint on vertex weights, a problem that captures BalancedSeparator and SmallSetExpansion. Moreover, we give the first EPTASs for MinQuotientCut and ProductSparsestCut on everywhere-$δ$-dense graphs with integer-valued dense vertex weights; these problems generalize the four well-known problems UniformSparsestCut, EdgeExpansion, Conductance, and NormalizedCut. Our main technical contribution is an EPTAS for ConstrainedMinCut, based on the weak regularity lemma and sampling and estimation techniques. We then obtain EPTASs for MinQuotientCut and ProductSparsestCut via a unified reduction that invokes this algorithm as a subroutine. In contrast, previous works giving PTASs for these problems on everywhere-$δ$-dense graphs typically rely on powerful tools such as the Lasserre hierarchy or specific integer programming technique, which we avoid.

79.6DMMay 8
Well-Quasi-Ordering Eulerian Digraphs: Bounded Carving Width

Dario Cavallaro, Ken-ichi Kawarabayashi, Stephan Kreutzer

We prove that every class of Eulerian directed graphs of bounded carving width (equivalently of bounded degree and treewidth) is well-quasi-ordered by strong immersion. In fact, we prove a stronger result, namely that every class of Eulerian directed graphs of bounded carving width, where every vertex is additionally labeled from a well-quasi-order, fixes a linear order on its incident edges, and may impose further restrictions on how the immersion is allowed to route paths through it, is well-quasi-ordered by an adequate notion of strong immersion. To this extent, we develop a framework seemingly suited to prove well-quasi-ordering for classes of Eulerian directed graphs by (strong) immersion and present a first meta theorem in that direction. We complement our results by observing that the class of Eulerian directed graphs of unbounded degree is \emph{not} well-quasi-ordered by \emph{strong} immersion, even if we assume the treewidth of the class to be at most two. We conclude with a dichotomy result, proving for a very restricted class of Eulerian directed graphs of unbounded degree that it is not well-quasi-ordered by strong immersion, but it is well-quasi-ordered by weak immersion.

LGDec 19, 2023
New Classes of the Greedy-Applicable Arm Feature Distributions in the Sparse Linear Bandit Problem

Koji Ichikawa, Shinji Ito, Daisuke Hatano et al.

We consider the sparse contextual bandit problem where arm feature affects reward through the inner product of sparse parameters. Recent studies have developed sparsity-agnostic algorithms based on the greedy arm selection policy. However, the analysis of these algorithms requires strong assumptions on the arm feature distribution to ensure that the greedily selected samples are sufficiently diverse; One of the most common assumptions, relaxed symmetry, imposes approximate origin-symmetry on the distribution, which cannot allow distributions that has origin-asymmetric support. In this paper, we show that the greedy algorithm is applicable to a wider range of the arm feature distributions from two aspects. Firstly, we show that a mixture distribution that has a greedy-applicable component is also greedy-applicable. Second, we propose new distribution classes, related to Gaussian mixture, discrete, and radial distribution, for which the sample diversity is guaranteed. The proposed classes can describe distributions with origin-asymmetric support and, in conjunction with the first claim, provide theoretical guarantees of the greedy policy for a very wide range of the arm feature distributions.

CLJan 25, 2021
RelWalk A Latent Variable Model Approach to Knowledge Graph Embedding

Danushka Bollegala, Huda Hakami, Yuichi Yoshida et al.

Embedding entities and relations of a knowledge graph in a low-dimensional space has shown impressive performance in predicting missing links between entities. Although progresses have been achieved, existing methods are heuristically motivated and theoretical understanding of such embeddings is comparatively underdeveloped. This paper extends the random walk model (Arora et al., 2016a) of word embeddings to Knowledge Graph Embeddings (KGEs) to derive a scoring function that evaluates the strength of a relation R between two entities h (head) and t (tail). Moreover, we show that marginal loss minimisation, a popular objective used in much prior work in KGE, follows naturally from the log-likelihood ratio maximisation under the probabilities estimated from the KGEs according to our theoretical relationship. We propose a learning objective motivated by the theoretical analysis to learn KGEs from a given knowledge graph. Using the derived objective, accurate KGEs are learnt from FB15K237 and WN18RR benchmark datasets, providing empirical evidence in support of the theory.

MLJan 20, 2021
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions

Kei Takemura, Shinji Ito, Daisuke Hatano et al.

The contextual combinatorial semi-bandit problem with linear payoff functions is a decision-making problem in which a learner chooses a set of arms with the feature vectors in each round under given constraints so as to maximize the sum of rewards of arms. Several existing algorithms have regret bounds that are optimal with respect to the number of rounds $T$. However, there is a gap of $\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of the chosen arms in a round, and $\tilde{O}(\cdot)$ ignores the logarithmic factors. The dependence of $k$ and $d$ is of practical importance because $k$ may be larger than $T$ in real-world applications such as recommender systems. In this paper, we fill the gap by improving the upper and lower bounds. More precisely, we show that the C${}^2$UCB algorithm proposed by Qin, Chen, and Zhu (2014) has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints. For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm and demonstrate that it enjoys the optimal regret bound for a more general problem that can take into account other objectives simultaneously. We also show that our technique would be applicable to related problems. Numerical experiments support our theoretical results and considerations.

LGSep 24, 2020
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Keyulu Xu, Mozhi Zhang, Jingling Li et al.

We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.

CLSep 12, 2019
Query Obfuscation Semantic Decomposition

Danushka Bollegala, Tomoya Machide, Ken-ichi Kawarabayashi

We propose a method to protect the privacy of search engine users by decomposing the queries using semantically \emph{related} and unrelated \emph{distractor} terms. Instead of a single query, the search engine receives multiple decomposed query terms. Next, we reconstruct the search results relevant to the original query term by aggregating the search results retrieved for the decomposed query terms. We show that the word embeddings learnt using a distributed representation learning method can be used to find semantically related and distractor query terms. We derive the relationship between the \emph{obfuscity} achieved through the proposed query anonymisation method and the \emph{reconstructability} of the original search results using the decomposed queries. We analytically study the risk of discovering the search engine users' information intents under the proposed query obfuscation method, and empirically evaluate its robustness against clustering-based attacks. Our experimental results show that the proposed method can accurately reconstruct the search results for user queries, without compromising the privacy of the search engine users.

CLJun 4, 2019
Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Mozhi Zhang, Keyulu Xu, Ken-ichi Kawarabayashi et al.

Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).

LGMay 30, 2019
What Can Neural Networks Reason About?

Keyulu Xu, Jingling Li, Mozhi Zhang et al.

Neural networks have succeeded in many reasoning tasks. Empirically, these tasks require specialized network structures, e.g., Graph Neural Networks (GNNs) perform well on many such tasks, but less structured networks fail. Theoretically, there is limited understanding of why and when a network structure generalizes better than others, although they have equal expressive power. In this paper, we develop a framework to characterize which reasoning tasks a network can learn well, by studying how well its computation structure aligns with the algorithmic structure of the relevant reasoning process. We formally define this algorithmic alignment and derive a sample complexity bound that decreases with better alignment. This framework offers an explanation for the empirical success of popular reasoning models, and suggests their limitations. As an example, we unify seemingly different reasoning tasks, such as intuitive physics, visual question answering, and shortest paths, via the lens of a powerful algorithmic paradigm, dynamic programming (DP). We show that GNNs align with DP and thus are expected to solve these tasks. On several reasoning tasks, our theory is supported by empirical results.

LGJun 9, 2018
Representation Learning on Graphs with Jumping Knowledge Networks

Keyulu Xu, Chengtao Li, Yonglong Tian et al.

Recent deep learning approaches for representation learning on graphs follow a neighborhood aggregation procedure. We analyze some important properties of these models, and propose a strategy to overcome those. In particular, the range of "neighboring" nodes that a node's representation draws from strongly depends on the graph structure, analogous to the spread of a random walk. To adapt to local neighborhood properties and tasks, we explore an architecture -- jumping knowledge (JK) networks -- that flexibly leverages, for each node, different neighborhood ranges to enable better structure-aware representation. In a number of experiments on social, bioinformatics and citation networks, we demonstrate that our model achieves state-of-the-art performance. Furthermore, combining the JK framework with models like Graph Convolutional Networks, GraphSAGE and Graph Attention Networks consistently improves those models' performance.

MLJun 6, 2018
Causal Bandits with Propagating Inference

Akihiro Yabe, Daisuke Hatano, Hanna Sumita et al.

Bandit is a framework for designing sequential experiments. In each experiment, a learner selects an arm $A \in \mathcal{A}$ and obtains an observation corresponding to $A$. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms $|\mathcal{A}|$. This makes bandit incapable of handling an exponentially large number of arms, hence the bandit problem with side-information is often considered to overcome this lower bound. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information. A causal graph is a fundamental model that is frequently used with a variety of real problems. In this setting, the arms are identified with interventions on a given causal graph, and the effect of an intervention propagates throughout all over the causal graph. The task is to find the best intervention that maximizes the expected value on a target node. Existing algorithms for causal bandit overcame the $Ω(\sqrt{|\mathcal{A}|/T})$ simple-regret lower-bound; however, their algorithms work only when the interventions $\mathcal{A}$ are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves $O(\sqrt{ γ^*\log(|\mathcal{A}|T) / T})$ regret bound, where $γ^*$ is determined by using a causal graph structure. In particular, if the in-degree of the causal graph is bounded, then $γ^* = O(N^2)$, where $N$ is the number $N$ of nodes.

CLApr 14, 2018
ClassiNet -- Predicting Missing Features for Short-Text Classification

Danushka Bollegala, Vincent Atanasov, Takanori Maehara et al.

The fundamental problem in short-text classification is \emph{feature sparseness} -- the lack of feature overlap between a trained model and a test instance to be classified. We propose \emph{ClassiNet} -- a network of classifiers trained for predicting missing features in a given instance, to overcome the feature sparseness problem. Using a set of unlabeled training instances, we first learn binary classifiers as feature predictors for predicting whether a particular feature occurs in a given instance. Next, each feature predictor is represented as a vertex $v_i$ in the ClassiNet where a one-to-one correspondence exists between feature predictors and vertices. The weight of the directed edge $e_{ij}$ connecting a vertex $v_i$ to a vertex $v_j$ represents the conditional probability that given $v_i$ exists in an instance, $v_j$ also exists in the same instance. We show that ClassiNets generalize word co-occurrence graphs by considering implicit co-occurrences between features. We extract numerous features from the trained ClassiNet to overcome feature sparseness. In particular, for a given instance $\vec{x}$, we find similar features from ClassiNet that did not appear in $\vec{x}$, and append those features in the representation of $\vec{x}$. Moreover, we propose a method based on graph propagation to find features that are indirectly related to a given short-text. We evaluate ClassiNets on several benchmark datasets for short-text classification. Our experimental results show that by using ClassiNet, we can statistically significantly improve the accuracy in short-text classification tasks, without having to use any external resources such as thesauri for finding related features.

CLSep 19, 2017
Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words

Danushka Bollegala, Kohei Hayashi, Ken-ichi Kawarabayashi

Distributed word embeddings have shown superior performances in numerous Natural Language Processing (NLP) tasks. However, their performances vary significantly across different tasks, implying that the word embeddings learnt by those methods capture complementary aspects of lexical semantics. Therefore, we believe that it is important to combine the existing word embeddings to produce more accurate and complete \emph{meta-embeddings} of words. For this purpose, we propose an unsupervised locally linear meta-embedding learning method that takes pre-trained word embeddings as the input, and produces more accurate meta embeddings. Unlike previously proposed meta-embedding learning methods that learn a global projection over all words in a vocabulary, our proposed method is sensitive to the differences in local neighbourhoods of the individual source word embeddings. Moreover, we show that vector concatenation, a previously proposed highly competitive baseline approach for integrating word embeddings, can be derived as a special case of the proposed method. Experimental results on semantic similarity, word analogy, relation classification, and short-text classification tasks show that our meta-embeddings to significantly outperform prior methods in several benchmark datasets, establishing a new state of the art for meta-embeddings.

CLSep 5, 2017
Using $k$-way Co-occurrences for Learning Word Embeddings

Danushka Bollegala, Yuichi Yoshida, Ken-ichi Kawarabayashi

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning have used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover $k(\geq\!\!2)$-way co-occurrences among a set of $k$-words. Specifically, we prove a theoretical relationship between the joint probability of $k(\geq\!\!2)$ words, and the sum of $\ell_2$ norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises $k$-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller $k$ values, $k$-way embeddings perform comparably or better than $2$-way embeddings in a range of tasks.

CLNov 19, 2015
Joint Word Representation Learning using a Corpus and a Semantic Lexicon

Danushka Bollegala, Alsuhaibani Mohammed, Takanori Maehara et al.

Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as, semantic similarity measurement, and word analogy detection. Despite their success, these data-driven word representation learning methods do not consider the rich semantic relational structure between words in a co-occurring context. On the other hand, already much manual effort has gone into the construction of semantic lexicons such as the WordNet that represent the meanings of words by defining the various relationships that exist among the words in a language. We consider the question, can we improve the word representations learnt using a corpora by integrating the knowledge from semantic lexicons?. For this purpose, we propose a joint word representation learning method that simultaneously predicts the co-occurrences of two words in a sentence subject to the relational constrains given by the semantic lexicon. We use relations that exist between words in the lexicon to regularize the word representations learnt from the corpus. Our proposed method statistically significantly outperforms previously proposed methods for incorporating semantic lexicons into word representations on several benchmark datasets for semantic similarity and word analogy.

CLMay 27, 2015
Unsupervised Cross-Domain Word Representation Learning

Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi

Meaning of a word varies from one domain to another. Despite this important domain dependence in word semantics, existing word representation learning methods are bound to a single domain. Given a pair of \emph{source}-\emph{target} domains, we propose an unsupervised method for learning domain-specific word representations that accurately capture the domain-specific aspects of word semantics. First, we select a subset of frequent words that occur in both domains as \emph{pivots}. Next, we optimize an objective function that enforces two constraints: (a) for both source and target domain documents, pivots that appear in a document must accurately predict the co-occurring non-pivots, and (b) word representations learnt for pivots must be similar in the two domains. Moreover, we propose a method to perform domain adaptation using the learnt word representations. Our proposed method significantly outperforms competitive baselines including the state-of-the-art domain-insensitive word representations, and reports best sentiment classification accuracies for all domain-pairs in a benchmark dataset.

CLMay 1, 2015
Embedding Semantic Relations into Word Representations

Danushka Bollegala, Takanori Maehara, Ken-ichi Kawarabayashi

Learning representations for semantic relations is important for various tasks such as analogy detection, relational search, and relation classification. Although there have been several proposals for learning representations for individual words, learning word representations that explicitly capture the semantic relations between words remains under developed. We propose an unsupervised method for learning vector representations for words such that the learnt representations are sensitive to the semantic relations that exist between two words. First, we extract lexical patterns from the co-occurrence contexts of two words in a corpus to represent the semantic relations that exist between those two words. Second, we represent a lexical pattern as the weighted sum of the representations of the words that co-occur with that lexical pattern. Third, we train a binary classifier to detect relationally similar vs. non-similar lexical pattern pairs. The proposed method is unsupervised in the sense that the lexical pattern pairs we use as train data are automatically sampled from a corpus, without requiring any manual intervention. Our proposed method statistically significantly outperforms the current state-of-the-art word representations on three benchmark datasets for proportional analogy detection, demonstrating its ability to accurately capture the semantic relations among words.

CLDec 7, 2014
Learning Word Representations from Relational Graphs

Danushka Bollegala, Takanori Maehara, Yuichi Yoshida et al.

Attributes of words and relations between two words are central to numerous tasks in Artificial Intelligence such as knowledge representation, similarity measurement, and analogy detection. Often when two words share one or more attributes in common, they are connected by some semantic relations. On the other hand, if there are numerous semantic relations between two words, we can expect some of the attributes of one of the words to be inherited by the other. Motivated by this close connection between attributes and relations, given a relational graph in which words are inter- connected via numerous semantic relations, we propose a method to learn a latent representation for the individual words. The proposed method considers not only the co-occurrences of words as done by existing approaches for word representation learning, but also the semantic relations in which two words co-occur. To evaluate the accuracy of the word representations learnt using the proposed method, we use the learnt word representations to solve semantic word analogy problems. Our experimental results show that it is possible to learn better word representations by using semantic semantics between words.

AIJan 23, 2014
Generating Approximate Solutions to the TTP using a Linear Distance Relaxation

Richard Hoshino, Ken-ichi Kawarabayashi

In some domestic professional sports leagues, the home stadiums are located in cities connected by a common train line running in one direction. For these instances, we can incorporate this geographical information to determine optimal or nearly-optimal solutions to the n-team Traveling Tournament Problem (TTP), an NP-hard sports scheduling problem whose solution is a double round-robin tournament schedule that minimizes the sum total of distances traveled by all n teams. We introduce the Linear Distance Traveling Tournament Problem (LD-TTP), and solve it for n=4 and n=6, generating the complete set of possible solutions through elementary combinatorial techniques. For larger n, we propose a novel "expander construction" that generates an approximate solution to the LD-TTP. For n congruent to 4 modulo 6, we show that our expander construction produces a feasible double round-robin tournament schedule whose total distance is guaranteed to be no worse than 4/3 times the optimal solution, regardless of where the n teams are located. This 4/3-approximation for the LD-TTP is stronger than the currently best-known ratio of 5/3 + epsilon for the general TTP. We conclude the paper by applying this linear distance relaxation to general (non-linear) n-team TTP instances, where we develop fast approximate solutions by simply "assuming" the n teams lie on a straight line and solving the modified problem. We show that this technique surprisingly generates the distance-optimal tournament on all benchmark sets on 6 teams, as well as close-to-optimal schedules for larger n, even when the teams are located around a circle or positioned in three-dimensional space.

AIJan 16, 2014
Scheduling Bipartite Tournaments to Minimize Total Travel Distance

Richard Hoshino, Ken-ichi Kawarabayashi

In many professional sports leagues, teams from opposing leagues/conferences compete against one another, playing inter-league games. This is an example of a bipartite tournament. In this paper, we consider the problem of reducing the total travel distance of bipartite tournaments, by analyzing inter-league scheduling from the perspective of discrete optimization. This research has natural applications to sports scheduling, especially for leagues such as the National Basketball Association (NBA) where teams must travel long distances across North America to play all their games, thus consuming much time, money, and greenhouse gas emissions. We introduce the Bipartite Traveling Tournament Problem (BTTP), the inter-league variant of the well-studied Traveling Tournament Problem. We prove that the 2n-team BTTP is NP-complete, but for small values of n, a distance-optimal inter-league schedule can be generated from an algorithm based on minimum-weight 4-cycle-covers. We apply our theoretical results to the 12-team Nippon Professional Baseball (NPB) league in Japan, producing a provably-optimal schedule requiring 42950 kilometres of total team travel, a 16% reduction compared to the actual distance traveled by these teams during the 2010 NPB season. We also develop a nearly-optimal inter-league tournament for the 30-team NBA league, just 3.8% higher than the trivial theoretical lower bound.