IT · LG · Feb 15, 2016

Distributed Information-Theoretic Clustering

arXiv:1602.04605v7 · 3 citations
Originality: Incremental advance
AI Analysis

This work studies the fundamental limits of distributed, rate-limited encoding for clustering applications. Its contributions are incremental and confined to two setups: the binary symmetric source and a multiple-description CEO extension.

The paper tackles the problem of distributed source coding for maximizing mutual information between encoded sequences, connecting it to biclustering and related tasks. It improves cardinality bounds to quantify the gap between inner and outer bounds for a binary symmetric source and provides a tight characterization for a multiple-description CEO extension.

We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences $X^n$ and $Y^n$, respectively. The goal is to find rate-limited encodings $f(x^n)$ and $g(y^n)$ that maximize the mutual information $I(f(X^n); g(Y^n))/n$. We discuss connections of this problem with hypothesis testing against independence, pattern recognition, and the information bottleneck method. Improving previous cardinality bounds for the inner and outer bounds allows us to thoroughly study the special case of a binary symmetric source and to quantify the gap between the inner and the outer bound in this special case. Furthermore, we investigate a multiple description (MD) extension of the Chief Executive Officer (CEO) problem with a mutual information constraint. Surprisingly, this MD-CEO problem permits a tight single-letter characterization of the achievable region.

