Jehoshua Bruck

h-index60

15papers

99citations

Novelty58%

AI Score42

Ranked #59,001 of 194,257 authors (top 30%)#160 in IT (top 21%)

15 Papers

8.6ARMar 27

Data Gravity and the Energy Limits of Computation

Wonsuk Lee, Jehoshua Bruck

Unlike the von Neumann architecture, which separates computation from memory, the brain tightly integrates them, an organization that large language models increasingly resemble. The crucial difference lies in the ratio of energy spent on computation versus data access: in the brain, most energy fuels compute, while in von Neumann architectures, data movement dominates. To capture this imbalance, we introduce the \emph{operation-operand disjunction constant} $G_d$, a dimensionless measure of the energy required for data transport relative to computation. As part of this framework, we propose the metaphor of \emph{data gravity}: just as mass exerts gravitational pull, large and frequently accessed data sets attract computation. We develop expressions for optimal computation placement and show that bringing the computation closer to the data can reduce energy consumption by a factor of $G_d^{(Î²- 1)/2}$, where $Î²\in (1, 3)$ captures the empirically observed distance-dependent energy scaling. We demonstrate that these findings are consistent with measurements across processors from 45\,nm to 7\,nm, as well as with results from processing-in-memory (PIM) architectures. High $G_d$ values are limiting; as $G_d$ increases, the energy required for data movement threatens to stall progress, slowing the scaling of large language models and pushing modern computing toward a plateau. Unless computation is realigned with data gravity, the growth of AI may be capped not by algorithms but by physics.

4.3CCMay 17, 2022

On Algebraic Constructions of Neural Networks with Small Weights

Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (number of distinct functions it can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. Studying the trade-offs among the weight sizes, circuit size and depth is a well-studied topic both in circuit complexity theory and the practice of neural computation. We propose a new approach for studying these complexity trade-offs by considering a related algebraic framework. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. The techniques we developed are based on Siegel's Lemma for the bounds, anti-concentration inequalities for the existential results and extensions of Sylvester-type Hadamard matrices for the constructions. We explicitly construct a constant weight, optimal size matrix to compute the EQUALITY function (checking if two integers expressed in binary are equal). Computing EQUALITY with a single linear equation requires exponentially large weights. In addition, we prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function (comparing between two integers expressed in binary). In the context of the circuit complexity theory, our results improve the upper bounds on the weight sizes for the best-known circuit sizes for EQUALITY and COMPARISON.

2.0LGNov 12, 2023

Omitted Labels Induce Nontransitive Paradoxes in Causality

Bijan Mazaheri, Siddharth Jain, Matthew Cook et al.

We explore "omitted label contexts," in which training data is limited to a subset of the possible labels. This setting is standard among specialized human experts or specific, focused studies. By studying Simpson's paradox, we observe that ``correct'' adjustments sometimes require non-exchangeable treatment and control groups. A generalization of Simpson's paradox leads us to study networks of conclusions drawn from different contexts, within which a paradox of nontransitivity arises. We prove that the space of possible nontransitive structures in these networks exactly corresponds to structures that form from aggregating ranked-choice votes.

2.3CCFeb 13, 2024

Nearest Neighbor Representations of Neurons

Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

The Nearest Neighbor (NN) Representation is an emerging computational model that is inspired by the brain. We study the complexity of representing a neuron (threshold function) using the NN representations. It is known that two anchors (the points to which NN is computed) are sufficient for a NN representation of a threshold function, however, the resolution (the maximum number of bits required for the entries of an anchor) is $O(n\log{n})$. In this work, the trade-off between the number of anchors and the resolution of a NN representation of threshold functions is investigated. We prove that the well-known threshold functions EQUALITY, COMPARISON, and ODD-MAX-BIT, which require 2 or 3 anchors and resolution of $O(n)$, can be represented by polynomially large number of anchors in $n$ and $O(\log{n})$ resolution. We conjecture that for all threshold functions, there are NN representations with polynomially large size and logarithmic resolution in $n$.

1.2CCFeb 13, 2024

Nearest Neighbor Representations of Neural Circuits

Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

Neural networks successfully capture the computational power of the human brain for many tasks. Similarly inspired by the brain architecture, Nearest Neighbor (NN) representations is a novel approach of computation. We establish a firmer correspondence between NN representations and neural networks. Although it was known how to represent a single neuron using NN representations, there were no results even for small depth neural networks. Specifically, for depth-2 threshold circuits, we provide explicit constructions for their NN representation with an explicit bound on the number of bits to represent it. Example functions include NN representations of convex polytopes (AND of threshold gates), IP2, OR of threshold gates, and linear or exact decision lists.

3.3CCMay 9, 2023

On the Information Capacity of Nearest Neighbor Representations

Kordag Mehmet Kilic, Jin Sima, Jehoshua Bruck

The $\textit{von Neumann Computer Architecture}$ has a distinction between computation and memory. In contrast, the brain has an integrated architecture where computation and memory are indistinguishable. Motivated by the architecture of the brain, we propose a model of $\textit{associative computation}$ where memory is defined by a set of vectors in $\mathbb{R}^n$ (that we call $\textit{anchors}$), computation is performed by convergence from an input vector to a nearest neighbor anchor, and the output is a label associated with an anchor. Specifically, in this paper, we study the representation of Boolean functions in the associative computation model, where the inputs are binary vectors and the corresponding outputs are the labels ($0$ or $1$) of the nearest neighbor anchors. The information capacity of a Boolean function in this model is associated with two quantities: $\textit{(i)}$ the number of anchors (called $\textit{Nearest Neighbor (NN) Complexity}$) and $\textit{(ii)}$ the maximal number of bits representing entries of anchors (called $\textit{Resolution}$). We study symmetric Boolean functions and present constructions that have optimal NN complexity and resolution.

1.6LGJul 15, 2021

Expert Graphs: Synthesizing New Expertise via Collaboration

Bijan Mazaheri, Siddharth Jain, Jehoshua Bruck

Consider multiple experts with overlapping expertise working on a classification problem under uncertain input. What constitutes a consistent set of opinions? How can we predict the opinions of experts on missing sub-domains? In this paper, we define a framework of to analyze this problem, termed "expert graphs." In an expert graph, vertices represent classes and edges represent binary opinions on the topics of their vertices. We derive necessary conditions for expert graph validity and use them to create "synthetic experts" which describe opinions consistent with the observed opinions of other experts. We show this framework to be equivalent to the well-studied linear ordering polytope. We show our conditions are not sufficient for describing all expert graphs on cliques, but are sufficient for cycles.

6.7MLOct 23, 2020Code

Robust Correction of Sampling Bias Using Cumulative Distribution Functions

Bijan Mazaheri, Siddharth Jain, Jehoshua Bruck

Varying domains and biased datasets can lead to differences between the training and the target distributions, known as covariate shift. Current approaches for alleviating this often rely on estimating the ratio of training and target probability density functions. These techniques require parameter tuning and can be unstable across different datasets. We present a new method for handling covariate shift using the empirical cumulative distribution function estimates of the target distribution by a rigorous generalization of a recent idea proposed by Vapnik and Izmailov. Further, we show experimentally that our method is more robust in its predictions, is not reliant on parameter tuning and shows similar classification performance compared to the current state-of-the-art techniques on synthetic and real datasets.

1.2LGApr 22, 2020

CodNN -- Robust Neural Networks From Coded Classification

Netanel Raviv, Siddharth Jain, Pulakesh Upadhyaya et al.

Deep Neural Networks (DNNs) are a revolutionary force in the ongoing information revolution, and yet their intrinsic properties remain a mystery. In particular, it is widely known that DNNs are highly sensitive to noise, whether adversarial or random. This poses a fundamental challenge for hardware implementations of DNNs, and for their deployment in critical applications such as autonomous driving. In this paper we construct robust DNNs via error correcting codes. By our approach, either the data or internal layers of the DNN are coded with error correcting codes, and successful computation under noise is guaranteed. Since DNNs can be seen as a layered concatenation of classification tasks, our research begins with the core task of classifying noisy coded inputs, and progresses towards robust DNNs. We focus on binary data and linear codes. Our main result is that the prevalent parity code can guarantee robustness for a large family of DNNs, which includes the recently popularized binarized neural networks. Further, we show that the coded classification problem has a deep connection to Fourier analysis of Boolean functions. In contrast to existing solutions in the literature, our results do not rely on altering the training process of the DNN, and provide mathematically rigorous guarantees rather than experimental evidence.

1.2ITJun 1, 2017

Generic Secure Repair for Distributed Storage

Wentao Huang, Jehoshua Bruck

This paper studies the problem of repairing secret sharing schemes, i.e., schemes that encode a message into $n$ shares, assigned to $n$ nodes, so that any $n-r$ nodes can decode the message but any colluding $z$ nodes cannot infer any information about the message. In the event of node failures so that shares held by the failed nodes are lost, the system needs to be repaired by reconstructing and reassigning the lost shares to the failed (or replacement) nodes. This can be achieved trivially by a trustworthy third-party that receives the shares of the available nodes, recompute and reassign the lost shares. The interesting question, studied in the paper, is how to repair without a trustworthy third-party. The main issue that arises is repair security: how to maintain the requirement that any colluding $z$ nodes, including the failed nodes, cannot learn any information about the message, during and after the repair process? We solve this secure repair problem from the perspective of secure multi-party computation. Specifically, we design generic repair schemes that can securely repair any (scalar or vector) linear secret sharing schemes. We prove a lower bound on the repair bandwidth of secure repair schemes and show that the proposed secure repair schemes achieve the optimal repair bandwidth up to a small constant factor when $n$ dominates $z$, or when the secret sharing scheme being repaired has optimal rate. We adopt a formal information-theoretic approach in our analysis and bounds. A main idea in our schemes is to allow a more flexible repair model than the straightforward one-round repair model implicitly assumed by existing secure regenerating codes. Particularly, the proposed secure repair schemes are simple and efficient two-round protocols.

5.1ITMay 28, 2015

Communication Efficient Secret Sharing

Wentao Huang, Michael Langberg, Joerg Kliewer et al.

A secret sharing scheme is a method to store information securely and reliably. Particularly, in a threshold secret sharing scheme, a secret is encoded into $n$ shares, such that any set of at least $t_1$ shares suffice to decode the secret, and any set of at most $t_2 < t_1$ shares reveal no information about the secret. Assuming that each party holds a share and a user wishes to decode the secret by receiving information from a set of parties; the question we study is how to minimize the amount of communication between the user and the parties. We show that the necessary amount of communication, termed "decoding bandwidth", decreases as the number of parties that participate in decoding increases. We prove a tight lower bound on the decoding bandwidth, and construct secret sharing schemes achieving the bound. Particularly, we design a scheme that achieves the optimal decoding bandwidth when $d$ parties participate in decoding, universally for all $t_1 \le d \le n$. The scheme is based on Shamir's secret sharing scheme and preserves its simplicity and efficiency. In addition, we consider secure distributed storage where the proposed communication efficient secret sharing schemes further improve disk access complexity during decoding.

1.2ITJan 19, 2014

The Capacity of String-Replication Systems

Farzad Farnoud, Moshe Schwartz, Jehoshua Bruck

It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-replication systems.

1.2ITSep 4, 2012

Efficiently Extracting Randomness from Imperfect Stochastic Processes

Hongchao Zhou, Jehoshua Bruck

We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement's noise. Namely, it is hard to obtain an accurate estimation of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors that can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input-length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes and approach the information theoretic upper bound on efficiency.

1.2ITSep 4, 2012

Linear Transformations for Randomness Extraction

Hongchao Zhou, Jehoshua Bruck

Information-efficient approaches for extracting randomness from imperfect sources have been extensively studied, but simpler and faster ones are required in the high-speed applications of random number generation. In this paper, we focus on linear constructions, namely, applying linear transformation for randomness extraction. We show that linear transformations based on sparse random matrices are asymptotically optimal to extract randomness from independent sources and bit-fixing sources, and they are efficient (may not be optimal) to extract randomness from hidden Markov sources. Further study demonstrates the flexibility of such constructions on source models as well as their excellent information-preserving capabilities. Since linear transformations based on sparse random matrices are computationally fast and can be easy to implement using hardware like FPGAs, they are very attractive in the high-speed applications. In addition, we explore explicit constructions of transformation matrices. We show that the generator matrices of primitive BCH codes are good choices, but linear transformations based on such matrices require more computational time due to their high densities.

1.2ITSep 4, 2012

Synthesis of Stochastic Flow Networks

Hongchao Zhou, Ho-Lin Chen, Jehoshua Bruck

A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs), tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. Stochastic flow networks can be easily implemented by DNA-based chemical reactions, with promising applications in molecular computing and stochastic computing. In this paper, we address a fundamental synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network, such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability transformation dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976. Most existing works have been focusing on the "simulation" of target distributions. In this paper, we design optimal-sized stochastic flow networks for "synthesizing" target distributions. It shows that when each splitter has two outgoing edges and is unbiased, an arbitrary rational probability \frac{a}{b} with a\leq b\leq 2^n can be realized by a stochastic flow network of size n that is optimal. Compared to the other stochastic systems, feedback (cycles in networks) strongly improves the expressibility of stochastic flow networks.