Nader H. Bshouty

LG
h-index3
19papers
133citations
Novelty51%
AI Score45

19 Papers

DSApr 7
Classes Testable with $O(1/ε)$ Queries for Small $ε$ Independent of the Number of Variables

Nader H. Bshouty, George Haddad

In this paper, we study classes of Boolean functions that are testable with $O(ψ+1/ε)$ queries, where $ψ$ depends on the parameters of the class (e.g., the number of terms, the number of relevant variables, etc.) but not on the total number of variables $n$. In particular, when $ε\le 1/ψ$, the query complexity is $O(1/ε)$, matching the known tight bound $Ω(1/ε)$. This result was previously known for classes of terms of size at most $k$ and exclusive OR functions of at most $k$ variables. In this paper, we extend this list to include the classes: $k$-junta, functions with Fourier degree at most $d$, $s$-sparse polynomials of degree at most $d$, and $s$-sparse polynomials. Additionally, we show that for any class $C$ of Boolean functions that depend on at most $k$ variables, if $C$ is properly exactly learnable, then it is testable with $O(1/ε)$ queries for $ε<1/ψ$, where $ψ$ depends on $k$ and independent of the total number of variables $n$.

DSJun 27, 2023
On Detecting Some Defective Items in Group Testing

Nader H. Bshouty, Catherine A. Haddad-Zaknoon

Group testing is an approach aimed at identifying up to $d$ defective items among a total of $n$ elements. This is accomplished by examining subsets to determine if at least one defective item is present. In our study, we focus on the problem of identifying a subset of $\ell\leq d$ defective items. We develop upper and lower bounds on the number of tests required to detect $\ell$ defective items in both the adaptive and non-adaptive settings while considering scenarios where no prior knowledge of $d$ is available, and situations where an estimate of $d$ or at least some non-trivial upper bound on $d$ is available. When no prior knowledge on $d$ is available, we prove a lower bound of $ Ω(\frac{\ell \log^2n}{\log \ell +\log\log n})$ tests in the randomized non-adaptive settings and an upper bound of $O(\ell \log^2 n)$ for the same settings. Furthermore, we demonstrate that any non-adaptive deterministic algorithm must ask $Θ(n)$ tests, signifying a fundamental limitation in this scenario. For adaptive algorithms, we establish tight bounds in different scenarios. In the deterministic case, we prove a tight bound of $Θ(\ell\log{(n/\ell)})$. Moreover, in the randomized settings, we derive a tight bound of $Θ(\ell\log{(n/d)})$. When $d$, or at least some non-trivial estimate of $d$, is known, we prove a tight bound of $Θ(d\log (n/d))$ for the deterministic non-adaptive settings, and $Θ(\ell\log(n/d))$ for the randomized non-adaptive settings. In the adaptive case, we present an upper bound of $O(\ell \log (n/\ell))$ for the deterministic settings, and a lower bound of $Ω(\ell\log(n/d)+\log n)$. Additionally, we establish a tight bound of $Θ(\ell \log(n/d))$ for the randomized adaptive settings.

DSMay 18
A Note on Second-Order Expected Maximum-Load Bounds for Binary Linear Hashing

Nader H. Bshouty

Let $S\subseteq F_2^u$ have size $n=2^\ell$, and let $h:F_2^u\to F_2^\ell$ be a uniformly random linear map. For $y\in F_2^\ell$, write $Load_h(y):=|h^{-1}(y)\cap S|$, and let $M(S,h):=\max_{y\in F_2^\ell} Load_h(y)$ be the maximum load. Jaber, Kumar and Zuckerman (STOC 2025) proved that the expected maximum load of $h$ on $S$ is at most $16\log n/\log\log n$, matching the fully independent keys-into-bins scale up to constants. Their proof also gives the tail estimate \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left(\frac{1}{R^{2}}\right). \] We record a base optimization in their exponential-potential method showing that binary linear hashing nearly matches fully independent hashing also at the level of the second-order maximum-load scale. For every $R>1$ satisfying $R\ell^{1-1/R}\ge D\ln\ell$, where $D$ is an absolute constant, we prove \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left( \frac{(\log\log n)^2}{R^2(\log n)^{2-2/R}} \right). \] Integrating this tail yields \[ E[M(S,h)] \le \left( 1+ (1+o(1)) \frac{\log\log\log n}{\log\log n} \right) \frac{\log n}{\log\log n}. \] Thus binary linear hashing matches fully independent hashing in the leading term and matches the dominant second-order correction up to a $1+o(1)$ factor. We also prove, by an independent self-contained argument, a sharp tail bound for one prescribed bucket: for fixed $y\in F_2^\ell$, \[ \Pr[ Load_h(y)>2^a-2]\le γ^{-1}2^{-a^2}, \] where $ γ=\prod_{j\ge1}(1-2^{-j}) $. A subspace construction shows that this is asymptotically tight even in the leading constant as $ a\to\infty $. However, this controls only a fixed bucket; a direct union bound over all buckets loses a factor $ 2^\ell $.

LGJul 16, 2024
Approximating the Number of Relevant Variables in a Parity Implies Proper Learning

Nader H. Bshouty, George Haddad

Consider the model where we can access a parity function through random uniform labeled examples in the presence of random classification noise. In this paper, we show that approximating the number of relevant variables in the parity function is as hard as properly learning parities. More specifically, let $γ:{\mathbb R}^+\to {\mathbb R}^+$, where $γ(x) \ge x$, be any strictly increasing function. In our first result, we show that from any polynomial-time algorithm that returns a $γ$-approximation, $D$ (i.e., $γ^{-1}(d(f)) \leq D \leq γ(d(f))$), of the number of relevant variables~$d(f)$ for any parity $f$, we can, in polynomial time, construct a solution to the long-standing open problem of polynomial-time learning $k(n)$-sparse parities (parities with $k(n)\le n$ relevant variables), where $k(n) = ω_n(1)$. In our second result, we show that from any $T(n)$-time algorithm that, for any parity $f$, returns a $γ$-approximation of the number of relevant variables $d(f)$ of $f$, we can, in polynomial time, construct a $poly(Γ(n))T(Γ(n)^2)$-time algorithm that properly learns parities, where $Γ(x)=γ(γ(x))$. If $T(Γ(n)^2)=\exp({o(n/\log n)})$, this would resolve another long-standing open problem of properly learning parities in the presence of random classification noise in time $\exp({o(n/\log n)})$.

LGFeb 3, 2025
On Exact Learning of $d$-Monotone Functions

Nader H. Bshouty

In this paper, we study the learnability of the Boolean class of $d$-monotone functions $f:{\cal X}\to\{0,1\}$ from membership and equivalence queries, where $({\cal X},\le)$ is a finite lattice. We show that the class of $d$-monotone functions that are represented in the form $f=F(g_1,g_2,\ldots,g_d)$, where $F$ is any Boolean function $F:\{0,1\}^d\to\{0,1\}$ and $g_1,\ldots,g_d:{\cal X}\to \{0,1\}$ are any monotone functions, is learnable in time $σ({\cal X})\cdot (size(f)/d+1)^{d}$ where $σ({\cal X})$ is the maximum sum of the number of immediate predecessors in a chain from the largest element to the smallest element in the lattice ${\cal X}$ and $size(f)=size(g_1)+\cdots+size(g_d)$, where $size(g_i)$ is the number of minimal elements in $g_i^{-1}(1)$. For the Boolean function $f:\{0,1\}^n\to\{0,1\}$, the class of $d$-monotone functions that are represented in the form $f=F(g_1,g_2,\ldots,g_d)$, where $F$ is any Boolean function and $g_1,\ldots,g_d$ are any monotone DNF, is learnable in time $O(n^2)\cdot (size(f)/d+1)^{d}$ where $size(f)=size(g_1)+\cdots+size(g_d)$. In particular, this class is learnable in polynomial time when $d$ is constant. Additionally, this class is learnable in polynomial time when $size(g_i)$ is constant for all $i$ and $d=O(\log n)$.

LGFeb 7, 2022
Almost Optimal Proper Learning and Testing Polynomials

Nader H. Bshouty

We give the first almost optimal polynomial-time proper learning algorithm of Boolean sparse multivariate polynomial under the uniform distribution. For $s$-sparse polynomial over $n$ variables and $ε=1/s^β$, $β>1$, our algorithm makes $$q_U=\left(\frac{s}ε\right)^{\frac{\log β}β+O(\frac{1}β)}+ \tilde O\left(s\right)\left(\log\frac{1}ε\right)\log n$$ queries. Notice that our query complexity is sublinear in $1/ε$ and almost linear in $s$. All previous algorithms have query complexity at least quadratic in $s$ and linear in $1/ε$. We then prove the almost tight lower bound $$q_L=\left(\frac{s}ε\right)^{\frac{\log β}β+Ω(\frac{1}β)}+ Ω\left(s\right)\left(\log\frac{1}ε\right)\log n,$$ Applying the reduction in~\cite{Bshouty19b} with the above algorithm, we give the first almost optimal polynomial-time tester for $s$-sparse polynomial. Our tester, for $β>3.404$, makes $$\tilde O\left(\frac{s}ε\right)$$ queries.

DSAug 10, 2021
On Learning and Testing Decision Tree

Nader H. Bshouty, Catherine A. Haddad-Zaknoon

In this paper, we study learning and testing decision tree of size and depth that are significantly smaller than the number of attributes $n$. Our main result addresses the problem of poly$(n,1/ε)$ time algorithms with poly$(s,1/ε)$ query complexity (independent of $n$) that distinguish between functions that are decision trees of size $s$ from functions that are $ε$-far from any decision tree of size $φ(s,1/ε)$, for some function $φ> s$. The best known result is the recent one that follows from Blank, Lange and Tan,~\cite{BlancLT20}, that gives $φ(s,1/ε)=2^{O((\log^3s)/ε^3)}$. In this paper, we give a new algorithm that achieves $φ(s,1/ε)=2^{O(\log^2 (s/ε))}$. Moreover, we study the testability of depth-$d$ decision tree and give a {\it distribution free} tester that distinguishes between depth-$d$ decision tree and functions that are $ε$-far from depth-$d^2$ decision tree. In particular, for decision trees of size $s$, the above result holds in the distribution-free model when the tree depth is $O(\log(s/ε))$. We also give other new results in learning and testing of size-$s$ decision trees and depth-$d$ decision trees that follow from results in the literature and some results we prove in this paper.

LGNov 5, 2019
Bounds for the Number of Tests in Non-Adaptive Randomized Algorithms for Group Testing

Nader H. Bshouty, George Haddad, Catherine A. Haddad-Zaknoon

We study the group testing problem with non-adaptive randomized algorithms. Several models have been discussed in the literature to determine how to randomly choose the tests. For a model ${\cal M}$, let $m_{\cal M}(n,d)$ be the minimum number of tests required to detect at most $d$ defectives within $n$ items, with success probability at least $1-δ$, for some constant $δ$. In this paper, we study the measures $$c_{\cal M}(d)=\lim_{n\to \infty} \frac{m_{\cal M}(n,d)}{\ln n} \mbox{ and } c_{\cal M}=\lim_{d\to \infty} \frac{c_{\cal M}(d)}{d}.$$ In the literature, the analyses of such models only give upper bounds for $c_{\cal M}(d)$ and $c_{\cal M}$, and for some of them, the bounds are not tight. We give new analyses that yield tight bounds for $c_{\cal M}(d)$ and $c_{\cal M}$ for all the known models~${\cal M}$.

LGJan 23, 2019
Adaptive Exact Learning of Decision Trees from Membership Queries

Nader H. Bshouty, Catherine A. Haddad-Zaknoon

In this paper we study the adaptive learnability of decision trees of depth at most $d$ from membership queries. This has many applications in automated scientific discovery such as drugs development and software update problem. Feldman solves the problem in a randomized polynomial time algorithm that asks $\tilde O(2^{2d})\log n$ queries and Kushilevitz-Mansour in a deterministic polynomial time algorithm that asks $ 2^{18d+o(d)}\log n$ queries. We improve the query complexity of both algorithms. We give a randomized polynomial time algorithm that asks $\tilde O(2^{2d}) + 2^{d}\log n$ queries and a deterministic polynomial time algorithm that asks $2^{5.83d}+2^{2d+o(d)}\log n$ queries.

LGMar 28, 2018
On Learning Graphs with Edge-Detecting Queries

Hasan Abasi, Nader H. Bshouty

We consider the problem of learning a general graph $G=(V,E)$ using edge-detecting queries, where the number of vertices $|V|=n$ is given to the learner. The information theoretic lower bound gives $m\log n$ for the number of queries, where $m=|E|$ is the number of edges. In case the number of edges $m$ is also given to the learner, Angluin-Chen's Las Vegas algorithm \cite{AC08} runs in $4$ rounds and detects the edges in $O(m\log n)$ queries. In the other harder case where the number of edges $m$ is unknown, their algorithm runs in $5$ rounds and asks $O(m\log n+\sqrt{m}\log^2 n)$ queries. There have been two open problems: \emph{(i)} can the number of queries be reduced to $O(m\log n)$ in the second case, and, \emph{(ii)} can the number of rounds be reduced without substantially increasing the number of queries (in both cases). For the first open problem (when $m$ is unknown) we give two algorithms. The first is an $O(1)$-round Las Vegas algorithm that asks $m\log n+\sqrt{m}(\log^{[k]}n)\log n$ queries for any constant $k$ where $\log^{[k]}n=\log \stackrel{k}{\cdots} \log n$. The second is an $O(\log^*n)$-round Las Vegas algorithm that asks $O(m\log n)$ queries. This solves the first open problem for any practical $n$, for example, $n<2^{65536}$. We also show that no deterministic algorithm can solve this problem in a constant number of rounds. To solve the second problem we study the case when $m$ is known. We first show that any non-adaptive Monte Carlo algorithm (one-round) must ask at least $Ω(m^2\log n)$ queries, and any two-round Las Vegas algorithm must ask at least $m^{4/3-o(1)}\log n$ queries on average. We then give two two-round Monte Carlo algorithms, the first asks $O(m^{4/3}\log n)$ queries for any $n$ and $m$, and the second asks $O(m\log n)$ queries when $n>2^m$. Finally, we give a $3$-round Monte Carlo algorithm that asks $O(m\log n)$ queries for any $n$ and $m$.

DSFeb 1, 2018
On Polynomial time Constructions of Minimum Height Decision Tree

Nader H. Bshouty, Waseem Makhoul

In this paper we study a polynomial time algorithms that for an input $A\subseteq {B_m}$ outputs a decision tree for $A$ of minimum depth. This problem has many applications that include, to name a few, computer vision, group testing, exact learning from membership queries and game theory. Arkin et al. and Moshkov gave a polynomial time $(\ln |A|)$- approximation algorithm (for the depth). The result of Dinur and Steurer for set cover implies that this problem cannot be approximated with ratio $(1-o(1))\cdot \ln |A|$, unless P=NP. Moskov the combinatorial measure of extended teaching dimension of $A$, $ETD(A)$. He showed that $ETD(A)$ is a lower bound for the depth of the decision tree for $A$ and then gave an {\it exponential time} $ETD(A)/\log(ETD(A))$-approximation algorithm. In this paper we further study the $ETD(A)$ measure and a new combinatorial measure, $DEN(A)$, that we call the density of the set $A$. We show that $DEN(A)\le ETD(A)+1$. We then give two results. The first result is that the lower bound $ETD(A)$ of Moshkov for the depth of the decision tree for $A$ is greater than the bounds that are obtained by the classical technique used in the literature. The second result is a polynomial time $(\ln 2) DEN(A)$-approximation (and therefore $(\ln 2) ETD(A)$-approximation) algorithm for the depth of the decision tree of $A$. We also show that a better approximation ratio implies P=NP. We then apply the above results to learning the class of disjunctions of predicates from membership queries. We show that the $ETD$ of this class is bounded from above by the degree $d$ of its Hasse diagram. We then show that Moshkov algorithm can be run in polynomial time and is $(d/\log d)$-approximation algorithm. This gives optimal algorithms when the degree is constant. For example, learning axis parallel rays over constant dimension space.

LGAug 9, 2017
Non-Adaptive Randomized Algorithm for Group Testing

Nader H. Bshouty, Nuha Diab, Shada R. Kawar et al.

We study the problem of group testing with a non-adaptive randomized algorithm in the random incidence design (RID) model where each entry in the test is chosen randomly independently from $\{0,1\}$ with a fixed probability $p$. The property that is sufficient and necessary for a unique decoding is the separability of the tests, but unfortunately no linear time algorithm is known for such tests. In order to achieve linear-time decodable tests, the algorithms in the literature use the disjunction property that gives almost optimal number of tests. We define a new property for the tests which we call semi-disjunction property. We show that there is a linear time decoding for such test and for $d\to \infty$ the number of tests converges to the number of tests with the separability property and is therefore optimal (in the RID model). Our analysis shows that, in the RID model, the number of tests in our algorithm is better than the one with the disjunction property even for small $d$.

LGJul 4, 2017
The Maximum Cosine Framework for Deriving Perceptron Based Linear Classifiers

Nader H. Bshouty, Catherine A. Haddad-Zaknoon

In this work, we introduce a mathematical framework, called the Maximum Cosine Framework or MCF, for deriving new linear classifiers. The method is based on selecting an appropriate bound on the cosine of the angle between the target function and the algorithm's. To justify its correctness, we use the MCF to show how to regenerate the update rule of Aggressive ROMMA. Moreover, we construct a cosine bound from which we build the Maximum Cosine Perceptron algorithm or, for short, the MCP algorithm. We prove that the MCP shares the same mistake bound like the Perceptron. In addition, we demonstrate the promising performance of the MCP on a real dataset. Our experiments show that, under the restriction of single pass learning, the MCP algorithm outperforms PA and Aggressive ROMMA.

LGJun 21, 2017
Exact Learning of Juntas from Membership Queries

Nader H. Bshouty, Areej Costa

In this paper, we study adaptive and non-adaptive exact learning of Juntas from membership queries. We use new techniques to find new bounds, narrow some of the gaps between the lower bounds and upper bounds and find new deterministic and randomized algorithms with small query and time complexities. Some of the bounds are tight in the sense that finding better ones either gives a breakthrough result in some long-standing combinatorial open problem or needs a new technique that is beyond the existing ones.

LGJun 15, 2017
Learning Disjunctions of Predicates

Nader H. Bshouty, Dana Drachsler-Cohen, Martin Vechev et al.

Let $F$ be a set of boolean functions. We present an algorithm for learning $F_\vee := \{\vee_{f\in S} f \mid S \subseteq F\}$ from membership queries. Our algorithm asks at most $|F| \cdot OPT(F_\vee)$ membership queries where $OPT(F_\vee)$ is the minimum worst case number of membership queries for learning $F_\vee$. When $F$ is a set of halfspaces over a constant dimension space or a set of variable inequalities, our algorithm runs in polynomial time. The problem we address has practical importance in the field of program synthesis, where the goal is to synthesize a program that meets some requirements. Program synthesis has become popular especially in settings aiming to help end users. In such settings, the requirements are not provided upfront and the synthesizer can only learn them by posing membership queries to the end user. Our work enables such synthesizers to learn the exact requirements while bounding the number of membership queries.

LGJun 13, 2017
Exact Learning from an Honest Teacher That Answers Membership Queries

Nader H. Bshouty

Given a teacher that holds a function $f:X\to R$ from some class of functions $C$. The teacher can receive from the learner an element~$d$ in the domain $X$ (a query) and returns the value of the function in $d$, $f(d)\in R$. The learner goal is to find $f$ with a minimum number of queries, optimal time complexity, and optimal resources. In this survey, we present some of the results known from the literature, different techniques used, some new problems, and open problems.

LGFeb 13, 2015
Non-Adaptive Learning a Hidden Hipergraph

Hasan Abasi, Nader H. Bshouty, Hanna Mazzawi

We give a new deterministic algorithm that non-adaptively learns a hidden hypergraph from edge-detecting queries. All previous non-adaptive algorithms either run in exponential time or have non-optimal query complexity. We give the first polynomial time non-adaptive learning algorithm for learning hypergraph that asks almost optimal number of queries.

LGMay 7, 2014
Learning Boolean Halfspaces with Small Weights from Membership Queries

Hasan Abasi, Ali Z. Abdi, Nader H. Bshouty

We consider the problem of proper learning a Boolean Halfspace with integer weights $\{0,1,\ldots,t\}$ from membership queries only. The best known algorithm for this problem is an adaptive algorithm that asks $n^{O(t^5)}$ membership queries where the best lower bound for the number of membership queries is $n^t$ [Learning Threshold Functions with Small Weights Using Membership Queries. COLT 1999] In this paper we close this gap and give an adaptive proper learning algorithm with two rounds that asks $n^{O(t)}$ membership queries. We also give a non-adaptive proper learning algorithm that asks $n^{O(t^3)}$ membership queries.

LGMay 5, 2014
On Exact Learning Monotone DNF from Membership Queries

Hasan Abasi, Nader H. Bshouty, Hanna Mazzawi

In this paper, we study the problem of learning a monotone DNF with at most $s$ terms of size (number of variables in each term) at most $r$ ($s$ term $r$-MDNF) from membership queries. This problem is equivalent to the problem of learning a general hypergraph using hyperedge-detecting queries, a problem motivated by applications arising in chemical reactions and genome sequencing. We first present new lower bounds for this problem and then present deterministic and randomized adaptive algorithms with query complexities that are almost optimal. All the algorithms we present in this paper run in time linear in the query complexity and the number of variables $n$. In addition, all of the algorithms we present in this paper are asymptotically tight for fixed $r$ and/or $s$.