Hye Won Chung

LG
h-index: 11
Papers: 25
Citations: 249
Novelty: 58%
AI Score: 59

25 Papers

LG · Jan 3, 2023 · Code
Data Valuation Without Training of a Model

Nohyun Ki, Hoyong Choi, Hye Won Chung

Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model. Such attempts reveal characteristics and importance of individual instances, which may provide useful information in diagnosing and improving deep learning. However, most of the existing works on data valuation require actual training of a model, which often demands high computational cost. In this paper, we provide a training-free data valuation score, called complexity-gap score, which is a data-centric score to quantify the influence of individual instances in generalization of two-layer overparameterized neural networks. The proposed score can quantify irregularity of the instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics. Our code is publicly available at https://github.com/JJchy/CG_score

LG · Aug 20, 2024 · Code
Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning

Dong Geun Shin, Hye Won Chung

Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification, faced by existing methods. We then introduce our method, called Representation Norm Amplification (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA achieves superior performance in both OOD detection and classification compared to the state-of-the-art methods, by 1.70% and 9.46% in FPR95 and 2.43% and 6.87% in classification accuracy on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.

CV · Jul 8, 2022
Test-Time Adaptation via Self-Training with Nearest Neighbor Information

Minguk Jang, Sae-Young Chung, Hye Won Chung

Test-time adaptation (TTA) aims to adapt a trained classifier using online unlabeled test data only, without any information related to the training procedure. Most existing TTA methods adapt the trained classifier using the classifier's predictions on the test data as pseudo-labels. However, under test-time domain shift, the accuracy of the pseudo-labels cannot be guaranteed, and thus the TTA methods often encounter performance degradation at the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which is composed of the following procedures: (1) adds trainable adaptation modules on top of the trained feature extractor; (2) newly defines a pseudo-label distribution for the test data by using the nearest neighbor information; (3) trains these modules only a few times during test time to match the nearest neighbor-based pseudo-label distribution and a prototype-based class distribution for the test data; and (4) predicts the label of test data using the average predicted class distribution from these modules. The pseudo-label generation is based on the basic intuition that a test sample and its nearest neighbor in the embedding space are likely to share the same label under the domain shift. By utilizing multiple randomly initialized adaptation modules, TAST extracts useful information for the classification of the test data under the domain shift, using the nearest neighbor information. TAST showed better performance than the state-of-the-art TTA methods on two standard benchmark tasks, domain generalization, namely VLCS, PACS, OfficeHome, and TerraIncognita, and image corruption, particularly CIFAR-10/100C.
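
The nearest-neighbor intuition in step (2) can be illustrated with a short sketch. This is a simplified illustration, not the authors' procedure: it assumes a small support set of embeddings with known class distributions, uses cosine similarity, and averages the class distributions of the k nearest support points. The function name `nn_pseudo_label` and all parameters are hypothetical.

```python
import numpy as np

def nn_pseudo_label(test_emb, support_emb, support_probs, k=3):
    """Average the class distributions of the k nearest support embeddings.

    test_emb:      (d,) embedding of one test sample
    support_emb:   (m, d) embeddings of support samples
    support_probs: (m, c) class distribution attached to each support sample
    All names are illustrative assumptions, not taken from the paper.
    """
    t = test_emb / np.linalg.norm(test_emb)
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    sims = s @ t                          # cosine similarity to every support point
    nearest = np.argsort(-sims)[:k]       # indices of the k most similar points
    return support_probs[nearest].mean(axis=0)

support_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_probs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pseudo = nn_pseudo_label(np.array([1.0, 0.05]), support_emb, support_probs, k=2)
print(pseudo)
```

In this toy example the two nearest support points both belong to the first class, so the pseudo-label distribution puts all its mass there.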

ST · Jan 12, 2023
Detection problems in the spiked matrix models

Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

We study the statistical decision process of detecting the low-rank signal from various signal-plus-noise type data matrices, known as the spiked random matrix models. We first show that the principal component analysis can be improved by entrywise pre-transforming the data matrix if the noise is non-Gaussian, generalizing the known results for the spiked random matrix models with rank-1 signals. As an intermediate step, we find sharp phase transition thresholds for the extreme eigenvalues of spiked random matrices, which generalize the Baik-Ben Arous-Péché (BBP) transition. We also prove the central limit theorem for the linear spectral statistics for the spiked random matrices and propose a hypothesis test based on it, which does not depend on the distribution of the signal or the noise. When the noise is non-Gaussian, the test can be improved with an entrywise transformation to the data matrix with additive noise. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.
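
A small simulation can make the classical BBP transition concrete. The sketch below assumes the simplest setting, a rank-one spike of strength `theta` added to Gaussian (GOE) noise normalized so that the bulk spectrum fills [-2, 2]. The normalization and the function name `top_eigenvalue` are illustrative choices, not taken from the paper. In this setting the largest eigenvalue approaches theta + 1/theta when theta > 1 and stays at the bulk edge 2 otherwise.

```python
import numpy as np

def top_eigenvalue(n, theta, rng):
    """Largest eigenvalue of theta * x x^T + W for a unit vector x and GOE noise W."""
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)        # symmetric, off-diagonal variance 1/n
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                # unit-norm spike direction
    return np.linalg.eigvalsh(w + theta * np.outer(x, x))[-1]

rng = np.random.default_rng(0)
for theta in (0.5, 2.0):
    predicted = theta + 1 / theta if theta > 1 else 2.0
    print(theta, top_eigenvalue(500, theta, rng), predicted)
```

Below the threshold the top eigenvalue carries no visible trace of the spike, which is why detection in that regime needs finer statistics than the largest eigenvalue alone.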

LG · May 21
Toward Understanding Adversarial Distillation: Why Robust Teachers Fail

Hongsin Lee, Hye Won Chung

Adversarial Distillation aims to enhance student robustness by guiding the student with a robust teacher's soft labels within the min-max adversarial training framework, yet its success is notoriously inconsistent: a more robust teacher often fails to improve, or even harms, the student's robust generalization. In this paper, we identify a key mechanism of this teacher dependency: the misalignment between the teacher's supervisory confidence and the student's representational limitations on a consistent subset of training data -- the Robustly Unlearnable Set. We present a theoretical framework analyzing the feature learning dynamics of a two-layer neural network, demonstrating that this mismatch creates a dichotomy in distillation outcomes. We prove that when a teacher provides confident supervision on unlearnable samples, it compels the student to memorize spurious noise patterns that eventually overpower the learned robust signal, thereby driving robust overfitting. Conversely, a teacher that exhibits high uncertainty on these samples effectively suppresses noise memorization, allowing the student to rely solely on the learnable signal for robust generalization. We empirically validate our theory across both synthetic simulations and real-image classification datasets, confirming that robust overfitting is driven by the teacher's interaction with unlearnable samples. Finally, we demonstrate that a teacher's predictive entropy on unlearnable samples serves as a strong indicator of student robustness, validating our theoretical framework and offering a principled guideline for robust teacher selection.

ML · Dec 19, 2022
Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Daesung Kim, Hye Won Chung

The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient Descent (GD) is a simple yet efficient baseline algorithm for solving nonconvex optimization problems. The success of GD has been witnessed in many different problems in both theory and practice when it is combined with random initialization. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this paper, we study the rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that in a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee the convergence, and show that a larger initialization can be used as more samples are available. We observe that the implicit regularization effect of GD plays a critical role in the analysis, and for the entire trajectory, it prevents each entry from becoming much larger than the others.
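
The setting can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the paper's experiment: the loss is taken to be the squared error on observed entries scaled by 1/(4p), the ground-truth vector has unit norm so that a fixed step size of 0.1 is reasonable, and the problem size, sampling rate, and initialization scale are arbitrary choices. All function names are hypothetical.

```python
import numpy as np

def gd_rank1_completion(m_obs, mask, p, init_scale=1e-3, step=0.1, iters=3000, seed=0):
    """Gradient descent on f(u) = ||mask * (u u^T - M)||_F^2 / (4p) from a small random start."""
    rng = np.random.default_rng(seed)
    n = m_obs.shape[0]
    u = init_scale * rng.standard_normal(n) / np.sqrt(n)   # small random initialization
    for _ in range(iters):
        residual = mask * (np.outer(u, u) - m_obs)         # error on observed entries only
        u = u - step * (residual @ u) / p                  # gradient step (mask is symmetric)
    return u

def demo(n=100, p=0.5, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                                 # unit-norm ground truth (assumption)
    upper = np.triu(rng.random((n, n)) < p)
    mask = (upper | upper.T).astype(float)                 # symmetric observation pattern
    u = gd_rank1_completion(mask * np.outer(x, x), mask, p)
    return min(np.linalg.norm(u - x), np.linalg.norm(u + x))  # error up to a global sign

print(demo())
```

The recovered vector is only defined up to sign, so the error is measured against both x and -x.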

CV · Dec 11, 2025 · Code
Sample-wise Adaptive Weighting for Transfer Consistency in Adversarial Distillation

Hongsin Lee, Hye Won Chung

Adversarial distillation in the standard min-max adversarial training framework aims to transfer adversarial robustness from a large, robust teacher network to a compact student. However, existing work often neglects to incorporate state-of-the-art robust teachers. Through extensive analysis, we find that stronger teachers do not necessarily yield more robust students, a phenomenon known as robust saturation. While typically attributed to capacity gaps, we show that such explanations are incomplete. Instead, we identify adversarial transferability (the fraction of student-crafted adversarial examples that remain effective against the teacher) as a key factor in successful robustness transfer. Based on this insight, we propose Sample-wise Adaptive Adversarial Distillation (SAAD), which reweights training examples by their measured transferability without incurring additional computational cost. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that SAAD consistently improves AutoAttack robustness over prior methods. Our code is available at https://github.com/HongsinLee/saad.

HC · Dec 29, 2022
Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

Hyeonsu Jeong, Hye Won Chung

Crowdsourcing has emerged as an effective platform for labeling large amounts of data in a cost- and time-efficient manner. Most previous work has focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourcing tasks with the goal of recovering not only the ground truth, but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices. Task difficulty is quantified by the probability of confusion between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer both the top two answers and the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and in training neural networks with top-two soft labels.

LG · Nov 19, 2025 · Code
SNAP: Low-Latency Test-Time Adaptation with Sparse Updates

Hyeongheon Cha, Dong Min Kim, Hye Won Chung et al.

Test-Time Adaptation (TTA) adjusts models using unlabeled test data to handle dynamic distribution shifts. However, existing methods rely on frequent adaptation and high computational cost, making them unsuitable for resource-constrained edge environments. To address this, we propose SNAP, a sparse TTA framework that reduces adaptation frequency and data usage while preserving accuracy. SNAP maintains competitive accuracy even when adapting based on only 1% of the incoming data stream, demonstrating its robustness under infrequent updates. Our method introduces two key components: (i) Class and Domain Representative Memory (CnDRM), which identifies and stores a small set of samples that are representative of both class and domain characteristics to support efficient adaptation with limited data; and (ii) Inference-only Batch-aware Memory Normalization (IoBMN), which dynamically adjusts normalization statistics at inference time by leveraging these representative samples, enabling efficient alignment to shifting target domains. Integrated with five state-of-the-art TTA algorithms, SNAP reduces latency by up to 93.12%, while keeping the accuracy drop below 3.3%, even across adaptation rates ranging from 1% to 50%. This demonstrates its strong potential for practical use on edge devices serving latency-sensitive applications. The source code is available at https://github.com/chahh9808/SNAP.

CV · Oct 18, 2025 · Code
VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion

Jaekyun Park, Hye Won Chung

In the era of large-scale foundation models, fully fine-tuning pretrained networks for each downstream task is often prohibitively resource-intensive. Prompt tuning offers a lightweight alternative by introducing tunable prompts while keeping the backbone frozen. However, existing visual prompt tuning methods often fail to specialize the prompts or enrich the representation space--especially when applied to self-supervised backbones. We show that these limitations become especially pronounced in challenging tasks and data-scarce settings, where effective adaptation is most critical. In this work, we introduce VIPAMIN, a visual prompt initialization strategy that enhances adaptation of self-supervised models by (1) aligning prompts with semantically informative regions in the embedding space, and (2) injecting novel representational directions beyond the pretrained subspace. Despite its simplicity--requiring only a single forward pass and lightweight operations--VIPAMIN consistently improves performance across diverse tasks and dataset sizes, setting a new state of the art in visual prompt tuning. Our code is available at https://github.com/iamjaekyun/vipamin.

LG · Nov 20, 2024
Label Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation

Minguk Jang, Hye Won Chung

Test-time adaptation (TTA) is an effective approach to mitigate performance degradation of trained models when encountering input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify the presence of class-wise confusion patterns commonly observed across different covariate shifts. Based on this observation, we introduce label Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), a novel TTA method that refines the predictions by focusing on class-wise confusion patterns. DART trains a prediction refinement module during an intermediate time by exposing it to several batches with diverse class distributions using the training dataset. This module is then used during test time to detect and correct class distribution shifts, significantly improving pseudo-label accuracy for test data. Our method exhibits 5-18% gains in accuracy under label distribution shifts on CIFAR-10C, without any performance degradation when there is no label distribution shift. Extensive experiments on CIFAR, PACS, OfficeHome, and ImageNet benchmarks demonstrate DART's ability to correct inaccurate predictions caused by test-time distribution shifts. This improvement leads to enhanced performance in existing TTA methods, making DART a valuable plug-in tool.

LG · Feb 16, 2024
Rethinking Self-Distillation: Label Averaging and Enhanced Soft Label Refinement with Partial Labels

Hyeonsu Jeong, Hye Won Chung

We investigate the mechanisms of self-distillation in multi-class classification, particularly in the context of linear probing with fixed feature extractors where traditional feature learning explanations do not apply. Our theoretical analysis reveals that multi-round self-distillation effectively performs label averaging among instances with high feature correlations, governed by the eigenvectors of the Gram matrix derived from input features. This process leads to clustered predictions and improved generalization, mitigating the impact of label noise by reducing the model's reliance on potentially corrupted labels. We establish conditions under which multi-round self-distillation achieves 100% population accuracy despite label noise. Furthermore, we introduce a novel, efficient single-round self-distillation method using refined partial labels from the teacher's top two softmax outputs, referred to as the PLL student model. This approach replicates the benefits of multi-round distillation in a single round, achieving comparable or superior performance--especially in high-noise scenarios--while significantly reducing computational cost.

CV · Oct 21, 2025
CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder

Yongmin Lee, Hye Won Chung

Multimodal dataset distillation aims to synthesize a small set of image-text pairs that enables efficient training of large-scale vision-language models. While dataset distillation has shown promise in unimodal tasks, extending it to multimodal contrastive learning presents key challenges: learning cross-modal alignment and managing the high computational cost of large encoders. Prior approaches address scalability by freezing the text encoder and updating only the image encoder and text projection layer. However, we find this severely limits semantic alignment and becomes a bottleneck for performance scaling. We propose CovMatch, a scalable dataset distillation framework that aligns the cross-covariance of real and synthetic features while regularizing feature distributions within each modality. Unlike prior approaches, CovMatch enables joint optimization of both encoders, leading to stronger cross-modal alignment and improved performance. Evaluated on Flickr30K and COCO, CovMatch outperforms state-of-the-art multimodal distillation methods and achieves up to 6.8% absolute gains in retrieval accuracy using only 500 synthetic pairs.

LG · Jun 5, 2024
BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

Hoyong Choi, Nohyun Ki, Hye Won Chung

Data subset selection aims to find a smaller yet informative subset of a large dataset that can approximate the full-dataset training, addressing challenges associated with training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, lacking a universal approach that consistently achieves competitive performance across a broad range of selection ratios. We introduce a universal and efficient data subset selection method, Best Window Selection (BWS), by proposing a method to choose the best window subset from samples ordered based on their difficulty scores. This approach offers flexibility by allowing the choice of window intervals that span from easy to difficult samples. Furthermore, we provide an efficient mechanism for selecting the best window subset by evaluating its quality using kernel ridge regression. Our experimental results demonstrate the superior performance of BWS compared to other baselines across a broad range of selection ratios over datasets, including CIFAR-10/100 and ImageNet, and the scenarios involving training from random initialization or fine-tuning of pre-trained models.
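
The window idea can be sketched as follows. This covers only the step of picking a contiguous window from score-ordered samples; the kernel ridge regression step that chooses the best window is omitted. The sketch assumes that higher scores mean harder samples and that samples are ordered from hardest to easiest. The function name and the `ratio` and `start` parameters are hypothetical.

```python
import numpy as np

def window_subset(scores, ratio, start):
    """Return indices of a contiguous window over samples sorted from hardest to easiest.

    scores: per-sample difficulty scores (higher = harder, by assumption)
    ratio:  fraction of the dataset to keep (window width)
    start:  fraction of the sorted list at which the window begins
    """
    n = len(scores)
    order = np.argsort(-np.asarray(scores))   # hardest samples first
    width = int(round(ratio * n))
    begin = min(int(round(start * n)), n - width)   # keep the window inside the list
    return order[begin:begin + width]

scores = np.arange(10)                        # sample 9 is hardest, sample 0 easiest
print(window_subset(scores, ratio=0.3, start=0.2))
```

Sliding `start` from 0 toward 1 moves the window from the hardest samples to the easiest ones, which is the flexibility the abstract describes across selection ratios.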

DS · May 31, 2023
Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation

Joonhyuk Yang, Dongpil Shin, Hye Won Chung

We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing and bioinformatics, and in particular, matching graphs with inherent community structure has significance related to de-anonymization of correlated social networks. Compared to the correlated Erdős-Rényi (ER) model, where various efficient algorithms have been developed, among which a few algorithms have been proven to achieve the exact matching with constant edge correlation, no low-order polynomial algorithm has been known to achieve exact matching for the correlated SBMs with constant correlation. In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. The partition tree divides the large neighborhoods of each vertex into disjoint subsets using their edge statistics to different communities. Our algorithm is the first low-order polynomial-time algorithm achieving exact matching between two correlated SBMs with high probability in dense graphs.

HC · Nov 19, 2021
A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits

Doyeon Kim, Jeonghwan Lee, Hye Won Chung

Crowdsourcing systems have emerged as an effective platform for labeling data at relatively low cost by using non-expert workers. Inferring correct labels from multiple noisy answers on data, however, has been a challenging problem, since the quality of the answers varies widely across tasks and workers. Many existing works have assumed that there is a fixed ordering of workers in terms of their skill levels, and focused on estimating worker skills to aggregate the answers from workers with different weights. In practice, however, worker skill varies widely across tasks, especially when the tasks are heterogeneous. In this paper, we consider a new model, called the $d$-type specialization model, in which each task and worker has its own (unknown) type and the reliability of each worker can vary with the type of a given task and that of the worker. We allow the number $d$ of types to scale with the number of tasks. In this model, we characterize the optimal sample complexity to correctly infer the labels within any given accuracy, and propose label inference algorithms achieving the order-wise optimal limit even when the types of tasks or those of workers are unknown. We conduct experiments on both synthetic and real datasets, and show that our algorithm outperforms the existing algorithms developed based on stricter model assumptions.

ST · Apr 28, 2021
Detection of Signal in the Spiked Rectangular Models

Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

We consider the problem of detecting signals in the rank-one signal-plus-noise data matrix models that generalize the spiked Wishart matrices. We show that the principal component analysis can be improved by pre-transforming the matrix entries if the noise is non-Gaussian. As an intermediate step, we prove a sharp phase transition of the largest eigenvalues of spiked rectangular matrices, which extends the Baik-Ben Arous-Péché (BBP) transition. We also propose a hypothesis test to detect the presence of signal with low computational complexity, based on the linear spectral statistics, which minimizes the sum of the Type-I and Type-II errors when the noise is Gaussian.

LG · Feb 24, 2021
Self-Diagnosing GAN: Diagnosing Underrepresented Samples in Generative Adversarial Networks

Jinhee Lee, Haeri Kim, Youngkyu Hong et al.

Despite remarkable performance in producing realistic samples, Generative Adversarial Networks (GANs) often produce low-quality samples near low-density regions of the data manifold, e.g., samples of minor groups. Many techniques have been developed to improve the quality of generated samples, either by post-processing generated samples or by pre-processing the empirical data distribution, but at the cost of reduced diversity. To promote diversity in sample generation without degrading the overall quality, we propose a simple yet effective method to diagnose and emphasize underrepresented samples during training of a GAN. The main idea is to use the statistics of the discrepancy between the data distribution and the model distribution at each data instance. Based on the observation that the underrepresented samples have a high average discrepancy or high variability in discrepancy, we propose a method to emphasize those samples during training of a GAN. Our experimental results demonstrate that the proposed method improves GAN performance on various datasets, and it is especially effective in improving the quality and diversity of sample generation for minor groups.

ML · Mar 23, 2020
Robust Hypergraph Clustering via Convex Relaxation of Truncated MLE

Jeonghwan Lee, Daesung Kim, Hye Won Chung

We study hypergraph clustering in the weighted $d$-uniform hypergraph stochastic block model ($d$-WHSBM), where each edge consisting of $d$ nodes from the same community has higher expected weight than the edges consisting of nodes from different communities. We propose a new hypergraph clustering algorithm, called CRTMLE, and provide its performance guarantee under the $d$-WHSBM for general parameter regimes. We show that the proposed method achieves the order-wise optimal or the best existing results for approximately balanced community sizes. Moreover, our results establish the first recovery guarantees for a growing number of clusters of unbalanced sizes. Through theoretical analysis and empirical results, we demonstrate the robustness of our algorithm against the unbalancedness of community sizes or the presence of outlier nodes.

HC · Mar 21, 2020
Crowdsourced Labeling for Worker-Task Specialization Model

Doyeon Kim, Hye Won Chung

We consider crowdsourced labeling under a $d$-type worker-task specialization model, where each worker and task is associated with one particular type among a finite set of types and a worker provides a more reliable answer to tasks of the matched type than to tasks of unmatched types. We design an inference algorithm that recovers binary task labels (up to any given recovery accuracy) by using worker clustering, worker skill estimation and weighted majority voting. The designed inference algorithm does not require any information about worker/task types, and achieves any targeted recovery accuracy with the best known performance (minimum number of queries per task).
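
The final aggregation step, weighted majority voting, can be sketched on its own. The sketch assumes each worker's probability of answering correctly is already known and uses the standard log-odds weights; the paper's worker clustering and skill estimation steps are not reproduced. The function name and the example numbers are hypothetical.

```python
import numpy as np

def weighted_majority_vote(answers, reliabilities):
    """Aggregate binary answers in {-1, +1} with log-odds weights.

    answers:       (workers, tasks) array of +1 / -1 answers
    reliabilities: (workers,) probability that each worker answers correctly,
                   assumed known here and strictly between 0 and 1
    Returns the sign of the weighted sum for each task (0 on an exact tie).
    """
    r = np.asarray(reliabilities, dtype=float)
    weights = np.log(r / (1.0 - r))          # reliable workers get larger weights
    return np.sign(weights @ np.asarray(answers))

answers = np.array([[ 1, -1],
                    [-1, -1],
                    [-1,  1]])
labels = weighted_majority_vote(answers, [0.9, 0.6, 0.6])
print(labels)
```

For the first task, simple majority voting would return -1, but the single reliable worker outweighs the two weaker ones, so the weighted vote returns +1.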

IT · Jan 31, 2020
Binary Classification with XOR Queries: Fundamental Limits and An Efficient Algorithm

Daesung Kim, Hye Won Chung

We consider a query-based data acquisition problem for binary classification of unknown labels, which has diverse applications in communications, crowdsourcing, recommender systems and active learning. To ensure reliable recovery of unknown labels with as few queries as possible, we consider an effective query type that asks for a "group attribute" of a chosen subset of objects. In particular, we consider the problem of classifying $m$ binary labels with XOR queries that ask whether the number of objects having a given attribute in the chosen subset of size $d$ is even or odd. The subset size $d$, which we call query degree, can vary over queries. We consider a general noise model where the accuracy of answers on queries changes depending both on the worker (the data provider) and query degree $d$. For this general model, we characterize the information-theoretic limit on the optimal number of queries to reliably recover $m$ labels in terms of a given combination of degree-$d$ queries and noise parameters. Further, we propose an efficient inference algorithm that achieves this limit even when the noise parameters are unknown.

ST · Jan 16, 2020
Weak Detection in the Spiked Wigner Model with General Rank

Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

We study the statistical decision process of detecting the signal from a 'signal+noise' type matrix model with an additive Wigner noise. We propose a hypothesis test based on the linear spectral statistics of the data matrix, which does not depend on the distribution of the signal or the noise. The test is optimal under the Gaussian noise if the signal-to-noise ratio is small, as it minimizes the sum of the Type-I and Type-II errors. Under the non-Gaussian noise, the test can be improved with an entrywise transformation to the data matrix. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.

LG · Apr 19, 2019
Shallow Neural Network can Perfectly Classify an Object following Separable Probability Distribution

Youngjae Min, Hye Won Chung

Guiding the design of neural networks is of great importance to save enormous resources consumed on empirical decisions of architectural parameters. This paper constructs shallow sigmoid-type neural networks that achieve 100% accuracy in classification for datasets following a linear separability condition. The separability condition in this work is more relaxed than the widely used linear separability. Moreover, the constructed neural network guarantees perfect classification for any datasets sampled from a separable probability distribution. This generalization capability comes from the saturation of sigmoid function that exploits small margins near the boundaries of intervals formed by the separable probability distribution.

ST · Sep 28, 2018
Weak detection in the spiked Wigner model

Hye Won Chung, Ji Oon Lee

We consider the weak detection problem in a rank-one spiked Wigner data matrix where the signal-to-noise ratio is small so that reliable detection is impossible. We propose a hypothesis test on the presence of the signal by utilizing the linear spectral statistics of the data matrix. The test is data-driven and does not require prior knowledge about the distribution of the signal or the noise. When the noise is Gaussian, the proposed test is optimal in the sense that its error matches that of the likelihood ratio test, which minimizes the sum of the Type-I and Type-II errors. If the density of the noise is known and non-Gaussian, the error of the test can be lowered by applying an entrywise transformation to the data matrix. We establish a central limit theorem for the linear spectral statistics of general rank-one spiked Wigner matrices as an intermediate step.

IT · Sep 4, 2018
Parity Queries for Binary Classification

Hye Won Chung, Ji Oon Lee, Doyeon Kim et al.

Consider a query-based data acquisition problem that aims to recover the values of $k$ binary variables from parity (XOR) measurements of chosen subsets of the variables. Assume the response model where only a randomly selected subset of the measurements is received. We propose a method for designing a sequence of queries so that the variables can be identified with high probability using as few ($n$) measurements as possible. We define the query difficulty $\bar{d}$ as the average size of the query subsets and the sample complexity $n$ as the minimum number of measurements required to attain a given recovery accuracy. We obtain fundamental trade-offs between recovery accuracy, query difficulty, and sample complexity. In particular, the necessary and sufficient sample complexity required for recovering all $k$ variables with high probability is $n = c_0 \max\{k, (k \log k)/\bar{d}\}$ and the sample complexity for recovering a fixed proportion $(1-\delta)k$ of the variables for $\delta=o(1)$ is $n = c_1\max\{k, (k \log(1/\delta))/\bar{d}\}$, where $c_0, c_1>0$.
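
The two sample-complexity expressions are easy to evaluate numerically. The sketch below assumes natural logarithms and sets the unspecified constants $c_0$ and $c_1$ to 1 as placeholders, so the outputs show only how the bound scales with $k$, $\bar{d}$, and $\delta$, not actual query counts. The function names are hypothetical.

```python
import math

def full_recovery_queries(k, d_bar, c0=1.0):
    """n = c0 * max(k, k log k / d_bar); c0 = 1 is a placeholder constant."""
    return c0 * max(k, k * math.log(k) / d_bar)

def partial_recovery_queries(k, d_bar, delta, c1=1.0):
    """n = c1 * max(k, k log(1/delta) / d_bar); c1 = 1 is a placeholder constant."""
    return c1 * max(k, k * math.log(1.0 / delta) / d_bar)

k = 1000
print(full_recovery_queries(k, d_bar=1))             # single-variable queries: about k log k
print(full_recovery_queries(k, d_bar=math.log(k)))   # harder queries: about k
print(partial_recovery_queries(k, d_bar=1, delta=0.01))
```

Raising the average query size from 1 to about $\log k$ brings the full-recovery bound down from order $k \log k$ to order $k$, after which larger queries give no further reduction.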