The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently
For machine learning practitioners dealing with sparse Boolean functions, this work reveals that temporal structure in data can be exploited to overcome known computational barriers for gradient-based learning.
The paper shows that temporal correlations from a lazy random walk on the hypercube enable gradient-based methods to learn Boolean k-juntas with sample complexity essentially linear in the ambient dimension d, whereas under independent uniform samples this problem is hard for such methods.
We study how temporal correlations in the data can make certain sparse learning problems efficiently learnable by gradient-based methods. Our focus is on Boolean k-juntas, a canonical sparse learning problem known to pose barriers for gradient-based methods under independent uniform samples. We show that this picture changes when the samples are generated by a lazy random walk on the hypercube. In this setting, the temporal dependencies can be exploited by a two-layer ReLU network trained using stylized-SGD with a temporal-difference loss, which compares target and predicted increments across consecutive samples. For every fixed k, the resulting sample complexity is essentially linear in the ambient dimension d. By contrast, we show that for large-batch gradient methods using standard convex pointwise losses, temporal correlations do not provide the same advantage.