Channel-Wise MLPs Improve the Generalization of Recurrent Convolutional Networks
This incremental improvement addresses generalization in neural program synthesis and may benefit hypernetwork approaches.
The paper improves generalization in recurrent convolutional networks by adding channel-wise MLPs, finding that the DAMP architecture significantly outperforms the DARC baseline on both in-distribution and out-of-distribution tasks in the Re-ARC benchmark.
We investigate the impact of channel-wise mixing via multi-layer perceptrons (MLPs) on the generalization capabilities of recurrent convolutional networks. Specifically, we compare two architectures: DARC (Depth Aware Recurrent Convolution), which employs a simple recurrent convolutional structure, and DAMP (Depth Aware Multi-layer Perceptron), which extends DARC with a gated MLP for channel mixing. Using the Re-ARC benchmark, we find that DAMP significantly outperforms DARC in both in-distribution and out-of-distribution generalization under exact-match grading criteria. These results suggest that explicit channel mixing through MLPs enables recurrent convolutional networks to learn more robust and generalizable computational patterns. Our findings have implications for neural program synthesis and highlight the potential of DAMP as a target architecture for hypernetwork approaches.
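The abstract does not give layer-level details, so the following is only a minimal pure-Python sketch of the distinction it draws: a recurrent convolutional update (DARC-style) versus the same update followed by a per-cell gated MLP that mixes channels (DAMP-style), plus exact-match grading of a decoded output grid. Everything concrete here is an assumption for illustration, not the paper's method: the names `darc_step`, `damp_step`, `gated_mlp`, `init_params`, `one_hot`, and `decode` are hypothetical; input injection, ReLU, the residual GLU-style gate, and argmax decoding are our choices; and whatever mechanism makes the models "depth aware" is omitted.

```python
import math
import random


def sigmoid(v):
    # tanh form avoids overflow for large |v|
    return 0.5 * (1.0 + math.tanh(0.5 * v))


def conv3x3(state, kernel, bias):
    """Same-padded 3x3 convolution over an H x W x C grid; kernel[dy][dx][c_out][c_in]."""
    h, w, c = len(state), len(state[0]), len(state[0][0])
    out = [[list(bias) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the grid
                        src = state[yy][xx]
                        for o in range(c):
                            out[y][x][o] += sum(kernel[dy][dx][o][i] * src[i] for i in range(c))
    return out


def darc_step(state, inputs, p):
    """Hypothetical DARC-style update: shared conv + input injection + ReLU."""
    z = conv3x3(state, p["kernel"], p["bias"])
    return [[[max(0.0, zc + xc) for zc, xc in zip(zcell, xcell)]
             for zcell, xcell in zip(zrow, xrow)]
            for zrow, xrow in zip(z, inputs)]


def gated_mlp(vec, p):
    """Channel mixing at one cell (assumed GLU-style): W_o[(W_v v) * sigmoid(W_g v)]."""
    hidden = [
        sum(a * b for a, b in zip(wv, vec)) * sigmoid(sum(a * b for a, b in zip(wg, vec)))
        for wv, wg in zip(p["w_val"], p["w_gate"])
    ]
    return [sum(a * b for a, b in zip(wo, hidden)) for wo in p["w_out"]]


def damp_step(state, inputs, p):
    """Hypothetical DAMP-style update: the DARC step plus a residual per-cell gated MLP."""
    z = darc_step(state, inputs, p)
    return [[[zc + mc for zc, mc in zip(cell, gated_mlp(cell, p))] for cell in row] for row in z]


def init_params(c, hidden, seed=0):
    """Small random weights; the same parameters are reused at every iteration."""
    rng = random.Random(seed)

    def rnd():
        return rng.uniform(-0.3, 0.3)

    return {
        "kernel": [[[[rnd() for _ in range(c)] for _ in range(c)] for _ in range(3)] for _ in range(3)],
        "bias": [0.0] * c,
        "w_val": [[rnd() for _ in range(c)] for _ in range(hidden)],
        "w_gate": [[rnd() for _ in range(c)] for _ in range(hidden)],
        "w_out": [[rnd() for _ in range(hidden)] for _ in range(c)],
    }


def run(step, inputs, p, iterations):
    """Apply the same (weight-tied) step repeatedly, starting from the input encoding."""
    state = inputs
    for _ in range(iterations):
        state = step(state, inputs, p)
    return state


def one_hot(grid, c):
    return [[[1.0 if k == v else 0.0 for k in range(c)] for v in row] for row in grid]


def decode(state):
    # Predicted color per cell = index of the largest channel
    return [[max(range(len(cell)), key=cell.__getitem__) for cell in row] for row in state]


def exact_match(pred, target):
    # Credit only if every cell (and the grid shape) matches; no partial credit
    return pred == target


if __name__ == "__main__":
    colors = 3
    task_input = [[0, 1, 2], [2, 1, 0]]
    params = init_params(colors, hidden=4)
    x = one_hot(task_input, colors)
    for name, step in (("DARC", darc_step), ("DAMP", damp_step)):
        pred = decode(run(step, x, params, iterations=5))
        print(name, pred, exact_match(pred, task_input))
```

The only structural difference between the two steps is the `gated_mlp` call, which acts on each cell's channel vector independently; the 3x3 convolution handles spatial mixing, while the MLP adds explicit channel mixing, which is the property the abstract credits for DAMP's better generalization.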