CV · AR · Jan 24, 2025

BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference

arXiv:2501.14495v1 · 3 citations · h-index: 14 · SiPS
Originality: Incremental advance
AI Analysis

This work addresses hardware efficiency for video-based applications on resource-constrained devices, representing an incremental improvement in model compression and acceleration.

The paper tackles the high memory and computational demands of video inference models such as Conv3D-LSTM. It proposes BILLNET, a binarized Conv3D-LSTM network with a logic-gated residual architecture, and reports high accuracy on the Jester dataset with extremely low memory and computational budgets compared to existing resource-efficient models.

Long Short-Term Memory (LSTM) networks and 3D convolutions (Conv3D) show impressive results in many video-based applications but require large memory and intensive computation. Motivated by recent work on hardware-algorithm co-design for efficient inference, we propose a compact binarized Conv3D-LSTM model architecture called BILLNET, compatible with highly resource-constrained hardware. First, BILLNET factorizes the costly standard Conv3D into two pointwise convolutions with a grouped convolution in between. Second, BILLNET enables binarized weights and activations via a MUX-OR-gated residual architecture. Finally, to train BILLNET efficiently, we propose a multi-stage training strategy that allows the LSTM layers to be fully quantized. Results on the Jester dataset show that our method achieves high accuracy with extremely low memory and computational budgets compared to existing resource-efficient Conv3D models.
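To make the first two ideas more concrete, here is a minimal Python sketch under stated assumptions. It counts the weights of a standard Conv3D versus a pointwise, grouped Conv3D, pointwise factorization (biases ignored), and shows the generic XOR/popcount dot product commonly used to execute binarized (+1/-1) layers. The channel widths (64), kernel size (3×3×3) and group count (8) are hypothetical values chosen for illustration, not figures from the paper. The binary dot product is the standard binarized-network formulation, not BILLNET's MUX-OR-gated residual block, whose internals the abstract does not describe. All function names are illustrative.

```python
import math


def standard_conv3d_params(c_in, c_out, kernel):
    """Weights in a dense Conv3D layer (bias ignored)."""
    return c_in * c_out * math.prod(kernel)


def factorized_conv3d_params(c_in, c_mid, c_out, kernel, groups):
    """Weights in a pointwise -> grouped Conv3D -> pointwise stack (bias ignored).

    c_mid and groups are hypothetical hyperparameters for illustration only.
    """
    if c_mid % groups != 0:
        raise ValueError("c_mid must be divisible by groups")
    pointwise_in = c_in * c_mid  # 1x1x1 conv: c_in -> c_mid
    # Grouped conv: each of the c_mid output channels sees only c_mid / groups inputs.
    grouped = (c_mid // groups) * c_mid * math.prod(kernel)
    pointwise_out = c_mid * c_out  # 1x1x1 conv: c_mid -> c_out
    return pointwise_in + grouped + pointwise_out


def binarize(xs):
    """Sign binarization to +1/-1 (zero maps to +1, a common convention)."""
    return [1 if x >= 0 else -1 for x in xs]


def pack_signs(values):
    """Pack +1/-1 values into an int: bit i is 1 iff values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binarized values must be +1 or -1")
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a, w):
    """Dot product of two +1/-1 vectors using XOR + popcount.

    Matching positions contribute +1 and mismatches contribute -1,
    so dot = n - 2 * (number of mismatches).
    """
    if len(a) != len(w):
        raise ValueError("vectors must have the same length")
    mismatches = bin(pack_signs(a) ^ pack_signs(w)).count("1")
    return len(a) - 2 * mismatches


if __name__ == "__main__":
    std = standard_conv3d_params(64, 64, (3, 3, 3))
    fac = factorized_conv3d_params(64, 64, 64, (3, 3, 3), groups=8)
    print(std, fac, round(std / fac, 2))  # 110592 22016 5.02
    print(binary_dot(binarize([0.3, -1.2, 0.7, 0.1]), [1, 1, -1, 1]))  # 0
```

With these illustrative sizes, the factorized stack needs about 5× fewer weights than the dense Conv3D. Once weights and activations are binarized, each multiply-accumulate reduces to bitwise logic plus a bit count, which is why such layers map well onto resource-constrained hardware.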
