TCD-NPE: A Re-configurable and Efficient Neural Processing Engine, Powered by Novel Temporal-Carry-deferring MACs
This work addresses energy and performance bottlenecks in hardware accelerators for machine learning. It appears incremental, however, because it builds on existing neural processing engine designs and adds novel MAC and scheduling optimizations.
The paper tackles the problem of energy and performance inefficiency in neural processing engines by proposing a Temporal-Carry-deferring MAC (TCD-MAC) and a reconfigurable TCD-NPE, resulting in significant improvements in energy consumption and execution time compared to conventional MAC-based solutions.
In this paper, we first propose the design of a Temporal-Carry-deferring MAC (TCD-MAC) and illustrate how it gains significant energy and performance benefits when used to process a stream of input data. We then propose using the TCD-MAC to build a reconfigurable, high-speed, low-power Neural Processing Engine (TCD-NPE). We further propose a novel scheduler that lists the sequence of processing events needed to process an MLP model in the least number of computational rounds on the proposed TCD-NPE. We illustrate that the proposed TCD-NPE significantly outperforms similar neural processing solutions that use conventional MACs in terms of both energy consumption and execution time.
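The abstract does not spell out how the TCD-MAC defers carries, so the following is only a minimal software sketch of one plausible reading: a carry-save accumulator that keeps separate sum and carry words across the input stream and performs a single full carry-propagating addition at the end. The function names (`carry_save_step`, `deferred_carry_mac`, `to_signed`), the 32-bit width, and the use of a plain multiply for each product are assumptions made for illustration, not the authors' hardware design.

```python
def carry_save_step(s, c, x, mask):
    """3:2 compression: fold x into the (sum, carry) pair without
    propagating carries. Invariant: new_s + new_c == s + c + x (mod 2**width)."""
    new_s = (s ^ c ^ x) & mask
    # Majority of the three bits at each position is the carry into the next one.
    new_c = (((s & c) | (s & x) | (c & x)) << 1) & mask
    return new_s, new_c


def deferred_carry_mac(inputs, weights, width=32):
    """Accumulate sum(a * w) over a stream, deferring carry propagation.

    Illustrative only: each product is computed with an ordinary multiply;
    only the accumulation is modelled in carry-save form."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for a, w in zip(inputs, weights):
        product = (a * w) & mask  # two's-complement wrap to `width` bits
        s, c = carry_save_step(s, c, product, mask)
    # The only full carry-propagating addition happens once, after the stream.
    return (s + c) & mask


def to_signed(value, width=32):
    """Interpret a `width`-bit word as a two's-complement signed integer."""
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value
```

For example, `deferred_carry_mac([1, 2, 3], [4, 5, 6])` returns 32, the same dot product a conventional MAC would compute. In this sketch the difference lies only in when the carries are resolved: once per stream rather than once per input.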