Learned Cost Model for Placement on Reconfigurable Dataflow Hardware
This work addresses the challenge of efficiently mapping ML models onto hardware. It offers a significant improvement over existing methods, but it is incremental in that it builds on prior cost-modeling approaches.
The paper tackles the problem of mapping ML model dataflow graphs onto reconfigurable hardware by developing a learned cost model that predicts throughput 31%-52% more accurately than hand-designed analytical models, resulting in 5.6% faster compiled graphs.
Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult because different mappings achieve different throughputs and consume hardware resources differently. Choosing a good mapping therefore requires a model that estimates the throughput of a candidate mapping, since fully measuring throughput is expensive. Many existing approaches use a hand-designed analytical model that relies on proxy features or intuition, which introduces error. We present a learned approach that predicts throughput 31%-52% more accurately across a variety of graphs. In addition, our approach shows no accuracy degradation when performance annotations are removed. We show that using this approach results in 5.6% faster compiled graphs.
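To make the role of a learned cost model concrete, the following is a minimal sketch, not the method used in this work. It assumes each mapping is summarized by two hypothetical features (fraction of compute units used, and a communication cost), fits a plain linear regressor to a handful of synthetic "measured" throughputs, and then ranks unmeasured candidate mappings by predicted throughput. The feature names, the linear model, and all numbers are illustrative assumptions; the abstract does not specify the model architecture or its inputs.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]  # hypothetical: (compute_utilization, communication_cost)


def fit_linear_cost_model(
    samples: List[Features],
    throughputs: List[float],
    lr: float = 0.5,
    epochs: int = 5000,
) -> Callable[[Features], float]:
    """Fit throughput ~ w . x + b by batch gradient descent on squared error."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, throughputs):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(x: Features) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    return predict


def pick_best_mapping(
    candidates: List[Features], predict: Callable[[Features], float]
) -> Features:
    """Return the candidate with the highest predicted throughput."""
    return max(candidates, key=predict)


def synthetic_throughput(x: Features) -> float:
    """Stand-in for an expensive hardware measurement (synthetic, assumed)."""
    return 10.0 * x[0] - 4.0 * x[1] + 1.0


# Mappings that were "measured" once and serve as training data.
measured = [(0.2, 0.5), (0.8, 0.1), (0.5, 0.9), (1.0, 0.3), (0.1, 0.1), (0.6, 0.6)]
labels = [synthetic_throughput(x) for x in measured]
cost_model = fit_linear_cost_model(measured, labels)

# New candidate mappings are ranked by prediction instead of being measured.
candidates = [(0.3, 0.8), (0.9, 0.2), (0.6, 0.4)]
best = pick_best_mapping(candidates, cost_model)
print("best candidate by predicted throughput:", best)
```

In this toy setup the measurement function is itself linear, so the fit is nearly exact; real throughput depends on the mapping in far more complex ways, which is the motivation for learning the model from measured data rather than designing it by hand.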