Design & Implementation of a Scalable CNN Accelerator
Deep convolutional neural networks (CNNs) have become indispensable for computer vision tasks because of their high accuracy. Although considerable effort has gone into reducing their high computational cost, low-power embedded devices still struggle to reach real-time frame rates. In this work, we present BinArray, a hardware accelerator for binary approximated convolutional neural networks (BACNNs), which offer a configurable trade-off between accuracy and complexity. BinArray translates this trade-off into a task-specific compromise between area, throughput and accuracy, depending on the given constraints. Its systolic array (SA) architecture scales to BACNNs of different sizes, ranging from a CNN for the GTSRB traffic-sign benchmark to large MobileNets. We implemented BinArray on a Xilinx Zynq FPGA. Without any loss of accuracy, an accelerated BACNN achieves a throughput of up to 92.1 FPS on GTSRB, compared with only 7.9 FPS on a CPU, a speedup of roughly 11.7×. BinArray achieves this while using less than 2% of the logic available on a mid-sized FPGA.
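The abstract does not spell out how a BACNN trades accuracy for complexity. A common way to build such a network, assumed here and not confirmed as BinArray's exact scheme, is to approximate each real-valued weight tensor W by a weighted sum of M binary tensors, W ≈ Σ α_m B_m with B_m ∈ {−1, +1}^n, and to let M set the trade-off. The minimal Python sketch below uses a greedy residual fit (an illustrative method swapped in for this purpose; the function names `binary_approximate`, `reconstruct` and `mse` are hypothetical) on random Gaussian weights and shows the approximation error shrinking as M grows.

```python
import random


def sign(x):
    # Map to {-1, +1}; zero is mapped to +1 so every entry stays binary.
    return 1.0 if x >= 0 else -1.0


def binary_approximate(weights, num_bases):
    """Greedily approximate `weights` as sum_m alpha_m * B_m, B_m in {-1,+1}^n.

    Hypothetical helper: a greedy residual fit, not necessarily BinArray's method.
    """
    residual = list(weights)
    alphas, bases = [], []
    for _ in range(num_bases):
        b = [sign(r) for r in residual]
        # For b = sign(residual), the least-squares scale is mean(|residual|).
        alpha = sum(abs(r) for r in residual) / len(residual)
        residual = [r - alpha * bi for r, bi in zip(residual, b)]
        alphas.append(alpha)
        bases.append(b)
    return alphas, bases


def reconstruct(alphas, bases):
    # Rebuild the real-valued approximation from scales and binary tensors.
    n = len(bases[0])
    return [sum(a * b[i] for a, b in zip(alphas, bases)) for i in range(n)]


def mse(x, y):
    # Mean squared error between original and approximated weights.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(256)]  # stand-in weight vector
for m in (1, 2, 3, 4):
    alphas, bases = binary_approximate(w, m)
    print(f"M={m}: MSE={mse(w, reconstruct(alphas, bases)):.4f}")
```

Each additional binary tensor lowers the reconstruction error, but in hardware it also adds another set of sign-only (add/subtract) operations per weight, because x·W ≈ Σ α_m (B_m x). Under this assumption, M is the kind of accuracy-versus-complexity knob the abstract refers to.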
Supervisor: Jürgen Wassner