Hardware Implementation of Convolutional Neural Networks (CNN)
Part 1
Overview
Main paper
• 2009/CNP: An FPGA-based Processor for Convolutional Networks
…, c8*x2; …, c8*x3; …, c8*x4;
2013A / New Architecture
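• 2D convolution block diagram C
– This structure separates the convolution computation from the logic that supplies the convolution window's input values. Compared with the previous structure, its advantage is that several of the 2D Array Processors on the right can run in parallel and produce multiple output results at once, so the input data is reused. The previous structure cannot do this because the two parts are not separated.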
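The data reuse described above can be illustrated with a small software sketch. This is not the architecture from the paper, only a model of the idea under stated assumptions: each window is extracted once and then handed to several multiply-accumulate units, one per kernel. The function names (`extract_windows`, `mac`, `shared_window_conv`) are hypothetical, and the kernel is applied without flipping, as is common in CNNs.

```python
def extract_windows(image, k):
    """Return every k x k window of a 2D list, in row-major order."""
    rows, cols = len(image), len(image[0])
    return [
        [[image[r + i][c + j] for j in range(k)] for i in range(k)]
        for r in range(rows - k + 1)
        for c in range(cols - k + 1)
    ]


def mac(window, kernel):
    """Multiply-accumulate one window with one kernel (no kernel flip)."""
    k = len(kernel)
    return sum(window[i][j] * kernel[i][j] for i in range(k) for j in range(k))


def shared_window_conv(image, kernels):
    """Compute one output map per kernel while fetching each window only once.

    Assumption: all kernels are square and the same size.
    """
    k = len(kernels[0])
    outputs = [[] for _ in kernels]
    for window in extract_windows(image, k):
        # The same window values feed every kernel, which models the
        # parallel 2D array processors sharing one window input.
        for m, kernel in enumerate(kernels):
            outputs[m].append(mac(window, kernel))
    return outputs
```

In hardware the inner loop over kernels would run in parallel; here it is sequential, but the window is still read from the image only once per position.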
2013B/A Memory-Centric Architecture—ICCD
• The effect of the memory bottleneck can be reduced by a flexible memory hierarchy that supports the complex data access patterns in CNN workloads.
c2*x2+c1*x1+c0*x0,
4. c0*x3, c1*x3+c0*x2, c2*x3+c1*x2+c0*x1,
5. c0*x4, c1*x4+c0*x3, c2*x4+c1*x3+c0*x2,
6. c0*x5, c1*x5+c0*x4, c2*x5+c1*x4+c0*x3,
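The step listing above is consistent with a 3-tap pipeline in which the current input is broadcast to all multipliers and each partial sum moves one stage forward per step. A minimal sketch of that reading follows. The function name `pipelined_fir` is hypothetical, and the registers are assumed to start at zero.

```python
def pipelined_fir(x, c):
    """Simulate a pipelined multiply-accumulate chain, one input per step.

    Stage k holds c[k] * x[n] plus the value stage k-1 held one step earlier.
    Returns the contents of all stages after each step.
    """
    taps = len(c)
    prev = [0] * taps  # assumed initial state: all partial sums are zero
    trace = []
    for xn in x:
        cur = [c[0] * xn]  # first stage has no incoming partial sum
        for k in range(1, taps):
            cur.append(c[k] * xn + prev[k - 1])
        trace.append(cur)
        prev = cur
    return trace


# With c = [c0, c1, c2] and inputs x0..x3, the last row of the trace is
# [c0*x3, c1*x3 + c0*x2, c2*x3 + c1*x2 + c0*x1], matching step 4 in the listing.
steps = pipelined_fir([1, 2, 3, 4], [1, 10, 100])
```

Once the pipeline is full, the last stage produces one complete 3-tap result per step.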