AXI Interconnect code notes
Xilinx naming conventions
1. Device names: a device name is made up of several fields, including the family name, device type, device size, speed grade, and other information.
For example, XC7Z010-1CLG400C denotes a Zynq-7000 family SoC: 010 indicates the device size, -1 is the speed grade, the package is CLG400, and the trailing C marks the commercial temperature range.
2. IP core names: an IP core name likewise includes the family name, IP type, version number, and similar information.
For example, AXI Interconnect v2.1 is an AXI bus interconnect IP core at version 2.1.
3. Tool names: Xilinx tools include Vivado, ISE, and others, and tool names normally carry version information as well.
4. File names: Xilinx file names usually include the family name, file type, and version information.
For example, xc7z010clg400-1.bit is a bitstream file for a Zynq-7000 family device; the -1 in the name is the device's speed grade rather than a file version.
Overall, Xilinx naming conventions are consistent and rigorous, giving users clear and convenient information.
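As an illustration of how such an ordering code breaks down, the following sketch pulls the fields out of the example part number. It assumes the simple device-speed-package-temperature layout described above and is not an official Xilinx parser.

#include <stdio.h>

int main(void)
{
    /* Example ordering code from the text above. */
    const char *part = "XC7Z010-1CLG400C";
    char device[8] = "", package[8] = "";
    int speed = 0;
    char temp = '?';

    /* Assumed layout: <device>-<speed><package><temperature grade> */
    if (sscanf(part, "%7[A-Z0-9]-%d%6[A-Z0-9]%c", device, &speed, package, &temp) == 4)
        printf("device=%s speed=-%d package=%s temp=%c\n", device, speed, package, temp);
    return 0;
}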
9. Double-click the Debug Bridge IP identified as xvc_vsec to view the configuration options for this IP. Make note of the following configuration parameters because they will be used to configure the driver.
• PCIe XVC VSEC ID (default 0x0008)
• PCIe XVC VSEC Rev ID (default 0x0)
IMPORTANT! Do not modify these parameter values when using a Xilinx Vendor ID or provided XVC drivers and software. These values are used to detect the XVC extended capability. (See the PCIe specification for additional details.)
10. In the Flow Navigator, click Generate Bitstream to generate a bitstream for the example design project. This bitstream will then be loaded onto the FPGA board to enable XVC debug over PCIe.
After the XVC-over-PCIe hardware design has been completed, an appropriate XVC-enabled PCIe driver and associated XVC-Server software application can be used to connect the Vivado Design Suite to the PCIe-connected FPGA. Vivado can connect to an XVC-Server application that is running locally on the same machine or remotely on another machine using a TCP/IP socket.
System Bring-Up
The first step is to program the FPGA and power on the system such that the PCIe link is detected by the host system. This can be accomplished by either:
• Programming the design file into the flash present on the FPGA board, or
• Programming the device directly via JTAG.
If the card is powered by the host PC, it will need to be powered on to perform this programming using JTAG and then restarted to allow the PCIe link to enumerate. After the system is up and running, you can use the Linux lspci utility to list out the details of the FPGA-based PCIe device.
Compiling and Loading the Driver
The provided PCIe drivers and software should be customized to a specific platform. To accomplish this, drivers and software are normally developed to verify the Vendor ID, Device ID, Revision ID, Subsystem Vendor ID, and Subsystem ID before attempting to access device-extended capabilities or peripherals like the PCIe-XVC-VSEC or AXI-XVC. Because the provided driver is generic, it only verifies the Vendor ID and Device ID for compatibility before attempting to identify the PCIe-XVC-VSEC or AXI-XVC peripheral.
The XVC driver and software are provided as a ZIP file included with the Vivado Design Suite installation.
Figure 35: XVC-over-PCIe with PCIe Extended Capability Interface
Note: Although the previous figure shows the UltraScale+™ Devices Integrated Block for PCIe IP, other PCIe IP (that is, the UltraScale™ Devices Integrated Block for PCIe, AXI Bridge for PCIe, or PCIe DMA IP) can be used interchangeably in this diagram.
XVC-over-PCIe Through AXI (AXI-XVC)
Using the AXI-XVC approach, the Debug Bridge IP connects to the PCIe IP through an AXI Interconnect IP. The Debug Bridge IP connects to the AXI Interconnect like other AXI4-Lite slave IPs and similarly requires that a specific address range be assigned to it. Traditionally the debug_bridge IP in this configuration is connected to the control path network rather than the system datapath network. The following figure describes the connectivity between the DMA Subsystem for PCIe IP and the Debug Bridge IP for this implementation.
Figure 36: XVC over PCIe with AXI4-Lite Interface
Note: Although the previous figure shows the PCIe DMA IP, any AXI-enabled PCIe IP can be used interchangeably in this diagram.
The AXI-XVC implementation allows for higher speed transactions. However, XVC debug traffic passes through the same PCIe ports and interconnect as other PCIe control path traffic, making it more difficult to debug transactions along this path. As a result, AXI-XVC debug should be used to debug a specific peripheral or a different AXI network rather than attempting to debug datapaths that overlap with the AXI-XVC debug communication path.
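Since the generic driver only checks the Vendor ID and Device ID before looking for the PCIe-XVC-VSEC, a quick user-space sanity check can read those IDs from sysfs before loading it. This is a small illustrative sketch; the bus/device/function string is an assumption to adapt to what lspci reports on your system, and only the Xilinx vendor ID 0x10ee is taken as known.

#include <stdio.h>

/* Read a hex value such as "0x10ee" from a PCI sysfs attribute file. */
static unsigned read_pci_attr(const char *bdf, const char *attr)
{
    char path[128];
    unsigned value = 0;
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/%s", bdf, attr);
    FILE *f = fopen(path, "r");
    if (f) {
        fscanf(f, "%x", &value);
        fclose(f);
    }
    return value;
}

int main(void)
{
    /* Bus/device/function of the FPGA endpoint as reported by lspci (assumed). */
    const char *bdf = "0000:03:00.0";
    unsigned vendor = read_pci_attr(bdf, "vendor");
    unsigned device = read_pci_attr(bdf, "device");

    printf("vendor=0x%04x device=0x%04x\n", vendor, device);
    if (vendor != 0x10ee)   /* 0x10ee is the Xilinx PCI Vendor ID */
        printf("warning: not a Xilinx device, the XVC driver will not bind\n");
    return 0;
}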
Some background on the AXI bus (an introduction to the AXI and AXI-Stream buses - LDD). So what exactly is the AXI covered in this section? It does not belong to Zynq or to Xilinx; it belongs to ARM.
It is ARM's on-chip bus interface from the AMBA family; starting with AMBA 3.0, the high-performance interface in that family has been called AXI.
Zynq is built around the ARM cores: at power-up the ARM is the first thing to "wake up". It looks for executable code and enters the FSBL (first-stage boot loader); it then looks for the bit file that configures the programmable logic and, once found, wakes the PL up to run as described in that bitstream; next it looks for further executable code and enters the SSBL (second-stage boot loader), which initializes the operating-system environment, boots a large program such as Linux, and finally hands control over to Linux.
Once Linux is running, it can exchange data with the PL.
Note that the path used for that data exchange is precisely the AXI bus discussed in this section.
Put plainly, AXI is the dedicated data channel responsible for communication between the ARM and the FPGA fabric.
The ARM side implements the AXI bus protocol in hardware, with nine physical interfaces: AXI-GP0 through AXI-GP3, AXI-HP0 through AXI-HP3, and the AXI-ACP interface.
They are the ports circled in yellow in the accompanying figure.
As can be seen, only two of the AXI-GP ports are master ports; the remaining seven are slave ports.
A master port has the right to initiate reads and writes: through the two AXI-GP master ports the ARM can actively access PL logic. In effect the PL is mapped into the ARM's address space, so reading and writing PL registers looks just like reading and writing its own memory.
The remaining slave ports are passive; they simply accept reads and writes issued from the PL.
The nine AXI interfaces also differ in performance.
The GP ports are 32-bit, lower-performance interfaces with a theoretical bandwidth of about 600 MB/s, while the HP and ACP ports are 64-bit high-performance interfaces with a theoretical bandwidth of about 1200 MB/s.
One might ask why the high-performance ports are not made master ports, which would let the ARM initiate high-speed transfers.
The answer is that the high-performance ports do not need the ARM CPU to move data at all; the real workhorse is a DMA controller located in the PL.
On the PS side the ARM has direct hardware support for the AXI interfaces, whereas the PL must implement the corresponding AXI protocol in logic.
Xilinx provides ready-made IP such as AXI DMA, AXI GPIO, and AXI Datamover that already implement these interfaces; to use them, simply add them from the XPS IP list to obtain the corresponding functionality.
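To make the idea of "reading and writing PL registers like memory" concrete, the following minimal sketch maps a PL register window from Linux user space through /dev/mem. The base address 0x43C00000 and the 64 KB span are assumptions borrowed from the AXI4-Lite example later in these notes, not values fixed by the GP port itself.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PL_BASE 0x43C00000UL   /* assumed AXI4-Lite base address of a PL peripheral */
#define PL_SPAN 0x10000UL      /* assumed 64 KB address window */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the PL register window into this process's address space. */
    volatile uint32_t *regs = mmap(NULL, PL_SPAN, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, PL_BASE);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    regs[0] = 0x12345678;                 /* write the PL register at offset 0x0 */
    printf("reg0 = 0x%08x\n", regs[0]);   /* read it back over the AXI-GP port  */

    munmap((void *)regs, PL_SPAN);
    close(fd);
    return 0;
}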
AXI protocol analysis (part 3). Having covered the handshake dependencies between channels, let us look at the structure of a transaction.
First, the address structure of a transfer.
The AXI protocol is based on burst transfers.
A burst transfer means that, within a single transaction, several data items at adjacent addresses are transferred back to back.
One burst may contain one or more data transfers.
Because each transfer occupies one clock cycle, it is also called a beat.
Each beat may consist of multiple bytes.
The protocol stipulates that the addresses covered by one burst must not cross a 4 KB boundary.
As for why 4 KB rather than some other value: this goes back to early operating systems using 4 KB as the page size, and for some devices crossing a 4 KB boundary could mean landing in a different device altogether.
ARLEN carries the burst length on the read address channel, and AWLEN carries it on the write address channel.
Below, AxLEN stands for either ARLEN or AWLEN, with x standing for R or W; the same convention applies to other signals and will not be repeated.
In AXI3, bursts of 1 to 16 transfers are supported for all burst types.
In AXI4, the INCR type supports bursts of 1 to 256 transfers, while the other types remain limited to 1 to 16.
Accordingly, AxLEN is 4 bits wide in AXI3 and 8 bits wide in AXI4.
The length of a burst is AxLEN + 1, so the minimum length is 1 (a length of 0 would be meaningless).
The AxSIZE signal indicates the data width of each transfer within the burst.
This width must not exceed the width of the data bus itself; when the data bus is wider than the burst's transfer width, the data is carried on a subset of the data lanes as specified by the protocol.
Bursts come in the following types. FIXED: every beat uses the start address.
This mode suits repeated updates to a fixed address, for example reading or writing a FIFO, where the read/write address does not change.
INCR: the address of each subsequent beat increments from the start address by the transfer width.
This is suited to reading and writing address-mapped storage media such as RAM.
WRAP: the wrap boundary address and the upper address are first derived from the start address.
While the current address is below the upper address, WRAP behaves exactly like INCR and the address increments; when it reaches the upper address it wraps back to the boundary address and continues from there.
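The burst parameters above can be turned into a small address generator. The sketch below is illustrative only: it treats AxSIZE as log2(bytes per beat) and the burst length as AxLEN + 1, and it shows the 4 KB-boundary check for INCR and the wrap-boundary calculation just described; it is not production bus logic.

#include <stdint.h>
#include <stdio.h>

/* Compute the byte address of beat 'n' of a burst, following the AxLEN/AxSIZE
 * description above.  burst: 0 = FIXED, 1 = INCR, 2 = WRAP (AxBURST encoding). */
static uint64_t beat_address(uint64_t start, unsigned axlen, unsigned axsize,
                             unsigned burst, unsigned n)
{
    uint64_t bytes_per_beat = 1ULL << axsize;            /* AxSIZE is log2(bytes) */
    uint64_t total_bytes    = (uint64_t)(axlen + 1) * bytes_per_beat;

    switch (burst) {
    case 0: /* FIXED: every beat uses the start address */
        return start;
    case 1: /* INCR: the address simply increments by the beat size */
        return start + (uint64_t)n * bytes_per_beat;
    case 2: { /* WRAP: wrap inside an aligned window of total_bytes */
        uint64_t wrap_boundary = (start / total_bytes) * total_bytes;
        uint64_t addr = start + (uint64_t)n * bytes_per_beat;
        if (addr >= wrap_boundary + total_bytes)
            addr -= total_bytes;                          /* wrap back to the boundary */
        return addr;
    }
    default:
        return start;
    }
}

int main(void)
{
    /* INCR burst of 4 beats, 4 bytes each, starting at 0xFF8: it crosses a 4 KB
     * boundary, which the protocol forbids for a single burst. */
    uint64_t start = 0xFF8, last = beat_address(start, 3, 2, 1, 3);
    printf("INCR last beat at 0x%llx, crosses 4KB: %s\n",
           (unsigned long long)last,
           (start / 4096) != (last / 4096) ? "yes" : "no");

    /* WRAP burst of 4 beats, 4 bytes each, starting at 0x1C8 (16-byte window). */
    for (unsigned n = 0; n < 4; n++)
        printf("WRAP beat %u -> 0x%llx\n", n,
               (unsigned long long)beat_address(0x1C8, 3, 2, 2, n));
    return 0;
}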
IP Integrator flow
1. Create an RTL project.
2. Create an IP Integrator block design.
3. Add the Zynq processor IP: search the IP catalog for "zynq" and add the ZYNQ7 Processing System (its BFM version corresponds to the earlier processing-system IP version).
Right-click the FIXED_IO and DDR interfaces and choose Make External to bring them out of the chip.
At this point the processing system is still completely unconfigured; double-click it to configure it.
Automatically added external interfaces (see the UG585 documentation):
• FIXED_IO: the dedicated, hardened peripheral I/O. It carries the 54 MIO pins, DDR_VRN/DDR_VRP (DDR DCI voltage reference pins; refer to UG933, Zynq-7000 AP SoC PCB Design and Pin Planning Guide), PS_SRSTB (debug system reset, active Low; forces the system to enter a reset sequence), PS_CLK (system reference clock), and PS_PORB (power-on reset, active Low).
• DDR interface: the processor's DDR memory interface.
• M_AXI_GP0_ACLK and M_AXI_GP0: the GP master AXI interface; it can be deselected under PS-PL Configuration.
• FCLK_CLK0: PL fabric clock; if unused it can be disabled under Clock Configuration.
FCLK_RESET0_N: fabric clock reset enable; it can be disabled under General.
4. Configure the Processing System: set up the detailed functions and features of the processor's internal control blocks; for details see the SoC Technical Reference Manual (/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf). General configuration: (1) MIO configuration: the I/O in Bank 0 and Bank 1 are the processor's configurable I/O. Is the I/O voltage level determined by the board hardware or already fixed by the chip? It is determined by the hardware.
基于X V C协议的Z e d b o a r d平台远程更新与调试朱琛1,沈小波1,周志刚2(1.中国电子科技集团公司第五十八研究所,无锡214000;2.中国船舶重工集团公司第702研究所)摘要:船舶雷达数据处理设备是许多科研院所面对庞大的雷达数据而研发的处理设备㊂此类处理设备属于船用设备,且设备中处理板卡众多㊂为了在不打开设备的情况下,脱离专用的U S B J T A G下载电缆对处理板中的F P G A进行远端升级,本文提出一种虚拟线缆协议,通过以太网,利用Z e d b o a r d平台的P S端控制P L端生成的J T A G接口对其进行远程更新与调试的方法㊂该方案成本较低,易于拓展,同时也提高了更新可靠性㊂关键词:X V C协议;Z e d b o a r d;J T A G接口;T C P/I P;远程更新与调试中图分类号:T N919文献标识码:AR e m o t e U p d a t i n g a n d D e b u g g i n g o f Z e d b o a r d P l a t f o r m B a s e d o n X V C P r o t o c o lZ h u C h e n1,S h e n X i a o b o1,Z h o u Z h i g a n g2(1.T h e58R e s e a r c h I n s t i t u t e o f C h i n a E l e c t r o n i c s T e c h n o l o g y G r o u p C o r p o r a t i o n,W u x i214000,C h i n a;2.T h e702R e s e a r c h I n s t i t u t e o f C h i n a S h i p b u i l d i n g I n d u s t r y C o r p o r a t i o n)A b s t r a c t:T h e s h i p r a d a r d a t a p r o c e s s i n g e q u i p m e n t i s d e v e l o p e d b y m a n y s c i e n t i f i c r e s e a r c h i n s t i t u t e s i n t h e f a c e o f h u g e r a d a r d a t a.T h i s k i n d o f p r o c e s s i n g e q u i p m e n t b e l o n g s t o m a r i n e e q u i o m e n t,a n d t h e r e a r e m a n y p r o c e s s i n g b o a r d s i n t h e e q u i p m e n t.T h i s s t u d y a i m s t o d e s i g n a n d i m p l e m e n t r e m o t e u p d a t i n g a n d d e b u g g i n g m u l t i-f i e l d p r o g r a mm a b l e g a t e a r r a y f o r t h e d e t e c t o r w i t h o u t o p e n i n g t h e e q u i p-m e n t,s e p a r a t e f r o m t h e s p e c i l a U S B-J T A G d o w n l o a d c a b l e.I n t h e p a p e r,a n e t w o r k b a s e d Z e d b o a r d̓s p r o c e s s i n g s y s t e m i s u s e d t o a c c e p t c o n f i g u r a t i o n f i l e s v i a n e t w o r k a n d g e n e r a t e J T A G b y Z e d b o a r d̓s p r o g r a mm a b l e l o g i c s e q u e n c e t o F P G A b y m e a n s o f X i l i n x v i s u a l c a b l e p r o t o c o l.T h e r e m o t e u p d a t i n g a n d d e b u g g i n g o f m u l t i-F P G A a r e r e a l i z e d.T h i s m e t h o d i m p r o v e s t h e r e l i a b i l i t y o f r e m o t e u p d a t i n g a n d d e b u g g i n g o f F P G A s a n d i s e a s y t o e x t e n d w i t h l o w c o s t.K e y w o r d s:X V C p r o t o c o l;Z e d b o a r d;J T A G i n t e r f a c e;T C P/I P;r e m o t e u p d a t i n g a n d d e b u g g i n g0引言早期的船舶雷达设备和其他电子设备一样经历过电子管和晶体管的元件阶段㊂随着大规模集成电路的出现,现在的船舶导航雷达处理单元,大多采用高性能的F P G A 进行数据采集及加速㊂当大量模块单元需要升级时,需要拆卸机箱,耗时耗力[1]㊂考虑到对设备进行程序更新及版本升级,拆卸设备工作量较大且更新时易出错,本文设计了一种基于X V C (X i l i n x V i r t u a l C a b l e)的协议,通过以太网,配合A R M+ F P G A混合架构的Z Y N Q7000系列板卡,利用G P I O口模拟J T A G接口进行远程更新调试的方案㊂该方案操作简单,无需额外开发T C P/I P软件,基于网络的通信能够保证足够远距离的灵活可靠的数据传输,还能对F P G A进行远程H a r d w a r e M a n a g e r调试㊂同时,通过简单的J T A G链路连接可以完成远端多片F P G A在线升级及调试㊂1系统架构本设计采用Z e d b o a r d平台嵌入式开发板,其基于X i l-i n x Z Y N Q7000系列芯片㊂Z Y N Q7000采用A R M+ F P G A架构,具有高度集成性,整个系统具有丰富的处理器和扩展资源,用户可以根据需求将不同的模块连接起来,实现自定义逻辑功能[2],因而非常利于多片F P G A远程更新与调试㊂Z e d b o a r d开发板实物图如图1所示㊂通过A R M处理器中网络口控制A X I(A d v a n c e d e X-t e n s i b l e I n t e r f a c e)总线,将更新程序发送到F P G A端,F P-G A端例化一个J T A G接口对光纤接口板进行更新与调试[3]㊂设计实现远程更新船舶雷达设备中光纤接口板(核心芯片为x c7k325t),整体架构图如图2所示㊂2基于Z e d b o a r d平台的P L端开发在Z e d b o a r d平台上进行P L(P r o g r a mm a b l e L o g i c)端开发,核心是通过G P I O口实现J T A G模块功能㊂J T A G图1 Z e d b o a r d开发板实物图图2 设计整体架构图模块功能主要是通过器件内部的T A P (T e s t A c c e s s P o r t,图3 F P G A 端设计架构测试访问端口)对芯片内部的寄存器进行访问㊂T A P 控制器包含一个具有16个状态的自动状态机,对数据寄存器D R (D a t a R e g i s t e r )和指令寄存器I R (I n s t r u c t i o n R 
e g-s i t e r)进行相关操作㊂本设计中需要4个I P 核:p r o c e s s i n g _s ys t e m 7_0㊁a x i _i n t e r c o n n e c t _0㊁p r o c _s y s _r e s e t _0和a x i _j t a g _0㊂p r o c e s s -i n g _s y s t e m 7_0为Z Y N Q 7000的系统I P 核㊂a x i _i n t e r -c o n n e c t _0为A X I 总线中转I P 核㊂p r o c _s ys _r e s e t _0为系统连接其他A X I 的复位I P 核㊂a x i _j t a g_0为G P I O 配置J T A G 时序I P 核㊂F P G A 端的设计架构图如图3所示㊂P S (P r o c e s s i n g S y s t e m )端通过A X I 总线对a x i _j t a g_0进行数据传输,主要是内核中寄存器的读写,本文中a x i_j t a g _0内核设计了5个寄存器,即[31ʒ0]s l v _r e g0㊁[31ʒ0]s l v _r e g 1㊁[31ʒ0]s l v _r e g 2㊁[31ʒ0]s l v _r e g3㊁[31ʒ0]s l v _r e g 4,分别对应数据传输长度L E N G T H ㊁J T A G 总线的TM S 信号数据㊁J T A G 总线的T D I 信号据㊁J T A G 总线的T D O 信号数据及P S 端传输有效E N A B L E信号㊂通过开发软件设置a x i _j t a g _0的IP 核寄存器起始地址为0x 43C 0_0000,长度为64K B ㊂最后R T L 分析综合㊁实现并生成二进制文件,给P S 端提供软件设计的硬件平台㊂3 基于Z e d b o a r d 的嵌入式L i n u x 平台开发本设计基于Z e d b o a r d 开发板,在P S 端移植嵌入式L i n u x 系统,并需要在开发应用程序中实现X V C 虚拟网络功能㊂3.1 引导程序的移植嵌入式操作系统都会采用引导程序来引导内核,本设计使用u b o o t 作为启动引导[5]㊂由于使用Z Y N Q 系列芯片,在引导操作系统与之前单纯的只含有A R M 架构的芯片有所不同,需要在u b o o t 之前增加F S B L .e l f 及s ys -t e m.b i t 文件,其L i n u x 系统启动过程如图4所示㊂并对u b o o t 进行相应配置,主要包含内核镜像㊁设备树㊁文件系统㊁网络地址㊁加载方式等信息,配置启动方式为Q S P I 启动,Q S P I 的存储文件及地址分配如图5所示㊂图4 Z Y N Q 7000系统启动过程图5 Q S P I 内部存储文件及地址分配3.2 移植L i n u x 内核及生成设备树及文件系统L i n u x 内核是嵌入式系统的关键部分[4]㊂由于嵌入式系统比较精简,通常需要对L i n u x 内核进行裁剪㊂官网下载L i n u x 内核进行相应的配置,编译生成z I m a ge 文件㊂Z Y N Q 7系列芯片的A R M 内核是通过设备树形式的数据结构来配置系统启动的设备参数[5]㊂本设计需要在官方提供的设备树d e v i c e t r e e .d t b 中增加J T A G 接口I P 核的信息,代码如下:&a m b ar a n g e s ; a x i _j t a g _0:a x i j t a g@43C 00000{ c o m p a t i a b l e ="ge n e r i c u i o "; r e g =<0x 43C 000000x 10000>; };};文件系统为嵌入式系统及设备提供文件输入输出等文件管理功能,本设计不展开叙述,编译生成文件系统r a m d i s k 8M.i m a g e .gz 压缩包[6]㊂在启动过程中,u b o o t 会将此文件加载到内存中,并将内存地址传递给内核,待内核启动时,即可同时运行文件系统[7]㊂3.3 X V C 应用程序设计X V C 协议是一种基于T C P /I P 的虚拟线缆协议[8],允许用户通过网络访问X i l i n x F P G A 的J T A G 接口并对其进行高效率的远程更新与调试㊂X V C 协议内容简单,使用方便,其基本内容可总结为如下三条指令[9],如表1所列㊂表1 函数指令功能函数名功 能ge f i n f o ()获取X V C 服务的版本信息㊁t m s 及t d i 字节向量一次能够移位的最大长度s h i f t (n u m b i t s ,t m s v e c t o r ,t d i v e c t o r )以字节向量T M S v e c t o r 和T D I v e c t o r 的形式发送n u m _b i t s 个二进制数给X V C服务器s e t t c k()设置通信协议周期成纳秒,返回值为实际周期值本设计实现的X V C 服务器功能主要通过T C P /I P 协议以及内存映射来进行数据交互㊂首先打开U I O 设备即在系统中生成a x i _j t a g _0的设备,并进行内存映射;然后进行T C P /I P 通信需要的一些初始化的工作,建立通信,通过h a n d l e _d a t a ()函数来交互数据㊂X V C 服务器程序设计如图6所示㊂图6 X V C 服务器程序设计流程图h a n d l e _d a t a ()函数内利用m e m c m p ()及m e m c p y()实现命令识别及应用层与设备的数据交互㊂h a n d l e _d a t a()函数内部设计如图7所示㊂图7 h a n d l e _d a t a()函数流程图X V C 协议具体实现内容如下:由于V i v a d o 软件每次发送s h i f t 指令的最大长度为2048字节,故设置接收缓存区b u f f e r 大小为2048,而发送缓存区r e s u l t 大小为1024㊂网络设置包括创建s o c k e t ()进行网络通信,并设置其通信协议为T C P ,非阻塞模式,端口号为2542,并为其分配I P 地址和MA C 地址㊂打开s o c k e t ()并监听网络连接请求㊂从s o c k e t()接收6个字节数据到缓存区,如果与字符串s h i f t 相同,则继续读取4字节的内容作为n u m _b i t s,并转换为n r _b yt e s ㊂接着按该字节数分别读取T M S v e c t o r 和T D I v e c t o r 到接收缓存区b u f f e r ,之后通过m e m c p y()将数据映射到a x i _j t a g _0设备的寄存器中,a x i _j t a g_0模块将读到的寄存器数据赋值给TM S 和T D I 引脚,同时产生周期性的T C K 信号㊂a x i _j t a g _0模块检测到TD O 引脚上的信号写入寄存器s l v _r e g 3,应用层通过m e m c p y ()读到寄存器,发送给P C ㊂P C 验证收到的T D O 数据无误,继续通过s h i f t 指令发送下一个数据包㊂由此便实现了一个完整的J T A G 数据链路,从而实现远程P C 由网络将配置数据通过模拟的J T A G 接口配置给F P G A 的过程㊂4 测试验证及总结利用V i v a d o 软件中的H a r d w a r e M a n a ge r 进行远程更新,首先通过S D K 软件将系统启动所需要的文件烧写进Z e d b o a r d 平台,启动L i n u x 系统㊂在Z e d b o a r d 运行X V C 服务程序,等待P C 通过网络配置F P G A ㊂在V i v a d o 的T C L c o n s o l e 输入以下命令:>>c o n n e c t _h w _s e r v e r>>o p e n _h w _t a r ge t -x v c _u r l 192.168.1.10:2542从图8中可以看出,已经成功连接到光纤接口板,网络I P 
地址为192.168.1.10,端口号为2542㊂图8 实验结果该设计具有可扩展性,可以拓展为多F P G A 远程更新与调试,只需要在逻辑设计时加入多个J T A G 模块,并在X V C 应用层程序中更改相应的设备,即可实现多个F P G A 远程更新与调试㊂随着船用雷达的不断更新换代以及多功能化的实现,设备的更新在所难免,本设计给后续升级提供了可靠便捷的保障,已在某型号舰载设备上使用,对今后工业化设备在线升级算法㊁远程更新程序,具有重要引导意义㊂参考文献[1]陶吉怀.基于S O P C 的船用雷达处理单元研究与设计[D ].成都:电子科技大学,2013.[2]文华武.基于以太网的F P G A 远程程序升级系统的设计与研究[D ].重庆:重庆大学,2012.[3]王利军,张超.基于C P U 实现F P G A 远程更新[J ].信息通信,2013,20(6):4445.[4]薛乾,曾云,张杰.基于X V C 网络协议的多F P G A 远程更新与调试[J ].核技术,2015,38(12).[5]3G P P T S 36.322.R a d i o L i n k C o n t r o l (R T C )pr o t o c o l [S ],2015.[6]余婷婷.嵌入式文件系统的研究与设计[D ].武汉:武汉理工大学,2007.[7]华抒军.基于Z e d b o r a d 的软件无线电软件平台的设计与实现[J ].软件,2015,36(10):5760.[8]X i l i n x .X i l i n x v i r t u a l c a b l e o v e r v i e w ,2015.[9]X i l i n x .X i l i n x v i r t u a l c a b l e r u n n i n g o n Z y n q 7000u s i n g th e pe t a l L i n u x t o o l s ,2015.朱琛(工程师),主要研究方向为F P G A 逻辑设计及嵌入式硬件设计;沈小波(工程师),主要研究方向为雷达信号处理系统;周志刚(工程师),主要研究方向为嵌入式L i n u x 系统及应用㊂(责任编辑:薛士然 收稿日期:2020-06-18)。
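As a companion to the XVC server description above, the following C sketch shows the core of a handle_data()-style dispatcher for the three XVC commands (getinfo, settck, shift). It is a simplified illustration under the assumptions stated in the paper (2048-byte shift buffers, TMS/TDI vectors in, a TDO vector back); jtag_shift() is a placeholder for the logic that would drive the axi_jtag_0 registers, and read_fn/write_fn stand in for the TCP socket helpers.

#include <stdint.h>
#include <string.h>

/* Placeholder: loop TDI back to TDO; the real implementation writes LENGTH,
 * TMS and TDI to the axi_jtag_0 registers, toggles TCK, and reads TDO back. */
static void jtag_shift(uint32_t nbits, const uint8_t *tms,
                       const uint8_t *tdi, uint8_t *tdo)
{
    (void)tms;
    memcpy(tdo, tdi, (nbits + 7) / 8);
}

/* Handle one XVC command already read into 'cmd' (at least 8 bytes available).
 * read_fn/write_fn abstract the socket; both are assumed helpers. */
static int handle_xvc_cmd(const char *cmd,
                          int (*read_fn)(void *, int),
                          int (*write_fn)(const void *, int))
{
    if (memcmp(cmd, "getinfo:", 8) == 0) {
        const char reply[] = "xvcServer_v1.0:2048\n";  /* version and max vector bytes */
        return write_fn(reply, sizeof reply - 1);
    }
    if (memcmp(cmd, "settck:", 7) == 0) {
        uint32_t period_ns;
        read_fn(&period_ns, 4);          /* requested TCK period in ns (little-endian) */
        return write_fn(&period_ns, 4);  /* echo back the period actually used */
    }
    if (memcmp(cmd, "shift:", 6) == 0) {
        uint32_t num_bits;
        read_fn(&num_bits, 4);
        int nr_bytes = (int)((num_bits + 7) / 8);
        uint8_t tms[2048], tdi[2048], tdo[2048];
        read_fn(tms, nr_bytes);          /* TMS vector */
        read_fn(tdi, nr_bytes);          /* TDI vector */
        jtag_shift(num_bits, tms, tdi, tdo);
        return write_fn(tdo, nr_bytes);  /* TDO vector back to Vivado */
    }
    return -1;                           /* unknown command */
}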
Getting started with ZYNQ development - Chapter 9: Getting started with the ZedBoard. By now you have some familiarity with the ZYNQ architecture and the corresponding development tools; next we will walk through ZYNQ together and experience the appeal of hardware/software co-design.
Owing to time constraints, some of the labs below (in this chapter and the following chapters) may be imperfect; reader feedback is welcome.
9.1 Running lights. This lab guides you through using the Vivado integrated design environment to create the book's first Zynq design.
Here we use this introductory running-lights lab to introduce the IP Integrator environment of the Vivado IDE and to implement this simple Zynq embedded system on the ZedBoard.
Afterwards we will use the SDK to create a simple software application, download it to the Zynq's ARM processor, and use it to control the hardware implemented in the PL.
The lab is presented in three parts: in the first part we create a project with the Vivado IDE.
Building on the first part, in the second part we construct a Zynq embedded processing system and import the finished hardware into the SDK for software design.
In the final part we use the SDK to write an ARM test application and download it to the ZedBoard for debugging.
Lab environment: Windows 7 x64, Vivado 2013.4, SDK 2013.4. 9.1.1 Creating the Vivado project. 1) Double-click the Vivado desktop shortcut, or browse Start > All Programs > Xilinx Design Tools > Vivado 2013.4 > Vivado 2013.4 to launch Vivado. 2) Once Vivado has started, the Getting Started page of Figure 9-1 appears.
Figure 9-1: Vivado start page. 3) Select Create New Project; the New Project wizard shown in Figure 9-2 opens. Click Next.
Figure 9-2: New Project dialog. 4) In the Project Name dialog, enter first_zynq_design as the Project name, choose C:/XUP/Zed as the Project location, make sure Create project subdirectory is checked (Figure 9-3), and click Next.
Collected notes on the AXI bus protocol. Part 1: 1. AXI overview: AXI (Advanced eXtensible Interface) is a bus protocol and the most important part of ARM's AMBA (Advanced Microcontroller Bus Architecture) 3.0 specification; it is an on-chip bus aimed at high performance, high bandwidth, and low latency.
Its address/control and data phases are separate, it supports unaligned data transfers, a burst needs only the start address, it provides separate read and write data channels, it supports outstanding transactions and out-of-order completion, and it makes timing closure easier.
AXI is a newer high-performance protocol within AMBA.
AXI enriches the existing AMBA standard and meets the needs of very high-performance, complex system-on-chip (SoC) designs.
2. AXI characteristics: a unidirectional channel architecture.
Information flows in one direction only, which simplifies bridging between clock domains and reduces gate count.
It also reduces latency when signals traverse a complex SoC.
Support for multiple outstanding data exchanges.
By executing bursts in parallel, data throughput is greatly increased and tasks complete in less time, meeting high-performance requirements while also reducing power consumption.
Independent address and data channels.
With address and data channels separated, each channel can be optimized individually and its timing controlled as needed, pushing the clock frequency as high as possible and the latency as low as possible.
Part 2: this part summarizes the chapters of the AXI 1.0 specification.
Chapter 1: this chapter mainly introduces the AXI protocol and the basic transactions it defines.
1. The AXI bus has five channels: the read address channel, write address channel, read data channel, write data channel, and write response channel.
Each AXI channel is unidirectional.
2. Every transaction carries address and control information on the address channel, describing the nature of the data being transferred.
3. The structure of a read transaction is shown below. 4. The structure of a write transaction is shown below. 5. Each of the five independent channels carries a set of information signals and a two-way VALID/READY handshake mechanism.
AXI4-Stream Interconnect v1.1LogiCORE IP Product GuideVivado Design SuitePG035 April 1, 2015Table of ContentsIP FactsChapter1:OverviewFeature Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Licensing and Ordering Information. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7Chapter2:Product SpecificationAXI4-Stream Switch and Arbiter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9AXI4-Stream Clock Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10AXI4-Stream Data Width Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11AXI4-Stream Register Slice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12AXI4-Stream Data FIFO Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14Resource Utilization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Port Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18Chapter3:Designing with the CoreGeneral Design Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22Clocking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23Resets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23Chapter4:Design Flow StepsCustomizing and Generating the Core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25Constraining the Core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38Synthesis and Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 Appendix A:Migrating and UpgradingAppendix B:DebuggingFinding Help on . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Vivado Lab Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41AXI4-Stream Interface Debug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42Appendix C:Additional Resources and Legal NoticesXilinx Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 References . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Revision History. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 Please Read: Important Legal Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44IntroductionThe AXI4-Stream Interconnect is a key interconnect infrastructure IP that enables the connection of heterogenous master/slave AMBA® AXI4-Stream protocol compliant endpoint IP. The AXI4-Stream Interconnect routes connections from one or moreAXI4-Stream master channels to one or more AXI4-Stream slave channels.FeaturesThe AXI4-Stream Interconnect IP provides the following capabilities:•Configurable multiple master to multiple slave (up to 16x16) capable cross-pointswitch•Arbitrary TDATA byte width conversion •Synchronous and asynchronous clock rate conversion•Configurable datapath FIFO buffers including store and forward (packet)capable FIFOs•Optional register slice at boundaries to ease timing closure•Support for multiple clock domainsIP FactsLogiCORE™ IP Facts TableCore SpecificsSupportedDevices(1)Zynq®-7000, Virtex®-7, Kintex®-7, Artix®-7 SupportedUser Interfaces AXI4-Stream Resources See Table2-3 and Table2-4.Provided with CoreDesign Files Verilog ExampleDesign Not Provided Test Bench Not Provided ConstraintsFile Xilinx Design Constraints (XDC) SimulationModel Behavioral Verilog SupportedS/W Driver N/ATested Design Flows(2)Design Entry Vivado® Design SuiteSimulation For supported simulators, see theXilinx Design Tools: Release Notes Guide. Synthesis Vivado SynthesisSupportProvided by Xilinx @ /support Notes:1.For a complete list of supported devices, see the Vivado IPcatalog.2.For the supported versions of the tools, see theXilinx Design Tools: Release Notes Guide.Chapter1OverviewThe ARM® AMBA® 4 Specification builds on the AMBA 3 specifications by adding newinterface protocols to provide greater interface flexibility in designs with open standards.Among the new interface protocols is the AXI4-Stream interface that is designed to support low resource, high bandwidth unidirectional data transfers. It is well-suited for FPGAimplementation because the transfer protocol allows for high frequency versus clocklatency trade-offs to help meet design goals.The AXI4-Stream protocol is derived from the single AXI3 write channel. It has nocorresponding address or response channel and is capable of non-deterministic bursttransactions (undefined length). The protocol interface signal set adds optional signals to support data routing, end of transaction indicators, and null-beat modifiers to facilitatemanagement and movement of data across a system. These characteristics are suitable for transferring large amounts of data efficiently while keeping gate count low.The protocol interface consists of a master interface and a slave interface. The twointerfaces are symmetric and point-to-point, such that the master interface output signals can connect directly to the slave interface input signals. Utilizing this concept, it is possible to design AXI4-Stream modules that have a slave interface input channel and a masterinterface output channel. Because the master and slave interfaces are symmetric, anynumber of these modules can be daisy-chained together by connecting the masterinterface output channel of one module to the slave interface input channel of anothermodule and so on. 
The function of the modules can be a multitude of different options such as buffering, data transforming or routing.The AXI4-Stream Interconnect IP is a powerful collection of modules that provides a rich set of functions for connecting together AXI4-Stream masters and slaves. The IP core is capable of performing data switching/routing, data width conversion, pipelining, clock conversion and data buffering. Parameters and IP configuration Graphical User Interfaces (GUIs) are used to configure the core to suit each of the requirements of the system designer.Feature Summary•AXI4-Stream compliant°Supports all AXI4-Stream defined signals: TVALID, TREADY, TDATA, TSTRB, TKEEP, TLAST, TID, TDEST, and TUSER-TDATA, TSTRB, TKEEP, TLAST, TID, TDEST, and TUSER are optional-Programmable TDATA, TID, TDEST, and TUSER widths (TSTRB and TKEEP width is TDATA width/8)°Per port ACLK/ARESETn inputs (supports clock domain crossing)°Per port ACLKEN inputs (optional)•Core switch°1-16 masters°1-16 slaves°Full slave side arbitrated crossbar switch°Slave input to master output routing based on TDEST value decoding and comparison against base and high value range settings°Round-Robin and Priority arbitration°Arbitration suppress capability to prevent head-of-line blocking°Native switch data width 8, 16, 24, 32, 48, … 4,096 bits (any byte width up to 512 bytes)°Arbitration tuning parameters to arbitrate on TLAST boundaries, after a set number of transfers, and/or after a certain number of idle clock cycles.°Optional pipeline stages after internal TDEST decoder and arbiter functional blocks °Programmable connectivity map to specify full or sparse crossbar connectivity •Built in data width conversion•Each master and slave connection can independently use data widths of 8, 16, 24, 32, 48, … 4,096 bits (any byte width up to 512 bytes)•Built-in clock-rate conversion:°Each master and slave connection can use independent clock rates.°Synchronous integer-ratio (N:1 and 1:N) conversion to the internal crossbar native clock rate.°Asynchronous clock conversion (uses more storage and incurs more latency than synchronous conversion).•Optional register-slice pipelining:°Available on each AXI4-Stream channel connecting to each master and slave device.°Facilitates timing closure by trading-off frequency versus latency.°One latency cycle per register-slice, with no loss in data throughput in the register slice under all AXI4-Stream handshake conditions.•Optional datapath FIFO buffering:°Available on datapaths connecting to each master and each slave.°16, 32, 64, 128,…32,768 deep (16-deep and 32-deep are LUT-RAM based, otherwise are block RAM based)°Normal and Packet FIFO modes (Packet FIFO mode is also known as store-and-forward in which a packet is stored and only released downstream after a TLAST packet boundary is detected.)°FIFO data count outputs to report FIFO occupancy•Additional Error Flags•Error flags to detect conditions such as: TDEST decode error, sparse TKEEP removal, and packer error.Licensing and Ordering InformationThis Xilinx LogiCORE™ IP module is provided at no additional cost with the Xilinx Vivado® Design Suite under the terms of the Xilinx End User License. Information about this and other Xilinx LogiCORE IP modules is available at the Xilinx Intellectual Property page. 
For information about pricing and availability of other Xilinx LogiCORE IP modules and tools, contact your local Xilinx sales representative.Chapter 2Product SpecificationThe AXI4-Stream Interconnect core is a collection of submodules centered around the AXI4-Stream Switch. There are submodules grouped together in datapaths before and after the switch that allow for data manipulation and flow control. Each individual submodule consists of AXI4-Stream protocol compliant master and slave interfaces. A block diagram of the AXI4-Stream Interconnect is shown in Figure 2-1.The AXI4-Stream Switch supports up to 16 masters to 16 slaves in a full or sparse crossbar configuration using the AXI4-Stream signal TDEST as the routing designator.In Figure 2-1 an AXI4-Stream Master can connect to what is designated as the SlaveInterface (SI) of the AXI4-Stream Interconnect. Similarly, an AXI4-Stream Slave can connect to the Master Interface (MI) of the AXI4-Stream Interconnect. The AXI4-Stream Switch supports sparse configurations, SI side decoding, MI side arbitration (one arbiter per MI port), and various arbitration modes.The datapath consists of submodules that contain one SI port and one MI port chained together between the external SI bus to the switch or from the switch to the external MI bus. The datapath allows for data width conversion, buffering, and clock conversion to and from the AXI4-Stream Switch. All submodules including the AXI4-Stream Switch are optionally instantiated based on parameters to allow full flexibility in design.Figure 2-1:AXI4-Stream Interconnect Block DiagramThe submodules within the AXI4-Stream Interconnect are presented in Table 2-1.AXI4-Stream Switch and ArbiterThe AXI4-Stream Switch supports 1:N, M:1 and M:N configurations. It connects up to 16 masters to 16 slaves. The AXI4-Stream TDEST signaling is required for 1:N and M:N configurations.The AXI4-Stream Switch only supports static routing through fixed base-high TDEST ranges for each MI. A single TDEST map applies to all MI. Each MI is arbitrated independently (slave side arbitration). The AXI4-Stream Switch performs SI-side parallel decoding. Unmapped TDEST transfers will signal a decode error and drop the transfer.The internal arbiter can perform fixed priority arbitration or round robin arbitration. The Arbiter/Switch can arbitrate on a per transfer basis or at packet boundaries (signaled by TLAST or after a configurable number of active or idle transfers.)Table 2-1:Submodules within the AXI4-Stream InterconnectSubmodule NameDescriptionInferred or Explicit instantiationSubset ConverterAdds or removes optional signals from an AXI4-Stream bus interface.Limited inference. Infers a TKEEPinternally to the interconnect if there is a data width converter and TID or TDESTor TLAST signals are present. Register SliceProvides a mechanism to bridgeconnections between Masters/Slaves within different ACLK domains. InferredData Width ConverterAllows expansion of the AXI4-Stream TDATA width by aggregating multiple transfers into one transfer or allows reduction of the AXI4-Stream TDATA width by splitting a transfer into multiple transfers of smaller TDATA width.InferredData FIFO Provides AXI4-Stream data storage.ExplicitSwitchAllows routing from multiple masters to multiple slaves.Inferred when there is more than one master or more than one slave in the system.AXI4-Stream Clock ConverterClock converters are necessary in the AXI4-Stream protocol for converting masters operating at different clock rates to slaves. 
Typically, the AXI4-Stream Interconnect should be clocked at the same rate as the fastest slave; devices not running at that same rate need to be converted. Synchronous clock converters are ideal because they have the lowest latency and smaller area. However, they are only viable if both clocks are phase-aligned, integer clock ratios, and the fmax requirements are able to be met. Asynchronous clock converters are a generic solution able to handle both synchronous/asynchronous clocks with arbitrary phase alignment. The trade-off is that there is a significant increase in area and latency associated with asynchronous clock converters. If global clock enables are configured, additional logic is generated to handle clock enables independently for each clock domain. There is a clock converter module available in every datapath and it is instantiated if either the clocks are specified as asynchronous or have different synchronous clock ratios. The clock converter module performs the following functions:• A clock-rate reduction module performs integer (N:1) division of the clock rate from its input (SI) side to its output (MI) side.• A clock-rate acceleration module performs integer (1:N) multiplication of clock rate from its input (SI) to output (MI) side.•Asynchronous clock rate conversion between the input and output uses an internal FIFO Generator instantiated module.•Clock Enable crossing logic that handles different ACLKEN signals per clock domain. Figure2-2 shows the clock converter with support for independent ACLKEN signals on its SI and MI.Figure 2-2:Clock Converter Module Block DiagramAXI4-Stream Data Width ConverterData width converters (upsizer/downsizer) are required when interfacing different data width cores with each other. One data width conversion module is available to handle all supported combinations of data widths.The conversion follows the AMBA® AXI4-Stream Protocol Specification with regards to ordering and expansion of TUSER bits. The width converter does not process any special TUSER encoding formats; it only maps TUSER bits across the width conversion function using the algorithm specified in the AXI4-Stream protocol specification. Depending on the usage/meaning of TUSER to the endpoint IP, additional external logic might be required to manipulate TUSER bits that have been transformed by the width converter. The number of TUSER bits per TDATA bytes must remain constant between input and output.Up-conversion requires that each incoming beat that is composed of the new larger beat consists of identical TID and TDEST bits and no intermediate TLAST assertions. Partial data might be flushed when either the TLAST bit is received or TID/TDEST changes before enough data is accumulated to send out a complete beat. Unassigned bytes are flushed out as null bytes.Any non-integer multiple byte ratio conversion (N:M) is accomplished by calculating the lowest common multiple (LCM) of N and M and then up-converting from N:LCM then down-converting from LCM:M.Up-conversion features: Range: Input 1-256 Bytes, Output 2-512 Bytes •Supports full range of 1:N byte ratio conversions•Minimum latency of 2 + N clock cycles in 1:N byte ratio up-conversion.Down-conversion features:•Range: Input 2-512 bytes, output 256-1 bytes •Supports full range of N:1 byte ratio conversions •Minimum Latency: two clock cyclesAXI4-Stream Register SliceThe register slice is a multipurpose pipeline register that is able to isolate timing paths between master and slave. 
The register slice is designed to trade-off timing improvement with area and latency necessary to form a protocol compliant pipeline stage. Implemented as a two-deep FIFO buffer, the register slice supports throttling by the master (channel source) and/or slave (channel destination) as well as back-to-back transfers withoutincurring unnecessary idle cycles. The module can be independently instantiated at all port boundaries.Figure 2-3:Data Width Converter (Down Conversion) Module Block DiagramAXI4-Stream Data FIFO BufferThe FIFO module is capable of providing temporary storage (a buffer) of the AXI4-Stream data. The FIFO Buffer module should be used in between two endpoints when:•More buffering than a register slice is desired.•Store and forward: to accumulate a certain number of bytes from the master before forwarding them to the slave (packet mode.)The FIFO Module can also implement asynchronous clock conversion so when asynchronous clock conversion and FIFOs are enabled on the same interface, redundant FIFOs are not instantiated. The FIFO module uses the Xilinx LogiCORE™ IP FIFO Generator module. This supports native AXI4-Stream with the following features:•Variable FIFO depths•FIFO data widths from 8 to 1,024 bits1,024-bit FIFO data width limit is a FIFO Generator restriction that limits the payload width of the transfer being buffered in the FIFO. This limit might be removed in a future version of FIFO Generator, at which point the AXIS FIFO also supports FIFO buffering for transfers with payloads including TDATA widths of up to 4,096 bits. The width of the FIFOs are determined by the width of the SWITCH component.•Independent or common clock domains•Symmetric aspect ratios•Asynchronous active-Low reset•Selectable memory type. The memory type is inferred as distributed block RAM for depths of 32 or less and block RAM for all others.•Operates in First-Word Fall-Through mode (FWFT)•Occupancy interface•Both FIFO Generator rd_data_count and wr_data_count are passed as separate outputs synchronized to the read side and write side clock domains.StandardsThis core has bus interfaces that comply with the ARM® AMBA AXI4-Stream Protocol Specification Version 1.0.PerformanceThe performance of the AXI4-Stream Interconnect core is limited only by the FPGA logic speed. The core utilizes only block RAMs, LUTs, and registers and contains no I/O elements. The values presented in this section should be used as an estimation guideline; actual performance can vary.Maximum FrequenciesThe core is designed to meet the maximum target frequency of 250 MHz on a Kintex®-7 FPGA (xc7k325tffg900-1.) It can be expected that an -2 speed grade part can achieve 5% higher maximum target frequency and that a -3 speed grade part can achieve 10% higher maximum target frequency. For switch configurations with more than approximately four masters or slaves, the target maximum frequency can be reduced by 20-25%.LatencyThe latency in the IP core can vary on an interface-to-interface basis, depending on how the IP core is configured. The latency is calculated in clocked cycles and is measured as the time that it takes from the assertion of the slave interface TVALID signal to the first assertion of the master interface TVALID signal. The latency for each of the individual submodules is listed in Table2-2. To obtain the minimum latency for your system, you must add up the values shown in the following tables. The latency specifications assume that the master interface TREADY signal input is always asserted. 
The back-to-back delay is the number of clock cycles that back-to -back transfers can be accepted by the module. This can be observed by counting how many cycles slave interface TREADY is Low after a transfer is accepted on the interface.Table 2-2:Latency by Module TypeModule Type Latency(clocks)Back-To-BackDelay (clocks)DescriptionRegister Slice 1 0Adding a register slice always adds one cycle oflatency. There is no back-to-back delay.Data Width Converter (upsizer) [datawidthratio]The latency varies based on the data width ratio. Forthe slave interface datapath, the ratio is from theslave interface data width to the switch data width.For the master interface datapath, the ratio is fromthe switch data width to the master interface datawidth.Example: If a 1:4 byte data converter is used, thelatency of the module will be 4 clock cycles.Data Width Converter (downsizer)1 [data widthratio]-1The back-to-back delay varies based on the datawidth ratio. For the slave interface datapath, the ratiois from the slave interface data width to the switchdata width. For the master interface datapath, theratio is from the switch data width to the masterinterface data width.Example: If a 2:1 byte data converter is used, then themodule can only accept transfers every other cycle.Synchronous Clock Converter (speed-up)1 0The synchronous clock converter latency is reportedas units of the slave interface clock.Synchronous Clock Converter (speed-down)1 [clockratio]-1The synchronous clock converter latency is reportedas units of the slave interface clock. Back-to-backdelay varies based on the clock ratio.Example: If using a synchronous 150 MHz-to-50 MHz3:1 clock converter (clock ratio of 3), the back-to-backdelay will be 2 clock cycles.Asynchronous Clock Converter NotDefined0The latency associated with an asynchronous clockconverter can vary greatly depending on the clocks. Itcan be expected to see latencies of 5 clock cycles ormore. See the LogiCORE IP FIFO Generator ProductGuide v9.2 (PG057) [Ref1] for more details.FIFO Generator AXI4-Stream (Normal) FIFO 30The FIFO when configured in normal mode willoutput data as soon as it is possible.See the LogiCORE IP FIFO Generator Product Guidev9.2 (PG057) [Ref1] for more details.FIFO Generator AXI4-Stream Packet FIFO UntilTLAST isreceivedor FIFOis full.The FIFO when configured in packet mode will outputdata only when a TLAST is received or the FIFO hasfilled.See the LogiCORE IP FIFO Generator Product Guidev9.2 (PG057) [Ref1] for more details.Crossbar Switch20The output latency of the switch under ideal conditions is 2 clock cycles. There is 1 cycle of latency for the TDEST decode and 1 cycle of latency for the arbiter grant (if idle) to make up the 2 cycles listed in the table. If the output pipeline is enabled, add 1 cycle. The back-to-back delay is defined in this case for an already granted arbitration. Back-to back- arbitration will result in 1 cycle delays between transactions. To reduce arbitration cycles, set the Arbitrate on maximum number of transfers parameter to a higher number, or set Arbitrate on TLAST to Yes if the design allows for it.Table 2-2:Latency by Module Type (Cont’d)Module Type Latency(clocks)Back-To-BackDelay (clocks)DescriptionThroughputThe throughput of a datapath through the AXI4-Stream Interconnect is calculated as TDATA width x clock frequency of each of the paths determined by the SI interface, the switch, and MI interface. 
The minimum throughput of an individual path for which the transfer will traverse determines the overall throughput of the datapath.For example, a 2x2 configured AXI4-Stream Interconnect is configured as follows:•S00 interface at 256 bits x 250Mhz (64,000 Mbits per second)•S01 interface at 128 bits x 100Mhz (12,800 Mbits per second)•Switch at 256 bits at 200Mhz (51,200 Mbits per second)•M00 interface at 512 bits x 250MHz (128,000 Mbits per second)•M01 interface at 32 bits x 200MHz (6,400 Mbits per second)Calculating the max throughput for each of the paths:•S00->Switch->M00: 51,200 Mbits per second•S01->Switch->M00: 12,800 Mbits per second•S00->Switch->M01: 51,200 Mbits per second•S01->Switch->M01: 6,400 Mbits per secondThe slowest theoretical throughput of the system can be seen as that path fromS01->Switch->M01 as limited by the throughput calculated for the M01 interface. The switch is capable of performing multiple transfers simultaneously between masters and slaves; therefore, total simultaneous throughput is calculated as S00->Switch->M00 +S01->Switch->M01 throughputs or 57,600 Mbits per second.on TLAST transfer to Yes, or by increasing the Arbitrate on maximum number of transfers parameter.Resource UtilizationThe resource utilization of the AXI4-Stream Interconnect is primarily a function of thepayload width of the stream. The payload width of the stream is calculated as the width of the TDATA , TSTRB , TKEEP , TLAST , TID , TDEST and TUSER signals. For example, consider the design that has the following signal widths listed in Table 2-3.The payload width W P is calculated as 128 + 16 + 16 + 1 + 0 + 4 + 16 = 181. The register slice works as a double buffer and is able to hold two AXI4-Stream transfers at one time. Therefore, a rough estimate of utilization can be achieved by multiplying the payload width by two. This signal configuration from Table 2-3 is used in Table 2-4 as the basis for the resource utilizations of the individual modules on a Kintex-7 FPGA (xc7k325tffg900-1) using the Vivado® synthesis tool. Actual resource utilization can be lower after runningimplementation on the design. To obtain the total resource utilization, add the counts from each module in the design.Table 2-3:Signal Widths Used for Resource Utilization EstimationAXI4-Stream SignalWidth (bits)TDATA 128TSTRB 16TKEEP 16TLAST 1TID 0 TDEST 4TUSER 16Total (W p )181。
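The throughput rule just described (each hop delivers TDATA width times clock frequency, and the slowest hop limits the path) can be captured in a few lines. The numbers below reuse the 2x2 example above; the function itself is only illustrative arithmetic, not part of the IP.

#include <stdio.h>

/* Throughput of one hop in Mbit/s: TDATA width in bits times clock in MHz. */
static double hop_mbps(int tdata_bits, double clk_mhz) { return tdata_bits * clk_mhz; }

/* Path throughput is limited by the slowest of the SI, switch and MI hops. */
static double path_mbps(double si, double sw, double mi)
{
    double m = si < sw ? si : sw;
    return m < mi ? m : mi;
}

int main(void)
{
    double s01 = hop_mbps(128, 100);   /* S01 interface: 12,800 Mbit/s */
    double sw  = hop_mbps(256, 200);   /* switch:        51,200 Mbit/s */
    double m01 = hop_mbps(32, 200);    /* M01 interface:  6,400 Mbit/s */
    printf("S01 -> Switch -> M01 = %.0f Mbit/s\n", path_mbps(s01, sw, m01));
    return 0;
}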
pg059-axi-interconnect (PG059, AXI Interconnect). Compiled: 2014-12-16.
Introduction: the Xilinx LogiCORE IP AXI Interconnect core implements connections between memory-mapped master devices and slave devices.
The AXI Interconnect core is used only for memory-mapped data transfers.
The AXI Interconnect core contains multiple LogiCORE IP instances, known as infrastructure cores.
Supported features: compliant with the AXI protocol.
It can be configured to support the AXI3, AXI4, and AXI4-Lite protocols.
Interface data widths: AXI4 and AXI3: 32, 64, 128, 256, 512, or 1024 bits.
AXI4-Lite: 32 or 64 bits. Address width: up to 64 bits. USER width (per channel): up to 1024 bits. ID width: up to 32 bits. To reduce resource usage, read-only or write-only masters and slaves can be generated. Overview: the AXI Interconnect core can only be used in IP Integrator block designs within the Vivado Design Suite.
The AXI Interconnect core is a hierarchical design block containing multiple LogiCORE IP instances (referred to as infrastructure cores).
The infrastructure cores are configured and connected at system design time.
Each infrastructure core can also be added directly to a block design outside the AXI Interconnect, instantiated from the Vivado IP catalog, or used in an HDL design.
The AXI Interconnect core allows any AXI master to be connected to any AXI slave, converting between data widths, clock domains, and AXI sub-protocols as needed.
When the interface characteristics of an external master or slave differ from those of the crossbar switch inside the interconnect, the appropriate infrastructure cores are automatically inserted to perform the required conversion.
Computer Engineering and Design, April 2021, Vol. 42, No. 4.
Design of an NVMe controller prototype simulation platform based on BRAM
FENG Zhi-hua, WANG Hua-zhuo, AN Dong-bo, LUO Chong, WANG Hong-yan (Institute 706, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China)
Abstract: To speed up the development of an NVMe controller and allow rapid simulation and verification of NVMe standard commands, a design method for a BRAM-based NVMe controller prototype simulation platform is proposed. Block RAM is used in place of flash as the storage medium, and the processor writes data directly into block RAM; this shortens the data storage path, greatly reduces the complexity of the engineering structure, and overcomes the long simulation times of the full NVMe controller project. Simulation results show the feasibility of the method: compared with the original NVMe controller, the structure is simpler and the simulation time is clearly reduced.
Keywords: storage media; non-volatile memory express; prototype simulation; FPGA; BRAM
CLC number: TP333; Document code: A; Article ID: 1000-7024(2021)04-1181-07; doi: 10.16208/j.issn1000-7024.2021.04.040
Introduction: In recent years, the non-volatile storage protocol for the PCIe interface, NVMe for short, has brought high-bandwidth, high-throughput, and low-latency performance improvements to solid-state disks (SSDs).
AXI4的主机协议代码分析AXI4的主机协议代码分析⼀、模块分析(1)端⼝列表input wire INIT_AXI_TXN,// Asserts when ERROR is detectedoutput reg ERROR,// Asserts when AXI transactions is completeoutput wire TXN_DONE,// AXI clock signalinput wire M_AXI_ACLK,// AXI active low reset signalinput wire M_AXI_ARESETN,// Master Interface Write Address Channel ports. Write address (issued by master)output wire [C_M_AXI_ADDR_WIDTH-1 : 0] M_AXI_AWADDR,// Write channel Protection type.// This signal indicates the privilege and security level of the transaction,// and whether the transaction is a data access or an instruction access.output wire [2 : 0] M_AXI_AWPROT,// Write address valid.// This signal indicates that the master signaling valid write address and control information.output wire M_AXI_AWVALID,// Write address ready.// This signal indicates that the slave is ready to accept an address and associated control signals.input wire M_AXI_AWREADY,// Master Interface Write Data Channel ports. Write data (issued by master)output wire [C_M_AXI_DATA_WIDTH-1 : 0] M_AXI_WDATA,// Write strobes.// This signal indicates which byte lanes hold valid data.// There is one write strobe bit for each eight bits of the write data bus.output wire [C_M_AXI_DATA_WIDTH/8-1 : 0] M_AXI_WSTRB,// Write valid. This signal indicates that valid write data and strobes are available.output wire M_AXI_WVALID,// Write ready. This signal indicates that the slave can accept the write data.input wire M_AXI_WREADY,// Master Interface Write Response Channel ports.// This signal indicates the status of the write transaction.input wire [1 : 0] M_AXI_BRESP,// Write response valid.// This signal indicates that the channel is signaling a valid write responseinput wire M_AXI_BVALID,// Response ready. This signal indicates that the master can accept a write response.output wire M_AXI_BREADY,// Master Interface Read Address Channel ports. Read address (issued by master)output wire [C_M_AXI_ADDR_WIDTH-1 : 0] M_AXI_ARADDR,// Protection type.// This signal indicates the privilege and security level of the transaction,// and whether the transaction is a data access or an instruction access.output wire [2 : 0] M_AXI_ARPROT,// Read address valid.// This signal indicates that the channel is signaling valid read address and control information.output wire M_AXI_ARVALID,// Read address ready.// This signal indicates that the slave is ready to accept an address and associated control signals.input wire M_AXI_ARREADY,// Master Interface Read Data Channel ports. Read data (issued by slave)input wire [C_M_AXI_DATA_WIDTH-1 : 0] M_AXI_RDATA,// Read response. This signal indicates the status of the read transfer.input wire [1 : 0] M_AXI_RRESP,// Read valid. This signal indicates that the channel is signaling the required read data.input wire M_AXI_RVALID,// Read ready. This signal indicates that the master can accept the read data and response information.output wire M_AXI_RREADY识别⽅法:M_AXI作为前缀,表明是主机的AXI协议变量。
A program architecture for internal communication on Zynq-7000. Published 2021-04-22, Science and Technology, 2021 No. 3. Authors: Wang Xiaodi, Yu Pei, Qu Pengzhen (Product Development Department, Shaanxi Changling Electronic Technology Co., Ltd., Baoji 721006). Abstract: The Zynq-7000 series, built on Xilinx's All Programmable extensible processing platform architecture, is an SoC that integrates a dual-core Cortex-A9 ARM processor and an FPGA.
This article describes how to use three Xilinx IP cores on Zynq-7000 devices - AXI_APB_Bridge, AXI_BRAM_Controller, and AXI_GPIO - to implement communication between the ARM and the FPGA inside the chip.
Keywords: Zynq-7000; IP core; internal communication. 1 Introduction: The Zynq-7000 series is based on Xilinx's All Programmable extensible processing platform architecture, which integrates on a single chip a processing system (PS) built around ARM's dual-core Cortex-A9 processor and a programmable logic system (PL) built on Xilinx programmable logic resources.
The architecture is fabricated in the latest high-performance, low-power 28 nm high-k metal-gate process, allowing the device to run at high performance while consuming less power than comparable dual-core Cortex-A9 processors.
The PS-PL communication interfaces inside Zynq-7000 are mainly the AXI_GP, AXI_HP, and AXI_ACP ports. Their timing is complex and awkward to use directly, so Xilinx also provides a custom-IP flow to meet individual needs as well as a set of typical interface IP cores for users.
Implementing a PS-PL interface with the custom-IP approach means using the Xilinx IP packaging tool to wrap user code into a standard AXI-bus module, adding the module graphically to the top-level design, and letting the tool connect the AXI bus automatically. The advantage is that this is simple to operate; the drawback is that with many modules the graphical view becomes cluttered and debugging is harder.
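On the PS side, a bare-metal application typically reaches IP such as AXI_GPIO or an AXI_BRAM_Controller through the Xilinx standalone BSP's Xil_Out32/Xil_In32 register accessors. The sketch below is illustrative only: the base address and the register offsets (data at 0x0, tri-state at 0x4) are assumptions matching the usual AXI GPIO layout, and in a real project the base address would come from xparameters.h.

#include "xil_io.h"        /* Xil_Out32 / Xil_In32 from the standalone BSP */

/* Assumed values; normally taken from xparameters.h for the actual design. */
#define AXI_GPIO_BASE   0x41200000U   /* hypothetical AXI_GPIO base address       */
#define GPIO_DATA_OFFS  0x0U          /* channel 1 data register (assumed offset) */
#define GPIO_TRI_OFFS   0x4U          /* channel 1 tri-state register (assumed)   */

/* Drive an output pattern through AXI_GPIO over the M_AXI_GP port. */
void gpio_write_pattern(unsigned int pattern)
{
    Xil_Out32(AXI_GPIO_BASE + GPIO_TRI_OFFS, 0x0);        /* all pins as outputs */
    Xil_Out32(AXI_GPIO_BASE + GPIO_DATA_OFFS, pattern);   /* write the pattern   */
}

unsigned int gpio_read_back(void)
{
    return Xil_In32(AXI_GPIO_BASE + GPIO_DATA_OFFS);      /* read it back */
}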
HB0716 CoreAXI4SRAM v2.1 Handbook02 20171Revision HistoryThe revision history describes the changes that were implemented in the document. The changesare listed by revision, starting with the most current publication.1.1Revision 1.0Revision 1.0 is the first publication of this document. Created for CoreAXI4SRAM v2.1.Contents1Revision History (3)1.1Revision 1.0 (3)2Preface (8)2.1About this Document (8)2.2Intended Audience (8)3Introduction (9)3.1Overview (9)3.2Features (9)3.3Core Version (10)3.4Supported Families (10)3.5Device Utilization and Performance (10)4Functional Description (11)4.1AXI4 Slave Interface Logic (11)4.2Main Control Logic (12)4.3SRAM Interface Control Logic (12)5Feature Description (13)5.1Write Only Interface (13)5.2Read Only Interface (13)6Interface (14)6.1Configuration Parameters (17)6.1.1CoreAXI4SRAM Configurable Options (17)7Timing Diagrams (18)8Tool Flow (19)8.1License (19)8.2RTL (19)8.3SmartDesign (19)8.4Configuring CoreAXI4SRAM in SmartDesign (20)8.5Simulation Flows (21)8.6Synthesis in Libero (22)8.7Place-and-Route in Libero (22)9Testbench (23)9.1User Test-bench (23)10System Integration (24)11.1Ordering Codes (25)List of FiguresFigure 1 CoreAXI4SRAM Block Diagram (9)Figure 2 CoreAXI4SRAM Design Diagram (11)Figure 3 SmartDesign CoreAXI4SRAM Instance View (19)Figure 4 Configuring CoreAXI4SRAM in SmartDesign (20)Figure 5 Simulation Directory in the Libero SoC 'files' Pane (21)Figure 6 CoreAXI4SRAM User Test-bench (23)Figure 7 CoreAXI4SRAM Example Design (24)Table 1 CoreAXI4SRAM Device Utilization (10)Table 2 I/O Signals (14)Table 3 CoreAXI4SRAM Configuration Parameters (17)Table 4.Ordering Codes (25)2Preface2.1About this DocumentThis handbook provides details about the CoreAXI4SRAM and how to use it.2.2Intended AudienceFPGA designers using Libero® System-on-Chip (SoC).3Introduction3.1OverviewThe CoreAXI4SRAM is an AXI4 slave memory controller and provides access to fabric memories onPolarFire devices.Figure 1 CoreAXI4SRAM Block DiagramThe CoreAXI4SRAM provides access to the embedded large SRAM (LSRAM) and small SRAM (uSRAM)blocks available on the PolarFire system-on-chip (SoC) field programmable gate array (FPGA) familydevice. The core provides an AXI4 Slave interface for addressing and accessing data from theconnected memory devices. It facilitates convenient access to fabric SRAM by AXI4 master. TheCoreAXI4SRAM IP is programmable through parameter configuration.The AXI protocol defines five independent channels, which are: Write address channel, Readaddress channel, Write data channel, Write response channel, and Read data channel. Refer to theAMBA3 AXI specification document for more details.3.2Features∙Supports AXI4 protocol only∙Supports 1:1 synchronous clock∙Interface data widths: 32 and 64-bits∙Supports 32-bit address bus∙Supports single/burst transfers∙Supports AXI4 increment and wrap transfers, except fixed transfers∙Configurable Read / Write, Read-only or Write-only interfacesThe following features are not supported:∙No support for streaming masters∙Does not support AXI4 user and region signals∙Does not support AXI4 QoS signals∙Does not support low power interface signals∙No trust zone security support∙Does not support data width conversion. Microsemi recommends you to use the Microsemi CoreAXI4Interconnect for data width conversion.∙Does not support AXI3 to AXI4 protocol conversion. 
Microsemi recommends you to use the Microsemi CoreAXI4Interconnect for protocol conversion.3.3Core VersionThis handbook is for CoreAXI4SRAM version 2.1.3.4Supported Families∙PolarFire3.5Device Utilization and PerformanceUtilization and performance data is listed in Table 1 for the PolarFire device family. The data listed inthis table is indicative only. The overall device utilization and performance of the core is systemdependent.Table 1 CoreAXI4SRAM Device UtilizationNote: The data in this table is achieved using typical synthesis and layout settings. Frequency (in MHz) was set to 100 and speed grade was -1.4Functional DescriptionThe AXI4 master communicates with CoreAXI4SRAM slave by requesting access by providingtransaction details on Write address channel or Read address channel. The CoreAXI4SRAM consistsof three major functional blocks, AXI4 Slave Interface block, Main Control logic block, and SRAMMemory Interface logic block. A basic block diagram of the design for CoreAXI4SRAM is as shown inFigure 2.The connected AXI4 master communicates with the CoreAXI4SRAM slave interface. The masterperforms requests access to fabric memory by issuing write/read requests on the Write addresschannel or Read address channel respectively.Following are the three major functional blocks of CoreAXI4SRAM:•Slave Interface Logic•Main Control Logic•SRAM Interface Control LogicFigure 2 CoreAXI4SRAM Design Diagram4.1AXI4 Slave Interface LogicThe core provides AXI4 slave interface to connect to the AXI4 master or AXI4 interconnect businterface. The AXI4 slave interface of the core complies with AMBA® AXI4 protocol specifications.The core supports single beat or burst AXI4 transactions. The burst transactions for incrementalbursts [INCR] can be from 1 beat to 256 beats and the burst transactions for wrapping burst [WRAP]can be 2, 4, 8, or 16 beats. The CoreAXI4SRAM does not support fixed type of burst transactions.The core does not support outstanding write/read transactions. It de-asserts the ready to the AXI4Master and re-asserts only when the current transaction is complete.4.2Main Control LogicThe Main Controller block contains control state machines to perform read/write to the fabricmemories. This block interfaces between the AXI4 slave interface block and the SRAM Control logicblock. It is responsible for the generation of the necessary control signals required to access thefabric memory. The AXI4 interface of the core also performs the write and read channel arbitrationand the address decoding functionalities. The channel arbitration implements a round robinalgorithm and is applicable only when both AXI4 write channels and AXI4 read channels are activesimultaneously. The core responds to read and write transactions in round robin manner. Thisguarantees that none of the AXI channels are held back in wait mode. The address decoding logicutilizes the address received on the AXI4 slave interface to generate the SRAM read/write addressfor the selected memory type.4.3SRAM Interface Control LogicThe SRAM Control logic block performs read/write to the fabric memories. It generates thenecessary memory control signals associated with the selected memory type (based onSEL_SRAM_TYPE configuration). This block is responsible for generation of the address and enablessignals to the fabric memories. 
In addition, it also aligns the address and data received from the AXI4interface module to match the address and data width configured for the fabric memory.5Feature Description5.1Write Only InterfaceThe core supports Write-Only interface resulting in reduced resource utilization. This interface isenabled by default. The parameter AXI4_IFTYPE_WR is used to enable/ disable the interface.5.2Read Only InterfaceThe core supports Read-Only interface resulting in reduced resource utilization. This interface isenabled by default. The parameter AXI4_IFTYPE_RD is used to enable/ disable the interface.6InterfaceI/O Signal descriptions for CoreAXI4SRAM are defined in Table 2.Table 2 I/O Signals6.1Configuration Parameters6.1.1CoreAXI4SRAM Configurable OptionsThere are a number of configurable options that apply to CoreAXI4SRAM as shown in Table 3. If aconfiguration other than the default is required, use the configuration dialog box in SmartDesign toselect appropriate values for the configurable options.Table 3 CoreAXI4SRAM Configuration ParametersCoreAXI4SRAM IP complies with the AMBA® AXI4 protocol specifications timings.8Tool Flow8.1LicenseCoreAXI4SRAM does not require a register transfer level (RTL) license to be used and instantiated.8.2RTLComplete RTL source code is provided for the core and testbenches.8.3SmartDesignCoreAXI4SRAM is preinstalled in the SmartDesign IP Deployment design environment. An exampleinstantiated view is as shown in Figure 3. The core can be configured using the configuration GUIwithin the SmartDesign, as shown in Figure 4.For more information on using SmartDesign to instantiate and generate cores, refer to theUsing DirectCore in Libero® SoC User Guide.Figure 3 SmartDesign CoreAXI4SRAM Instance View8.4Configuring CoreAXI4SRAM in SmartDesignFigure 4 Configuring CoreAXI4SRAM in SmartDesign8.5Simulation FlowsThe User Testbench for CoreAXI4SRAM is included in all releases.To run simulations, select the User Testbench flow within SmartDesign and click Save and Generateon the Generate pane. The User Testbench is selected through the Core Testbench ConfigurationGUI.When SmartDesign generates the Libero SoC project, it installs the user testbench files.To run the User Testbench, set the design root to the CoreAXI4SRAM instantiation in the Libero SoCdesign hierarchy pane and click the Simulation icon in the Libero SoC design flow window. Thisinvokes ModelSim® and automatically run the simulation.Note:To run the User testbench, copy the Libero generated ram_init.mem(/component/Actel/DirectCore/CoreAXI4SRAM/<version_no>/ram_init.mem) file in to Simulationdirectory in the Libero SoC 'files' pane as shown in the Figure 5.Figure 5 Simulation Directory in the Libero SoC 'files' Pane8.6Synthesis in LiberoTo run synthesis on the core, set the design root to the SmartDesign design and click the Synthesisicon in Libero SoC. The Synthesis window appears displaying the Synplify®project. To run Synthesis,select the Run icon.8.7Place-and-Route in LiberoAfter the design is synthesized, run the compilation and then place-and-route the tools.CoreAXI4SRAM requires no special place-and-route settings.9TestbenchA unified test-bench is used to verify and test CoreAXI4SRAM called as user test-bench.9.1User Test-benchThe user test-bench is included with the releases of CoreAXI4SRAM that verifies few features of theCoreAXI4SRAM.Figure 6 CoreAXI4SRAM User Test-benchFigure 6 shows the user test bench instantiating a Microsemi® DirectCore CoreAXI4SRAM DUT, theAXI Master model, and an AXI4SRAM DUT. 
The AXI master model drives the Write and Readtransactions to the DUT. The IP core sends the corresponding response and the test benchenvironment determines whether or not the transaction is successful.Note: The User testbench for VHDL is fixed to run at AXI4_DWIDTH = 64 only.10System IntegrationThe example design configures the CoreAXI4SRAM IP and tests the PCIe application on the PolarFiredevice.Figure 7 CoreAXI4SRAM Example Design•CoreAXI4SRAM is connected to System clock generated by the PLL.•Fabric reset is used to reset the core.•CoreAXI4SRAM is connected to the slave interface of the CoreAXI4Interconnect and accessed bythe PCIe controller on the master interface.The example design can be obtained from the Microsemi technical support team.11Ordering Information11.1Ordering CodesOrder CoreAXI4SRAM through your local Microsemi sales representative. Use the following numberconvention when ordering: CoreAXI4SRAM-XX. XX is listed in Table 4.Table 4·Ordering Codes。