Overview of the TMS320 DSP Algorithm Standard (XDAIS) and the Reference Framework RF5


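RF5

[...] reference framework.

RF5 is intended for high-density applications that combine multiple channels with multiple algorithms.

Compared with the lower-level reference frameworks [...] it uses blocking threads (TSK tasks), so it suits applications whose threads have complex interdependencies.

It also offers flexible channel management, a task-based (TSK) application structure, efficient inter-task communication, and a structured, thread-safe control mechanism. Its I/O device drivers are easy to replace, and the framework is easy to debug.

The most important requirement of a reference framework is that it be easy to interface with the user's hardware.

Each reference framework is packaged as a complete application built for a development starter kit or another board.

Reference frameworks of different levels can be provided for each board.

Adapting application software to a reference framework involves three basic tasks: changing the algorithm cells and the number of channels; adapting the application to the hardware system; and changing the drivers so that they run on the target hardware.

[...] a channel-based framework that makes it easy to wrap XDAIS algorithms.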

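Through this wrapping, application design [...]

1.1 RF5 Data Processing

RF5 has four basic data-processing elements: the task, the channel, the cell, and the standard (XDAIS) algorithm.

Figure 3 shows how they relate.

In general, a task contains one or more channels, a channel contains one or more cells, and each cell wraps one XDAIS algorithm.

A cell wraps an XDAIS algorithm in order to give the algorithm a standard interface to the outside world: every cell implements a simple ICELL interface, and the algorithm is executed through that interface.

A channel executes several cells in sequence. In a typical application, several channels may each contain the same sequence of cells performing the same function.

A task processes one or more channels at the same time. Its purpose is to organize data communication between tasks and the sessions with device drivers.

Unlike a channel, a task has concrete executable code, which the user must write.

That code typically receives data from outside and controls the execution of the channels.

Each task runs its own code in an endless loop: it checks for control messages, obtains data, executes its channels, and sends the data on.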
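The containment described above (a task owns channels, a channel runs its cells in order, a cell wraps one algorithm) can be sketched in plain C. This is only an illustration under stated assumptions: the types Cell and Channel and the functions channel_execute() and task_run_once() are hypothetical stand-ins, not the RF5 ICELL or channel API, and the two toy "algorithms" are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 4
#define MAX_FRAME 64

/* Hypothetical cell: wraps one algorithm behind a uniform call. */
typedef struct Cell {
    /* Processes n samples from in to out; returns 0 on success. */
    int (*exec)(const short *in, short *out, size_t n);
} Cell;

/* Hypothetical channel: an ordered sequence of cells. */
typedef struct Channel {
    Cell   cells[MAX_CELLS];
    size_t numCells;
} Channel;

/* Run every cell of a channel in order; each cell's output feeds the next. */
int channel_execute(const Channel *ch, const short *in, short *out, size_t n)
{
    short bufA[MAX_FRAME], bufB[MAX_FRAME];
    const short *src = in;
    short *dst;
    size_t i;

    assert(ch != NULL && ch->numCells >= 1 && ch->numCells <= MAX_CELLS);
    assert(n <= MAX_FRAME);

    for (i = 0; i < ch->numCells; i++) {
        /* Intermediate results ping-pong between two scratch buffers;
           the last cell writes straight into the caller's buffer. */
        dst = (i + 1 == ch->numCells) ? out : (src == bufA ? bufB : bufA);
        if (ch->cells[i].exec(src, dst, n) != 0) {
            return -1;
        }
        src = dst;
    }
    return 0;
}

/* One pass of a task body: execute each owned channel on its own frame. */
int task_run_once(const Channel *chans, size_t numChans,
                  short (*frames)[MAX_FRAME], size_t n)
{
    size_t c;
    for (c = 0; c < numChans; c++) {
        if (channel_execute(&chans[c], frames[c], frames[c], n) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Two toy algorithms used as cells in the example. */
int cell_double(const short *in, short *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) out[i] = (short)(in[i] * 2);
    return 0;
}

int cell_add_one(const short *in, short *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) out[i] = (short)(in[i] + 1);
    return 0;
}
```

A real task would call something like task_run_once() inside its endless loop, between receiving a frame and sending it on. The fixed-size scratch buffers are a simplification chosen to keep the sketch free of dynamic allocation.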

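1.2 Data Communication in RF5

Data communication in RF5 takes place at the task level and at the cell level.

Information is passed in structures rather than by moving the processed data through global variables.

1.2.1 Task-Level Communication

Task-level communication mainly uses SCOM message queues and mailboxes (MBX).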
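The idea of passing a structure between tasks, instead of sharing global variables, can be illustrated with a small message queue. The sketch below is an assumption-laden stand-in and not the SCOM or MBX API: Msg, MsgQueue, queue_put() and queue_get() are hypothetical names, and the queue is single-threaded, whereas a real RTOS queue would normally let the receiving task block until a message arrives.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 4

/* Hypothetical message: describes a buffer handed from one task to another. */
typedef struct Msg {
    short  *buf;   /* data buffer being passed */
    size_t  len;   /* number of valid samples in buf */
} Msg;

/* Hypothetical fixed-depth FIFO of message pointers. */
typedef struct MsgQueue {
    Msg   *slots[QUEUE_DEPTH];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of queued messages */
} MsgQueue;

/* Append a message; returns 0 on success, -1 if the queue is full. */
int queue_put(MsgQueue *q, Msg *m)
{
    assert(q != NULL && m != NULL);
    if (q->count == QUEUE_DEPTH) {
        return -1;
    }
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = m;
    q->count++;
    return 0;
}

/* Remove and return the oldest message, or NULL if the queue is empty. */
Msg *queue_get(MsgQueue *q)
{
    Msg *m;
    assert(q != NULL);
    if (q->count == 0) {
        return NULL;
    }
    m = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return m;
}
```

Only the pointer to the message travels through the queue, so the sending task gives up the buffer and the receiving task takes it over without any copy.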

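Chapter 4: TMS320 Series DSP Chips

Dalian University of Technology Press

[Figure: TMS320C30 functional block diagram. Recoverable labels: address bus A23~0, data bus D31~0, Timer 0, Timer 1, Serial Port 0, Serial Port 1, 8 extended-precision registers, 8 auxiliary registers, 2 index registers, address generators 1 and 2, 12 control registers.]

4.2 TMS320C3x Series DSP
4.2.2 Registers and Memory
Status register [table lost in extraction; recoverable entries: bit 0 = C, bit 1 = V]

4.1 TMS320C2xx Series DSP
4.1.2 TMS320F206 Pins and Compatibility
TMS320F206 pins

Pin name    Pin number   Type
A12~A15     71~74        O/Z
A8~A11      66~69
A7          64
A4~A6       60~62
A0~A3       55~58
...         ...

Address lines. The F206 drives the address lines when it accesses off-chip data space or I/O space; they are high-impedance when OFF = 0.

4.1.1 Architectural Features of the TMS320F206
7) Data can be moved between program space and data space.
8) An 8-level internal hardware stack holds call and interrupt return addresses.
9) On-chip peripherals: a 16-bit timer; an on-chip software wait-state generator that can produce 0 to 7 wait states independently for the PS, DS, and IS spaces; an on-chip oscillator and phase-locked loop with multiply and divide options (×1, ×2, ×4, ÷2); 6 general-purpose I/O pins; a full-duplex asynchronous serial port (UART); an enhanced synchronous serial port with a 4-level FIFO.
10) Hardware wait states.
11) An IDLE sleep mode for low power consumption.
12) A standard IEEE 1149.1 emulation port.
13) A 100-pin surface-mount package.

4.1.3 [...] Bits 15~13 [...]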

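2. Introduction to the TMS320 Series DSPs

[Slide: C54x roadmap, ordered by feature integration. Recoverable labels: OMAP (OMAP5910); C54x + ARM7 (C5470); C54x DSP: C5441 (532 MIPS), C5421 (200 MIPS), C5420 (200 MIPS), C5407 (120 MIPS). Slide caption: "World's Most Popular DSP. Over 500 Million Shipped. $5 Billion in Design-ins."]

The Open Multimedia Application Platform (OMAP) promoted by TI
[Slide labels: applications processor; integrated baseband and applications processor; TV; Internet browser; wireless AP.]

The highly integrated OMAP5910 provides system-on-chip functionality
[Slide labels: OMAP5910 core; TMS320C55x DSP; traffic controller (75 MHz); 3 timers; watchdog timer; interrupt handler; 2 McBSP; 2 MCSI; 3 UART; 18 GPIO; 4 mailboxes.]

Applications of the TMS320C6000 DSP
Multichannel multiplexing of a single application: multiplexed modems in cellular base stations, central-office switches, multichannel line echo cancellation, multichannel speech coders, head-end cable modems, central-office xDSL.

Applications of the TMS320C6000 DSP (continued)
Video and image compression, processing, and transmission: remote monitoring (PSTN/ISDN/ADSL), network video terminals, digital camcorders.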

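Overview of the TMS320C55x DSP

DSP Principles and Applications: Major Assignment
Name: Pan Juntao   Class: Applied Electronics 121   Student ID: 120414119   June 2014

Part 1: Overview

I. Introduction to DSP

When Texas Instruments (TI) developed its first commercial digital signal processor in 1982, nobody expected it to change the world as much as it has.

From mobile communications to consumer electronics, from [...] The first-generation digital signal processor contained only 55,000 transistors and 4 KB of memory, and executed just 5 MIPS (million instructions per second). After more than twenty years of development, a single-core digital signal processor now reaches 9600 MIPS and can address up to 1280 MB.

The third generation of digital signal processors, with strong signal-processing capability, very low power consumption, and ultra-small packages suited to handheld devices, meets the requirements of the new generation of electronic products well.

II. The Development of DSP

Since the 1960s, digital signal processing systems have emerged and developed rapidly alongside the steady progress of information technology.

Before the 1980s, the available implementation methods were limited, so digital signal processing remained at the stage of theoretical research and was not widely applied.

During that period, general-purpose computers were used to study algorithms such as digital filtering and spectrum analysis, and to model and simulate digital signal processing systems.

Real-time digital signal processing places strict demands on the processing capability of the system: all computation and processing must complete within the maximum delay that the system can tolerate.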
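The real-time constraint can be made concrete as a cycle budget: the processor has a fixed number of clock cycles between two consecutive samples, and the per-sample workload must fit inside it. The sketch below is a minimal illustration; the function names and the figures used in the usage note (a 100 MHz clock and an 8 kHz sample rate) are assumptions chosen for the example, not values from the text.

```c
#include <assert.h>

/* CPU cycles available between two consecutive input samples. */
unsigned long cycles_per_sample(unsigned long cpu_hz,
                                unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return cpu_hz / sample_rate_hz;
}

/* Returns 1 if the per-sample workload fits within the budget, else 0. */
int meets_deadline(unsigned long cycles_needed,
                   unsigned long cpu_hz,
                   unsigned long sample_rate_hz)
{
    return cycles_needed <= cycles_per_sample(cpu_hz, sample_rate_hz);
}
```

With the assumed figures, cycles_per_sample(100000000UL, 8000UL) gives a budget of 12500 cycles per sample, so an algorithm that needs 13000 cycles per sample would miss its deadline.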

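A typical digital signal processing system consists of an anti-aliasing filter, an analog-to-digital converter, the digital signal processing stage, a digital-to-analog converter, and an anti-imaging filter.

The following kinds of digital signal processing systems are in practical use today:
1. Real-time digital signal processing on an x86 processor.
2. Real-time digital signal processing on a general-purpose microprocessor.
3. Real-time digital signal processing on a field-programmable gate array (FPGA).
4. Digital signal processing on a digital signal processor (DSP).

III. Characteristics of DSP

DSP systems are used in a very wide range of fields. The main ones today are basic signal processing, communications, speech, graphics and imaging, military systems, instrumentation, control, medical equipment, and household appliances.

Communications is the largest application area for DSPs, and the military field is the domain of high-performance DSPs.

Microprocessor memory architectures fall into two broad classes: the von Neumann architecture and the Harvard architecture.

DSPs widely use the Harvard architecture, which separates program memory from data memory.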

TMS320 DSP

Application Report SPRA577B

Using the TMS320 DSP Algorithm Standard in a Static DSP System

Carl Bergman
Digital Signal Processing Solutions

Abstract

The TMS320 DSP Algorithm Standard is part of TI's eXpressDSP (XDAIS) technology initiative. It allows system designers to easily integrate algorithms from a variety of sources (e.g., third parties, customers). However, in system design, flexibility comes with a price. This price is paid in CPU cycles and memory space, both critical in all DSP systems, but perhaps most critical in a static system.

For this application note, a static system is defined as one in which memory is allocated once and is used for the remainder of the system's life: there is no effort to reclaim or reuse memory. In contrast, a dynamic system is one in which the memory is reused while the application is executing. A dynamic system takes advantage of the available memory by sharing it between algorithms, by reclaiming it when an algorithm is deactivated, and by reusing it when another algorithm is activated.

Algorithms that comply with the TMS320 DSP Algorithm Standard are tested and awarded an eXpressDSP-compliant mark upon successful completion of the test. This application note shows how an eXpressDSP-compliant algorithm may be used effectively in a static system with limited memory. It examines some optimizations and illustrates them with a very simple example: an algorithm that copies input to output.
The impact in terms of code size, data size, and CPU cycles will be demonstrated.

Contents

Theory of Operation
Review of TMS320 DSP Algorithm Standard Fundamentals
    Naming Conventions
    Interface Function Summary
Sequence of Builds
    Build 1: No eXpressDSP Interface – Access Algorithm Directly
    Build 2: Using the High Level Interface, 'CPY'
    Build 3: Using and Removing Subsections in the Linker Command File
    Build 4: Removing the Code from the CPY High-Level Interface
    Build 5: Using Only the SPI – Creating the Object at Design Time
Conclusion
References

Figures

Figure 1. Test Program
Figure 2. Build 1 Code Size
Figure 3. Using the TMS320 Algorithm Standard Interface
Figure 4. Build 2 Code Size
Figure 5. Define Subsections
Figure 6. NOLOAD Section in Linker Command File
Figure 7. Build 3 Code Size

Theory of Operation

The TMS320 DSP Algorithm Standard provides a general-purpose interface that allows efficient use of a large variety of algorithms in a large variety of systems. However, the full capability of the interface may not be useful in all systems. In a static system we might allocate memory at design time, initialize the algorithm at power-on, and never change anything else. In such a system, the code implementing the create and delete functions, although never used, would take up valuable memory.

This application note follows an example program through a sequence of steps aimed at reducing code size by linking only the required functions. The unused code is assigned to a subsection that will not be loaded by the linker. The steps in the examples involve incrementally more programming effort. The result is that the code is smaller and less memory is used.

We begin with a typical implementation of the interface and then illustrate the process with several optimizations. The first build provides a baseline for comparison. It calls the algorithm directly with no algorithm standard interface.
The second build adds the full algorithm standard interface. The remaining examples simplify the use of the interface and recover the memory from the unused functions.

Review of TMS320 DSP Algorithm Standard Fundamentals

Some of the key structures of an eXpressDSP-compliant algorithm are:

• Memory Table: describes what memory the algorithm needs in order to operate
• Creation Parameters: describes how the algorithm should be initialized
• Status Information: describes the current state of the algorithm
• Function Table: describes the operations available for the algorithm

There are two levels of access to the algorithm:

1) The service provider interface (SPI) provides the most direct access.
2) The application programmer's interface (API) provides an alternate, more convenient interface.

The high-level functions of the API use the SPI to create and control the algorithm and to process data.

Naming Conventions

The TMS320 Algorithm Standard naming convention ensures that implementations of the same algorithm from different vendors can co-exist without duplicate symbols. This is made possible by defining a two-part prefix to external symbols. Part one of the prefix represents the algorithm and part two represents the vendor.

In our example, the symbol for the 'copy' algorithm is the mnemonic 'CPY'. The symbol for the vendor "Texas Instruments" is the acronym 'TI'. This yields the prefix 'CPY_TI_'.

An example of a function name using this prefix would be CPY_TI_create(). This name indicates that TI implements the create function for the copy algorithm.

An example of an interface name would be 'CPY_TI_ICPY'. This name indicates that TI implements the interface to the copy algorithm (ICPY) for the copy algorithm. This may sound redundant, but there are other possible interfaces to the copy algorithm. For example, the test interface (ITST) in this example would be named 'CPY_TI_ITST'.
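The combination of a function table and the two-part naming prefix can be sketched with a deliberately simplified example. Everything below is hypothetical and is not the XDAIS API: the algorithm mnemonic 'DUP', the vendor 'XYZ', the DupFxns table, and all signatures were invented so that the sketch cannot be mistaken for the real IALG or ICPY declarations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DupObj DupObj;

/* Simplified function table (hypothetical; the real tables differ). */
typedef struct DupFxns {
    void (*activate)(DupObj *obj);                        /* optional, may be NULL */
    int  (*control)(DupObj *obj, int cmd, size_t value);  /* optional, may be NULL */
    void (*process)(DupObj *obj, const char *in, char *out);
} DupFxns;

/* Every instance starts with a pointer to its vendor's function table. */
struct DupObj {
    const DupFxns *fxns;
    size_t count;          /* number of bytes to copy */
};

#define DUP_SET_COUNT 1

/* Vendor-specific functions carry the two-part prefix DUP_XYZ_. */
static int DUP_XYZ_control(DupObj *obj, int cmd, size_t value)
{
    if (cmd != DUP_SET_COUNT) {
        return -1;
    }
    obj->count = value;
    return 0;
}

static void DUP_XYZ_process(DupObj *obj, const char *in, char *out)
{
    memcpy(out, in, obj->count);
}

/* The one symbol vendor XYZ exports: its table for the IDUP interface. */
const DupFxns DUP_XYZ_IDUP = { NULL, DUP_XYZ_control, DUP_XYZ_process };

/* Algorithm-level wrappers (prefix DUP_) work with any vendor's table. */
void DUP_activate(DupObj *obj)
{
    assert(obj != NULL && obj->fxns != NULL);
    if (obj->fxns->activate != NULL) {   /* NULL entry means "do nothing" */
        obj->fxns->activate(obj);
    }
}

int DUP_control(DupObj *obj, int cmd, size_t value)
{
    assert(obj != NULL && obj->fxns != NULL);
    return (obj->fxns->control != NULL)
        ? obj->fxns->control(obj, cmd, value)
        : -1;
}

void DUP_process(DupObj *obj, const char *in, char *out)
{
    assert(obj != NULL && obj->fxns != NULL && obj->fxns->process != NULL);
    obj->fxns->process(obj, in, out);
}
```

Because the application only ever names the table symbol, a second vendor's implementation could be linked in under a different prefix without any symbol clash, which is the point of the convention.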
Interface Function Summary

The functions that implement the two levels of access (API and SPI) may be organized according to whether they apply to all algorithms (generic), apply to a specific algorithm (algorithm-specific), or apply to a specific implementation of an algorithm (vendor-specific). The naming convention helps here as well. The generic create function would be ALG_create(). The algorithm-specific create function would be CPY_create(), with the copy algorithm mnemonic as a prefix. The vendor-specific function (if TI was the vendor) would have the name CPY_TI_create().

Functions Beginning With 'CPY_' (Algorithm-Specific API)

The algorithm-specific API is the most convenient access to the algorithm and is a superset of the TMS320 algorithm standard API.

    CPY_activate()     Prepare the algorithm to run
    CPY_control()      Command and status mechanism
    CPY_create()       Allocate memory and initialize a new algorithm instance
    CPY_deactivate()   Prepare the algorithm to be inactive or possibly deleted
    CPY_delete()       Remove algorithm instance and deallocate the memory used
    CPY_exit()         Finalize module other than deleting algorithm instance
    CPY_init()         Initialize module other than creating algorithm instance
    CPY_process()      Process data

Functions Beginning With 'ALG_' (Standard API)

The following do not include the algorithm-specific processing function calls.

    ALG_activate()     Prepare the algorithm to run
    ALG_control()      Command and status mechanism
    ALG_create()       Allocate memory and initialize a new algorithm instance
    ALG_deactivate()   Prepare the algorithm to be inactive or possibly deleted
    ALG_delete()       Remove algorithm instance and deallocate the memory used
    ALG_exit()         Finalize module other than deleting algorithm instance
    ALG_init()         Initialize module other than creating algorithm instance

The CPY_IALG Interface (Standard SPI)

The IALG interface functions are described in the comments field of the ialg.h file.

    /*
     *  ======== IALG_Fxns ========
     *  This structure defines the fields and methods that must be supplied by
     *  all XDAIS algorithms.
     *
     *  implementationId - unique pointer that identifies the module
     *                     implementing this interface.
     *  algActivate()    - notification to the algorithm that its memory
     *                     is "active" and algorithm processing methods
     *                     may be called. May be NULL; NULL => do nothing.
     *  algAlloc()       - apps call this to query the algorithm about
     *                     its memory requirements. Must be non-NULL.
     *  algControl()     - algorithm specific control operations. May be
     *                     NULL; NULL => no operations supported.
     *  algDeactivate()  - notification that current instance is about to
     *                     be "deactivated". May be NULL; NULL => do nothing.
     *  algFree()        - query algorithm for memory to free when removing
     *                     an instance. Must be non-NULL.
     *  algInit()        - apps call this to allow the algorithm to
     *                     initialize memory requested via algAlloc(). Must
     *                     be non-NULL.
     *  algMoved()       - apps call this whenever an algorithm's object or
     *                     any pointer parameters are moved in real-time.
     *                     May be NULL; NULL => object can not be moved.
     *  algNumAlloc()    - query algorithm for number of memory requests.
     *                     May be NULL; NULL => number of mem recs is less
     *                     than IALG_DEFMEMRECS.
     */

The CPY_ICPY Interface (Standard SPI Plus Algorithm Extensions)

The algorithm extensions provide the data processing function.

    cpyProcess()       Copy data from input buffer to output buffer

The CPY_TI_ICPY Interface (Standard SPI Plus Algorithm Extensions Plus Vendor's Extensions)

The copy algorithm has no vendor extensions.

Sequence of Builds

Build 1: No eXpressDSP Interface – Access Algorithm Directly

The first build is for comparison purposes. The test program accesses the algorithm directly. The copy algorithm only needs the count field in the object and the input and output data buffers. Note that the eXpressDSP header files are included to support the use of the ICPY_TI_Obj structure, which is expected by the algorithm. Figure 1 shows the code from main() of the test program.

The system resources used are measured in terms of code size and CPU cycles.
The code size is shown in the excerpt from the linker map file in Figure 2. The CPU cycles for the data processing function are determined with the profiler in Code Composer Studio [1].

    Program Memory Used   37,408 bytes
    Data Memory Used      4,480 bytes
    CPU Cycles Used       20441 (average of 3 runs on a C6201 EVM card)

Figure 1. Test Program

    /*
     *  build1.c
     */
    #include <stdio.h>      // access to printf()
    #include <stdlib.h>     // access to exit()
    #include <std.h>        // basic data types
    #include <xdais.h>      // XDAIS data types
    #include <ialg.h>       // IALG standard
    #include <icpy.h>       // ICPY standard
    #include <icpy_ti.h>    // ICPY implementation
    #include <copydata.h>   // algorithm implementation

    /* test data */
    #define COPY_COUNT  16
    #define BUFFER_SIZE 80

    Char *testString = "eXpress DSP Algorithm Standard";
    Char buffer[BUFFER_SIZE];

    Int main()
    {
        ICPY_TI_Handle handle;
        ICPY_TI_Obj cpyObj;
        Char *cp, *input, *output;
        Int i;

        printf("build1 1999 0802 1036\n");

        /* init test buffers ---------------------------------------- */
        input = testString;
        output = buffer;

        /* clear output buffer */
        cp = output;
        for (i = BUFFER_SIZE; i > 0; i--) {
            *cp++ = (Char)0;
        }
        printf("input: %s\n", input);
        printf("output: %s\n", output);

        /* init the algorithm --------------------------------------- */
        handle = (ICPY_TI_Handle)&cpyObj;

        /* if create failed then exit (can't happen but keep consistent) */
        if (handle == NULL) {
            printf("object creation failed\n");
            exit(1);
        }
        else {
            printf("object created\n");
        }

        /* use the algorithm ---------------------------------------- */
        /* set the count of bytes to copy */
        printf("cpyControl\n");             // for fair code comparison
        printf("ICPY_SET_COPY_COUNT\n");    // for fair code comparison
        cpyObj.count = 5;
        i = 1;                              // place profile point here

        /* do the copy operation */
        printf("cpyProcess\n");             // for fair code comparison
        copyData(handle, input, output);
        i = 0;                              // place profile point here

        /* report results ------------------------------------------- */
        printf("copy %d bytes, output: %s\n", cpyObj.count, output);
        printf("end of build1\n");

        /* for fair code size comparison: from the control function */
        printf("ICPY_READ_STATUS\n");
        printf("ICPY_WRITE_STATUS\n");
        printf("default case!\n");

        return (0);
    }

Figure 2. Build 1 Code Size

    MEMORY CONFIGURATION

    name      origin     length      used       attributes   fill
    --------  --------   ---------   --------   ----------   --------
    PMEM      00000000   000010000   00000000   RWIX
    EXT0      00400000   000040000   00000000   RWIX
    EXT1      01400000   000300000   00000000   RWIX
    EXT2      02000000   000400000   00009220   RWIX
    EXT3      03000000   000400000   00000000   RWIX
    DMEM      80000000   000010000   00001180   RWIX

    SECTION ALLOCATION MAP

    section   page   origin     length     input sections
    --------  ----   --------   --------   ----------------
    .text     0      02000000   00008500
    .cinit    0      02008500   00000414
    .cio      0      02008914   00000120   UNINITIALIZED
    .far      0      02008a34   000007ec   UNINITIALIZED
    .stack    0      80000000   00000800   UNINITIALIZED
    .bss      0      80000800   00000054   UNINITIALIZED
    .const    0      80000854   0000012c
    .sysmem   0      80000980   00000800   UNINITIALIZED

Build 2: Using the High Level Interface, 'CPY'

The second build represents a baseline of using the copy algorithm in a static system with an algorithm standard interface.

The code is built using Code Composer Studio (CCStudio) with the following components:

    Build2.mak   Code Composer Studio Project File
    Build2.c     Test Program
    mem.c        Memory Allocation Utility
    cpy.c        Algorithm Specific High Level Interface
    alg.c        Standard High Level Interface
    cpy.lib      Algorithm Library
    Build2.cmd   Linker Command File

System resources used:

                             Build 1   Build 2   Change
    Program Memory (bytes)   37408     40456     3048
    Data Memory (bytes)      4480      4615      135
    CPU Cycles               20441     21343     902

The code in Figure 3 is an excerpt from the function main() in build2.c and shows the following steps:

1) Using the high level CPY API, an instance of the algorithm is created.
2) The copy count is then changed from the default value with a control call and the copy process is run on the input and output buffers.
3) Finally, a control call is made to retrieve status, which in our case simply proves we can find out what the copy count was.

Figure 3. Using the TMS320 Algorithm Standard Interface

    /* init the algorithm --------------------------------------- */
    /* create an instance of the algorithm object */
    handle = CPY_create(&CPY_ICPY, &paramDefaults);

    /* if create failed then exit */
    if (handle == NULL) {
        printf("object creation failed\n");
        exit(1);
    }
    else {
        printf("object created\n");
    }

    /* use the algorithm ---------------------------------------- */
    /* set the count of bytes to copy */
    CPY_control(handle, ICPY_SET_COPY_COUNT, (Void *)5);
    i = 1;  /* place profile point here */

    /* do the copy operation */
    CPY_process(handle, input, output);
    i = 0;  /* place profile point here */

    /* read back the copy count */
    CPY_control(handle, ICPY_READ_STATUS, &status);

Figure 4 is an excerpt from the Build 2 linker map file.

Figure 4. Build 2 Code Size

    MEMORY CONFIGURATION

    name      origin     length      used       attributes   fill
    --------  --------   ---------   --------   ----------   --------
    PMEM      00000000   000010000   00000000   RWIX
    EXT0      00400000   000040000   00000000   RWIX
    EXT1      01400000   000300000   00000000   RWIX
    EXT2      02000000   000400000   00009e08   RWIX
    EXT3      03000000   000400000   00000000   RWIX
    DMEM      80000000   000010000   00001207   RWIX

    SECTION ALLOCATION MAP

    section   page   origin     length     input sections
    --------  ----   --------   --------   ----------------
    .text     0      02000000   00009080
    .cinit    0      02009080   0000047c
    .cio      0      020094fc   00000120   UNINITIALIZED
    .far      0      0200961c   000007ec   UNINITIALIZED
    .stack    0      80000000   00000800   UNINITIALIZED
    .bss      0      80000800   0000008c   UNINITIALIZED
    .const    0      8000088c   0000017b
    .sysmem   0      80000a08   00000800   UNINITIALIZED

Build 3: Using and Removing Subsections in the Linker Command File

In the next build, the code is the same as Build 2 (refer to Figure 3). We now add pragma directives to all the interface levels to assign an input subsection for each function statement (see Figure 5).
This allows us to selectively include or exclude the subsections in the link process.

System resources used:

                             Build 1   Build 3   Change
    Program Memory (bytes)   37408     39432     2024
    Data Memory (bytes)      4480      4615      135
    CPU Cycles               20441     21132     691

Figure 5. Define Subsections

    #pragma CODE_SECTION(CPY_activate, ".text:algActivate")
    #pragma CODE_SECTION(CPY_apply, ".text:algApply")
    #pragma CODE_SECTION(CPY_control, ".text:algControl")
    #pragma CODE_SECTION(CPY_create, ".text:algCreate")
    #pragma CODE_SECTION(CPY_deactivate, ".text:algDeactivate")
    #pragma CODE_SECTION(CPY_delete, ".text:algDelete")
    #pragma CODE_SECTION(CPY_exit, ".text:algExit")
    #pragma CODE_SECTION(CPY_init, ".text:algInit")

Subsections are selected in the linker command file. By specifying a NOLOAD output section, the unused code is removed from the program image (see Figure 6). The code is built the same way as in Build 2.

Figure 6. NOLOAD Section in Linker Command File

    SECTIONS
    {
        ...
        .notUsed {
            * (.text:algActivate)
            * (.text:algApply)
            * (.text:algDeactivate)
            * (.text:algDelete)
            * (.text:algExit)
            * (.text:algInit)
            * (.text:algMoved)
            * (.text:algNumAlloc)
        } type = NOLOAD > EXT3
        ...
    }

Figure 7 is an excerpt from the Build 3 linker map file.

Figure 7. Build 3 Code Size

    MEMORY CONFIGURATION

    name      origin     length      used       attributes   fill
    --------  --------   ---------   --------   ----------   --------
    PMEM      00000000   000010000   00000000   RWIX
    EXT0      00400000   000040000   00000000   RWIX
    EXT1      01400000   000300000   00000000   RWIX
    EXT2      02000000   000400000   00009a08   RWIX
    EXT3      03000000   000400000   00000400   RWIX
    DMEM      80000000   000010000   00001207   RWIX

    SECTION ALLOCATION MAP

    section   page   origin     length     input sections
    --------  ----   --------   --------   ----------------
    .text     0      02000000   00008c80
    .cinit    0      02008c80   0000047c
    .cio      0      020090fc   00000120   UNINITIALIZED
    .far      0      0200921c   000007ec   UNINITIALIZED
    .stack    0      80000000   00000800   UNINITIALIZED
    .bss      0      80000800   0000008c   UNINITIALIZED
    .const    0      8000088c   0000017b
    .sysmem   0      80000a08   00000800   UNINITIALIZED
    .notUsed  0      03000000   00000400   NOLOAD SECTION

Build 4: Removing the Code from the CPY High-Level Interface

In the fourth build, we replace the calls to the CPY interface (CPY_* functions) with macros that call the standard API (ALG_* functions) and the SPI. The three macros shown replace the corresponding function calls to CPY_control(), CPY_create() and CPY_process().

    #define CPY_CONTROL(alg, cmd, status) \
        ((alg->fxns->ialg.algControl)((IALG_Handle)alg, cmd, status));

    #define CPY_CREATE(fxns, prms) \
        (CPY_Handle)ALG_create((IALG_Fxns *)fxns, (IALG_Params *)prms);

    #define CPY_PROCESS(alg, input, output) \
        (alg->fxns->cpyProcess)((ICPY_Handle)alg, input, output);

This allows us to eliminate the file cpy.c from our build. The rest remains the same.

System resources used:

                             Build 1   Build 4   Change
    Program Memory (bytes)   37408     39224     1816
    Data Memory (bytes)      4480      4611      131
    CPU Cycles               20441     20725     284

Build 5: Using Only the SPI – Creating the Object at Design Time

In Build 5, we remove the remaining API code in alg.c from the program. We can do this because we are going to 'create' the object and declare the data structures the algorithm requires at design time in the test program. Four steps are required for this build:

1) Allocate the space for the memory descriptor table.

    /* room for IALG_DEFMEMRECS records, not just one */
    memTab = (IALG_MemRec *)malloc(IALG_DEFMEMRECS * sizeof(IALG_MemRec));

2) Plug in the addresses of our allocated object and working memory to the memory descriptor table.

    memTab[CPY_OBJ_DATA].base = (void *)malloc(sizeof(ICPY_TI_Obj));
    memTab[CPY_DATA_RAM].base = (void *)malloc(sizeof(cpyDataRam));

3) Set the value of our handle to the algorithm. We also set the address of the function table in the object. Previously ALG_create() set the function table address and returned the value of our handle.

    handle = (CPY_Handle)memTab[CPY_OBJ_DATA].base;
    handle->fxns = &CPY_ICPY;

4) Initialize the algorithm. For this, we call the SPI directly with the parameters it expects.
If the initialization fails, the handle is set to NULL.

    if (handle->fxns->ialg.algInit((IALG_Handle)handle, memTab,
            NULL, (IALG_Params *)&paramDefaults) != IALG_EOK) {
        handle = NULL;
    }

Now with alg.c and mem.c removed from the program (memory allocation is no longer used) and with the subsection .text:algAlloc placed in the .notUsed section, we build the program as before.

System resources used:

                             Build 1   Build 5   Change
    Program Memory (bytes)   37408     38232     824
    Data Memory (bytes)      4480      4611      131
    CPU Cycles               20441     20653     212

Conclusion

These build techniques allowed us to reduce the program memory overhead for using the algorithm standard from 3048 bytes to 824 bytes, a 73% reduction. This was accomplished by using only the service provider interface (SPI) and by placing unused code in a NOLOAD section.

                        Build 1   Build 2   Build 3   Build 4   Build 5
    Program Memory      37408     40456     39432     39224     38232
    Change                        3048      2024      1816      824
    Percent of XDAIS              100.00%   66.40%    59.58%    27.03%

The data memory was not really affected, with a change of only 4 bytes from Build 3 to Build 4.

                          Build 1   Build 2   Build 3   Build 4   Build 5
    Data Memory           4480      4615      4615      4611      4611
    Change From Build 1             135       135       131       131

The CPU cycle count for the data processing call was measured with the profiler in Code Composer Studio. Because our copy algorithm has only 160 bytes of code, it is important to note that the percentage of overhead of the algorithm standard interface in a more realistic algorithm would be much smaller than what is shown.

With that in mind, the most direct use of the SPI gives us an overhead of 212 cycles, a little more than 1% of the total cycles used in the data processing call. This is less than 24% of the 902 cycles used with the full algorithm standard interface in Build 2.

                        Build 1   Build 2   Build 3   Build 4   Build 5
    CPU Cycles          20441     21343     21132     20725     20653
    Change                        902       691       284       212
    Percent of Total              4.41%     3.38%     1.39%     1.04%

In an actual case with a G.723 algorithm, the processing call takes an average of 375,000 cycles, and the overhead of 212 cycles would be less than 0.1%.
The overhead of the full interface at 902 cycles would be less than 0.25%.

Finally, the following chart summarizes the improvements in program memory for the examples given.

[Chart: eXpressDSP Overhead on Program Memory. Bar chart of overhead in bytes (axis 1000 to 4000) for Build 2, Build 3, Build 4, and Build 5.]

References

1. Code Composer Studio User's Guide, SPRU328.
2. TMS320C6000 Assembly Language Tools User's Guide, SPRU186.
3. TMS320 DSP Algorithm Standard Rules and Guidelines, SPRU352.
4. TMS320 DSP Algorithm Standard API Reference, SPRU360.

Copyright © 2000, Texas Instruments Incorporated.
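The overhead figures quoted in the application note's conclusion can be recomputed from the raw measurements. The short sketch below does that arithmetic; the helper names overhead() and percent_of() are assumptions made for this example, while the measurement values are the ones reported for Builds 1 to 5.

```c
#include <assert.h>

/* Measurements quoted in the application note, Builds 1 to 5. */
static const long PROGRAM_BYTES[5] = { 37408, 40456, 39432, 39224, 38232 };
static const long CPU_CYCLES[5]    = { 20441, 21343, 21132, 20725, 20653 };

/* Overhead of one build relative to the Build 1 baseline. */
long overhead(long build, long baseline)
{
    return build - baseline;
}

/* part expressed as a percentage of whole. */
double percent_of(long part, long whole)
{
    assert(whole > 0);
    return 100.0 * (double)part / (double)whole;
}
```

For example, overhead(PROGRAM_BYTES[4], PROGRAM_BYTES[0]) is 824 bytes, and percent_of(824, 3048) is about 27.03, meaning Build 5 keeps roughly 27% of the program memory overhead of the full interface. The same helpers give about 1.04% cycle overhead for Build 5 relative to the 20441-cycle baseline.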

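Chapter 2: Basic Architecture and Performance of the TMS320C2000 Series DSP Chips

Development trends of TI's 28x series DSPs.
Anhui Polytechnic University, School of Electrical Engineering

2.1 Basic Architecture and Performance of the C28x Piccolo Series
In October 2008, TI released the Piccolo series, based on the C2000 DSP. The name is taken from Italian, in which "piccolo" means "small" (it is also the name of the small flute). Piccolo devices are 32-bit microcontrollers whose main characteristics are small size, low cost, and high integration. They combine the latest architectural advances with enhanced peripherals to provide a low-cost, highly integrated solution that brings processor-intensive 32-bit real-time control to cost-sensitive applications. The Piccolo series comes in a range of packages and peripheral options, combining high performance, high integration, small size, and low cost.

2.1.1 F2802x Series
The F2802x Piccolo series is powered by the C28x core, coupled with highly integrated control peripherals in low pin-count devices. Its code is compatible with earlier C28x-based code, and it offers a high level of analog integration. The F2802x series runs at 40 to 60 MHz with up to 64 KB of Flash and is a low-cost, entry-level product.

2.2.1 F2833x Series
The F2833x series runs at 100 to 150 MHz with up to 512 KB of Flash. These devices are highly integrated, high-performance solutions for demanding control applications.

2.2.2 C2834x Series
The C2834x series doubles the performance, reaching 300 MHz, but it is limited to RAM-based storage, with up to 516 KB of RAM.

C2000 is a microcontroller family focused on real-time control. Its applications include digital power supplies, digital motor control, position sensing, and automotive radar.
At the heart of a C2000 device is a 32-bit C28x CPU running at between 40 and 400 MHz, with an added floating-point unit. Some devices also include a Control Law Accelerator (CLA), which in effect acts as a second core running in parallel with the CPU and can control peripherals independently. Within the TMS320C2000 family, TI currently offers four mainstream series: the widely used C28x fixed-point series, the low-cost and highly innovative C28x Piccolo series, the C28x Delfino floating-point performance series, and the Concerto multicore series based on the C28x and the ARM Cortex-M3. Figure 2-1 shows [...]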

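DSP: A TMS320C Digital Image Processing Scheme

Contents

Abstract (Chinese)
Abstract (English)
1 Introduction
    1.1 Research background of image processing
    1.2 State of image processing research in China and abroad
    1.3 Research content and significance of image processing
        1.3.1 Research content of image processing
        1.3.2 Significance of this work
    1.4 Summary
2 DSP-Based Development Systems
    2.1 Introduction to DSP systems
    2.2 DSP chips
        2.2.1 Basic structure of DSP chips
        2.2.2 Types of DSP chips
        2.2.3 The world's main DSP chip manufacturers and their products
        2.2.4 Current state of DSP development and its applications
        2.2.5 Outlook for DSP technology
    2.3 Characteristics of DSP chips
    2.4 Choosing a DSP chip for image processing
    2.5 DSP-based image processing systems
3 Using the CCS Development Environment and Simulation
    3.1 Installing CCS and a brief introduction
        3.1.1 Introduction to CCS
        3.1.2 Installing and using CCS
        3.1.3 Configuring and using CCS
    3.2 Analysis of simulation results
4 DSP-Based Image Processing
    4.1 Basic concepts of image processing
    4.2 Hardware system for image processing
        4.2.1 Hardware system of the TMS320C6000 DSP chip
        4.2.2 Introduction to the TMS320C6000 hardware structure
        4.2.3 Evaluation of the test platform
    4.3 Implementing image processing on the DSP
        4.3.1 Image histogram statistics
        4.3.2 Digital image edge detection with the Sobel operator
        4.3.3 Digital image sharpening with the Laplace operator
        4.3.4 Image inversion
        4.3.5 Digital image enhancement by histogram equalization
    4.4 Experiments and analysis of results
Conclusion
Acknowledgements
References
Appendix

1 Introduction

1.1 Research background of image processing

Digital image processing, also called computer image processing, first appeared abroad in the 1950s. Electronic computers had by then reached a certain level of development, and people began to use them to process graphics and image information.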

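2. A Learning Path for DSP

When learning TI's various DSPs, it helps to proceed step by step through several levels. The levels are summarized here.

1. DSP2000 (except the 2812)
Path: standard C -> mixed C and assembly programming.
Notes: treat the DSP2000 as an ordinary microcontroller; it is very simple.

2. DSP5000 (including the DSP2812)
Path: standard C -> mixed C and assembly programming -> DSP/BIOS -> RF3.
Notes: the DSP5000 is a mid-range product with moderate performance, and it poses essentially no development difficulty.

3. DSP6000
Path: standard C -> mixed C and assembly programming -> DSP/BIOS -> XDAIS -> RF5.
Notes: the DSP6000 is markedly harder to develop for, in both hardware and software.

It falls into two tiers:

(1) DSP62XX and DSP67XX: on the hardware side, signal-integrity problems start to appear. On the software side, DSP/BIOS is essential, and complex programs also require knowledge of XDAIS, RF3, and RF5.

(2) DSP64XX: development is comparatively difficult. On the hardware side, a sound system architecture and signal integrity need particular attention. On the software side, a range of more advanced, specialized knowledge has to be combined, for example using DSP/BIOS as the RTOS, using RF5 as the program framework, and writing low-level drivers with the MiniDriver model wherever possible.

Deeper programming work also runs into perplexing cache-conflict problems (although TI recently upgraded CCS specifically to address this difficulty), among others.

Some supporting topics can be studied as needed:
1. GEL: recommended for developers at every level.
2. RTDX: generally not necessary.
3. C++ object-oriented programming in CCS: not recommended.
4. CSL: required for development on the DSP6000 and above.
5. The various DSP library functions: recommended for programs with complex algorithms.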

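Chapter 4: Overview of the TMS320 Series DSP Chips

4.1 TMS320C2xx Series DSP
4.1.2 TMS320F206 Pins and Compatibility
TMS320F206 pins

Pin name    Pin number   Type
D0~D2       38~41        I/O/Z
D3~D6       26~29
D7~D10      31~34
D11         36
D12~D15     38~41

Parallel data bus, D15 (most significant bit, MSB) to D0 (least significant bit, LSB). The bus carries multiplexed data between the TMS320F206 and external data space, program space, or I/O space. It becomes high-impedance when not outputting (R/W high), while RS is held active, or when OFF = 0.

4.1.3 On-Chip Resources
Mode register PMST [bit table lost in extraction]

4.1.3 On-Chip Resources
Registers and their functions

Status register ST0

Bits     Name   Description
15~13    ARP    Number of the current ARx register, x = 0~7
12       OV     Set to 1 when the arithmetic logic unit (ALU) overflows
11       OVM    Overflow mode bit. When OVM = 0, overflow is not handled; when OVM = 1, an overflowed result is set to the largest positive or negative value, that is, it saturates

[register name lost in extraction]

Bits     Name   Description
12       CNF    On-chip RAM configuration. When CNF = 0, blocks B0 and B1 of the on-chip dual-access RAM are mapped to data space. When CNF = 1, B0 and B1 are mapped to program space. After reset, CNF = 0
[...]    TC     Test/control bit. TC = 1 when the condition tested by a BIT, BITT, CMPR, or NORM instruction is true; otherwise TC = 0. It can be used for conditional branches, calls, and returns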

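Main contents: development of the TMS320 series DSPs; C54x architectural features; buses; ...

Serial port type                        (six device columns; headers lost in extraction)
Standard synchronous serial port        2   0   0   1   1   0
Buffered serial port (BSP)              0   1   1   1   1   2
Time-division multiplexed (TDM) port    0   1   1   0   0   1

SZU-TI DSPs Lab
Dr. JI ZHEN

6.5 Standard Synchronous Serial Port
Two memory-mapped registers (MMRs): the data transmit register (DXR) and the data receive register (DRR).
• Each serial port has its own clock, frame-synchronization pulse, and serial shift registers.
• Serial data can be transferred as 8-bit bytes or 16-bit words.
• Transmit and receive operations generate maskable interrupts: RINT and XINT.
• Serial data transfers are managed by software.
• The serial port is double-buffered.
• Maximum clock frequency = CLKOUT/4.

See Table 1-30.

Bits 10~3, PLLCOUNT: PLL counter value. The PLL counter is a down counter that decrements by 1 after every 16 input clocks (CLKIN). It times the interval from when the PLL starts operating until the PLL becomes the processor clock. The PLL counter ensures that the processor is clocked with a correct signal only after the PLL has locked.

6.3 PLL Multiplier

PLLNDIV   PLLDIV   PLLMUL       Multiplier*
0         x        0~14         0.5
0         x        15           0.25
1         0        0~14         PLLMUL + 1
1         0        15           1
1         1        0 or even    (PLLMUL + 1) ÷ 2
1         1        odd          PLLMUL ÷ 4

* CLKOUT = CLKIN × multiplier

[...] resumes operation.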
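The PLL multiplier table can be expressed as a small lookup function. The sketch below is a plain-C model, not register-access code: pll_multiplier() and clkout_hz() are hypothetical names, and the last two rows of the table are read as (PLLMUL + 1) / 2 and PLLMUL / 4, which is an assumption about entries whose division signs were lost in the source.

```c
#include <assert.h>

/*
 * Clock multiplier selected by the PLLNDIV, PLLDIV and PLLMUL fields.
 * pllndiv and plldiv are single-bit flags; pllmul is a 4-bit value.
 */
double pll_multiplier(int pllndiv, int plldiv, int pllmul)
{
    assert(pllmul >= 0 && pllmul <= 15);

    if (!pllndiv) {
        /* Divider mode: PLLDIV is a don't-care. */
        return (pllmul == 15) ? 0.25 : 0.5;
    }
    if (!plldiv) {
        /* Integer multiply mode. */
        return (pllmul == 15) ? 1.0 : (double)(pllmul + 1);
    }
    /* Fractional mode: depends on whether PLLMUL is even or odd. */
    return (pllmul % 2 == 0) ? (pllmul + 1) / 2.0 : pllmul / 4.0;
}

/* CLKOUT = CLKIN x multiplier. */
double clkout_hz(double clkin_hz, int pllndiv, int plldiv, int pllmul)
{
    return clkin_hz * pll_multiplier(pllndiv, plldiv, pllmul);
}
```

Under this reading, a 10 MHz input with PLLNDIV = 1, PLLDIV = 0 and PLLMUL = 9 would give a multiplier of 10 and a 100 MHz CLKOUT; the input frequency is an illustrative value only.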

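Notes on Learning the Principles and Applications of the TMS320C54x DSP

Preface

The TMS320C54x DSP is a high-performance digital signal processor. Its fast arithmetic and rich on-chip resources have made it widely used in areas such as audio and video processing and industrial automation.

In this article I share what I learned and how I understand some of its applications.

Hardware Architecture

The TMS320C54x DSP uses a Harvard architecture with two separate memories: P(M)Memory and Data(M)Memory.

P(M)Memory holds program code and constants, while Data(M)Memory holds data and intermediate results.

The DSP also has a rich set of peripherals, such as timers, an interrupt controller, and GPIO.

Arithmetic Units

The TMS320C54x DSP contains several arithmetic units, of which the multiplier and the accumulators are used most often.

The multiplier is a 17 × 17-bit hardware multiplier paired with a dedicated 40-bit adder, so a multiply-accumulate operation completes in a single cycle.

The two 40-bit accumulators hold the running results of additions and subtractions over many data values.

The DSP also offers a degree of parallelism, meaning that it can carry out several operations at the same time.

Instruction Set

The TMS320C54x instruction set covers several classes of instructions, such as arithmetic, logic, and shift instructions.

In practice, choosing the instructions that best match the task improves computational efficiency and reduces power consumption.

Application Example

Below, I use digital filtering as a brief example of applying the TMS320C54x DSP.

Suppose we want to low-pass filter a segment of audio data. The computation proceeds as follows:

1. Read the filter coefficients from P(M)Memory.

2. Read the audio data from Data(M)Memory.

3. Feed the filter coefficients and the audio data into the multiplier.

4. Add the products into the accumulator, and write the result to Data(M)Memory.

5. Repeat steps 2 to 4 until all the data has been processed.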
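The filtering steps above amount to a finite impulse response (FIR) filter built from multiply-accumulate operations. The sketch below shows one possible form in portable C, assuming Q15 fixed-point samples and coefficients; fir_q15() is a hypothetical name, the 64-bit accumulator merely stands in for a DSP's wide hardware accumulator, and samples before the start of the input are treated as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Q15 FIR filter: y[i] = sum over k of h[k] * x[i - k].
 * h holds 'taps' coefficients, x holds n input samples, and y receives
 * n output samples. Inputs before x[0] are taken to be zero.
 */
void fir_q15(const int16_t *h, size_t taps,
             const int16_t *x, int16_t *y, size_t n)
{
    size_t i, k;

    assert(h != NULL && x != NULL && y != NULL && taps > 0);

    for (i = 0; i < n; i++) {
        int64_t acc = 0;   /* wide accumulator, so the sum cannot overflow */

        /* Multiply each coefficient by its sample and accumulate. */
        for (k = 0; k < taps && k <= i; k++) {
            acc += (int32_t)h[k] * (int32_t)x[i - k];
        }

        acc /= 32768;      /* scale the Q30 sum back to Q15 */

        /* Saturate to the 16-bit output range. */
        if (acc > INT16_MAX) acc = INT16_MAX;
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = (int16_t)acc;
    }
}
```

As a usage example, the two coefficients { 16384, 16384 } (0.5 each in Q15) form a two-point moving average, a very simple low-pass filter that averages each sample with its predecessor.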

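With the TMS320C54x DSP, digital filtering can be carried out quickly and efficiently.

The DSP can also be applied widely in other fields, such as image processing and control systems.

From a learning point of view, mastering the TMS320C54x DSP not only deepens our understanding of the basic principles of digital signal processing but also strengthens our practical engineering skills.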

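DSP Principles and Applications: TMS320C54x Software Development

Digital signal processing (DSP) is an important signal processing technique with wide application in many fields. This course covers DSP principles and TMS320C54x software development in depth to help you acquire the relevant knowledge and skills.
1. DSP overview
Introduces the basic concepts, role, and application areas of digital signal processing.
2. Fundamentals of digital signal processing
Explains the basic principles of digital signal processing and the techniques of sampling and quantization.
...320C54x series digital signal processors: features and application areas.
4. Features of the TMS320C54x series
Describes the performance and features of the TMS320C54x series in detail.
5. TMS320C54x chip architecture
Examines the internal structure and functional modules of the TMS320C54x.
6. TMS320C54x software development environment
Introduces the development environment and tools needed for TMS320C54x software development.
7. Overview of the CCS software environment
Explains the features and use of the CCS (Code Composer Studio) development environment.
8. DSP algorithm design flow
Discusses the algorithm design process and best practices in DSP development.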

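Introduction: "TMS320C54x DSP: Structure, Principles, and Applications"

such as JPEG, M... ...C54x DSP can be used to implement various image recognition algorithms, such as face recognition and gesture recognition.
Applications in automatic control systems
Control system modeling and simulation
The high computing speed of the TMS320C54x DSP makes it possible to model and simulate a variety of control systems.
Control system analysis and optimization
The same computing speed can be used to analyze and optimize a control system and improve its performance.
Audio processing
Audio compression, audio analysis, audio synthesis, and so on.
Control and automation
Motor control, smart instruments, automatic control systems, and so on.
Main features of the TMS320C54x DSP
High performance
A Harvard architecture with pipelining gives fast arithmetic.
Low power
A low-power design suits battery-powered and portable equipment.
Fixed-point arithmetic
Fixed-point operation needs no floating-point unit, which lowers cost and power consumption.
Expandability
Expandable external memory and I/O interfaces ease system integration and upgrades.
Memory structure
1. The TMS320C54x DSP has two kinds of memory: internal and external.
2. Internal memory comprises program memory and data memory, used for program code and temporary data.
3. External memory connects to the DSP through the external bus interface and provides more storage space.
Input/output (I/O) interfaces
01 The I/O interfaces are the bridge between the TMS320C54x DSP and external devices.
02 Structure of the TMS320C54x DSP
Central processing unit (CPU)
01 The CPU is the core of the TMS320C54x DSP; it executes instructions and controls the flow of data.
02 It comprises the arithmetic logic unit (ALU), accumulators, program counter, instruction register, and other components.
03 The CPU communicates with external components through the instruction set architecture (ISA) to execute various...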

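Chapter 2. DSP Technology and Applications: Structure of the TMS320C5XXX Series DSPs

2. Control pins
These pins carry control signals; some are multiplexed with other functions.
RS: reset pin
MSTRB: external data memory strobe pin
MP/MC: DSP operating mode select signal
READY: data ready pin
HOLD: request for control of the memory interface
• Address and data pins
20 address pins (A0~A19) address 1M of external program space, 64K of external data space, and 64K of I/O space.
16 data pins (D0~D15) transfer 16-bit data in parallel.
DSP hardware block diagram
Main features
1. CPU (central processing unit)
• Buses (1 program bus, 3 data buses, and 4 address buses)
• 40-bit arithmetic logic unit
• Parallel multiplier and dedicated adder
• Compare, select, and store unit
• Two address generators and 8 auxiliary registers
• Exponent encoder
2. Memory with 192K of addressable space
3. Highly specialized instruction set
4. Rich on-chip peripherals
5. Fast machine cycle
6. Several power-saving modes
7. High-speed emulation interface
• General-purpose I/O pins
• Software-programmable wait-state generator
• Programmable bank-switching logic
• Host port interface
• Hardware timer
• Clock generator
• Serial ports
• External bus interface
C5402 package diagram
Main pin functions
1. Power and clock pins
CVDD: +1.8 V
DVDD: +3.3 V
VSS: ground
The clock pins and the external PLL pins form the clock generation circuit.
X1: one terminal for the external crystal oscillator
X2/CLKIN: the other terminal for the external crystal oscillator, or the external clock input
PLL: three clock frequency configuration pins, CLKMD1~3
• Both the on-chip ROM and the DARAM can be mapped into on-chip program space. When the program address generator produces an address that lies off-chip, external memory is addressed automatically.
• The 4K of on-chip ROM is mapped to F000H-FFFFH.
• The contents of the ROM are defined by TI.
Data memory
• 64K of addressable data space
• 16K of on-chip RAM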

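34267 "Principles, Structure, and Applications of TMS320 Series DSPs", Dang Ruirong

Chapter 1 Introduction
1.2 Overview of DSP chips
2. Characteristics of DSP chips
(5) Special DSP instructions. Another characteristic of DSP chips is their special instructions, which further raise processing capability. Examples on the TMS320C3x are the bit-reversal and circular-addressing instructions used for convolution and Fourier transforms, and the interlocked instructions for communication between multiple DSPs.
(6) Powerful hardware. The new generation of DSP chips has strong interfacing capabilities. Besides on-chip peripherals such as serial ports, a DMA controller, and a software-programmable wait-state generator, they include an interrupt processor, a PLL (phase-locked loop), on-chip memory, a test interface, and other circuits, which makes it easy to build a self-contained embedded control and processing system.
Chapter 1 Introduction
1.1 Fundamentals of signal processing: digital signal processing
Digital signal processing is generally implemented in one of the following ways:
(1) In software (for example Fortran or C) on a general-purpose computer. This is slow and is used mainly for algorithm simulation.
(2) With a dedicated accelerator added to a general-purpose computer system to increase computing power and speed. This is unsuitable for embedded use and highly specialized, so its applicability is limited.
(3) On a microcontroller, for signal processing that is not too complex. This is unsuitable for intensive computation dominated by multiply-accumulate operations.
(4) On a general-purpose programmable DSP chip, which combines programmability with strong processing power, can run complex signal processing algorithms, and dominates real-time DSP.
(5) On a dedicated DSP chip, for special cases that need extremely fast processing, such as chips dedicated to FFT, digital filtering, convolution, or correlation. The algorithm is implemented by internal hardware circuits and the user does no programming, but the chip is highly specialized and its applicability is limited.
(6) On an ASIC built around a general-purpose DSP core.
Figure 1.2-3 Modified Harvard architecture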

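Chapter ...: Overview of the TMS...LF...X Series DSPs (slides, .ppt)

• Main uses
(1) Robust controllers for motors (AC servo, DC permanent-magnet, switched reluctance).
(2) Full variable-speed control of brushless motors.
(3) Advanced algorithms that reduce the number of sensors required.
(4) Automotive electronic braking systems.
(5) Single-processor control of multi-motor systems.
(6) Control of power switching conversion alongside control algorithm processing (switching power supplies).
• Processing speed: 30 MIPS for the LF240x, 40 MIPS for the LF240xA.
Uses:
• (1) Conditional branching with the auxiliary registers;
• (2) Using the auxiliary registers as temporary storage;
• (3) Software counting with the auxiliary registers.
2.2 Memory and I/O space
The LF/LC240x chips have 16 address lines and can access three independent address spaces, 192K words in total.
• Program memory: 64K words
External, interrupt, and clock pins
• RS*: reset pin. When RS* goes high, program execution starts from address 0 of program memory. When the WD timer overflows, a system reset pulse is generated on the RS* pin.
• PDPINTA*: power drive protection interrupt input. When the motor drive misbehaves, for example on overvoltage or overcurrent, this interrupt becomes active and puts the PWM pins (EVA) into the high-impedance state.
• XINT1/IOPA2: external interrupt 1 or general-purpose I/O pin; polarity is programmable.
• XINT2/ADCSOC/IOPD0: external interrupt 2, usable as the A/D conversion start input, or general-purpose I/O pin; polarity is programmable.
• CLKOUT/IOPE0: clock output or general-purpose I/O pin.
• PDPINTB*: power drive protection interrupt input. When the motor drive misbehaves, for example on overvoltage or overcurrent, this interrupt becomes active and puts the PWM pins (EVB) into the high-impedance state.
2.1 CPU functional modules of the TMS320LF240x series (Chapter 2)
• Instruction set: signal processing instructions and general-purpose control instructions. Source and object code are compatible with other 24x-generation devices; source code is compatible with the C2x.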

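Notes on the internal structure and functions of the TMS320C54x series DSPs

These notes read more comfortably with Word switched to its collapsed (outline) view.

The Chinese reference material used in the document can be downloaded from: /icview-161578-1-1.html

Preface

At heart a DSP is still a microcontroller, a microcomputer. In my view, for any microcomputer-class chip, whether an MCU, an ARM, a DSP, or even a PC processor, the first things to learn are the CPU core, the memory organization, and the interrupt system.

Only once you have mastered these can you use the chip according to its characteristics, and only then will you know where you stand when debugging.

On-chip peripherals have little to do with the chip itself. You can learn them when you need them, and peripherals on different chips work in much the same way.

For this reason I have excerpted my notes on these topics for reference (this document contains my notes on TI's TMS320C54x series).

My notes may not be as well ordered or typeset as published books, but they were written for practical use, not for publication.

Most technical books on the market simply translate the official documentation. They omit many details along the way, details that matter for understanding and for engineering work, and many of the translations are inaccurate.

In my notes the Chinese material serves only as a reference or a lead; the valuable information comes from my own study of the official documentation.

Every part is written from a beginner's point of view, and the questions covered are ones a beginner is likely to meet or think of.

The layout is rough, but when you run into a related problem or want to understand a topic, download this Word document and search it for what interests you. I believe you will find a reasonably satisfying answer.

Beyond DSP chips, I have kept notes on every day of study on my way to becoming a signal processing engineer. These notes are aimed at beginners, at details, and at engineering practice, and they record what I worked out for myself.

As notes mature or form a self-contained topic, I will share them one by one for reference during study or development, and I hope they save you some time.

Attached is a picture of my current notes folder.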


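TMS320 DSP Algorithm Standard (XDAIS) and Reference Framework RF5: An Overview

Si Qun 1), Zang Yingxin 1), Tao Youchuan 1), Luo Dan 2)
(Wuhan Digital Engineering Institute 1), Wuhan 430074) (Wuhan Haoyu Micro Electronic Co. Ltd 2), Wuhan 430074)

Ship Electronic Engineering (舰船电子工程), Vol. 26 No. 2, 2006 (No. 152 overall), p. 27. Received September 6, 2005; revised September 26, 2005.

Abstract: This paper introduces the TMS320 DSP Algorithm Standard and the TMS320 DSP algorithm Reference Frameworks (RF).

As key components of the eXpressDSP software, the Reference Frameworks and the TMS320 DSP Algorithm Standard (XDAIS) define a set of programming rules and guidelines and standardize the interface between algorithms and system software. This greatly eases system integration for algorithm users and shortens product development cycles accordingly.

Key words: RF5; XDAIS; eXpressDSP
CLC number: TP31

TMS320 DSP Algorithm Standard and the Reference Frameworks Level 5 for TMS320 DSP
Si Qun 1), Zang Yingxin 1), Tao Youchuan 1), Luo Dan 2)
(Wuhan Digital Engineering Institute 1), Wuhan 430074) (Wuhan Haoyu Micro Electronic Co. Ltd 2), Wuhan 430074)
Abstract: The paper mainly introduces two key components of the eXpressDSP: the Reference Frameworks Level 5 for TMS320 DSP and the TMS320 DSP Algorithm Standard. With a family of rules and guidelines, they dramatically reduce system integration pressure and shorten the system R&D period.
Key words: RF5, XDAIS, eXpressDSP
Class number: TP31

1 Introduction

DSP application development has changed a great deal.

Figure 1 eXpressDSP software and development tools

Advances in hardware technology have let DSPs stay compatible while performance keeps rising, power consumption keeps falling, and on-chip integration grows sharply. DSP applications have also become far more complex: assembly programs of a few hundred lines have been replaced by C programs of hundreds of thousands or even millions of lines. At the same time, market pressure demands ever shorter development cycles for new products.

As a result, software has become the most important part of a DSP solution, and whether a project succeeds often depends on how well its software is implemented.

As an industry-leading supplier of digital signal processors, TI put forward the eXpressDSP concept to meet the DSP software challenge.

The eXpressDSP software and development tools are shown in Figure 1.

TI has also released a series of DSP software Reference Frameworks (RF), with supporting material, to help DSP application designers speed up software development.

This paper introduces the TMS320 DSP Algorithm Standard and the DSP software reference framework RF5.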

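2 TMS320 DSP Algorithm Standard (XDAIS)

As digital signal processors (DSPs) are used in more and more applications, demand for component-oriented software modules keeps growing, and off-the-shelf algorithms from third parties can meet that demand at a basic level.

Third-party algorithms play a very important role in DSP system development.

DSP vendors have therefore established algorithm standards that govern the interface between different algorithms and the application.

For the TMS320 family of DSPs, this is the TMS320 DSP Algorithm Standard.

2.1 Origin of the algorithm standard

The need for an algorithm standard emerged in the mid-1990s, when more powerful DSPs appeared that could run multiple channels of one algorithm, or several algorithms, on the same DSP.

DSPs based on the TMS320C6000 platform, for example, made it possible to build DSL line cards, video servers, and other systems that demand very high multichannel performance on a single device.

With this higher level of performance available, many new signal processing standards emerged, including JPEG, MPEG, video conferencing, wireless telephony, and improved modems and fax.

Developers began to build dynamic systems that switch tasks interactively, rather than the static, fixed-function systems typical of earlier DSP designs.

System code size also grew sharply to accommodate the complexity of the new multifunction systems.

Some DSP developers began to sell their intellectual property, including algorithms, as third parties.

System integrators would buy "black box" object code from a third party and load it into the system to save valuable development time.

But third-party developers often made assumptions about how the DSP would be used, to keep their algorithms as lean as possible and get the best performance.

The system integrator might have no way of knowing what those assumptions were.

With such assumptions in place, two or more algorithms could not coexist peacefully in a multifunction system.

Such problems can be hard to fix even by redesigning with the source code in hand. If the algorithms come from different third parties (as is often the case), the integrator faces incompatibility problems and the inevitable finger-pointing.

By the late 1990s, DSP vendors began to publish rules of behavior, written down as standards that third-party software developers must follow in their code, to guarantee that algorithms are compatible.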

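2.2 XDAIS algorithm rules

The XDAIS rules fall into four groups, backed by a basic verification mechanism to ensure compliance with the standard.

Common-sense programming rules. This group of rules improves the portability, predictability, and ease of use of algorithms.

Eliminating arbitrary choices. The standard specifies which of several possible approaches must be used (just as traffic law specifies whether to drive on the left or on the right). On the C6000(TM) platform, for example, an algorithm must support at least little-endian byte order, and preferably both byte orders, so that the system developer has a choice.

Resource management. This group is the core of the standard. Its rules apply to external and internal memory and to peripherals such as DMA channels.

Uniform specification. This group of rules helps the system integrator measure an algorithm and assess its compatibility with the system. Every compliant algorithm must characterize its worst-case interrupt latency, its typical and worst-case execution time, and its program, heap, static, and stack memory requirements.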
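To make the byte-order rule concrete, here is a minimal portable C sketch of what "little-endian" means and of the byte swap an algorithm needs if it is to handle data in the other order. Both function names are hypothetical and are not part of XDAIS.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte at the lowest
 * address (little-endian), 0 otherwise. Hypothetical helper. */
int is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;

    memcpy(&first, &probe, 1);      /* read the lowest-addressed byte */
    return first == 1;
}

/* Reverses the byte order of a 32-bit word, converting a value between
 * little-endian and big-endian representations. Hypothetical helper. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

On a real target the byte order is normally fixed at build time, so an algorithm vendor would more likely ship separate little-endian and big-endian libraries than test at run time.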

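2.3 Evolution of XDAIS

When XDAIS was introduced five years ago, it had fewer than 30 rules.

It now has 46, which reflects the growing demands placed on the standard.

New rules were added (and some existing ones changed) for the following reasons:

New hardware features. Some rules were added to cover developments in silicon technology. In the future, XDAIS may also include rules for using hardware accelerators as shared resources.

Performance optimization. The DMA rules have been revised to improve performance, which illustrates another area of change in the XDAIS standard. Because the early rules resolved the major conflicts, some of the newer guidelines aim at helping developers get more out of the system.

New application areas. The original XDAIS guidelines mainly addressed single-function DSPs running streaming applications such as voice, audio, and video. Today's multifunction systems must often handle bursty data as well. The core and system requirements of these applications sometimes differ from those of streaming applications, and the XDAIS rules must cover both kinds of data throughput.

One characteristic has not changed: the need to keep overhead low.

Experience shows that DSP customers and third parties will accept a performance and memory overhead of no more than one to two percent.

For a general-purpose microprocessor, that would be a small overhead.

On a DSP, however, every MIPS usually matters, so TI has worked to keep the XDAIS overhead within that limit.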

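2.4 The XDAIS algorithm interface

Every XDAIS-compliant algorithm must implement one standard interface, the IALG interface.

The IALG interface covers management of system memory resources and the creation, initialization, and termination of algorithm instance objects.

IALG provides a structure, IALG_Fxns, also called the v-table. Within this structure only algAlloc(), algInit(), and algFree() are required; all other functions are optional.

algAlloc() handles memory management by reporting the algorithm's memory requirements to the framework; algInit() initializes the algorithm instance object; algFree() is called when the instance object is destroyed so that its memory can be released.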
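The division of labor among the three required functions can be sketched with a self-contained mock. Everything prefixed `Demo_` or `demo_` is hypothetical, and the signatures are deliberately simplified compared with the real IALG declarations in TI's headers; the point is only that the algorithm describes its memory while the framework allocates and frees it.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { size_t size; void *base; } Demo_MemRec;   /* one buffer */

typedef struct {                                 /* simplified v-table */
    int (*algAlloc)(Demo_MemRec memTab[]);       /* report memory needs */
    int (*algInit)(void *handle, const Demo_MemRec memTab[]);
    int (*algFree)(void *handle, Demo_MemRec memTab[]);   /* report buffers */
} Demo_Fxns;

typedef struct { int gain; } Demo_Obj;           /* instance state */

static int demo_alloc(Demo_MemRec memTab[])
{
    memTab[0].size = sizeof(Demo_Obj);           /* request, do not allocate */
    memTab[0].base = NULL;
    return 1;                                    /* number of records */
}

static int demo_init(void *handle, const Demo_MemRec memTab[])
{
    (void)memTab;
    ((Demo_Obj *)handle)->gain = 2;              /* set up instance state */
    return 0;                                    /* 0 means success here */
}

static int demo_free(void *handle, Demo_MemRec memTab[])
{
    memTab[0].size = sizeof(Demo_Obj);
    memTab[0].base = handle;                     /* tell the framework what to free */
    return 1;
}

const Demo_Fxns DEMO_FXNS = { demo_alloc, demo_init, demo_free };

/* Framework side: query, allocate, initialize. */
void *demo_create(const Demo_Fxns *fxns)
{
    Demo_MemRec memTab[1];

    if (fxns->algAlloc(memTab) != 1) return NULL;
    memTab[0].base = malloc(memTab[0].size);
    if (memTab[0].base == NULL) return NULL;
    if (fxns->algInit(memTab[0].base, memTab) != 0) {
        free(memTab[0].base);
        return NULL;
    }
    return memTab[0].base;                       /* first buffer is the handle */
}

/* Framework side: ask which buffers to release, then release them. */
void demo_delete(const Demo_Fxns *fxns, void *handle)
{
    Demo_MemRec memTab[1];
    int n = fxns->algFree(handle, memTab);

    for (int i = 0; i < n; i++) free(memTab[i].base);
}
```

Because the algorithm never calls an allocator itself, the system integrator keeps full control over where each buffer lives, which is the purpose of the resource management rules.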

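Besides IALG, XDAIS also requires an algorithm instance interface to be defined, which contains the implementation of the algorithm.

This interface is an instance of the algorithm interface.

For example, the IG723ENC interface is TI's interface to the ITU G.723.1 encoder, i.e. the algorithm interface.

IG723ENC_Fxns extends IALG_Fxns as follows:

    typedef struct IG723ENC_Fxns {
        IALG_Fxns ialg;    /* IG723ENC extends IALG */
        XDAS_Bool (*control)(IG723ENC_Handle handle,
                             IG723Cmd cmd, IG723ENC_Status *status);
        XDAS_Bool (*encode)(IG723ENC_Handle handle,
                            XDAS_UInt16 *in, XDAS_UInt16 *out);
    } IG723ENC_Fxns;

Besides IALG_Fxns, this structure defines two functions, control() and encode().

These two are the functions specific to the algorithm.

In most cases, this interface wraps the functions that implement the algorithm.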

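The same algorithm can have different implementations, each with its own algorithm instance object. To tell them apart and avoid name clashes, an instance of the algorithm interface is usually named with a "<module>_<vendor>" prefix.

<module> is the algorithm name and <vendor> is an identifier unique to the implementing company or individual. For example, G723ENC_TI_IG723ENC is TI's implementation of the G723ENC algorithm.

G723ENC_TI_IG723ENC can be regarded as an instance of IG723ENC, as follows:

    #define IALGFXNS \
        &G723ENC_TI_IALG,       /* module ID */  \
        NULL,                   /* activate */   \
        G723ENC_TI_algAlloc,    /* alloc */      \
        NULL,                   /* control */    \
        NULL,                   /* deactivate */ \
        G723ENC_TI_algFree,     /* free */       \
        G723ENC_TI_algInit,     /* init */       \
        NULL,                   /* moved */      \
        NULL                    /* numAlloc */

    IG723ENC_Fxns G723ENC_TI_IG723ENC = {
        IALGFXNS,               /* IALG functions */
        G723ENC_TI_control,
        G723ENC_TI_encode
    };

    /* Make G723ENC_TI_IALG an alias for the start of the table above.
       With the COFF ABI, C symbols carry a leading underscore in assembly. */
    asm("_G723ENC_TI_IALG .set _G723ENC_TI_IG723ENC");

The first part initializes the IALG functions and places them in G723ENC_TI_IG723ENC; the two functions G723ENC_TI_control and G723ENC_TI_encode are then added to complete the G723ENC_TI_IG723ENC structure.

G723ENC_TI_control and G723ENC_TI_encode are the functions that implement the algorithm.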
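The reason the IALG functions must come first in the extended structure can be shown with a small stand-alone mock. All names here (`Base_Fxns`, `Enc_Fxns`, `VENDOR_ENC`, and so on) are hypothetical stand-ins rather than XDAIS declarations, and the "encoder" is a trivial XOR used only to give the call something to do.

```c
#include <stdint.h>

typedef struct { const void *implementationId; } Base_Fxns;  /* stand-in for IALG_Fxns */

typedef struct Enc_Obj Enc_Obj;

typedef struct {
    Base_Fxns base;    /* must be first: the extended table begins with the base table */
    int (*encode)(Enc_Obj *handle, const uint16_t *in, uint16_t *out);
} Enc_Fxns;

struct Enc_Obj {
    const Enc_Fxns *fxns;   /* every instance points at its v-table */
    uint16_t mask;          /* hypothetical per-instance state */
};

/* One vendor's implementation of the algorithm-specific function. */
static int vendor_encode(Enc_Obj *handle, const uint16_t *in, uint16_t *out)
{
    *out = (uint16_t)(*in ^ handle->mask);
    return 1;
}

static const char VENDOR_ID = 0;    /* its address serves as a unique module ID */

const Enc_Fxns VENDOR_ENC = { { &VENDOR_ID }, vendor_encode };

/* Framework-side wrapper: dispatch through the v-table without knowing
 * which vendor supplied the implementation. */
int enc_encode(Enc_Obj *handle, const uint16_t *in, uint16_t *out)
{
    return handle->fxns->encode(handle, in, out);
}

/* Generic code can view any extended table as a base table, because the
 * base member sits at offset zero. */
const Base_Fxns *enc_as_base(const Enc_Fxns *fxns)
{
    return &fxns->base;
}
```

This is the same layout trick the `asm` alias in the listing relies on: code that only understands the base table and code that understands the extended table can share one object.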