Source of original text: Madden S, Franklin M, Hellerstein J, et al. TinyDB: an acquisitional query processing system for sensor networks.




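In the following pages, we consider two computer CPUs, one for a complex instruction set computer (CISC) and the other for a reduced instruction set computer (RISC). After a detailed examination of the designs, we compare the performance of the two CPUs and give a brief overview of some methods used to enhance that performance. Finally, we relate the design ideas discussed to general digital system design.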


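As mentioned in the previous chapter, a typical CPU is usually divided into two parts: the datapath and the control unit. The datapath consists of a function unit, registers, and internal buses that provide pathways for transferring information between the registers, the function unit, memory, and other computer components. The datapath may or may not be pipelined. The control unit consists of a program counter, an instruction register, and control logic, and may be either hardwired or microprogrammed. If the datapath is pipelined, the control unit may also be pipelined. The computer of which the CPU is a part is either a CISC or a RISC, with its own instruction set architecture.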




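The two CPUs presented are for a CISC using a non-pipelined datapath with a microprogrammed control unit and a RISC using a pipelined datapath with a hardwired control unit. These represent two quite distinct combinations of instruction set architecture, datapath, and control unit.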


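The first design we present is for a complex instruction set computer with a non-pipelined datapath and a microprogrammed control unit. We begin by describing the instruction set architecture, including the CPU register set, instruction formats, and addressing modes. The CISC nature of the instruction set architecture is demonstrated by its memory-to-memory access for data manipulation instructions, eight addressing modes, two instruction format lengths, and instructions that require significant sequences of operations for their execution.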

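We design a datapath for implementing the CISC architecture. The datapath is based on the one initially described in Section 7-9 and incorporated into a CPU in Section 8-10. Modifications are made to the register file, the function unit, and the buses to support the present instruction set architecture.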



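Figure 1 shows the CISC register set accessible to the programmer. All registers have 16 bits. The register file has eight registers, R0 through R7. R0 is a special register that always supplies the value zero when it is used as a source and discards the result when it is used as a destination.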




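Figure 2 gives the instruction formats for the CPU. The generic instruction format has five fields. The first, OPCODE, specifies the operation. The next two, MODE and S, are used to determine the addresses of the operands. The last two fields, SRC and DST, are the 3-bit source register and destination register address fields, respectively. In addition, there is an optional second word W that appears with some instructions as an operand or an address, but not with others.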





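Table 2 gives the addressing modes specified by the MODE field. The first two bits of MODE specify four different types of addressing: register, immediate, indexed, and relative to the program counter PC. The third bit of MODE specifies whether the address generated by these modes is used as an indirect address. The one exception is direct addressing, which is obtained by applying indirection to the immediate type. Otherwise, if the third bit equals 0, indirect addressing does not apply, whereas if it equals 1, indirect addressing does apply. For the register type of instruction, MODE(2:1)=00 and the W word is not needed, since the operand or address comes from a register. The third column of the table provides register transfer statements for each of the addressing modes for the one-operand instructions.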

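If IR(15:14) is equal to 10, the instruction has two addresses used for true operands. All fields of the generic instruction, including S and SRC, are used for all instructions of this type. One of the addresses, either the source or the destination, uses the addressing modes. If S equals 0, the source uses the addressing mode specified by MODE, and the destination is a register. If S equals 1, the destination uses the addressing mode, and the source is a register. Register transfer descriptions of the resulting addresses are given in the fourth and fifth columns of Table 2. Again, depending on the contents of the MODE field, the second instruction word W, which is an address or an immediate operand, may or may not be present.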


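Before proceeding to the next step, which defines the datapath to support the instruction set architecture, we briefly note the characteristics of the architecture that identify it as CISC or RISC. Most of the operations given in Chapter 9 are included in the instruction set. A number of operations that do not appear are redundant: the same actions can be achieved by using proper addressing modes with instructions that do appear. For example, LD, ST, IN, and OUT can all be achieved by using MOVE instructions in a memory-mapped structure. By looking at the instruction formats, we find that most of the instructions can operate directly on operands from memory. There are eight addressing modes and two different lengths of instruction formats. In addition, some of the instructions perform complex operations that are likely to take more than one clock cycle for the execution step. These characteristics clearly identify this as a CISC architecture.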





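We cannot access the eight temporary registers with the 3-bit register addresses available in the instruction. To deal with this problem, we provide, first, a 4-bit register address from the microinstruction and, second, a microinstruction bit to choose between these addresses and those from the instruction. In addition, the flexibility to allow the register addressed by DST to be a source and the register addressed by SRC to be a destination is needed to permit results of operations to be placed directly in memory. To accomplish these goals, we modify the register file by adding the logic shown in Figure 4. The instruction set architecture uses two addresses, one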




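selects between the register file address in the microinstruction (0) and the register file address in the instruction (1). If an instruction address is selected, whether it is DST or SRC is determined by an additional quad 2-to-1 multiplexer. This multiplexer is controlled by the second bit of the DSA or SB field, depending on which of them has a 1 in the first bit in any microinstruction, thereby ensuring that the proper second bit is used to determine the register address. A 0 is appended to the left of the 3-bit fields DST and SRC so that they address R0 through R7. In addition to the first bit, which selects the address source, the addresses from the microinstruction contain four bits so that all 16 registers can be reached. The final change to the register file is to replace the storage elements for R0 with open circuits on the lines that were their inputs and with constant zero values on the lines that were their outputs. A symbol for the resulting register file is shown in Figure 10-4(b).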


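These inputs are furnished by two 4-to-1 multiplexers, MUX R and MUX L, added to a basic 16-bit shifter, all shown in Figure 5(a). Also, the appropriate end bits from the input operand must be sent to the carry flip-flop. A 2-to-1 multiplexer, MUX SO, selects the end bit to pass to the carry flip-flop. The symbol for the new shifter, which replaces the basic shifter from Section 8-10, appears in Figure 5(b).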









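The execution of each instruction begins with the Instruction Fetch microroutine. The PC provides the address and is updated to the next address. The instruction fetched is placed in the IR. The instruction-decoding process then begins, using MUX M and the mapping ROM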


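In four of the five decisions, an operand fetch routine is performed. Depending on the first three bits of OPCODE, either a single operand, two operands (or one operand plus a parameter), or a branch address is fetched. The operands, addresses, and parameter values are placed in the locations reserved for them in registers R12 through R15 (SA, SD, DA, and DD). The four execution routines find the operands in these standard register locations and, in most cases, use them to produce a result that is left in the standard location DD. The Write-Back routine also uses the standard register locations to find the result and its address.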



In this paper, we examined two CPU designs: one for a CISC and one for a RISC.

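The CISC control unit includes a stack pointer in addition to the program counter. Control microprograms reside in ROM, and a combination of a multiplexer and a ROM provides fast instruction decoding. The control unit also has extensive jump and conditional branching capabilities, including one level of microsubroutines. The microprogram for the control is modularized so that many microsubroutines can be shared in implementing the microprogram for the instructions.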


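After discussing CISC and RISC performance, we touched on some advanced concepts, including parallel execution units, a combination of microprogrammed control with a pipeline, superpipelined CPUs, superscalar CPUs, and predictive and speculative techniques for high performance. Finally, we related the design techniques in this paper to more general digital system design.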



The CPU is the key component of a digital computer. Its purpose is to decode instructions received from memory and perform transfers, arithmetic, logic, and control operations with data stored in internal registers, memory, or I/O interface units. Externally, the CPU provides one or more buses for transferring instructions, data, and control information to and from components connected to it. In the generic computer at the beginning of Chapter 1, the CPU is a part of the processor and is heavily shaded. CPUs, however, may also appear elsewhere in computers. Small, relatively simple computers called microcontrollers are used in computers and in other digital systems to perform limited or specialized tasks. For example, a microcontroller is present in the keyboard and in the monitor in the generic computer; thus, these components are also shaded. In such microcontrollers, the CPU may be quite different from those discussed in this chapter. The word lengths may be short (say, four or eight bits), the number of registers small, and the instruction sets limited. Performance, relatively speaking, is poor, but adequate for the task. Most important, the cost of these microcontrollers is very low, making their use cost effective.

In the following pages, we consider two computer CPUs, one for a complex instruction set computer (CISC) and the other for a reduced instruction set computer (RISC). After a detailed examination of the designs, we compare the performance of the two CPUs and present a brief overview of some methods used to enhance that performance. Finally, we relate the design ideas discussed to general digital system design.

1. Two CPU designs

As mentioned in previous chapters, a typical CPU is usually divided into two parts: the datapath and the control unit. The datapath consists of a function unit, registers, and internal buses that provide pathways for the transfer of information between the registers, the function unit, and other computer components. The datapath may or may not be pipelined. The control unit consists of a program counter, an instruction register, and control logic, and may be either hardwired or microprogrammed. If the datapath is pipelined, the control unit may also be pipelined. The computer of which the CPU is a part is either a CISC or a RISC, with its own instruction set architecture.

The purpose of this chapter is to present two CPU designs that illustrate combinations of architectural characteristics of the instruction set, the datapath, and the control unit. The designs will be top down, but with the reuse of prior component designs, illustrating the influence of the instruction set architecture on the datapath and control units, and the influence of the datapath on the control unit. The material makes extensive use of tables and diagrams. Although we reuse and modify component designs from earlier chapters, background information from these chapters is not repeated here. References, however, are given to earlier sections of the book, where detailed information can be found.

The two CPUs presented are for a CISC using a non-pipelined datapath with a microprogrammed control unit and a RISC using a pipelined datapath with a hardwired pipelined control unit. These represent two quite distinct combinations of instruction set architecture, datapath, and control unit.

2. The complex instruction set computer

The first design we present is for a complex instruction set computer with a non-pipelined datapath and microprogrammed control unit. We begin by describing the instruction set architecture, including the CPU register set, instruction formats, and addressing modes. The CISC nature of the instruction set architecture is demonstrated by its memory-to-memory access for data manipulation instructions, eight addressing modes, two instruction format lengths, and instructions that require significant sequences of operations for their execution.

We design a datapath for implementing the CISC architecture. The datapath is based on the one initially described in Section 7-9 and incorporated into a CPU in Section 8-10. Modifications are made to the register file, the function unit, and the buses to support the present instruction set architecture.

Once the datapath has been specified, a control unit is designed to complete the implementation of the instruction set architecture. The design of the control unit must involve a coordinated definition of both the hardware organization and the microprogram organization. In particular, dividing the microprogram into microroutines, while at the same time designing the sequencer with which they interact, is a key part of the design. Even the instruction fields and opcodes are tied to this coordinated effort. Following the definition of the hardware and microcode organizations, we detail essential parts of the microcode and the microroutines for representative operations.

Instruction set architecture

Figure 10-1 shows the CISC register set accessible to the programmer. All registers have 16 bits. The register file has eight registers, R0 through R7. R0 is a special register that always supplies the value zero when it is used as a source and discards the result when it is used as a destination.

In addition to the register file, there is a program counter PC and a stack pointer SP. The presence of a stack pointer indicates that a memory stack is a part of the architecture. The final register is the processor status register PSR, which contains information only in its rightmost five bits; the remainder of the register is assumed to contain zero. The PSR contains the four stored status bit values Z, N, C, and V in positions 3 through 0, respectively. In addition, a stored interrupt enable bit EI appears in position 4.
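The PSR layout described above can be sketched in a few lines of Python. This is only an illustration: the bit positions come from the text, while the helper names `unpack_psr` and `pack_psr` are hypothetical and not part of the original design.

```python
# Bit positions from the text: Z, N, C, V occupy bits 3..0 and EI is bit 4.
# The dictionary and function names are illustrative assumptions.
PSR_BITS = {"EI": 4, "Z": 3, "N": 2, "C": 1, "V": 0}


def unpack_psr(psr: int) -> dict:
    """Return the five meaningful PSR bits of a 16-bit register value."""
    return {name: (psr >> pos) & 1 for name, pos in PSR_BITS.items()}


def pack_psr(**flags: int) -> int:
    """Build a PSR value; bits 15..5 stay zero, as the text assumes."""
    value = 0
    for name, bit in flags.items():
        value |= (bit & 1) << PSR_BITS[name]
    return value
```

For instance, `pack_psr(Z=1, C=1)` sets only bits 3 and 1, and `unpack_psr` reverses the packing.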

Table 10-1 contains the 42 operations performed by the instructions. Each operation has a mnemonic and a carefully selected opcode. The operations are divided into four groups based on the number of explicit operands and whether the operation is a branch. In addition, the status bits affected by the operation are listed.

Figure 10-2 gives the instruction formats for the CPU. The generic instruction format has five fields. The first, OPCODE, specifies the operation. The next two, MODE and S, are used to determine the addresses of the operands. The last two fields, SRC and DST, are the 3-bit source register and destination register address fields, respectively. In addition, there is an optional second word W that appears with some instructions as an operand or an address, but not with others.

The first two bits of OPCODE, IR(15:14), determine the number of explicit operands and how the fields of the format are used. When these bits are 00, either no operand is required or the location of the operand is implied by OPCODE. Only the OPCODE field is needed, as shown in Figure 10-2(b). The four rightmost OPCODE bits can specify up to 16 operations without operands or with implied operand addresses.

If IR(15:14) is 01, the instruction has one operand and is a data transfer or data manipulation instruction. Since there is an operand, the MODE field specifies the addressing mode for obtaining it. The single address may involve the DST register address in its formation, so the DST field is also present. The S field and SRC field relate to the presence of two operands and so are not used for the typical single-operand instructions. The shift instructions, however, require a shift amount to indicate how many bits to shift. For maximum flexibility, this shift amount is treated just like a source operand. As a consequence, shift instructions use the SHA and S fields; the shift amount is a full 16-bit operand, but only values 0 through 15 are meaningful. There are sufficient OPCODE bits for 16 instructions with a single operand.

Table 10-2 gives the addressing modes specified by the MODE field. The first two bits of MODE specify four different types of addressing: register, immediate, indexed, and relative to the PC. The third bit of MODE specifies whether the address generated by these modes is used as an indirect address. The one exception to this is direct addressing, which is obtained by applying indirection to the immediate type. Otherwise, if the third bit equals 0, indirect addressing does not apply, whereas if it equals 1, indirect addressing does apply. For the register type of instruction, MODE(2:1)=00 and the W word is not needed, since the operand or address comes from a register. The third column of the table provides register transfer statements for each of the addressing modes for the one-operand instructions.
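As a rough sketch of how the MODE field might be interpreted, the following Python splits a 3-bit MODE value into an addressing type and an indirection flag. Only the register encoding MODE(2:1)=00 is stated in the text; the codes chosen here for immediate, indexed, and PC-relative addressing are assumptions, as are the function names and the reading that every non-register type carries a second word W.

```python
# MODE(2:1)=00 for the register type is from the text.
# The other three encodings are ASSUMED for illustration only.
ADDRESSING_TYPES = {
    0b00: "register",
    0b01: "immediate",    # assumed code
    0b10: "indexed",      # assumed code
    0b11: "pc_relative",  # assumed code
}


def decode_mode(mode: int) -> tuple:
    """Split a 3-bit MODE value into (addressing type, indirect flag)."""
    kind = ADDRESSING_TYPES[(mode >> 1) & 0b11]  # MODE(2:1)
    indirect = bool(mode & 1)                    # third bit of MODE
    if kind == "immediate" and indirect:
        # The one exception named in the text: indirection applied to the
        # immediate type yields direct addressing.
        return ("direct", False)
    return (kind, indirect)


def needs_second_word(mode: int) -> bool:
    """Assumed reading: only the register type does without the word W."""
    return (mode >> 1) & 0b11 != 0b00
```

With these assumed encodings, three bits cover the eight addressing modes mentioned in the text: four types, each with and without indirection, where immediate plus indirection is read as direct addressing.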

If IR(15:14) is equal to 10, then the instruction has two addresses used for true operands. All fields of the generic instruction, including S and SRC, are used for all instructions of this type. One of the addresses, either the source or the destination, uses the addressing modes. If S=0, then the source uses the addressing mode specified by MODE, and the destination is a register. If S=1, then the destination uses the addressing mode, and the source is a register. Register transfer descriptions of the resulting addresses are given in the fourth and fifth columns of Table 10-2. Again, depending on the contents of the MODE field, the second instruction word W, which is an address or an immediate operand, may or may not be present.

Instructions with IR(15:14)=11 are branches. Aside from the S field and the SHA field for shifts, the format is the same as for IR(15:14)=01. For all instructions of this type, the destination address (not the operand) becomes the new address placed in the program counter PC. As a consequence, the register mode is invalid for branch instructions.
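The four instruction classes selected by IR(15:14) can be illustrated with a small field decoder. The exact bit positions below are an assumption: the text gives only the field order (OPCODE, MODE, S, SRC, DST), the 3-bit width of SRC and DST, and the fact that IR(15:14) are the first two OPCODE bits, which is consistent with a 6-3-1-3-3 split of the 16-bit word. The function name `decode_instruction` is hypothetical.

```python
# ASSUMED layout: OPCODE = IR(15:10), MODE = IR(9:7), S = IR(6),
# SRC = IR(5:3), DST = IR(2:0). Only IR(15:14) is fixed by the text.
OPERAND_CLASS = {
    0b00: "zero-operand or implied operand",
    0b01: "one-operand",
    0b10: "two-operand",
    0b11: "branch",
}


def decode_instruction(ir: int) -> dict:
    """Split a 16-bit instruction word into its generic-format fields."""
    opcode = (ir >> 10) & 0x3F
    return {
        "opcode": opcode,
        "class": OPERAND_CLASS[opcode >> 4],  # first two OPCODE bits, IR(15:14)
        "mode": (ir >> 7) & 0b111,
        "s": (ir >> 6) & 1,
        "src": (ir >> 3) & 0b111,
        "dst": ir & 0b111,
    }
```

A decoder for the real machine would also have to ignore the fields that a given class does not use, for example MODE, S, SRC, and DST for the zero-operand class.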

Before proceeding to the next step, which defines the datapath to support the instruction set architecture, we will briefly note the characteristics of the architecture that define it as CISC or RISC. Most of the operations given in Chapter 9 are included in the instruction set. A number of operations that do not appear are redundant. The same actions can be achieved by using proper addressing modes with instructions that do appear. For example, LD, ST, IN, and OUT can all be achieved by using MOVE instructions in a memory-mapped structure. By looking at the formats for the instructions, we find that most of the instructions can operate directly on operands from memory. There are eight addressing modes and two different lengths of instruction formats. In addition, some of the instructions perform complex operations that are likely to take more than one clock cycle for the execution step. These characteristics clearly identify this as a CISC architecture.

Datapath organization

Rather than beginning from scratch, we will reuse the non-pipelined datapath employed with the microprogrammed control in Section 8-10, with modifications. That datapath was shown in Section 8-10, and the new, modified datapath based on it is given in Figure 10-6. We treat each modification in turn, beginning with the register file.

In Section 8-10, register R8 was used as a temporary storage location. In the new microprogrammed architecture, there are complex instructions spanning many clock cycles and performing complicated operations. Thus, more temporary storage is needed for use by the microprograms. To meet this need, we expand the register file from 9 registers to 16. The first 8 registers, R0 through R7, are visible to the computer programmer. The second 8 registers, R8 through R15, are used as temporary storage for the microprogram operands and are hidden from the programmer. Figure 10-3 provides a map of the expanded register file with the temporary registers shaded. As indicated previously, register R0 supplies the constant 0. Registers R1 through R7 are available to the programmer for use, and registers R8 through R15 provide general temporary storage for use by microprograms. The last four registers, R12 through R15, have special uses: to keep the microcode simple, standard locations are essential for storing the operands and addresses used by execution microcode for most instructions. Thus, R12 is the location for the source address (SA), R13 for the source data (SD), R14 for the destination address (DA), and R15 for the destination data (DD).
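The expanded register file can be modeled behaviorally as follows. This is a minimal sketch, not the hardware design: the class name `RegisterFile` and its methods are illustrative assumptions, while the register count, the behavior of R0, and the SA/SD/DA/DD assignments come from the text.

```python
class RegisterFile:
    """Behavioral model: 16 registers of 16 bits; R0 reads 0, ignores writes."""

    # Standard locations used by the execution microcode, per the text.
    ALIASES = {"SA": 12, "SD": 13, "DA": 14, "DD": 15}

    def __init__(self) -> None:
        self._regs = [0] * 16

    def read(self, index: int) -> int:
        # R0 always supplies the constant zero when used as a source.
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        # A write to R0 is discarded; other registers keep 16 bits.
        if index != 0:
            self._regs[index] = value & 0xFFFF

    @staticmethod
    def programmer_visible(index: int) -> bool:
        # R0..R7 are visible; R8..R15 are hidden microprogram temporaries.
        return 0 <= index <= 7
```

A microroutine would then leave its result in `ALIASES["DD"]`, where the Write-Back microroutine expects to find it.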

We cannot access the eight temporary registers based on the 3-bit register address available in the instruction. To deal with this problem, we provide, first, a 4-bit register address from the microinstruction, and second, a microinstruction bit to choose between these addresses and those from the instruction. In addition, the flexibility to allow the register addressed by DST to be a source and the register addressed by SRC to be a destination is needed to permit results of operations to be placed directly in memory. To accomplish these goals, we modify the register file by adding the logic shown in Figure 10-4(a). The instruction set architecture uses two addresses, one for a source operand and the other for the other source as well as the destination. The register file uses the B address for a source, and the A and D addresses on the file are connected together, giving the same address for the other source and the destination. Although this reduction from three to two addresses is not essential at the microinstruction level, it decreases the number of bits needed for register addresses in the microinstruction and matches the use of the register fields in the instruction formats.

A quad 2-to-1 multiplexer is attached to each of the two address inputs to the register file, to select between an address from the microinstruction and an address from the instruction. There is a 5-bit field in the microinstruction for the combined destination and source address DSA, in addition to a 5-bit field for the B address SB. The first bit of each of these fields selects between the register file address in the microinstruction (0) and the register file address in the instruction (1). If an instruction address is selected, whether it is DST or SRC is determined by an additional quad 2-to-1 multiplexer. This multiplexer is controlled by the second bit of the DSA or SB field, depending on which of them has a 1 in the first bit in any microinstruction, thereby ensuring that the proper second bit is used to determine the register address. A 0 is appended to the left of the 3-bit fields DST and SRC to cause them to address R0 through R7. In addition to the first bit, which selects the address source, the addresses from the microinstruction contain four bits so that all 16 registers can be reached. The final change to the register file is to replace the storage elements for R0 in the file with open circuits on the lines that were their inputs and with constant zero values on the lines that were their outputs. A symbol for the resulting register file is shown in Figure 10-4(b).
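The address selection performed by these multiplexers can be sketched for a single 5-bit field (DSA or SB). The text does not say which value of the second bit picks DST and which picks SRC, so the polarity used here (0 for DST, 1 for SRC) is an assumption, as is the function name.

```python
def select_register_address(field: int, src: int, dst: int) -> int:
    """Resolve a 5-bit DSA or SB field to a 4-bit register file address.

    field: 5-bit microinstruction field; its first (leftmost) bit selects
           the address source.
    src, dst: the 3-bit SRC and DST fields of the current instruction.
    """
    if (field >> 4) & 1 == 0:
        # First bit 0: the 4-bit address comes from the microinstruction,
        # so all 16 registers (R0..R15) can be reached.
        return field & 0b1111
    # First bit 1: the address comes from the instruction.
    # ASSUMED polarity of the second bit: 0 selects DST, 1 selects SRC.
    use_src = (field >> 3) & 1
    # The appended leading 0 restricts instruction addresses to R0..R7.
    return (src if use_src else dst) & 0b0111
```

For example, a field value of `0b01100` addresses R12 (SA) directly from the microinstruction, while `0b10000` passes through the DST field of the instruction under the assumed polarity.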

We find that, based on the eight shift instructions provided, the shifter from Section 8-10 needs to be modified. The modifications involve the end bits of the shift logic. For logical shifts, a 0 is inserted, as before. For the right arithmetic shift, the sign bit is the incoming bit, and for the left arithmetic shift, 0 is the incoming bit. Rotates require that the bit from the opposite end of the shifter be fed around. Finally, rotates with carry require that the carry flip-flop output be provided as an input on both ends of the shifter.

The inputs are furnished by two 4-to-1 multiplexers, MUX R and MUX L, added to a basic 16-bit shifter, all shown in Figure 10-5(a). Also, the appropriate end bits from the input operand must be sent to the carry flip-flop. A 2-to-1 multiplexer MUX SO selects the end bit to pass to the carry flip-flop C. The symbol for the new shifter, which replaces the basic shifter from Section 8-10, appears in Figure 10-5(b). FS3, FS2, FS1, and FS0 from the FS field drive the control inputs S3, S2, S1, and S0, respectively.
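The end-bit rules for the eight shift operations can be sketched as one-bit shifts on a 16-bit value. This is a behavioral illustration only: the operation names and function signatures are assumptions, and the sketch returns the outgoing end bit for every operation without claiming which instructions actually load it into the carry flip-flop.

```python
MASK = 0xFFFF  # 16-bit datapath width


def shift_right(value: int, op: str, carry: int = 0) -> tuple:
    """One-bit right shift; returns (result, bit shifted out of the right end)."""
    value &= MASK
    out_bit = value & 1
    incoming = {
        "logical": 0,                  # a 0 is inserted
        "arithmetic": (value >> 15) & 1,  # the sign bit is the incoming bit
        "rotate": out_bit,             # bit from the opposite end is fed around
        "rotate_carry": carry & 1,     # carry flip-flop output is the input
    }[op]
    return ((value >> 1) | (incoming << 15), out_bit)


def shift_left(value: int, op: str, carry: int = 0) -> tuple:
    """One-bit left shift; returns (result, bit shifted out of the left end)."""
    value &= MASK
    out_bit = (value >> 15) & 1
    incoming = {
        "logical": 0,
        "arithmetic": 0,               # left arithmetic shift brings in a 0
        "rotate": out_bit,
        "rotate_carry": carry & 1,
    }[op]
    return (((value << 1) & MASK) | incoming, out_bit)
```

The two dictionaries play the role of MUX L and MUX R in selecting the incoming bit, and the returned end bit corresponds to what MUX SO would pass toward the carry flip-flop.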

All modifications to the original datapath are represented in Figure 10-6. As a part of the design process, the new datapath needs to be checked to make sure that it has all of the capabilities necessary for implementing the instruction set and addressing modes. Certainly, some decisions have been made that have not been discussed. For example, there is no dedicated multiplication or division hardware, so these operations must be implemented by microprograms controlling the datapath.

Microprogrammed Control Organization

The microprogrammed control unit accompanies the datapath of Figure 10-6 and is shown in Figure 10-7. The control consists of four principal parts. One is the control unit registers: the instruction register IR, the program counter PC, and the stack pointer SP. In some designs the PC and SP are logically included in the register file and thus are a part of the datapath. Here, since they are separate from the register file and are used primarily for program control, we have included them with the control. Sequencing within the control unit is provided by the microsequencer, which contains two registers: the control address register CAR and the subroutine branch register SBR. As the program counter for the microprogram, the CAR either counts up to the next address in sequence or loads in parallel. With a parallel load, the address can be set to any value, and the next address comes from one of three sources, including the next-address field in the current microinstruction.

Microroutines have subroutines, just as programs do. To distinguish them, we call subroutines for microprograms microsubroutines. The SBR is used to store the next address for the CAR at the time a microsubroutine is called, in order to return microprogram execution to the next microinstruction in the calling microroutine. The final part of the control unit is the instruction decoder, which consists of combinational logic and is also a next-address source for the CAR.
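The interplay of the CAR and SBR can be illustrated with a small behavioral model of the microsequencer. The class and the action names (`increment`, `jump`, `call`, `return`, `map`) are hypothetical labels for this sketch; the text states only that the CAR counts up or loads in parallel, that the SBR holds the return address for a single level of microsubroutine, and that the next-address field and the instruction decoder are among the next-address sources.

```python
class Microsequencer:
    """Behavioral sketch of CAR/SBR sequencing with one microsubroutine level."""

    def __init__(self) -> None:
        self.car = 0  # control address register
        self.sbr = 0  # subroutine branch register

    def step(self, action: str, next_address: int = 0,
             mapped_address: int = 0) -> int:
        """Advance the CAR by one microinstruction and return its new value."""
        if action == "increment":
            self.car += 1                 # count up to the next address
        elif action == "jump":
            self.car = next_address       # load from the next-address field
        elif action == "call":
            self.sbr = self.car + 1       # save the return address
            self.car = next_address
        elif action == "return":
            self.car = self.sbr           # resume the calling microroutine
        elif action == "map":
            self.car = mapped_address     # address from instruction decoding
        else:
            raise ValueError(f"unknown action: {action}")
        return self.car
```

Because there is a single SBR, a nested call would overwrite the saved return address, which matches the one level of microsubroutines the design provides.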

Microprogram structure

We approach the microprogram design top down. The top level consists of an ASM-like chart giving a flow of microroutines. These routines have labels similar to the stages in the pipelined CPU in Section 8-11. In this case, however, rather than being performed in a single clock cycle with combinational logic, the routines require the use of the same hardware over multiple cycles. The flow between and, to some extent, within the routines is intimately tied to the instructions and their decoding. Since the mapping ROM can be used for branching simultaneously with a format A data transfer or manipulation operation, it is convenient to control the flow between microroutines entirely by using the mapping ROM. This flow is shown in Figure 10-8; the chart is not strictly an ASM chart, since each rectangular box corresponds to microroutines representing multiple states rather than a single state and to multiple clock cycles rather than a single one.

The execution of each instruction begins with the Instruction Fetch microroutine. The PC provides the address and is updated to the next address. The instruction fetched is placed in the IR. Then the instruction-decoding process begins, using MUX M and the mapping ROM. For MM equal to 00, only the first three bits of OPCODE are used, with the remaining bits set to 0. In addition, the third bit from the left is ignored except when the first two are equal to 01. A five-way branch results. This branch is represented by the five binary decision boxes in the figure. Since the bits of OPCODE that are used denote the number of operands for the instruction being decoded, the destinations of the branches are, in three cases, microroutines to fetch the operands. In another case, program branch addresses are fetched. In the final case, the branch in the chart goes directly to execution. There are three paths to execution blocks that are dependent upon the decisions made by the decision boxes. These paths preserve information from the decoding of the three bits of OPCODE in the Instruction Fetch. To obtain the shift amount parameter, there is an additional decision required at the end of the two-operand fetch.

In four of the five decisions, an operand fetch routine is performed. Depending upon the first three bits of OPCODE, either a single operand, two operands (or one operand plus a parameter), or a branch address is fetched. The operands, addresses, and parameter values are placed in locations reserved for them in registers R12 through R15 (SA, SD, DA, and DD). The four execution routines find the operands and addresses in these standard register locations and, in most cases, use them to produce a result that is left in standard location DD. The Write-Back routine also uses the standard register locations to find the result and its address.

Following their execution, it is necessary for most operations to place the result in its destination. This is accomplished by the Write-Back microroutine. Some of the operations, however, do not have a result to be written in that routine. The existence of these operations is apparent from the paths leading directly from Zero-operand Execution, Program Branch, and Two-operand Execution to Interrupt Handling. After each execution microroutine, the microprogram enters Interrupt Handling before fetching the next instruction.


In this paper, we examined two CPU designs: the CISC and RISC.

The CISC control unit includes a stack pointer in addition to the program counter. Control microprograms reside in ROM, and a combination of a multiplexer and a ROM provides fast instruction decoding. The control unit also has extensive jump and conditional branching capabilities, including one level of microsubroutines. The microprogram for the control is modularized to permit many microsubroutines to be shared in implementing the microprogram for the instructions.

The RISC control unit is pipelined and has special hardware added to deal with branches. Pipelined CPUs have both data and control hazard problems. We examined one of each type of hazard, as well as software and hardware solutions for each.

After discussing CISC and RISC performance, we touched on some advanced concepts, including parallel execution units, a combination of microprogrammed control with a pipeline, superpipelined CPUs, superscalar CPUs, and predictive and speculative techniques for high performance. Finally, we related the design techniques in this paper to more general digital system design.


