Analyzing the Intel Itanium memory ordering rules using logic programming and SAT
How ESXi multi-core CPU settings work with Windows guests

Physical CPU (socket): a real, physical CPU; for example, 2.
Core: the number of cores per CPU; for example, 8.
Hyper-threading: Intel's technology that exposes an extra logical core per physical core.

Virtualization software such as VMware ESXi therefore computes the number of logical CPUs as: physical CPUs (sockets) x cores x 2 = 2 x 8 x 2 = 32. Linux places no limit on the number of physical CPUs (sockets), but Windows 10 Pro supports at most 2 sockets (physical CPUs).
So if ESXi presents 16 vCPUs to a Windows 10 guest, configure each virtual CPU with 8 cores.
The guest then sees exactly 2 sockets, which fits within its limit.
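The arithmetic above can be sketched as a small Python helper. This is a minimal illustration of the text's formulas; the function names and the 2-socket cap for Windows 10 Pro are taken from the text, not from any VMware API:

```python
def logical_cpus(sockets: int, cores_per_socket: int, hyper_threading: bool = True) -> int:
    """Logical CPUs seen by the hypervisor: sockets x cores x (2 if HT else 1)."""
    return sockets * cores_per_socket * (2 if hyper_threading else 1)

def cores_per_virtual_socket(vcpus: int, max_sockets: int) -> int:
    """Cores to assign per virtual socket so the guest OS socket limit is respected."""
    # Round up so the vCPUs always fit into at most max_sockets sockets.
    return -(-vcpus // max_sockets)

# A host with 2 sockets, 8 cores each, and HT on exposes 32 logical CPUs.
print(logical_cpus(2, 8))               # 32
# 16 vCPUs on Windows 10 Pro (max 2 sockets) -> 8 cores per socket.
print(cores_per_virtual_socket(16, 2))  # 8
```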
Setting the Number of Cores per CPU in a Virtual Machine: A How-to Guide

When creating virtual machines, you should configure processor settings for them. With hardware virtualization, you can select the number of virtual processors for a virtual machine and set the number of sockets and processor cores. How many cores per CPU should you select for optimal performance? Which configuration is better: fewer processors with more CPU cores, or more processors with fewer CPU cores? This blog post explains the main principles of processor configuration for VMware virtual machines.

Terminology

First of all, let's go over the definitions of the terms you should know when configuring CPU settings for virtual machines. Knowing what each term means helps you avoid confusion about the number of cores per CPU, CPU cores per socket, and the number of CPU cores vs. speed.

A CPU socket is a physical connector on the motherboard to which a single physical CPU is connected. A motherboard has at least one CPU socket. Server motherboards usually have multiple CPU sockets that support multiple multicore processors. CPU sockets are standardized for different processor series; Intel and AMD use different CPU sockets for their processor families.

A CPU (central processing unit, microprocessor chip, or processor) is the computer component that executes instructions to perform calculations, run applications, and complete tasks. It is the electronic circuitry, built from transistors, that is connected to a socket. When the clock speed of processors came close to the heat barrier, manufacturers changed the architecture of processors and started producing processors with multiple CPU cores. To avoid confusion between physical processors and logical processors or processor cores, some vendors refer to a physical processor as a socket.

A CPU core is the part of a processor containing the L1 cache.
The CPU core performs computational tasks independently, without interacting with other cores and the external components of a "big" processor that are shared among cores. Basically, a core can be considered a small processor built into the main processor that is connected to a socket. Applications must support parallel computation to make rational use of multicore processors.

Hyper-threading is a technology developed by Intel engineers to bring parallel computation to processors that have a single processor core. Hyper-threading debuted in 2002, when the Pentium 4 HT processor was released and positioned for desktop computers. An operating system detects a single-core processor with hyper-threading as a processor with two logical cores (not physical cores). Similarly, a four-core processor with hyper-threading appears to an OS as a processor with eight logical cores. The more threads that run on each core, the more tasks can be done in parallel. Modern Intel processors have both multiple cores and hyper-threading. Hyper-threading is usually enabled by default and can be enabled or disabled in the BIOS. AMD simultaneous multithreading (SMT) is the analog of hyper-threading for AMD processors.

A vCPU is a virtual processor configured as a virtual device in the virtual hardware settings of a VM. A virtual processor can be configured to use multiple CPU cores. A vCPU is connected to a virtual socket.

CPU overcommitment is the situation where you provision more logical processors (CPU cores) of a physical host to the VMs residing on that host than the total number of logical processors on the host.

NUMA (non-uniform memory access) is a computer memory design used in multiprocessor computers. The idea is to provide separate memory for each processor (unlike UMA, where all processors access shared memory through a bus). At the same time, a processor can access memory that belongs to other processors by using a shared bus (all processors access all memory on the computer).
A CPU accesses its own local memory faster than other memory on a multiprocessor computer, which is where the performance advantage comes from. These basic architectures are mixed in modern multiprocessor computers: processors are grouped in a multicore CPU package or node. Processors that belong to the same node share access to memory modules, as in the UMA architecture. Processors can also access memory on a remote node via a shared interconnect, as in the NUMA architecture, but with slower performance, because this memory access is performed through the CPU that owns that memory rather than directly.

NUMA nodes are CPU/memory couples that consist of a CPU socket and the memory modules closest to it. NUMA is usually configured in the BIOS as the node interleaving or interleaved memory setting.

For example, an ESXi host has two sockets (two CPUs) and 256 GB of RAM, and each CPU has 6 processor cores. This server contains two NUMA nodes, and each NUMA node has 1 CPU socket (one CPU), 6 cores, and 128 GB of RAM. ESXi always tries to allocate memory for a VM from its native (home) NUMA node. A home node can be changed automatically if VM loads and ESXi server loads change.

Virtual NUMA (vNUMA) is the analog of NUMA for VMware virtual machines. A vNUMA consumes hardware resources of more than one physical NUMA node to provide optimal performance. The vNUMA technology exposes the NUMA topology to a guest operating system. As a result, the guest OS is aware of the underlying NUMA topology and can use it most efficiently. The virtual hardware version of a VM must be 8 or higher to use vNUMA. Handling of vNUMA was significantly improved in VMware vSphere 6.5, and this feature is no longer controlled by the CPU cores per socket value in the VM configuration. By default, vNUMA is enabled for VMs that have more than 8 logical processors (vCPUs).
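The two-socket NUMA example above can be expressed as a tiny sketch (illustrative only; it assumes, as in the example, one NUMA node per socket and RAM split evenly across nodes):

```python
def numa_nodes(sockets: int, cores_per_socket: int, total_ram_gb: int):
    """One NUMA node per socket: each node gets its socket's cores and an
    equal share of the host's RAM (matches the 2-socket example above)."""
    return [{"cores": cores_per_socket, "ram_gb": total_ram_gb // sockets}
            for _ in range(sockets)]

# Host: 2 sockets, 6 cores each, 256 GB RAM -> two nodes of 6 cores / 128 GB.
print(numa_nodes(2, 6, 256))
```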
You can enable vNUMA manually for a VM by editing the VMX configuration file of the VM and adding the line numa.vcpu.min=X, where X is the number of vCPUs for the virtual machine.

Calculations

Let's find out how to calculate the number of physical CPU cores, logical CPU cores, and other parameters on a server.

The total number of physical CPU cores on a host machine is calculated with the formula:

(number of processor sockets) x (cores per processor) = number of physical processor cores

*Only processor sockets with installed processors must be counted.

If hyper-threading is supported, calculate the number of logical processor cores by using the formula:

(number of physical processor cores) x (2 threads per physical core) = number of logical processors

Finally, use a single formula to calculate the available processor resources that can be assigned to VMs:

(CPU sockets) x (CPU cores) x (threads)

For example, if you have a server with two processors, each having 4 cores and supporting hyper-threading, the total number of logical processors that can be assigned to VMs is 2 (CPUs) x 4 (cores) x 2 (HT) = 16 logical processors. One logical processor can be assigned as one processor or one CPU core to a VM in the VM settings.

As for virtual machines, thanks to hardware emulation, they can use multiple processors and CPU cores in their configuration. One physical CPU core can be configured as a virtual CPU or a virtual CPU core for a VM.

The total amount of clock cycles available to a VM is calculated as:

(number of virtual sockets) x (cores per socket) x (clock speed of the CPU)

For example, if you configure a VM to use 2 vCPUs with 2 cores each when you have a physical processor whose clock speed is 3.0 GHz, the total clock speed is 2 x 2 x 3 = 12 GHz.
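The three formulas above can be checked with a short Python sketch (function names are illustrative; the numbers reproduce the worked examples from the text):

```python
def physical_cores(sockets: int, cores_per_processor: int) -> int:
    """(processor sockets) x (cores per processor)."""
    return sockets * cores_per_processor

def logical_processors(physical: int, threads_per_core: int = 2) -> int:
    """threads_per_core is 2 when hyper-threading is enabled, 1 otherwise."""
    return physical * threads_per_core

def vm_clock_budget_ghz(virtual_sockets: int, cores_per_socket: int, clock_ghz: float) -> float:
    """Aggregate clock cycles nominally available to a VM, in GHz."""
    return virtual_sockets * cores_per_socket * clock_ghz

# 2 CPUs x 4 cores x 2 (HT) = 16 logical processors.
print(logical_processors(physical_cores(2, 4)))  # 16
# 2 virtual sockets x 2 cores x 3.0 GHz = 12 GHz.
print(vm_clock_budget_ghz(2, 2, 3.0))            # 12.0
```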
If CPU overcommitment is used on an ESXi host, the frequency actually available to a VM can be less than calculated when VMs perform CPU-intensive tasks.

Limitations

The maximum number of virtual processor sockets assigned to a VM is 128. If you want to assign more than 128 virtual processors, configure the VM to use multicore processors.

The maximum number of processor cores that can be assigned to a single VM is 768 in vSphere 7.0 Update 1. A virtual machine cannot use more CPU cores than the number of logical processor cores on the physical machine.

CPU hot add. If a VM has more than 128 vCPUs, you cannot use the CPU hot add feature for this VM to edit the CPU configuration while the VM is running.

OS CPU restrictions. If an operating system has a limit on the number of processors and you assign more virtual processors to a VM, the additional processors are not identified and used by the guest OS. Limits can be caused by OS technical design and by OS licensing restrictions. Note that some operating systems are licensed per socket and others per CPU core.

CPU support limits for some operating systems:
- Windows 10 Pro – 2 CPUs
- Windows 10 Home – 1 CPU
- Windows 10 Pro for Workstations – 4 CPUs
- Windows Server 2019 Standard/Datacenter – 64 CPUs
- Windows XP Pro x64 – 2 CPUs
- Windows 7 Pro/Ultimate/Enterprise – 2 CPUs
- Windows Server 2003 Datacenter – 64 CPUs

Configuration Recommendations

For older vSphere versions, I recommend using sockets over cores in the VM configuration. At first, you might not see a significant difference between CPU sockets and CPU cores in the VM configuration in terms of VM performance, but be aware of some configuration features. Remember NUMA and vNUMA when you consider setting multiple virtual processors (sockets) for a VM to get optimal performance. If vNUMA is not configured automatically, mirror the NUMA topology of the physical server.
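The guest OS socket limits above can be encoded in a small lookup. This is an illustrative sketch, not a validation tool; the table holds only the values quoted in the text, and unknown guest OSes are simply passed through:

```python
# Guest OS socket (CPU) limits quoted in the text; illustrative subset.
OS_MAX_SOCKETS = {
    "Windows 10 Pro": 2,
    "Windows 10 Home": 1,
    "Windows 10 Pro for Workstations": 4,
    "Windows Server 2019 Standard": 64,
}

def usable_sockets(requested_sockets: int, guest_os: str) -> int:
    """Sockets the guest OS will actually use; extra sockets are ignored by the OS."""
    return min(requested_sockets, OS_MAX_SOCKETS.get(guest_os, requested_sockets))

# Assigning 4 virtual sockets to Windows 10 Pro: only 2 are used.
print(usable_sockets(4, "Windows 10 Pro"))  # 2
```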
Here are some recommendations for VMs in VMware vSphere 6.5 and later:
- When you define the number of logical processors (vCPUs) for a VM, prefer the cores-per-socket configuration. Continue until the count exceeds the number of CPU cores on a single NUMA node of the ESXi server. Apply the same logic to memory: stay within the amount of memory available on a single NUMA node of your physical ESXi server.
- When a VM needs more logical processors than a single NUMA node has physical CPU cores, or more RAM than a single NUMA node provides, divide the logical processors (vCPUs) across the minimum number of NUMA nodes for optimal performance.
- Don't set an odd number of vCPUs when the vCPU count exceeds the CPU core count of a single NUMA node; the same applies when the VM's memory exceeds the memory of a single NUMA node on the physical server.
- Don't create a VM with more vCPUs than the number of physical processor cores on your physical host.
- If you cannot disable vNUMA due to your requirements, don't enable the vCPU Hot-Add feature.
- In vSphere versions prior to 6.5 with vNUMA enabled, once you have defined the number of logical processors (vCPUs) for a VM, select the number of virtual sockets while keeping the cores-per-socket value equal to 1 (the default). The one-core-per-socket configuration lets vNUMA automatically select the best vNUMA topology for the guest OS, based on the underlying physical topology of the server. If, with vNUMA enabled, you keep the same number of logical processors (vCPUs) but increase the number of virtual CPU cores and reduce the number of virtual sockets by the same factor, vNUMA cannot set the best NUMA configuration for the VM.
As a result, VM performance is affected and can degrade.

If a guest operating system and other software installed on a VM are licensed on a per-processor basis, configure the VM to use fewer processors with more CPU cores. For example, Windows Server 2012 R2 is licensed per socket, and Windows Server 2016 is licensed per core.

If you use CPU overcommitment in the configuration of your VMware virtual machines, keep in mind these values:
- 1:1 to 3:1 – there should be no problems running VMs
- 3:1 to 5:1 – performance degradation is observed
- 6:1 – prepare for problems caused by significant performance degradation

CPU overcommitment with moderate values can be used in test and dev environments without much risk.

Configuration of VMs on ESXi Hosts

First of all, determine how many logical processors (the total number of CPUs) of your physical host are needed by a virtual machine for proper work with sufficient performance. Then define how many virtual sockets with processors (Number of Sockets in vSphere Client) and how many CPU cores (Cores per Socket) you should set for the VM, keeping in mind the previous recommendations and limitations. The table below can help you select the needed configuration.

If you need to assign more than 8 logical processors to a VM, the logic remains the same. To calculate the number of logical CPUs in vSphere Client, multiply the number of sockets by the number of cores. For example, if you need to configure a VM to use 2 processor sockets with 2 CPU cores each, the total number of logical CPUs is 2 x 2 = 4. It means that you should select 4 CPUs in the virtual hardware options of the VM in vSphere Client to apply this configuration.

Let me explain how to configure CPU options for a VM in VMware vSphere Client. Enter the IP address of your vCenter Server in a web browser, and open VMware vSphere Client. In the navigator, open Hosts and Clusters, and select the virtual machine that you want to configure.
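The cores-per-socket-first advice above can be sketched as a helper that picks a virtual topology from the vCPU count and the size of a physical NUMA node. This is a simplified illustration of the recommendation, not a VMware algorithm; it ignores memory and assumes an even split is wanted:

```python
def recommend_topology(vcpus: int, cores_per_numa_node: int):
    """Return (virtual_sockets, cores_per_socket): stay on one virtual socket
    while the vCPU count fits a NUMA node, then spread the vCPUs evenly across
    the minimum number of nodes."""
    if vcpus <= cores_per_numa_node:
        return (1, vcpus)
    # Minimum number of NUMA nodes needed (ceiling division).
    sockets = -(-vcpus // cores_per_numa_node)
    if vcpus % sockets != 0:
        # Mirrors the "don't set an odd/uneven vCPU count" advice above.
        raise ValueError("choose a vCPU count that divides evenly across nodes")
    return (sockets, vcpus // sockets)

print(recommend_topology(6, 6))   # (1, 6)  - fits one NUMA node
print(recommend_topology(12, 6))  # (2, 6)  - spread across two nodes
```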
Make sure that the VM is powered off so that you can change the CPU configuration. Right-click the VM, and in the context menu, click Edit Settings to open the virtual machine settings. Expand the CPU section in the Virtual Hardware tab of the Edit Settings window.

CPU. Click the drop-down menu in the CPU row, and select the total number of logical processors needed for this VM. In this example, I select 4 logical processors for the Ubuntu VM (blog-Ubuntu1).

Cores per Socket. In this row, click the drop-down menu, and select the number of cores needed for each virtual socket (processor).

CPU Hot Plug. If you want to use this feature, select the Enable CPU Hot Add checkbox. Remember the limitations and requirements.

Reservation. Select the guaranteed minimum allocation of CPU clock speed (frequency, in MHz or GHz) for the virtual machine on an ESXi host or cluster.

Limit. Select the maximum CPU clock speed for the VM's processors. This is the maximum frequency for the virtual machine, even if this VM is the only VM running on an ESXi host or cluster with more free processor resources. The set limit applies to all virtual processors of a VM combined: if a VM has 2 single-core virtual processors and the limit is 1000 MHz, both virtual processors together run at a total clock speed of 1000 MHz, that is, one billion cycles per second (500 MHz for each processor).

Shares. This parameter defines the priority of resource consumption by virtual machines (Low, Normal, High, Custom) on an ESXi host or resource pool. Unlike the Reservation and Limit parameters, the Shares parameter is applied to a VM only if there is a lack of CPU resources within an ESXi host, resource pool, or DRS cluster. Available options for the Shares parameter:
- Low – 500 shares per virtual processor
- Normal – 1000 shares per virtual processor
- High – 2000 shares per virtual processor
- Custom – set a custom value

The higher the Shares value, the larger the amount of CPU resources provisioned for a VM within an ESXi host or resource pool.
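The Limit and Shares arithmetic described above can be sketched as follows (a minimal illustration of the values quoted in the text; the Custom level is omitted because its value is user-defined):

```python
# Shares values per virtual processor, as listed in the text.
SHARES_PER_VCPU = {"Low": 500, "Normal": 1000, "High": 2000}

def total_cpu_shares(level: str, num_vcpus: int) -> int:
    """Total shares a VM brings to CPU contention: per-vCPU shares x vCPU count."""
    return SHARES_PER_VCPU[level] * num_vcpus

def mhz_per_vcpu(limit_mhz: int, num_vcpus: int) -> float:
    """A VM-wide Limit is shared across all virtual processors."""
    return limit_mhz / num_vcpus

print(total_cpu_shares("High", 4))  # 8000
# 1000 MHz limit on a VM with 2 single-core processors -> 500 MHz each.
print(mhz_per_vcpu(1000, 2))        # 500.0
```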
Hardware virtualization. Select this checkbox to expose hardware-assisted virtualization to the guest OS. This option is useful if you want to run a VM inside a VM for testing or educational purposes.

Performance counters. This feature allows an application installed in the virtual machine to be debugged and optimized by measuring CPU performance.

Scheduling Affinity. This option is used to bind a VM to specific processors. The entered value can look like this: "0, 2, 4-7".

I/O MMU. This feature allows VMs to have direct access to hardware input/output devices such as storage controllers, network cards, and graphics cards (rather than using emulated or paravirtualized devices). I/O MMU is also called Intel Virtualization Technology for Directed I/O (Intel VT-d) and AMD I/O Virtualization (AMD-Vi). I/O MMU is disabled by default. Using this option is deprecated in vSphere 7.0. If I/O MMU is enabled for a VM, the VM cannot be migrated with vMotion and is not compatible with snapshots, memory overcommit, suspended VM state, and physical NIC sharing.

If you use a standalone ESXi host and use VMware Host Client to configure VMs in a web browser, the configuration principle is the same as for VMware vSphere Client.

If you connect to a vCenter Server or ESXi host in VMware Workstation and open the settings of a vSphere VM, you can edit the basic configuration of virtual processors. Click VM > Settings, select the Hardware tab, and click Processors. On the following screenshot, you see the processor configuration for the same Ubuntu VM that was configured before in vSphere Client. In the graphical user interface (GUI) of VMware Workstation, you select the number of virtual processors (sockets) and the number of cores per processor; the total number of processor cores (logical cores of physical processors on the ESXi host or cluster) is calculated and displayed automatically.
In the interface of vSphere Client, you set the total number of processor cores (the CPU option) and select the number of cores per processor; the number of virtual sockets is then calculated and displayed.

Configuring VM Processors in PowerCLI

If you prefer using the command-line interface to configure components of VMware vSphere, use PowerCLI to edit the CPU configuration of VMs. Let's find out how to edit the CPU configuration of a VM named Ubuntu19 in PowerCLI. The following commands are used on VMs that are powered off.

To configure a VM to use two single-core virtual processors (two virtual sockets are used), use the command:

get-VM -name Ubuntu19 | set-VM -NumCpu 2

Enter another number if you want to set a different number of processors (sockets) for a VM.

The following example shows how to configure a VM to use two virtual CPU cores per socket (with two vCPUs in total, one dual-core virtual socket is used):

$VM = Get-VM -Name Ubuntu19
$VMSpec = New-Object -Type VMware.Vim.VirtualMachineConfigSpec -Property @{ "NumCoresPerSocket" = 2 }
$VM.ExtensionData.ReconfigVM_Task($VMSpec)
$VM | Set-VM -NumCPU 2

Once the new CPU configuration is applied to the virtual machine, it is saved in the VMX configuration file of the VM. In my case, I check the Ubuntu19.vmx file located in the VM directory on the datastore (/vmfs/volumes/datastore2/Ubuntu19/). The lines with the new CPU configuration are located at the end of the VMX file:

numvcpus = "2"
cpuid.coresPerSocket = "2"

If you need to reduce the number of processors (sockets) for a VM, use the same command as shown before with a lower value. For example, to set one processor (socket) for a VM, use this command:

get-VM -name Ubuntu19 | set-VM -NumCpu 1

The main advantage of using PowerCLI is the ability to configure multiple VMs in bulk, which is important and convenient when the number of virtual machines to configure is high.
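To double-check the result, the two VMX lines above can be parsed and the virtual socket count derived from them. A minimal Python sketch (real .vmx files contain many more keys; this only handles the simple key = "value" form shown above):

```python
def parse_vmx_cpu(vmx_text: str) -> dict:
    """Pull numvcpus / cpuid.coresPerSocket out of .vmx text and derive sockets."""
    settings = {}
    for line in vmx_text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip().strip('"')
    vcpus = int(settings.get("numvcpus", 1))
    cores = int(settings.get("cpuid.coresPerSocket", 1))
    return {"vcpus": vcpus, "cores_per_socket": cores, "sockets": vcpus // cores}

sample = 'numvcpus = "2"\ncpuid.coresPerSocket = "2"\n'
# 2 vCPUs at 2 cores per socket -> 1 virtual socket.
print(parse_vmx_cpu(sample))  # {'vcpus': 2, 'cores_per_socket': 2, 'sockets': 1}
```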
Use VMware cmdlets and Microsoft PowerShell syntax to create scripts.

Conclusion

This blog post has covered the configuration of virtual processors for VMware vSphere VMs. Virtual processors for virtual machines can be configured in VMware vSphere Client and in PowerCLI. The performance of applications running on a VM depends on correct CPU and memory configuration. In VMware vSphere 6.5 and later versions, set more cores per CPU for virtual machines, using the CPU-cores-per-socket approach. If you use vSphere versions older than 6.5, configure the number of sockets without increasing the number of CPU cores for a VM, due to the different behavior of vNUMA in newer and older vSphere versions. Take into account the licensing model of the software you need to install on a VM: if the software is licensed using a per-CPU model, configure more cores per CPU in the VM settings. When using virtual machines in VMware vSphere, don't forget about backup. Use NAKIVO Backup & Replication to back up your virtual machines, including VMs that have multiple cores per CPU. Regular backup helps you protect your data and recover it in case of a disaster.
Exercises + Answers

2.1 Write T or F for each statement
T 6. An asynchronous device is a clocked device.
T 7. A latch is an asynchronous device, because it functions at arbitrary times.
T 8. The repetition of clock pulses can vary from a very low rate to a very high rate.
T 9. A synchronous device changes its state only on the arrival of a clock pulse.
F 10. A clock input may occur at any time.
T 11. The clock pulses are used to synchronize all operations of the shift register.
T 13. A unidirectional shift register is capable of shifting in one direction only.
F 14. In a shift-left register the serial input determines what goes into the leftmost position during the shift.
F 15. To provide parallel transfer capability, some additional input and output lines should be provided to the shift register.

Choose the best answer for each of the following
1. How can the output of a logic gate be energized? C
   A. By changing its inputs.
   B. By keeping its inputs unchanged.
   C. By meeting the specific input condition.
   D. By giving a pulse.
3. A NAND gate consists of ___. D
   A. an OR gate followed by an inverter
   B. an AND gate followed by an inverter
   C. an AND gate followed by an OR gate
   D. an inverter followed by an AND gate
4. Under what condition is the output of a two-input XOR gate logic-high? B
   A. Its inputs are simultaneously at the same logic level.
   B. Its inputs are simultaneously at opposite logic levels.
   C. Its inputs are logic-low.
   D. Its inputs are simultaneously logic-high.

2.2 Write T or F for each statement
T 1. The CPU uses the input and output units to communicate with the outside world.
F 2. Main memory is sometimes called external memory.
F 3. After execution, the program and the related files of data and information will be retained in the main memory for later use.
F 4. Personal computers do not have the features of the larger systems.
T 5. Large systems offer higher processing speeds for users and return more data than PCs.
F 6. CPUs of all sizes have primary storage, arithmetic logic, and control sections.
F 7. The output device is the heart of any computer system.
T 8. The input/output devices and auxiliary storage units of a computer system are called peripherals.
F 9. The instrument of interpretation and communication between humans and computer systems of all sizes is the CPU.
F 10. Special-purpose computers can be adapted to many situations by giving them an appropriate program.
F 11. A minicomputer is the smallest and least expensive type of computer.
T 12. A special-purpose computer performs only one specific task and thus lacks versatility.
T 13. The larger the system, the greater its processing speed and storage capacity.
T 14. Mainframe computers are designed to process complex scientific applications.
T 9. The main memory in a general-purpose computer is made up of RAM integrated circuit chips.
F 10. When the power is turned on, the program counter is set to the first address of the bootstrap loader by the software of the computer.
T 11. The read-write heads contact the surface of the floppy disks.
T 12. The data on a particular track will be switched automatically onto a spare track by the computer before a catastrophic failure occurs.
F 14. The read-write heads stay on the same track continuously when the disk drive is working.
F 16. The possible symbols in the binary numbering system are 0 to 9.
F 17. The decimal value 16 is represented in BCD as 00010101.
F 18. Alphanumeric versions of BCD commonly use 6, 7, or 8 bits to represent characters.
F 19. A 6-bit alphanumeric code can represent 128 different characters.
F 22. Eight-bit codes are limited to representing 128 different characters.
T 23. An extra check (or parity) bit is often added to each 6-, 7-, or 8-bit character represented in storage so that it is possible to detect coding errors that may occur.
T 24. If a computer uses an odd-parity format to detect errors in character codes, then every valid character code will always have an odd number of 1 bits.
T 25. A processor comes with a unique set of operations called an instruction set.
F 26. In an instruction, operands specify the function to be performed.
T 27. A processor's job is to retrieve instructions from memory and perform step-by-step operations.

3.3 True or False
F 1. All operating systems on various computers are the same size.
F 2. All operating systems were written in a low-level language.
T 3. The user can't use a computer at all if there is no operating system on it.
F 4. The operating system exists in the lowest layer of a computer.
T 5. The system calls are provided by the operating system.
T 6. A computer's operational software defines the schedule of jobs awaiting execution.
F 7. Though an operating system can schedule the execution of jobs, it does not manage the availability of I/O devices.
T 8. The IOCS component of an operating system controls I/O operations.
T 9. It is a major problem for the operating system to map the logical file concept onto physical storage devices such as magnetic tape or disk.
F 10. Files can only be accessed directly on a disk system.
T 11. The logical structure and nomenclature of different operating systems vary considerably.
F 12. The form of the system prompt is the same for every kind of operating system.
13. You must boot the system before you use a microcomputer.
T 14. Spooling is an approach to improving system performance by overlapping the input of one job with the output of other jobs.
T 15. Multiprogramming allows time sharing.
T 16. When they first appeared, microcomputers were provided with operating systems developed for computers.
T 17. By using the graphical user interface, all users need to do is "point and click" to accomplish their tasks.
T 18. The interface introduced by Windows is the object-oriented user interface.
T 19. Microsoft failed in betting their future on Windows.

Multiple Choice
1. A computer's operating system is: D
   a. resource management
   b. error recovery
   c. memory management
   d. all the above
2. Which is the generalization of a two-level directory? D
   a. an acyclic-graph directory structure
   b. a tree-structured directory
   c. a batch system
   d. all the above
3. Which system may have no time constraints? C
   a. a real-time system
   b. a time-sharing system
   c. a batch system
   d. all the above
4. The more popular micro operating system is: D
   a. MS-DOS
   b. CP/M
   c. UNIX
   d. all the above
5. What languages can be used to develop operating systems? C
   a. A machine language.
   b. An assembly language.
   c. A high-level language.
   d. All of the above.
6. How does the operating system manage the resources of the computer?
   a. It turns on or off the resources of the computer.
   b. It makes them work together towards some common goals, or objectives.
   c. It controls the way in which these resources are put to work.
   d. It acts directly on the raw hardware.
7. The function of an operating system is ___. D
   a. to drive the raw hardware of the computer
   b. to drive the resources of the computer in accordance with certain objectives
   c. to provide the higher layers of software with a simplified computer
   d. all of the above
11. The graphical user interface provides the users with:
   A. a simpler way to interact with their computers
   B. a series of typed commands
   C. an intuitive set of graphical icons that allow the completion of common tasks
   D. an intuitive set of graphical icons for users to "point and click" in order to accomplish their tasks, so that they don't have to remember arcane words and commands anymore
12. Windows resembles the Macintosh in:
   A. providing a GUI, which was introduced by the Macintosh
   B. providing a limited means of multitasking
   C. providing a Windows interface just like the GUI
   D. allowing users to load multiple programs and have them run in the background while doing other work in a window in the foreground

3.4 True or False
F 1. Flowcharting is used primarily for program design and rarely for systems design.
T 2. When programming in a procedure-oriented language, you should tell the computer 'what to do' and 'how to do it'.
T 3. Assembler-level languages use mnemonics to represent instructions.
T 4. Machine language instructions are composed of a label, an opcode, and an operand.
F 5. Machine languages must be converted by a compiler to be used by the computer.
F 6. High-level languages require that programmers work with individual main storage locations.
T 7. A compiler is a translating program that converts high-level languages into machine language.
T 9. A flowchart loop indicates the repetitive performance of steps to process data.
T 10. None of the computers consists of hardware only.
F 11. Programs written in a high-level language can be executed by the computer without the help of a translator program.
T 13. Each symbolic instruction has a one-to-one correspondence with a machine instruction.
T 14. Writing a program in a high-level language need not take account of the hardware of the computer.
T 15. The opcode of an assembly language instruction specifies the operation that is to be performed by the microprocessor.
T 17. The mnemonic of an instruction is the symbolic representation of the actual binary code that the computer directly executes.
T 18. A label is on the left of an assembly language statement.
T 21. To enable the computer to solve an application problem, programmers have to write programs to translate the application concepts into computer concepts.
T 22. A class is defined by grouping a user-defined type with all procedures and functions that can be applied to it.
T 23. The artificial intelligence research community did not agree with the concepts of object-oriented programming in its early days.
F 24. Object-oriented programming languages are absolutely different from the LISP programming language.
T 25. A program may produce incorrect output even if it runs OK.
T 26. An error will occur if a program wants to use a deleted file.
F 27. All errors can be avoided.
T 28. A warning will not terminate the program.
T 29. Although we cannot avoid all runtime errors, we must take appropriate action when they happen.

Match the following terms to the appropriate definition
Terms: 1. program  2. programmer  3. machine language  4. assembler  5. source  6. object  7. interpreter  8. compiler
A. A computer program that translates an instruction into machine language, executes it, and then repeats the process for each instruction in a program until the program is finished. (7)
B. The set of statements that make up a computer program. (5)
C. A computer program that reads a high-level language instruction. (8)
D. A computer-specific set of primitive or elementary instructions that allows people to communicate with a particular computer. (3)
E. A set of instructions that tells a computer what to do. (1)
F. A program that translates an assembly-level language into machine language. (4)
G. Output from a compiler or assembler that is linked with other code to produce executable machine language code. (6)
H. A person who creates computer programs. (2)

3.5 True or False
T 1. The program specifications are written by the software engineers.
F 2. Coding a program will consume most of a programmer's time and effort.
T 3. Programmers should use flowcharts and other visual aids when they are designing routines.
F 4. The goal of the test phase of program development is to "prove" that a particular program has been completely debugged.
T 5. More programmers maintain programs than code programs.
T 6. A structured program is made up of several modules.
T 7. "Branching" capability is one of the most intriguing properties of a digital computer.
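The odd-parity statements in section 2.2 (items 23 and 24 above) can be illustrated with a short sketch. This is an illustrative example, not part of the exercises; the function names are my own:

```python
def add_odd_parity(code: int) -> int:
    """Append a parity bit so the resulting word has an odd number of 1 bits."""
    ones = bin(code).count("1")
    parity = 0 if ones % 2 == 1 else 1
    return (code << 1) | parity

def has_odd_parity(word: int) -> bool:
    """A valid character code under odd parity always has an odd number of 1 bits."""
    return bin(word).count("1") % 2 == 1

c = add_odd_parity(0b1010011)  # 4 one-bits -> parity bit set to 1
print(has_odd_parity(c))       # True: flipping any single bit would make this False
```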
Intel Interview Questions

Introduction: Intel is a global technology company that designs and manufactures advanced integrated digital technology platforms. As part of their recruitment process, Intel conducts interviews to assess the skills and suitability of candidates for various roles within the company. In this article, we will explore some potential Intel interview questions and provide detailed answers.

1. Technical Knowledge Questions:

- What is the difference between a CPU and a GPU?
Answer: A CPU (Central Processing Unit) is responsible for running the instructions of a computer program, executing tasks on a single thread at high speed. A GPU (Graphics Processing Unit) is designed specifically for rendering and displaying graphics on a screen, performing parallel computations with multiple threads.

- Can you explain Moore's Law?
Answer: Moore's Law states that the number of transistors on a microchip doubles approximately every two years. It illustrates the exponential growth of computing power and the shrinking size of transistors, leading to increased performance and efficiency in electronic devices.

- What is the difference between DDR3 RAM and DDR4 RAM?
Answer: DDR3 and DDR4 are different generations of Random-Access Memory. DDR4 offers higher speed and lower power consumption compared to DDR3. DDR4 also provides higher bandwidth, enabling faster data transfer rates between the RAM and the CPU.

2. Problem-Solving Questions:

- How would you design a traffic light control system?
Answer: I would start by understanding the requirements and constraints of the system. Next, I would design a state machine to model the different states of the traffic lights (e.g., green, yellow, red) and the transitions between them based on inputs from sensors and timers. I would then implement the control logic using appropriate programming languages and algorithms.

- You have a list of numbers.
How would you find the median without using built-in functions?
Answer: I would first sort the list of numbers in ascending order. If the number of elements in the list is odd, the median is the middle element. If the number of elements is even, the median is the average of the two middle elements, that is, the average of the (n/2)th and ((n/2)+1)th elements.

3. Behavioral Questions:

- Tell me about a time when you faced a challenge at work and how you resolved it.
Answer: In my previous job, we had a tight deadline for a project, and the team was struggling to meet the requirements. To overcome this challenge, I facilitated effective communication within the team and identified key areas where we could optimize workflows. I also delegated tasks based on individual strengths and provided resources to support the team. Through collaboration and efficient project management, we were able to deliver the project on time and achieve the desired results.

- Describe a situation where you had to work collaboratively with a team to achieve a common goal.
Answer: During a group project in university, we had to develop a mobile application within a limited timeframe. We divided the tasks among team members based on their skills and interests. We established regular communication channels and conducted frequent status updates to ensure everyone was on the same page. By leveraging each member's expertise and working together, we successfully developed the application and received positive feedback from our peers and professors.

Conclusion:
Preparing for an Intel interview requires a strong foundation of technical knowledge, problem-solving skills, and the ability to showcase your experiences through behavioral questions. These sample questions provide a glimpse into the types of inquiries you may encounter during an Intel interview.
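The median answer above can be made concrete. The following sketch (illustrative only, not part of the original interview material) sorts manually with insertion sort instead of a built-in sort, then picks the middle element or averages the two middle elements:

```python
def median(values):
    """Return the median of a list without using built-in sorting helpers."""
    nums = list(values)
    # Insertion sort stands in for the forbidden built-in sorted().
    for i in range(1, len(nums)):
        key = nums[i]
        j = i - 1
        while j >= 0 and nums[j] > key:
            nums[j + 1] = nums[j]
            j -= 1
        nums[j + 1] = key
    n = len(nums)
    mid = n // 2
    if n % 2 == 1:
        return nums[mid]                        # odd count: middle element
    return (nums[mid - 1] + nums[mid]) / 2      # even count: average of the two middle

assert median([3, 1, 2]) == 2
assert median([4, 1, 3, 2]) == 2.5
```

Any simple hand-written sort works here; insertion sort is chosen only because it is short and easy to verify in an interview setting.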
Remember to research the company, review the job requirements, and practice your responses to increase your chances of success.

4. Technical Skills Questions:

- Can you explain the difference between multi-core and multi-thread processors?
Answer: A multi-core processor consists of multiple independent processing units called cores, each capable of executing instructions in parallel. This allows for simultaneous processing of multiple tasks, improving overall system performance. On the other hand, a multi-thread processor utilizes a technique called multi-threading, which allows a single core to handle multiple threads of execution. This enables better utilization of the core's resources and can also improve performance.

- How does cache memory work in a computer system?
Answer: Cache memory is a small, high-speed memory located closer to the CPU than main memory. It stores frequently accessed data and instructions, allowing for faster access compared to retrieving data from main memory. When the CPU needs data, it first checks the cache to see if the data is present. If it is, a cache hit occurs, and the data is retrieved from the cache. If the data is not present, a cache miss occurs, and the CPU retrieves the data from main memory and stores a copy in the cache for future access.

- Can you explain the concept of pipelining in processors?
Answer: Pipelining is a technique used in processors to overlap the execution of multiple instructions. It breaks down instruction execution into several stages: fetch, decode, execute, memory access, and writeback. Each stage is performed by a different part of the processor, allowing instructions to flow through the pipeline simultaneously. This improves overall instruction throughput and can result in faster execution.

5.
Problem-Solving Questions:

- How would you design a data structure to efficiently store and retrieve a large number of strings?
Answer: One possible approach would be to use a data structure called a trie (prefix tree). In a trie, each node represents a character, and the edges represent the next possible characters. This allows for efficient storage and retrieval of strings based on their prefixes, as common prefixes are shared among different strings. By utilizing trie operations such as insertion and search, the system can efficiently handle a large number of strings.

- How would you optimize the performance of a database query that is running slowly?
Answer: There are several approaches to optimizing the performance of a slow-running database query. First, ensure that appropriate indexes are created on the columns involved in the query, as indexes can significantly improve search speeds. Second, analyze the query execution plan and identify any performance bottlenecks, such as a full table scan or inefficient joins. Adjusting the query or adding conditions to limit the result set can often improve performance. Additionally, revisiting the database schema design and partitioning the data can also improve query performance.

6. Behavioral Questions:

- Describe a situation where you demonstrated strong problem-solving skills.
Answer: In a previous job, we encountered a critical bug in the software that caused system crashes. To identify and resolve the issue, I gathered data from users, performed thorough testing, and analyzed log files. After identifying the root cause of the bug, I proposed a solution and worked closely with the development team to implement and test the fix.
Through diligent problem-solving, we successfully resolved the issue, ensuring system stability and avoiding further disruptions.

- Tell me about a time when you had to adapt to a rapidly changing work environment.
Answer: In a previous role, our company underwent a significant reorganization that resulted in changes to team structures and project priorities. As a result, my role and responsibilities shifted drastically. To adapt to the new environment, I quickly assessed the changes and embraced a flexible mindset. I collaborated with my colleagues, sought out new opportunities for growth, and adjusted my workflow and priorities accordingly. By embracing change and staying adaptable, I was able to thrive in the new work environment.

Conclusion:
Preparing for an Intel interview requires not only a solid understanding of technical concepts but also the ability to apply problem-solving skills and effectively communicate your experiences. The additional questions in this section provide further insight into the types of inquiries Intel may pose in an interview. Remember to leverage your technical knowledge, showcase your problem-solving abilities, and demonstrate your adaptability to increase your chances of success.
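The trie (prefix tree) described in the problem-solving questions above can be sketched as follows. This is an illustrative implementation of the data structure, not code from any Intel interview:

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored string ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a string; shared prefixes reuse existing nodes."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        """Return True only if the exact string was inserted."""
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word

trie = Trie()
trie.insert("car")
trie.insert("card")
assert trie.search("card")
assert not trie.search("ca")   # "ca" is a prefix, not a stored word
```

Both insert and search run in time proportional to the length of the string, independent of how many strings are stored, which is why the answer recommends a trie for large string collections.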
Computer English for Majors (《计算机专业英语》) - Answer Key

Lesson 1
I.
1. Operating System
2. Fetch-evaluate-execute
3. Front-side bus
4. Dual-core processor
5. Basic Input/Output System (BIOS)
II.
1. Instructions are binary sequences in a specific format, and they are unique to each machine.
2. CPU is the abbreviation of Central Processing Unit; each letter is pronounced separately.
3. Most computation takes place in the central processing unit.
4. Dual-core refers to a CPU with two complete execution cores in a single processor.
5. Processor: short for microprocessor, or CPU.
6. Integrated circuit: i.e., a chip; an electronic device made from semiconductor material.
III.
1. F  2. T  3. T
IV.
1. ALU, CU, Register
2. memory
3. processor
4. the CPU

Lesson 2
I.
1. Static Random Access Memory (SRAM)
2. Dynamic Random Access Memory (DRAM)
3. Virtual Memory
4. Physical Memory
5. Level 1 Cache
6. Level 2 Cache
7. HDD access speed
II.
1. DRAM is called "dynamic" because it is refreshed thousands of times per second.
2. RAM: the place where the computer stores the operating system, applications, and the data currently in use.
3. ROM is powered by a small long-life battery inside the computer.
4. A RAM cache is memory built from high-speed static RAM.
III.
1. F  2. F  3. F  4. T
IV.
1. non-volatile  2. compiler  3. volatile  4. DRAM

Lesson 3
I.
1. Motherboard
2. PC Case
3. Hard Disk Drive (HDD)
4. Optical mouse
5. RAM
6. Mobile Disk
II.
1. A PC is a system made up of many components.
A Note to Teachers

Dear teachers: Hello.
Thank you for using this specialized English textbook from the Labor Press (劳动版).
If you find problems while using it, or have any suggestions or comments, please contact us. Contact: Zhao Shuo; Tel: 64962011; Email: zhaos@

Computer English (Second Edition) - Answer Key

Unit 1
Lesson 1
Look and Learn
A. 1 Monitor  2 Modem  3 System Unit  4 Mouse  5 Speaker  6 Printer  7 Keyboard
B. 1. A microphone can be used to record sound.
2. A joystick is designed for playing games.
3. Most scanners can scan both pictures and text.
4. A graphics tablet is used for graphic design.
Dialogue
A. (omitted)
B. 1. Mary is good at computers.
2. Mike's computer can't work because it has no software system installed.
3. Mike is a computer outsider.
Reading
A. Input devices: keyboard, mouse, microphone, scanner, graphics tablet
   Output devices: monitor, speaker, printer
B. input device; convert into; output device; translate into

Lesson 2
Look and Learn
1 BIOS-ROM chip  2 Flash memory  3 memory bank  4 CMOS setup
Dialogue
A. install, configuration, error, utility, invalid
B. 1. Mary works at the PC Support Center.
2. Bill should run the CMOS setup utility to solve the problem.
Reading
A. 1 b  2 a  3 d  4 c  5 e
B. 1. Random access memory (RAM); RAM is volatile storage because everything in most types of RAM is lost as soon as the computer is turned off.
2. Cache memory; Cache memory improves processing by acting as a temporary high-speed holding area between the memory and the CPU.
3. Flash RAM; Flash RAM can retain data even if power is disrupted.
4. Read-only memory (ROM); ROM chips are not volatile and cannot be changed by the user.

Lesson 3
Look and Learn
1 Floppy disk  2 CD-ROM drive  3 Hard disk  4 Flash disk (USB disk)  5 Mobile hard disk
Dialogue
B. 1. Yes  2. Yes  3. No  4. Yes
Reading
A. 1 A  2 B; series, prevent, unrecoverable, comprise, dust, concentric

Lesson 4
Look and Learn
1. CPU slot  2. Memory slots  3. AGP slot  4. PCI slots  5. Power connector  6. Input/Output ports
Dialogue
A. interface, socket, battery, slot, processor
B. 1. Because the battery of the motherboard is getting low.
2. The socket types of both the processor and the motherboard have to be the same for them to work with each other. The motherboard must have an updated BIOS in order for certain CPUs to work right.
Reading
A. tower, medium, component, expansion, serial, specialize
B. 1 No  2 Yes  3 No

Unit 2
Lesson 1
Look and Learn
1. desktop  2. Pop-up Menu  3. Folder  4. Screen saver
Dialogue
B. 1. His card is an external sound card.
2. Windows will look for and install a driver automatically.
Reading
A. 1. b  2. d  3. c  4. a
B. 1.
Operating system recognizes input from the keyboard and sends output to the display screen.
2. Operating system keeps track of files and directories on the disk.
3. Operating system controls peripheral devices such as disk drives and printers.

Lesson 2
Look and Learn
A. Word, Access, Excel, PowerPoint
B. 1. Title bar  2. Menu bar  3. Toolbar  4. Row  5. Column  6. Cell  7. Status bar
Dialogue
A. 1. word  2. icon  3. dialog box
B. 1. No  2. Yes  3. Yes
Reading
A. command, grid, software, interface, interact, formula
B. 1. a  2. b  3. c

Lesson 3
Look and Learn
A. Photoshop  B. Flash  C. 3DS MAX  D. After Effects
Dialogue
1. web pages  2. Flash Player  3. Plug-in  4. browser
Reading
A. visual, audio, technology, interactivity, navigational
B. 1. B  2. C  3. B

Lesson 4
Look and Learn
1. carbon copy  2. subject  3. attachment  4. salutation
Dialogue
A. 1. Yes  2. No  3. Yes  4. No
B. 1. set up  2. log  3. Outlook Express
Reading
A. filter, intention, instruction, confidential
B. 1. D  2. C  3. A  4. B

Unit 3
Lesson 1
Look and Learn
1. Network interface card  2. Hub  3. Switch  4. Wireless Router  5. Cable
Dialogue
B. 1. No  2. Yes  3. Yes  4. No
Reading
A. 1. b  2. d  3. a  4. c
B. 1. a  2. c  3. b

Lesson 2
Look and Learn
1. Domain name  2. Browser  3. Website  4. Protocol
Dialogue
A. 1. ADSL  2. ISP  3. IP and DNS address
Reading
A. cable, upload, motorway, permanent
B. 1. c  2. a  3. d  4. b

Lesson 3
Look and Learn
1. bus  2. star  3. ring  4. tree
Dialogue
A. 1. Topology  2. star  3. dynamic
B. 1. Yes  2. No  3. Yes  4. No
Reading
A. format, device, protocol, destination
B. 1. c  2. a  3. b

Lesson 4
Look and Learn
1. firewall  2. antivirus  3. spam  4. virus
Dialogue
A. 1. Control panel  2. antivirus  3. security
B. 1. No  2. Yes  3. No  4. Yes
Reading
A. hack, management, maintenance, attack
B. 1. a  2. b

Unit 4
Lesson 1
Look and Learn
(1) Vice President of Sales  (2) Marketing Manager  (3) Sales Representative  (4) Sales Assistant  (5) Purchasing Manager  (6) Buyer
Dialogue
A. (omitted)
B. 1. ABC Company and Huaxia Commercial Company
2. products design
3. configuration and price
Reading
A. 1. No  2. Yes  3. Yes  4. Yes
B. 1. B  2. B  3. D

Lesson 2
Look and Learn
1. discount stores  2. catalogue  3. promotion  4. market share
Dialogue
B. 1. Not at all.
a. What's wrong with you? Is there any problem? (你怎么啦？有什么问题吗？)
2. What's wrong with you?
b. Looking forward to seeing you again. (期待再次相见。)
Part II: Undocumented Windows NT

Chapter 10
Adding New Software Interrupts

IN THIS CHAPTER
+ Understanding what happens when a 32-bit application executes an INT instruction
+ Adding new software interrupts to the Windows NT kernel
+ Using callgates to execute privileged code
+ How to use callgates

As we saw in the previous chapter, software interrupts are one of the mechanisms used for calling system services. We have also seen that INT 2E is used for getting the system services from the Windows NT kernel. By adding new software interrupts, it is possible to add new system services to the Windows NT kernel. We have already seen one way to add new system services to the Windows NT kernel, and this is just one more method. In this chapter, we will not be playing with the operating system data structures as we did in Chapter 7. Instead, we will use Intel data structures to add new system services.

What Happens When a 32-Bit Application Executes an INT nn Instruction?

Before we proceed with the technique of adding new software interrupts to the Windows NT kernel, let's first see what happens when a 32-bit application executes an INT nn type of instruction. Application programs run at privilege level 3, and the kernel code executes at privilege level 0. When a 32-bit application program executes an INT nn type of instruction, the processor first looks at the descriptor entry for the interrupt and verifies that the current privilege level is at least as high as the descriptor privilege level. If not, the processor raises a General Protection Fault. If the privilege level of the descriptor allows the interrupt to continue, the processor switches to the kernel stack. The kernel stack is selected by looking at the field in the Task State Segment (TSS). After this, the processor pushes the old ring 3 stack pointer (SS:ESP) and a standard interrupt frame (EFLAGS and CS:EIP) and jumps to the handler routine specified in the interrupt descriptor table entry.
The handler performs its job and finally executes the IRETD instruction to return to the calling application. When IRETD is executed, the processor pops off EFLAGS and CS:EIP, notices the switch from ring 0 to ring 3, pops off the ring 3 SS:ESP, and then execution continues from the instruction following the INT nn instruction. If you look at the descriptor entry for INT 2Eh through a debugger such as SoftICE, you will notice that its descriptor privilege level is 3. That is why NTDLL.DLL can call INT 2Eh on behalf of the applications.

Adding New Software Interrupts to the Windows NT Kernel

As you saw in the last chapter, an interrupt gate is installed in the IDT for the software interrupts. Here is the structure of the interrupt gate:

typedef struct InterruptGate {
    unsigned short OffsetLow;
    unsigned short Selector;
    unsigned char  Reserved;
    unsigned char  SegmentType:4;
    unsigned char  SystemSegmentFlag:1;
    unsigned char  Dpl:2;
    unsigned char  Present:1;
    unsigned short OffsetHigh;
} InterruptGate_t;

There are a few unused interrupts in Windows NT, including INT 20h and INT 22-29h. You can use these interrupts to add new software interrupts. Following are the steps for adding new software interrupts:

1. Get the base address of the interrupt descriptor table using the assembly instruction "sidt." This instruction stores the base address and limit of the IDT at the specified memory location.
2. Treat this base address as a pointer to an array of "InterruptGate_t" structures.
3. Index the interrupt number to be added into this table.
4. Fill in the "InterruptGate_t" entry at the index according to the requirements of the interrupt gate. That is, set the "SegmentType" field to 0Eh, meaning interrupt gate; set the "SystemSegmentFlag" to 0, meaning segment; fill the "Selector" field and the "OffsetLow" and "OffsetHigh" fields with the selector and address of the interrupt handler. Set the "Present" field to 1.
5.
Establish some mechanism for passing parameters to the interrupt service routine. For example, INT 2Eh uses the EDX register to point to the user stack frame and the EAX register for the service ID. We have already seen the mechanisms used by the INT 2Eh handler in Chapter 6.

#include "ntddk.h"
#include "stdarg.h"
#include "addint.h"

/* Old IDT Entry */
IdtEntry_t OldIdtEntry;

/* Interrupt Handler */
extern void _cdecl InterruptHandler();

/* Buffer to store result of sidt instruction */
char buffer[6];

6. Use the INT nn instructions in your application programs according to the conventions established in the previous step.

The sample application that illustrates this method adds INT 22h to the Windows NT kernel. The interrupt handler expects that the EDX register points to the buffer, which will be filled by the handler with the "Newly added interrupt called" string. The buffer should be at least 29 bytes long.

Following is the device driver that adds a new software interrupt to the Windows NT kernel. The driver adds the interrupt in its DriverEntry routine and removes the interrupt in its DrvUnload routine. The full source code for the application that issues this newly added interrupt is not given. Only the relevant part that issues the interrupt is given here.

Listing 10-1: ADDINT.C
[The body of the listing spans several pages in the original book and was not captured in this extraction.]

Using Callgates to Execute Privileged Code

Next, we will discuss one generic method of executing ring 0 instructions from a user-level application running at ring 3 with the help of a device driver. This is an equivalent of RING0 by Matt Pietrek, which appeared in the May 1993 edition of Microsoft Systems Journal in an article called "Run Privileged Code from Your Windows-based Program Using Call Gates." This may be used for performing direct port I/O under Windows NT (refer to "Direct Port I/O and Windows NT" by Dale Roberts, Dr.
Dobb's Journal of Software Tools, May 1996). The whole trick of running ring 0 instructions at ring 3 is based on the concept of callgates.

Callgates are mechanisms that facilitate controlled and secure communication from a lower privilege level to a higher privilege level. Right now we will consider the control transfer from ring 3 to ring 0, since Windows NT uses only these two privilege levels. It is as if you have ring 3 and ring 0 code on two sides of a callgate, with the callgate acting as an intermediary between the two. The callgate enables messages to pass from one ring to the other.

When creating a callgate, you have to specify the address of each side of the fence and the number of parameters to be passed from one side of the fence to the other. The privilege level of the callgate dictates which processes have access to it. When control is transferred through the callgate, the processor switches to the ring 0 stack. This stack is selected by looking at the TSS. The TSS contains the stack for each privilege level. After this, the processor pushes the ring 3 SS:ESP on this new stack. Then the processor copies the number of parameters specified by the callgate from the ring 3 stack to the ring 0 stack. Parameters are in terms of the number of DWORDs for 32-bit callgates and the number of WORDs for a 16-bit callgate. Further, the processor pushes the ring 3 CS:EIP onto the stack and jumps to the address specified in the callgate. The function at ring 0 is responsible for cleaning the parameters from the stack once it has finished executing. In the end, the ring 0 code should execute a retf nn instruction to clean up the stack and return control to the ring 3 code.

The sample accompanying this technique is based on the sample program PHYS.EXE demonstrated in Matt Pietrek's Windows 95 Programming Secrets (IDG Books Worldwide). The sample shows you how you can use the same trick under Windows NT.
The sample uses three undocumented functions in NTOSKRNL.EXE. These functions enable you to allocate and release selectors from the Global Descriptor Table (GDT) and modify the descriptor entries corresponding to the selectors. Use of the following undocumented functions prevents the need to directly manipulate Intel data structures such as the GDT.

NTSTATUS
KeI386AllocateGdtSelectors(
    unsigned short *SelectorArray,
    int NumberOfSelectors);

The function allocates the specified number of selectors from the GDT and fills in the SelectorArray with the allocated selector values. NTOSKRNL keeps a linked list of free selectors in the descriptors themselves. Also, NTOSKRNL keeps track of the number of free selectors. The function checks whether the specified number of selectors is present. If enough selectors are available, the function removes those selectors from the free list and gives the list to the caller. Interestingly, these functions are exported from the NTOSKRNL.EXE file, so any driver can use them. Other functions also enable descriptor queries and other tasks, but they are not exported.

NTSTATUS
KeI386ReleaseGdtSelectors(
    unsigned short *SelectorArray,
    int NumberOfSelectors);

The function releases the specified number of selectors. The selectors are specified in the array SelectorArray. The function updates the variable that keeps track of the number of selectors and inserts these selectors in the free list of selectors.

NTSTATUS
KeI386SetGdtSelector(
    unsigned long Selector,
    void *Descriptor);

This function fills in the descriptor corresponding to a particular selector. The second parameter should be a pointer to a descriptor entry.

How to Use the Callgate Technique

The following sample shows how you can perform direct-to-port I/O and run privileged instructions from a user-level application with the callgate technique. A device driver is provided that enables the user application to allocate and release the callgates.
The user-level application contains a function that does direct port I/O to get the base memory size and extended memory size from CMOS data. The application also prints the contents of CPU control registers such as CR0 and CR2. The instructions for accessing these registers are privileged.

The sample comprises three modules:

+ CALLGATE.SYS, which provides the functions for allocating and releasing the GDT selectors.
+ The user mode DLL called CGATEDLL.DLL, which provides wrappers for calling the functions in CALLGATE.SYS. This DLL uses DeviceIoControl to talk to CALLGATE.SYS.
+ The user mode application CGATEAPP.EXE, which uses wrappers in CGATEDLL.DLL to demonstrate the sample. CGATEAPP.EXE contains the function that does direct port I/O and tries to access the processor control registers.

The function in CGATEAPP.EXE that runs ring 0 code is written in Assembly language due to the restrictions imposed by the 32-bit compiler. These restrictions are discussed in Matt Pietrek's Windows 95 Programming Secrets, but we will summarize those points again. The function that is called through a callgate has to make a far return, whereas a standard 32-bit compiler generates a near return. Also, the function gets called as a far call, so the stack frame is not compatible with the one generated by a standard 32-bit compiler. The 32-bit compiler generates code in such a way that it expects the first parameter to be at [EBP+8] once it sets up the stack frame with PUSH EBP and MOV EBP, ESP.
However, because the function gets called as a far call, the first parameter is present at [EBP+0Ch].

[The source listings for CALLGATE.SYS, CGATEDLL.DLL, and CGATEAPP.EXE span pages 210 through 220 of the original book and were not captured in this extraction.]

Paging Issues

While writing the callgate sample, we observed that there are certain issues regarding accessing paged/swapped-out data in the interrupt routine and also in the function called through the callgate. All the existing interrupt handlers, such as the one for INT 2Eh, were seen to follow certain entry and exit code before performing any real work. Some of the tasks performed by the entry code were:

1. Creates some space on the stack.
2. Prepares a trap frame that will record the state of some of the CPU registers.
3. Saves away some of the fields in the Thread Environment Block, such as processor mode, and one field in the TEB which SoftICE calls "KSS EBP." We don't know the exact meaning of this, but it seems that each interrupt handler should set this field to the trap frame created in the previous step.
4. Saves away the contents of the FS register and sets the FS register to 0x30.

Out of all these steps, the first step is absolutely necessary and is related to the logic used by the page fault handler of the operating system. The page fault handler does some arithmetic on the current stack pointer and the stack pointer at the time of the ring transition from ring 3 to ring 0 and makes some decisions.
If at least a specific amount of stack space is not found between these two stack pointer values, then the system crashes with a Blue Screen.

It is essential that you follow this while writing interrupt handlers or functions executed through callgates if you want to successfully access paged-out data. The fourth step of setting the FS register to 0x30 is also necessary, since the system expects the FS register to point to the Processor Control Region when the thread is executing in ring 0, and the selector 0x30 points to the descriptor whose base address equals the address of the Processor Control Region.

Note that you have to follow the same steps while hooking software interrupts.

The second and third steps seem to be only for bookkeeping information. All the samples in this book that use callgates or interrupt handlers use macros defined in the UNDOCNT.INC file called Ring0Prolog and Ring0Epilog. These macros implement the code that takes care of these paging issues.

Summary

In this chapter, we detailed how interrupts are executed under Windows NT. Then we discussed a mechanism for adding new software interrupts. Along the way, we discussed some processor data structures used while processing the interrupt and presented an example that adds a software interrupt (0x22) to Windows NT. We also showed an example of an application that calls the newly added interrupt. After that, we discussed callgates, used for running ring 0 code from ring 3. This was followed by an example that demonstrated how to use callgates to read processor control registers such as CR0 and CR3 and do direct port I/O from ring 3. The chapter concluded with a discussion of the paging issues involved in executing functions through callgates and interrupt handlers.
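The interrupt-gate layout used throughout this chapter can be illustrated safely outside the kernel. The following Python sketch (our own illustration, not from the original book) packs an 8-byte 32-bit x86 interrupt-gate descriptor using the same field layout as the InterruptGate_t structure, and checks that the handler address round-trips through the OffsetLow/OffsetHigh split:

```python
import struct

def pack_interrupt_gate(handler_addr, selector, dpl=3, present=1):
    """Pack a 32-bit x86 interrupt gate (8 bytes) following the
    InterruptGate_t layout: OffsetLow, Selector, Reserved,
    type/attribute byte, OffsetHigh."""
    offset_low = handler_addr & 0xFFFF
    offset_high = (handler_addr >> 16) & 0xFFFF
    # Attribute byte, low bits first: SegmentType=0xE (32-bit interrupt
    # gate), SystemSegmentFlag=0, then Dpl (2 bits), then Present.
    attributes = 0x0E | (0 << 4) | (dpl << 5) | (present << 7)
    return struct.pack("<HHBBH", offset_low, selector, 0, attributes, offset_high)

def unpack_handler_address(gate_bytes):
    """Recover the handler address from a packed gate descriptor."""
    offset_low, _sel, _res, _attr, offset_high = struct.unpack("<HHBBH", gate_bytes)
    return (offset_high << 16) | offset_low

gate = pack_interrupt_gate(0x80123456, selector=0x08, dpl=3, present=1)
assert len(gate) == 8
assert unpack_handler_address(gate) == 0x80123456
```

A DPL of 3 here mirrors the INT 2Eh descriptor discussed earlier: it is what allows ring 3 code to issue the interrupt at all.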
[How To] Intel Optane Memory

Intel Optane Memory can be seen as a cache for a SATA device; by moving frequently used data to the Intel Optane module, it shortens the data access time for the paired device and accelerates system performance significantly. This guide shows you how to configure Intel Optane Memory and check its status.

Notes:
∙ MSI notebooks support 16GB and 32GB Intel Optane Memory modules.
∙ Intel Optane Memory acceleration supports only Windows 10 64-bit systems.
∙ PCIe NVMe devices, and any SATA devices that are part of a RAID volume, are not supported for Intel Optane Memory system acceleration.
∙ A single SATA device with multiple operating systems installed is not supported.

Outline
1. How to set up Intel Optane Memory?
2. How to correctly disable Intel Optane Memory?
3. How to check if Intel Optane Memory is configured properly?

*To learn more about Intel Optane Memory, visit the Intel website:
https:///content/www/us/en/support/articles/000024018/memory-and-storage/intel-optane-memory.html

1. How to set up Intel Optane Memory?
*Note: For a notebook originally bundled with Intel Optane Memory and a pre-installed system, all settings are already applied and configured.
(To check the status, go to 3.
How to check the status of Intel Optane.)

1.1. Update to the latest BIOS released on the MSI website.
Find the BIOS update file on the product download page for your notebook model; enter the product name of your notebook in the upper-right search column of the MSI website and select "Download" under your notebook model.

1.2. Update to the latest Intel Rapid Storage Technology released on the MSI website, or download the "SetupOptaneMemory" utility released on the Intel website.
For a system that has a RAID volume, use the Intel Rapid Storage Technology application to configure the Optane; otherwise, download and install the SetupOptaneMemory utility for the setup.
*Note: The Intel RST driver is included in both tools' installation files, so only one of them can be installed on the system.
(Ref: https:///content/www/us/en/support/articles/000024385/memory-and-storage.html)

1.3. Install the Intel Optane Memory module in the M.2 PCIe interface slot.
*Note:
1. Before buying Intel Optane Memory, make sure an additional empty M.2 PCIe slot is available for the upgrade.
2.
If you're not familiar with your system configuration or disassembly, we suggest you contact the MSI local service center or the online support team and let our technicians help you with the upgrade.

1.4. Manually format the Intel Optane Memory module to GPT.
*Skip this step if the Optane disk is properly recognized by the system.
*Note: Dynamic disks are not supported for Intel Optane.
Right-click on the Windows icon and select "Disk Management"; right-click on the unknown disk, select the partition style "GPT (GUID Partition Table)", and click OK to initialize the Optane.

1.5. Enable Intel Optane Memory on the system.

1.5.1. Intel® Rapid Storage Technology Utility:
*For a system that already has RAID volume acceleration and needs to configure Optane acceleration, run the Intel Rapid Storage Technology utility.
Run the Intel(R) Rapid Storage Technology application, select the Intel® Optane Memory tab, and click "Enable" to create the RAID volume for the Intel Optane Memory and the SATA device.
Select the SATA device (here we use a 1TB hard drive as an example) which you choose to configure with the Intel Optane Memory.

1.5.2. SetupOptaneMemory Utility:
*For a system that doesn't have a RAID volume, run SetupOptaneMemory to configure the system acceleration.
Run the SetupOptaneMemory utility; on the "Setup" page, select the SATA device (here we use a 1TB hard drive as an example) which you choose to configure with the Intel Optane Memory; click "Enable" to start the setup.
(It takes a few seconds until the setup is complete.)

1.6. Reboot the system to complete the setup.
*Warning*
A system RESTART is required after the setup completes. Do NOT shut down and power on the system after the setup, because there is a high risk that the system can become unstable or fail to boot without a proper system reboot after the setup.

1.6.1. Intel® Rapid Storage Technology Utility:
1.6.2. SetupOptaneMemory Utility:

2. How to correctly disable Intel Optane Memory?
2.1. Intel® Rapid Storage
Technology Utility:
2.1.1. Enter "Intel(R) Rapid Storage Technology" in the Windows Search column and run the application.
2.1.2. Click the "Disable" button under the Intel Optane Memory page.
2.1.3. Wait for the setup and follow the on-screen instructions to RESTART the system and complete the changes.

2.2. SetupOptaneMemory Utility:
2.2.1. Enter "SetupOptaneMemory" in the Windows Search column and run the application.
2.2.2. Click the "Disable" button under the Setup page.
2.2.3. Wait for the setup and follow the on-screen instructions to RESTART the system and complete the changes.

3. How to check if Intel Optane is configured properly?
3.1. If you have a pre-installed system and Optane, enter "Intel(R) Rapid Storage Technology" in the Windows Search column and run the application; click the Intel® Optane Memory button to check its current status.
3.2. If the Optane was configured by the SetupOptaneMemory utility, enter "SetupOptaneMemory" in the Windows Search column and run the application; select the Setup page to check the current status.
Computer Basics: Common English Messages (Error Warnings) Shown During the Power-On Self-Test, Explained, with Fixes

The first stage of booting a computer is the power-on self-test (how complete the displayed self-test information is varies by motherboard).
No matter which motherboard you have, though, if the self-test finds a problem at this stage, one of a variety of short English messages pops up.
Do not ignore these messages: they carry very important information, and once you can read them it is easy to fix the corresponding problem yourself.
Explanations and fixes for some common messages (error warnings) are given below for reference.
1. CMOS battery failed
Meaning: the CMOS battery has failed.
Fix: simply replace the motherboard's button-cell battery.
2. CMOS check sum error - Defaults loaded
Meaning: an error was found while verifying the CMOS checksum, so the system defaults will be loaded.
Fix: try a new motherboard button-cell battery first.
If the problem persists, the CMOS RAM itself may be faulty and must be replaced.
3.Press ESC to skip memory test中文:正在进行内存检查,可按ESC键跳过。
处理:这是因为在CMOS内没有设定跳过存储器的第二、三、四次测试,开机就会执行四次内存测试。
当然可以按 ESC键结束内存检查,不过每次都要这样太麻烦了。
你可以进入COMS设置后选择BIOS FEATURS SETUP,将其中的Quick Power OnSelf Test设为Enabled,储存后重新启动即可。
4.Keyboard error or no keyboard present中文:键盘错误或者未接键盘。
处理:检查键盘的连线是否松动或者损坏。
5.Hard disk install failure中文:硬盘安装失败。
处理:这是因为硬盘的电源线或数据线可能未接好或者硬盘跳线设置不当。
可以检查一下硬盘的各根连线是否插好,看看同一根数据线上的两个硬盘的跳线的设置是否一样,如果一样,只要将两个硬盘的跳线设置的不一样即可(一个设为Master,另一个设为Slave)。
6.Secondary slave hard fail中文:检测从盘失败处理:可能是CMOS设置不当,比如说没有从盘但在CMOS里设为有从盘,那么就会出现错误,这时可以进入COMS设置选择IDE HDD AUTODETECTION进行硬盘自动侦测。
电脑出现错误?看不懂英⽂?⼩编整理了电脑常见错误中英⽂翻译经常遇到电脑出现问题,但是⾃⼰⼜看不懂英⽂,⽆法和维修⼈员表达出电脑的问题。
⼩编今天帮你整理出电脑常见问题有中英对译,希望对你有帮助!建议收藏哦!⼀、BIOS中的提⽰信息提⽰信息说明Drive A error 驱动器A错误System halt 系统挂起Keyboard controller error 键盘控制器错误Keyboard error or no keyboard present 键盘错误或者键盘不存在BIOS ROM checksum error BIOSROM 校验错误Single hardisk cable fail 当硬盘使⽤Cable选项时硬盘安装位置不正确FDD Controller Failure BIOS 软盘控制器错误HDD Controller Failure BIOS 硬盘控制器错误Driver Error 驱动器错误Cache Memory Bad, Do not Enable Cache ⾼速缓存Cache损坏,不能使⽤Error: Unable to control A20 line 错误提⽰:不能使⽤A20地址控制线Memory write/Read failure 内存读写失败Memory allocation error 内存定位错误CMOS Battery state Low CMOS没电了Keyboard interface error 键盘接⼝错误Hard disk drive failure 加载硬盘失败Hard disk not present 硬盘不存在Floppy disk(s) fail (40) 软盘驱动器加载失败,⼀般是数据线插反,电源线没有插接,CMOS内部软驱设置错误CMOS checksum error-efaults loaded. CMOS校验错误,装⼊缺省(默认)设置打开应⽤保存⾼清⼤图⼆、BIOS刷新失败后,Bootblock启动时出现的提⽰信息提⽰信息说明Detecting floppy drive A media... 检测软驱A的格式Drive media is : 1.44Mb1.2Mb 720Kb 360K 驱动器格式是1.44Mb、12Mb、720kb、360kb的⼀种DISK BOOT FAILURE, INSERT SYSTEM DISK AND PRESS ENTER 磁盘引导失败,插⼊系统盘后按任意键继续三、MBR主引导区提⽰信息提⽰信息说明Invalid partition table ⽆效的分区表Error loading operating sy stem 不能装⼊引导系统Missing operating system 系统引导⽂件丢失说明:如果在计算机启动过程中,在硬件配置清单下⽅(也就时在平时正常启动时出现Starting Windows 98…的地⽅)出现不可识别字符,此时可判断硬盘分区表损坏。
Analyzing the Intel Itanium Memory Ordering Rules Using Logic Programming and SAT

Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, and Konrad Slind
School of Computing, University of Utah
{yyang,ganesh,gary,slind}@

Abstract. We present a non-operational approach to specifying and analyzing shared memory consistency models. The method uses higher order logic to capture a complete set of ordering constraints on execution traces, in an axiomatic style. A direct translation of the semantics to a constraint logic programming language provides an interactive and incremental framework for exercising and verifying finite test programs. The framework has also been adapted to generate equivalent boolean satisfiability (SAT) problems. These techniques make a memory model specification executable, a powerful feature lacked in most non-operational methods. As an example, we provide a concise formalization of the Intel Itanium memory model and show how constraint solving and SAT solving can be effectively applied for computer aided analysis. Encouraging initial results demonstrate the scalability for complex industrial designs.

1 Introduction

Modern shared memory architectures rely on a rich set of memory-access related instructions to provide the flexibility needed by software. For instance, the Intel Itanium(TM) processor family [1] provides two varieties of loads and stores in addition to fence and semaphore instructions, each associated with different ordering restrictions. A memory model defines the underlying memory ordering semantics (also known as memory consistency). Proper understanding of these ordering rules is essential for the correctness of shared memory consistency protocols that are aggressive in their ordering permissiveness, as well as for compiler transformations that rearrange multithreaded programs for higher concurrency and minimal synchronization. Due to the complexity of advanced computer architectures, however, practicing designers face a serious problem in reliably comprehending the memory
model specification.

Consider, for example, the assembly code shown in Fig. 1 that is run concurrently on two Itanium processors (such code fragments are generally known as litmus tests): The first processor, P1, executes a store of datum 1 into address a; it then performs a store-release(1) of datum 1 into address b. Processor P2 performs a load-acquire from b, loading the result into register r1. It is followed by an ordinary load from location a into register r2. The question arises: if all locations initially contain 0, can the final register values be r1 = 1 and r2 = 0?

    P1              P2
    st a,1;         ld.acq r1,b;
    st.rel b,1;     ld r2,a;

Fig. 1. A litmus test showing the ordering properties of store-release and load-acquire. Initially, a = b = 0. Can it result in r1 = 1 and r2 = 0? The Itanium memory model does not permit this result. However, if the load-acquire in P2 is changed to an ordinary load, the result would be allowed.

This work was supported by a grant from the Semiconductor Research Corporation for Task 1031.001, and Research Grants CCR-0081406 and CCR-0219805 of NSF.

To determine the answer, the Itanium memory model must be consulted. The formal specification of the Itanium memory model is given in an Intel application note [2]. It comprises a complex set of ordering rules, 24 of which are expressed explicitly based on a large amount of special terminology. One can follow a pencil-and-paper approach to reason that the execution shown in Fig. 1 is not permitted by the rules specified in [2]. Based on this, one can conclude that even though the instructions in P2 pertain to different addresses, the underlying hardware is not allowed to carry out the ordinary load at the beginning, and by the same token, a shared memory consistency protocol or an optimizing compiler cannot reorder the instructions in P2. A further investigation shows that the above result would be permitted if the st.rel in P1 is changed to a st, or the ld.acq in P2 is changed to a ld. Therefore, st.rel and ld.acq must both be used in pairs to achieve the "barrier" effect in this
scenario.

A litmus test like this can reveal crucial information to help system designers make right decisions in code selection and optimizations. But as bigger tests are used and more intricate rules are involved, trace properties quickly become non-intuitive and hand-proving program compliance can be very difficult. How can one be assured that there does not exist an interacting rule that might introduce unexpected implications? Also, a large scale design is often composed from simpler components. To avoid being overwhelmed by the overall complexity, a useful technique is to isolate the rules related to a specific architectural feature so that one can analyze the model piece by piece. For example, if one can selectively enable/disable certain rules, he or she may quickly find out that the "program order" rules in [2] are critical to the scenario in Fig. 1 while many others are irrelevant. These issues suggest that a series of useful features are needed from the specification framework to help people better understand the underlying model.
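To make the style of exhaustive litmus-test checking concrete, here is a toy sketch of the Fig. 1 test under plain sequential interleaving. This is our illustration, not the paper's tool, and it is NOT the Itanium model: under naive interleaving the acquire/release annotations play no role. It only demonstrates the kind of outcome enumeration that a litmus test calls for.

```python
from itertools import combinations

# Per-processor programs for the Fig. 1 test, kept in program order.
P1 = [("st", "a", 1), ("st", "b", 1)]        # st b stands in for st.rel b
P2 = [("ld", "r1", "b"), ("ld", "r2", "a")]  # ld r1 stands in for ld.acq

def outcomes():
    seen = set()
    n = len(P1) + len(P2)
    for slots in combinations(range(n), len(P1)):  # positions P1 occupies
        sched = [None] * n
        for pos, op in zip(slots, P1):             # keeps P1's program order
            sched[pos] = op
        rest = iter(P2)                            # P2 fills the gaps in order
        sched = [op if op else next(rest) for op in sched]
        mem, reg = {"a": 0, "b": 0}, {}
        for kind, x, y in sched:
            if kind == "st":
                mem[x] = y                         # store value y to location x
            else:
                reg[x] = mem[y]                    # load location y into register x
        seen.add((reg["r1"], reg["r2"]))
    return seen

print(sorted(outcomes()))  # → [(0, 0), (0, 1), (1, 1)]; (1, 0) never appears
```

Under this naive model the forbidden outcome r1 = 1, r2 = 0 never appears either, but the sketch cannot reproduce the relaxed behaviors (for instance, that replacing the ld.acq with an ordinary ld would allow the outcome); capturing exactly such distinctions is what a formal, executable memory model specification is for.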
Unfortunately, most non-operational specification methods leave these issues unresolved because they use notations that do not support analysis through execution.(1) Given that designers need lucid and reliable memory model specifications, and given that memory model specifications can live for decades, it is crucial that progress be made in this regard.

(1) Briefly, a store-release instruction will, at its completion, ensure that all previous instructions are completed; a load-acquire instruction correspondingly ensures that all following instructions will complete only after it completes. These explanations are far from precise: what does "previous" and "completion" mean? A formal specification of a memory model is key to precisely capture these and all similar notions.

In this paper, we take a fresh look at the non-operational specification method and explore what verification techniques can be applied. We make the following contributions in this paper. First, we present a compositional method to axiomatically capture all aspects of the memory ordering requirements, resulting in a comprehensive, constraint-based memory consistency model. Second, we propose a method to encode these specifications using FD-Prolog.(2) This enables one to perform interactive and incremental analysis. Third, we have harnessed a boolean satisfiability checker to solve the constraints. To the best of our knowledge, this is the first application of SAT methods for analyzing memory model compliance. As a case study in this approach, we have formalized a large subset of the Itanium memory model and used constraint programming and boolean satisfiability for program analysis.

Related work. The area of memory model specification has been pursued under different approaches. Some researchers have employed operational style specifications [3][4][5][6], in which the update of a global state is defined step-by-step with the execution of each instruction. For example, an operational model [4] for Sparc V9 [7] was developed in Murphi. With the
model checking capability supported by Murphi, this executable model was used to examine many code sequences from the Sparc V9 architecture book. While the descriptions comprising an operational specification often mirror the decision process of an implementer and can be exploited by a model checker, they are not declarative. Hence they tend to emphasize the how aspects through their usage of specific data structures, not the what aspects that formal specifications are supposed to emphasize.

(2) FD-Prolog refers to Prolog with a finite domain (FD) constraint solver. For example, SICStus Prolog and GNU Prolog have this feature.

Fig. 2. The process of making an axiomatic memory model executable. Legality of a litmus test can be checked by either a constraint solver or a SAT solver.

Other researchers have used non-operational (also known as axiomatic) specifications, in which the desired properties are directly defined. Non-operational styles have been widely used to describe conceptual memory models [8][9]. One noticeable limitation of these specifications is the lack of a means for automatic execution. An axiomatic specification of the Alpha memory model was written by Yu [10] in Lisp in 1995. Litmus-tests were written in an S-expression syntax. Verification conditions were generated for the litmus tests and fed to the Simplify [11] verifier of Compaq/SRC. In contrast, we specify the modern Itanium memory model. Our specification is much closer to the actual industrial specification, thanks to the declarative nature of FD-Prolog. The FD constraint solver offers a more interactive and incremental environment. We have also applied SAT and demonstrated its effectiveness.

Lamport and colleagues have specified the Alpha and Itanium memory models in TLA+ [12][13]. These specifications also support the execution of litmus tests. Their approach builds visibility order inductively. While this also precisely specifies the visibility order, the manner in which such inductive definitions are constructed will vary
from memory model to memory model, making comparisons among them harder. Our method instead relies on primitive relations and directly describes the components to make up a full memory model. This makes our specifications easier to understand, and more importantly, to compare against other memory models using the same primitives. This also means we can disable some sub-rules quite reliably without affecting the other primitive ordering rules, a danger in a style which merges all the ordering concerns in a monolithic manner.

Roadmap. In the next section, we introduce our methodology. Section 3 describes the Itanium memory ordering rules. Section 4 demonstrates the analysis of the Itanium memory model through execution. We conclude and propose future works in Section 5. The concise specification of the Itanium ordering constraints is provided in the Appendix.

2 Overview of the Framework

A pictorial representation of our methodology is shown in Fig. 2. We use a collection of primitive ordering rules, each serving a clear purpose, to specify even the most challenging commercial memory models. This approach mirrors the style adopted in modern declarative specifications written by the industry, such as [2]. Moreover, by using pure logic programs supported by certain modern flavors of Prolog that also include finite domain constraints, one can directly capture these higher order logic specifications and also interactively execute the specifications to obtain execution results for litmus tests. Alternatively, we can obtain SAT instances of the boolean constraints representing the memory model through symbolic execution, in which case boolean satisfiability tools can be employed to quickly answer whether certain litmus tests are legal or not.

2.1 Specification Method

To define a memory model, we use predicate calculus to specify all constraints imposed on an ordering relation order. The constraints are almost completely first-order; however, since order is a parameter to the specification, the constraints are
most easily captured with higher order predicate calculus (we use the HOL logic [14]). Previous non-operational specifications often implicitly require general ordering properties, such as totality, transitivity, and circuit-freeness. This is the main reason why such specifications cannot readily be executed. In contrast, we are fully explicit about such properties, and so our constraints completely characterize the memory model.

2.2 Executing Axiomatic Specifications

A straightforward transcription of the formal predicate calculus specification into a Prolog-style logic program makes it possible for interactive and incremental execution of litmus tests. This encourages exploration and experiment in the validation and (we anticipate) the development of complex coherence protocols. To make a specification executable, we instantiate it over a finite execution and convert the verification problem to a satisfiability problem.

The Algorithm. Given a finite execution ops with n operations, there are n^2 ordering pairs, constituting an ordering matrix M, where the element M_ij indicates whether operations i and j should be ordered. We go through each ordering rule in the specification and impose the corresponding constraint regarding the elements of M. Then we check the satisfiability of all the ordering requirements. If such an M exists, the trace ops is legal, and a valid interleaving can be derived from M. Otherwise, ops is not a legal trace.

Applying Constraint Logic Programming. Logic programming differs fundamentally from conventional programming in that it describes the logical structure of the problems rather than prescribing the detailed steps of solving them.
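The matrix-based algorithm can be prototyped outside Prolog as well. The sketch below (our code, not the paper's) checks the general ordering properties of a candidate matrix M, where M[i][j] == 1 means operation i is ordered before operation j, and derives an interleaving from a matrix that passes.

```python
def is_irreflexive_total(M):
    # No element is ordered before itself, and every distinct pair is
    # ordered exactly one way.
    n = len(M)
    return (all(M[i][i] == 0 for i in range(n)) and
            all(M[i][j] + M[j][i] == 1
                for i in range(n) for j in range(n) if i != j))

def is_transitive(M):
    # i before j and j before k imply i before k.
    n = len(M)
    return all(not (M[i][j] and M[j][k]) or M[i][k]
               for i in range(n) for j in range(n) for k in range(n))

def interleaving(M):
    # In a strict total order, an operation's position is determined by
    # how many operations it precedes (its row sum).
    return sorted(range(len(M)), key=lambda i: -sum(M[i]))

M = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
assert is_irreflexive_total(M) and is_transitive(M)
print(interleaving(M))  # → [0, 1, 2]
```

A brute-force search over all candidate matrices is exponential, of course; a constraint logic program instead states the same requirements declaratively and leaves the search to the solver.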
This naturally reflects the philosophy of the axiomatic specification style. As a result, our formal specification can be easily encoded using Prolog. Memory ordering constraints can be solved through a conjunction of two mechanisms that FD-Prolog readily provides. One applies backtracking search for all constraints expressed by logical variables, and the other uses non-backtracking constraint solving based on arc consistency [15] for FD variables, which is potentially more efficient and certainly more complete (especially under the presence of negation) than with logical variables. This works by adding constraints in a monotonically increasing manner to a constraint store, with the in-built constraint propagation rules of FD-Prolog helping refine the variable ranges (or concluding that the constraints are not satisfiable) when constraints are discovered and asserted to the constraint store.

Applying Boolean Satisfiability Techniques. The goal of a boolean satisfiability problem is to determine a satisfying variable assignment for a boolean formula or to conclude that no such assignment exists. A slight variant of the Prolog code can let us benefit from SAT solving techniques, which have advanced tremendously in recent years. Instead of solving constraints using a FD solver, we can let Prolog emit SAT instances through symbolic execution. The resultant formula is true if and only if the litmus test is legal under the memory model.
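The shape of such an emitted instance can be illustrated with a small sketch (our encoding and variable numbering, not the paper's exact one): boolean variable x(i, j) stands for "operation i is ordered before operation j", and the general ordering requirements alone already yield the following clauses in DIMACS CNF format.

```python
def var(i, j, n):
    return i * n + j + 1                     # DIMACS variables are 1-based

def linear_order_cnf(n):
    clauses = []
    for i in range(n):
        clauses.append([-var(i, i, n)])      # irreflexivity: not x(i,i)
        for j in range(n):
            if j == i:
                continue
            clauses.append([var(i, j, n), var(j, i, n)])    # totality
            clauses.append([-var(i, j, n), -var(j, i, n)])  # asymmetry
            for k in range(n):
                if k in (i, j):
                    continue
                # transitivity: x(i,j) and x(j,k) imply x(i,k)
                clauses.append([-var(i, j, n), -var(j, k, n), var(i, k, n)])
    return clauses

def dimacs(clauses, nvars):
    lines = ["p cnf %d %d" % (nvars, len(clauses))]
    lines += [" ".join(map(str, c)) + " 0" for c in clauses]
    return "\n".join(lines)

cnf = linear_order_cnf(3)                    # 3 operations, 9 variables
print(dimacs(cnf, 9).splitlines()[0])        # → p cnf 9 21
```

The per-rule constraints (program order, read values, and so on) would be conjoined as further clauses over the same variables before the file is handed to the solver.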
It is then sent to a SAT solver to find out the result.

3 Specifying the Itanium Memory Consistency Model

The original Itanium memory ordering specification is informally given in various places in the Itanium architecture manual [1]. Intel later provided an application note [2] to guide system developers. This document uses a combination of English and informal mathematics to specify a core subset of memory operations in a non-operational style. We demonstrate how the specification of [2] can be adapted to our framework to enable computer aided analysis. Virtually the entire Intel application note has been captured.(3) We assume proper address alignment and common address size for all memory accesses, which would be the common case encountered by programmers (even these restrictions could be easily lifted). The detailed definition of the Itanium memory model is presented in the Appendix. This section explains each of the rules. The following definitions are used throughout this paper:

Instructions. Instructions with memory access or memory ordering semantics. Five instruction types are defined in this paper: load-acquire (ld.acq), store-release (st.rel), unordered load (ld), unordered store (st), and memory fence (mf). An instruction i may have read semantics (isRd i = true) or write semantics (isWr i = true). Ld.acq and ld have read semantics. St.rel and st have write semantics. Mf has neither read nor write semantics. Instructions are decomposed into operations to allow a finer specification of the ordering properties.
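The decomposition can be pictured with a small sketch of our own (field names abbreviated from the operation-tuple attributes; the Reg/UseReg fields are omitted and the per-processor program counter is simplified to a global index): reads and fences become one operation each, while a write becomes a local write plus one remote write per target processor, including the issuer, all sharing a write ID.

```python
from collections import namedtuple

Op = namedtuple("Op", "p pc op var data wrid wrtype wrproc id")

def decompose(instrs, procs):
    ops, next_id = [], 0
    for pc, (p, kind, var, data) in enumerate(instrs):
        if kind in ("st", "st.rel"):
            wrid = pc                        # all pieces share the write ID
            targets = [("Local", p)] + [("Remote", q) for q in procs]
            for wrtype, target in targets:
                ops.append(Op(p, pc, kind, var, data, wrid, wrtype, target, next_id))
                next_id += 1
        else:                                # ld, ld.acq, mf: one operation
            ops.append(Op(p, pc, kind, var, data, None, None, None, next_id))
            next_id += 1
    return ops

# The Fig. 1 program on two processors:
prog = [(1, "st", "a", 1), (1, "st.rel", "b", 1),
        (2, "ld.acq", "b", 1), (2, "ld", "a", 0)]
ops = decompose(prog, procs=[1, 2])
print(len(ops))  # → 8: two stores of 3 operations each, plus two loads
```

The eight resulting operations match the decomposed execution shown later in Fig. 3, where each store splits into a local store and two remote stores.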
Execution. Also known as a trace, contains all memory operations generated by a program, with stores being annotated with the write data and loads being annotated with the return data. An execution is legal if there exists an order among the operations in the execution that satisfies all memory model constraints.

Address Attributes. Every memory location is associated with an address attribute, which can be write-back (WB), uncacheable (UC), or write-coalescing (WC). Memory ordering semantics may vary for different attributes. Predicate attribute is used to find the attribute of a location.

Operation Tuple. A tuple containing necessary attributes is used to mathematically describe memory operations. Memory operation i is represented by a tuple <P, Pc, Op, Var, Data, WrId, WrType, WrProc, Reg, UseReg, Id>, where

p_i = P: issuing processor
pc_i = Pc: program counter
op_i = Op: instruction type
var_i = Var: shared memory location
data_i = Data: data value
wrID_i = WrId: identifier of a write operation
wrType_i = WrType: type of a write operation
wrProc_i = WrProc: target processor of a write operation
reg_i = Reg: register
useReg_i = UseReg: flag of a write indicating if it uses a register
id_i = Id: global identifier of the operation

(3) We have formally captured 21 out of 24 rules from [2]. Semaphore operations, which require 3 rules, have yet to be defined.

Table 1. The specification hierarchy of the Itanium memory ordering rules.

requireLinearOrder: requireIrreflexiveTotal, requireTransitive, requireAsymmetric
requireWriteOperationOrder: local/remote case, remote/remote case
requireProgramOrder: acquire case, release case, fence case
requireMemoryDataDependence: MD:RAW, MD:WAR, MD:WAW
requireDataFlowDependence: DF:RAR, DF:RAW, DF:WAR
requireCoherence: local/local case, remote/remote case
requireReadValue: validWr, validLocalWr, validRemoteWr, validDefaultWr, validRd (RAR, RAW, WAR, WAW cases)
requireAtomicWBRelease
requireSequentialUC
requireNoUCBypass

A read instruction or a fence instruction is decomposed into a single operation. A write instruction is
decomposed into multiple operations, comprising a local write operation (wrType_i = Local) and a set of remote write operations (wrType_i = Remote) for each target processor (wrProc_i), which also includes the issuing processor. Every write operation i that originates from a single write instruction shares the same program counter (pc_i) and write ID (wrID_i).

3.1 The Itanium Memory Ordering Rules

As shown below, predicate legal is a top-level constraint that defines the legality of a trace ops by checking the existence of an order among ops that satisfies all requirements. Each requirement is formally defined in the Appendix.

legal ops ≡ ∃ order.
    requireLinearOrder ops order ∧
    requireWriteOperationOrder ops order ∧
    requireProgramOrder ops order ∧
    requireMemoryDataDependence ops order ∧
    requireDataFlowDependence ops order ∧
    requireCoherence ops order ∧
    requireReadValue ops order ∧
    requireAtomicWBRelease ops order ∧
    requireSequentialUC ops order ∧
    requireNoUCBypass ops order

Table 1 illustrates the hierarchy of the Itanium memory model definition.
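The compositional shape of this predicate is easy to mirror in executable form. In the sketch below (ours, not the paper's Prolog; the rule bodies are simplified placeholders, with the real definitions in the Appendix), legality is an existential search for an order satisfying the conjunction of a pluggable set of rules:

```python
from itertools import permutations

def require_linear_order(ops, order):
    # A permutation of the operation indices is a strict total order.
    return sorted(order) == list(range(len(ops)))

def require_program_order_acquire(ops, order):
    # Simplified acquire case: on one processor, nothing issued after a
    # ld.acq (here, later in the ops list) may be placed before it.
    pos = {i: k for k, i in enumerate(order)}
    return all(pos[i] < pos[j]
               for i in range(len(ops)) for j in range(len(ops))
               if i < j and ops[i][0] == ops[j][0]   # same processor
               and ops[i][1] == "ld.acq")

RULES = {
    "requireLinearOrder": require_linear_order,
    "requireProgramOrder": require_program_order_acquire,
    # ... the remaining rules would be registered here the same way
}

def legal(ops, rules=RULES):
    # Existence of a satisfying order, by brute force over permutations.
    return any(all(r(ops, list(order)) for r in rules.values())
               for order in permutations(range(len(ops))))

print(legal([(1, "ld.acq"), (1, "ld")]))  # → True
```

Because the rules live in a dictionary, selectively disabling one for experimentation, as done with requireProgramOrder in Section 4, is a one-line deletion rather than a change to a monolithic definition.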
Most constraints strictly follow the rules from [2]. We also explicitly add a predicate requireLinearOrder to capture the general ordering requirement since [2] has only English to convey this important ordering property.

General Ordering Requirement (Appendix A.1). This requires order to be an irreflexive total order which is also circuit-free.

Write Operation Order (Appendix A.2). This specifies the ordering among write operations that originate from a single write instruction. It guarantees that no write can become visible remotely before it becomes visible locally.

Program Order (Appendix A.3). This restricts reordering among instructions of the same processor with respect to the program order.

Memory-Data Dependence (Appendix A.4). This restricts reordering among instructions from the same processor when they access common locations.

Data-Flow Dependence (Appendix A.5). This is supposed to specify how local data dependency and control dependency should be treated. However, this is an area that is not fully specified in [2]. Instead of pointing to an informal document as done in [2], we provide a formal specification covering most cases of data dependency, namely establishing data dependency between two memory operations by checking the conflict usages of local registers.(4) Although [2] outlines four categories for data-flow dependency (RAR, RAW, WAR, and WAW), the WAW case (a write here is actually a read in terms of register usage, e.g., st a,r) does not establish any value-based data dependence relation. Therefore, data dependency as specified in orderedByLocalDependence is only set up by the first three cases.

(4) We do not cover branch instructions or indirect-mode instructions that also induce data dependency. We provide enough data dependency specification to let designers experiment with straight-line code that uses registers; this is an important requirement to support execution.

Coherence (Appendix A.6). This constrains the order of writes to a common location. If two writes to the same location with the
attribute of WB or UC become visible to a processor in some order, they must become visible to all processors in that order.

Read Value (Appendix A.7). This defines what data can be observed by a read operation. There are three scenarios: a read can get the data from a local write (validLocalWr), a remote write (validRemoteWr), or the default value (validDefaultWr). In validRemoteWr we require that "the read is not ordered with the candidate remote write". It is slightly different from [2], which requires that "the candidate write is ordered with the read". This results from the difference in the way the ordering path is constructed. Since we do not have an explicit rule that establishes the order when a read gets the value from a write, the ordering relation between them would not pre-exist. In [2], a total order is implicitly imposed. Similar to shared memory read value rules, predicate validRd guarantees consistent assignments of registers: the value of a register is obtained from the most recent previous assignment of the same register.

Total Ordering of WB Releases (Appendix A.8). This specifies that store-releases to write-back (WB) memory must obey remote write atomicity, i.e., they become remotely visible atomically.

Sequentiality of UC Operations (Appendix A.9). This specifies that operations to uncacheable (UC) memory locations must have the property of sequentiality, i.e., they must become visible in program order.

No UC Bypassing (Appendix A.10). This specifies that uncacheable (UC) memory is not cacheable and does not allow local bypassing from UC writes.

4 Making the Itanium Memory Model Executable

We have developed two methods to analyze the Itanium memory model. The first, as mentioned earlier, uses Prolog backtracking search, augmented with finite-domain constraint solving. The second approach targets the powerful SAT engines that have recently emerged.

The Constraint Logic Programming Approach

Our formal Itanium specification is implemented in SICStus Prolog [16]. Litmus tests are contained in a
separate test file.(5) When a test number is selected, the FD constraint solver examines all constraints automatically and answers whether the selected execution is legal. By running the litmus tests we can learn the degree to which executions are constrained, i.e., we can obtain a general view of the global ordering relation between pairs of instructions.

(5) We have verified most of the sample programs provided by [2]. The only 3 (out of 17) examples we cannot do at this point involve disjoint accesses to memory locations. Other litmus tests can also be easily added.

    P1                        P2
    (1) st_local(a,1);        (7) ld.acq(1,b);
    (2) st_remote1(a,1);      (8) ld(0,a);
    (3) st_remote2(a,1);
    (4) st.rel_local(b,1);
    (5) st.rel_remote1(b,1);
    (6) st.rel_remote2(b,1);

Fig. 3. An execution resulting from the program in Fig. 1. Stores are decomposed into local stores and remote stores. Loads are associated with return values.

Consider, for example, the program discussed earlier in Fig. 1. Its instructions are decomposed into operations as shown in Fig. 3. After taking this trace as input, the Prolog tool attempts all possible orders until it can find an instantiation that satisfies all constraints. For this particular example, it returns "illegal trace" as the result. If one comments out the requireProgramOrder rule and examines the trace again, the tool quickly finds a legal ordering matrix and a corresponding interleaving as shown in Fig. 4. Many other experiments can be conveniently performed in a similar way. Therefore, not only does this approach give people the notation to write rigorous as well as readable specifications, it also allows users to play with the model, asking "what if" queries after selectively enabling/disabling the ordering rules that are crucial to their work.

Although translating the formal specification to Prolog is fairly straightforward, there does exist some "logic gap" between predicate calculus and Prolog.
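The quantifier side of this gap can be shown in miniature in any executable encoding over a finite trace: an axiomatic rule is a quantified formula, while its executable form enumerates the domain. For instance, a simplified memory-data-dependence rule (our naming and simplification, not the paper's exact Appendix definition) states

    ∀ i, j ∈ ops. p_i = p_j ∧ var_i = var_j ∧ pc_i < pc_j ⟹ order(i, j)

and its enumerated counterpart is:

```python
def md_dependence_ok(ops, before):
    # ops[i] = (proc, pc, var); before(i, j) is the candidate ordering
    # relation. The universal quantifier becomes exhaustive enumeration
    # of all index pairs in the finite trace.
    idx = range(len(ops))
    return all(before(i, j)
               for i in idx for j in idx
               if ops[i][0] == ops[j][0]       # same processor
               and ops[i][2] == ops[j][2]      # same location
               and ops[i][1] < ops[j][1])      # earlier in program order

ops = [(1, 0, "a"), (1, 1, "a"), (2, 0, "b")]
assert md_dependence_ok(ops, lambda i, j: i < j)      # program-ordered trace
assert not md_dependence_ok(ops, lambda i, j: i > j)  # reversed ordering fails
```

The existential quantifier over order, in turn, becomes a search for some relation making every such enumerated check succeed.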
Most Prolog systems do not directly support quantifiers. Therefore, we need to implement the effect of a universal quantifier by enumerating the related finite domain. The existential quantifier is realized by the backtracking mechanism of Prolog when proper predicate conditions are set.

       1  2  3  4  5  6  7  8
    1  0  1  1  0  0  0  0  0
    2  0  0  1  0  0  0  0  0
    3  0  0  0  0  0  0  0  0
    4  1  1  1  0  1  1  1  0
    5  1  1  1  0  0  1  1  0
    6  1  1  1  0  0  0  1  0
    7  1  1  1  0  0  0  0  0
    8  1  1  1  1  1  1  1  0

Fig. 4. A legal ordering matrix for the execution shown in Fig. 3 when requireProgramOrder is disabled. A value 1 indicates that the two operations are ordered. A possible interleaving 8 4 5 6 7 1 2 3 is also automatically derived from this matrix.

The SAT Approach

As an alternative method, we use our Prolog program as a driver to emit propositional formulae asserting the solvability of the constraints. After being converted to a standard format called DIMACS, the final formula is sent to a SAT solver, such as berkmin [17] or zChaff [18]. Although the clause-generation phase can be detached from the logic programming approach, the ability to have it coexist with FD-Prolog might be advantageous since it allows the two methods to share the same specification base. The complexity of boolean satisfiability is NP-Complete. However, tremendous progress has been achieved in recent years in SAT tools, making SAT solving an effective technique for industrial applications. According to our initial results, this seems to offer an encouraging path to tackling larger problems.

Performance Results

Performance statistics from some litmus tests is shown below. These tests are chosen from [2] and represented by their original table numbers. Performance is measured on a Dell Inspiron 3800 machine with 512MB memory and a 700MHz CPU.
SICStus Prolog is run under compiled mode. The SAT solver used is berkmin.

    Test           Result    FD Solver (sec)   Vars   Clauses   SAT (sec)   CNF Gen Time
    [2, Table 5]   illegal   0.38              64     679       0.03        negligible
    [2, Table 10]  legal     2.36              100    1280      0.01        negligible
    [2, Table 15]  illegal   17.7              576    15706     0.05        a minute
    [2, Table 18]  illegal   1.9               144    2125      0.01        few secs
    [2, Table 19]  legal     3.8               144    2044      0.02        few secs

5 Conclusions

The setting in which contemporary memory models are expressed and analyzed needs to be improved. Towards this, we present a framework based on axiomatic specifications (expressed in higher order logic) of memory ordering requirements. It is straightforward to encode these requirements as constraint logic programs or, by an extra level of translation, as boolean satisfiability problems. In the latter case, one can employ current SAT tools to quickly answer whether certain executions are permitted or not. Our techniques are demonstrated through the adaptation and analysis of the Itanium memory model. Being able to tackle such a complex design also attests to the scalability of our framework for cutting-edge commercial architectures.

Our methodology provides several benefits. First, the ability to execute the underlying model is a powerful feature that promotes understanding. Second, the compositional specification style provides modularity, reusability, and scalability. It also allows one to add constraints incrementally for investigation purposes. Third, the expressive power of the underlying logic allows one to define a wide