
Innovation_NoelHurley


Innovation and the ARM Architecture
Noel Hurley, Manager, CPU Product Marketing
Growth of Software (DTV)

Performance Growth Continues
Media-based products drive
– More complex OS
– More complex applications
– E-commerce
More applications running simultaneously
More gaming in devices
Widening gap between core and memory speeds
Greater pressure on predictability
Increased performance and architectural efficiency
[Figure: 3D Graphics Requirements. Thousands of triangles per second (lit, smooth shaded, textured), 2003 to 2008, for high-end and low-end systems.]
The Need for Security
mCommerce on the rise
– Need for secure transactions
– Prevention of identity theft
Mobile devices used as business platforms
– Need for data privacy
– Prevention of data corruption
– Guarding against viruses
Wireless service losses
– Illegal unlocking of phones
– Over $1B per year
[Figure: Wireless Financial Payments. Users in millions, 1999 to 2004, for Asia, Europe and USA. Source: Celent Communications.]
“The real issue with viruses, worms and other Internet crime is identity. Can the identity of the data sender be reliably authenticated? Can the user or operating system determine that a program is safe to run, prior to its execution?”
- Kerry Maletsky, Atmel Corporation

Deep Submicron…New Rules
Technology scaling trends are not in our favour
– Leakage power
– New processes are expensive
– Diminishing performance gains from process scaling
– Dynamic power remains high
Solutions need to cut across traditional boundaries
– (SW / architecture / micro-architecture / circuits)
– NEON, IEM, TrustZone, OptimoDE…
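[Figure: Normalized total chip power dissipation by technology node, 1990 to 2020, showing dynamic power, sub-threshold leakage and gate-oxide leakage against physical gate length [nm], with a possible trajectory if high-k dielectrics reach mainstream production. Data from ITRS roadmap.]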
The Architecture for the Digital World
Software
Tools
Engines
Systems

Innovation in the Data Plane
[Diagram: a soft-to-hard spectrum from the control plane to the data plane, spanning RISC cores, digital signal processing, engines and dedicated logic, annotated with design flexibility, design reuse, time to market, power, performance and area.]
Engines
– Configurable: optimal performance and efficiency
– Programmable: optimal flexibility and risk mitigation
– VLIW: optimal performance through parallelism
Architectural Innovation
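Increased performance AND lower system cost
2004: ARMv7: NEON, ARM Cortex family. Enhanced media and digital signal processing improves system performance; enhanced power efficiency and security features for the next generation of mobile devices.
2003: ARMv6: TrustZone, IEM*. Enhanced security at the heart of the device.
2003: ARMv6T2: Thumb-2 ISA. Improved code density reduces system cost.
2002: ARMv6: SIMD Media ISA. Enhanced media and digital signal processing improves system performance.
2000: ARMv5TEJ: Jazelle. Enhanced Java performance to meet market requirements.
1998: ARMv5TE: DSP instructions. Enhanced media and digital signal processing improves system performance.
1995: ARMv4T: Thumb ISA. Improved code density reduces system cost.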

Thumb-2 - Instruction Set Innovation
Second generation of the Thumb architecture
– Blended 16-bit and 32-bit instruction set
– 25% faster than Thumb
– 26% smaller than ARM
Fully binary compatible with existing Thumb code
Increases performance but maintains code density
Maximizes cache and tightly coupled memory usage
Eliminates mode switching overhead when using floating point and interrupts
[Figures: EEMBC Analysis, Performance; EEMBC Analysis, Code Size]
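As a rough illustration of what the two headline Thumb-2 figures mean in practice, the sketch below applies them to a code image and a run time. The function names are hypothetical, and the reading of "25% faster" as a 1.25x throughput gain is an assumption; the slide does not say how the percentages were measured beyond citing EEMBC analysis.

```python
# Percentages quoted on the slide; everything else here is illustrative.
THUMB2_SIZE_REDUCTION_PCT = 26   # "26% smaller than ARM"
THUMB2_SPEEDUP_PCT = 25          # "25% faster than Thumb"


def thumb2_code_size(arm_bytes: int) -> int:
    """Estimated Thumb-2 image size for code that takes arm_bytes as ARM code."""
    return arm_bytes * (100 - THUMB2_SIZE_REDUCTION_PCT) // 100


def thumb2_runtime(thumb_seconds: float) -> float:
    """Estimated Thumb-2 run time, assuming 25% faster means 1.25x throughput."""
    return thumb_seconds * 100 / (100 + THUMB2_SPEEDUP_PCT)


if __name__ == "__main__":
    print(thumb2_code_size(100_000))  # bytes for a 100,000-byte ARM image
    print(thumb2_runtime(1.0))        # seconds for a 1-second Thumb workload
```

Integer percentages are used for the size estimate so the result is not affected by floating-point rounding.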
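NEON - Innovation in Media & DSP
Efficient digital signal processing
– More than 2x ARM11 and more than 3x ARM9E on media and digital signal processing
Efficient data handling
– Optimized representation of data in registers
– Optimal use of the available memory bandwidth
– Reduced data-placement overhead
– Operates on a separate register file
  • Extends the VFP register bank to 32 x 64-bit registers
– Supports integer, fixed-point and floating-point data
Tight integration gives a simple programming model
C-oriented programming model
[Diagram: ARM core with NEON, bus matrix, memory, DMA and peripheral.]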

IEM – Innovation in Power
Hardware and software solution for energy management by dynamic control of voltage and frequency scaling
[Diagram: applications, OS and IEM software (multiple policies and a policy evaluation stack) pass the required performance to the Intelligent Energy Controller, which sets volts and MHz through the Dynamic Voltage Controller and Dynamic Clock Generator.]
IEM software connects to OS kernel and collects data
Multiple policies categorize the software workload
Prediction of future performance requirement is made
Suitable operating point (voltage and frequency) is set
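The four steps above can be sketched as a small policy loop. This is only an illustrative model, not ARM's IEM software: the operating-point table, the two policies and the rule of taking the most demanding prediction are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    mhz: int
    volts: float


# Hypothetical voltage/frequency pairs; real values depend on the silicon.
OPERATING_POINTS = (
    OperatingPoint(100, 0.9),
    OperatingPoint(200, 1.0),
    OperatingPoint(400, 1.2),
)

Policy = Callable[[Sequence[float]], float]


def average_policy(load_mhz: Sequence[float]) -> float:
    """Predict the next requirement as the mean of the observed load."""
    return sum(load_mhz) / len(load_mhz)


def recent_peak_policy(load_mhz: Sequence[float], window: int = 3) -> float:
    """Predict the next requirement as the peak of the last few samples."""
    return max(load_mhz[-window:])


def choose_operating_point(
    load_mhz: Sequence[float],
    policies: Sequence[Policy] = (average_policy, recent_peak_policy),
    points: Sequence[OperatingPoint] = OPERATING_POINTS,
) -> OperatingPoint:
    """Pick the slowest point that satisfies every policy's prediction.

    load_mhz is a non-empty history of workload samples (in MHz of demand),
    standing in for the data collected from the OS kernel.
    """
    required = max(policy(load_mhz) for policy in policies)
    for point in sorted(points, key=lambda p: p.mhz):
        if point.mhz >= required:
            return point
    # Demand exceeds the fastest point: run flat out.
    return max(points, key=lambda p: p.mhz)


if __name__ == "__main__":
    print(choose_operating_point([50, 60, 150]))
```

Choosing the lowest sufficient point is what saves energy in this model, since the lower-frequency points in the table also run at lower voltage.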
TrustZone – Innovation in Security
Hardware support required to enable Trusted Secure Kernel
Secure Kernel allows the use of authenticated dynamically loaded applications
[Diagram: Normal side with general apps, media app, OS and access driver; Trusted side with DRM, Trusted Interpreter (STIP*), security services and secure kernel; TrustZone Monitor Software between them.]
* As defined by the STIP Consortium and GlobalPlatform.

CPU and Systems Innovation
ARM11 Family
MPCore
ARM Cortex Family
Engines
PrimeXsys and the Backplane
The ARM11 Family
Application Processors
ARM1136J(F)-S, ARM1176JZ(F)-S
Embedded
ARM1156T2(F)-S
ARM1176J(F)-S

ARM11 Family – SIMD
[Figure: ARM11 SIMD improvement using Emuzed codecs. Improvement per MHz (scale 1.00 to 2.00), ARM11 vs ARM9E and ARM11 vs ARM7, for MPEG4 AAC, MPEG4 AAC LTP, AAC Dec, MP3 Dec, MJPEG Dec, MJPEG Enc, JPEG Dec (1024x768 10:1), JPEG Enc (1024x768 10:1), H.264 Baseline Dec, H.264 Baseline Enc, H.263 Baseline Dec, H.263 Baseline Enc, MPEG4 SP Decode and MPEG4 SP Encode.]
ARM1176JZ(F)-S
A 550MHz Security Enhanced ARM11 Core
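ARMv6Z architecture
– Media extensions
– Jazelle
– Enhanced real-time performance (fast interrupt mode / VIC support)
550 MHz ARM11 core
– Fast 8-stage pipeline
– Dynamic branch prediction and return stack
– Optional vector floating point
IEM enabled
– Supports multiple voltage domains (sleep mode) and frequency scaling
ARM TrustZone
– Architectural extension for system security
High-performance AXI memory system
– Programmable instruction and data caches, and TCM with dedicated 64-bit DMA
Area
– 2.95 mm² on 0.13 µm process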

Importance of System Performance
“The gap between the speed of processor and the speed of DRAM is growing at 50% per year.”
- David Patterson, Bridging the Processor Memory Gap
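To make the quoted growth rate concrete, the sketch below compounds a 50% annual increase in the processor-to-DRAM speed gap. It assumes the rate stays constant and measures the gap relative to a starting value of 1; both are simplifications for illustration, and the function names are hypothetical.

```python
def gap_multiplier(years: int, annual_growth: float = 0.5) -> float:
    """Relative processor/DRAM speed gap after a number of years."""
    return (1 + annual_growth) ** years


def years_until(target: float, annual_growth: float = 0.5) -> int:
    """Whole years until the gap has grown by at least the target factor."""
    years, gap = 0, 1.0
    while gap < target:
        gap *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    for n in range(0, 7):
        print(n, gap_multiplier(n))
    print(years_until(10))
```

At this rate the gap more than doubles every two years, which is the pressure behind the cache and interconnect components listed on this slide.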
Poor system design results in poor software performance even at high MHz
– Need optimized infrastructure for performance and power efficiency
– Unleash full performance of ARM11 processors and OptimoDE data engines even with low cost, slow memory
– Application specific, so system needs to be flexible, configurable
L220 AXI Level 2 Cache Controller
PL300 AXI Configurable Interconnect
PL340 AXI Memory Controller
SDR/DDR SDRAM
ARM1176JZ-S PrimeXsys Platform
TrustZone technology for processor and system security
Intelligent Energy Manager (IEM) compatible
AXI system components
CoreSight debug and trace
Software development environment
Integrated – Optimized – Verified

ARM1176JZ-S PrimeXsys Platform
Optimised power, performance & area for application area
Virtual Component is best possible starting point
Configured IP memory controller, AXI system interconnect
Reduced risk
Reduced cost
Shortening time to market
Delivered to lead Partner already
Integrated – Optimized – Verified
ARM1156T2(F)-S Core Summary
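Performance
– 550 MHz on 0.13 µm process
– High-speed 9-stage pipeline
– Optimized floating-point unit: ARM1156T2F-S
– Parallel load/store and integer processing
– AXI bus delivers core-speed throughput
– 1.9 mm² on 0.13 µm process
Fault tolerance
– Fault-tolerance mechanisms reduce soft errors
– Fine-grained memory protection unit
Predictability
– Fast interrupt response
– Deterministic operation
Efficiency
– Highest instruction throughput of any ARM core
  • Global branch prediction, reaching 1.40 DMIPS/MHz
– Thumb-2 instruction set gives fast and compact code
  • 26% smaller than ARM code, 25% faster than Thumb code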

EEMBC Auto/Industrial Suite
[Figure: EEMBC Auto/Industrial Suite. Performance (EEMBC AutoMark) and relative code size (%) for ARM1156T2F-S, PPC405, TriCore TC1M and SH4.]
ARM1156T2F-S score uncertified, other data from EEMBC website (July 2003)
MPCore – Innovating Multiprocessing
Extending your choice in creating high-performance devices
– Either through clock frequency and superscalar architecture
– Or now through multiprocessors
  • Complementary technologies
  • Together extending the ARM advantage into new markets
MPCore is ARM’s first multiprocessor core
– Supporting multi-function / multi-application devices needing high performance and low power consumption
– Ideal for consumer devices both in the home and the car, and networking applications

MPCore – Benefits
H/W enhanced interrupt and inter-processor communication
Cache coherence for flexible and efficient software
Looks like a uniprocessor with simplified integration and validation
Performance, scalability and flexibility
[Diagram: Interrupt Distributor with a configurable number of hardware interrupt lines and private fast interrupts (FIQ, can be used as NMI); configurable between 1 and 4 CPUs, each with timer, watchdog, CPU interface, IRQ, CPU/VFP and L1 memory; private peripheral bus; Snoop Control Unit (SCU) with I & D coherence, 64-bit bus and control bus; primary AXI R/W 64-bit bus and optional 2nd AXI R/W 64-bit bus.]
The ARM Cortex Family
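A new product family based on the ARMv7 architecture, the Thumb-2 instruction set and the AMBA AXI interface specification
ARM Cortex A series
– Application CPUs for running complex operating systems and user applications
– First product: coming soon
– Executes ARM, Thumb and Thumb-2
ARM Cortex R series
– Deeply embedded, for real-time environments
– First product: coming soon
– Executes ARM, Thumb and Thumb-2
ARM Cortex M series
– Microcontrollers for cost-sensitive, deterministic, interrupt-driven environments
– First product: ARM Cortex-M3
– Executes Thumb and Thumb-2 instructions
[Figure: performance roadmap (scale 100 to 2000+), 2005 to 2007, placing Cortex-M3 “Sandcat” and ARM Cortex (Thumb-2, NEON, ARMv7).]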

ARM CPU Market Diversity
Units shipped in Q1 2004 by segment
– Wireless 68%
– Networking 10%
– Imaging 7%
– Consumer Entertainment 6%
– Storage 5%
– MCU 2%
– Automotive 1%
– Secure 1%
[Figure: Unit Shipments of Non-Wireless ARM Cores, units in thousands, 2000 to 2004.]
Increased diversity leads to increasing segmentation and focus
Architectural technologies deployed at multiple price/performance points
The First member of the ARM Cortex Family
Targeting 32-bit performance for cost-sensitive applications
ARM Cortex-M3 Processor

Dynamic Microcontroller Market
[Figure: In-Stat/MDR MCU Demands in Applications 2002-2007 (Oct 03). Revenue ($ millions, $0 to $4,500) for 8-bit, 16-bit and 32-bit MCUs, 2002 and 2007.]
Continued downward pressure on cost
Application requirements driving higher-performance MCU
Engineers need faster and easier development
Demand accelerating for cost-sensitive high-performance solution
HIGH PERFORMANCE, LOW COST
“32-bit performance… but not at any price”
Cortex M3 – Innovation in efficiency
Innovation in ISA, control, interrupts, memory, IO & debug
– Reduced gate, memory (code & data) and pin count
– Improved debug & trace
– Event driven determinism and reduced latency
[Diagram: 3-stage pipeline, Harvard, Thumb-2 core; debug and sleep control; configurable interrupts & NMI; optional ETM; optional Memory Protection Unit; Single Wire Viewer; Flash Patch & Breakpoints; Data Watchpoints & Trace; Debug Access Port (single wire); flash interface; SRAM/IO.]

ARM Cortex-M3 Competitor Benchmarks
ARM Cortex-M3 processor delivers 30% performance increase over NEC V850
ARM Cortex-M3 code is over 3x more dense than the Renesas M32R, dramatically reducing memory cost
[Figure: Comparative Core Performance. Relative performance (scale 0 to 10) for 8051, NEC uPD 78k0, 68HC12, Infineon c164, NEC V850E, Renesas M32R and Cortex M3.]
[Figure: Comparative Core Code Density. Relative code size (scale 0 to 1.2) for the same cores.]
Note: Scaled Automotive and MCU benchmarks
ARM Cortex-M3 Processor Overview
Low-cost, high-performance processor ideally suited to next generation MCUs and cost sensitive applications
– Very high code and data density = optimal memory usage
– 33K gates core / 60K gates core & system peripherals
Efficient control architecture
– Atomic bit manipulation of data and to peripherals
– Best-in-class interrupt response for interrupt driven systems
Integrated system peripherals
– Configurable interrupt system
– Integrated bus arbiter
Less than 1 mW/MHz power consumption
Advanced debug architecture for fast system debug

Full Spectrum of Innovation
[Figure: roadmap of control plane performance (scale 100 to 2000+) against data plane performance, 2004 to 2006, with stages Release, Adv Development and Concept. Control plane: SC100/SC110, SC200/SC210, ARM7TDMI, ARM966E-S, ARM946E-S, ARM968E-S, ARM926EJ-S, ARM1026EJ-S, ARM1136EJ-S, ARM1176JZF-S, ARM1156T2F-S, Cortex-M3, MPCore and ARM Cortex (Thumb-2, NEON, ARMv7). Data plane (ARM OptimoDE): Hearing Aid 2.5 MHz; Consumer Audio MP3/AAC/AMR 7 MHz, ultra low power; Consumer Video MPEG4/H.263 50 MHz; Professional Studio HDTV Entropy Codec 350 MHz. Also shown: PrimeXsys and the Backplane. Worst case conditions.]
Thank You!