
CMP Design Space Exploration Subject to Physical Constraints

Yingmin Li†, Benjamin Lee‡, David Brooks‡, Zhigang Hu*, Kevin Skadron†
†Dept. of Computer Science, University of Virginia  *IBM T.J. Watson Research Center  ‡Division of Engineering and Applied Sciences, Harvard University

Abstract

This paper explores the multi-dimensional design space for chip multiprocessors, covering the inter-related variables of core count, pipeline depth, superscalar width, L2 cache size, and operating voltage and frequency, under various area and thermal constraints. The results show the importance of joint optimization. Thermal constraints dominate other physical constraints such as pin-bandwidth and power delivery, demonstrating the importance of considering thermal constraints while optimizing these other parameters. For aggressive cooling solutions, reducing power density is at least as important as reducing total power, while for low-cost cooling solutions, reducing total power is more important. Finally, the paper shows the challenges of accommodating both CPU-bound and memory-bound workloads on the same design. Their respective preferences for more cores and larger caches lead to increasingly irreconcilable configurations as area and other constraints are relaxed; rather than accommodating a happy medium, the extra resources simply encourage more extreme optimization points.

1 Introduction

Recent product announcements show a trend toward aggressive integration of multiple cores on a single chip to maximize throughput. However, this trend presents an expansive design space for chip architects, encompassing the number of cores per die, core size and complexity (pipeline depth and superscalar width), memory hierarchy design, operating voltage and frequency, and so forth. Identifying optimal designs is especially difficult because the variables of interest are inter-related and must be considered simultaneously. Furthermore, trade-offs among these design choices vary depending both on workloads and on physical (e.g., area and thermal) constraints.

We explore this multi-dimensional design space across a range of possible chip sizes and thermal constraints, for both CPU-bound and memory-bound workloads. Few prior works have considered so many cores, and to our knowledge, this is the first work to optimize across so many design variables simultaneously. We show the inter-related nature of these parameters and how the optimum choice of design parameters can shift dramatically depending on system constraints. Specifically, this paper demonstrates:

• A simple, fast approach to simulating a large number of cores, based on the observation that cores only interact through the L2 cache and shared interconnect. Our methodology uses single-core traces and only requires fast cache simulation for multi-core results.

• CPU- and memory-bound applications desire dramatically different configurations. Adaptivity helps, but any compromise incurs throughput penalties.

• Thermal constraints dominate, trumping even pin-bandwidth and power-delivery constraints. Once thermal constraints have been met, throughput is throttled back sufficiently to meet current pin-bandwidth and ITRS power-delivery constraints.

• A design must be optimized with thermal constraints. Scaling from the thermal-blind optimum leads to a configuration that is inferior, sometimes radically so, to a thermally optimized configuration.

• Simpler, smaller cores are preferred under some constraints. In thermally constrained designs, the main determinant is not simply maximizing the number of cores, but maximizing their power efficiency. Thermal constraints generally favor shallower pipelines and lower clock frequencies.

• Additional cores increase throughput, despite the resulting voltage and frequency scaling required to meet thermal constraints, until the performance gain from an additional core is negated by the impact of voltage and frequency scaling across all cores.

• For aggressive cooling solutions, reducing power density is at least as important as reducing total power. For low-cost cooling solutions, reducing total power is more important.

This paper is organized as follows. Section 2 discusses related work. We introduce our model infrastructure and its validation in Section 3. We present design space exploration results and analysis in Section 4. We conclude and propose future work in Section 5.

2 Related Work

There has been a burst of work in recent years to understand the performance, energy, and thermal efficiency of different CMP organizations. Few have looked at a large number of cores and none, of which we are aware, have jointly optimized across the large number of design parameters we consider while addressing the associated methodology challenges. Li and Martínez [17] present the most aggressive study of which we are aware, exploring up to 16-way CMPs for SPLASH benchmarks and considering power constraints. Their results show that parallel execution on a CMP can improve energy efficiency compared to the same performance achieved via single-threaded execution, and that even within the power budget of a single core, a CMP allows substantial speedups compared to single-threaded execution.

Kongetira et al. [12] describe the Sun Niagara processor, an eight-way CMP supporting four threads per core and targeting workloads with high degrees of thread-level parallelism. Chaudhry et al. [4] describe the benefits of multiple cores and multiple threads, with eight cores sharing a single L2 cache. They also describe the Sun Rock processor's "scouting" mechanism, which uses a helper thread to prefetch instructions and data.

El-Moursy et al. [6] show the advantages of clustered architectures and evaluate a CMP of multi-threaded, multi-cluster cores with support for up to eight contexts. Huh et al. [10] categorized the SPEC benchmarks into CPU-bound, cache-sensitive, and bandwidth-limited groups and explored core complexity, area efficiency, and pin bandwidth limitations, concluding that, due to pin-bandwidth limitations, a smaller number of high-performance cores maximizes throughput. Ekman and Stenstrom [5] use SPLASH benchmarks to explore a similar design space for energy efficiency, reaching the same conclusions.

Kumar et al. [14] consider the performance, power, and area impact of the interconnection network in CMP architectures. They advocate low degrees of sharing, but use transaction-oriented workloads with high degrees of inter-thread sharing. Since we are modeling throughput-oriented workloads consisting of independent threads, we follow the example of Niagara [12] and employ more aggressive L2 sharing. In our experiments, each L2 cache bank is shared by half the total number of cores. Interconnection design parameters are not variable in our design space at this time; in fact, they constitute a sufficiently expansive design space of their own that we consider them beyond the scope of the current paper.

The research presented in this paper differs from prior work in the large number of design parameters and metrics we consider. We evaluate CMP designs for performance, power efficiency, and thermal efficiency while varying the number of cores per chip, pipeline depth and width, chip thermal packaging effectiveness, chip area, and L2 cache size. This evaluation is performed with a fast decoupled simulation infrastructure that separates core simulation from interconnection/cache simulation. By considering many more parameters in the design space, we demonstrate the effectiveness of this infrastructure and show the inter-relatedness of these parameters.

The methodologies for analyzing pipeline depth and width build on prior work by Lee and Brooks [16] by developing first-order models that capture changes in core area as pipeline dimensions change, thereby enabling power density and temperature analysis. We identify optimal pipeline dimensions in the context of CMP architectures, whereas most prior pipeline analysis considers single-core microprocessors [8, 9, 22]. Furthermore, most prior work in optimizing pipelines focused exclusively on performance, although Zyuban et al. found 18FO4 delays to be power-performance optimal for a single-threaded microprocessor [26].

Other researchers have proposed simplified processor models with the goal of accelerating simulation. Within the microprocessor core, Karkhanis and Smith [11] describe a trace-driven, first-order modeling approach to estimate IPC by adjusting an ideal IPC to account for branch misprediction. In contrast, our methodology adjusts power, performance, and temperature estimates from detailed single-core simulations to account for fabric events, such as cache misses and bus contention. In order to model large-scale multiprocessor systems running commercial workloads, Kunkel et al. [15] utilize an approach that combines functional simulation, hardware trace collection, and probabilistic queuing models. However, our decoupled and iterative approach allows us to account for effects such as latency overlap due to out-of-order execution, effects not easily captured by queuing models. Although decoupled simulation frameworks have been proposed in the context of single-core simulation (e.g., Kumar and Davidson [13]) with arguments similar to our own, our methodology is applied in the context of simulating multi-core processors.

3 Experimental Methodology

To facilitate the exploration of large CMP design spaces, we propose decoupling core and interconnect/cache simulation to reduce simulation time. Detailed, cycle-accurate simulations of multi-core organizations are expensive, and the multi-dimensional search of the design space, even with just homogeneous cores, is prohibitive. Decoupling core and interconnect/cache simulation dramatically reduces simulation cost with minimal loss in accuracy. Our simulation approach uses IBM's Turandot/PowerTimer, a cycle-accurate, execution-driven simulator, to generate single-core L2 cache-access traces that are annotated with timestamps and power values. We then feed these traces to Zauber, a cache simulator we developed that models the interaction of multiple threads on one or more shared interconnects and one or more L2 caches. Zauber uses hits and misses to shift the time and power values in the original traces. Generating the traces is therefore a one-time cost, while what would otherwise be a costly multiprocessor simulation is reduced to a much faster cache simulation. Using Zauber, it is cost-effective to search the entire multi-core design space.
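As a concrete illustration of this decoupling, the sketch below (Python, with hypothetical record and function names; the actual Zauber implementation is not reproduced here) shows the kind of per-access record a single-core trace might carry and how later accesses could be shifted in time and power once the shared-cache simulation turns presumed hits into misses.

from dataclasses import dataclass

@dataclass
class L2Access:
    address: int      # L2 cache block address
    cycle: int        # access time from the single-core trace
    power: float      # accumulated core energy up to this access

MISS_PENALTY = 300    # assumed extra latency (cycles) for an L2 miss

def shift_trace(trace, miss_flags, stall_power_per_cycle=0.5):
    # Push every access after an extra miss later in time and charge an
    # assumed stall power for the added cycles.
    delay, shifted = 0, []
    for access, is_miss in zip(trace, miss_flags):
        if is_miss:
            delay += MISS_PENALTY
        shifted.append(L2Access(access.address,
                                access.cycle + delay,
                                access.power + delay * stall_power_per_cycle))
    return shifted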

3.1 Simulator Infrastructure

Our framework decouples core and interconnect/cache simulation to reduce simulation time. Detailed core simulation provides performance and power data for various core designs, while interconnect/cache simulation projects the impact of core interaction on these metrics.

Table 1. 19FO4 Latencies (cycles). (Per-event latencies of the baseline 19FO4 design for Fetch, Decode, NFA Predictor, Multiple Decode, L2 I-Cache, Millicode Decode, L3 I-Load, Expand String, I-TLB Miss, Mispredict Cycles, L2 I-TLB Miss, and Register Read.)


3.1.1 Core Simulation

Our detailed core simulation infrastructure consists of Turandot, PowerTimer, and HotSpot 2.0. Turandot is a validated model of an IBM POWER4-like architecture [19]. PowerTimer implements circuit-extracted, validated power models, which we have extended with analytical scaling formulas based on Wattch [1, 2]. HotSpot 2.0 is a validated, architectural model of localized, on-chip temperatures [21]. Each of these components in our detailed simulation infrastructure is modular, so any particular simulator can be replaced with an alternative. We extended Turandot and PowerTimer to model performance and power as pipeline depth and width vary, using techniques from prior work [16].

Depth Performance Scaling: Pipeline depth is quantified in terms of FO4 delays per pipeline stage.¹ The performance model for architectures with varying pipeline depths is derived from the reference 19FO4 design by treating the total number of logic levels as constant and independent of the number of pipeline stages. This is an abstraction for the purpose of our analysis; increasing the pipeline depth could require logic design changes. The baseline latencies (Table 1) are scaled to account for pipeline depth changes according to Eq. (1). These scaled latencies account for latch delays (FO4_latch = 3), and all latencies have a minimum of one cycle. This is consistent with prior work in pipeline depth simulation and analysis for a single-threaded core [26].

Lat_target = Lat_base × (FO4_base − FO4_latch) / (FO4_target − FO4_latch)    (1)

¹ Fan-out-of-four (FO4) delay is defined as the delay of one inverter driving four copies of an equally sized inverter. When logic and overhead per pipeline stage are measured in terms of FO4 delay, deeper pipelines have smaller FO4 delays per stage.
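A minimal sketch of Eq. (1): it scales the 19FO4 baseline latencies to another pipeline depth, with the latch overhead removed from each stage and every latency clamped to at least one cycle. The rounding rule is an assumption; the paper does not state it.

def scale_latency(lat_base, fo4_target, fo4_base=19, fo4_latch=3):
    # Logic delay per stage is (FO4 - FO4_latch); total logic is held constant.
    lat = lat_base * (fo4_base - fo4_latch) / (fo4_target - fo4_latch)
    return max(1, round(lat))   # all latencies have a minimum of one cycle

# e.g. a 13-cycle latency in the 19FO4 design becomes ~23 cycles at 12FO4
# and ~8 cycles at 30FO4.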

Table 2. Width Resource Scaling. (Functional-unit counts and fetch/rename/retire stage widths for each superscalar width, relative to the 4-wide baseline.)

Table 3. Energy Growth Factors. (Superlinear power-scaling exponents, from Zyuban [25], for Register Rename, Instruction Issue, Memory Unit, Multi-ported Register File, Data Bypass, and Functional Units.)

In certain non-clustered structures, however, linear power scaling may be inaccurate; for example, it does not capture non-linear relationships between power and the number of SRAM access ports, since it does not account for the additional circuitry required in a multi-ported SRAM cell. For this reason, we apply superlinear power scaling with exponents (Table 3) drawn from Zyuban's work on estimating energy growth parameters [25]. Since these parameters were experimentally derived through analysis of non-clustered architectures, we only apply this power scaling to the non-clustered components of our architecture.
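The sketch below illustrates how such superlinear scaling might be applied; the exponent values are placeholders only, since the actual per-structure energy growth factors come from Zyuban [25] and Table 3.

# Placeholder exponents; the real per-structure values are those of Table 3 / [25].
ENERGY_GROWTH_EXPONENT = {
    "register_rename": 1.1,
    "instruction_issue": 1.9,
    "multiported_register_file": 1.8,
    "data_bypass": 1.6,
}

def scale_width_power(p_base, structure, width_target, width_base=4):
    # Non-clustered structures: power grows superlinearly with superscalar width.
    return p_base * (width_target / width_base) ** ENERGY_GROWTH_EXPONENT[structure]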

3.1.2 Interconnection/Cache Simulation

The core simulators are supplemented by Zauber, a much faster simulator that performs interpolation on L2 cache traces provided by the core simulators. Zauber decouples detailed core simulation from the simulation of core interaction. The cores in a CMP architecture usually share one or more L2 caches through an interconnection fabric; resource contention between cores therefore occurs primarily in these two resources. We find it possible to simulate cache and fabric contention independently of core simulations without losing too much accuracy. The impact of contention on the performance and power of each core can then be evaluated quickly using interpolation.

First, we collect L2 access traces based on L1 cache misses through one pass of single-core simulations with a specific L2 cache size (0.5MB in our experiments). We found these L2 traces to be independent of the L2 cache size. In these traces we record the L2 cache address and access time (denoted by the cycle) for every access. We also sweep through a range of L2 cache sizes for each benchmark and record the performance and microarchitectural resource utilization every 10k instructions, as this information is used in the interpolation. These L2 traces are fed into a cache simulator and interconnection-contention model that reads the L2 accesses of each core from the traces, sorts them according to time of access, and uses them to drive the interconnection and L2 cache simulation. This interconnection/cache simulator outputs the L2 miss ratio and the delay due to contention for every 10k-instruction segment of the thread running on each core.

With this L2 miss ratio and interconnection contention information, we calculate the new performance and power numbers for each 10k-instruction segment of all the threads. Since we know the performance and microarchitectural resource utilization for several L2 miss ratio values, we are able to obtain new performance and utilization data for any other L2 miss ratio produced by the cache simulator via interpolation. Power numbers can be derived from the structure utilization data with post-processing.
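The interpolation step might look like the following sketch; linear interpolation between the recorded (miss ratio, IPC) points of the single-core sweep is an assumption, as the paper does not specify the interpolation scheme.

def interpolate_ipc(miss_ratio, sweep_points):
    # sweep_points: [(l2_miss_ratio, ipc), ...] recorded during the single-core
    # sweep over L2 cache sizes.
    pts = sorted(sweep_points)
    if miss_ratio <= pts[0][0]:
        return pts[0][1]
    for (r0, ipc0), (r1, ipc1) in zip(pts, pts[1:]):
        if miss_ratio <= r1 and r1 > r0:
            return ipc0 + (ipc1 - ipc0) * (miss_ratio - r0) / (r1 - r0)
    return pts[-1][1]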

When we interleave the L2 accesses from each thread, we use the cycle information attached to each access to sort them by time of access. However, each thread may suffer a different degree of performance degradation due to interconnection and L2 cache contention, so sorting by the original access times may not reflect the real ordering. In our model, we iterate to improve accuracy. In particular, given the performance impact from cache contention on each thread, we adjust the time of each L2 access in each trace and redo the L2 cache simulation based on the new access timing. Iterating to convergence, we find three iterations are typically enough to reach good accuracy.
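A sketch of this iterative refinement (names hypothetical): each pass re-times every thread's accesses by the slowdown the previous pass produced, merges the streams by adjusted access time, and reruns the cache/interconnect simulation.

def simulate_cmp(traces, cache_sim, iterations=3):
    # traces: per-thread lists of (cycle, address) tuples.
    # cache_sim: callable that takes a merged, time-sorted stream of
    #            (cycle, address, thread_id) and returns per-thread slowdowns.
    slowdowns = [1.0] * len(traces)
    for _ in range(iterations):            # three iterations are typically enough
        merged = sorted(
            (int(cycle * slowdowns[tid]), addr, tid)
            for tid, trace in enumerate(traces)
            for cycle, addr in trace)
        slowdowns = cache_sim(merged)
    return slowdowns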

We validate Zauber against our detailed cycle-accurate simulator, Turandot. Figure 1 shows the average performance and power data from Turandot simulation and Zauber simulation for 2-way and 4-way CMPs. From these figures, the average performance and power difference between Turandot and Zauber is within 1%. For a 2-way CMP, Zauber achieves a simulation time speedup of 40-60x, with detailed Turandot simulations requiring 1-2 hours and the decoupled simulator requiring 1-3 minutes.

Since we are modeling throughput-oriented workloads consisting of independent threads, we consider a relatively high degree of cache sharing, similar to Niagara [12]. Each L2 cache bank is shared by half the total number of cores. The interconnection power overheads are extrapolated from [14].

We assume the L2 cache latency does not change when we vary the L2 cache size. We also omit the effects of clock propagation on chip throughput and power as the core count increases.

3.2 Analytical Infrastructure

We use analytical formulas to vary and calculate parameters of interest in the CMP design space exploration. The design parameters we consider include core count, core pipeline dimensions, thermal resistance of the chip packaging, and L2 cache size. As we vary these parameters, we consider the impact on both power and performance metrics.

3.2.1 Performance and Power Modeling

The analytical model uses performance and dynamic power data generated by Zauber simulation. Leakage power density for a given technology is calculated by Eq. (3), where A and B are coefficients determined by a linear regression of ITRS data and T is the absolute temperature; A = 207.94 and B = 1446 for 65nm technology.

P_leakage_density = A · T² · e^(−B/T)    (3)
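Eq. (3) transcribes directly into code; the 65nm coefficients below are the ones given above, and the units of the resulting density follow whatever units the ITRS fit used.

import math

def leakage_power_density(temp_k, a=207.94, b=1446.0):
    # Eq. (3): leakage power density as a function of absolute temperature (65nm).
    return a * temp_k ** 2 * math.exp(-b / temp_k)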

3.2.2 Temperature Modeling

We use steady-state temperature at the granularity of each core to estimate chip thermal effects. This neglects localized hotspots within a core as well as lateral thermal coupling among cores. Addressing these is important future work, but employing a simple analytical temperature formula instead of the more complex models in HotSpot reduces simulation time and allows us to focus on how heat-removal limitations constrain core count and core type.

We observe that the heat spreader is almost isothermal for the range of chip areas and power values we investigate, so we can separate the global temperature rise across the thermal package due to total chip power dissipation from the localized temperature rise above the package due to per-core power dissipation.

Figure 1. Validation of the Zauber model: average performance (IPC) and power versus L2 cache size (32KB to 32MB); (a) performance.

This separation is described by Equations 4, 5, and 6. Suppose we want to calculate the temperature of a specific core on the chip, where P_glo and P_core are the global chip and single-core dynamic power, respectively. Similarly, L_glo and L_core are the global chip and single-core leakage power, respectively. The chip's total dynamic power is the sum of the dynamic power dissipated by all the cores on chip, the L2 cache, and the interconnect; the chip leakage power is summed in a similar manner. The sum of R_spread and R_hsink denotes the thermal resistance from the heat spreader to the air, and the sum of R_silicon and R_TIM denotes the thermal resistance from the core to the heat spreader. Collectively, these parameters specify the chip's thermal characteristics from the device level to the heat spreader, ignoring lateral thermal coupling above the heat spreader level.

We categorize CMP heatup into local and global effects. The former is determined by the local power dissipation of any given core and its effect on that core's temperature; the latter is determined by the global chip power.

H_glo + H_loc = T_core − T_amb    (4)

H_glo = (P_glo + L_glo) · (R_spread + R_hsink)    (5)

H_loc = (P_core + L_core) · (R_silicon + R_TIM)    (6)

This distinction between local and global heatup mechanisms was first introduced qualitatively by Li et al. [18], who observed that adding cores to a chip of fixed area increases chip temperature. This observation may not be evident strictly from the perspective of per-core power density. Although power density is often used as a proxy for steady-state temperature, with each core exhibiting the same power density, core or unit power density is only an accurate predictor of the temperature increase in the silicon relative to the package. Per-unit or per-core power density is analogous to only one of the many thermal resistances comprising the network that represents the chip.
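The per-core steady-state temperature of Equations 4-6 can be evaluated directly, as in the sketch below. The thermal resistances and ambient temperature are placeholders rather than the paper's values, except that r_hsink = 0.1 corresponds to the low-resistance (LR) packaging case used in Section 4.

def core_temperature(p_core, l_core, p_glo, l_glo,
                     r_spread=0.05, r_hsink=0.1,    # r_hsink=0.1 is the LR case; r_spread is a placeholder
                     r_silicon=0.1, r_tim=0.05,     # placeholder values
                     t_amb=318.0):                  # assumed ambient temperature (K)
    h_glo = (p_glo + l_glo) * (r_spread + r_hsink)   # Eq. (5): global heatup
    h_loc = (p_core + l_core) * (r_silicon + r_tim)  # Eq. (6): local heatup
    return t_amb + h_glo + h_loc                     # Eq. (4)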

Adding cores does indeed increase temperature, because it increases the total amount of power that must be removed. The primary heat removal path today is convection from a heat sink. Although accurate expressions for convective heat transfer are complex, a first-order approximation is:

q = h · A · (T_sink − T_air)    (7)

where q is the rate of convective heat transfer,h is the con-vective heat transfer coef?cient that incorporates air speed and various air?ow properties,A is the surface area for convection,and to ?rst order we can assume T air is ?xed.At steady-state,the total rate of heat P generated in the chip must equal the total rate of heat removed from the chip.If we hold h and A constant,then as we add cores and P exceeds q ,T sink must rise to balance the rates so that P =q .This increases on-chip temperatures because the sink temperature is like an offset for the other layers from the sink-spreader interface through the chip.
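To first order, the steady-state balance P = q of Eq. (7) gives the heat-sink temperature directly, which is why adding cores (raising P) raises every layer above the sink; h and A below are illustrative values only.

def sink_temperature(total_chip_power, h=0.5, area=50.0, t_air=300.0):
    # Eq. (7) at steady state: P = h*A*(T_sink - T_air)  =>  T_sink = T_air + P/(h*A)
    return t_air + total_chip_power / (h * area)

# Doubling the number of active cores roughly doubles P and therefore
# doubles the heat-sink rise above the ambient air temperature.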

Alternative heat removal mechanisms also warrant consideration. For example, fan speed may be increased, but this approach is often limited by acoustic limits and various board-layout and airflow factors that lead to diminishing returns (e.g., increased pressure drop across a larger heat sink). We can lower the inlet air temperature, but this is not an option in many operating environments (e.g., a home office), or may be extremely costly (e.g., in a large data center). We could also increase heat sink area, but this is where Eq. (7) breaks down. That expression assumes that the heat source is similar in size to the conductive surface. In reality, increasing the heat sink surface area does not improve convective heat transfer in a simple way. Increasing fin height and surface area is limited by airflow constraints that dictate an optimal fin configuration. Increasing the total size of the heat sink (i.e., increasing the area of its base) leads to diminishing returns as the ratio of sink to chip area increases, due to limitations on how well the heat can be spread. In the limit, the heat source looks like a point source, and further increases in the sink area will have no benefit, as heat will not be able to spread at all to the outermost regions of the heat sink.

With regard to the single thermal resistance in the HotSpot model, adding cores is equivalent to adding current sources connected in parallel to a single resistor, the sink-to-air resistance. The increased current leads to a larger IR drop across this resistor and a proportionally higher heat-sink temperature.

Equations 4, 5, and 6 quantify the contributions from global and local heatup. Figure 2 presents results from validating this simple model against HotSpot, varying the heat sink resistance and fixing the power distribution to test different temperature ranges. The temperature difference between the two models is normally within 3 degrees.

Figure 2. Simplified temperature model validation.

3.2.3 Area Modeling

We assume a 65nm technology. Based on a POWER5 die photo, we estimate the baseline core area to be 11.52 mm², equivalent to the area of 1MB of L2 cache. We assume each set of n/2 cores shares one L2 cache through a crossbar routed over the L2, and we estimate the total crossbar area to be 6.25n mm² [14], where n is the number of cores. As pipeline dimensions vary, we scale the core area to account for additional structures and overhead.

Depth Area Scaling: Given our assumption of fixed logic area independent of pipeline depth, latch area constitutes the primary change in core area as depth varies. Let w_latch be the total channel width of all pipeline latch transistors, including local clock distribution circuitry, and let w_total be the total channel width of all transistors in the baseline microprocessor, excluding all low-leakage transistors in on-chip memories. Let the latch growth factor (LGF) capture the latch count growth due to logic shape functions. In our analysis, we take the latch ratio (w_latch/w_total) to be 0.3 and the LGF to be 1.1, assuming superlinear latch growth as pipeline depth increases [26]. Assuming changes in core area are proportional to the total channel width of latch transistors in the pipeline, we scale the portion of core area attributed to latches superlinearly with pipeline depth using Eq. (8).

A_target = A_base · (1 + (w_latch/w_total) · ((FO4_base/FO4_target)^LGF − 1))    (8)
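A small sketch of the area model as written above: Eq. (8) scales the latch fraction of the core superlinearly with depth, and the chip budget adds the L2 (11.52 mm² per MB) and crossbar (6.25n mm²) areas. Width scaling (Table 4) is omitted here.

def core_area_mm2(fo4_target, a_base=11.52, fo4_base=19, latch_ratio=0.3, lgf=1.1):
    # Eq. (8): only the latch portion (w_latch/w_total = 0.3) grows with depth.
    return a_base * (1.0 + latch_ratio * ((fo4_base / fo4_target) ** lgf - 1.0))

def chip_area_mm2(n_cores, l2_mbytes, fo4_target=19):
    # cores + L2 (11.52 mm^2 per MB) + crossbar (6.25 mm^2 per core) [14]
    return n_cores * core_area_mm2(fo4_target) + 11.52 * l2_mbytes + 6.25 * n_cores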

Width Area Scaling: Table 4 presents area scaling factors for varying pipeline width. We consider each unit and its underlying macros. To first order, the core area attributed to the fixed-point, floating-point, and load-store units scales linearly due to clustering. We also assume the area of multi-ported SRAM array structures is wire-dominated and scales linearly with the number of ports [23, 24]. This assumption applies to SRAM memories (e.g., register files), but may be extended to queues (e.g., issue queues), tables (e.g., rename mappers), and other structures potentially implemented as SRAM arrays.

Note that the area of the instruction fetch and decode units (IFU, IDU) is independent of width.

Table 4. Width Area Scaling. (Per-unit area scaling factors relative to the 4-wide baseline.)

Denoting the scaled core voltage and frequency and their scaling factors as V_sc and F_sc, we use a nonlinear voltage/frequency relationship obtained from HSPICE circuit simulation. After determining the voltage and frequency scaling required for thermal control, we calculate our reward functions, BIPS and BIPS³/W, with Equations 10 and 11.

BIPS(n, l, d, w) = BIPS_base · F_sc    (10)

BIPS³/W(n, l, d, w) = (BIPS_base · F_sc)³ / (P_dyn · V_sc² · F_sc + P_leak · V_sc)    (11)
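The reward functions, as reconstructed above, can be evaluated as follows; dynamic power is assumed to scale with V²f and leakage linearly with Vdd, consistent with the discussion in Section 4.3.

def rewards(bips_base, p_dyn, p_leak, v_sc, f_sc):
    bips = bips_base * f_sc                               # Eq. (10)
    power = p_dyn * v_sc ** 2 * f_sc + p_leak * v_sc      # scaled dynamic + leakage power
    return bips, bips ** 3 / power                        # Eq. (11): BIPS and BIPS^3/W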

3.3 Workloads

We characterize all SPEC2000 benchmarks into eight major categories: high IPC (>0.9) or low IPC (<0.9); high temperature (peak temperature >355K) or low temperature (peak temperature <355K); and floating-point or integer. We employ eight of the SPEC2000 benchmarks (art, mcf, applu, crafty, gcc, eon, mgrid, swim) as our single-thread benchmarks, spanning these categories. We further categorize benchmarks according to their L2 miss ratios, referring to those with high and low miss ratios as memory- and CPU-bound, respectively.

To generate static traces, we compile with the xlc compiler and the -O3 option. We use Simpoint [7] to identify representative simulation points and generate traces by capturing 100 million instructions beginning at the Simpoint.

For both CPU-bound and memory-bound benchmarks, we use pairs of single-thread benchmarks to form dual-thread benchmarks and replicate these pairs to form multiple benchmark groups of each category for CMP simulation with more than two cores. We only simulate workloads consisting of a large pool of waiting threads that keep all cores active, representing the "worst typical-case" operation likely to determine physical limits.

4 Results

We present results from the exploration of a large CMP design space that encompasses core count, pipeline dimensions, and cache size. We consider optimizing for performance (BIPS) and power-performance efficiency (BIPS³/W) under various area and thermal constraints. In addition to demonstrating the effectiveness of our experimental methodology for exploring large design spaces, our results also quantify significant CMP design trends and demonstrate the need to make balanced design choices.

4.1 Optimal Configurations

Table 5 presents optimal configurations that maximize BIPS and BIPS³/W for a fixed pipeline depth, while Table 6 presents optima for a fixed superscalar width. Configurations are presented for various combinations of area and thermal constraints. The area constraint can take on one of four values: no constraint ("nolimit"), 100 mm², 200 mm², or 400 mm². Similarly, packaging assumptions and hence thermal constraints can take on one of three values: no constraint (NT), low constraint (LR = 0.1, low thermal resistance, i.e., an aggressive, high-cost thermal solution), and high constraint (HR = 0.5, high thermal resistance, i.e., a constrained thermal solution, such as found in a laptop). The tables differentiate between CPU- and memory-bound benchmarks and specify the voltage and frequency ratios required to satisfy thermal constraints.

Figures 3–5 present performance trade-offs between core count, L2 cache size, and pipeline dimensions for a 400 mm² chip subject to various thermal constraints.

4.1.1 No Constraints

In the absence of area and thermal constraints (nolimit+NT+CPU, nolimit+NT+MEMORY), the throughput-maximizing configuration for both CPU- and memory-bound benchmarks employs the largest L2 cache and number of cores. Although the optimal pipeline width for all benchmarks is eight (8W), CPU-bound benchmarks favor deeper pipelines (12FO4) to take advantage of fewer memory stalls and higher instruction-level parallelism. Conversely, memory-bound benchmarks favor relatively shallow pipelines (18FO4).

For BIPS³/W, the optimal depth shifts to shallower pipelines; 18FO4 and 30FO4 delays per stage are optimal for CPU- and memory-bound benchmarks, respectively. The optimum shifts to shallower, narrower pipelines for memory-bound benchmarks due to the relatively high rate of memory stalls and low instruction-level parallelism.

4.1.2 Area Constraints

Considering area constraints ({100,200,400}+NT+*), we find core count and L2 cache size tend to decrease as area constraints are imposed. Although both techniques are applied in certain cases (100+NT+CPU, 100+NT+MEMORY), decreasing the cache size is naturally the most effective approach to meet area constraints for CPU-bound benchmarks, while decreasing the number of cores is most effective for memory-bound benchmarks.

With regard to pipeline dimensions, we find the optimal width decreases to 2W for all area constraints on memory-bound benchmarks (*+NT+MEMORY) except 100+NT+MEMORY. According to the area models in Section 3.2.3, changes in depth scale only the latch area (30% of total area), whereas changes in width scale the area associated with functional units, queues, and other width-sensitive structures. Thus, shifting to narrower widths provides a greater area impact (Table 5). Although pipeline depths may shift from 12 to 18/24 FO4 delays per stage, they are never reduced to 30FO4 delays per stage to meet area constraints (Table 6).

As in the case without constraints, the bar plots in Figure 3, which vary pipeline depth, show that CPU-bound benchmarks favor deeper pipelines (4MB/12FO4/4W is optimal) and memory-bound benchmarks favor shallower pipelines (16MB/18FO4/4W or 16MB/24FO4/4W are optimal).

Table 5. Optimal Configurations with Varying Pipeline Width, Fixed Depth (18FO4). (L2 size, width, and voltage scaling for each area/thermal constraint and workload class.)

Table 6. Optimal Configurations with Varying Pipeline Depth, Fixed Width (4D). (L2 size, depth, and voltage scaling for each area/thermal constraint and workload class.)

The line plots in Figures 3–4 also present performance for varying widths under modest thermal constraints. In this case, the optimal pipeline width is 4W for a fixed depth of 18FO4 delays per stage.

4.1.3 Thermal Constraints

We find thermal constraints (nolimit+{NT,LR,HR}+*) also shift optimal configurations toward fewer and simpler cores. The optimal core count and L2 size tend to decrease as the heat sink becomes less effective. For example, the optimum for nolimit+HR+MEMORY is an 8MB L2 cache and 10 cores. Again, CPU-bound benchmarks favor decreasing cache size to meet thermal constraints, while memory-bound benchmarks favor decreasing the number of cores.

Figure 5 also illustrates the impact of global heating on optimal pipeline configurations. As the number of cores increases for CPU-bound benchmarks, the optimal delay per stage increases by 6FO4 (i.e., from 18 to 24FO4) when twelve cores reside on a single chip. The increasing core count increases chip temperature, leading to shallower pipelines that lower power dissipation, lower global temperature, and meet thermal constraints.

Simpler cores, characterized by smaller pipeline dimensions, tend to consume less power and, therefore, mitigate the core's thermal impact.

Figure 3. Performance of various configurations with chip area constrained to 400 mm² (without thermal control); (a) CPU-bound.

Figure 4. Performance of various configurations with chip area constrained to 400 mm² (R = 0.1 heat sink).

Figure 5. Performance of various configurations with chip area constrained to 400 mm² (R = 0.5 heat sink).

In particular, the optimal pipeline depth shifts to 24 and 30FO4 delays per stage for CPU- and memory-bound benchmarks, respectively, when comparing nolimit+NT+* to nolimit+HR+* in Table 6. Similarly, the optimal width shifts to 2W for all benchmarks when comparing the same entries in Table 5.

Figures 4–5 show that imposing thermal constraints shifts the optimal depth to shallower design points. The performance for CPU- and memory-bound benchmarks is maximized at 18-24 and 24-30 FO4 delays per stage, respectively. Pipeline power dissipation increases superlinearly with depth while pipeline area increases sublinearly, according to Section 3.2.3. Thus, growth in power dissipation exceeds area growth and the overall power density increases with depth, so optimal designs must shift to shallower pipelines to meet thermal constraints. Similarly, more aggressive thermal constraints, shown in Figure 5, shift the optimal width to the narrower 2W, especially as the number of cores increases. These results also suggest thermal constraints have a greater impact on pipeline configurations than area constraints.

4.1.4 Area and Thermal Comparison

Comparing the impact of thermal constraints (nolimit+NT+* versus nolimit+HR+*) to the impact of area constraints (nolimit+NT+* versus 100+NT+*) demonstrates that thermal constraints produce larger shifts towards smaller pipeline dimensions. In general, thermal constraints exert a greater influence on the optimal design configurations.

Applying a more stringent area constraint reduces the trend towards simpler cores. With a smaller chip area, resulting in fewer cores and smaller caches, total power dissipation and the need for thermal control are diminished. As this occurs, the pressure towards simpler cores with smaller pipeline dimensions also fades.

4.1.5 Depth and Width Comparison

Consider a baseline configuration of 2MB/18FO4/4W. As thermal constraints are imposed, the configuration may either shift to a shallower core (2MB/24FO4/4W) or to a narrower core (2MB/18FO4/2W). Since changes in width scale area for both functional units and many queue structures, whereas changes in depth only scale area for latches between stages, width reductions have a greater area impact relative to depth reductions. Thus, the 2MB/24FO4/4W core is larger than the 2MB/18FO4/2W core and exhibits lower dynamic power density. However, the smaller 2MB/18FO4/2W core benefits from less leakage power per core and, consequently, less global power (since dynamic power dissipation is comparable for both cores).

From our temperature models in Section 3.2.2, total power output, P_glo, has greater thermal impact for a chip with a poor heat sink (i.e., high thermal resistance, R_hsink). Similarly, the thermal impact is dominated by the local power density, P_core, for a chip with a good heat sink; in this case, the transfer of heat from the silicon substrate to the spreader dominates thermal effects. Thus, to minimize chip heatup, it is advantageous to reduce width and global power in the context of a poor heat sink, and advantageous to reduce depth and local power density in the context of a more expensive heat sink.

4.2 Hazards of Neglecting Thermal Constraints

Thermal constraints should be considered early in the design process. If a chip is designed without thermal constraints in mind, designers must later cut voltage and clock frequency to meet thermal constraints. The resulting voltage and frequency, and hence performance, will likely be cut more severely than if a thermally-aware configuration had been selected from the beginning. Figure 6 demonstrates the slowdown incurred by choosing a non-thermally-optimal design with voltage and frequency scaling over the thermally-optimal design. The y-axis plots the thermally-aware optimal performance minus the performance of the configuration chosen without thermal considerations, normalized to the optimal performance. This figure summarizes the slowdown for all combinations of die sizes, heat-sink configurations, and application classes, and for both pipeline depth and width optimizations. The average difference is around 12-17% for varying depth and 7-16% for varying width.

However, we find that for large, 400 mm² chips, omitting thermal consideration may result in huge performance degradations. For example, the 400+HR+CPU and 400+HR+MEMORY configurations show a 40%-90% difference in performance for BIPS and BIPS³/W. As area constraints are relaxed, the optimal point tends to include more cores and larger L2 caches. However, if the chip has severe thermal problems, DVFS must scale aggressively to maintain thermal limits, into a region with significant non-linear voltage and frequency scaling, producing large performance losses. For smaller chips with fewer cores and smaller L2 caches, the difference may be negligible because there are very few configurations to choose from. As future CMP server-class microprocessors target 400 mm² chips with more than eight cores, it will be essential to perform thermal analysis in the early stages of the design process, when decisions about the number and complexity of cores are being made.

4.3 DVFS Versus Core Sizing

In meeting thermal constraints for large CMPs, where global heat-up and total chip power are a concern, designers may be forced to choose among implementing fewer cores, implementing smaller L2 caches, or employing aggressive DVFS. We find DVFS superior to removing cores for CPU-bound applications as long as reductions in frequency are met by at least an equal reduction in dynamic and leakage power. Additional cores for CPU-bound applications provide linear increases in performance with near-linear increases in power dissipation. However, because of the strongly non-linear relationship between voltage scaling and clock frequency at low voltages, voltage scaling at some point stops providing super-linear power savings to make up for the performance (clock-frequency) loss. At this point, designers must consider removing cores and L2 cache from the design to meet thermal constraints.

For example, a chip with 30% leakage power no longer achieves a super-linear power-performance benefit from DVFS after roughly 0.55x Vdd scaling; the frequency of the chip drops to 0.18x and power dissipation also drops to roughly 0.18x (dominated by leakage power, which only scales linearly with Vdd). Further reductions in Vdd lead to greater performance loss than power savings. (In future process technologies, more than 0.55x Vdd scaling may also approach the reliability limits of conventional CMOS circuits.)
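A back-of-the-envelope check of this example, assuming dynamic power scales with V²f and leakage linearly with Vdd as stated above; the small gap to the quoted 0.18x power figure presumably comes from the temperature dependence of leakage, which this sketch ignores.

def scaled_chip_power(v_sc, f_sc, leak_fraction=0.3):
    # 30% of baseline power is leakage (linear in Vdd); 70% is dynamic (V^2 * f).
    return (1 - leak_fraction) * v_sc ** 2 * f_sc + leak_fraction * v_sc

print(scaled_chip_power(0.55, 0.18))   # ~0.20x of baseline power at 0.18x frequency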

Figure 5 shows an example of this behavior with the 2MB/18FO4/4W design. When this design exceeds 14 cores, further increases in core count lead to performance degradation: Vdd scaling has exceeded 0.55x, and the additional DVFS scaling necessary to meet thermal constraints costs more performance than is gained by adding these additional cores. On the other hand, the 2MB/18FO4/2W design only requires Vdd scaling of 0.57x out to 20 cores, which is why this design remains attractive even with the additional cores.

Figure 6. The difference from the optimal when no thermal consideration is made in early design; (a) optimized for pipeline depth.


Similar analyses hold for memory-bound applications. In this case, the trade-off is more complex, because the performance benefit from adding cores may be non-linear.

4.4 Accommodating Heterogeneous Workloads

Figures 3–5 also highlight the difficulty of accommodating a range of workload types under area constraints. This is less of a concern when looking at a small number of cores, as most prior studies do. Prior studies have also neglected the role of pipeline dimensions, which we find to play a major role. And for large numbers of cores, radically different configurations are possible.

CPU-bound and memory-bound workloads have different, incompatible optima. The performance loss from using the CPU-bound optimum with the memory-bound workload and vice versa is severe, 37-41% and 26-53% respectively, depending on thermal constraints. Even if we try to identify compromise configurations, it is surprising how poorly they perform for one or the other workload. Of course, the best compromise depends on how heavily each workload is weighted. We tried to minimize the performance loss on both workloads.

With no thermal limits, the best configuration is 16 4-wide, 18FO4-deep cores with 8MB of cache, incurring an 18% penalty for the CPU-bound workload. If we turn off 8 cores, it incurs a 10% penalty for the memory-bound workload. Moving to 16MB improves memory-bound performance, but hurts CPU-bound performance because it sacrifices 8 cores under an area constraint of 400 mm².

With thermal limits, the optimal configurations begin to converge, as the maximum possible number of cores and the L2 cache size are constrained, as the BIPS benefit of extra cores is reduced for CPU-bound benchmarks, and as the benefit of additional cache is reduced for memory-bound benchmarks. For low thermal resistance, the best compromise is 18 4-wide cores and 8MB. This incurs only a 4% performance loss for the CPU-bound benchmarks and a 10% loss for the memory-bound case. With high thermal resistance, the best compromise is 14 4-wide, 30FO4-deep cores with 8MB of cache. Turning off 4 cores, we reach the optimal configuration for the memory-bound case, but this configuration incurs a 12% penalty for the CPU-bound case. Although the discrepancy between the needs of CPU- and memory-bound workloads narrows with increasing thermal constraints, some penalty seems inevitable, because CPU-bound benchmarks prefer more cores while memory-bound benchmarks prefer larger L2 caches. It is interesting to note that we do not see a simple heuristic for identifying good compromise configurations.

5 Conclusions

Our major conclusions include:

• Joint optimization across multiple design variables is necessary. Even pipeline depth, typically fixed in architecture studies, may impact core area and power enough to change the optimal core count. Optimizing without thermal constraints and then scaling to a thermal envelope leads to dramatically inferior designs compared to those obtained by including thermal constraints in the initial optimization.

• Thermal constraints appear to dominate other physical constraints like pin-bandwidth and power delivery. Once thermal constraints are met, at least within the design space we studied, power and throughput have been throttled sufficiently to fall safely within current off-chip I/O bandwidth capabilities and ITRS power-delivery projections.

• Thermal constraints tend to favor shallower pipelines and narrower cores, and tend to reduce the optimal number of cores and L2 cache size. Nevertheless, even under severe thermal constraints, additional cores benefit throughput despite aggressive reductions in operating voltage and frequency. This is true until the performance gain from an additional core is negated by the impact of the additional voltage and frequency scaling required of all the cores. This inflection occurs at approximately 55% of the nominal Vdd, well into the range of non-linear frequency scaling (18% of the nominal frequency!).

• For aggressive cooling solutions, reducing power density is at least as important as reducing total power. For low-cost cooling solutions, however, reducing total power is more important, because raising power dissipation (even if power density is the same) raises a chip's temperature.

These results raise a range of questions for future work, such as the need for adaptive chip architectures that can dynamically accommodate the full range of workloads, from heavily CPU-bound to heavily memory-bound. Examining how our findings might change with other workloads (e.g., scientific parallel applications or communication-heavy commercial server workloads) and other architectures (e.g., in-order processors) is future work. Further research on the L2/L3/memory interconnect and hierarchy for CMPs and on the impact of clock propagation on CMP throughput and power is also necessary.

While CMPs may optimize for throughput-oriented application workloads at the expense of single-thread performance, single-thread performance will still be an important consideration for many application domains. Addressing single-thread performance will likely require additional design tradeoffs, though it does not necessarily require aggressive superscalar cores running at full voltage and frequency. Future research in this direction must consider speculative multithreading, heterogeneous cores, dynamic core adaptation, run-ahead execution/scouting, and so forth.

Acknowledgments

This work was funded in part by the National Science Foundation under grant nos. CAREER CCR-0133634, CAREER CCF-0448313, CCR-0306404, and CCF-0429765, a Faculty Partnership Award from IBM T.J. Watson, a gift from Intel Corp., and an Excellence Award from the Univ. Fund for Excellence in Science and Technology. We would also like to thank Dee A. B. Weikle and the anonymous reviewers for their helpful feedback.

References

[1] D. Brooks, P. Bose, V. Srinivasan, M. Gschwind, P. Emma, and M. Rosenfield. Microarchitecture-level power-performance analysis: the PowerTimer approach. IBM J. Research and Development, 47(5), 2003.
[2] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In Proc. of the 27th Int'l Symp. on Computer Architecture, Jun. 2000.
[3] P. Chaparro, J. Gonzalez, and A. Gonzalez. Thermal-aware clustered microarchitectures. In Proc. of the Int'l Conf. on Computer Design, Oct. 2004.
[4] S. Chaudhry, P. Caprioli, S. Yip, and M. Tremblay. High performance throughput computing. IEEE Micro, 25(3):32-45, May/June 2005.
[5] M. Ekman and P. Stenstrom. Performance and power impact of issue-width in chip-multiprocessor cores. In Proc. of the Int'l Conf. on Parallel Processing, Oct. 2003.
[6] A. El-Moursy, R. Garg, D. Albonesi, and S. Dwarkadas. Partitioning multi-threaded processors with a large number of threads. In Proc. of the 2005 Int'l Symp. on Performance Analysis of Systems and Software, Mar. 2005.
[7] G. Hamerly, E. Perelman, J. Lau, and B. Calder. Simpoint 3.0: Faster and more flexible program analysis. In Proc. of the Wkshp on Modeling, Benchmarking and Simulation, June 2005.
[8] A. Hartstein and T. R. Puzak. The optimum pipeline depth for a microprocessor. In Proc. of the 29th Int'l Symp. on Computer Architecture, pages 7-13, May 2002.
[9] M. S. Hrishikesh et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays. In Proc. of the 29th Int'l Symp. on Computer Architecture, pages 14-24, May 2002.
[10] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In Proc. of the Int'l Conf. on Parallel Architectures and Compilation Techniques, Sep. 2001.
[11] T. Karkhanis and J. E. Smith. A first-order superscalar processor model. In Proc. of the 31st Int'l Symp. on Computer Architecture, Jun. 2004.
[12] P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-way multithreaded Sparc processor. IEEE Micro, 25(2):21-29, Mar./Apr. 2005.
[13] B. Kumar and E. S. Davidson. Computer system design using a hierarchical approach to performance evaluation. Communications of the ACM, 23(9), Sep. 1980.
[14] R. Kumar, V. Zyuban, and D. M. Tullsen. Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling. In The 32nd Int'l Symp. on Computer Architecture, June 2005.
[15] S. R. Kunkel, R. J. Eickemeyer, M. H. Lipasti, T. J. Mullins, B. O. Krafka, H. Rosenberg, S. P. Vander-Wiel, P. L. Vitale, and L. D. Whitley. A performance methodology for commercial servers. IBM J. of Research and Development, 44(6), 2000.
[16] B. Lee and D. Brooks. Effects of pipeline complexity on SMT/CMP power-performance efficiency. In Proc. of the Workshop on Complexity Effective Design, Jun. 2005.
[17] J. Li and J. F. Martinez. Power-performance implications of thread-level parallelism on chip multiprocessors. In Proc. of the 2005 Int'l Symp. on Performance Analysis of Systems and Software, pages 124-34, Mar. 2005.
[18] Y. Li, D. Brooks, Z. Hu, and K. Skadron. Performance, energy and thermal considerations for SMT and CMP architectures: Extended discussion and results. Technical Report CS-2004-32, Univ. of Virginia Dept. of Computer Science, Oct. 2004.
[19] M. Moudgill, J. Wellman, and J. Moreno. Environment for PowerPC microarchitecture exploration. IEEE Micro, 19(3), May/Jun. 1999.
[20] K. Ramani, N. Muralimanohar, and R. Balasubramonian. Microarchitectural techniques to reduce interconnect power in clustered processors. In Proc. of the Workshop on Complexity Effective Design, Jun. 2004.
[21] K. Skadron, K. Sankaranarayanan, S. Velusamy, D. Tarjan, M. R. Stan, and W. Huang. Temperature-aware microarchitecture: Modeling and implementation. ACM Trans. on Architecture and Code Optimization, 1(1), Mar. 2004.
[22] E. Sprangle and D. Carmean. Increasing processor performance by implementing deeper pipelines. In Proc. of the 29th Int'l Symp. on Computer Architecture, pages 25-34, May 2002.
[23] M. Tremblay, B. Joy, and K. Shin. A three dimensional register file for superscalar processors. In Proc. of the 28th Hawaii Int'l Conf. on System Sciences, 1995.
[24] N. Weste and D. Harris. CMOS VLSI Design: A Circuit and Systems Perspective. Addison-Wesley, 2005.
[25] V. Zyuban. Inherently lower-power high-performance superscalar architectures. PhD thesis, Univ. of Notre Dame, Mar. 2000.
[26] V. Zyuban, D. Brooks, V. Srinivasan, M. Gschwind, P. Bose, P. Strenski, and P. Emma. Integrated analysis of power and performance for pipelined microprocessors. IEEE Transactions on Computers, 53(8), Aug. 2004.


物联网管理系统

1、简介 昆仑海岸物联网云服务平台是由北京昆仑海岸传感技术有限公司开发的面向物联网设备的数据服务平台。目前昆仑海岸物联网云服务平台需要和本公司自主研发的KL-H系列物联网网关产品配套使用,通过物联网网关可以实现对温度、湿度、照度、土壤温度、土壤水分、照度、二氧化碳、氧气等环境值的监控,同时可以下发控制命令,完成对一些设备的控制。通过这套系统,可以很好地实现智慧农业、智慧城市等一些项目。如图1: 图1

云服务平台功能分三个模块:应用模块、数据服务模块、单机版模块,如图2: 图2 应用模块:会员通过该模块,可直接享受数据服务、地图定位、历史记录、历史曲线等功能。 数据服务模块:会员通过该模块,可以将我公司服务器上的数据信息下载到本地计算机上,便于二次开发。单机版模块:会员通过该模块,可在自己的服务器上实现数据收发功能,便于二次开发。

2、申请账号与登录 用户通过浏览器(推荐使用非IE 内核的浏览器,如火狐浏览器)访问昆仑海岸物联网云服务平台(以下简称为“平台”),在浏览器地址栏中输入域名https://www.doczj.com/doc/0913084974.html, 进入平台主界面。 点击【注册】进入用户注册界面,填写相关信息并点击【确认提交用户信息】按键,如图3所示。 当用户申请账号操作完成后,需要等待账号被平台管理员授权后方可使用,若账 号未被授权,则登录时会出现如图4所示的提示。 会员通过浏览器(推荐使用非IE 内核的浏览器,如火狐浏览器)访问平台,在浏 览器地址栏中输入管理平台的域名https://www.doczj.com/doc/0913084974.html, 进入登陆界面。使用已被授权的账号 登陆管理平台,如图5 所示。 图 4 图 3 图5

登陆成功后,会员可【查看账户信息】,来查看账户信息、设备信息、以及联系 方式等。本平台一个账户最多可以提供5只物联网网关的服务,如果多于5只将不能 添加设备,会员可以拨打电话,来实现扩容服务。如图6: 图6

半导体各工艺简介5

Bubbler Wet Thermal Oxidation Techniques

Film Deposition Deposition is the process of depositing films onto a substrate. There are three categories of these films: * POLY * CONDUCTORS * INSULATORS (DIELECTRICS) Poly refers to polycrystalline silicon which is used as a gate material, resistor material, and for capacitor plates. Conductors are usually made of Aluminum although sometimes other metals such as gold are used. Silicides also fall under this category. Insulators refers to materials such as silicon dioxide, silicon nitride, and P-glass (Phosphorous-doped silicon dioxide) which serve as insulation between conducting layers, for diffusion and implantation masks,and for passivation to protect devices from the environment.

高手的均线总结及使用心得

高手的均线总结及使用心得 https://www.doczj.com/doc/0913084974.html,/post_46_908231_1_d.aspx 一:对于均线的支撑作用的论证 截止到目前为止,人们通常对于均线是否具有真正有效的支撑作用抱有疑问,笔者的心得如下:对于 一只股票,只要主力尚在其中,根据主力的类型和目标的不同,主力不会允许股价有效击破某条重要均线 ,因为通常主力会顾及到主力的成本区域,大众的心理层面以及主力的目标和操作手法等,此时股价往往 在重要均线获得支撑,而当主力已经基本离场出局,均线的支撑作用就自然消失。所以均线的支撑作用在 个股的行情展开过程中才会发生作用,如果主力出逃则失效。如果由于主力出逃导致有效杀破某条均线, 由于缺乏主力关照,日后在反弹到此均线时由于散户的抛压就会再次下跌,看起来就好象均线产生了压制 作用。下面我们依据主力运作的周期和目标把主力分为短线主力,中线主力和长线主力,除了控盘程度极 高的庄股,通常一支股票中上述三种类型的主力是共存的。另外对于日线图,通常定义的有效击破的概念 是连续三日在某均线下方运行,幅度超过3%。 二:通用重要均线 通用短期均线:5日,10日均线

心得:对于走主升浪的强势股,5日均线是低吸的重要位置,强势股在拉升的过程中经常会技术性的 缩量回挡5日线之后立刻拉起,判断的重要依据是缩量,时机最好在一波拉升的初中期也就是幅度不大的 时候。对于由短线主力发动的行情来说,通常主力会在回档到5日或者10日线的时候出来护盘或者向上拉 起。此种类型个股的典型例子是华夏银行。 通用中期均线:20日,30日,60日均线 心得:中线主力必须维护的均线,一般逐浪上攻的慢牛型股票会在20日或者30均线获得支撑继续向上 ,如果30日线跟60日线相距不远的话瞬时打到60日线也有可能。通常来说,30日线是中线的生命线,30日 线走平或者向上,股价在30日线上方运行是中线看多做多的标志,30日线向下,股价在30线下方运行是中 线看空做空的标志,此方法如果结合个股的涨幅和基本面来配合使用成功率会更高。依托中期均线逐浪上 攻型的个股例子可以参考1月5日之前皖通高速的技术形态。 通用长期均线:120,240日线 心得:长线主力需要维护的均线,如果中短线主力出逃,股价往往在长期均线处获得支撑,经过连续 的下跌后,由于止跌回升导致长期均线走平或者拐头向上是中长线开始走好的标志。股价有效突破原本反

半导体制程气体介绍

一、半導體製程氣體介紹: A.Bulk gas: ---GN2 General Nitrogen : 只經過Filter -80℃ ---PN2 Purifier Nitrogen ---PH2 Purifier Hydrgen (以紅色標示) ---PO2 Purifier Oxygen ---He Helium ---Ar Argon ※“P”表示與製程有關 ※台灣三大氣體供應商: 三福化工(與美國Air Products) 亞東氣體(與法國Liquid合作) 聯華氣體(BOC) 中普Praxair B.Process gas : Corrosive gas (腐蝕性氣體) Inert gas (鈍化性氣體) Flammable gas (燃燒性氣體) Toxic gas (毒性氣體) C.General gas : CDA : Compressor DryAir (與製程無關,只有Partical問題)。 ICA : Instrument Compressor Air (儀表用壓縮空氣)。 BCA: Breathinc Compressor Air (呼吸系統用壓縮空氣)。 二、氣體之物理特性: A.氣體分類: 1.不活性氣體: N2、Ar、He、SF6、CO2、CF4 , ….. (惰性氣體) 2.助燃性氣體: O2、Cl2、NF3、N2O ,….. 3.可燃性氣體: H2、PH3、B2H6、SiH2Cl2、NH3、CH4 ,….. 4.自燃性氣體: SiH4、SC2H6 ,….. 5.毒性氣體: PH3、Cl2、AsH3、B2H6、HCl、SiH4、Si2H6、NH3 ,…..

均线的设置和使用技巧详解

均线的设置和使用技巧详解 均线是最常用的分析指标之一,所反映的是过去一段时间内市场的平均成本变化情况。均线和多根均线形成的均线系统可以为判断市场趋势提供依据,同时也能起到支撑和阻力的作用。 一、均线的构成 均线,按照计算方式的不同,常见地可分为普通均线、指数均线、平滑均线和加权均线。这里主要介绍普通均线和指数均线。 普通均线:对过去某个时间段的收盘价进行普通平均。比如20日均线,是将过去20个交易日的收盘价相加然后除以20,就得到一个值;再以昨日向前倒推20个交易日,同样的方法计算出另外一个值,以此类推,将这些值连接起来,就形成一个普通均线。 指数均线:形成方式和普通均线完全一致,但在计算均线值的时候,计算方式不一样。比如20日均线,指数均线则采取指数加权平均的方法,越接近当天,所占的比重更大,而不是像普通均线中那样平均分配比重。所以指数均线大多数情况下能够更快地反映出最新的变化。 二者的优劣:没有绝对的优劣,在不同的运行阶段,二者可以体现出不同的效果,选取时的经验成分相对更大。个人更习惯于使用指数均线,用多根指数均线的结合来辅助判断市场趋势。 下图是欧元/美元日线图上所体现的两种不同均线的效果,这里的均线参数设置都是20,其中红色均线为普通20日均线,蓝色均线

为20日指数均线。从图片中可以看出,大多数情况下指数均线的拐头速度都相对早于普通均线,能相对更快地跟上市场的趋势。 说明:一般而言,均线的计算价格用收盘价,但也可根据需求的不同使用最低价、最高价或者开盘价。 二、均线系统的设置 如果将多根有规律的均线排列在一起,这是就可以形成一个对分析更有帮助的均线系统。一般来说,在用均线系统进行分析时,常见的参数排列方式为: A、5、10、20、30、40、50(等跨度排列方式) B、5、8、13、21、34、55(费波纳奇排列方式) 在这两种排列方式的基础上,增加125、200和250三个参数的均线三、利用均线系统排列状态判断市场节奏

洋丝瓜的功效与作用

洋丝瓜的功效与作用 洋丝瓜在生活中很受欢迎,也是比较常见的食材,其口感脆嫩,营养丰富,不仅仅含有诸多有益物质而且有保健的功效,所以也是餐桌上常见的食物。洋丝瓜的烹饪方法有很多,切片切丝均可,不仅可以烧炒,还可以做汤。下面就为大家详细介绍下洋丝瓜的功效和作用,大家可以了解下。 洋丝瓜的营养价值 洋丝瓜别名佛手瓜、梨瓜、福寿瓜、洋丝瓜、拳头瓜、合掌瓜、隼人瓜、菜肴梨,佛手瓜的功效:理气和中、疏肝止哎,佛手瓜的作用:治消化不良、胸闷气胀、呕吐、肝胃气痛、气管炎咳嗽多痰,佛手瓜食用禁忌:佛手瓜属于温性食物,阴虚体热和体质虚弱的人应少食。 别名:梨瓜、洋丝瓜、拳头瓜、合掌瓜、福寿瓜、隼人瓜、菜肴梨。 功效:理气和中,疏肝止哎。 主治:适宜于消化不良、胸闷气胀、呕吐、肝胃气痛以及气管炎咳嗽多痰者食用。 用法用量:佛手瓜的食用方法很多,鲜瓜可切片、切丝,作荤炒、素炒、凉拌,做汤、涮火锅、优质饺子馅等。还可加工成腌制品或做罐头。在国外,佛手瓜以蒸制、烘烤、油炸、嫩煎等方法食用。每次1个。 洋丝瓜的功效与作用

1、预防疾病:洋丝瓜含有的营养物质比较全面丰富,经常食用可以大大提高人体免疫力和预防疾病的入侵。 2、利尿排钠、降血压:洋丝瓜含有的蛋白质和钙元素是黄瓜的2-3倍,同时含有丰富的维生素和人体所需的各种矿物质,而且又是一种低热量和低钠的食物,对防治高血压和心脏病具有很好的保健作用。所以经常食用洋丝瓜具有利尿排钠、降血压的神奇功效。 3、提高智力:据现代医学研究发现洋丝瓜中含有大量的锌元素,而锌元素又是青少年儿童智力发育所必需的物质,所以经常食用洋丝瓜有助于提高智力。 4、理气和中、疏肝止咳:据中医的说法洋丝瓜具有理气和中、疏肝止咳的作用,常被中医用来治疗消化不良、胸闷气胀、呕吐、肝胃气痛以及气管炎咳嗽多痰等疾病症状。 5、抗氧化、防癌抗癌:洋丝瓜中含有的硒元素的含量是其他蔬菜不可比拟的,医学研究证明硒元素是人体不可缺少的一种微量元素,具有很好的抗衰老和防癌抗癌功效,可以有效的保护人体细胞膜结构和功能不被损害。 6、补肾壮阳 洋丝瓜还有一个奇妙的功效就是可以帮助男女改善因营养不良引起的不育症,特别是对男性性功能衰退具有显著的效果,因此被誉为男人的“天然伟哥”,因此长期食用洋丝瓜可以起到补肾壮阳的效果。

半导体全制程介绍

半导体全制程介绍 《晶圆处理制程介绍》 基本晶圆处理步骤通常是晶圆先经过适当的清洗 (Cleaning)之后,送到热炉管(Furnace)内,在含氧的 环境中,以加热氧化(Oxidation)的方式在晶圆的表面形 成一层厚约数百个的二氧化硅层,紧接着厚约1000到 2000的氮化硅层将以化学气相沈积Chemical Vapor Deposition;CVP)的方式沈积(Deposition)在刚刚长成的二氧化硅上,然后整个晶圆将进行微影(Lithography)的制程,先在晶圆上上一层光阻(Photoresist),再将光罩上的图案移转到光阻上面。接着利用蚀刻(Etching)技术,将部份未被光阻保护的氮化硅层加以除去,留下的就是所需要的线路图部份。接着以磷为离子源(Ion Source),对整片晶圆进行磷原子的植入(Ion Implantation),然后再把光阻剂去除(Photoresist Scrip)。制程进行至此,我们已将构成集成电路所需的晶体管及部份的字符线(Word Lines),依光罩所提供的设计图案,依次的在晶圆上建立完成,接着进行金属化制程(Metallization),制作金属导线,以便将各个晶体管与组件加以连接,而在每一道步骤加工完后都必须进行一些电性、或是物理特性量测,以检验加工结果是否在规格内(Inspection and Measurement);如此重复步骤制作第一层、第二层的电路部份,以在硅晶圆上制造晶体管等其它电子组件;最后所加工完成的产品会被送到电性测试区作电性量测。 根据上述制程之需要,FAB厂内通常可分为四大区: 1)黄光本区的作用在于利用照相显微缩小的技术,定义出每一层次所需要的电路图,因为采用感光剂易曝光,得在黄色灯光照明区域内工作,所以叫做「黄光区」。

股票均线的设置和使用技巧

股票均线的设置和使用技巧 2010-10-26 12:51:04| 分类:股票| 标签:买卖股票 macd 股价平均|举报|字号大中小订阅股票均线的设置和使用技巧(2010-09-21 23:00:28) 内容:均线是最常用的分析指标之一,所反映的是过去一段时间内市场的平均成本变化情况。均线和多根均线形成的均线系统可以为判断市场趋势提供依据,同时也能起到支撑和阻力作用 < XMLNAMESPACE PREFIX ="O" /> 一般来说,在用均线系统进行分析时,常见的参数排列方式为: A、5、10、20、30、40、50(等跨度排列方式) B、5、8、13、21、34、55(费波纳奇排列方式) 在这两种排列方式的基础上,增加120、200两个参数的均线 一、利用均线系统排列状态判断市场节奏 均线系统的排列状态,一般可分为股聚、发散、平行三种状态。其中股聚状态表明的是市场经历过一段单边走势后出现的对原有动能消化或积蓄新运行动能的过程,股聚的时间越长,产生的动能越大;发散状态则是股聚之后产生的爆发动能的前奏;平行状态则表明市场运行动能已经单方向爆发,引发了新的单边运行趋势。在单边市运行节奏当中,这三种状态的循环为:股聚-〉发散-〉平行-〉股聚而在盘整市中,则体现为股聚-〉发散-〉股聚的循直到突破的发生。 二、均线常规用法均线的两个基本功能;

1、“金叉”和“死叉”指示的基本信号 2、均线对股价的支撑和阻力作用当股价运行在均线系统的下方时,此时均线系统将起到明显的阻力作用,阻碍股价的反弹;同样地,当股价站稳于均线系统之上后,均线系统又将起到明显的支撑作用,这个作用的产生原因就在于均线计算方式所体现的市场成本变化,这种支撑和阻力效果也被称为“均线的斥力”。当然,均线也有吸引力,当股价单边运行过程中较远地离开了均线系统后,均线也会发出“引力”作用,均线本身跟随股价运行方向前进的同时,也会拉 动股价出现回调。 五、特殊均线的用法所谓特殊均线是指在特定阶段能起到明显多空分水岭的均线,一般来说我们将其运用于单边市的主趋势中。其中30周指数均线就体现出特殊的多空分水岭效果。 股票均线简单看 年线下变平,准备捕老熊。年线往上拐,回踩坚决买! 年线往下行,一定要搞清,如等半年线,暂做壁上观。 深跌破年线,老熊活年半。价稳年线上,千里马亮相。 要问为什么?牛熊一线亡!半年线下穿,千万不要沾。 半年线上拐,坚决果断买!季线如下穿,后市不乐观! 季线往上走,长期做多头!月线不下穿,光明就在前! 股价踩季线,入市做波段。季线如被破,眼前就有祸,长期往上翘,短期呈缠绕,平台一做成,股价往上跃

与均线相关的技术指标

第二章与均线相关的技术指标 前一章谈了均线,而许多传统经典指标与之息息相关,如BISA、MACD、DMA、TRIX、BOLL、ENV等,在它们共同的特点是在指标的设计中,都以价格平均线作为首要因素进行数学模型构建的。可见均线无论在指标设计还是在技术分析中都占有极重要的地位。 下面就上述指标进行逐一讲解,通过对指标的数学表达、曲线形态对比,以及指标间的关联分析,揭示指标本质与外在关系,并会发现一些有趣的现象。 一、乖离率BISA与变动速率ROC指标 先看乖离率BISA指标的公式源码: BIAS:(CLOSE-MA(CLOSE,N))/MA(CLOSE,N)*100; 算法:乖离率=〔当日收盘价-N日移动平均线〕×100/N日移动平均线 公式算法表明,以N日移动平均线为基准线(即指标中的零轴),计算价格上下波动远离基准线的幅度,其大小用乖离率表示。换句话说,BIAS实质上表示的是价格相对于N日均线的涨跌幅。所以当正乖离率愈大时,价格相对于均线的涨幅愈大,短期获利丰厚,回吐可能性愈高;反之负乖离率愈大,价格相对于均线的跌幅愈大,空头回补可能性愈高。至于乖离率的技术点怎么确定,除与设定的基准均线(参数N)有关外,还会因个股的股性以及所处行情性质相关,也因个人的操作风格不同而不同。比如以60平均线为参考,在一般震荡行情中,通常BIAS波幅在±25%左右,极端行情或主升段时也可达±60%或+100%以上;而对短线涨升行情,股价相对于10均线的乖离率通常为5%~8%。 现在来分析与BIAS原理很相近的变动速率ROC指标,原公式表达为: ROC:(CLOSE-REF(CLOSE,N))/REF(CLOSE,N)*100; ROCMA:MA(ROC,M); 即:变动速率=〔当日收盘价-N日前的收盘价〕×100/N日前的收盘价ROCMA是再计算变动速率ROC的M日移动平均。

物联网称重管理系统

物联网称重管理系统 Document number:NOCG-YUNOO-BUYTT-UU986-1986UT

物联智能计量管控平台-一体化解决方案 ?计重之星-防作弊智能称重管理系统(版本~ ?无人值守智能称重管控系统(单机/网络/WEB版本 ?基于物联网技术的集团级计量自动化管控平台 物联:支持从计量现场(RFID读写器、红外检测仪、光栅、计量仪器仪表、地感、LED屏、声波定 位、雷达测速装置、信号指示灯、抓拍相机(照片、车牌)、道匝/拦道机、喇叭/音箱/功放/语音对讲)到管控办公间(机柜、工控机、PLC主控柜、各类仪器仪表及传感装置、有线无线网络设备、UPS、条码打印机)、从控制办公间到管理办公间(PC、身份认证U盘)、再到客户终端 (PDA、手机、盘点机、扫描器)的整个信息网络共享接连,网络连接支持有线局域网、GPRS、WIFI、3G等. 智能:支持单片机、PLC数据采集与信号控制(外部设备),实现数据与设备的智能控制与自动化 计量:软件提供了对各类计重、计数、计温、计湿、计雾、计微量元素仪器仪表的数据采集 管控:实现对业务数据的管理与控制、对工作设备状态的管理与控制 平台:提供一体化集成化的WEB操作平台(个人电脑、手机、PDA) “计重之星-防作弊智能网络称重管理系统”是为适应信息化时代的发展要求,结合我国计重(衡器)行业的市场发展的需要和客户的实际需求而开发的一套现代计重、物流、智能控制管理系统。该系统对衡器行业在当前信息化普及时代刚刚起步的背景,为国内外衡器企业及相关行业用户在计重管理上的需要,为提高各行业用户在计重管理业务上的管理水平,实现其从当初习惯的手工操作模式有效的过渡到自动化,智能化的计算机管理模式,实现行业用户在计重管理全面信息化的完整解决方案。该系统可以运行于各种通用的操作系统及微机和服务部构建的各种单机、局域网、内部网(Intranet)或广域网等多种软硬件环境,适用于水泥、煤矿、钢材、化工、港口、采矿、电力、粮油、食

均线的12种形态

均线的12种形态

在技术分析中,市场成本原理非常重要,它是趋势产生的基础,市场中的趋势之所以能够维持,是因为市场成本的推动力,例如:在上升趋势里,市场的成本是逐渐上升的,下降趋势里,市场的成本是逐渐下移的。成本的变化导致了趋势的沿续。 均线代表了一定时期内的市场平均成本变化,我非常看重均线,本人在多年的实战中总结出一套完整的市场成本变化规律----均线理论。他用于判断大趋势是很有效果的。 内容包扩: 1,均线回补。 2,均线收敛。 3,均线修复。 4,均线发散。 5,均线平行。 6,均线脉冲。 7,均线背离。 8,均线助推。 9,均线扭转。 10,均线服从。 11,均线穿越。 12,均线角度。 今日,我首先给大家重点介绍一下均线回补和均线收敛,希望在和大家交流中得到完善和补充。

一,均线回补。 均线反应了市场在均线时期内的市场成本,例如:30日均线反应了30日内的市场平均成本。股价始终围绕均线波动,当股价偏离均线太远时,由于成本偏离,将导致股价向均线回补(回抽),在牛市里表现为上升中的回调,在熊市里表现为下跌中的反弹。二,均线收敛。 均线的状态是市场的一个重要信号,当多条均线出现收敛迹象时,表明:市场的成本趋于一致,此时是市场的关建时刻,因为将会出现变盘!大盘将重新选择方向。 当均线出现收敛时大家都能看出来,这并不重要,而重要的是:在变盘前,如何来判断变盘的方向呢?判断均线收敛后将要变盘的方向是非常重要的,他是投资成败的关建,判断的准,就能领先别人一步。 判断变盘方向时,应该把握另外的两条原则:均线服从原理和均线扭转扭转原理。 均线服从原理是:短期均线要服从长期均线,变盘的方向将会按长期均线的方向进行,长期均线向上则向上变盘,长期均线向下则向下变盘。日线要服从周线,短期要服从长期。利用均线服从原理得出的结论并不是一定正确的,它只是一种最大的可能性,而且他还存在一个重要确陷:也就是迟钝性,不能用来判断最高点和最底点,只能用来判断总体大方向。另外在实战中,市场往往会发生出人意料的逆转,此时均线服从原理就要完全失效了,
