PerformanceAnalysisof_省略_FluctuationsLoa
- 格式:pdf
- 大小:2.03 MB
- 文档页数:7
IEEE Std 1243-1997 IEEE Guide for Improving the Lightning Performance of Transmission LinesSponsorTransmission and Distribution Committeeof theIEEE Power Engineering SocietyApproved 26 June 1997IEEE Standards BoardAbstract: The effects of routing, structure type, insulation, shielding, and grounding on transmis-sion lines are discussed. The way these transmission-line choices will improve or degrade light-ning performance is also provided. An additional section discusses several special methods that may be used to improve lightning performance. Finally, a listing and description of the FLASH pro-gram is presented.Keywords: grounding, lightning protection, overhead electric power transmission, shieldingIEEE Standards documents are developed within the IEEE Societies and the Standards Coordinat-ing Committees of the IEEE Standards Board. Members of the committees serve voluntarily and without compensation. They are not necessarily members of the Institute. The standards developed within IEEE represent a consensus of the broad expertise on the subject within the Institute as well as those activities outside of IEEE that have expressed an interest in participating in the develop-ment of the standard.Use of an IEEE Standard is wholly voluntary. The existence of an IEEE Standard does not imply that there are no other ways to produce, test, measure, purchase, market, or provide other goods and services related to the scope of the IEEE Standard. Furthermore, the viewpoint expressed at the time a standard is approved and issued is subject to change brought about through developments in the state of the art and comments received from users of the standard. Every IEEE Standard is sub-jected to review at least every Þve years for revision or reafÞrmation. When a document is more than Þve years old and has not been reafÞrmed, it is reasonable to conclude that its contents, although still of some value, do not wholly reßect the present state of the art. Users are cautioned to check to determine that they have the latest edition of any IEEE Standard.Comments for revision of IEEE Standards are welcome from any interested party, regardless of membership afÞliation with IEEE. Suggestions for changes in documents should be in the form of a proposed change of text, together with appropriate supporting comments.Interpretations: Occasionally questions may arise regarding the meaning of portions of standards as they relate to speciÞc applications. When the need for interpretations is brought to the attention of IEEE, the Institute will initiate action to prepare appropriate responses. Since IEEE Standards rep-resent a consensus of all concerned interests, it is important to ensure that any interpretation has also received the concurrence of a balance of interests. For this reason, IEEE and the members of its societies and Standards Coordinating Committees are not able to provide an instant response to interpretation requests except in those cases where the matter has previously received formal consideration.Comments on standards and requests for interpretations should be addressed to:Secretary, IEEE Standards Board445 Hoes LaneP.O. Box 1331Piscataway, NJ 08855-1331USAAuthorization to photocopy portions of any individual standard for internal or personal use is granted by the Institute of Electrical and Electronics Engineers, Inc., provided that the appropriate fee is paid to Copyright Clearance Center. To arrange for payment of licensing fee, please contact Copyright Clearance Center, Customer Service, 222 Rosewood Drive, Danvers, MA 01923 USA; (508) 750-8400. Permission to photocopy portions of any individual standard for educational class-room use can also be obtained through the Copyright Clearance Center.Introduction(This introduction is not part of IEEE Std 1243-1997, IEEE Guide for Improving the Lightning Performance of Trans-mission Lines.)For most overhead power transmission lines, lightning is the primary cause of unscheduled interruptions.Several methods for estimating lightning-outage rates have been developed in the past, and many publica-tions have been written on how to design transmission lines that experience minimum interruptions.The methods for estimating the lightning performance of transmission lines show several approaches to a real-life engineering problem that is ill-deÞned. Precise constants are rarely known and are often not really con-stant, input data is difÞcult to describe mathematically except in idealized ways, and outputs may be depictable only by probabilities or average values. By its nature, lightning is difÞcult to study and model. Lightning tran-sients are so fast that air ionization time constants lead to a time- and waveshape-dependent insulation strength. Lightning peak currents may be ten times higher or lower than the median 31 kA value. Typical ground-ßash densities are between 1 and 10 ßashes/km 2 , so a typical 100 km long strip of width 20 m should receive 2Ð20 ßashes per year. A transmission line of normal height will receive ten times more ßashes because it is tall. Structure footing impedances vary with soil characteristics, ßash current, and time. Nonlinear corona and surge response effects change the waveshape and magnitude of stresses. As a further complication, light-ning ßash density varies widely from year to year and changes with location and season.Any method of judging the lightning performance of transmission lines must cope with these uncertainties.It is pointless, and indeed misleading, to promote a method that is more precise than the accuracy of the input data. The uncertainties of the problem do permit some simpliÞcation of the method; rough estimates are likely to be as correct as a much more detailed solution. It is in this spirit that this guide and the FLASH program have been prepared. If the transmission-line designer keeps these limitations in mind, the factors that most inßuence the lightning performance of a given transmission line may be evaluated.Knowledge has improved in recent years in such areas as shielding design, stroke characteristics, impulse current behavior of grounds, and lightning ground-ßash density. Work is continuing in these areas, as well as others. However, a simple design guide is needed now. It is the purpose of this publication to provide a sim-pliÞed guide that includes new advances in this Þeld, for use by transmission-line designers.The IEEE Working Group on Estimating the Lightning Performance of Overhead Transmission Lines had the following membership during the preparation of this guide:James T. Whitehead, Chair, 1985-1989William A. Chisholm, Chair, 1989-1997The Working Group would like to thank the many individuals who reviewed the text and provided comments in the review and balloting process.John G. AndersonJohn W. AndersonPhilip BarkerRalph BernsteinJames E. ChapmanPritindra ChowdhuriErnico CinieriRoger ClaytonLuigi DelleraHamid ElahiAndrew J. ErikssonGeorge GelaStanislaw Grzybowski Christopher Hickman Andrew R. Hileman Atsuyuki Inoue Marasu Ishii Wasyl Janischewskyj John Kappenman Robert C. Latham Duilio M. Leite Vito J. Longo Harry G. Mathews James McBride Thomas McDermott Hideki Motoyama Charles H. Moser Abdul M. Mousa Richard E. Orville Dee E. Parrish Charles Pencinger John B. Posey Joseph D. Renowden Farouk A. M. Rizk Thomas Short Mohamed H. Shwehdi Martin A. Uman Edward R. WhiteheadThis document is dedicated to the work and memory of Ed Whitehead:ÒLightning is the enemy,Trees are your friends,TheyÕll shield your linesHoweÕer the leader bends.ÓThe following persons were on the balloting committee:Hanna E. Abdallah Edward J. Adolphson Glenn AndersenJ. G. AndersonJames E. Applequist Edwin AverillMichael P. Baldwin Phil P. BarkerRon L. BarkerE. BetancourtJohn BoyleH. Steve Brewer Joseph F. BuchJames J. Burke Vernon L. Chartier Nisar ChaurdhryPrit ChowdhuriDon ChuEnrico CinieriSam ClutsThomas M. Compton F. Leonard Consalvo William T. CrokerPaul L. Dandeno Glenn A. DavidsonR. DeckerTom. Diamantis William K. DickDale A. Douglass Robert Engelken William E. FeeroJon M. Ferguson Harald FienJames FunkeGeorge GelaDonald A. Gillies Edwin J. Tip Goodwin I. S. GrantStan GrzybowskiAdel E. Hammad Jerome G. Hanson Donald G. Heald Roland Heinrichs Richard W. Hensel Steven P. Hensley Christopher W. Hickman Andrew Robert Hileman John Hunt Atsuyuki InoueWasyl JanischewskyjJohn G. KappenmanRichard KennonJeff J. KesterRobert O. KlugeEd KnappNestor KolcioAlan E. KollarDonald E. KoonceDavid J. KourySamy G. KrishnasamyBarin KumarStephen R. LambertRobert C. LathamBenny H. LeeGerald E. LeeAntonio L. LimJoseph MaK. T. MassoudaMike. McCaffertyJohn McDanielNigel P. McQuinRoss McTaggartJohn E. Merando Jr.Sam MichaelJ. David MitchellDaleep MohlaRichard K. MooreJ. H. MoranHideki MotoyamaAbdul M. MousaJay L. NichollsD. L. NickelGeorge B. NilesStig L. NilssonRonald J. OedemannRobert G. OswaldGerald A. PaivaBert ParsonsMohammad A. PashaJesse M. PattonDavid F. PeeloCarlos PeixotoThomas J. PekarekRobert C. PetersTrevor PfaffR. Leon PlasterPercy E. PoolSteven C. PowellPatrick D. QuinnParvez RashidJerry L. RedingJoseph RenowdenFrank RichensStephen J. RodickJohn R. RossettiG. W. RoweTim E. RoysterThomas J. RozekJohn S. RumbleAnne-Marie SahazizianMahesh P. SampatDonald SandellNeil P. SchmidtJohn SchneiderMinesh ShahDevki N. SharmaJohn SheaTom ShortMohamed H. ShwehdiHyeong Jin SimPritpal SinghTarkeshwar SinghRobert A. SmithStephen F. SmithBrian T. SteinbrecherGary E. StemlerL .R. StenslandRobert P. StewartMike F. StringfellowAndy SweetanaEva J. TarasiewiczSubhash C. TuliJ. G. TzimorangasJoseph J. VaschakJohn VithayathilReigh WallingDaniel J. WardJames T. WhiteheadJeffrey S. WilliamsFrank S. YoungLuciano E. ZaffanellaPeter ZhaoRich ZinserWilliam B. ZollarsWhen the IEEE Standards Board approved this guide on 26 June 1997, it had the following membership:Donald C. Loughry, Chair Richard J. Holleman, Vice ChairAndrew G. Salem, Secretary*Member EmeritusAlso included are the following nonvoting IEEE Standards Board liaisons:Satish K. AggarwalAlan H. CooksonErika H. MurphyIEEE Standards Project EditorNational Electrical Safety Code and NESC are registered trademarks and service marks of the Institute of Electrical and Electronics Engineers, Inc.Windows is a trademark of Microsoft Corporation.Clyde R. Camp Stephen L. Diamond Harold E. Epstein Donald C. Fleckenstein Jay Forster*Thomas F. Garrity Donald N. Heirman Jim Isaak Ben C. JohnsonLowell JohnsonRobert KennellyE. G. ÒAlÓ KienerJoseph L. KoepÞnger*Stephen R. LambertLawrence V . McCallL. Bruce McClungMarco W. Migliaro Louis-Fran•ois Pau Gerald H. Peterson John W. Pope Jose R. Ramos Ronald H. Reimer Ingo RŸsch John S. Ryan Chee Kiow TanHoward L. WolfmanContents1.Overview (1)1.1Scope (1)1.2Purpose (1)1.3Disclaimer (2)2.References (2)3.Definitions and Acronyms (2)3.1Definitions (2)3.2Acronyms (4)4.Route selection (4)4.1Lightning frequency of incidence (4)4.2Route effects (5)4.3Structure height (5)4.4Soil resistivity (6)4.5Adjacent environment (6)5.Shielding (6)5.1Shielding angle (7)5.2The final leader step (7)5.3Flashover from shielding failure (9)5.4Shielding and engineering (10)5.5Shielding of the center phase (12)5.6Overhead ground wire size and operating losses (12)6.Insulation (13)6.1Effect of voltage waveshapes (13)6.2Effect of insulation levels and insulation type (16)6.3Wood or fiberglass in series with insulators (17)6.4Effect of power-line voltage (17)7.Tower footing impedance (18)7.1Composite line performance (18)7.2Supplemental grounding (19)7.3Counterpoise (20)7.4Resistance of complex footings (21)7.5Special grounding effects (21)8.Special methods of improving lightning performance (22)8.1Additional shield wires (22)8.2Guy wires on transmission towers (23)8.3Ground wire on separate structures (23)8.4Line surge arresters (23)8.5Unbalanced insulation on double-circuit lines (24)8.6Active air terminals (24)Annex A (informative) Isolated bonding of wooden structures (26)Annex B (normative) The FLASH program (27)Annex C (informative) Bibliography (34)IEEE Guide for Improving the Lightning Performance of Transmission Lines1. Overview1.1 ScopeFor this guide, a transmission line is any overhead line with a phase-to-phase voltage exceeding 69 kV and an average conductor height of more than 10 m. The transmission line is usually shielded by one or more overhead ground wires (OHGWs), at least for a short distance from a substation. While reference is prima-rily made to ac transmission characteristics, the guide is also relevant for high-voltage direct-current (HVDC) overhead lines.The guide is written for the transmission-line designer. When given the problem of designing or redesigning a transmission line, the designer should consider certain limiting factors such as the voltage level, the begin-ning and ending points for the transmission line, and the desired ampacity of the line. Sometimes the exact route, and the type of conductor and structure have already been determined. Usually the designer may choose structural details, the geometry of the structure, the structure height, the exact placement of the OHGWs, the amount and type of insulation, the type of grounding, and other design features of a line. This guide is written to show the designer which choices will improve or degrade lightning performance. Sections of the guide discuss the effect of routing, structure type, insulation, shielding, and grounding. An additional section discusses several special methods, which may be used to improve lightning performance. Finally, in Annex B, a listing and description of the FLASH program is presented.The line designer should be aware that lightning performance is not of primary importance in the economics of line designing. Other factors, such as line length, right-of-way costs, construction costs, material costs, and losses affect the economics of a line design much more than lightning performance. The designer should always balance the costs of higher insulation levels, improved grounding, better shielding, or line relocation against the beneÞts of improved reliability.1.2 PurposeThis guide contains simple mathematical equations, tables, and graphs that provide the information needed to design an overhead power transmission line with minimum lightning interruptions. Versions 1.6 and 1.7 of the FLASH program are provided on the diskette included with this guide. Annex B includes a description of the program. The FLASH program uses the models in the design guide along with a description of transmission-line features to estimate the lightning outage rate that may be expected. These simpliÞed models may also be adapted to assess the beneÞts of novel methods for improving lightning performance.IEEEStd 1243-1997IEEE GUIDE FOR IMPROVING THE 1.3 DisclaimerThe FLASH program is included in this guide as a convenience to the user. Other numerical methods may be more appropriate in certain situations. The IEEE Working Group on Estimating the Lightning Performance of Overhead Transmission Lines of the Lightning and Insulator Subcommittee has made every effort to ensure that the program yields representative calculations under anticipated conditions. However, there may well be certain calculations for which the method is not appropriate. It is the responsibility of the user to check calculations against Þeld experience or other existing calculation methods.2. ReferencesThis guide shall be used in conjunction with the following standards. When the following standards are superseded by an approved revision, the revision shall apply.ANSI C2-1997, National Electrical Safety Code¨ (NESC¨).1ANSI C29.1-1988 (Reaff 1996), American National Standard for Electric Power InsulatorsÑTest Methods.2 ANSI C29.2-1992, American National Standard for InsulatorsÑWet Process Porcelain and Toughened GlassÑSuspension Type.ANSI C29.8-1985 (Reaff 1995), American National Standard for Wet-Process Porcelain Insulators (Appara-tus, Cap, and Pin Type).3. DeÞnitions and Acronyms3.1 DeÞnitions3.1.1 active air terminal: An air terminal which has been modiÞed to lower its corona inception gradient.3.1.2 air terminal (lightning protection): The combination of an elevation rod and brace, or footing placed on upper portions of structures, together with tip or point, if used.3.1.3 back ßashover (lightning): A ßashover of insulation resulting from a lightning stroke to part of a net-work or electric installation which is normally at ground potential. See also:direct-stroke protection.3.1.4 back-ßashover rate: The annual outage rate on a circuit or tower-line length basis caused by back ßashover on a transmission line.3.1.5 counterpoise: A conductor or system of conductors arranged beneath the line; located on, above, or most frequently below the surface of the earth; and connected to the grounding systems of the towers or poles supporting the transmission lines.3.1.6 critical current: The Þrst-stroke lightning current to a phase conductor which produces a critical impulse ßashover voltage wave.1The NESC is available from the Sales Department, American National Standards Institute, 11 West 42nd Street, 13th Floor, New York, NY 10036, USA. It is also available from the Institute of Electrical and Electronics Engineers, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331, USA.2ANSI publications are available from the Sales Department, American National Standards Institute, 11 West 42nd Street, 13th Floor, New York, NY 10036, USA.IEEE LIGHTNING PERFORMANCE OF TRANSMISSION LINES Std 1243-1997 3.1.7 critical impulse ßashover voltage (insulators) (CFO): The crest value of the impulse wave which, under speciÞed conditions, causes ßashover through the surrounding medium on 50% of the applications.3.1.8 direct stroke protection (lightning): Lightning protection designed to protect a network or electric installation against direct strokes.3.1.9 ßashover: A disruptive discharge through air around or over the surface of solid or liquid insulation, between parts of different potential or polarity, produced by the application of voltage wherein the break-down path becomes sufÞciently ionized to maintain an electric arc.3.1.10 ground electrode: A conductor or group of conductors in intimate contact with the ground for the purpose of providing a connection with the ground.3.1.11 ground ßash density (GFD): The average number of lightning strokes to ground per unit area per unit time at a particular location.3.1.12 lightning Þrst stroke: A lightning discharge to ground initiated when the tip of a downward stepped leader meets an upward leader from the earth.3.1.13 lightning ßash: The complete lightning discharge, most often composed of leaders from a cloud fol-lowed by one or more return strokes.3.1.14 lightning subsequent stroke: A lightning discharge that may follow a path already established by a Þrst stroke.3.1.15 lightning outage: A power outage following a lightning ßashover that results in system fault current, thereby necessitating the operation of a switching device to clear the fault.3.1.16 line lightning performance: The performance of a line expressed as the annual number of lightning ßashovers on a circuit mile or tower-line mile basis. See also:direct-stroke protection.3.1.17 line surge arrester: A protective device for limiting surge voltages on transmission-line insulation by discharging or bypassing surge current; it prevents continued ßow of follow-current to ground and is capable of repeating these functions.3.1.18 overhead ground wire (OHGW): Grounded wire or wires placed above the phase conductors for the purpose of intercepting direct strokes in order to prevent the phase conductors from the direct strokes. They may be grounded directly or indirectly through short gaps. See also:direct-stroke protection.3.1.19 shielding failure ßash-over rate (SFFOR): The annual number of ßashovers on a circuit or tower-line length basis caused by shielding failures.3.1.20 shielding failure rate (SFR): The annual number of lightning events on a circuit or tower-line length basis, which bypass the overhead ground/shield wire and terminate directly on the phase conductor. This event may or may not cause ßashover.3.2 shield wire: Grounded wire(s) placed near the phase conductors for the purposes ofa)Protecting phase conductors from direct lightning strokes,b)Reducing induced voltages from external electromagnetic Þelds,c)Lowering the self-surge impedance of an OHGW system, ord)Raising the mutual surge impedance of an OHGW system to the protected phase conductors.3.2.1 standard lightning impulse: A unidirectional surge having a 30Ð90% equivalent rise time of 1.2 m s and a time to half value of 50 m s.IEEEStd 1243-1997IEEE GUIDE FOR IMPROVING THE 3.2.2 transmission line: Any overhead line used for electric power transmission with a phase-to-phase volt-age exceeding 69 kV and an average conductor height of more than 10 m.3.2.3 underbuilt shield wires: Shield wires arranged among or below the average height of the protected phase conductors for the purposes of lowering the OHGW system impedance and improving coupling. Underbuilt shield wires may be bonded to the structure directly or indirectly through short gaps. Insulated earth return conductors on HVDC transmission lines, and/or faulted phases, both function as underbuilt shield wires.3.2 AcronymsACSR aluminum conductor, steel reinforcedAWG American Wire GageCFO critical ßashoverEGM electro-geometricEHV extra-high voltageGFD ground ßash densityHV high voltageHVDC high-voltage direct currentOHGW overhead ground wireRTS rated tensile strengthSFR shielding failure rateSFFOR shielding failure ßashover rate4. Route selectionMany factors play an important role in route selection. Power system considerations dictate where the trans-mission line should begin and end. Economic considerations require the line to be as short as possible, because construction costs and electrical losses are high. Certain environmental constraints dictate where and how a transmission line may be built. Even with these restrictions, there are still ways for the transmis-sion-line designer to make decisions that will affect the lightning performance of the line. It is the purpose of this clause to illustrate the ways that a designer may improve the lightning performance of a transmission line by selecting the proper route.4.1 Lightning frequency of incidenceLightning location systems and ßash-counter networks have been deployed in North America [B11], [B34]3 and elsewhere. With enough experience, these networks may provide detailed ground ßash-density (GDF) maps. Orographic and geographic features, such as proximity to large bodies of water or elevation changes, will affect the ßash density. Flash-density maps will provide much greater detail and accuracy than what was previously available with thunder data. A typical GDF map is shown in [B35]. Lightning severity maps may also be available to show where more damaging lightning strokes occur. When two routes with similar soil characteristics are being compared, the route through a region with lower density of severe ßashes will have fewer outages.With more detailed maps averaged over enough time, the designer may select a route with a minimum expo-sure to lightning. By way of guidance about the averaging time required, MacGorman et al. [B29] found that 3The numbers in brackets correspond to those of the bibliography in Annex C.IEEE LIGHTNING PERFORMANCE OF TRANSMISSION LINES Std 1243-1997 one-year average thunder data had roughly 35Ð40% standard deviations, Þve-year averages had 30% stan-dard deviations, and ten-year averages had 25% standard deviations. A similar variation would be expected for GFD. Thus, a minimum of 5Ð10 years of GFD observations are desirable. The natural scatter of lightning activity may make it impossible to estimate a mean value with more precision than the ten-year standard deviation.4.2 Route effectsThe transmission-line designer is often faced with the choice of routing a line through a valley, along the side of a mountain, or on the top of a mountain. This decision may affect the lightning performance of the line in two ways. First, the route may affect the exposure of the line to lightning. Second, soil resistivity may be different for alternate routes. The effects of soil resistivity are discussed in 4.4.Transmission line exposure is affected by both GFD and by the lineÕs physical relation to its environment. A structure that protrudes above the surrounding terrain is more likely to be struck by lightning than a structure shielded by natural features. Structures located along the top of mountains, ridges, or hills will be likely targets for lightning strikes. These locations should be avoided as much as possible. It is preferable to locate structures along mountain sides where the top of the structure does not appear higher than the top of the mountain or ridge. Locating lines in the ßoor of a narrow valley may provide the line with useful lightning protection.Detailed design of lines should consider the effects of different routes on structure height, soil, or overburden depth and the adjacent environment.4.3 Structure heightThe Þrst factor of a line route that affects lightning performance is structure height, especially if its towers are higher than the surrounding terrain. Increasing the tower height has two important effects: more ßashes are collected by the taller structure, and the shielding characteristics of overhead conductors change as height is increased, as explained in Clause 5.The ßash collection rate, N s , is given by the following equation [B18]:(1)wherehis the tower height (m); bis the OHGW separation distance (m); N gis the GFD (ßashes/km 2 /yr); N s is the ßashes/100 km/yr.From Equation (l), if the tower height is increased by 20%, the ßash rate to the line would increase by 12%.If a measured value of N g is not available, it may be estimated [B18], [B29] using(2)(3)whereT dis the number of thunderstorm days/yr (Keraunic Level);N S N g 28h 0.6b +10-----------------------èøæö=N g 0.04T d 1.25=N g 0.04T h1.1=IEEEStd 1243-1997IEEE GUIDE FOR IMPROVING THE T h is the number of thunderstorm hours/yr.Data for T d and T h may be obtained from several meteorological sources, (e.g., [B11], [B29]).4.4 Soil resistivityThe second factor of a line route that affects lightning performance is soil resistivity. Resistivity has a linear relationship to footing impedance, which is explained in Clause 7. Substantial voltages are generated on grounded members of the structure when either the OHGW or the structure are struck by lightning. High structure footing impedances cause increased voltages and more lightning outages for a given lightning exposure.A complete line design will specify the types and sizes of ground electrodes needed to achieve the required footing impedance. The electrode sizes and shapes will depend on the range of soil conductivities found on installation. In some geographic areas, surveys of apparent ground resistivity have been carried out for radio-frequency broadcast [B25] or geological purposes. In particular, airborne electromagnetic surveys [B36] at 10Ð50 kHz return a suitable conductivity-depth product which may be analyzed further or used directly for tower spotting.High footing impedances occur in rocky terrain, which should be avoided as much as possible. When rocky terrain may not be avoided, improved grounding methods should be used to lower footing impedances to acceptable values. These improved methods usually require large-ring or radial-crowfoot installation at con-siderable cost. Rocky terrain usually occurs on mountain tops and mountain sides, while river lowlands tend to have low soil resistivity. This is a second reason why transmission-line routes away from hill crests will tend to have better lightning performance.4.5 Adjacent environmentOne way to prevent structures from being a target for lightning is to take advantage of surrounding forestation. Tall trees located near the transmission line may intercept a lightning stroke which may have caused an outage if the tree had not been there. Most transmission lines have sufÞcient impulse strength to be immune to the resulting induced voltages. Reference [B31] provides additional information about the effects of forestation. When possible, lines may be routed through forestation and tall trees may be left in place near the line. Trees that may fall on the line or reduce operating clearances will require more frequent trimming, and forest Þres may degrade the line protection locally. However, the reduction in apparent line height and corresponding reduction in lightning outages may still be worthwhile.Tall structures located in ßat, open Þelds make excellent targets for lightning. In these conditions, the struc-ture height should be minimized, and the structure footing impedance should be reduced much as possible. Another way to use the surrounding environment to shield a transmission line from lightning is to route the line next to existing transmission line structures. Experience has shown that a line sharing right-of-way with another line having taller structures will have fewer lightning outages than if it were on a separate right-of-way. Two lines of identical design and on adjacent rights-of-way share lightning strikes resulting in lower than normal outage rates for both lines. This improvement should be balanced with the greater risk of multi-ple line outages.5. ShieldingWhen lightning strikes a phase conductor, no other object shares in carrying the lightning current. Most ßashes to an unprotected phase conductor are therefore capable of producing ßashovers. OHGWs may inter-cept the stroke and shunt the current to the ground through the tower impedance and footing resistance if they are properly located. The resultant voltages across the transmission-line insulation, and the likelihood。
The Differences and Solutions of Male and Female Students’English Learning【abstract】 in china, female students always perform better than male students in english learning. this thesis seeks to tackle the differences between male and female students,aiming to uncover the causes of the differences and explore the solutions for male and female students’ english learning.【关键词】 male and female students english learning differences solutionsin china, female students always have better performance than male students in english learning. what caused the differences between male and female students in english learning? and what can we do to deal with the differences? ⅰ. differences between male and female students1.1 biological differences between male and female students first, scientific research has found that the left brain dominates language and logical analysis, right brain dominates imaginable thinking and intuition. females’ left brain function better than males’, it specializes earlier and develops in a higher level. these make the female students’ language ability is superior to male students, andit makes female students be likely to pay attention to details. males’ right brain is superior to females’, so male students are good at dealing with spatial relationships of information and analyzing.1.2 psychological differences between male and female students1.2.1 intelligence differencethe male and the female are different in such areas as perception, memory and thinking. the female is good at listening and speech perception, while the male is good at vision and spatial perception. in addition, according to the characteristics of people in the process of perception, consciousness can be divided into three types: analytical, integrated and analytical-integrated analysis. it’s observed that female students lean to the analytical analysis, and male students lean to the integrated analysis. on the proportion of analytical-integrated analysis, female students’proportion is higher than male students’. generally speaking, female students are good at image memory, emotion and motor memory, male students are relatively good at logic memory. on the memory method, in general, girls are slightly dominant in unconscious memorizing and mechanical memorizing. inconscious and meaning memorizing, the male students are better than female students.on thinking, studies at home and abroad have the same views, that is female students have imaginable thinking, they are inclined to resolve the problem with images; male students think more logically, they are accustomed to solving problems with abstract thinking. on the quality of thinking, male students are slightly superior to female students. thinking quality includes depth, flexibility, ingenuity and agility. but from the view of english learning, female students’thinking is better than male students’ because female students prefer imaginable thinking which is used to fine processing, and it is cautious and slow, but more accurate. and male students are more impulsive.1.2.2 differences of learning motivationstudies have shown that female students are more interested in english learning, and they have obvious tendency to intrinsic motivation. male students are not very interested in english learning in general, they always urged by parents and teachers. they are more passive in english learning and have stronger external motivation.1.2.3 differences of personalitymales, in general, are confident, brave, independent, open-minded, careless, impatient, progressive, they have more emotional fluctuations. females generally are calm, patient, meticulous, serious, timid, shy, vulnerable and so on. on the personality types, males are mostly extroversive, independent, and strong; most females are introversive, submissive.male students are more willing to show them in the learning process, especially in spoken english, but sometimes they tend to be too impulsive. they think too hurriedly, and the answers are not accurate enough. while female students are not confident enough, they think carefully and their answers are more accurate.1.2.4 differences of cognitive stylefield-independent learners often perceive things with internal references rather than external references. they judg things in their own standard, and they are not very good at getting social information and feel free from the influence of external information. their emotional expressions are confident, self-reliant, brave and combative. they are not good at communicating with others. field-dependent learners are more dependent on external references when perceivethings. they like associate with others and pay more attention to social information. they are good at imitating and easy to be influenced by others, and they cannot differentiate things from the complex situation including a number of elements and components.1.2.5 differences of learning strategieslearning strategies can be divided into: cognitive strategies, metacognitive strategies, emotional strategies. the survey showed that: female students outperform male students in applying cognitive strategies and metacognitive strategies. about the emotional strategy, male students are slightly superior to female students. but sometimes, when facing difficulties that appear in the english learning, female students performance better at relieving their negative emotions, such as using words and so on. male students are typically not good at giving vent to their feelings. the long-term depression and anxiety will deepen their boredom feeling of english learning and even make them fear to english. on the whole, female students are more active than boys on the use of english learning strategies, in order to achieve better learning outcomes, female students use more learning strategies frequently.ⅱ. solutionsdue to physiological factors cannot be changed, here are some solutions that correspond to the psychological aspects.2.1 intelligence differencefemales are good at listening and speech perception, while males are good at vision and spatial perception. female students can listen to english songs, news and other listening materials to improve their english skills. females have a good sense of imitation, so female students should imitate more, and pay attention to the accuracy of and fluency of speech. male students can enhance english learning effects by using pictures, charts, videos, and actions, they need to listen and read more to enhance their english language sense. both male and female students should make more use ofanalytical-integrated analysis to solve problems in learning english.female students can use more conscious memory while using their mechanical memory to improve efficiency. male students need to make good use of the conscious memory and review frequently. in terms of thinking, male students should think comprehensively and carefully in english learning process. they should start from details to the whole when they arethinking, and they should pay more attention to the accuracy. female students should limit their thinking time, facilitating judgment by using intuition, improving the problem-solving speed and comprehensive problem-solving ability.2.2 learning motivationmale students are more easy to be influenced by extrinsic motivation, and they are more vulnerable to external incentives than female students. therefore male students should enhance their english learning purpose and stimulate their achievement motivation. teachers and parents should encourage them. and achieving success, is one of the best incentives. in addition, the female students should try to increase their interest in english through a variety of methods, and they should learn from excellent english learners in order to achieve greater success.2.3 differences of personalityas males are more impulsive, therefore they should keep calm when solve problems, and they should consider more deliberately before coming to the conclusion or solution to a problem. female students should be more confident without fear of errors.2.4 differences of cognitive styleas the male students tend to the field independent cognitive style in general, they are more rational than female students. when encountering problems, they prefer to solve them independently. so their interpersonal skills, discourse skills are weaker than femal students. so male students need to participate in more cooperative activities in english learning and communicate more with others. female students should overcome crowd and be more independent as well as be more active in the competition.2.5 differences of learning strategiesmale students need to improve their metacognitive strategies, they should plan, supervise and assess their own learning, reinforcing learning purpose and becoming initiative in english leraning. they should learn to groom their bad mood when encounter difficulties in the learning, for example, they can talk with others or ask for help. they should constantly encourage themselves to overcome difficulties and learn to use various learning strategies. female students should learn how to optimize their learning strategies to improve learning efficiency and use more emotional learning strategies, such as communication andco-operation.2.6 both male and female students should know how to use their strengths as well as avoid their weaknesses and learn from each other to make progressin practical english learning, everyone’s strengths and weaknesses vary from each other. we should recognize and respect the gender differences in english learning. at the same time, we should be aware of our special problems and adapt the appropriate solutions. by doing these, everyone can become a successful english learner.【references】[1] zhou chunfeng. research on gender differences and strategies of high school students ’ english learning[c]. master’s degree thesis, east china normal university, 2005.[2] wei li. comparative study of english learning differences between male and female students[m]. heilongjiang science and technology information, 2008。
Part Ⅱ Proofreading and Error Correction (15 min)The following passage contains TEN errors. Each line contains a maximum of ONE error. In each case, only ONE word is involved. You should proofread the passage and correct it in the following way.For a wrong word, underline the wrong word and write the correct one in the blank provided at the end of the line.For a missing word, mark the position of the missing word with a “∧” sign and write the word you believe to be missing in the blank provided at the end of the line.For an unnecessary word cross out the unnecessary word with a slash “/’ and put the word in the blank provided at the end of the line.ExampleWhen∧art museum wants a new exhibit, (1) anit never/buys things in finished form and hangs (2) neverthem on the wall. When a natural history museumwants an exhibition, it must often build it. (3) exhibitThe hunter-gatherer tribes that today live as our prehistoric 1.______ human ancestors consume primarily a vegetable diet supplementing 2._____with animal foods. An analysis of 58 societies of modem hunter-gatherers, including the Kung of southern Africa, revealed that onehalf emphasize gathering plant foods, one-third concentrate on fishingand only one-sixth are primarily hunters. Overall, two-thirdsand more of the hunter-gatherer’s calories come from plants. Detailed 3.______ studies of the Kung by the food scientists at the University ofLondon, showed that gathering is a more productive source of foodthan is hunting. An hour of hunting yields in average about 100 4.______ edible calories, as an hour of gathering produces 240. 5.______ Plant foods provide for 60 percent to 80 percent of the Kung 6._______ diet, and no one goes hungry when the hunt fails. Interestingly, ifthey escape fatal infections or accidents, these contemporaryaborigines live to old ages despite of the absence of medical care. 7._______ They experience no obesity, no middle-aged spread, little dentaldecay, no high blood pressure, on heart disease, and their bloodcholesterol levels are very low( about half of the average American 8._______ adult), if no one is suggesting what we return to an aboriginal life 9.________ style, we certainly could use their eating habits as a model for 10.________ healthier diet.The grammatical words which play so large a part in Englishgrammar are for the most part sharply and obviously different 1._______ from the lexical words. A rough and ready difference which mayseem the most obvious is that grammatical words have“ lessmeaning”, but in fact some grammarians have called them 2._______“empty” words as opposed in the “full” words of vocabulary. 3.________ But this is a rather misled way of expressing the distinction. 4._________ Although a word like the is not the name of something as man is,it is very far away from being meaningless; there is a sharp 5._________ difference in meaning between “man is vile and” “the man isvile”, yet the is the single vehicle of this difference in meaning. 6.________ Moreover, grammatical words differ considerably amongthemselves as the amount of meaning they have, even in the 7.________ lexical sense. Another name for the grammatical words has been“little words”. But size is by no mean a good criterion for 8._________ distinguishing the grammatical words of English, when weconsider that we have lexical words as go, man, say, car. Apart 9.________ from this, however, there is a good deal of truth in what somepeople say: we certainly do create a great number of obscurity 10.________ when we omit them. This is illustrated not only in the poetry ofRobert Browning but in the prose of telegrams and newspaper headlines.During the early years of this century, wheat was seen as thevery lifeblood of Western Canada. People on city streets watchedthe yields and the price of wheat in almost as much feeling as if 1._______ they were growers. The marketing of wheat became an increasing 2._______ favorite topic of conversation.War set the stage for the most dramatic events in marketingthe western crop. For years, farmers mistrusted speculative grainselling as carried on through the Winnipeg Grain Exchange.Wheat prices were generally low in the autumn, so farmers could 3._______ not wait for markets to improve. It had happened too often thatthey sold their wheat soon shortly after harvest when farm debts 4.________ were coming due, just to see prices rising and speculators getting rich. 5._______ On various occasions, producer groups, asked firmer control, 6._______ but the government had no wish to become involving, at 7.______ least not until wartime when wheat prices threatened to runwild.Anxious to check inflation and rising life costs, the federal 8.______ government appointed a board of grain supervisors to deal withdeliveries from the crops of 1917 and 1918. Grain Exchangetrading was suspended, and farmers sold at prices fixed by theboard. To handle with the crop of 1919, the government appointed 9.______ the first Canadian Wheat Board, with total authority to 10.______ buy, sell, and set prices.There are great impediments to the general use of a standardin pronunciation comparable to that existing in spelling (orthography).One is the fact that pronunciation is learnt “naturally”and unconsciously, and orthography is learnt 1__________ deliberately and consciously. Large numbers of us, in fact,remain throughout our lives quite unconscious with what our speech 2.__________ sounds like when we speak out, and it often comes as a shock 3.__________ when we firstly hear a recording of ourselves. It is not a voice we 4._________ recognize at once, whereas our own handwriting is somethingwhich we almost always know. We begin the natural learning 5.__________ of pronunciation long before we start learning to read or write,and in our early years we went on unconsciously imitating and 6.__________ practicing the pronunciation of those around us for many morehours per every day than we ever have to spend learning even our 7.___________ difficult English spelling. This is “natural”, therefore, that our 8.__________ speech-sounds should be those of our immediate circle; after all,as we have seen, speech operates as a means of holding a community 9.__________ and giving a sense of 'belonging'. We learn quite early torecognize a “stranger”, someone who speaks with anaccent of a different community-perhaps only a few miles far. 10.__________ 2003改错Demographic indicators show that Americans in the postwarperiod were more eager than ever to establish families. They quicklybrought down the age at marriage for both men and women and broughtthe birth rate to a twentieth century height after more than a hundred (1)______ years of a steady decline, producing the “baby boom.” These young(2)_______ adults established a trend of early marriage and relatively largefamilies that Went for more than two decades and caused a major (3)_______ but temporary reversal of long-term demographic patterns. Fromthe 1940S through the early 1960s, Americans married at a high rate (4)________ and at a younger age than their Europe counterparts. (5)________ Less noted but equally more significant, the men and women on who (6)________ formed families between 1940 and 1960 nevertheless reduced the (7)________ divorce rate after a postwar peak; their marriages remained intact toa greater extent than did that of couples who married in earlier as well (8)________ as later decades. Since the United States maintained its dubious (9)_________ distinction of having the highest divorce rate in the world, thetemporary decline in divorce did not occur in the same extent in (10)_________ Europe. Contrary to fears of the experts, the role of breadwinner andhomemaker was not abandoned.2004改错One of the most important non-legislative functions of the U.S Congressis the power to investigate. This power is usually delegated to committees - either standing committees, special committees set for a specific (1)________ purpose, or joint committees consisted of members of both houses. (2)________ Investigations are held to gather information on the need forfuture legislation, to test the effectiveness of laws already passed,to inquire into the qualifications and performance of members andofficials of the other branches, and in rare occasions, to lay the (3)________ groundwork for impeachment proceedings. Frequently, committeesrely outside experts to assist in conducting investigative hearings (4)_________ and to make out detailed studies of issues. (5)_________ There are important corollaries to the investigative power. Oneis the power to publicize investigations and its results. Most (6)_________ committee hearings are open to public and are reported (7)__________ widely in the mass media. Congressional investigationsnevertheless represent one important tool available to lawmakers (8)__________ to inform the citizenry and to arouse public interests in national issues.(9)________ Congressional committees also have the power to compeltestimony from unwilling witnesses, and to cite for contemptof Congress witnesses who refuse to testify and for perjurythese who give false testimony. (10)_________ 2005改错The University as BusinessA number of colleges and universities have announced steeptuition increases for next year much steeper than the current,very low, rate of inflation. They say the increases are needed becauseof a loss in value of university endowments heavily investing in common 1 stock. I am skeptical. A business firm chooses the price that maximizesits net revenues, irrespective fluctuations in income; and increasingly the 2 outlook of universities in the United States is indistinguishable from those of 3 business firms. The rise in tuitions may reflect the fact economic uncertainty 4 increases the demand for education. The biggest cost of beingin the school is foregoing income from a job (this is primarily a factor in 5 graduate and professional-school tuition); the poor one's job prospects, 6 the more sense it makes to reallocate time from the job market to education,in order to make oneself more marketable.The ways which universities make themselves attractive to students 7 include soft majors, student evaluations of teachers, giving studentsa governance role, and eliminate required courses. 8 Sky-high tuitions have caused universities to regard their students as customers. Just as business firms sometimes collude to shorten the 9 rigors of competition, universities collude to minimize the cost to them of the athletes whom they recruit in order to stimulate alumni donations, so the best athletes now often bypass higher education in order to obtain salaries earlierfrom professional teams. And until they were stopped by the antitrust authorities, the Ivy League schools colluded to limit competition for the best students, by agreeing not to award scholarships on the basis of merit rather than purelyof need-just like business firms agreeing not to give discounts on their best 10 customer.2006改错We use language primarily as a means of communication withother human beings. Each of us shares with the community in which welive a store of words and meanings as well as agreeing conventions as 1_______ to the way in which words should be arranged to convey a particular 2______ message: the English speaker has in his disposal vocabulary and a3_______ set of grammatical rules which enables him to communicate his4______ thoughts and feelings, in a variety of styles, to the other English 5_______ speakers. His vocabulary, in particular, both that which he uses activelyand that which he recognizes, increases in size as he growsold as a result of education and experience. 6______ But, whether the language store is relatively small or large, the systemremains no more, than a psychological reality for tike inpidual, unlesshe has a means of expressing it in terms able to be seen by another 7_______ member of his linguistic community; he bas to give tile system aconcrete transmission form. We take it for granted rice’ two m ost8_______ common forms of transmission-by means of sounds produced by ourvocal organs (speech) or by visual signs (writing). And these are 9___ ___ among most striking of human achievements. 10_______2007改错From what has been said, it must be clear that no one canmake very positive statements about how language originated.There is no material in any language today and in the earliest 1records of ancient languages show us language in a new and 2emerging state. It is often said, of course, that the language 3 ___ originated in cries of anger, fear, pain and pleasure, and the 4 necessary evidence is entirely lacking: there are no remotetribes, no ancient records, providing evidence ofa language with a large proportion of such cries 5than we find in English. It is true that the absenceof such evidence does not disprove the theory, but in 6other grounds too the theory is not very attractive.People of all races and languages make rather similarnoises in return to pain or pleasure. The fact that7such noises are similar on the lips of Frenchmenand Malaysians whose languages are utterly different,serves to emphasize on the fundamental difference8__________ between these noises and language proper. We maysay that the cries of pain or chortles of amusementare largely reflex actions, instinctive to large extent, 9whereas language proper does not consist of signsbut of these that have to be learnt and that are10__________ wholly conventional.2008年改错The desire to use language as a sign of national identityis a very natural one,and in result language has played a 1__________ prominent part in national moves.Men have often felt the need 2__________ to cultivate a given language to show that they are distinctive 3____________ from another race.whose hegemony they resent.At the time the 4.___________ United States split off from Britain,for example,therewere proposals that independence should be linguistically accepted by 5._________ the use of a different language from those of Britain. 6.__________ There was even one proposal that Americans should adopt Hebrew.Others favoured the adoption of Greek,though,as one man put it,things would certainly be simpler for Americans if they stuck on to 7.___________ English and made the British learn Greek.At the end,as everyone 8.___________ knows,the two countries adopted the practical and satisfactorysolution of carrying with the same language as before.Sincenearly two hundred years now,they have shown the 9.____________ world that political independence and national identity can be 10.___________ complete without sacrificing the enormous mutual advantages of a common language.2009年改错The previous section has shown how quickly a rhyme passesfrom one school child to the next and illustrates the further difference (1)__ ___ between shcool lore and nursery lore. In nursery lore a verse, learntin early childhood, is not usually passed on again when the little listener (2)__ ___ has grown up, and has children of their own, or even grandchildren. (3)___ __ The period between learning a nursery rhyme and transmittingIt may be something from twenty to seventy years. With the playground (4)__ ___ lore, therefore, a rhyme may be excitedly passed on whtin the very hour (5)__ ___it is learnt; and in the general, it passes between children of the (6)___ __ same age, or nearly so, since it is uncommon for the difference in agebetween playmates to be more than five years. If therefore, a playgroundrhyme can be shown to have been currently for a hundred years, or (7)___ __ even just for fifty, it follows that it has been retransmitting overand over; very possibly it has passed along a chain of two or three (8)__ ___ hundred young hearers and tellers, and the wonder is that it remains live (9)___ __ after so much handling, to let alone that it bears resemblance to the (10)__ __ original wording.2012PART IV PROOFREADING & ERROR CORRECTION (15 MIN)The passage contains TEN errors.Each indicated line contains a maximum of ONE error.In each case, only ONE word is involved.You should proof-read the passage and correct it in the following way:For a wrong word, underline the wrong word and write the correct one in the blank provided at the end of the line.For a missing word, mark the position of the missing word with a "L" sign and write the word you believe to be missing in the blank provided at the end of the line.For an unnecessary word, cross the unnecessary word with a slash "/" and put the word in the blank provided at the end of the line.EXAMPLEWhen A art museum wants a new exhibit, (1) anit never buys things in finished form and hangs (2) neverthem on the wall.When a natural history museumwants an exhibition, it must often build it.(3) exhibitProofread the given passage on ANSWER SHEET TWO as instructed.The central problem of translating has always been whether to translate literally or freely.The argument has been going since at least the first (1) ______ century B.C.Up to the beginning of the 19th century, many writersfavoured certain kind of “free” translation: the sp irit, not the letter; the (2) _______ sense not the word; the message rather the form; the matter not (3) _______ the manner.This is the often revolutionary slogan of writers who (4) _______ wanted the truth to be read and understood.Then in the turn of 19th (5) _______ century, when the study of cultural anthropology suggested thatthe linguistic barriers were insuperable and that the language (6) _______ was entirely the product of culture, the view translation was impossible (7) _______ gained some currency, and with it that, if was attempted at all, it must be as (8) _____ literal as possible.This view culminated the statement of the (9) _______ extreme “literalists” Walter Benjamin and Vladimir Nobokov.The argument was theoretical: the purpose of the translation, thenature of the readership, the type of the text, was not discussed.Toooften, writer, translator and reader were implicitly identified witheach other.Now, the context has changed, and the basic problem remains.(10)_____。
Performance AttributionAnalysisPrefacePerformance attribution interprets how investors achieve their performance and measures the sources of value added to a portfolio. This guide describes how returns, relative to a benchmark, are broken down into attribution effects to determine how investors achieve performance and measure the sources of value added to a portfolio.About Performance AttributionFor investment managers to evaluate their job performance, they need to know how they achieved their performance results. In particular, they need to know whether their success is the result of their ability to effectively allocate their portfolio’s assets to various segments, their ability to effectively select securities within a given segment or the combined effect of their selection and allocation within a segment. Performance attribution interprets how investors achieve their performance and measures the sources of value added to a portfolio. To determine success, investors establish a benchmark, which they seek to outperform. Value added is the amount the return achieves in excess of the benchmark.Different Attribution MethodsThere are generally considered to be three basic forms of attribution. These include multi-factor analysis, style analysis and return decomposition analysis. The highlights of each are as follows: Multi-Factor Analysis· Attributes performance to factors such as P/E ratio, economy, bond durations, etc· Used by academics and sophisticated firms· More difficult to calculate· Requires a great deal of data (economic/fundamental)· Difficult to explain· Not widely accepted in industryStyle Analysis· Developed by noble laureate William Sharpe (Sharpe ratio)· Uses portfolio rates of return to determine investment style· Easy to calculate (requires benchmark and portfolio returns)· Easy to explain (uses regression analysis)· Not widely accepted in industryReturn Decomposition Analysis· Attributes performance vs. benchmarks· Can focus on allocation (top/down approach) or selection (bottom up approach)· Easy to calculate ( requires benchmark and portfolio returns and weights)· Easy to understand and explain· Used by large portion of investment community· Widely accepted in industryAs the return decomposition analysis is most widely used and accepted, this is the model we will examine.About Attribution EffectsIn a return decomposition analysis model, value added to a portfolio’s return is commonly referred to as the active management effect. The active management effect is the difference between the total portfolio return and total benchmark return. It is also the sum of the following investment decisions or effects:• Allocation• Selection• InteractionDefining the Allocation EffectThe allocation effect measures an investment manager’s ability to effectively allocate their portfolio’s assets to various segments. A segment refers to assets or securities that are grouped within a certain classification such as Equity, Fixed, or Technology. The allocation effect determines whether the overweighting or underweighting of segments relative to a benchmark contributes positively or negatively to the overall portfolio return. Positive allocation occurs when the portfolio is overweighted in a segment that outperforms the benchmark and underweighted in a segment that underperforms the benchmark. Negative allocation occurs when the portfolio is overweighted in a segment that underperforms the benchmark and underweighted in a segment that outperforms the benchmark.Scenario 1A positive allocation effect occurred because the portfolio weight was greater than the benchmark weight and the benchmark return was greater than the total benchmark return. The investment manager overallocated assets to a segment that outperformed the total benchmark.Scenario 2A negative allocation effect occurred because the portfolio weight was greater than the benchmark weight and the benchmark return was less than the total benchmark return. The investment manager overallocated assets to a segment that underperformed the total benchmark. Scenario 3A negative allocation effect occurred because the portfolio weight was less than the benchmark weight and the benchmark return was greater than the total benchmark return. The investment manager underallocated assets to a segment that outperformed the total benchmark. Scenario 4A positive allocation effect occurred because the portfolio weight was less than the benchmark weight and the benchmark return was less than the total benchmark return. The investment manager underallocated assets to a segment that underperformed the benchmark. Calculating the Allocation EffectThe following calculation is used to calculate a portfolio’s allocation effect.The following example shows how return decomposition analysis calculates the portfolio’s allocation effect.Example:Benchmark = S&P 500Benchmark segment = S&P 500 TechnologyBenchmark return = 9%Benchmark weight = 7%Portfolio technology return = 8%Portfolio technology weight = 15%Total benchmark return = 7%The allocation effect can be expressed in percentages or basis points (bps) as follows:The allocation effect in this example is positive because the manager overweighted the Portfolio Technology segment, which performed better than the total benchmark for the portfolio.Defining the Selection EffectThe selection effect measures the investment manager’s ability to select securities within a given segment relative to a benchmark. The over or underperformance of the portfolio is weighted by the benchmark weight, therefore, selection is not affected by the manager’s allocation to the segment. The weight of the segment in the portfolio determines the size of the effect—the larger the segment, the larger the effect is, positive or negative.The table that follows displays two possible scenarios for the selection effect:Selection Effects TableScenario 1A positive selection effect occurred because the portfolio return was greater than the benchmark return. The investment manager made good decisions in selecting securities that, as a whole, outperformed similar securities in the benchmark.Scenario 2A negative selection effect occurred because the portfolio return was less than the benchmark return. The investment manager made poor decisions in selecting securities that, as a whole, underperformed similar securities in the benchmark.Calculating the Selection EffectThe return decomposition model uses the following calculation to calculate a portfolio’s selection effect:The following example shows how a portfolio’s selection effect is calculated.Example:Benchmark = S&P 500Benchmark segment = S&P 500 TechnologyBenchmark return = 9%Benchmark weight = 7%Portfolio technology return = 8%Portfolio technology weight = 15%Total benchmark return = 7%The selection effect can be expressed in percentages or basis points (bps) as follows:There is a negative selection effect in this example because the manager selected securities that did not perform as well as the securities in the benchmark for the same segment.Defining the Interaction EffectThe interaction effect measures the combined impact of an investment manager’s selection and allocation decisions within a segment. For example, if an investment manager had superior selection and overweighted that particular segment, the interaction effect is positive. If an investment manager had superior selection, but underweighted that segment, the interaction effect is negative. In this case, the investment manager did not take advantage of the superior selection by allocating more assets to that segment.Since many investment managers consider the interaction effect to be part of the selection or the allocation, it is often combined with the either effect.The table that follows displays four possible scenarios for the interaction effect:Interaction Effects TableScenario 1A positive interaction effect occurred because the portfolio weight was greater than the benchmark weight and the portfolio return was greater than the benchmark return. The investment manager exercised good selection and overallocated assets to that segment.Scenario 2A negative interaction effect occurred because the portfolio weight was greater than the benchmark weight and the portfolio return was less than the benchmark return. While the investment manager overweighted the portfolio securities for a given segment, that segment underperformed against the benchmark return for the same segment.Scenario 3A negative interaction effect occurred because the portfolio’s weight was less than the benchmark weight and the portfolio’s return was greater than the benchmark return. The investment manager underweighted the segment with good selection. The manager exercised good selection but poor allocation.Scenario 4A positive interaction effect occurred because the portfolio weight was less than the benchmark weight and the portfolio return was less than the benchmark return. The impact of the joint effects is positive because the manager’s decision to underweight a poor performing segment was a good decision.Calculating the Interaction EffectThe Performance Attribution module uses the following calculation to calculate a portfolio’s interaction effect:The following example shows how the return decomposition model calculates a portfolio’s interaction effect.Example:Benchmark = S&P 500Benchmark segment = S&P 500 TechnologyBenchmark return = 9%Benchmark weight = 7%Portfolio technology return = 8%Portfolio technology weight = 15%Total benchmark return = 7%The interaction effect can be expressed in percentages or basis points (bps) as follows:There is a negative interaction effect in this example because the manager overallocated securities for a segment that performed poorly relative to the benchmark.Defining the Active Management EffectThe active management effect is the sum of the selection, allocation, and interaction effects. It is also the difference between the total portfolio return and the total benchmark return. You can use the active management effect to determine the amount the investment manager has added to a portfolio’s return. If the active management effect is positive, the investment manager has contributed positively to the portfolio’s return. If the active management effect is negative, the investment manager has not contributed positively to the portfolio’s return.The return decomposition model uses the following calculation to calculate the active management effect:。
performance evaluation methodsPerformance evaluation is one of the most critical aspects of any organization. It refers to the process of assessing an employee's work performance against predetermined standards of performance. The primary objective of performance evaluation is to identify the strengths and weaknesses of an employee and provide feedback to help them improve their work.There are several methods of performance evaluation used in organizations. This article will discuss some of these methods in detail.1. Self-EvaluationSelf-evaluation is a method of performance evaluation in which employees assess their performance against predetermined goals and standards. This method involves employees reflecting on their work, identifying their strengths and weaknesses, and documenting their accomplishments.Self-evaluation can be a valuable tool, as it allows employees to take ownership of their performance, encourages self-reflection, and can improve communication between employee and employer.2. Peer EvaluationPeer evaluation is a method of performance evaluation in which an employee's performance is assessed by their colleagues. This method involves colleagues providing feedback on the employee's work, interpersonal skills, and overall contribution to the team.Peer evaluation can be a useful tool, as it provides employees with valuable feedback from their colleagues, encourages teamwork and collaboration, and can help identify areas for improvement.3. Manager EvaluationManager evaluation is a method of performance evaluation in which an employee's performance is assessed by their manager. This method involves the manager providing feedback on the employee's work performance, as well as their contribution to the organization.Manager evaluation can be effective if the manager has a good understanding of the employee's work and can provide constructive feedback. However, this method can also be biased, as managers may be influenced by personal biases or factors outside of work.4. 360-Degree Feedback360-degree feedback is a method of performance evaluation in which an employee's performance is assessed by multiple sources, including colleagues, managers, and customers. This method involves gathering feedback from a variety of perspectives to provide a comprehensive view of theemployee's performance.360-degree feedback can be an effective tool, as it provides a more complete picture of an employee's performance, encourages collaboration and teamwork, and can help identify areas for improvement.In conclusion, there are several methods of performance evaluation used in organizations. Each method has itsstrengths and weaknesses, and the most effective method will depend on the objectives of the evaluation and the culture of the organization. Employers should consider using a varietyof methods to provide a comprehensive view of an employee's performance and encourage ongoing development and improvement.。
Low Temperature Viscosity Measurements -Lovis for Battery ElectrolytesRelevant for: battery industry, electrochemical research, automotive industryPerform viscosity measurements down to -20 °C with Lovis 2000 M/ME with cooling option. Test even highly corrosive solvents for ion salts by using unbreakable PCTFE capillaries with small filling volumes (110 µL or 450 µL). Handling of the sample inside a glove box filled with inert gas and a hermetically closed system prevent contamination or evaporation of the sample.1 IntroductionSince the introduction of the lithium-ion batteries in 1990, the interest in this technology has emerged steadily, not only for portable devices but also for the automotive industry. Their high energy density as well as outstanding cycle stability are the main reasons for commercial success, but several problems arise with the usage of the most common non-aqueous electrolytes, which contain lithiumhexafluoro-phosphate (LiPF6) as conductive salt and a mixture of cyclic and non-cyclic organic carbonates.In addition to the high purity required of all used solvents (e.g. traces of protic impurities such as water can cause severe deterioration of the cell performance after a short life / cycle time) the cell performance has to be stable over a broad temperature range from arctic to tropical conditions without any significant degradation. Therefore, an exact characterization of newly developed electrolytes at different temperatures is an essential part in the lithium-ion cell research today. These challenges have to be considered for every other upcoming battery systems like magnesium ion cells or sulfur cells, too.Therefore, research companies use different standard electrochemical measurements for monitoring batteries. In this connection viscosity, conductivity and – if required – density measurements of the electrolytes support those investigations.The performance of the charge and discharge rate of a rechargeable battery, that is the ion transport, is characterized by the ion conductivity, which depends on the viscosity and the dielectric constant.The viscosity of the solvent, in which the ion salt is solved, affects the mobility of ions, as shown in the Stokes-Einstein equation; mobility is inversely proportional to the viscosity:r ... radius of the solvated ionBased on those viscosity measurements important conclusions on the wettability of the electrode /electro-lyte interface can be drawn, too. Fast, accurate and reproducible viscosity measurement over a wide temperature range is highly desirable for successful development of new electrolyte systems.This application report shows how the Lovis can be used for electrolyte measurements even at tempera-tures below zero. The Lovis, equipped with coolingoption and in combination with the capillary made of PCTFE, enables measurement of highly corrosive substances over a wide temperature range.2 Instrumentation2.1Lovis 2000 M/ME Microviscometer with Cooling OptionFigure 1: Lovis 2000 M with cooling optionThe Lovis 2000 M/ME Microviscometer measures the rolling time of a ball inside an inclined capillary.Variable inclination angles allow for measurements at different shear rates. Temperature control via Peltier elements is extremely fast and provides utmost accuracy.For measuring at temperatures below zero, the Lovis ME Module can be equipped with a lowtemperature option. In combination with a recirculating cooler, it is possible to measure at temperatures as low as -20 °C (lower temperatures down to -30 °C on request, depending on the cooling liquid of the recirculating cooling, ambient temperature and ambient air humidity).The integrated software calculates the kinematic or dynamic viscosity, provided the sample's density value is known.Figure 2: Lovis PCTFE capillariesWith the PCTFE capillaries it is possible to measure nearly every liquid, also corrosive, aggressive or hazardous solvents and electrolytes.The measuring viscosity of a PCTFE capillary ranges from 0.8 mPa.s to 160 mPa.s.Used material:▪ Capillary: PCTFE short (110 µL) ▪ Capillary diameter: 1.62mm ▪ Ball material: Steel ▪ Ball diameter: 1.5 mm2.3 Additional Equipment▪ Glove box filled with argon.▪Circulation cooler plus insulated hoses. How to set up the cooling is precisely described in the documentation of Lovis 2000 M/ME.3MeasurementAll determinations were performed manually without autosampler. The viscosity measurements were performed in a temperature range from -20 °C to +60 °C with steps of 5 °C or 10 °. For temperature table scans (TTS) two density values at two different reference temperatures were typed in manually in the "Quick Settings" ("Lovis Density TS/TTS") for every sample. The instrument automatically extrapolated the missing temperature / density values by linearextrapolation. The density values for the manual input were determined with the SVM™.Every scan was performed twice in order to obtain a repeat determination. To check the reproducibility, all measurements were performed with Lovis and SVM™ in parallel.3.1 Samples▪Different mixtures of organic carbonates, which contain lithiumhexafluorophosphate as conductive salt – for lithium ion batteries (LIB), either commercial available standardelectrolytes or newly developed electrolyte solutions.▪Solvents containing a polar organic solvent and dioxolane added with Li-sulfur-compounds as conductive salts – for future Li-S-cell systems (LiS).▪Solvents containing a polar organic solvent plus Mg-compounds as conductive salts – for prospective Mg-ion batteries.3.2 Instrument Settings Measuring Method: Temperature Table Scan (TTS) Measuring Settings:▪ Temperature: scan between -20 °C to +60 °C ▪ Equilibration Time: no ▪ Measurement Cycles: 3▪ Measuring Angle: Auto Angle * ▪ Variation Coefficient:0.4 % for standard electrolytes ▪ Measuring Distance: Short* Adjustment was performed over an angle range from of 20° to 70° in 10° steps3.3 Filling of the CapillaryAll samples were manually filled in an argon glove box under inert conditions. For each measurement a new steel ball was used to avoid any cross contamination from one measurement to the other. After closing the capillary with the appropriate plug, the hermetically sealed capillary was removed from the glove box.3.4 CleaningThe capillary was cleaned thoroughly with smallbrushes after every test sequence. Ethanol, deionized water and other appropriate solvents were used as cleaning liquids. If necessary, the capillary was placed into an ultrasonic bath (approximately 10 to 20 min, 30 °C, water plus standard detergent). Afterwards the capillary was dried under a pressure-less nitrogen stream.4 ResultsFigure 3: Reproducibility check; standard Li-ion electrolyte V24 measured with Lovis and SVM ™ from +20 °C to -20 °C4.4Temperature Profile of Li-polysulfide4.5Checking the Influence of Conducting Salt5ConclusionBy using the Lovis 2000 M/ME equipped with cooling option, it is possible to perform measurements from -20 °C up to +100 °C. In combination with the capillary made of PCTFE even extremely corrosive substances can be measured under hermetically sealed atmosphere. This allows users to measure theviscosity of electrolytes, which might be destroyed or changed in structure by air and/or air humidity.▪ The small capillary sizes require only littlesample volume (starting from 110 µL). ▪ The small diameter of the PCTFE capillary(1.62 mm) enables also the measurement of very low-viscosity samples (viscosity range from 0.8 mPa.s to 160 mPa.s).▪ The cooling option allows for viscositymeasurements down to -20 °C (lowertemperatures down to -30 °C are possible on request and depending on ambient conditions).▪ The closed system avoids any contaminationand evaporation.▪ The variable inclination angle of themeasurement allows for the variation of the shear rate.▪ Lovis 2000 M/ME is highly modular; it can becombined with DMA™ M Density Meters for automated calculation of dynamic andkinematic viscosity. It can also be combined with an Xsample™ sample changer (see Figure 8) for automatic filling and cleaning of the capillary and measurements with high sample throughput.6ReferencesSpecial thanks to DI Gisela Fauler and Ms. Katja Kapper from VARTA Micro Innovation GmbH who tested the Lovis with cooling option and the PCTFE capillaries and supported Anton Paar with their measurement data.Contact Anton Paar GmbH Tel: +43 316 257-0****************************|。
Performance Analysis of a Real-Time JavaExecution Environment for IEC 61499Kleanthis C. Thramboulidis, George S. Doukas, Alkiviadis ZoupasElectrical and Computer Engineering, University of Patras, 26500, Greece(e-mail: {thrambo, gdoukas}@ upatras.gr, azoupas@upnet.gr).Abstract: The IEC 61499 standard enhances the 1131 Function Block model to exploit the advantages of the object technology in the industrial automation domain. Several prototype development environments have been developed by various research groups and the first commercial tools that support this model are already in the market. However, the absence of mature run-time environments that will allow the execution of IEC 61499 compliant applications is still evident. Even for the existing ones, there is no evidence about their efficiency to meet real-time constraints imposed by this kind of applications. Benchmarking of run-time environment is required to prove that these environments can be considered for the development of real-time applications. In this paper, a benchmarking framework is described and it is used to analyze the performance of an IEC 61499 run-time environment that is based on the real-time Java Specification. The recently released IBM and Sun RTSJ implementations are used to demonstrate the effectiveness of the RTSJ based framework. Performance results prove the applicability of the proposed run-time environment and the model driven approach that was adopted, in the control and automation domain.Keywords: IEC 61499, Function Block, Real-Time Java, RTSJ, run-time environment, performance analysis, benchmark.1. INTRODUCTIONThe Function Block (FB) construct is a well-known and widely used by control engineers. It was first introduced by the IEC 61131 standard on programming languages for programmable logic controllers, and was later extended by the IEC’s 61499 standard (IEC 61499, 2005) to share many of the well defined and already widely acknowledged benefits of object technology. The IEC 61499 describes a methodology that utilizes the FB as the main building block and defines the way that FBs can be used to define robust, re-usable software components that constitute complex distributed control systems (DCSs). Complete control applications, can be defined by one or more FB Networks (FBNs) that specify event and data flow among function block or sub application instances. The event flow determines the scheduling and execution of the operations specified by each function block’s algorithm(s).The majority of control and automation software deals with applications with more or less strict timing constrains. Although IEC 61499 does not provide a way to capture these constraints, the final executable should be deterministic, thus runtime timing behaviour of execution environments should be provided. Moreover, the standard intentionally leaves a lot runtime issues open to be defined later by developers. This results in IEC 61499 runtime implementations that demonstrate various execution behaviours, which may confuse control application engineers. To address the above issues, benchmarking proves to be a very useful tool, enabling better understanding and evaluation of the runtime behavior of IEC 61499 compliant execution environments.In this paper a framework for the benchmarking of IEC 61499 FB applications is described. The FB instance and the FB network are analyzed and specific measurements for the benchmarking are defined. A real-time Java based run-time environment, the RTSJ-AXE, is used to demonstrate the proposed benchmark. This work also proves the efficiency of the RTSJ-AXE run time environment for the real-time domain.The remainder of this paper is organized as follows: Background and related work is presented in the next section. The real-time Java execution environment that is used as basis for this work is described in section 3. Performance analysis and results are presented in section 4, and finally the paper is concluded in section 5.2. BACKGROUND AND RELATED WORKThe IEC 61499 standard enhances the 1131 Function Block model to exploit the benefits of the object technology in the industrial automation domain. It also proposes a methodology for the development of control applications based on the component concept. The Function Block is defined as the main building block that can be used by the control engineer to define the model of his application. The FB type, which defines the structure and behaviour of FB instances, is composed of a head and a body. The body encapsulates the algorithms and internal data; it is connected to input eventsProceedings of the 13th IFAC Symposium on Information Control Problems in ManufacturingMoscow, Russia, June 3-5, 2009and generates output events. The head captures the dynamics of the FB; it consumes the input events, triggers the execution of algorithms and generates output events. The dynamics of the head are specified by a special kind of state transition diagram that is called execution control chart.Several research groups are working to provide run-timeenvironments for IEC 61499 based DCSs. The FBRT () is the first run-time environment for IEC 61499 based control applications. FBRT utilizes Java but it supports neither timeliness, nor the run-time re-configurability of the system. The method invocation paradigm, which is adopted for the implementation of event connections, and the non-determinism of the Java platform make the environment inappropriate for real-time applications and imposes many restrictions to its use in real world applications. IsaGraph (/), a well known commercially available toolset for the IEC 61131, includes in its latest version support for IEC 61499. The proposed execution environment, even though not completely compliant with the standard, provides the first commercially available tool that supports it. The Fuber execution environment is under development at Chalmers University of Technology (Cengic et al. 2006). The 4DIAC-RTE (/) is a runtime environment that is provided in a PC version and an embedded ARM7 based version (Sunder et al. 2007) . These environments are not currently described in publicly available documents regarding the adopted implementation policies; performance measurements are not available. A FB-based model to support configuration and reconfiguration of DCSs is proposed and its implementation on real-time Java is discussed in Brennan et al. (2002). However, no proof of concept neither a prototype implementation is provided for the above real-time Java based approach. Benchmarking of IEC61499-compliant runtime environments is an issue of interest for researchers for the last couple of years. Soundararajan et al. (2007), study an agent-based software design pattern, utilized for the benchmarking of real-time distributed control systems, such as IEC61499, and applied it on a hybrid physical/simulation environment. Sunder et al. (2007), proposed a set of benchmarks to evaluate the capabilities (performance) of different IEC 61499 runtime environments regarding the execution of basic FBs and FB networks. Key steps of the basic FB execution procedure were identified and utilized to produce timing characteristics of FB execution on two different environments the C++FBRT and the MARTE. However, the proposed benchmark does not address the reconfiguration related characteristics of the run time environment. Both run-time environments are not described in any publicly available publication so it is not possible to understand the different behaviours for the described execution application scenarios, i.e., mixed serial and parallel and tree structure. The C++ FBRT run time environment is executed on a C167 Infineon microcontroller without operating system so it is evident that the control application is in the form of a monolithic application. This means that the component based model that is in the heart of the IEC 61499 FB model is not supported bythis run-time environment and thus the providedmeasurements cannot be considered for comparison.3. THE REAL-TIME JAVA EXECUTION ENVIRONMENT3.1. The Real-Time Java SpecificationIt is widely accepted that Java applications running on a general-purpose JVM can only meet soft RT requirements that are at the level of hundreds of milliseconds. The most important reasons for this nondeterministic performance of Java run-time environment are the dynamic class loading, the garbage collection and the native code compilation. However, a wide variety of researchers recognized the potential of Java in real-time applications and invented techniques to address most of the above issues. The Real-time Java Specification for Java (RTSJ) (Bollela, G., et al., 2000) is the most important and systematic approach to provide an extension to address the limitations of the Java language. RTSJ defines modifications and new features to the semantics of the JVM such as scheduling properties suitable for real-time applications with provisions for periodic and sporadic tasks, support for deadlines and CPU time budgets, and means to avoid garbage collection (GC) delays. RTSJ, apart from favouring productivity and portability, allows programmers to write real-time programs in a type-safe language, reducing many opportunities for catastrophic failures.RTSJ received a high acceptance from vendors and a variety of real-time Java implementations and products were introduced in the market. The first reference implementation, the RI, was developed by TimeSys () and runs on all Linux versions. Jamaica () runs on a wide variety of real time operating systems. It provides the programmer with an optionally activated real-time GC, which makes memory programming a more relaxed process compared to the RTSJ one. Other JVM implementations claiming compliance with RTSJ are jRate (/) and the JVM developed by aJile Systems (), which running directly on hardware, opens new horizons on the application of RTSJ in embedded systems domain. The recently released IBM and Sun RTSJ-compliant implementations are expected to give a great push in the use of real-time Java. IBM’s real-time Java is part of the WebSphere Real Time V1.0 (/software/webservers/realtime/) that contains a stand-alone Java Standard Edition 5 Runtime Environment to support real-time applications. Sun real-time Java (/javase/technologies/realtime/index.jsp) is a standards-based extension of the J2SE 5.0 platform designed to address, according to Sun, the growing demand for predictable computing in industries such as aerospace, financial services, industrial automation and telecommunications, as well as in scientific research.3.2. The RTSJ-Archimedes Execution EnvironmentThe RTSJ-AXE is a run-time environment that was developed in the context of the Archimedes system platform to allow the exploitation of real-time Java implementations inthe industrial automation domain and specifically in 61499 based applications.The Archimedes system platform is composed of a methodology, a framework and an ESS. The Archimedes ESS that currently supports the design and deployment phases in the application and partially in the resource layers of the Model Integrated Mechatronics (MIM) architecture was developed utilizing the General Modelling Environment (GME). GME is a configurable toolset with generic functionality for graphical development that supports the easy creation of domain-specific modelling and program synthesisenvironments (Ledeczi, et al., 2001). The Archimedes system platform adopts the model driven development which means that the control engineer constructs the models of hisapplication using a domain specific language, as for example the IEC 61499 FB notation and the systems automatically generates the executable model that will be executed on an IEC 61499 compliant run-time environment. In the case of RTSJ-AXE, the 61499 models are automatically transformed to RTSJ compliant java specifications which are ready to be executed on the RTSJ-AXE run-time environment.In more detail, Archimedes ESS or any other 61499 compliant ESS, such as Corfu FBDK, can be used to define the FB model of the application. New FB types and FB networks can be defined or imported from IEC- compliant XML specifications that have been produced by other IEC-compliant ESSs. These FB design models are further refinedand enhanced to capture the real-time constraints of the control application. The so constructed platform independentmodels, utilizing the RTSJ-AXE model interpreters, are transformed to the RTSJ-based FB platform-specific implementation model that can be executed on the execution environment running on RTSJ compliant implementations.The RTSJ-AXE extends the functionality of Archimedessystem platform so as to exploit RTSJ in the model driven development process of distributed control applications. It is composed of: a) An FB implementation model framework, i.e. a set of classes that enable the re-use of all these design decisions that have been done for the proper use of RTSJ constructs in mapping FB based design specifications of control applications to executable real-time Java implementations. b) An execution environment that is required for the deployment and execution of the proposed FB implementation model. This environment provides the infrastructure required to meet deployment and re- deployment needs, as well as stringent non-functional requirements such as maximum permissible response times, minimum throughputs and deadlines usually imposed by the nature of DCSs. c) A set of interpreters to automaticallygenerate the implementation model from the FB design model. d) A tool, the RTSJ launcher, to support the preparation and launching of the application on the target environment.To eliminate the major cause of unpredictability of the Javaimplementation, the RTSJ defines theNoHeapRealtimeThread (NHRT) class. The NHRT class isused in the RTSJ-AXE to allow selected FB instances to pre-empt the GC without delay since it runs at a higher prioritythan the GC. This allows the control application to beindependent from the GC. Event connections between FB instances are implemented in RTSJ-AXE, using theAsynchronous Event Handling Mechanism. This mechanism was derived in RTSJ by generalizing the traditional asynchronous event handling mechanism of Java. Based on this, asynchronous event handlers become schedulable entities that act as real time threads and inherit all the scheduling characteristics of threads. A detailed description of the RTSJ-AXE can be found in (Thramboulidis et al. 2005).4. PERFORMANCE ANALYSIS AND RESULTS In this section the RTSJ execution environment used to analyze the timing behaviour of the proposed run-time environment are described. The timing behaviour of IEC 61499 implementations is analyzed and benchmarking results obtained running the proposed RTSJ-based 61499 run-time environment are presented. The objective of this performanceanalysis is to contribute to the performance analysis of IEC 61499 implementations, and demonstrate the timingcharacteristics of the proposed execution framework on different implementations environments of RTSJ. It is not the objective of this work to compare the various RTSJ implementations. Such evaluations can be found in other works, i.e. (Enery, et al., 2007). 4.1. Hardware and Software test bedsThree different real-time Java platforms, i.e., TimeSys RI, IBM WebSphere Real Time, and Sun Java Real-Time System, were used to analyze the timing behaviour of theproposed IEC 61499 run-time environment. More specifically the three software/hardware platforms used for are:TimeSys RI : TimeSys has developed the official RTSJ implementation. RI runs on any Linux platform and its threading model maps directly the Java threads onto Linux POSIX threads. For the test bed, version 1.0-547.1 of RI was used running on TimeSys Linux/RT (GPL version) 4.1 Kernel 2.4.21. A significant improvement in performance was identified using the new version 1.1-alpha. The hardware platform used was a PC with AMD 64 2.4 GHz and 2Gb RAM.IBM WebSphere Real Time : Ver 1.0 of IBM WebSphere Real Time was used, running over Red Hat Enterprise Linux 4 patched with RTkernel 2.6.16 on a IBM server 798452G, X3455, Opteron Dual-Core model 2218, 2X2.6GHz/ 2x1MB L2 SE, 4x512MB. Sun Java Real-Time System : Sun Java Real-Time System (Java RTS) 2.0 RC2 was used running over Sun Solaris 10, on a Sun Fire 280R Server, with 2x Sun UltraSPARC III CU1.2 GHz with 4 GB RAM. It is important to note that Sun Java RTS was designed for a dual-CPU system, but can alsorun on a single CPU system. This will result in higher latencyand jitter numbers, but is still an effective solution forapplications with high temporal requirements.Naturally, it would have been more favourable to run the tests on the same hardware platform. However, the IBM and Sun real-time Java implementations supported, when the tests were run, only their own hardware platforms. 4.2. FB instance performance analysisThe performance analysis of the FB instance is mainly based on the ECC that describes the dynamic behaviour of the basic FB type instance. The state transition diagram shown in Fig.1, which describes the behaviour of the FB instance, was used for the analysis of performance characteristics of the FB instance. The FB instance is blocked in the idle state (S0) waiting for an event at its event inputs. The presence of an event at the event inputs of the FB instance fires the transition t1. During this transition the sampling of input data of the FB instance is performed and the internal data input variables are updated so as to be used in subsequent calculations.S1S2S0t1t2t3t4Fig. 1. State machine for the execution of basic FB instance. During S1 the transitions of the current state of the ECC are evaluated. If a transition fires, i.e., its condition evaluates to true, the transition t3 is fired. This means that the duration of S1 depends on the number and the type of transitions. During S2 the actions that are associated with the new state of the FB instance’s ECC are executed. Each action is performed by executing the associated algorithm, if any, and issuing an event at the associated event output. The transition t4 is fired after the execution of the associated actions and the FB instance enters the S1 state where the transitions of the current state of the ECC are evaluated again. If a transition fires, t3 is activated, otherwise t2 is activated and the FB instance enters the S0 state.For the benchmark analysis of the execution timing characteristics of the FB instance, a dummy FB type was defined and used (Fig. 2). The DummyFB has: one inputevent (EI), seven data inputs (D1, D2, … D7) of type Boolean, one event output (EO) and one data output (DO). The event input EI is used to activate the FB instance, which executes 1, 2, 3 or up to 7 subsequent state transitions depending on the values of associated data. One ‘dummy’ algorithm is assumed with just a return statement and only one event per EC action. The performance results of the DummyFB can be used to calculate the execution time of any basic FB type of the application, assuming that the worst-case-execution-times of its algorithms are available.Read Data Test (t1): This test measures the time required forthe transition t1 of the state machine that describes theexecution of the basic FB instance. For each platform usedfor the execution of the proposed run-time environment 1.000samples were collected and the average, standard deviation,min, max and 99,5 % are reported. Fig. 3 illustrates the read-data duration for IBM, Sun and TimeSys RI implementationsfor the DummyFB. In our implementation all input data areread from the DataConnectionManager independent of theevent WITH specifier. Better results can be obtained using the WITH specifier to read only the data related to the specific event that fired the transition.Fig. 2. The dummy FB type used in our benchmark.Evaluating Transitions Test (S1): This test measures the duration of the S1 state. The S1 time depends on the number and the type of transitions. Transition expressions in the DummyFB are: a) ei && d1, for the transition from START to the ST1 state, and dn for the transition from START to the STn state, where n=2..7. There is a transition which always fires from any STn state to the START state. Fig. 4 shows the results obtained using the DummyFB with one output event and up to seven transitions.Fig. 3. Read Data Duration performance measurements.Fig. 4. Evaluating Transitions (S1) Test results.Performing Actions (S2) Test: This test measures the durationof the S2 state. The S2 duration depends on the number ofactions, the algorithm execution time, and the output eventfiring time. Fig. 5 shows the results obtained using theDummyFB with one output event and up to seven associatedactions in a state. It should be noted that a great jitter in theexecution times was noticed until the IBM and Sunimplementations reach a steady state. This is why we had the systems to reach the steady state before starting the measurements of our tests (Enery, et al., 2007).Fig. 5. Performing Actions (S2) Test results. 4.3. FB Network performance analysisFor the FB network performance analysis, dispatch-latency,event-connection latency and the event-handler computation-time are presented. The Read data latency has no meaning forthe FB network, since for the proposed implementation this time is the one that was measured in the FB instance performance analysis.Dispatch latency Test:This test measures the time from when the output event of the event producer FB is fired, to when its handler, which is an instance of the EventHandler class, isinvoked. Fig. 6 illustrates the dispatch latency for the three platforms used to run the proposed run-time environment. Itshould be noted that standard deviation for allimplementations is quite small and better performance isachieved when a BoundAsyncHandler is used (Fig. 7). Event-connection latency Test: This test measures the time from when the event is fired, to when the corresponding consumer FB instance becomes ready for execution. To ensure that each event firing causes a complete execution cycle of the consumer FB instance, the producer FB instance fires the next event only after the processing of the previous event has terminated. Measurements for the event-connection latency on the three platforms used are quite similar to the Dispatch Latency ones.Event-handler computation-time (EHCT) Test: This testmeasures the time needed for the event handler to perform itstask without interruption, i.e., to update the events in the inputEventMonitors that have already been subscribed for the specific event (Fig. 8). The EHCT depends on the number of monitors to update. There is one monitor per FB instance that accepts the corresponding event. It should be noted that Std. Dev. is greatly improved in the case of the Sun execution environment using BoundAsyncHandler (Fig. 9), whereas it remains the same for the case of IBM and RI.4.4. Deployment and re-deployment performance analysis For the deployment and re-deployment performance analysis the Festo MPS example application was used. Festo MPS is a laboratory system widely used in IEC 61499 related papers. It is composed of three units: the distribution unit, the testingunit and the processing unit. Cylindrical work pieces are forwarded from the distribution unit to the testing unit andnext to the processing unit, where a drill performs the most important processing of this mechanical unit. A detaileddescription of this case study can be found inhttp://seg.ee.upatras.gr/seg/dev/FestoMPS.htm. The whole control application for the Festo MPS system in the form of FB notation can be downloaded from the same site.Fig. 6. Dispatch latency for Async Event Handler Fig. 7. Dispatch latency for BoundAsync Event HandlerFig. 8. Event-handler computation-time for Async EventHandlerFig. 9. Event-handler computation-time for BoundAsync Event Handler.Table 1. Deployment timing characteristics for the Festo MPS application.Action # of timesperformedWebSphere RealTimeRI(1.1-alpha)Java RTSJava RTS(Dedicated CPU)FB instantiation 5 64,314 ms 21,564 ms 38,181 ms 38,934 msData Connection 10 2,160 ms 0,736 ms 1,390 ms 1,516 msEvent Connection 8 8,797 ms 20,845 ms 5,103 ms 5,456 ms Table 2. Re-deployment timing characteristics for the Festo MPS application.Action # of timesperformedWebSphere RealTimeRI(1.1-alpha)Java RTSJava RTS(Dedicated CPU)FB instantiation 2 4,065 ms 5,705 ms 7,033 ms 6,816 ms Create DataConnection1 0,023 ms 0,028 ms 0,076 ms 0,043 ms Create EventConnection5 1,227ms 14,543 ms 2,939 ms 2,732 ms Delete EventConnection2 0,106 ms 0,130 ms 0,379 ms 0,274 msTable 1 presents the timing characteristics of the deployment process for 5 FB instances. It must be noted that the 5 instances were all of different FB types so the load time for 5 different classes is responsible for the long total instantiation time. Deployment was executed in two steps: The first step is executed with priority 20 whereas the second step is executed with high priority (35). Times are for the given number of repetitions of the corresponding action. It is evident that our framework does not exploit the second CPU of the hardware platform when this is dedicated to RT threads.After the deployment of the first version of the Festo MPS example application that does not count the illegal workpieces, a re-deployment scenario was executed to demonstrate the applicability of the proposed approach regarding re-configuration. Two FB instances, one CounterFB and one PrintInt, were appended to the control application during runtime to get the final version, which has exactly the same behavior as the first one, except that the number of illegal workpieces is printed on the operator’s display. Table 2 illustrates the timing characteristics of the proposed run-time environment for the various RTSJ implementations for the above re-deployment scenario. It should be noted that the create event connection operation isa time consuming action for RI.5. CONCLUSIONSA benchmark for the IEC 61499 function block model is described and performance measurements for a run-time environment based on real-time Java are provided. Performance results prove the applicability of the real-time Java execution environment for hard real-time applications. These measurements can be used to calculate the response time of the various transactions of any FB application. This work should be further extended to analyze the timing characteristics of an FB network that is deployed on a network of interconnected nodes.REFERENCESInternational Electro-technical Commission (IEC), 2005, International Standard IEC 61499, Function Blocks, Part 1 - Part 4Thramboulidis K. (2005), “Model Integrated Mechatronics – Towards a new Paradigm in the Development ofManufacturing Systems”, IEEE Transactions onIndustrial Informatics, vol. 1, No. 1. Feb 2005 Thramboulidis, K., and Zoupas, A. (2005), “Real-Time Java in Control and Automation: A Model DrivenDevelopment Approach”, 10th IEEE Int. Conf. onEmerging Technologies and Factory Automation,Catania, Italy, Sept. 2005Sünder, C., Zoitl, A., Strasser, T., and Brunnenkreef, J.(2007). Benchmarking of IEC 61499 runtime environments. In Name(s) of editor(s) (ed.), EmergingTechnologies and Factory Automation, ETFA. IEEEConference on, 474-481.Soundararajan, K., and Brennan, R. (2008). Design patterns for real-time distributed control system benchmarking.Robotics and Computer-Integrated Manufacturing, 606-615. Pergamon Press.Bollela, G., B. Brosgol, et al., 2000, The Real-Time Specification for Java, /Addison Wesley/.Cengic, G., Ljungkrantz, O., Akesson, K. (2006), “Formal Modeling of Function Block Applications Running inIEC 61499 Execution Runtime”, /11th IEEE International Conference on Emerging Technologies andFactory Automation/, September 20-22, 2006, CzechRepublic.Sunder, C., Zoitl, A., Rofner, H, Strasser, T., Brunnenkreef, J. (2007), “Benchmarking of IEC61499 runtimeenvironments”, /12th IEEE Int. Conf. on EmergingTechn. and Factory Automation/, Sept 2007, Patras,Greece.Brennan, R., Fletcher, M., Norrie, D. (2002), “An Agent-Based approach to reconfiguration of real-timedistributed control systems”, /IEEE transactions onRobotics and Automation/, vol. 18, No. 4, pp. 444-451,August 2002Ledeczi, A., M. Maroti, A. Bakay , G. Karsai, J. Garrett, C.Thomason , G. Nordstrom, J. Sprinkle and P. Volgyesi.“The Generic Modeling Environment”. Proc. ofWISP’2001, Budapest, 2001.Enery Mc Enery, D. Hickey, and M. Boubekeur, “Empirical Evaluation of Two Main-Stream RTSJ Implementations”, 5th International Workshop on JavaTechnologies for Real-time and Embedded Systems(JTRES ’07) Sept. 26-28, 2007 Vienna, Austria.。
Performance of low-complexity code acquisition for direct-sequence spread spectrum systemsK.Yu and I.B.CollingsAbstract:The authors consider a low-complexity technique for direct-sequence spread spectrum code acquisition.They present performance analysis in Rayleigh fading channels,and with carrier frequency offset in the receiver.They derive analytical expressions for acquisition probabilities,and use them to determine the mean and variance of the acquisition time.It is shown that acceptable performance can be achieved with significant reduction in hardware requirements,compared to conventional detectors.Simulations confirm the analytical results.Performance comparisons are presented under a range of scenarios including significant frequency offsets and fading channels.1IntroductionTiming recovery is a vital task which fundamentally affects the performance of digital communications receivers.For direct-sequence spread spectrum communications,one of the most important timing tasks is code acquisition.This is especially the case for high mobility communications,with fading channels.Such a problem is of course well known and studied,given the widespread use of CDMA systems [1].However,recently the development of wireless LANs has raised new problems.In particular,these systems use very short spreading sequences,and experience large frequency offsets in the receiver carrier recovery circuitry.These two factors have sparked renewed interest in problems of code acquisition.In this paper we are interested in examining low-complexity acquisition techniques required for practical WLAN systems with low spreading gain.Of particular interest is the effect of fading and frequency offset.Our analysis will show that acceptable performance can be achieved with considerably lower computational complexity than is commonly thought to be required.Traditionally,code acquisition designs are divided into two stages,namely detection (code search)and verification.In a parallel acquisition mode,the detection stage simultaneously evaluates all the possible code offsets and picks the one with the highest correlation [2–5].In a serial search,the detection stage evaluates the possible code offsets one after another until an acceptable correlation is found [6–11].Hybrid schemes have also been considered including partial parallel searches [2,12].Performance analyses for these acquisition structures have been pre-sented,in the case of fading channels in [8,9,11,12],and in the case of frequency offset in [13–16].In fact,it is well known that for a conventional correlation detector,a frequency offset of half the data rate causes a performance loss of about 4dB.In this paper we consider the combinedeffect of simultaneous channel fading and receiver frequency offset.Of course,the task of code acquisition is intimately linked to the transmission modulation and coding format,and to the transmission channel characteristics.The IEEE stan-dard (802.11)for WLANs [17]must be able to operate in a wide variety of environments,providing a tough challenge for the code acquisition detector.The standard specifies a direct sequence spreading code of only 11chips,and the frequency offset can be up to 62.5kHz.The task of code acquisition is significantly more difficult than for current mobile standards such as IS95and 3G systems.This is particularly the case with increasing user mobility,and with the pressure to reduce the computational complexity.A key contribution of this paper is that we consider and analyse the use of a low-complexity sign correlator in the code acquisition system,as opposed to conventional correlation.Sign operations (including sign quantisation,sign multiplication,and sign summation),significantly reduce the complexity of hardware implementation.We derive analytical performance measures for both the low-complexity and conventional schemes,in the presence of fading and frequency offset.Expressions are derived for mean acquisition times and associated variances,for both serial and parallel search schemes,and are given as a function of the fading,SNR and frequency offset.The results show that the sign-correlator technique can provide acceptable performance within 1.6dB of the conventional approach,and do so with significantly reduced complexity.Simulated acquisition probabilities are generated which demonstrate the validity of the assumptions made in our analysis.The simulated results are seen to match closely our analytic predictions.2Signal model and detectorConsider the following transmitted signalx ðt Þ¼s ðt Þcos 2p f 0twhere f 0is the carrier frequency ands ðt Þ¼s rem b tT c c N þ1h t ÀtT c "#T cThe authors are with the Telecommunications Laboratory,School of Electrical and Information Engineering,University of Sydney,NSW 2006,Australia r IEE,2003IEE Proceedings online no.20030742doi:10.1049/ip-com:20030742Paper first received 22nd August 2002and in revised form 28th May 2003where rem denotes remainder,ands i 2Æ1ffiffiffiffiNp &'for i ¼1;...;Nis the spreading code.Also,T c ¼T /N ,where N is the spreading gain,T c is the chip duration,T is the symbol period,and h (t )¼1/T c for 0o t o T c and 0otherwise.We further define the spreading code vector s ¼[s 1s 2?s N ]T ,and the cyclic k -position shifted code vector s (k )¼[s N Àk +1?s 1?s N Àk ]T ,where 77s 772¼77s (k )772¼1.Alsodenote s ðk Þi as the i th element of s (k ).For the WLAN applications of interest,we assume the usual complex-valued fading Rayleigh channel,constant over a symbol interval but independent from symbol to symbol.Also,in this paper we consider frequency offset in the transmitter (or receiver).The received signal is thereforer ðt Þ¼ffiffiffiP p j a j s ðt Àk T c ÞÂcos ð2p ðf 0þD f Þðt Àk T c Þþy Þþu ðt Þwhere P is the signal power,7a 7is the channel amplitude (with E [7a 72]¼1),and k T c is the transmission delay with k a random integer uniformly distributed between 1and k max (i.e.chip synchronisation).Also y is the random phase (uniformly distributed),D f is the carrier frequency offset,and v (t )is Gaussian noise.Now consider the low-complexity sign-correlator code acquisition detector.This detector has all correlation calculations performed as binary operations,significantly reducing hardware requirements.We do not suggest that this detector is particularly novel,but rather our aim is to provide analysis which we will then use to demonstrate that such a low-complexity scheme can prove successful even in fading channels with high levels of frequency offset,typical of current WLAN systems.The first stage in the detector is shown in Fig.1,with quadrature baseband (two-level)quantized outputs,where sgn (x )models a simple 1-bit logic gate.The second stage of the detector is shown in Fig.2,where the binary outputs from Fig.1are binary correlated with the local spreading code,with the appropriate test timing offset.Turning to the mathematical details we have [Note 1]r f ðt Þ¼2ffiffiffiP p j a j s ðt Àk T c Þ½cos ð2p ð2f 0þD f Þt À2p ðf 0þD f Þk T c þy Þþcos ð2pD ft À2p ðf 0þD f Þk T c þy Þ þv I ðt Þandy I ð‘Þ¼sgn Z ‘T c ð‘À1ÞT cr I ðt Þd t 0B@1CANow since f 0c 1/T c we can writey I ð‘Þ¼sgn ffiffiffiP p j a j s rem ‘À1Àk N ÀÁþ1T c½I I ð‘Þþn I ð‘Þ!ð1Þwhere n I (c )is an independent zero-mean Gaussian variable with variance s 2n ,andI I ð‘Þ91T cZ ‘T c ð‘À1ÞT ccos ð2pD ft þf Þd t ð2Þf 9À2p ðf 0þD f Þk T c þy ð3ÞNow,when D f {1/T c ,I I (c )is approximately constantfor all c in the same symbol,so let us instead consider the average integration over an entire symbol interval,as in [18].Consider c 0to be a chip sampling instance which corresponds to the first chip of a symbol.Now,for c A [c 0,N +c 0À1],we defineJ 91N X ‘0þN À1‘¼‘I I ð‘Þ¼1NT cZ ð‘0þN À1ÞT c ð‘0À1ÞT ccos ð2pD ft þf Þd t¼12p NT c D f½sin ð2pD f ð‘0À1þN ÞT c þf ÞÀsin ð2pD f ð‘0À1ÞT c þf Þ ¼a cos cð4Þwherea ¼sin ðpD fNT c ÞpD fNT cc ¼pD fNT c þ2pD f ð‘0À1ÞT c þf By replacing I I (c )in (1),with this average expression (4),we havey I ð‘Þ%sgnffiffiffiP p a j a j cos c s rem‘À1Àk NÂÃþ1þZ I ð‘Þ We can now write an ‘average’expression for z I (i )(defined in Fig.2).At the i th symbol,y I (c )correlates with test-offsetcode s (d )whose j th element is s ðd Þj ,where c take valuesfrombinary chip matched filter0ty I (l )(binary)y Q (l )(binary)Fig.1Binary chip matchedfilterbinary correlatory I (l )y Q (l )offset codeFig.2Code offset detector using binary correlatorNote 1:We present only the in-phase component for notational simplicity since the quadrature component is independent and therefore can be derived in the same way.c 0¼((i À1)N +1)+k to iN +k .Thereforez I ði Þ¼X N j ¼1y I ðði À1ÞN þk þj Þsgn ðs ðd Þj Þð5Þ%X N j ¼1sgn ðffiffiffiP p a j a j cos c s ðk ÞjþZ I ðði À1ÞN þk þj ÞÞsgn ðs ðd Þj Þ¼X N j ¼1ð1Àr j Þsgn ðffiffiffiP p a j a j cos c s ðk Þj Þsgn ðs ðd Þj Þð6Þwherer j ¼2n I ðði À1ÞN þk þj Þs ðk Þj cos c o 0;and j n I ðði À1ÞN þk þj Þj 4ffiffiffiffiffiffiffiffiffiffiP =N p a j a jj cos c j 0elsewhere8><>:And the conditional probability for r j ¼2isP ðr j ¼2j j a j ;c Þ¼Z1f ða ;c Þ1ffiffiffiffiffiffi2p p s nexp Àx 22s n !d x ¼Qf ða ;c Þs n ð7Þwhere Q (x )is the Q-function andf ða ;c Þ¼ffiffiffiffiPN r a j a j cos c Now (6)can be further written asz I ði Þ¼sgn ðcos c ÞX N j ¼1ð1Àr i Þsgn ðs ðk Þj Þsgn ðs ðd Þj ÞDue to non-coherent detection,we only need consider the magnitude of z I (i )and hence the term ~z I ði Þas follows:~z I ði Þ¼X N j ¼1ð1Àr j Þsgn ðs ðk Þj Þsgn ðs ðd Þj Þð8ÞThis model will be used in the following Section to derive the detection probabilities and acquisition times in each case studied.3Parallel search structure3.1Mean and variance of acquisition timeFigure 3shows the state diagram of the parallel codeacquisition detector,similar to that in [12].Let hypothesis H 1denote that the codes are in synchronisation while hypothesis H 0means codes are not in synchronisation.A maximum selection criterion (MSC)is applied in the search (or selection)stage using a parallel code offset search (usually performed by parallel hardware)which simulta-neously tests all possible offsets.At verification stage a threshold-crossing criterion is applied to the selected offset A times.The A tests are performed over disjoint NT c second intervals and are thus statistically independent.If the threshold is exceeded at least B times out of the A tests,then the hypothesis is accepted,otherwise the search recommences.It is assumed that the channel-decoder will detect false alarms and in that case the search recommences after a JNT c delay.Flow graph theory gives the mean and variance ofacquisition time as follows [1,6,12]E ½T ACQ ¼d d z ln G ðz Þ !z ¼1¼1P D ð1þA þJP F ÞT ð9ÞVar ½T ACQ ¼d 2d z ln G ðz Þþdd z ln G ðz Þ!z ¼1¼1P D½J 2P F ÀðA þ1Þ2 T 2þ1P D½1þA þJP F 2T 2ð10Þwhere G (z )is the generating function from Fig.3,T is thesymbol period,P D ¼P D 1P D 2and P F ¼P F 1P F 2.Clearly,we need now to obtain these acquisition probabilities for the low-complexity sign-correlation detector which is the focus of this paper.3.2Detection probabilities3.2.1Sign-correlator detector with frequency offset :For hypothesis H 1(codes in synchronisation)wehave,from (8),that~z I ði Þ¼N ÀX N j ¼1r jLet m 1and s 21be defined to be the mean and variance of ~z I ði Þ.Then using (7)m 1¼N ÀX N j ¼1E ½r j¼N À2N E Qf ða ;c Þs n!ð11ÞRecall that 7a 7is Rayleigh,and since y is uniform,so is c .Definingg ðc Þ¼ffiffiffiP p a cos cffiffiffiffiN p sn(1−P )z ATFig.3State-transition diagram of the acquisition processSEL is the parallel selection stage.The verification stages are denoted V.V1indicates that the correct offset was tentatively selected,V2indicates that an incorrect offset was tentatively selected.FA indicates false alarmgivesE Qfða;cs n!¼12pZ2pZ1QðgðcÞxÞÂ2x exp½Àx2 d x d cð12Þ¼12pZ2p12ÀZ1exp½Àx2Â1ffiffiffiffiffiffi2pp gðcÞexpÀg2ðcÞ2x2!d x d c¼14pZ2p1Àffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffig2ðcÞ2þgðcÞs!d c¼12À1parcsin uð13Þwhereu¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP a2P aþ2N s n sSubstituting(13)into(11)yieldsm1¼2Nparcsin uð14ÞLikewise the variance of~z IðiÞiss2 1¼N2À2NX Nj¼1E½r j þX Nj¼1E½r2jþX Ni¼1X Nj¼1;j¼1E½r i E½r j Àm21¼N2À4NðNÀ1ÞE Qfða;cs n!þ4NðNÀ1ÞE Qfða;cn!2Àm21ð15ÞSubstituting(13)into(15)producess2 1¼N1À4p2arcsin2uð16ÞAlso,note that the quadrature correlator output,~z Q,has the same mean and variance as this in-phase component.We now propose to model~z fðiÞand~z QðiÞas twoGaussian random variables with mean m1and variance s21.Of course,the actual ranges of~z IðiÞand~z QðiÞarefinite on [ÀN,N],therefore the variance of our Gaussian model mustbe greater than s21in order to achieve a good match forspreading gains of practical interest.Based on num-erical studies it turns out that a modified variance,s2 1c ¼s21þ0:75m21,provides an accurate analytical expres-sion.This modified variance was found to provide accurate theoretical predictions of performance in all cases studied, and therefore validates our Gaussian assumption. Clearly,under the Gaussian assumptions,the testvariable(from Fig.2)RðiÞ¼~z2I ðiÞþ~z2QðiÞhas a non-centralchi-square distribution with non-centrality parameter m2R¼2m21.The pdf of R(i)conditioned on hypothesis H1isp RðR j H1Þ¼12s21cexpÀ2m21þR2s21c!I0ffiffiffi2pm1ffiffiffiRps21c!ð17ÞFor hypothesis H0(codes not in synchronisation),we againmodel~z IðiÞand~z QðiÞas two Gaussian random variables.With random spreading,we can easilyfind that their meansare zero in this case,and their variances are N,which givesp RðR j H0Þ¼12NeÀR=2Nð18ÞNow let us move on to derive the probabilities ofacquisition.At search stage(‘SEL’in Fig.3)the detectionprobability P D1is given byP D1¼Z1p Rðy j H1ÞZ yp Rðx j H0Þd x2435NÀ1d yð19ÞSubstituting(17)and(18)into(19)givesP D1¼Z112s21cexpÀ2m21þy2s21c!I0m1ffiffiffiffiffiffi2yps21Â1ÀexpÀy2Nh iNÀ1d y¼12snexpÀm21s1c!X NÀ1n¼0ðÀ1ÞnNÀ1nÂZ1expÀ12s1cþn2Ny!I0m1ffiffiffiffiffiffi2yps1cd y¼X NÀ1n¼0ðÀ1Þn NNþn s1cNÀ1nexpÀnm21Nþn s1c!ð20ÞThe incorrect detection probability at search stage can beobtained in terms of P D1:P F1¼1ÀP D1ð21ÞAt verification stage(‘V1’and‘V2’in Fig.3)it can beshown thatP D2¼X Ai¼BAiP i1ð1ÀP1ÞAÀið22ÞP F2¼X Ai¼BAiP i2ð1ÀP2ÞAÀið23ÞHere P1and P2are the probabilities that(respectively)H1test sample and H0test sample cross the threshold.ThereforeP1¼Z1R112s21cexpÀ2m21þx2s21c!I0ffiffiffi2pm1ffiffiffix ps21c!d x¼Qffiffiffiffiffiffiffiffi2m21s1cs;ffiffiffiffiffiffiffiR1s1cs!ð24ÞP2¼Z1R112NexpÀx2Nh id x¼expÀR12N!ð25Þwhere R1is the threshold value which is selected numericallyto minimize the mean acquisition time at each SNR[11,12],and Q(x1,x2)is the Marcum Q-function[19].Substituting(20)–(23)into(9)and(10)will now give the requiredperformance measures for the reduced complexity sign-correlation detector with frequency offset.3.2.2Conventional correlator detector with frequency offset:In this Section we consider the conventional correlator which requires high-resolution correlation in contrast to the binary correlator in the previous section.Our analysis here follows the procedures in[12],but here we extend the analysis to include the effect of constant frequency offset.In this case the in-phaseoutput isz IðiÞ¼ffiffiffiPpa j a j cos cðsðdÞÞT sðkÞþw IðiÞð26Þwhere w I(i)is an independent zero mean Gaussianrandom variable with variance s2w .Conditioned on7a7and H1,we havep RðR j H1;j a jÞ¼1n expÀP a2j a j2þRw "#ÂI0ffiffiffiffiffiffiPRpa j a jswNote that this function does not depend on c.Averaging with respect to7a7givesp RðR j H1Þ¼12s22expÀR2s22!ð27Þwhere s22¼0:5a2Pþs2w.For hypothesis H0we havep RðR j H0Þ¼12s2wexpÀR2s2w!ð28ÞAt search stage,P D1is again given by(19).Substituting(27) and(28)givesP D1¼Z112s22expÀx2s22!1ÀexpÀx2s2w!NÀ1d x¼X NÀ1i¼0ðÀ1ÞiNÀ1iZ112s2expÀs2wþi s222s2ws2x!d x¼X NÀ1i¼0ðÀ1ÞiNÀ1is2ws2wþi s2P:ð29Þand P F1¼1ÀP D1.At verification stage the detection and false alarm probabilities can be respectively computed using(22)and (23),but where in this caseP1¼Z1R212s2expÀx2s2!d x¼expÀR22s2!ð30ÞP2¼Z1R212s2wexpÀx2s2w!d x¼expÀR22s2n!ð31Þwhere R2is the detection threshold determined in the same way as R1.4Serial search structureIn this Section we consider the case when a serial search and threshold-crossing criterion is applied at search stage,in contrast to the parallel search in the previous Section.The verification stage is the same as in the preceding Section. While serial search takes longer to achieve acquisition compared to the parallel search,it requires less hardware to implement.4.1Mean and variance of acquisition timeA general expression for the mean acquisition time in this case is given by(1.183)in[1],and here we extend it to include the false-alarm penalty time J.We therefore have the following expressions for mean and variance of the acquisition time in this serial-detection caseE½T ACQ ¼1P D1þAP D1þðNÀ1Þ½Â1À12P DðAP F1þJP FÞ!Tð32ÞVar½T ACQ ¼À12À1þ2AP Dþ2ð1þAP D1ÞPDf0þNþ712ÀNP DþNÀ1P2Df2þÀ12þ1P Df1!ðNÀ1ÞT2þ1P2Dð1þAP D1Þ2À1P Dð1þ2AþA2P D1Þ!T2ð33Þwheref0¼1þAP F1þJP Ff1¼AðAþ1ÞP F1þð1þ2AþJÞJP Fand where the remaining variables in(32)and(33)have analogous definitions to those in the previous Section.4.2Detection probabilitiesSince the search stage in this serial case only tests one code offset at a time,the analysis can be viewed in the same way as that for the verification stage previously,but with A¼1 and B¼1.In other words for the sign correlator the search-stage probabilities,P D1and P F1are given in this case by P1 and P2as given in(24)and(25)respectively.For the conventional correlator the search-stage probabilities are likewise given by(30)and(31)respectively.For both correlators,the verification-stage probabilities are the same as in the parallel search case of the previous Section.5Complexity comparisonTable1shows the computational requirements for the serial and parallel acquisition approaches during one symbol duration,under the assumption that the output of the conventional chip matchedfilter is quantised into q levels. Generally conventional correlation uses(q¼16)-bit opera-tions(or higher),but sign correlation only needs(q¼1)-bitTable1:Computational requirements per symbol duration for processing gain N(for sign-correlation q¼1,while for conventional-correlation q is typically16or more)Search stage Serial q-bit multiplications:N+1q-bit additions:N+1Parallel q-bit multiplications:N(N+1)q-bit additions:N(N+1) VerificationstageSerial q-bit multiplications:N+1q-bit additions:NParallel q-bit multiplications:N+1q-bit additions:Noperations.In other words the sign-correlator only requires one-sixteenth of the hardware.6Numerical and simulation resultsEach of the simulation studies in this Section assumes the modified Jakes model was used to generate the Rayleigh fading channel with normalised fading rate 0.02(normalised to the symbol rate).In each case,the curves relating to the conventional detector are generated assuming infinite precision,(i.e.there is no quantisation).The data symbol rate was 1Msymbol/s.Figure 4shows the detection probabilities,P D 1,for the conventional correlator and the sign correlator,with and without carrier frequency offset.Clearly the analytic expressions we have derived in this paper accurately match the simulated results for practical SNR.Using these simulated and analytical detection probabil-ities,we then computed the corresponding mean acquisition time and standard deviation.The results are shown in Figs.5and 6(note that the curves for the conventional correlator with 0kHz frequency offset can be related to the simulation figures in [12],however in that paper they considered a partial parallel structure,and had a slightly different fading model).Here we set the verification-stage parameters to check for B ¼2threshold crossings out of A ¼3tests.We also set the false-alarm penalty variable to J ¼100.Actually this is somewhat arbitrary since it depends on the particular equaliser–decoder used further down the processing chain (e.g.J 4100in [8,12]).However our main interest is to characterise and examine the performance of the low-complexity sign correlator,and as such we are primarily interested in the relative performance compared to the conventional approach.The figures show that the low-complexity sign-correlation-based acquisition scheme achieves close to the performance of the conventional method (with and without the presence of frequency offset).The performance degradation is about 1.6dB.Figures 7and 8show P D 1and the mean acquisition time,for two different spreading gains.The plots are without the presence of carrier frequency offset.As the spreading increases the detection probability decreases since of course there are more code offsets to test.The results show that the performance degradation of the low-complexity sign-correlator technique is slightly less for lower WLAN−5510152000.20.40.60.81.0average SNR per symbol, dBd e t e c t i o n p r o b a b i l i t i e sFig.4Parallel-search analytical and simulated selection-stage detection probabilities (P D1)of the sign-correlator and conventional approaches for two values of frequency offset (0and 500kHz),with a spreading gain of 31−551015201010101010average SNR per symbol, dBm e a n , µsFig.5Parallel-search analytical and simulated mean acquisition time of the sign-correlator and conventional approaches for two values of frequency offset (0and 500kHz),with a spreading gain of 31−55101520100101102103104average SNR per symbol, dBs t a n d a r d d e v i a t i o n , µsFig.6Parallel-search analytical and simulated standard deviation of acquisition time of the sign-correlator and conventional approaches for two values offrequency offset (0and 500kHz),with a spreading gain of 3100.20.40.60.81.0average SNR per symbol, dBd e t e c t i o n p r o b a b i l i t i e sFig.7Parallel-search selection-stage detection probabilities (P D1)of the sign-correlator and conventional approaches for two different spreading gains (SG ¼15,127)without frequency offsetspreading gains.We also see acceptable performance for high spreading,which is interesting since the implementa-tion savings increase significantly as the spreading gain increases.Figure 9shows the mean acquisition time when the serial search structure is applied (note that the curves for the conventional correlator with 0kHz frequency offset can be related to the simulation figures in [8],however in that paper they considered much longer spreading sequences,and plotted their curves normalised to the chip rate,rather than the symbol rate for fading and SNR).As can be seen from the figures here,the performance degradation of the sign-based technique is even smaller than those in Figs.5and 6.Figures 10and 11compare the four different combina-tions of correlators and search structures with and without frequency offset respectively.7ConclusionThis paper generated accurate closed form analytical expressions for both low-complexity and conventional codeacquisition schemes,in fading channels and with frequency offset.We also presented performance comparisons to verify the analysis,and showed that the low-complexity scheme performs with only a 1.6dB performance degrada-tion.8AcknowledgmentThe first author was supported in part by Southern Poro Communications (SPC).9References1Simon,M.,Omura,J.K.,Scholtz,R.A.,and Levitt,B.K.:‘Spread spectrum communications handbook’(McGraw-Hall,New York,USA,1995)2Su,Y.T.:‘Rapid code acquisition algorithms employing PN matched filters’,IEEE mun.,June 1988,36,pp.724–7333Milstein,L.B.,Gevargiz,J.,and Das,P.K.:‘Rapid acquisition for direct sequence spread-spectrum communications using parallel SAW convolvers’,IEEE mun.,July 1985,33,pp.593–6004Chawla,K.K.,and Sarwate, D.V.:‘Parallel acquisition of PN sequences in DS/SS systems’,IEEE mun.,May 1994,42,pp.2155–2164−55101520100101102103104average SNR per symbol, dBm e a n , µsFig.8Parallel-search mean acquisition time of the sign-correlator and conventional approaches for two different spreading gains (SG ¼15,127)without frequency offset101102103104average SNR per symbol, dBm e a n , µsFig.9Serial-search analytical and simulated mean acquisition time of the sign-correlator and conventional approaches for two values of frequency offset (0and 500kHz)and SG ¼31100101102103104average SNR per symbol, dBm e a n , µsFig.10Mean acquisition times for the four different combinations of correlators and search structures without frequency offset,with SG ¼31−50510152010101102103104average SNR per symbol, dBm e a n , µsFig.11Mean acquisition times for the four different combinations of correlators and search structures with a frequency offset of 500kHz,and with SG ¼31。
PERFORMANCE ANALYSIS THROUGH SYNTHETIC TRACE GENERATION Lieven Eeckhout Koen De Bosschere Henk NeefsDepartment of Electronics and Information Systems(ELIS),Ghent UniversitySint-Pietersnieuwstraat41,B-9000Gent,Belgiumleeckhou,kdb,neefs@elis.rug.ac.beABSTRACTMost research in the area of microarchitectural perfor-mance analysis is done using trace-driven simulations.Al-though trace-driven simulations are fairly accurate,theyare both time-and space-consuming which makes themsometimes impractical.Modeling the execution of a com-puter program by a statistical profile and generating a syn-thetic benchmark trace from this statistical profile can beused to accelerate the design process.Thanks to the statis-tical nature of this technique,performance characteristicsquickly converge to a steady state solution during simu-lation,which makes this technique suitable for fast designspace explorations.In this paper,it is shown how more de-tailed statistical profiles can be obtained and how the syn-thetic trace generation mechanism should be designed togenerate syntactically correct benchmark traces.As a re-sult,the performance predictions in this paper are far moreaccurate than those reported in previous research.I.INTRODUCTIONTrace-driven simulations are are generally accurate[1].However,there are serious practical shortcomings withtrace-driven simulations.First,tracing complete bench-mark executions is infeasible as this requires the storageof billions of instructions.Second,simulation time is alsoprohibitive for such huge traces;especially if traces areused to evaluate various processor configurations for vari-ous workloads,which requires many simulation runs.In practice,researchers try to limit the size of traces bystoring only a part of them.One can either take a contigu-ous part of the trace(preferably not a part that belongsto the initialization sequence of the program),or applytrace sampling[3].Recently,statistical modeling[2,7]was presented as a technique to accelerate the simulationprocess.First,a statistical profile or a set of statistical pro-gram characteristics is extracted from a program execu-tion.This statistical profile is then used to generate a syn-thetic trace which is subsequently fed into a trace-drivensimulator,which will estimate the attainable performanceFigure1:The SPROF/SPROG performance modeling en-vironment.The coloured boxes are tools;the white boxesare intermediatefiles.instruction window of128instructions.This paper is organized as follows.Section II presents ageneral overview of our performance modeling environ-ment called SPROF/SPROG.In section III it is shownwhich statistical characteristics are extracted from bench-mark traces to form a viable statistical profile.SectionIV presents the synthetic trace generation algorithm beingused.Our performance modeling environment is validatedin section V.Section VI discusses related work and,finally,we will conclude in section VII.II.THE SPROF/SPROG FRAMEWORKOur general framework for performance modelingis depicted in Figure 1.First,a real programtrace,for example a SPEC benchmark trace,is ana-lyzed by two microarchitecture-dependent profiling toolsand one microarchitecture-independent profiling tool.The microarchitecture-independent profiling tool calledSPROF1extracts statistics concerning the instruction mixand the dependencies between instructions through reg-ister values as well as through memory values.Themicroarchitecture-dependent profiling tools extract statis-tics concerning the branch and cache behaviour of the pro-gram trace for a specific branch predictor and a specificcache organization.This is done by simulating the de-sired aspect,namely the branch or the cache behaviour.The complete set of statistics(program,branch and cachestatistics)which are generated by trace profiling,is calleda statistical profile.Note that a statistical profile can be computed from anactual trace as is shown in Figure1,but it is more conve-nient to compute it on-the-fly from either an instrumentedfunctional simulator,or from an instrumented version ofthe benchmark program running on a real system withoutthe need to store huge traces.A second important note2Statistical PROGram trace generator3These nine instruction classes were identified for the Alpha ISA.However,this does not affect the generality of the technique presentedinteger multiply,floating-point operation,floating-pointdivide single-precision andfloating-point divide double-precision.The second distribution which we identified,is thenumber-of-operands distribution or the probability that in-struction has source operands.Since the numberof operands of an instruction is dependent on its type,wehave actually measured. The variable number of operands is caused by the fact thatsome instructions occur in a register-register as well as ina register-immediate format,while belonging to the same instruction class in our classification.We have also measured the age-of-register-operandsdistribution[4]for each register operand of each instruc-tion type.Indeed,we measured the probability that the-th register operand of an instruction of type having operands,is produced instructions before it in the trace;i.e.instruction produces a register instance which isconsumed by instruction.Formally stated,we measured,withand the age of the-th register operand of instruction .Notice that in these distributions only real(RAW)data dependencies were considered.From the experiments presented in section V,it becameclear that these distributions are not sufficient to accu-rately predict the attainable performance of out-of-order processors.Therefore,we decided to measure these dis-tributions conditionally on the types of the instructions before instruction in the trace,with ranging from (which corresponds to the distributions described above) to.The resulting distributions then are,e.g.for:,and,for. These distributions will be further denoted for arbitrary as,and,respectively.Since we also wanted to model memory dependen-cies,we identified the probability that a load is memory-dependent on the-th store before it in the trace.In other words,we measured the read-after-write distribu-tion,ST LD,with the-th memory operation before instruction in the trace.Note that most of the distributions mentioned here aretheoretically infinite.But for practical purposes,they can be truncated at a certain limit.This limit imposes a constraint on the window size of the out-of-order architec-tures being modeled.In this paper,we chosewhich allows the exploration of all contemporary out-of-order architectures.As stated in section III,the amount of storage requiredto store a statistical profile should be limited.The totalnumber of probabilities to be stored here is,which is exponential in,where specifies what conditional distributions,and are being-1.5%-1.0%-0.5%0.0%0.5%1.0%1.5%2.0%2.5%3.0%3.5%12345678910dependency distance de r r o r (i n p r o b a b i l i t y )Figure 2:The deviation between the marginal age-of-register-operands distribution of the original and the syn-thetic trace (for the li benchmark)as a function of the dependency distance and the number of instructions which are used in the conditional distributions.3.For each source operand,determine the instruction which produces this register instance using the age-of-register-operands distribution.In this step,syntac-tical correctness is guaranteed as follows:look for register dependencies until we get a register instance which is not created by a store or a branch opera-tion.If after a certain maximum number of trails,10,000in our case,still no valid dependency is found,instruction is made dependent on an instruction which comes at least instructions before it in the trace.For processors with an instruction windowsize less than,this means that the dependency is simply squashed.We made this design option in or-der not to eliminate too many register dependencies.4.If instruction is a load operation,use the read-after-write distribution to determine which store operation accesses the same memory address as instruction does.5.If instruction is a branch instruction,determine whether the branch and its target will be correctly pre-dicted using the branch statistics.6.If instruction is a load operation,determine whether the load will get its data from L1cache,L2cache or main memory using the D-cache statistics.7.Determine whether or not instruction will cause anI-cache miss at the L1or L2level.The method proposed here to enforce syntactical cor-rectness has some serious implications on the representa-tiveness of the synthetic trace.In Figure 2,the deviation is shown between the desired marginal age-of-register-operands distribution—measured from the real program trace—and the marginal age-of-register-operands distribu-tion of the generated synthetic trace as a function of which specifies the conditional distributions ,and used.For ,we observe that the distribu-tion resulting from synthetic trace generation (generally)li train.lsp0226go5092stone9.in 250200compress train.in (50K)0217gccgcc.i -O 0182m88ksim -c train.in 350200ijpeg penguin.ppm100170perl scrabbl.pl scrabbl.in 600200vortexvortex.in (persons.250)8002004Notethat this is only true for small values of dependency distances;for higher dependency distances,in our case more than ,the op-posite will be true since distributions sum to one.on a DEC500au station with an Alpha21164processor. The Alpha architecture is a load/store architecture and has 32integer and32floating-point registers,each of which is64bits wide.The SPECint95benchmarks have been compiled with the DEC cc compiler version5.6with the optimizationflag set to-O4.The traces were carefully selected not to include initialization code.B.Out-of-order architectureTo validate our performance modeling methodology,con-temporary out-of-order superscalar architectures were as-sumed.The architectures considered are organized as fol-lows:an instruction window of32,64and128instruc-tions;an issue width of4,8and16;2,4and8memory units;3,5and10non-memory units,respectively.The fetch bandwidth and the reorder bandwidth were chosen to be the same as the issue bandwidth.The latencies of the instruction types are:integer1cycle,load3cycles (this includes address calculation and L1D-cache access), multiply8cycles,FP operation4cycles,single and dou-ble precision FP divide18and31cycles,respectively.All operations are fully pipelined,except for the divide.A dynamic memory disambiguation strategy is as-sumed;re-execution is implemented to recover from mis-speculated loads,which re-executes the instructions which are dependent(directly or indirectly)on the misspeculated load.The branch predictor is a hybrid predictor with a4K meta predictor choosing between a4K bimodal predictor and an8-bit gshare that indexes into a4K predictor[6]. The branch target buffer contains512sets and has an as-sociativity of4.The return address stack contains four entries.The pipeline in front of the instruction window contains four stages:a fetch stage,a decode stage,a reg-ister renaming stage and a dispatch stage which inserts the instructions in the instruction window.The processor configurations simulated have a32K di-rect mapped L1I-cache and a64K2-way set-associative L1D-cache,both having a block size of32bytes.The L2 cache is a unified512K4-way set-associative cache with a block size of32bytes.A load which needs to access L2 cache or main memory takes11or81cycles,respectively.C.Performance prediction accuracyFigure3presents the relative error,synthetic trace real tracec c c Figure 3:Performance prediction accuracy:the relative IPC error between the IPC value of the synthetic trace andthe corresponding real trace.The first row:synthetic trace with perfect branch predictor and perfect caches;second row:real trace with probabilistic branch predictor and probabilistic caches;third row:synthetic trace with probabilistic branch predictor and probabilistic caches.Carl and Smith [2]proposed a hybrid approach,where a synthetic instruction trace was generated based on exe-cution statistics and fed into a trace-driven simulator,as is done here.The work presented here is a continuation of the work initiated by Carl and Smith [2]by suggesting several improvements:incorporating memory dependen-cies,using more detailed statistical profiles and guarantee-ing syntactical correctness of the synthetic traces.As a result,the performance predictions reported in this paper are far more accurate than those reported in [2].VII.CONCLUSIONIn this paper,it was shown how the performance predic-tion accuracy of statistical modeling can be increased by enhancing the statistical profiles and by guaranteeing syn-tactically correct synthetic traces.As a result,the rel-ative prediction errors reported here range from -8%to 14%over the SPECint95benchmarks for a 16-issue out-of-order processor with an instruction window of 128in-structions,which are far more accurate predictions than those reported by Carl and Smith in [2].REFERENCES[1] B.Black and J.P.Shen.Calibration of microprocessor performancemodels.IEEE Computer ,31(5):59–65,May 1998.[2]R.Carl and J.E.Smith.Modeling superscalar processors via statisti-cal simulation.In Workshop on Performance Analysis and its Impact on Design ,June 1998.[3]T.M.Conte,M.A.Hirsch,and K.N.Menezes.Reducing stateloss for effective trace sampling of superscalar processors.In Pro-ceedings of the 1996International Conference on Computer Design (ICCD-96),October 1996.[4]M.Franklin and G.S.Sohi.Register traffic analysis for streamlin-ing inter-operation communication in fine-grain parallel processors.In Proceedings of the 22nd Annual International Symposium on Mi-croarchitecture (MICRO-22),pages 236–245,December 1992.[5]J.L.Hennessy and puter Architecture:A Quan-titative Approach .Morgan Kaufmann Publishers,second edition,1996.[6]bining branch predictors.Technical Report WRLTN-36,Digital Western Research Laboratory,June 1993.[7] D.B.Noonburg and J.P.Chen.A framework for statistical mod-eling of superscalar processor performance.In Proceedings of the third International Symposium on High-Performance Computer Ar-chitecture (HPCA-3),pages 298–309,February 1997.[8]J.V oldman,B.Mandelbrot,L.W.Hoevel,J.Knight,and P.Rosen-feld.Fractal nature of software-cache interaction.IBM Journal of Research and Development ,27(2):164–170,March 1983.。
变频器输入电压变化,输出电压不变的原因概述说明1. 引言1.1 概述变频器作为一种重要的电力调节设备,在工业生产和日常生活中起着至关重要的作用。
它能够将固定频率和幅值的输入电压通过控制器调节,得到不同于输入电源的输出电压。
然而,在实际应用过程中,我们经常会遇到输入电压发生变化的情况。
有趣的是,尽管输入电压变化,但变频器仍然能够保持输出电压基本不变。
那么,这是由于什么原因呢?本文将对此进行深入探讨。
1.2 文章结构本文主要分为五个部分来阐述变频器输入电压变化导致输出电压不变的原因。
首先,在第二部分中,我们将从理论角度简要概述了变频器工作原理,并分析了输入电压变化对其运行产生的影响。
接下来,在第三部分中,我们将详细介绍使得输出电压保持稳定的设计方法与措施。
然后,在第四部分中,我们将引入实际应用中可能遇到的问题,并提供相应的解决方案。
最后,在第五部分中,我们总结了文章并展望了未来发展方向。
1.3 目的本文的目的是通过详细阐述变频器输入电压变化导致输出电压不变的原因,使读者对变频器工作原理和控制方法有更深入的理解。
同时,我们希望为变频器设计与应用工程实践提供一些启示,并对面临的挑战和未来发展方向进行展望。
此外,我们还将思考如何推广和创新地应用变频器技术在特定领域中。
通过这样的努力,我们可以更好地利用变频器技术来满足不同领域需求并促进社会经济发展。
2. 变频器输入电压变化对输出电压的影响分析:2.1 变频器工作原理简介变频器是一种用于调节交流电机转速的设备,通过将固定频率的电源电压转换成可调节频率和幅值的输出电压来控制电机运行。
它主要由整流器、滤波器、逆变器以及控制回路等组成。
2.2 输入电压变化对变频器运行的影响变频器输入电压的稳定性是保证其正常运行的关键因素之一。
输入电压的变化会直接影响到逆变器输出端的交流电压波形与幅值,从而导致输出端供给负载的有效功率发生改变。
当输入电压增加时,输出端供给负载的功率也会相应增加;反之,当输入电压降低时,输出功率也会减少。
Survey ofSwitching Techniques in High–Speed Networksand Their PerformanceYuji OIE,Tatsuya SUDA,Masayuki MURATA,Hideo MIYAHARA1.Department of Electrical Engineering,Sasebo College of Technology,Sasebo857–11,Japan.2.Department of Information and Computer Science,University of California,Irvine,CA92717,U.S.A.,(phone)714-856-5474or7403,(e-mail)suda@.3.Department of Information and Computer Sciences,Faculty of Engineering Science,Osaka Univer-sity,Toyonaka560,Japan.AbstractOne of the most promising approaches for high speed networks for integrated service applications is fast packet switching,or ATM(Asynchronous Transfer Mode).ATM can be characterized by very high speed transmission links and simple,hardwired protocols within a network.To match the transmission speed of the network links,and to minimize the overhead due to the processing of network protocols,the switching of cells is done in hardware switching fabrics in ATM networks.A number of designs have been proposed for implementing ATM switches.While many differences exist among the proposals,the vast majority of them is based on self-routing multi-stage interconnection networks. This is because of the desirable features of multi-stage interconnection networks such as self–routing capa-bility and suitability for VLSI implementation.Existing ATM switch architectures can be classified into two major classes:blocking switches,where blockings of cells may occur within a switch when more than one cell contends for the same internal link, and non-blocking switches,where no internal blocking occurs.A large number of techniques have also been proposed to improve the performance of blocking and nonblocking switches.In this paper,we present an ex-tensive survey of the existing proposals for ATM switch architectures,focusing on their performance issues. 1IntroductionIn recent years,the broadband ISDN(B–ISDN)has received increasing attention as a vehicle to accommo-date a wide variety of communication applications[2].B-ISDN is expected to support diverse applications ranging from low bit rate communications between terminals and host computers,to the broadcasting of high resolution videos.One of the most promising approaches for B–ISDN is the Asynchronous Transfer Mode(ATM).ATM can be characterized by very high speed transmission links and simple,hardwired protocols within a network. In ATM networks,information(such as voice,data,and video)is divided intofixed length data blocks,called cells,and these cells are asynchronously transmitted through the network.To match the transmission speed of the network links,and to minimize the overhead due to processing of network protocols,the switching of cells is done in the hardware switching fabrics in ATM networks,unlike the traditional packet switching approach,where switching is done by computers running communication processes.VLSI technology is used to implement the switching protocols in hardware rather than software.A large number of designs have been proposed for implementing ATM switches.While many differences exist among the proposed ATM switch architectures,most of them are based on multi-stage interconnection networks.This is due to the desirable features of multi-stage interconnection networks;features such as a self–routing capability and suitability for VLSI implementation.ATM switches can be grouped into two major classes:blocking switches and nonblocking switches.In blocking switches,internal blocking of cells occurs within a switch when more than one cell contends for the same internal link.In nonblocking switches,internal blocking will not happen.However,blocking still occurs on the outputs to the switch,when more than one cell is destined for the same output.Note that the blocking due to output contention may also occur in blocking switches.Examples of blocking switches in-clude the Banyan switch and the buffered Banyan switch[14,16].The crossbar switch[5]and the Batcher Banyan switch[11]are examples of nonblocking switches.A large number of techniques have also been proposed to improve the performance of blocking and non-blocking switches.Performance improvement is gained by speeding up switching fabrics,reducing cell loss due to output contention,and/or constructing a switch fabric of multiple switch planes.In this paper,we categorize and discuss the major switch architectures and their improvement techniques, focusing on the performance issues.The performance measures of our interests are the maximum throughput,the delay time,and the cell loss probability.This paper is organized as follows.In section2,we summerize the major assumptions and notations used in this paper.Blocking switches are discussed in section3.The Banyan switch and its improvements are discussed in subsections3.1and3.2,respectively.The buffered Banyan switch(i.e.,the Banyan switch with internal buffers)is discussed in subsection3.3.Section4sur-veys nonblocking switches(subsections4.1,4.2and4.3)and their improvement techniques(subsections4.4 through4.8).Other related research on nonblocking switches are summerized in subsection4.9.Subsection 4.10summerizes and compares the performance of a variety of nonblocking switches.Concluding remarks are given in section5.2Assumptions and NotationsBefore we start surveying various ATM switch architectures,we summerize assumptions and notations used in this paper.Throughout the paper,we assume switch fabrics of size(input ports and output ports).Input and output channels to a switch are of the same speed.Cells are assumed to be of a constant length,and the channel time is slotted with the slot size being equal to a cell transmission time.All the channels are assumed to be synchronized.Arrivals of cells at each of the input ports follow a Bernoulli ly,a cell arrives with probability in a slot,and there is no arrival with probability.Since we use the slot length as the unit of time,also corresponds to the input traffic load to the input channel.Uniform traffic refers to the situation where incoming cells are destined to output ports with a uniform probability of0 12 34 56 70 12 34 56 7101Input Ports OutputPortsFig.1Banyan switch with binary switching elementspaper.)This results in blocking of all cells that wish to use that link,except for one.Hence,the Banyan switch is referred to as a blocking switch.In Fig.2,a cell going to output port0and a cell going to output port1contend for the link between the second and third stages.As a result,only one of them(the cell from input port1in this example)can actually reach its destination(output port0).If each switching element has a buffer(input link buffer),blocked cells will be stored for later transmission;otherwise they will be dropped from the switch.It is this internal blocking which causes the degradation in the performance of the Banyan switch.Kruskal et al.investigated the effects of the internal blocking on the performance of the Banyan switch [17].They developed approximate analysis for the Banyan switch assuming that switching elements do not have buffers and that the traffic is uniform.They obtained the probability that cells successfully reaches their destination output ports.That probability is given by(1)where is the number of stages in the Banyan switch,is the size of a switching element,and is the input traffic load.Equation1turns out to be a very good approximation except when is large and is small.From this equation,we can easily see that decreases sharply as the number of stages increases.Therefore,the size of the Banyan switch impacts heavily on the performance of the switch.3.2Improvements to Banyan SwitchesIn the above we saw that the internal blocking can occur within the Banyan switch,when cells contend for a common link in the switch,and that it limits the throughput of the Banyan switch.(Throughput of a switch is the average number of cells coming out from an output port per slot.)One can improve the limited through-put of the Banyan switch by reducing or eliminating the internal blocking.Dilated switches and replicated switches reduce the internal blocking by providing multiple paths between any input and output port pairs.In the following we discuss these switch architectures.This subsection also briefly refers to another switch ar-chitecture,the Batcher Banyan switch,in which the internal blocking is completely eliminated.The Batcher0 12 34 56 70 12 34 56 7000 001conflictInput Ports OutputPortsFig.2Example of internal blocking in the Banyan switchBanyan switch has a sorting network which precedes the Banyan switch.A sorting network orders the in-coming cells according to the destinations in their ascending order,and feeds them into the Banyan switch, making the switch internally nonblocking.The Batcher Banyan switch is discussed again in detail in section 4.3.2.1Multipath Banyan SwitchesBy replacing each link which connects switching elements in the Banyan switch by(distinct)parallel links, we obtain the–dilated Banyan switch.Another way to provide multiple paths is to construct a switch from a multiple,say,of Banyan switch planes in parallel.This architecture is called the–replicated switch. These two switch configurations provide multiple paths between any input and output port pairs,and thus reduce the probability of the internal blocking.These multipath Banyan switches have been analyzed in[17,32],assuming that switching elements do not have any buffers.For the dilated Banyan switch,it was shown that the cell loss probability decreases when the dilation increases[17,32].Dilation of between4and8is shown to be sufficient enough to reduce the cell loss probability to a very small value.The–dilated Banyan switch becomes a nonblocking switch, when,where is the size of a switching element,and is the number of stages in the Banyan switch.The–replicated switch was shown to provide a similar performance to that of a–dilated switch [17].Adding a supplementary switch plane to the Banyan switch can also provide multiple paths between input and output port pairs.Anido et al.[1]considered a multipath switch constructed from two Banyan switches: one act as a routing network,while the other acts as a switching network.They discussed several path selec-tion algorithms and their performance.IBCsInput PortsOutputPortsFig.3buffered Banyan switch3.2.2Batcher Banyan SwitchesThe Banyan switch possesses an interesting characteristic.If all of the incoming cells are in ascending order relative to their output addresses,the switch is guaranteed to prevent cells from being internally blocked.To guarantee that the cells will be in ascending order requires only that there be some type of sorting network preceding the Banyan switch.In the Starlite switch[11],a Batcher network is used as the sorting network. This switch configuration is called the Batcher Banyan switch.The Batcher Banyan switch belongs to a major class of multi–stage interconnection networks,the class of nonblocking switches.This class of switches is discussed in detail in section4.3.3Banyan Switches with Internal Buffers(Buffered Banyan Switches)In the above we saw that the internal blocking limits the throughput of the Banyan switch,and that the multi-path and the Batcher Banyan switches improve the throughput by either reducing or eliminating the internal blocking.There exists another switch architecture,the buffered Banyan switch,which improves the through-put by storing contending cells in buffers at switching elements and retransmitting them at later time.3.3.1Buffered Banyan SwitchesIf each switching element within the Banyan switch has a buffer to store blocked cells,cell loss will be de-creased,and the throughput will be improved.Such a switch is called the buffered Banyan switch(Fig.3). In the buffered Banyan switch,each switching element is a switch with a buffer(input link buffer)on each of its input links.The buffered Banyan switch has input buffer controllers(IBCs)in front of thefirst stage.The IBC forwards the head cell in the input port buffer to the switch,when there is space in the input link buffer of the corresponding switching element at thefirst stage of the switch.Once cells are fed into the switch,they are controlled by“back pressure”.A cell advances to a switching element at the next stage,if there is space in the input link buffer associated with the destination switching element.If cells from different input link buffers of a switching element are going to the same output link, one cell is randomly chosen and advanced to the next stage,and the others remain in their input link buffers.When congestion happens and the input link buffers startfilling up,the back pressure mechanism propagates from the output ports towards the input ports in the buffered Banyan switch.Thus,cell loss does not occur within this switch;it can only occur at the inputs to the switch.(With an infinite input buffer size,no cell loss occurs.)Jenq[14]analyzed a single–buffered binary Banyan switch(),where each input link of a switching element has a buffer size equal to one cell.He assumed buffers of infinite capacity on the input ports to the switch,and showed that the maximum throughput is approximately0.45for large values of,an inadequate throughput value.One approach to improve this limited throughput is to use input link buffers of larger capacity.Kim et al.[16]analyzed a multiple–buffered binary Banyan switch constructed of binary switching elements with input link buffer space for more than one cell.They assumed that cells in an input link buffer are served on an FIFO basis.It was shown that the multiply buffered approach achieves a minor improvement in performance as compared to that of a single–buffered approach.This suggests that the performance bottleneck of the multiple–buffered switch lies,not in the limited buffer capacity,but in the contention among the head cells. This is referred to as head of the line blocking(HOL blocking)and describes the following phenomenon. When more than one cell contends for the same output link of a switching element(output contention),only one cell may pass through the switching element;the others are blocked and stored in the input link buffers. When the head of the line cell is blocked due to output contention,all the cells in the same input link buffer are blocked,if cells are served on an FIFO basis within the buffer.The HOL blocking will be discussed again in subsection4.5in more detail.In the two papers discussed above[14,16],the buffered Banyan switch is constructed of binary switching elements.In[3,32],it was shown that the buffered Banyan switch constructed of switching elements of size greater than two results in a higher throughput.It is interesting to note that,in case of nonblockingswitches,it has been shown that the maximum throughputdecreases,as the size of switching elements increases[15].For instance,a nonblocking switch constructed of switching elements achieves the maximum throughput of0.75,while the switch constructed of switching elements results in smaller throughput of0.68.The performance of the buffered Banyan switch under non–uniform traffic has been analyized.Wu[35] investigated the effects of existence of a point–to–pointconnection on the performance of the single–buffered Banyan switch through simulation.(Refer to section2for the definition of a point–to–point connection.) Wu showed that the maximum throughput of the buffered Banyan switch decreases significantly,when there exists a point–to–point connection in addition to uniform traffic.Therefore,the buffered Banyan switch does not favor non–uniform traffic.Kim et al.[16]showed that,if all the inputs to the switch are of a point–to–pointconnection type,single–buffered and multiple–buffered Banyan switches result in almost the same throughput.They also showed that,for the mixture of a point–to–pointconnection and uniform traffic,the multiple–buffered Banyan switch results in throughput improvement of10%to15%over the single–buffered Banyan switch,depending on traffic load from a point–to–point connection.This improvement,however,becomes negligible as the size of the switch becomes large.3.3.2Buffered Banyan Switches with Bypass QueueingIn the above subsection,we observed that the HOL blocking can happen on the input link buffers of the switching element in the buffered Banyan switch.The HOL blocking occurs when the HOL cell is blocked due to output contention,and if cells are served on an FIFO basis within the input link buffer.This observation leads to the following technique called“bypass queueing”in order to improve the performance of the buffered Banyan switch.Bubenik and Turner[3]proposed the bypass queueing discipline.In bypass queueing,when the HOLblocking occurs cells in that particular input buffer“bypass”the HOL cell and sequentially join competition for the available output links until some cell wins competition.This bypass queueing discipline is a variation of the window selection policy to be discussed in subsection4.5.Shiomoto et al.[29]provided an exact analysis for a switching element assuming bypass queue-ing and obtained its throughput–average delay performance.They showed that bypass queueing(on a switching element)attains a maximum throughput close to1.0.This is a significant increase from0.75,the throughput of a switching element with FIFO input link buffers.They also analyzed the Banyan switch comprised of binary switching elements assuming bypass queueing,and showed that the maximum through-put is achieved when the size of an input link buffer at a switching element is8.Bubenik et al.[3]studied the multiple–buffered Banyan switch of size constructed of binary switchingelements.They showed,through simulation,that bypass queueing achieves the maximum through-put of0.63,when is large.They also showed that the maximum throughput of the multiply buffered Banyan switch constructed of switching elements is nearly0.75,if a bypass queueing discipline is assumed at input link buffers.The buffered Banyan switches constructed of switching elements of larger size achieves higher throughput.4Nonblocking ATM SwitchesThe Banyan switches that we studied in the previous section belong to the class of blocking switches,where the internal blocking of cells may occur when they contend for a common link within a switch.In this section, we focus on another major class of multistage interconnection networks,the class of nonblocking switches, where the internal blocking does not occur.The Batcher Banyan switch discussed briefly in subsection3.2.2 is an example of the switches in this class.The crossbar switch is another example.In this section,we de-scribe possible architectures for nonblocking switches,investigate their performance,and discuss various techniques to improve that performance.4.1Architectures for Nonblocking ATM SwitchesThere are two major architectures to implement nonblocking switches:crossbar switch based architecture and Banyan switch based architecture.We will review each switch architecture in detail in this subsection.4.1.1Crossbar SwitchesThe crossbar switch has–inputs and–outputs with a fully connected topology;that is,there are cross points within the switch.Since it is always possible to setup a connection between any arbitrary input and output pair,internal blocking will not happen in the crossbar switch.Note that,although blocking will not occur inside the switch,it may still occur on the outputs to the switch,when more than one cell is destined for the same output.The architecture of the crossbar switch has some advantages.First,it uses a simple two–state cross–point switch(open and connected state)which is easy to implement.Also,the modularity of the switch design allows easy expansion.One can build a larger switch by simply adding more cross–point stly, this switch design provides for a low latency as compared to the Banyan type switches,because it has the smallest number of connecting points between an arbitrary input and output pair.One disadvantage to this design,however,is the fact that it uses the maximum number of crosspoints(cross-point switches)needed to implement an switch.The Knockout Switch is a nonblocking switch based on the crossbar design[8,36].It has inputs and outputs and consists of a crossbar type(crossbarlike)switch with a bus interface module at each outputInputPorts Output PortsBusInterfaces12N12NFig.4Knockout switch (Fig.4).Since each input has a direct connection to each output,cells will not interfere with one another;internal blocking does not occur.However,since more than one cell may go to the same output port,output contention may result.The purpose of the bus interface is to resolve this conflict.The bus interface has three components.They are:the cell filter,the concentrator,and the shared buffer (Fig.5).The functions of these components are described below.In the Knockout switch,each bus interface sees all cells from the input ports.The function of the cell filter is to separate cells destined for this particular output and those destined for other outputs in the switch.The cell filter does this by setting an activity bit in the cell’s header to logical one,if that cell is destined for this output.The activity bit is set to logical zero otherwise.At the beginning of each slot,all of the cell filters are initially open.That is,they will allow data bits from the input buses to pass into the concentrator.As the cell headers pass through these filters,there is a bit-by-bit comparison performed between the destination address (found in the cell header)and the particular output’s address.When the filter finds a cell destined for another output,it sets the activity bit of that cell’s header to logical zero,otherwise the activity bit is set to logical one.Thus,at the end of the slot,all cells destined for a particular output will be in the concentrator associated with that output.The name “concentrator”comes from the function it performs.Specifically,the concentrator provides for an to concentration,where is the number of separate buffers in the shared buffer.There are outputsfrom the concentrator into the shared buffer.If some number of cells,(),arrive for a particular output,then they will appear on the outputs 1to of the concentrator.When ()cells arrive,then cells will be dropped.The work of the concentrator is essentially analogous to a tournament ofplayers competing for prizes,where the prizes represent the output ports.Initially all of the cells are able to compete for the first ofthe concentrator outputs,where there will be one winner.All of the losers ()then compete for the second output.This process,all of the losers of the previous stage competing for the concentrator output in this stage,continues times.After the winner is selected in stage ,the remaining losers are dropped.If cells ()are destined for a particular output,there will be cells which have an activity bit set to zero emerging from the concentrator.Otherwise,all of the packets emerging from the concentrator are destined for this particular output.The shared buffer then receives these cells,but only stores those which have activity bit set to one circularly into buffers.In this way the concentrator has reduced the size of the buffer space from a multiple of down to a multiple of .This reduces the overall switch complexity,thus decreasing the cost of fabrication.ConcentratorShared Buffer 1L1N Cell FiltersOutputInputs22Fig.5Bus Interface of the Knockout switchThe advantages to this design include a low latency and no internal blocking.These are due to the use of a crossbar like switch which provides the smallest number of connection points between an arbitrary input and output ports.In addition,this switch is self–routing.The bus interface allows only those packets which are destined for the particular output to be transmitted on that output stly,this switch was designed with modularity in mind so that it will allow for modular growth without complex changes.4.1.2Batcher Banyan SwitchesIn the previous subsection,we studied nonblocking switches based on the crossbar design.In this section,we discuss nonblocking switches based on the Banyan switch design.In the Banyan switch,the internal blocking is possible.In the worst case,it is possible thatInput PortsOutputPortsBanyanNetworkBatcherNetworkFig.6Batcher Banyan switchCopy NetworkInput PortsOutputPortsExpanderNetworkSort-to-DestinationNetworkFig.7Starlite switchsorting network,called the Sort–to–Destination network,where they are arranged in ascending order accord-ing to their destination addresses.The output of the Sort–to–Destination network then goes into a Banyan switch,called the Expander,where the relative ordering of the cells is changed to an absolute ordering(and cells are delivered to the output ports).The Starlite switch can support multicast services,in addition to regular one–to–one communication.In order to yield multicasting capabilities,a copy network is placed between the inputs and the Batcher Banyan switch(Fig.7).Thefirst stage of the copy network is a sort–to–copy network,which takes as input both source and copy cells.The output from the sort–to–copy network is then those cells ordered by their source addresses.Note that copy cells and source cells from the same input will be adjacent to one another.The copy network then copies the informationfield from the source cells into its corresponding copy cells.These cells are then sent into the Batcher Banyan switch.Regardless of whether it is a crossbar based or a Batcher Banyan based switch,output contention still oc-curs in a nonblocking switch,when more than one cell is destined for the same output.When this happens, cells which lose contention are dropped from a switch,if the switch does not have any buffering discipline. In order to minimize the cell loss,buffering of cells is necessary.Buffers may be placed on the inputs to the switch,or on the outputs to the switch,or possibly on both.Queueing of cells may be implemented in a shared buffer common to all the input/output ports.In the following,wefirst study the performance of non-blocking switches without buffers.We then investigate various buffering schemes and techniques to improve the performance of nonblocking switches.Input BuffersInput PortsOutputPorts Fig.8Nonblocking switch with input buffers4.2Nonblocking Switches Without Input BuffersIn this subsection,we study the performance of the simplest nonblocking switch,a nonblocking switch with-out any buffers.In this case,when output contention happens,only one cell is successfully transferred to the destination output port,and the remaining cells are dropped from the switch.In[26],Patel analyzed an nonblocking switch(the cross–bar switch)in the context of intercon-necting multiprocessors,and obtained the cell loss probability for both of=finite and cases.It is assumed that the speed of a switch is equal to that of the input channels;a switch transfers at most one cell per slot from each of the input ports.Patel showed that,for the case of,the probability that a cell wins an output contention is given byMaximum Throughput 20.750040.655360.630280.6184Another possible approach to improve the performance of the nonblocking switch is to reduce or elimi-nate the HOL blocking.When the HOL cell is blocked due to output contention,a cell behind it going to an available output port can be sent instead.This reduces the HOL blocking and results in a better throughput performance.Subsection4.5explains the window selection discipline,where one of thefirst cells in an input buffer is selected and sent prior to the HOL e of a shared buffer also improves the throughput of a switch by eliminating the HOL blocking.In the shared buffer switch there is no buffer on the inputs,nor on the outputs.Arriving cells are immediately injected into the switch.When output contention happens, a winning cell goes through a switch,and the losers are stored in a shared buffer common to all of the in-put ports for later transmission.Since cells which lose output contention are stored in a shared buffer,the queue structure is not retained.Newly arriving cells immediately join in the competition for available out-puts.In addition,cells in the shared buffer also have access to the switch.Since more cells are available to select from,it is possible that less outputs will be idle in the shared buffer scheme.Thus,the throughput of the shared buffer switch is slightly better than that of the switch with the window selection discipline.In subsection4.6,the switch with a shared buffer and its performance are discussed.The third possible approach to improve the performance of the nonblocking switch is to optimally select one cell among the contending cells and transfer it to the output,instead of selecting a cell randomly.There are a number of possible selection policies.For instance,selecting a cell from the longest queue may improve the switch performance.In subsection4.7,we discuss the longest queue selection and the priority selection schemes.A fourth possible technique for improving performance of the nonblocking switch is to construct a switch consisting of multiple switch planes(a parallel switch).Parallel switches are discussed in subsection4.8. Other related research is summerized in subsection4.9.Subsection4.10summerizes and compares the per-formance of a variety of nonblocking switches.4.4Speedup of Nonblocking Switches With FIFO Input BuffersIn the preceding subsections,we assumed that,when output contention happens,only one of the contending cells is accepted at the output.This section considers the case where up to()contending cells are accepted at an output port per slot.(is the number of input ports.)is referred to as the speedup ratio of the switch in the following.This switching speed of cells may be realized by,for instance,the Batcher Banyan switch operating times faster than the input/output channels.The Knockout switch with an“to concentrator”at each output port has the same switching speed[36].The ATOM switch speeds up the cell switching speed by transmitting arriving bits in parallel through the switch[31].A number of papers have studied the effect of increasing the cell switching speed on the switch performance.Tab.2summerizes the mathematical models and the performance measures obtained in the past research.In what follows,we discuss the major results of the past research.Wefirst consider the ideal case where the switch can transfer the maximum of cells per slot to an output(i.e.,).In this case,buffering of cells is not necessary on inputs,because the switch operates fast enough to handle all the cells,even when the cells from all the input ports(of them)are going to the same output.Queueing of cells occurs only on the outputs(Fig.9).A switch only with output buffers are referred to as the output buffered switch[15,31].The performance of the output buffered switch has been investigated.The average delay is analyzed in [15]for infinite buffer case,and the cell loss probability at output buffers is obtained in[10]forfinite buffer case.In[15],Karol et al.assumed uniform traffic—an incoming cell goes to a particular output with the probability。
Journal of Hazardous Materials 254–255 (2013) 36–45Contents lists available at SciVerse ScienceDirectJournal of HazardousMaterialsj o u r n a l h o m e p a g e :w w w.e l s e v i e r.c o m /l o c a t e /j h a z m atFlocculation of both anionic and cationic dyes in aqueous solutions by the amphoteric grafting flocculant carboxymethyl chitosan-graft -polyacrylamide ଝZhen Yang,Hu Yang ∗,Ziwen Jiang,Tao Cai,Haijiang Li,Haibo Li,Aimin Li,Rongshi ChengState Key Laboratory of Pollution Control and Resource Reuse,School of the Environment,School of Chemistry &Chemical Engineering,Nanjing University,Nanjing 210093,PR Chinah i g h l i g h t s•A series of amphoteric grafting floc-culants (CMC-g -PAM)was prepared.•Flocculation of both anionic andcationic dyes by CMC-g -PAM was studied.•The flocculation mechanism of CMC-g -PAM was discussed in detail.•The grafted PAM contributed much to the bridging and sweeping floccula-tion effects.g r a p h i c a la b s t r a c tA series of the amphoteric grafting flocculants,carboxymethyl chitosan-graft-polyacrylamide,has been prepared and employed for removal of both anionic and cationic dyes from aqueoussolutions.a r t i c l ei n f oArticle history:Received 31July 2012Received in revised form 15January 2013Accepted 22March 2013Available online 27 March 2013Keywords:Amphoteric grafting chitosan-based flocculants Dye removalFlocculation performance Flocculation mechanism Fractal dimensiona b s t r a c tIn the current work,a series of amphoteric grafting chitosan-based flocculants (carboxymethyl chitosan-graft -polyacrylamide,denoted as CMC-g -PAM)was designed and prepared successfully.The flocculants were applied to eliminate various dyes from aqueous solutions.Among different graft copolymers,CMC-g -PAM11with a PAM grafting ratio of 74%demonstrated the most efficient performance for removal of both the anionic dye (Methyl Orange,MO)and the cationic dye (Basic Bright Yellow,7GL)under the corresponding favored conditions (80mg/L of the flocculant at pH 4.0,and 160mg/L at pH 11.0).In comparison with its precursors,chitosan and carboxymethyl chitosan,CMC-g -PAM11showed higher removal efficiencies and wider flocculation windows.More importantly,the graft copolymer produced notably more compacted flocs based on image analysis in combination with fractal theory,which was of great significance in practical water treatment.Furthermore,the flocculation mechanism was discussed in detail.The grafted polyacrylamide chains were found to contribute much to the improved bridging and sweeping flocculation effects,but reduced charge neutralization flocculation for the effect of charge screening.© 2013 Elsevier B.V. All rights reserved.ଝSupported by the Natural Science Foundation of China (Grant Nos.51073077,50938004and 50825802),the Open Fund from State Key Laboratory of Pollution Control and Resource Reuse of Nanjing University (Grant No.PCRRF11004),and the Scientific Research Foundation of Graduate School of Nanjing University (Grant No.2012CL06).∗Corresponding author.Tel.:+862583686350;fax:+862589680377.E-mail address:yanghu@ (H.Yang).1.IntroductionDue to the development of modern society,textile industry has become one of the major polluters of water [1].As the main pollutants in textile wastewaters,dyes are not only esthetically unwelcome,but also engender annoyance to the aquatic biosphere [2,3].Since the accumulation of dyes hinders sunlight penetra-tion and depletes the dissolved oxygen,the ecosystem of receiving0304-3894/$–see front matter © 2013 Elsevier B.V. All rights reserved./10.1016/j.jhazmat.2013.03.053Z.Yang et al./Journal of Hazardous Materials254–255 (2013) 36–4537water has been largely disturbed[1,4].In order to deal with this problem,environmental regulations in many countries have made it mandatory to eliminate dyes before discharging textile wastew-aters[5,6].In the past decades,various techniques,includingfloccula-tion,adsorption,filtration,oxidation,and electrolysis,have been employed for removal of dyes from wastewater[7–9].However, no single economically and technically viable method is available until now.To achieve adequate level of color removal,two or more methods are usually combined[1].Therein,flocculation is regarded as one of the most successful primary purification techniques to produce better feed quality for further treatment owning to its low cost and high efficiency[1,2,10,11].Recently,extensive attention has been given to natural poly-mericflocculants for their eco-friendliness,biodegradability and free of secondary pollution risk,in comparison with traditional metal-based and synthetic polymeric ones[2,5,12–14].Among natural polymer-basedflocculants,chitosan,obtained from the deacetylation of chitin(the second-most-abundant natural poly-mer in the world),has attracted great interest and been widely applied in manyfields[1,15,16].Chitosan is a cationic polyelec-trolyte because of the presence of primary amino groups,which is suitable for eliminating anionic dyes from the perspective of the charge attraction[15].However,textile wastewater is normally complicated[1],which may also contain cationic dyes in addition to anionic ones.Hence,chitosan with only cationic groups is usually incapable for removal of dyes with various charge types.As a con-sequence,amphotericflocculants[17,18],containing both cationic and anionic ions,are needed to eliminate both anionic and cationic dyes.Among amphoteric chitosan-basedflocculants,the simplest one is carboxyalkyl chitosan(CMC)[19].However,the applicability of CMC is always limited due to relatively lower molecular-weight and shorter storage life.Graft polymerization is an efficient method to overcome these drawbacks.It is well known that polyac-rylamide(PAM)is efficient in water purification,[20]though it meanwhile exhibits shortcomings in toxicity of residual monomer [2]and non-biodegradability.Besides,the high molecular-weight PAM can produce denseflocs,which are usually helpful for easy sedimentation in water treatment.Therefore,grafting PAM onto CMC is promising,and the tailor-madeflocculants with controlled molecular-weight and moderate biodegradability are expected to combine the advantages of both natural and synthetic polymers [21–23].Nevertheless,research on removal of both anionic and cationic dyes using amphoteric grafting chitosan-basedflocculants is quite limited until now.Besides the color removal efficiency,thefloc properties are also crucial inflocculation process.Generally,compactflocs are pre-ferred to yield lower sludge volumes and reduce the total cost[24]. Since the irregular and porousflocs have been found to be geomet-rically fractal[25],the fractal theory becomes a useful approach for evaluating thefloc properties.Based on the fractal concept,the most powerful parameter is the fractal dimension,which presents the compactness of theflrger fractal dimensions indicate denserflocs,which are desired in theflocculation process.Frac-tal dimensions can be presented by various forms according to the space dimension.One of the commonly used fractal dimensions is two-dimensional one(D2)[26],and it is defined by the power law relationship between the projected area(A)and the characteristic length(l),which can be measured by image analysis(IA).Taking into account of all factors as mentioned above,a series of amphoteric graftingflocculants,carboxymethyl chitosan-graft-polyacrylamide(CMC-g-PAM),with different grafting ratios,was designed and synthesized in the current study.The structure of obtained copolymers was characterized by Fourier transform infrared(FTIR),1H nuclear magnetic resonance(1H NMR)spectra,X-ray diffraction(XRD),scanning electron microscope(SEM) and zeta potential(ZP)measurements in detail.Given that the amphotericflocculants have both cationic and anionic groups, they were employed forflocculation of an anionic dye(Methyl Orange,denoted as MO)and a cationic dye(Basic Bright Yellow 7GL,denoted as7GL)from aqueous solutions for thefirst time. The effects of pH,flocculant dosage,grafting ratio and initial dye concentration on the decolorization efficiency were systematically studied.Besides,the properties offlocs were especially investi-gated by fractal theory.Moreover,theflocculation mechanisms were also discussed in detail based on theflocculation performance andfloc structure.2.Experimental2.1.MaterialsChitosan with85.2%degree of deacetylation was purchased from Shangdong Aokang Biological Co.,Ltd.Its viscosity-average molecular-weight was8.34×105g/mol,calculated from the intrin-sic viscosity[27].Monochloroacetic acid(Zibo Lushuo Economic Trade Co.,Ltd.),acrylamide(Nanjing Chemical Reagent Co.,Ltd.), ceric ammonium nitrate(CAN)(Sinopharm Chemical Reagent Co., Ltd.),MO(Sinopharm Chemical Reagent Co.,Ltd.)and7GL(Rugao Xingwu Chemical Co.,Ltd)were used without further purification. All other chemicals were purchased from Nanjing Chemical reagent Co.,Ltd.Distilled water was used in all experiments.2.2.Preparation of CMC-g-PAMCMC was synthesized according to the method reported in authors’previous work[18,23].Its substitution degree of car-boxymethyl groups is48.3%,calculated from1H NMR spectrum.A desired amount of solid CMC was dissolved in400mL of1wt.% HCl solution.After30min of stirring under N2atmosphere,a certain amount of CAN was added as the initiator.The solution was kept for5min of pretreatment by the initiator to suppress formation of PAM homopolymer[28],before dropwise addition of the acryla-mide monomer aqueous solution.Then the reaction was carried out for3h under N2atmosphere,and thefinal sample was precipitated out in acetone.The obtained solid product wasfiltered,washed by ethanol,extracted using acetone as the solvent in a Soxhlet apparatus for72h for removal of impurities,[28–30]andfinally vac-uum dried at room temperature.The product of CMC-g-PAM was obtained at last.Furthermore,three different CMC-g-PAM samples with various feed ratios between CMC and acrylamide(1:1,1:3and 1:5)were designed and prepared in this work,which were named CMC-g-PAM11,CMC-g-PAM13and CMC-g-PAM15,respectively.2.3.CharacterizationFTIR spectra were recorded on a Bruker Model IFS66/S FTIR spectrometer.The interval of measured wave numbers was 650–4000cm−1.1H NMR spectra were carried out using a Bruker AVANCE Model DRX-500spectrometer,operating at500MHz,in a mixed solvent composed of CF3COOD and D2O with a mass ratio of1:1.The wide-angle XRD patterns were recorded on a Shimadzu Model XRD-6000X-ray diffractometer,operating at a volt-age of40kV and a current of30mA using Cu K␣radiation ( =0.15418nm).The surface morphologies were observed by a Shimadzu Model SSX-550SEM under an acceleration voltage of10.0kV.38Z.Yang et al./Journal of Hazardous Materials 254–255 (2013) 36–45Zeta potential was measured using a Malvern Model Nano-Z Zetasizer.2.4.Flocculation experiment2.4.1.Flocculation procedureThe MO and 7GL aqueous solutions with respective pre-determined concentration were employed as synthetic wastewa-ter.The pH levels were adjusted by dilute HCl or NaOH aqueous solution.Jar tests were conducted using 250-mL jars at room tem-perature.On the basis of previous work [15],the procedure was designed as follows:After addition of the prescribed amount of flocculants into 100mL synthetic wastewater,the mixture was strongly stirred at a speed of 200rpm for 3min at the beginning,then at 50rpm for 10min,and finally left to settle for rge amount of flocs could be observed during the experiments.The small flocs continued to aggregate and became larger one,finally settled down.Preliminary tests demonstrated that the flocculation equilibrium was achieved with nearly constant residual dye con-centration after sedimentation for 2h.Thus,2h was selected as settling time in this work.2.4.2.Residual dye concentration measurementAfter sedimentation for 2h,sample solutions were collected using a syringe and filtrated by a Millipore filter with 0.45m pore-diameter.The amount of dye adsorbed by the filter was negligible (below 2%).The filtrates were then adjusted to pH at 7.0using dilute HCl or NaOH aqueous solution,and the residual dye concentration was determined by a UNIC Model UV2100UV-visible spectropho-tometry.A calibration curve for each dye was measured in advance.The absorbances of MO and 7GL were detected at the wavelengths of 463and 415nm,respectively.The color removal efficiency was calculated as follows:Color removal (%)=1−C C 0×100%(1)where C 0and C were the dye concentrations in the solution before and after flocculation,respectively.2.4.3.Floc properties measurementFloc properties were measured by IA.After sedimentation,the flocs were carefully taken from the jars and transferred onto a glass dish.A Pentax Model K-m digital camera with a 200-mm lens was used to take photos.A and l were derived from the photos using image analysis software.In this work,l was defined as the largest350030002500200015001000-COO-Wave numb er (cm -1)Fig.1.FTIR spectra of various samples:chitosan (a),CMC (b),CMC-g -PAM11(c),CMC-g -PAM13(d)and CMC-g -PAM15(e).projection length,and applied to evaluate the floc size.D 2was then obtained from the power law relationship.[23,31]A ∝l D 2(2)The linear fitting method was adopted to obtain the slope D 2of the log-log plot of A and l for better comparison,since it was more widely used than non-linear fitting reported in previous literatures [23,31–34].3.Results and discussion3.1.Characterization of the flocculantsA series of amphoteric grafting flocculants,CMC-g -PAM,with different grafting ratios,was prepared,and the detailed prepara-tion method has been described in the experimental part.The FTIR spectra of chitosan,CMC and various CMC-g -PAM samples are illus-trated in Fig.1.The bands of 1025(C-OH),1151(bridge -O-),1585(N H),3286cm −1(O H)in Fig.1(a)are the characteristic FTIR peaks of chitosan.A new peak at 1734cm −1in Fig.1(b)indi-cates that carboxymethyl groups [19]are successfully introduced to chitosan backbone.Besides the aforementioned peaks,the peaks around 1651,1601and 1447cm −1as shown in Figs.1(c–e)are attributed to amide-I (C O)on the PAM chains,the overlap of COO −on the CMC backbone and the amide-II (N H)on the PAM chains,and the C N stretching in the graft copolymers,H3-6Chemical shift (pp m)Fig.2.1H NMR spectra of various samples:chitosan (a),CMC (b),CMC-g -PAM11(c),CMC-g -PAM13(d)and CMC-g -PAM15(e).Z.Yang et al./Journal of Hazardous Materials 254–255 (2013) 36–4539respectively.[23,35]As the homopolymer of PAM has been already removed during the preparation process as described in the exper-imental part,the detected PAM chain should be chemically grafted onto the CMC backbone.In addition,the 1H NMR spectra as shown in Fig.2provide further structure evidences of the amphoteric grafting flocculants.Aside from the signal of H1(4.66ppm),H2proton (2.95ppm)and H3-6protons (3.25–3.80ppm)in chitosan appearing in Fig.2(a),the resonance at 3.93ppm in Fig.2(b)is assigned to the protons on the carboxymethyl groups [19].The new intense peaks at around 1.50and 2.10ppm in Fig.2(c–e)are the resonances of methane and methylene protons on the PAM chains [23],respectively.According to the 1H NMR spectra of Fig.2(c–e),the grafting ratios (G )of CMC-g -PAM11,CMC-g -PAM13and CMC-g -PAM15are also calculated,which are 74%,180%and 212%,respectively.G is defined as [36]G (%)=WPAMW CMC×100%(3)102030405060(e)(d)(c )(b)(a )2 (o)Fig.3.XRD patterns of various samples:chitosan (a),CMC (b),CMC-g -PAM11(c),CMC-g -PAM13(d)and CMC-g -PAM15(e).Fig.4.SEM images of various samples:chitosan (a),CMC (b),CMC-g -PAM11(c),CMC-g -PAM13(d)and CMC-g -PAM15(e).40Z.Yang et al./Journal of Hazardous Materials 254–255 (2013) 36–45Z e t a p o t e n t i a l (m V )pHFig.5.Zeta–pH profiles of chitosan ( ),CMC ( ),CMC-g -PAM11(᭹),CMC-g -PAM13( )and CMC-g -PAM15( ).where W PAM and W CMC are the weights of the PAM branches and the CMC backbone,respectively.Moreover,the XRD patterns of various samples are also exhib-ited in Fig.3to investigate the structure of the final products further.It is demonstrated in Fig.3(a)that chitosan consists of a major reflection around 2Â=20◦which is the representative of the crys-tal form II [37].In Fig.3(b),CMC has a relatively weaker band at 2Â=20◦for the decrease of the ordered structure,which is in agree-ment with previous work [38].Additionally,in Fig.3(c–e),with the increase of G ,the peaks become broader but weaker accordingly,indicating that the structure of the amphoteric grafting products become amorphous.Therefore,it can be concluded from the XRD patterns that the ordered structure of chitosan has been destroyed by graft chain notably.[37]To give direct observation of the amorphous features of the final products,SEM images have been taken and showed in Fig.4.Apparent differences can be observed among chitosan,CMC and various CMC-g -PAM samples.Chitosan has relative regular struc-ture in Fig.4(a),which is weakened in CMC as shown in Fig.4(b)due to the introduction of carboxymethyl groups.After PAM has been grafted onto CMC,the morphologies of CMC-g -PAM samples in Fig.4(c–e)change more drastically:more porous and looser structures are found.It should be noted that,among the three CMC-g -PAM samples,CMC-g -PAM11exhibits the most porous structure,which may be due to the fact that the graft copolymers with suitable grafting ratio are beneficial to form loose condensed structure.The porous morphologies of samples are expected to wield positive influence on flocculation efficiency,since more pores are favored for adsorbing and flocculating pollutants in water.It is well known that the charge properties of flocculants exerts great influence on the flocculation efficiency.[23,39]Therefore,the pH dependence of ZP of various sample solutions are also mea-sured and shown in Fig.5.Chitosan shows significantly positive ZP before pH 8.0due to its characteristics of cationic polyelectrolytes,whereas the ZP near zero is depicted after pH 8.0.In contrast,the amphoteric CMC and CMC-g -PAM samples have similar isoelec-tric points (pI)around pH 5.5.Before the pI,they are positively charged because of the protonation of amino groups on the CMC backbone,which would be advantageous for flocculating anionic dyes;whereas negative surface charges occur due to the deprotona-tion of carboxymethyl groups after the pI,which will be favored for removal of cationic dyes.An interesting phenomenon is observed that the absolute values of ZP become closer to zero,as the G of the grafted copolymers turns larger.This is mainly ascribed to the fact that the non-ionic PAM branch chains screen the charges on the CMC backbone [40].Fig.6.Flocculation of MO (a)and 7GL (b)solutions (dyes concentration is 100mg/L)with various dosages of CMC-g -PAM11at different pH.The inset figures are the pH dependence of the optimal dosages and their corresponding color removal effi-ciency.3.2.Flocculation performance of various flocculants for dye removal3.2.1.Effects of pH and flocculants dosageAmong numbers of external parameters which can affect the flocculation efficiency,pH and dosages are the two most important ones,especially for ionic flocculants.[12,41]Thus,the effects of pH and flocculants dosage on the color removal efficiency were stud-ied firstly.As mentioned above,CMC-g -PAM11has the most pores among the graft copolymers based on SEM observation,which is expected to benefit the flocculation.Therefore,it is selected to floc-culate MO and 7GL at various pH levels,and the dependences of color removal on flocculant dosage at various pH are illustrated in Fig.6.Considering the ionic type of the two dyes and the ZP of the flocculant from the perspective of charge attraction,acid con-dition is chosen for MO while alkaline condition is for 7GL.Since MO is insoluble below pH 3.0,its test pH range starts from pH 4.0.It is found from Fig.6that,the color removal efficiencies at various pH conditions all increase with increasing dosage before reaching maximum values at corresponding optimal ones,then slightly decrease.This phenomenon indicates that charge neutral-ization [5,42]plays an important role in the flocculation process.The flocculant neutralizes the net charges of dyes,and some kinds of insoluble complexes are formed which will further aggregate together and settle down.Lower dosage of the flocculant is not enough for neutralization whereas excessive flocculant engenders restabilization of the flocs due to the electrostatic repulsion among primary flocs.However,in comparison with some of other floc-culants [43,44],the restabilization effect of CMC-g -PAM is notZ.Yang et al./Journal of Hazardous Materials 254–255 (2013) 36–4541Fig.7.Flocculation of MO solution at pH 4.0(a)and 7GL solution at pH 11.0(b)with different initial dye concentration using various dosages of CMC-g -PAM11.The inset figures are the dependence of the optimal dosages and their corresponding color removal efficiency on the initial dye concentration.so remarkable.The improvement of the flocculation window is ascribed to the effect of charge screening by PAM chains.In addition,the amphoteric CMC-g -PAM11displays a consider-ably high efficiency to eliminate both anionic and cationic dyes at the corresponding optimal conditions.For removal of MO as shown in the inset figure of Fig.6(a),the optimal condition is the dosage of 80mg/L at pH 4.0.Taking the charge neutralization effect into account,MO is an anionic dye that has good affinity with cationic matter,and thus the lowest pH in the test range is welcome due to the most positive ZP of the flocculant.Nevertheless,the optimal pH for removal of 7GL is not pH 13.0(the highest pH in the test range).On the contrary,160mg/L of flocculants at pH 11.0shows the best removal efficiency for 7GL in the inset figure of Fig.6(b).This can be explained as follows:Besides the ZP level of the floccu-lant,the ionization degree of the dye also plays an important role in dye removal [45,46].Higher ionization degree is beneficial for enhancing the flocculation efficiency.At excessively high pH,large numbers of negative charged hydroxyl groups (OH −)surround the cationic groups of 7GL molecules tightly through charge attraction,and prevent the ionization effectively.Therefore,the dye removal efficiency is reduced with decreasing ionization degree.3.2.2.Effect of the initial dyes concentrationIn actual water treatment,the initial dye concentration of wastewater is variable,which meanwhile influences the floccu-lation performance of flocculants.Consequently,the effect of the initial dye concentration was investigated at the corresponding optimal pH,and the results are represented in Fig.7.As shown in Fig.7,higher optimal dosage of the flocculant is required with increasing initial dye concentration for both MO and 7GL.This is fully consistent with previous works [47,48].Data analysis in the insert figures of Fig.7reveals that,in the test con-centration range,they both show linear relationships between the optimal dosage and initial dye concentration.The stoichiometric linear relationship is another evidence of the charge neutraliza-tion mechanism [18,47,48],which can be applied to determine the optimal dosage according to actual dye concentration directly.Besides,from the insert figures of Fig.7,it is also found that,at the corresponding optimal dosages,the color removal efficien-cies rise until the initial concentration of both dyes increases to approximate 100mg/L,and then gradually reach plateau.It can be explained as follows:on the one hand,greater numbers of insoluble flocculant/dye complexes can be formed more easily through charge neutralization at higher dye concentration,which will aggregate further and form larger flocs through the bridging effect [42];on the other hand,larger amount of big size flocs can further capture small flocs and residual dyes in water through the sweeping effects [9],which will be discussed detailedly in the fol-lowing section.3.2.3.Effect of the grafting ratio of the flocculantsIn addition to the external factors discussed above,the struc-ture factors (inner factor)[5,49]of flocculants,such as the grafting ratio of the graft copolymers,exert much stronger influence on the flocculation performance.Therefore,the flocculation performance of different CMC-g -PAM samples with various G ,as well as their precursors (chitosan and CMC),are brought into comparisonandFig.8.Flocculation of MO solution at pH 4.0(a)and 7GL solution at pH 11.0(b)with various dosages of different flocculants.42Z.Yang et al./Journal of Hazardous Materials 254–255 (2013) 36–45Table 1Flocculation performance of various flocculants.DyesFlocculantsChitosanCMCCMC-g -PAM11CMC-g -PAM13CMC-g -PAM15MODosage opt (mg/L)a 1806080200220Color removal (%)b 90.7888.0792.8686.9886.12D 2b 1.61±0.02 1.64±0.02 1.94±0.02 1.93±0.02 1.89±0.02R 20.8260.8850.9280.9280.917Average l (m)b211.4306.2360.9354.0353.47GLDosage opt (mg/L)a 300120160160180Color removal (%)b 8.7091.7594.9789.8089.10D 2b – 1.59±0.02 1.87±0.01 1.85±0.01 1.83±0.01R 2–0.8840.9630.9640.961Average l (m)b–252.7557.1530.5509.1a Optimal dosage at which the highest color removal occurs.bThe corresponding color removal at the optimal dosage.shown in Fig.8.From Fig.8,the optimal dosages and the highest color removal efficiencies of different flocculants are summarized in Table 1.Besides,the flocs properties are measured through image analysis.The log-log plots of projected area and the characteristic length of various samples are provided in Fig.9.According to Fig.9,the two-dimensional fractal dimensions of flocs are also listed in Table 1.Chitosan exhibits a high efficiency for removal of MO due to its characteristics of cationic polyelectrolytes.However,it shows verylow removal efficiency for 7GL,and no observable floc is formed.This implies that the cationic polyelectrolyte is unable to flocculate the cationic matters.In contrast,all the amphoteric flocculants manifest consider-able flocculation capacities for both anionic and cationic dyes,due to the presence of both cationic and anionic groups on the flocculants.Among the amphoteric flocculants,though CMC has the lowest optimal dosages,its corresponding color removal effi-ciencies are lower than those of CMC-g -PAM11.Additionally,Log lL o g A(a)Log lL o g ALog lL o g A(f)Log lL o g AL o g ALog l(g)(c)Log lL o g ALog lL o g A(h)(d)Log lL o g ALog lL o g A(i)(e)Fig.9.The log-log plots of A and l of various flocs (flocculation of MO (a-e)and 7GL (f-i)with different flocculants at the corresponding optimal conditions).44Z.Yang et al./Journal of Hazardous Materials254–255 (2013) 36–45 itsflocculation windows are smaller than those of CMC-g-PAM11due to the stronger electrostatic repulsion among theCMC/dye complexes.More significantly,after grafting PAM chains,CMC-g-PAM11produces much denserflocs(higher D2)withlarger particle sizes than CMC.It is of remarkably importancefrom the perspective of purification efficiency for rapid separa-tion.Furthermore,it should be considered that the costs of disposaland handling the enormous quantities offlocs usually account for agiant part of the overall costs in water treatment plants.In addition,the limited land available for the disposal of sludge is a strikingworry,besides the cost of chemicals themselves.[24]Since morecompactedflocs are much easier to handle and take up less space,it is reasonable to deduce that the graft copolymer can be morehelpful for reducing the costs.The formation of the much denser flocs is attributed to the PAM graft chains.As theflexible PAM chains on the comb-like copolymer enhance the approachability to dye molecules and small particles,largerflocs can aggregate with other flocs and seize residual dyes together through bridging[42]and sweeping[9]effects,respectively.Thus,more compactedflocs are produced.However,this does not mean that unrestrained PAM on the flocculants(larger G)is expected.As listed in Table1,compared with CMC-g-PAM11,excessive PAM in CMC-g-PAM13and CMC-g-PAM15results in increasing theflocculant dosage,decreasing the color removal efficiency,and worsening theflocs properties.In fact, despite that PAM chains are beneficial for bridging and sweep-ing effects,too many PAM would screen the charges on the CMC backbone according to the ZP results as shown in Fig.5and make disadvantageous influence on charge neutralization effect.There-fore,the grafting ratio of thefinal products should be controlled ina suitable range.3.3.Flocculation mechanismBesides theflocculation performance as investigated above,flocculation mechanism is also notably important for more sci-entific and effective supervision in the design and application of theflocculants.FTIR is commonly applied to study the interaction between theflocculants and dyes to investigate theflocculation mechanism further.[3]Consequently,FTIR of theflocs are illus-trated in Fig.10.In Fig.10(a),bands assigned to sulfonic groups at1033,1110 and1192cm−1in the spectrum of MO shift to1026,1113and 1160cm−1in that offlocs,respectively.On the other hand,the band corresponding to the primary amine groups of CMC-g-PAM11 at906cm−1shift to896cm−1.These changes all indicate that the anionic sulfonic groups in MO react with primary amine groups on the backbone of chitosan-basedflocculant due to charge neutraliza-tion[50,51].Similarly in Fig.10(b),since the characteristic peaks of 7GL at758,1448and1490cm−1shift to746,1456and1495cm−1, respectively,it can be deduced that electrostatic interaction also takes place between theflocculant and7GL.Taking into account the effects of various factors on the flocculation performance and characterization offlocs by FTIR, theflocculation mechanism in this work can be explained and described in Scheme1:(i)charge neutralization predominates at the beginning of theflocculation process,and produces numbers of insolubleflocculant/dye complexes with a rapid speed;(ii)then, through bridging effect due to theflexible PAM graft chains,[22,23] the insoluble complexes aggregate and form larger net-likeflocs; (iii)with the help of enhanced approachability of PAM chains,the largerflocs with net-like structure can further seize residual dyes from water through sweeping effect[9].Finally,the compacted flocs are formed and settle down.350030002500200015001000Wave number (cm-1)Wave number (cm-1)(a)3500300025002000150010001495(b)Fig.10.FTIR spectra of(i)CMC-g-PAM11,(ii)dyes and(iii)flocs produced in the flocculation of MO solution at pH4.0(a)and7GL solution at pH11.0(b)with the corresponding optimalflocculant dosages.4.ConclusionThe tailored amphoteric graftingflocculants CMC-g-PAM were prepared successfully,and proved to be efficient in eliminating both anionic and cationic dyes from water under the corresponding opti-mal conditions.To obtain the better removal efficiency,the PAM grafting ratio of the copolymers should be controlled in a suitable pared with chitosan and CMC,CMC-g-PAM11exhibits higher removal efficiencies,widerflocculation windows and con-siderably denserflocs,which is regarded as a promising material in real application from the perspective of both performance and cost. On the basis offlocculation performance andflocs structure,the flocculation mechanism of CMC-g-PAM for dye removal is proposed in detail.Charge neutralization predominates at the beginning of theflocculation process,and the enhanced approachability of the flexible PAM graft chains contributes much to further bridging and sweeping effects.References[1]A.K.Verma,R.R.Dash,P.Bhunia,A review on chemical coagulation/flocculationtechnologies for removal of colour from textile wastewaters,J.Environ.Man-age.93(2012)154–168.[2]A.Y.Zahrim,C.Tizaoui,N.Hilal,Coagulation with polymers for nanofiltrationpre-treatment of highly concentrated dyes:a review,Desalination266(2011) 1–16.。
A Performance Comparison of Contemporary DRAM ArchitecturesVinodh Cuppu, Bruce Jacob Dept. of Electrical & Computer Engineering University of Maryland, College Park {ramvinod,blj}@ABSTRACT In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, each evaluated in a small system organization. These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips. The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary factor limiting memory-system performance; (c) the post-L2 address stream still contains significant locality, though it varies from application to application; and (d) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality to decrease access time. 1 INTRODUCTIONBrian Davis, Trevor Mudge Dept. of Electrical Engineering & Computer Science University of Michigan, Ann Arbor {btdavis,tnm}@• Where is time spent in the primary memory system (the memory system beyond the cache hierarchy, but not including secondary [disk] or tertiary [backup] storage)? What is the performance benefit of exploiting the page mode of contemporary DRAMs? For the newer DRAM designs, the time to extract the required data from the sense amps/row caches for transmission on the memory bus is the largest component in the average access time, though page mode allows this to be overlapped with column access and the time to transmit the data over the memory bus. • How much locality is there in the address stream that reaches the primary memory system? The stream of addresses that miss the L2 cache contains a significant amount of locality, as measured by the hit-rates in the DRAM row buffers. The hit rates for the applications studied range 8–95%, with a mean hit rate of 40% for a 1MB L2 cache. (This does not include hits to the row buffers when making multiple DRAM requests to read one cache-line.) We also make several observations. First, there is a one-time tradeoff between cost, bandwidth, and latency: to a point, latency can be decreased by ganging together multiple DRAMs into a wide structure. This trades dollars for bandwidth that reduces latency because a request size is typically much larger than the DRAM transfer width. Page mode and interleaving are similar optimizations that work because a request size is typically larger than the bus width. However, the latency benefits are limited by bus and DRAM speeds: to get further improvements, one must run the DRAM core and bus at faster speeds. Current memory busses are adequate for small systems but are likely inadequate for large ones. Embedded DRAM [5, 19, 37] is not a near-term solution, as its performance is poor on high-end workloads [3]. Faster buses are more likely solutions—witness the elimination of the slow intermediate memory bus in future systems [12]. Another solution is to internally bank the memory array into many small arrays so that each can be accessed very quickly, as in the MoSys Multibank DRAM architecture [39]. Second, widening buses will present new optimization opportunities. Each application exhibits a different degree of locality and therefore benefits from page mode to a different degree. As buses widen, this effect becomes more pronounced, to the extent that different applications can have average access times that differ by 50%. This is a minor issue considering current bus technology. However, future bus technologies will expose the row access as the primary performance bottleneck, justifying the exploration of mechanisms to exploit locality to guarantee hits in the DRAM row buffers: e.g. rowbuffer victim caches, prediction mechanisms, etc. Third, while buses as wide as the L2 cache yield the best memory latency, they cannot halve the latency of a bus half as wide. Page mode overlaps the components of DRAM access when making multiple requests to the same row. If the bus is as wide as a request, oneIn response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, evaluating each in terms of its effect on total execution time. We simulate the performance of seven DRAM architectures: Fast Page Mode [35], Extended Data Out [16], Synchronous [17], Enhanced Synchronous [10], Synchronous Link [38], Rambus [31], and Direct Rambus [32]. While there are a number of academic proposals for new DRAM designs, space limits us to covering only existent commercial parts. To obtain accurate memory-request timing for an aggressive out-of-order processor, we integrate our code into the SimpleScalar tool set [4]. This paper presents a baseline study of a small-system DRAM organization: these are systems with only a handful of DRAM chips (0.1–1GB). We do not consider large-system DRAM organizations with many gigabytes of storage that are highly interleaved. The study asks and answers the following questions: • What is the effect of improvements in DRAM technology on the memory latency and bandwidth problems? Contemporary techniques for improving processor performance and tolerating memory latency are exacerbating the memory bandwidth problem [5]. Our results show that current DRAM architectures are attacking exactly this problem: the most recent technologies (SDRAM, ESDRAM, and Rambus) have reduced the stall time due to limited bandwidth by a factor of three compared to earlier DRAM architectures. However, the memory-latency component of overhead has not improved.Copyright © 1999 IEEE. Published in the Proceedings of the 26th International Symposium on Computer Architecture, May 2-4, 1999, in Atlanta GA, USA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966. 1Data rd/wr ras casData In/Out Buffers Column Decoder Sense Amps/Word DriversRASData Transfer Data Transfer Overlap Column Access Row AccessClock & Refresh CktryColumn Address Buffer... Bit Lines ...CASRow Decoder....addressRow Address BufferMemory ArrayAddress Row Address Column Address Column Address Column AddressFigure 1: Conventional DRAM block diagram. The conventional DRAM uses a split addressing mechanism still found in most DRAMs today.DQValid DataoutValid DataoutValid Dataoutcannot exploit this overlap. For cost considerations, having at most an N/2-bit bus, N being the L2 cache width, might be a good choice. Fourth, critical-word-first does not mix well with burst mode. Critical-word-first is a strategy that requests a block of data potentially out of address-order; burst mode delivers data in a fixed but redefinable order. A burst-mode DRAM can thus can have longer latencies in real systems, even if its end-to-end latency is low. Finally, the choice of refresh mechanism can significantly alter the average memory access time. For some benchmarks and some refresh organizations, the amount of time spent waiting for a DRAM in refresh mode accounted for 50% of the total latency. As one might expect, our results and conclusions are dependent on our system specifications, which we chose to be representative of mid- to high-end workstations: a 100MHz 128-bit memory bus, an eight-way superscalar out-of-order CPU, lockup-free caches, and a small-system DRAM organization with ~10 DRAM chips. 2 RELATED WORKFigure 2: FPM Read Timing. Fast page mode allows the DRAM controller to hold a row constant and receive multiple columns in rapid succession.varying several CPU-level parameters such as issue width, cache size & organization, number of processors, etc. This study focuses on the performance behavior of different DRAM architectures. 3 BACKGROUNDBurger, Goodman, and Kagi quantified the effect on memory behavior of high-performance latency-reducing or latency-tolerating techniques such as lockup-free caches, out-of-order execution, prefetching, speculative loads, etc. [5]. They concluded that to hide memory latency, these techniques often increase demands on memory bandwidth. They classify memory stall cycles into two types: those due to lack of available memory bandwidth, and those due purely to latency. This is a useful classification, and we use it in our study. This study differs from theirs in that we focus on the access time of only the primary memory system, while their study combines all memory access time, including the L1 and L2 caches. Their study focuses on the behavior of latency-hiding techniques, while this study focuses on the behavior of different DRAM architectures. Several marketing studies compare the memory latency and bandwidth available from different DRAM architectures [7, 29, 30]. This paper builds on these studies by looking at a larger assortment of DRAM architectures, measuring DRAM impact on total application performance, decomposing the memory access time into different components, and measuring the hit rates in the row buffers. Finally, there are many studies that measure system-wide performance, including that of the primary memory system [1, 2, 9, 18, 23, 24, 33, 34]. Our results resemble theirs, in that we obtain similar figures for the fraction of time spent in the primary memory system. However, these studies have different goals from ours, in that they are concerned with measuring the effects on total execution time ofA Random Access Memory (RAM) that uses a single transistorcapacitor pair for each binary value (bit) is referred to as a Dynamic Random Access Memory or DRAM. This circuit is dynamic because leakage requires that the capacitor be periodically refreshed for information retention. Initially, DRAMs had minimal I/O pin counts because the manufacturing cost was dominated by the number of I/O pins in the package. Due largely to a desire to use standardized parts, the initial constraints limiting the I/O pins have had a long-term effect on DRAM architecture: the address pins for most DRAMs are still multiplexed, potentially limiting performance. As the standard DRAM interface has become a performance bottleneck, a number of “revolutionary” proposals [26] have been made. In most cases, the revolutionary portion is the interface or access mechanism, while the DRAM core remains essentially unchanged. 3.1 The Conventional DRAMThe addressing mechanism of early DRAM architectures is still utilized, with minor changes, in many of the DRAMs produced today. In this interface, shown in Figure 1, the address bus is multiplexed between row and column components. The multiplexed address bus uses two control signals—the row and column address strobe signals, RAS and CAS respectively—which cause the DRAM to latch the address components. The row address causes a complete row in the memory array to propagate down the bit lines to the sense amps. The column address selects the appropriate data subset from the sense amps and causes it to be driven to the output pins. 3.2 Fast Page Mode DRAM (FPM DRAM)Fast-Page Mode DRAM implements page mode, an improvement on conventional DRAM in which the row-address is held constant and data from multiple columns is read from the sense amplifiers. The data held in the sense amps form an “open page” that can be accessed relatively quickly. This speeds up successive accesses to2Data rd/wr ras casData In/Out BuffersQ DClockData Transfer Data Transfer OverlapRASClock & Refresh CktryColumn Decoder Sense Amps/Word Drivers ... Bit Lines...Address Row Address Column Address CASColumn Access Row AccessColumn Address BufferRow Decoder....addressRow Address BufferMemory ArrayDQValid DataoutValid DataoutValid DataoutFigure 5: SDRAM Read Operation Clock Diagram. SDRAM contains a writable register for the request length, allowing high-speed column access. Figure 3: Extended Data Out (EDO) DRAM block diagram. EDO adds a latch on the output that allows CAS to cycle more quickly than in FPM.4 cyclesData Transfer Transfer OverlapRASData Transfer Transfer OverlapAddressColumn AccessCol Col ColRow AccessCASColumn Access Row AccessCommand ACTV/ READ Read Strobe Read TermAddress Row Address Column Address Column Address Column AddressDQBank/ RowDoutDoutDoutDQValid DataoutValid DataoutValid DataoutFigure 6: Rambus DRAM Read Operation. Rambus DRAMs transfer on both edges of a fast clock and can handle multiple simultaneous requests.Figure 4: EDO Read Timing. The output latch in EDO DRAM allows more overlap between column access and data transfer than in FPM.3.5Enhanced Synchronous DRAM (ESDRAM)the same row of the DRAM core. Figure 2 gives the timing for FPM reads. The labels show the categories to which the portions of time are assigned in our simulations. Note that page mode is supported in all the DRAM architectures in this study. 3.3 Extended Data Out DRAM (EDO DRAM)Extended Data Out DRAM, sometimes referred to as hyper-page mode DRAM, adds a latch between the sense-amps and the output pins of the DRAM, shown in Figure 3. This latch holds output pin state and permits the CAS to rapidly de-assert, allowing the memory array to begin precharging sooner. In addition, the latch in the output path also implies that the data on the outputs of the DRAM circuit remain valid longer into the next clock phase. Figure 4 gives the timing for an EDO read. 3.4 Synchronous DRAM (SDRAM)Enhanced Synchronous DRAM is an incremental modification to Synchronous DRAM that parallels the differences between FPM and EDO DRAM. First, the internal timing parameters of the ESDRAM core are faster than SDRAM. Second, SRAM row-caches have been added at the sense-amps of each bank. These caches provide the kind of improved intra-row performance observed with EDO DRAM, allowing requests to the last accessed row to be satisfied even when subsequent refreshes, precharges, or activates are taking place. 3.6 Synchronous Link DRAM (SLDRAM)Conventional, FPM, and EDO DRAM are controlled asynchronously by the processor or the memory controller; the memory latency is thus some fractional number of CPU clock cycles. An alternative is to make the DRAM interface synchronous such that the DRAM latches information to and from the controller based on a clock signal. A timing diagram is shown in Figure 5. SDRAM devices typically have a programmable register that holds a bytesper-request value. SDRAM may therefore return many bytes over several cycles per request. The advantages include the elimination of the timing strobes and the availability of data from the DRAM each clock cycle. The underlying architecture of the SDRAM core is the same as in a conventional DRAM.RamLink is the IEEE standard (P1596.4) for a bus architecture for devices. Synchronous Link (SLDRAM) is an adaptation of RamLink for DRAM, and is another IEEE standard (P1596.7). Both are adaptations of the Scalable Coherent Interface (SCI). The SLDRAM specification is therefore an open standard allowing for use by vendors without licensing fees. SLDRAM uses a packet-based split request/response protocol. Its bus interface is designed to run at clock speeds of 200-600 MHz and has a two-byte-wide datapath. SLDRAM supports multiple concurrent transactions, provided all transactions reference unique internal banks. The 64Mbit SLDRAM devices contain 8 banks per device. 3.7 Rambus DRAMs (RDRAM)Rambus DRAMs use a one-byte-wide multiplexed address/data bus to connect the memory controller to the RDRAM devices. The bus runs at 300 Mhz and transfers on both clock edges to achieve a theoretical peak of 600 Mbytes/s. Physically, each 64-Mbit RDRAM is3Table 1: DRAM Specifications used in simulationsDRAM type FPMDRAM EDODRAM SDRAM ESDRAM SLDRAM RDRAM DRDRAM Size 64Mbit 64Mbit 64Mbit 64Mbit 64Mbit 64Mbit 64Mbit Rows 4096 4096 4096 4096 1024 1024 512 Columns 1024 1024 256 256 128 256 64 Transfer Width 16 bits 16 bits 16 bits 16 bits 64 bits 64 bits 128 bits Row Buffer 16K bits 16K bits 4K bits 4K bits 8K bits 16K bits 4K bits Internal Banks 1 1 4 4 8 4 16 Speed – – 100MHz 100MHz 200MHz 300MHz 400MHz Precharge 40ns 40ns 20ns 20ns 30ns 26.66ns 20/40ns Row Access 15ns 12ns 30ns 20ns 40ns 40ns 17.5ns Column Access 30ns 30ns 30ns 20ns 40ns 23.33ns 30ns Data Transfer 15ns 15ns 10ns 10ns 10ns 13.33ns 10nsTable 2: Time components in primary memory systemComponent Row Access Time Column Access Time Data Transfer Time Data Transfer Time Overlap Description The time to (possibly) precharge the row buffers, present the row address, latch the row address, and read the data from the memory array into the sense amps The time to present the column address at the address pins and latch the value The time to transfer the data from the sense amps through the column muxes to the data-out pins The amount of time spent performing both column access and data transfer simultaneously (when using page mode, a column access can overlap with the previous data transfer for the same row) Note that, since determining the amount of overlap between column address and data transfer can be tricky in the interleaved examples, for those cases we simply call all time between the start of the first data transfer and the termination of the last column access Data Transfer Time Overlap (see Figure 8). Refresh Time Bus Wait Time Bus Transmission Time Amount of time spent waiting for a refresh cycle to finish Amount of time spent waiting to synchronize with the 100MHz memory bus The portion of time to transmit a request over the memory bus to & from the DRAM system that is not overlapped with Column Access Time or Data Transfer Timedivided into 4 banks, each with its own row buffer, and hence up to 4 rows remain active or open1. Transactions occur on the bus using a split request/response protocol. Because the bus is multiplexed between address and data, only one transaction may use the bus during any 4 clock cycle period, referred to as an octcycle. The protocol uses packet transactions; first an address packet is driven, then the data. Different transactions can require different numbers of octcycles, depending on the transaction type, location of the data within the device, number of devices on the channel, etc. Figure 6 gives a timing diagram for a read transaction. 3.8 Direct Rambus (DRDRAM)buffers, compared to 16 full row buffers. A critical difference between RDRAM and DRDRAM is that because DRDRAM partitions the bus into different components, three transactions can simultaneously utilize the different portions of the DRDRAM interface. 4 EXPERIMENTAL METHODOLOGYDirect Rambus DRAMs use a 400 Mhz 3-byte-wide channel (2 for data, 1 for addresses/commands). Like the Rambus parts, Direct Rambus parts transfer at both clock edges, implying a maximum bandwidth of 1.6 Gbytes/s. DRDRAMs are divided into 16 banks with 17 half-row buffers2. Each half-row buffer is shared between adjacent banks, which implies that adjacent banks cannot be active simultaneously. This organization has the result of increasing the row-buffer miss rate as compared to having one open row per bank, but it reduces the cost by reducing the die area occupied by the row1. In this study, we model 64-Mbit Rambus parts, which have 4 banks and 4 open rows. Earlier 16-Mbit Rambus organizations had 2 banks and 2 open pages, and future 256-Mbit organizations may have even more. 2. As with the previous part, we model 64-Mbit Direct Rambus, which has this organization. Future (256-Mbit) organizations may look different.For accurate timing of memory requests in a dynamically reordered instruction stream, we integrated our code into SimpleScalar, an execution-driven simulator of an aggressive out-of-order processor [4]. We calculate the DRAM access time, much of which is overlapped with instruction execution. To determine the degree of overlap, and to separate out memory stalls due to bandwidth limitations vs. latency limitations, we run two other simulations—one with perfect primary memory (zero access time) and one with a perfect bus (as wide as an L2 cache line). Following the methodology in [5], we partition the total application execution time into three components: TP TL and TB which correspond to time spent processing, time spent stalling for memory due to latency, and time spent stalling for memory due to limited bandwidth. In this paper, time spent “processing” includes all activity above the primary memory system, i.e. it contains all processor execution time and L1 and L2 cache activity. Let T be the total execution time for the realistic simulation; let TU be the execution time assuming unlimited bandwidth—the results from the simulation that models cacheline-wide buses. Then TP is the time given by the simulation that models a perfect primary memory system, and we calculate TL and TB: TL = TU – TP and TB = T – TU. In addition, we consider one more component: the degree to which the processor is able to overlap memory access time with processing4x16 DRAM x16 DRAM x16 DRAM x16 DRAM CPU and caches 128-bit 100MHz bus Memory Controller x16 DRAM x16 DRAM x16 DRAM x16 DRAM (a) Configuration used for non-interleaved FPMDRAM, EDODRAM, SDRAM, and ESDRAMCPU and caches128-bit 100MHz busMemory Controller(b) Configuration used for SLDRAM, RDRAM, and DRDRAM...(c) (Strawman) configuration used for parallel-channel SLDRAM & Rambus performanceFigure 7: DRAM bus configurations. The DRAM/bus organizations used in (a) the non-interleaved FPM, EDO, SDRAM, and ESDRAM simulations; (b) the SLDRAM and Rambus simulations; and (c) the parallel-channel SLDRAM and Rambus performance numbers in Figure 11. Due to differences in bus design, the only bus overhead included in the simulations is that of the bus that is common to all organizations: the 100MHz 128-bit memory bus.DRAMDRAMCPU and caches128-bit 100MHz busMemory Controllernization fails to take advantage of some of the newer DRAM parts that can handle multiple concurrent requests. 100MHz 128-bit buses are common for high-end machines, so this is the bus configuration that we model. We assume that the communication overhead is only one 10ns cycle in each direction. The DRAM/bus configurations simulated are shown in Figure 7. For DRAMs other than Rambus and SLDRAM, eight DRAMs are arranged in parallel in a DIMM-like organization to obtain a 128-bit bus. We assume that the memory controller has no overhead delay. SLDRAM, RDRAM, and DRDRAM utilize narrower, but higher speed busses. These DRAM architectures can be arranged in parallel channels, but we study them here in the context of a single-width DRAM bus, which is the simplest configuration. This yields some latency penalties for these architectures, as our simulations require that the controller coalesce bus packets into 128-bit chunks to be transmitted over the 100MHz 128-bit memory bus. To put the designs on even footing, we ignore the transmission time over the narrow DRAM channel. Because of this organization, transfer rate comparisons may also be deceptive, as we are transferring data from eight conventional DRAM (FPM, EDO, SDRAM, ESDRAM) concurrently, versus only a single device in the case of the narrow bus architectures (SLDRAM, RDRAM, DRDRAM). The simulator models a synchronous memory interface: the processor’s interface to the memory controller has a clock signal. This is typically simpler to implement and debug than a fully asynchronous interface. If the processor executes at a faster clock rate than the memory bus (as is likely), the processor may have to stall for several cycles to synchronize with the bus before transmitting the request. We account for the number of stall cycles in Bus Wait Time. The simulator models several different refresh organizations, as described in Section 5. The amount of time (on average) spent stalling due to a memory reference arriving during a refresh cycle is accounted for in the time component labeled Refresh Time. 4.2 InterleavingDRAMDRAMDRAMDRAMDRAMDRAMDRAMDRAMDRAMDRAMDRAMtime. We call this overlapped component TO, and if TM is the total time spent in the primary memory system (the time returned by our DRAM simulator), then TO = TP – (T – TM). This is the portion of TP that is overlapped with memory access. 4.1 DRAM Simulator OverviewThe DRAM simulator models the internal state of the following DRAM architectures: Fast Page Mode [35], Extended Data Out [16], Synchronous [17], Enhanced Synchronous [10, 17], Synchronous Link [38], Rambus [31], and Direct Rambus [32]. The timing parameters for the different DRAM architectures are given in Table 1. Since we could not find a 64Mbit part specification for ESDRAM, we extrapolated based on the most recent SDRAM and ESDRAM datasheets. To measure DRAM behavior in systems of differing performance, we varied the speed at which requests arrive at the DRAM. We ran the L2 cache at speeds of 100ns, 10ns, and 1ns, and for each L2 access-time we scaled the main processor’s speed accordingly (the CPU runs at 10x the L2 cache speed). We wanted a model of a typical workstation, so the processor is eight-way superscalar, out-of-order, with lockup-free L1 caches. L1 caches are split 64KB/64KB, 2-way set associative, with 64-byte linesizes. The L2 cache is unified 1MB, 4-way set associative, writeback, and has a 128-byte linesize. The L2 cache is lockup-free but only allows one outstanding DRAM request at a time; note this orga-For the 100MHz 128-bit bus configuration, the transfer size is eight times the request size; therefore each DRAM access is a pipelined operation that takes advantage of page mode. For the faster DRAM parts, this pipeline keeps the memory bus completely occupied. However, for the slower DRAM parts (FPM and EDO), the timing looks like that shown in Figure 8(a). While the address bus may be fully occupied, the memory bus is not, which puts the slower DRAMs at a disadvantage compared to the faster parts. For comparison, we model the FPM and EDO parts as interleaved as well (shown in Figure 8(b)). The degree of interleaving is that required to occupy the memory data bus as fully as possible. This may actually over-occupy the address bus, in which case we assume that there are more than one address buses between the controller and the DRAM parts. FPM DRAM specifies a 40ns CAS period and is four-way interleaved; EDO DRAM specifies a 25ns CAS period and is twoway interleaved. Both are interleaved at a bus-width granularity. 5 EXPERIMENTAL RESULTSFor most graphs, the performance of several DRAM organizations is given: FPM1, FPM2, FPM3, EDO1, EDO2, SDRAM, ESDRAM, SLDRAM, RDRAM, and DRDRAM. The first two configurations (FPM1 and FPM2) show the difference between always keeping the row buffer open (thereby avoiding a precharge overhead if the next access is to the same row) and never keeping the row buffer open. FPM1 is the pessimistic strategy of closing the row buffer after every access and precharging immediately; FPM2 is the optimistic strategy of keeping the row buffer open and delaying precharge. The dif-5。
Automated Application Performance Monitoring with: IBM Application Performance Analyzer Automation AssistantCustomer’s benefits:Eliminates up to 90% of the manual effort to manage application performance Reduces CPU utilization by definition of threshold valuesAllows customer to focus tuning efforts on jobs with the greatest CPU and runtime savings potential with user defined thresholdsAutomation, integration and proactive schedule of Application Performance Analyzer Integration of IBM Application Performance Analyzer reportsApplication Performance Analyzer measurement results will be analyzed and stored Measurement results can be viewed independent of the started taskAutomatically identifies tuning opportunities across your entire batch and online mainframe environment, including Parallel SysplexISPF user interface and interface to DB2 Performance Data Warehouse (PDWH) Provides IMS preload list for optimization Performance HistoryAutomatic reorganisation of DB2 Performance Data Warehouse Monitoring of program changesISPF Interface of IBM Application Performance Analyzer Automation AssistantThe Business challengeIT organizations are still under constant pressure to get applications into pro-duction as quickly and with as few resources as possible. Mainframe resources are constrained and to consider application performance until it’s too late is a expensive strategy. IT staff has not enough time to find and tune the performance bottlenecks hidden within the thousands of batch jobs and online transactions. Yearly Millions of dollars/euros are wasted because of poorly performing applications. To find and fixing these costly bottlenecks with limited staff and limited time is a challenge.And the solutionAutomated application performance monitoring is made easier and more productive with:IBM Application Performance Analyzer Automation Assistant andIBM Application Performance AnalyzerThis solution package turns Application Performance tuning into a strategic advantage.IBM Application Performance Analyzer Automation Assistant interfaces seamlessly with Application Performance Analyzer to automate the process of measuring applications, prioritizing your tuning efforts and filters large volumes of data for quick identification of tuning opportunities. Customers benefit is to spend more time on analysis and optimization.The tuning cycle:More performance through threshold monitoringIBM Application Performance Analyzer Automation Assistant allows you to set threshold values, based on Sysplex, LPAR and subsystem level for z/OS batch jobs, DB2, IMS, CICS and WebSphere MQ. Setting threshold values enables you to act in good time, which means – find out the candidates which consumes your resources and which should be measured by IBM Application Performance Analyzer.Example: Definition of DB2 Sysplex WatchlistG031-6968。
R ece i ved 10M a rch 2010Perform ance Anal ysis of a Six-bar L inkage Pu m pi ng Unit Based on a Sm all Fl uctuations Load ShapeREN Tao ,SUN W enSchoo l o fM echan i c al Eng i n eer i n g ,X i an Shiyou Un iversity ,X i an 710065,P .R.Ch i n aAbstract :Crank shaft net torque of pump ing units is a volatile alterna ting load,w hile ou t p ut torque of a generalm o tor is basically constant .Thus ,load characteristics o f pump i n g units and m otors are not able to har m on iously m atch each other ;this res u lted in a h i g her outpu t of t h e m otor ,lo w er efficiency,and h i g her ener gy consump tion of t h e pump ing units .A ne w si x -bar linkag e pump ing unit is p resente d accor d ing to m o m ent -chang ing theory .It allo w s to adj u st auto m aticall y fo llo w ing the chang es o f p olis hed rod load,and ach ieves s m all crank s haft curve fluc -t u ation.The ne w pump ing un it i m p rovesm otor efficiency,reduces m otor out p ut po w er ,and saves energy.A ccor d ing to the desi g n sche m e ,kine matics and kinetics m odels of t h e ne w si x -bar linkages pum p i n g unit are built up.An o p-ti m u m design on t h e m ain p erfor m ance para m eters and functional anal y sis w ere p er for m e d.K ey w ords :si x -bar li n kage pum ping un i,t perfor m ance analysis ,opti m ization design ,energy sav ing1 I ntroductionA sucker rod lift is used i n over 90%o f do m estic oil w e lls ;total installed capac ity o f m otors is m ore than 3500MW.Pum ping un it efficiency is qu ite l o w,t h us annual e lectricity consum pti o n is m ore than 100b i-l li o n k W.h.The pum pi n g unit effic i e ncy i n China ,on average ,is on l y 25 96%,and is 30 5%for a deve-loped country [1].Therefore ,t h e study and deve l o p -m ent o f a ne w generation of an e fficient ener gy -saving trans m issi o n m echanis m pum ping unit has a desired advantage .F i g ure 1illustrates the struct u re of a conventional w a l k -bea m pum pi n g un i.t There are three parts that contribute to the energy consum pti o n :the effectivem echanicalw ork to lift o il liquid ,the fricti o n po w er loss by bel,t reducer ,and li n kage and the m otor heat loss [2].Reduc i n g the fricti o n po w er loss o f t h e trans -m issi o n is very li m ited to i m prove the effic iency o f the pu m p i n g un it syste m .Thus ,the genera l v ie w tells t h atm o tor heat loss is a m a i n reason o f the lo w eff-i ciency o f t h e pum p i n g un it syste m.The m a i n reason of h i g h heat loss o f t h e m oto r is that a generalm o tor issuited for drag un ifor m load ;ho w ever ,the net torque of the pum p i n g unit crank shaft is an alternati n g loadwhose load fluctuations are very large .H ence ,i n o r -der to m eet peak load ,the un it had to m atch a large po w erm o tor .This m akesm o tor input po w er less than one -th ird o f rated po w er i n m ost o f the wo r k i n g period [3],na m e l y b i g horse drags s m all cart ,and m o tor l o ad rate ,on the average ,is only 20%~30%[4].M o tor heat l o ss is proporti o na l to fluct u a -tions of the crank net torque ;the s m aller the fluct u a -tions the s m aller the m otor heat l o ss [2].Therefore ,m aking a crank shaft net torque curve i n a s m all fl u c -tuation is the bestw ay to i m prove m otor l o ad rate and reduce heat loss .For m any years ,energy -sav ing pu m p i n g un its o fw a l k -bea m balance w e i g ht dev iating do w nw ar d ,s w i n g rod ,de flecti n g w heel and doub l e donkey ,have been stud -ied and deve l o ped ,and also have been app li e d w ide -ly .Figure 2show s crank shaft net torque curves of four k i n ds o f ener gy -sav i n g pump i n g un its[3].A ssho w n i n Figure 2,the net torque curve fl u ctuati o ns of the four ki n ds o f energy -sav ing pum ping un its are very large ,and a h i g h peak load and a deep va-l ley load are a lso presen ted ,or even negati v e load .Internati onal Jou rnal of Plant Eng i neeri ng and M anage m ent Vo.l 15 N o .3 Sep te m ber 2010A ccordi n g to the net to r que curve ,as sho w n i n F i g ure 2,to select a mo tor ,t h e i n sta ll e d po w er is still greatand m o tor heat l o ss w ill still be quite h i g h .Thus ,four ki n ds of energy -sav ing pum p i n g unit syste m eff-i ciency w ere i n deed i m proved ,bu t the effect is very li m ited.A ne w six -bar linkage pum ping uni,t desi g ned byrocker block li n kage ,applies a un ique and innovative t w ice slider m o m en-t chang ing structure .Gu ide rod sli d i n g i n the r ocker b l o ck track is first m o m en-t chang i n g ,and sli d er slidi n g i n the gu i d e rod track is second .A fter t w ice m o m en-t c hang ing and exact co m-pound counterbalance ,the crank shaft net torquecurve o f the pum ping unit is a s m all fl u ctuationshape ,and t h e above m entioned techn ica lprob le m s of the ex isting ener gy -sav i n g pum ping un its have beenso l v ed .2 Sche m e and principle of the ne w six -bar linkages pu m ping unit2 1 Sche m e structure and featuresF i g ure 3illustrates the genera l structure of the ne w six -bar li n kage pum ping un i.tF i gure 3 Sche m atic d i agra m o f the new si x -barli nkage pump i ng unitThe unit consists of crank ,guide r od ,rocker bar ,sl-i der ,r ocker b l o ck and fra m e .The crank i s h i n ged to the m i d dle o f the gu i d e rod .The front of the gu i d e r od has a sli d er track ,the back of it sli d es o r r o lls in the rocker block track wh ich can be a sliding or ro l-l ing track.The back -end o f t h e rocker bar is hinged to the fra m e ,the m i d dle o f it has a sli d er o r w heel and the front of it li n ks up w ith t h e donkey ,steel w ire r ope ,connector and sucker rod .The m ain fea t u res o f the ne w si x -bar li n kage pu m p i n g un it are as fo ll o w s :(1)The linkages have a large ang le of extre m e pos-i tions .The inherent str uctural characteristics o f the r ock i n g block m ec han is m are that the w a l k -bea m r ock i n g b l o ck si x -bar li n kage has a g reater ang le in ex tre m e positi o ns .The m echan is m m eets t h e pu m p i n g un itw orking conditi o ns requ ired w hich is fast do w n158Internati onal Jou rnal of Plant Eng i neeri ng and M anage m ent Vo.l 15 N o .3 Sep te m ber 2010and slo w lift .(2)Unique and innovative m o m en-t chang i n g is in the structure.W ith the crank ro tating,t h e slider(or w heel)slides i n the track i n t h e front o f t h e guide rod.Th is structure m akes the guide rod f o rce ar m on t h e r ocker bar that is changed w ith t h e sli d er sliding, so that rocker bar m o m ent w ill be changed auto m at-i ca ll y w it h the po lished rod l o ad chang i n g.2 2 Pr i n c i p leW hen the crank rota tes,the guide rod slides i n the rocker block track,and s w i n gs up and down w ith the rocker block together at the fra m e p i v o.t M eanwh ile, t h e gu i d e drives the sli d er to sli d e i n the rocker bar track and the sli d er drives the rocker bar to s w ing up and do wn based on the fra m e p i v o.t The donkey head of the fr on-t end o f the rocker bar drives the steelw ire rope,connecto r and sucker rod to do the up and do wn rec i p rocating m otion.3 Pu m ping unit ki n e matic and kinetic perfor m ance analysisThe pump i n g un it kine m atic and k i n etic perfor m ance para m eters i n cl u de:polished rod displace m en,t po-l ished rod velocity,acceleration,po lished r od l o ad, t h e net to r que of the crank shaft and roo tm ean square tor que o f the crank sha ft and m o tor pow er.3 1 M otion para m etersSee Figure4for t h e ne w six-bar li n kage pum pi n g unitm oti o n diagram.F i gure4 N ew si x-bar li nkage pump i ng unit m oti on diag ramTrans m issi o n m echanis m basic di m ensions i n Figure4 are as fo llo w s:crank leng th R,rocker bar length P1,the d istance be t w een fra m e pivo t O1and slider B1is P,the distance bet w een fra m e p i v ot B and crank ax is centre O is C,the distance bet w een fra m e O1and crank ax is centre O is C1.3 1 1 G eo m etric para m eters calculationAs Figure4sho w s:=a cos RCH=R sin( + )AB=R2+C2-2RC co s( + )=arcsin HAB= -( + )-H1=BB1si n1=arcsin H1PBB1=12[2(C+C1)cos +4(C+C1)2cos2 -4[(C+C1)2-P2]] Pum p i n g unit m ax i m um str oke SS=P1 1maxW here 1max is the m ax i m u m rocker bar s w i n g angle( ).3 1 2 Polished rod angul a r displace m ent= 1max- 13 1 3 Polished rod velocit y(1)Rocker bar angular velocity pp=dd t=- 11=H 1P co s 1H 1=BB1 si n +BB1sinBB1 =(C+C1)BB1cosBB1-(C+C1)cossin =H AB-H ABAB2=si ncosco s =- sinH =R cos( + )AB =CRABsin( + )159Perf or m ance Anal ysis of a S i x-bar L i nkage Pump i ng U n i t B ased on a Sm all Fl u ctuations Load Shap e(2)Po lished rod velocity VV=P1 p3 14 Po lished rod acce leration (1)Rocker bar angular acce leration p p=d pd t=- 11=1cos 1H 1P- 1co s 1co s 1=- 1sin 1H 1=BB1 si n +2BB1 sin +BB1sinBB1 =2(C+C1)BB1 cos +(C+C1)BB1cos -(BB1 )2BB1-(C+C1)cosco s =-( sin + sin )si n =1AB2(H AB-H AB -2AB AB sin )=1cos(sin - co s )H =-R 2sin( + )AB =1AB[CR 2cos( + )-(AB )2](2)Po lished rod accelerati o n aa=P1 p3 2 Polished rod load calculationPo lished rod load can be ca lculated through the fo-l l o w i n g fo r m ula[5]W=Ef r ux x=0+f r L( r- y)W here:u is any position d i s place m ent of the sucker rod(m);E is the pump i n g r od m aterial elastic m od-u l u s(N/m);f r is t h e sucker rod cross-sectional area (m2);L is the pum p subm ergence depth(m); r, y are the sucker rod and o il density(kg/m3).3 3 C rank shaft net torque and root m ean square torque3 3 1 Force analysisThe lo w er pump i n g speed,for si m p licity,the i m pac,t on crank shaft net torque,o f crank w e i g h,t gu i d e rod w e i g ht and m o m ent o f i n ertia,are o m itted.A lso, friction i m pact of the m echan is m k i n e m atic pa ir is no t considered[6].See F i g ure5for t h e sche m atic d i a gra m o f crank force analysis.The forces on the crank i n vo l v e tangential force T,guide rod force F d n and F d t and crank coun-ter ba l a nce w e i g ht Q.The centrifugal fo rce F ql is gen-erated by the crank counterba lance w e i g ht and support reactions R1x and R1y.F i gure5 Sche m atic d i ag ra m o f crank force ana l ys i sThe torques balance equation for the crank shaft cen-ter O isTR+Q r si n( - 1- )=F d t R si n +F dn R co s(1) See F i g ure6for the sche m atic d iagra m o f r ocker bar force analysi s.The forces on the rocker bar have po-l ished rod load W,rocker bar w e i g ht Q2,i n ertia force F y1l,F y1g,sli d er force F d1and F d1m,rocker bar coun-ter ba l a nce w e i g ht Q1,inertia force F y l,F yg and sup-port reacti o ns R2x,R2y.F igu re6 Schema ti c diagra m of rocker bar fo rce ana lysisThe torque balance equation for t h e rocker bar center O1is160Internati onal Jou rnal of Plant Eng i neeri ng and M anage m ent Vo.l15 N o.3 Sep te m ber2010WP 1+Q 1g P 22 p +12Q 2P 1cos 1+14Q 2gP 21 p=F d 1P cos 1+F d 1m P si n 1+Q 1P 2cos 1(2)See F i g ure 7for the sche m a tic d iagra m of gu i d e rod force ana l y sis .The fo rces on the guide rod have crank forces F d n and F d t ,slider force F d 1,rocker block F d n and the friction of slider and rocker b l o ck F d 1m ,F d 2m.F i gure 7 Schem ati c d i agra m o f gu i de rod force analysisThe torque balance equati o n at po i n t B isF d n AB =F d 1BB 1(3)The forces balance equation in t h e t -t d irecti o n isF d t +F d 2m =F d 1m (4)The forces balance equation in t h e n-n direction isF d n =F d 1+F d 2(5)Friction F d 1m 、F d 2mF d 1m =F d 1f (6)F d 2m =F d 2f(7)W here :f stands for friction coefficientSo l v i n g (1)~(7)si m u ltaneous equations ,we fi n dt h e tangen tial fo rce T .3 3 2 C rank shaft net t orqueReducer crank sha ft net to r que is equal to the product of the crank rad i u s R and tangential force T on the crank p i n .M n =R TM n =BB 1R (co s -f sin )ABP (cos 1+f si n 1)WP 1+Q 1gP 22 p +12Q 2P 1cos 1+14Q 2gP 21 p -Q 1P 2cos1-Q r sin ( - 1- )3 3 3 C rank shaft root m ean square torque M eM e =122M 2n d3 4 M otor pow er N eThe m o tor effecti v e output po w er N e isN e =M e9550W here : is pump i n g un it effic i e ncy .4 Exa m ples and results analysis4 1 Exa m ple calcu lationsB ased on the design sche m e ,a se l-f developed pro -gra m wh ich is based on MATLAB too l b ox[7]and isused as the si m ulati o n syste m of the pum ping un it[8]w as app lied to si m ulati o n and opti m izati o n desi g n of the ne w six -bar li n kages pum ping un i.t Exa m ple 1[9]:The act u a l w or k i n g cond itions are as follo w s :m ax-i m um stroke length S =3m,pum ping speed n =9m in -1,sucker rod suspensi o n 889 8m,sub m er -gence depth 217 8m,pum p d ia m eter 70mm,sucker r od dia m eter 22mm,oil p i p e d ia m eter 63 5mm,m o isture conten t 93%.C ontrast pum ping un it is the conventi o na lw alk -bea m pum pi n g un i.t Exa m ple 2[10]:The act u a l w or k i n g cond itions are as follo w s :m ax-i m um stroke length S =5m,pum ping speed n =6m in -1,sucker rod suspension 1000m ,sub m er -gence depth 300m,pum p d ia m eter 70mm ,sucker r od dia m eter 22mm,oil p i p e d ia m e ter 62mm,m o is -ture content 92%.Contrast pum ping unit is the doub -le donkey head pum ping un i.tF i gure 8 C rank shaft net torque curves o f conventiona lw a l k -beam and ne w six -ba r li nkages pu m p i ng unitF i g ure 8sho w s crank shaft net torque curves o f the161Perf or m ance Anal ysis of a S i x -bar L i nkage Pump i ng U n i t B ased on a Sm all Fl u ctuations Load Shap econventionalw alk -bea m pu m p i n g un it and ne w si x -bar li n kages pum ping unit applying Exa m ple 1.F i g ure 9sho w s torque curves of the double donkey head pum-pi n g unit applying Exa m ple 2.F i g ure 10sho w s torque curves of t h e ne w si x -bar linkages pu m p i n g unit ap -plying Exa m p le 2.T ab le 1 The m ain perfor m ance para m etersof the t hree pu m pi ng un itsM odelConventiona l N e wsi x -barli nkagesD oub l e donkey head N ew si x-bar li nkages Stroke length(m )3 0003 0005 0004 999M i n .ve l o city (m /s)-1 392-1 324-1 500-1 423M ax .ve l oc ity (m /s)1 5221 2901 3901 389M i n .accelerati on(m /s 2)-1 659-1 399-1 250-1 069M ax.acce l e ration(m /s 2)2 0241 3991 0301 069M ax .po lished rod l oad (k N )55 88251 68765 77064 904M i n .po li shed rod l oad (k N )20 31419 65430 18031 561M ax .ne t t o rque (kN )31 98314 01631 05023 679M i n .net t o rque (kN )-11 3303 6880 4804 889R oot m ean squa re t o rque (k N m )17 8579 94921 57017 219A ctive power (k W )21 03811 72116 26012 980CLF 1 6641 0271 1001 024A cti ve po w er -sav i ng (%)44 28620 1724 2 R esults analysisA co m parative ana lysis o f the ne w si x -bar li n kages pum pi n g un it opti m iza ti o n desi g n resu lts w ith conven -tionalw alk -bea m and doub le donkey head pu m p i n g u -nit design results[10]are sho w n i n Tab le 1.(1)Perfor m ance para m eters :As sho wn i n Table 1,the m a i n perfor m ance para m eters of the ne w six -bar li n kages pu m ping unit are sign ificantl y better than that of the conventionalw a l k -bea m pum ping uni,t and also better than that of the double donkey head pu m p i n gun i.t A lthough m ax i m u m acceleration 1 069m /s 2is sli g htly h i g her than 1 030m /s 2of the doub le donkey head pum pi n g uni,t m ax i m um acce leration of the ne w six -bar linkages pu m p i n g un it does no t occur at the beg i n n i n g o f t h e upper str oke ,but occurs on the do w n stroke .M ax i m um accelerati o n o f t h e ne w si x -bar li n k -ages i s on l y 1 032m /s 2.(2)Energy -saving :Roo t m ean square torque is an162Internati onal Jou rnal of Plant Eng i neeri ng and M anage m ent Vo.l 15 N o .3 Sep te m ber 2010i m portant index of energy consum ption.As shown i n Table1,a co mparison of the ne w si x-bar linkages pu m p i n g unitw ith a conventi o na lw a l k-bea m pum ping unit and double donkey head pum ping un i,t roo t m ean square torques respective ly,decli n e fro m 17 857and21 570to9 949and17 219.A ctive po w er-sav i n g rates i n crease by44 286%and 20 172%,respective l y.(3)Cyclical variab le l o ad factor-CLF and m o tor l o ad rate:A co m par i s on of the ne w six-bar li n kages pum-pi n g un itw it h the doub le donkey head pu m p i n g uni,t m ax i m um net torques,respecti v ely,dec line by 56 176and23 739%;m i n i m u m net torques,re-specti v e l y,i n crease fro m-11 330and0 485to 3 688and5 145.The CLF val u es,respectively,re-duce fro m1 664and1 1to1 027and1 024;there-f o re t h ese g reatl y reduce the fluct u ation o f crank shaft net torque and m ake t h e net to r que curve o f the crank shaft i n a s m a ll fluctuation shape so that m o tor in-stalled po w er is g reatly reduced and m otor l o ad rate is i n creased and it ach ieves69 706%.M eanwh ile,m o-to r heat l o ss is greatly reduced,pum pi n g unit eff-i ciency is i m proved and un it ener gy consum ption is greatl y reduced.References[1] Zhang X D,Jia G C,Thoughts about the de-velo pm ent of our country pump ing un it.O ilF ield Equ i p m en,t37(1):27,2008(I n Ch-inese)[2] Su D S,Liu X G,LuW X,D i n g J L,Liu S,Q iu X F,Ren Y L,Overvie w of ener gy-savi n gm echanis m o f beam-balanced pump.ChinaPetro l e u m M ach i n ery,29(5):49~50,2001(In Ch i n ese)[3] Zhu J,W ang Z F,SunH F,Zhang D S,En-ergy-s aving princip le and con trastive anal y sisof per for m ance of ener gy-saving beam-pu m p i n gunit.Journa l o f Daq i n g Petroleum I nstitute,29(5):41~43,2005(I n Ch i n ese)[4] Q iW G,Zhu X L,Zhang Y L,A study offuzzy neural net w ork control o f energy-savingof oil pump.Proceedings of the Chinese Soc-iety for E lectrical Eng i n eering,24(6):137,2004(I n Ch i n ese)[5] Y ao C D,D i m ension op ti m i z ation and p er-for m ance si m ulation of link t y ped long strokeoil suction pum p w ith stroke incre m ent byp lanetar y gear.Journa l o fM ach i n e Desi g n,21(1):60,2004(In Ch i n ese)[6] W an B L,Oil production equipm ent desi g nand calculation.Petro leu m I ndustry Press,27~28,1988(I n Chinese)[7] Song Z J,MATLAB app lications in scienti f icco mputation.Tsi n ghua U niversity Press,431~459,2005(I n Ch i n ese)[8] R en Tao,Op ti m um desi g n of pump ing unitsbased on calculation of MATLAB o p ti m izationtoolbox es.O il Field Equ i p m en,t37(4):53~56,2008(I n Ch i n ese)[9] M i n Z B,Liu Y,Luan Q D,Op ti m u m desi g no f conventional pump ing units.Journal ofDaqi n g Petro leum I nstitute,30(1):118~119,2006(I n Chinese)[10] Cu iX M,Luan Q D,H an D Q,Chen P,LuY P,Chen P,S t u dy on the po w er savi n g e ffecto f involute pump ing unit.Jour nal o f Daq i n gPetroleum I nstitute,27(2):102~103,2003(In Ch i n ese)Brief BiographiesREN Tao is an associate pro fessor i n the Schoo l of M echanical Engenderi n g,X i an Sh i y ou Un i v ersity.H is research i n terests are i n petroleum m achinery,es-pecia ll y pum ping un its design.E-m ai:l rentao365@ 126.co mSUN W en is a lecturer i n t h e Schoo l o fM echan i c al Engenderi n g,X i an Sh i y ou U niversity.H is research interests are in li n kages desi g n.163Perf or m ance Anal ysis of a S i x-bar L i nkage Pump i ng U n i t B ased on a Sm all Fl u ctuations Load Shap e。