SAS Data Step 3 Do and Array
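SAS Loops and Arrays
SAS provides loop statements for situations in which the same operations must be executed many times in a program.
Sometimes the same operation must also be applied to different variables; in that case you can define a SAS array and refer to those variables through the array name and a subscript.
1 Loops
SAS loop statements usually take one of the following forms: the iterative DO statement, the DO WHILE statement, and the DO UNTIL statement.
1. The iterative DO statement
The basic form of the iterative DO statement is:
    DO index-variable = start <TO stop> <BY increment> <WHILE(expression)> <UNTIL(expression)>;
        ... SAS statements ...
    END;
where:
· index-variable names a variable; if the variable does not exist, a new variable is created.
  The statements between the DO statement and the END statement are called the DO group, and the value of the index variable controls execution of the DO group.
· start specifies the initial value of the index variable; it can be an expression or a series of expressions.
  Execution of the DO group begins with index-variable = start.
  The start value is evaluated before the first iteration of the loop.
  If stop and increment are absent, start may be a series of items, and the DO statement then takes the following form:
      DO index-variable = item-1 <, ... item-n>;
  item-1 through item-n can be numeric constants, character constants, or variables.
  SAS executes the DO group once for each item in the list.
· stop specifies the ending value of the index variable.
  When both start and stop are present, the DO group executes until any one of the following occurs: the value of the index variable passes the stop value; a statement in the DO group exits the loop, for example a LEAVE statement or a GO TO statement; or, if a WHILE or UNTIL clause is present, the WHILE expression becomes false or the UNTIL expression becomes true (see the descriptions of the DO UNTIL and DO WHILE statements below).
· increment specifies a number, or an expression that yields a number, that controls how much the index variable changes on each iteration.
  The increment is evaluated before the loop executes.
  Therefore, changing the increment inside the DO group does not affect the number of iterations.
  After each iteration, the increment is added to the current value of the index variable.
  If no increment is specified, the index variable increases by 1.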
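The forms of the iterative DO statement described above can be sketched as follows. This is a minimal illustration rather than part of the original material; the data set name do_forms and the variables form and step are hypothetical names chosen for the example.

```sas
/* Hypothetical data set and variable names, for illustration only. */
data do_forms;
    /* start TO stop BY increment: i takes the values 1, 3, 5, 7, 9 */
    do i = 1 to 10 by 2;
        form = 'to/by';
        output;
    end;

    /* item list: the DO group runs once for each listed value */
    do i = 2, 4, 8;
        form = 'list';
        output;
    end;

    /* the increment is evaluated once, before the first iteration,
       so changing step inside the loop does not change the loop */
    step = 1;
    do i = 1 to 3 by step;
        step = 10;
        form = 'fixed';
        output;
    end;

    /* a WHILE clause can end the loop before the stop value is reached */
    do i = 1 to 10 while (i * i < 20);
        form = 'while';
        output;
    end;
run;

proc print data=do_forms;
run;
```

Printing the data set makes it easy to compare the values that i takes under each form.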
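Lesson 15: Control Statements in the DATA Step
The basic concepts and flow of the DATA step, and the statements that operate on files, were introduced earlier.
However, all of the DATA step statements introduced so far process every observation in the order in which the statements appear.
Sometimes certain observations need to skip some SAS statements, or the order in which statements are processed needs to change. This calls for the DATA step control statements, which implement branching, transfer of control, looping, and other changes to the processing order of a SAS program.
Grouped by function, the control statements that SAS provides fall into five main categories:
● Looping (the DO statement)
● Selection (the SELECT statement)
● Branching (the IF statement)
● Transfer of control (the GOTO statement)
● Linking (the LINK statement)
I. Looping (the DO statement)
The DO statement is used in four main forms:
● Form 1 of the DO statement:
    IF condition THEN DO;
        SAS statements;
    END;
● Form 2 of the DO statement:
    DO variable = start TO stop BY increment;
        SAS statements;
    END;
● Form 3 of the DO statement:
    DO WHILE (condition);
        SAS statements;
    END;
● Form 4 of the DO statement:
    DO UNTIL (condition);
        SAS statements;
    END;
The expression in a DO WHILE or DO UNTIL statement is enclosed in parentheses.
The difference between the two forms is where the condition is evaluated.
DO WHILE evaluates it at the top of the loop body, whereas DO UNTIL evaluates it at the bottom, which means that DO UNTIL always executes the statements in the loop body at least once.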
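The difference between the two forms can be seen in a short sketch. It is an illustration only; the variable n and the message text are assumptions made for the example.

```sas
data _null_;
    /* DO WHILE tests the condition at the top: the body never runs here */
    n = 5;
    do while (n < 5);
        put 'DO WHILE body executed, n=' n;
        n = n + 1;
    end;

    /* DO UNTIL tests the condition at the bottom: the body runs once */
    n = 5;
    do until (n >= 5);
        put 'DO UNTIL body executed, n=' n;
        n = n + 1;
    end;
run;
```

Only the DO UNTIL message should appear in the log, because its condition is not checked until the body has already run once.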
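The following examples illustrate the use of the DO statement.
1. Using an iterative DO group to generate a data set of random numbers
For example, suppose we need a data set containing a stream of uniformly distributed random numbers. The program is as follows:
    data DoRanuni;
        seed = 20000101;
        do i = 1 to 10 by 2;
            x1 = ranuni(seed);
            x2 = ranuni(seed);
            output;
        end;
    run;
    proc print data=DoRanuni;
    run;
In this program, X1 and X2 are both generated from the same seed value, SEED=20000101, and together they form a stream of uniformly distributed random numbers.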
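As a variation on the program above, the same loop can be written with CALL STREAMINIT and the RAND function, which are available in later SAS releases. This is an alternative sketch, not the method used in the original example, and the data set name DoRand is hypothetical.

```sas
data DoRand;
    call streaminit(20000101);   /* seed the random number stream once */
    do i = 1 to 10 by 2;         /* i = 1, 3, 5, 7, 9: five observations */
        x1 = rand('uniform');
        x2 = rand('uniform');
        output;
    end;
run;

proc print data=DoRand;
run;
```

The values differ from those produced by RANUNI because the two functions use different generators.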
SAS Arrays: English Reference

Paper 242-30
Arrays Made Easy: An Introduction to Arrays and Array Processing
Steve First and Teresa Schudrowitz, Systems Seminar Consultants, Inc., Madison, WI

ABSTRACT
Many programmers find the thought of using arrays in their programs daunting, and as a result they often shy away from arrays in favor of better-understood but more complex solutions. A SAS array is a convenient way of temporarily identifying a group of variables for processing within a data step. Once the array has been defined, the programmer can perform the same tasks for a series of related variables, the array elements. Once the basics of array processing are understood, arrays are a simple solution to many program scenarios.
Throughout this paper we will review the following array topics:
1) Why do we need arrays?
2) Basic array concepts
   a) Definition
   b) Elements
   c) Syntax
   d) Rules
3) Using array indexes
4) One dimension arrays
5) Multi-dimension arrays
6) Temporary arrays
7) Explicit vs. implicit subscripting
8) Sorting arrays
9) When to use arrays
10) Common errors and misunderstandings

INTRODUCTION
Most mathematical and computer languages have some notation for repeating or other related values. These repeated structures are often called a matrix, a vector, a dimension, or a table; in the SAS data step, this structure is called an array. While every memory address in a computer is an array of sorts, the SAS definition is a group of related variables that are already defined in a data step. Some differences between SAS arrays and those of other languages are that SAS array elements don't need to be contiguous, the same length, or even related at all. All elements of an array must be of the same type, either all character or all numeric.

WHY DO WE NEED ARRAYS?
The use of arrays may allow us to simplify our processing. We can use arrays to help read and analyze repetitive data with a minimum of coding.
An array and a loop can make the program smaller.
For example, suppose we have a file where each record contains 24 values with the temperatures for each hour of the day. These temperatures are in Fahrenheit and we need to convert them to 24 Celsius values. Without arrays we need to repeat the same calculation for all 24 temperature variables:
    data;
        input etc.
        celsius_temp1 = 5/9 * (temp1 - 32);
        celsius_temp2 = 5/9 * (temp2 - 32);
        . . .
        celsius_temp24 = 5/9 * (temp24 - 32);
    run;
An alternative is to define arrays and use a loop to process the calculation for all variables:
    data;
        input etc.
        array temperature_array {24} temp1-temp24;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do i = 1 to 24;
            celsius_array{i} = 5/9 * (temperature_array{i} - 32);
        end;
    run;
While in this example there are only 24 elements in each array, it would work just as well with hundreds of elements. In addition to simplifying the calculations, by defining arrays for the temperature values we could also have used them in the INPUT statement to simplify the input process. It should also be noted that, while TEMP1 is equivalent to the first element, TEMP2 to the second, and so on, the variables do not need to be named consecutively. The array would work just as well with non-consecutive variable names:
    array sample_array {5} x a i r d;
In this example, the variable x is equivalent to the first element, a to the second, and so on.
Arrays may also be used to provide table lookups. For instance, suppose different percentage rates are applied to a representative's sales amounts to determine the commission. The percentages may be stored in an array, and the array then provides a place to look up the appropriate percentage for the commission calculation.

BASIC ARRAY CONCEPTS
Arrays within SAS are different from arrays in other languages. SAS arrays are another way to temporarily group and refer to SAS variables. A SAS array is not a new data structure, the array name is not a variable, and arrays do not define additional variables.
Rather, a SAS array provides a different name to reference a group of variables.
The ARRAY statement defines variables to be processed as a group. The variables referenced by the array are called elements. Once an array is defined, the array name and an index reference the elements of the array. Since similar processing is generally completed on the array elements, references to the array are usually found within DO groups.

ARRAY STATEMENT
The statement used to define an array is the ARRAY statement.
    array array-name {n} <$> <length> array-elements <(initial-values)>;
The ARRAY statement is a compiler statement within the data step. In addition, array references cannot be used in compiler statements such as DROP or KEEP. An array must be defined within the data step prior to being referenced; if an array is referenced without first being defined, an error will occur. Defining an array within one data step and referencing the array within another data step will also cause errors, because arrays exist only for the duration of the data step in which they are defined. Therefore, it is necessary to define an array within every data step where the array will be referenced.
The ARRAY statement provides the following information about the SAS array:
    array-name - Any valid SAS name
    n - Number of elements within the array
    $ - Indicates the elements within the array are character type variables
    length - A common length for the array elements
    elements - List of SAS variables to be part of the array
    initial values - Provides the initial values for each of the array elements
The array name will be used as part of the array reference when the array elements are used in processing. The name must follow the same rules as variable names; therefore, any valid SAS name is a valid array name. When naming an array it is best to avoid a name that is the same as a function name, to avoid confusion.
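The parts of the ARRAY statement listed above can be put together in a short sketch. All names here (array_demo, amounts, labels, amt1-amt4, lbl1-lbl3) are hypothetical and chosen only to illustrate the syntax.

```sas
data array_demo;
    /* numeric array of four elements with initial values */
    array amounts {4} amt1-amt4 (10 20 30 40);

    /* character array: $ marks the type, 8 is the common element length */
    array labels {3} $ 8 lbl1-lbl3 ('low' 'medium' 'high');

    do i = 1 to 4;
        amounts{i} = amounts{i} * 2;     /* same operation on every element */
    end;

    do i = 1 to 3;
        labels{i} = upcase(labels{i});
    end;

    drop i;    /* DROP takes the variable name, not an array reference */
run;

proc print data=array_demo;
run;
```

The printed data set contains only the variables amt1-amt4 and lbl1-lbl3; the array names themselves are not variables and do not appear in the output.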
While parentheses or square brackets can be used when referencing array elements, the braces {} are used most often since they are not used in other SAS statements. SAS does place one restriction on the name of the array. The array name may not be the same name as any variable on the SAS data set.
The elements for an array must be all numeric or all character. When the elements are character it is necessary to indicate this at the time the array is defined by including the dollar sign ($) on the ARRAY statement after the reference to the number of elements. If the dollar sign is not included on the ARRAY statement, the array is assumed to be numeric. When all numeric or all character variables in the data set are to be elements within the array, there are several special variables that may be used instead of listing the individual variables as elements. The special variables are:
    _NUMERIC_ - when all the numeric variables will be used as elements
    _CHARACTER_ - when all the character variables will be used as elements
    _ALL_ - when all variables on the data set will be used as elements and the variables are all the same type
N is the array subscript in the array definition and it refers to the number of elements within the array. A numeric constant, a variable whose value is a number, a numeric SAS expression, or an asterisk (*) may be used as the subscript. The subscript must be enclosed within braces {}, square brackets [], or parentheses (). In our temperature example the subscript is 24:
    array temperature_array {24} temp1-temp24;
When the asterisk is used, it is not necessary to know how many elements are contained within the array. SAS will count the elements.
    array allnums {*} _numeric_;
When it is necessary to know how many elements are in the array, the DIM function can be used to return the count of elements.
    do i = 1 to dim(allnums);
        allnums{i} = round(allnums{i}, .1);
    end;
In this example, when the array ALLNUMS is defined, SAS will count the number of numeric variables used as elements of the array.
Then, in the DO group processing, the DIM function will return the count value as the ending range for the loop.

ARRAY REFERENCES
When an array is defined with the ARRAY statement SAS creates an array reference. The array reference is in the following form:
    array-name{n}
The value of n will be the element's position within the array. For example, in the temperature array defined above the temperature for 1:00 PM is in the variable TEMP13. The array element has been assigned the 13th position within the array. Therefore, the array reference will be:
    temperature_array{13}
The variable name and the array reference are interchangeable. When an array has been defined in a data step either the variable name or the array reference may be used.
    Variable Name    Array Reference
    temp1            temperature_array{1}
    temp2            temperature_array{2}
    temp3            temperature_array{3}
    temp4            temperature_array{4}
    temp5            temperature_array{5}
An array reference may be used within the data step in almost any place other SAS variables may be used, including as an argument to many SAS functions. If the data step does not have an ARRAY statement to define the array and create the array reference, errors will occur. When an array is referenced within a data step, it must be defined with an ARRAY statement in the same data step.

USING ARRAY INDEXES
The array index is the range of array elements. In our temperature example, we are looking at temperatures for each of the 24 hours of the day. The array is defined as:
    array temperature_array {24} temp1-temp24;
Another difference between SAS arrays and arrays in other languages is that subscripts are 1-based by default, where arrays in other languages may be 0-based. When we set the array bounds with the subscript and specify only the number of elements as the upper bound, the lower bound is the default value of 1.
In our example, the index begins with the lower bound of 1 and ends with the upper bound of 24.
There may be scenarios when we want the index to begin at a lower bound other than 1. This is possible by modifying the subscript value when the array is defined. For this example we are using our same temperature variables, but this time we want only the daytime temperatures, 6 through 18. In this example the array is defined as:
    array temperature_array {6:18} temp6-temp18;
The subscript is written as the lower bound and upper bound of the range, separated by a colon.
This technique can simplify array usage by using natural values for the index. Examples of this might be to use a person's age, or to use a year value to get to the correct element.

ONE DIMENSION ARRAYS
A simple array may be created when the variables grouped together conceptually appear as a single row. This is known as a one-dimensional array. Within the Program Data Vector the variable structure may be visualized as:
    temperature_array        {1}     {2}     {3}     {4}     {5}     …  {24}
    Temperature Variables    temp1   temp2   temp3   temp4   temp5   …  temp24
The ARRAY statement to define this one-dimensional array will be:
    array temperature_array {24} temp1-temp24;
The array has 24 elements for the variables TEMP1 through TEMP24. When the array elements are used within the data step the array name and the element number will reference them. The reference to the ninth element in the temperature array is:
    temperature_array{9}

MULTI-DIMENSION ARRAYS
A more complex array may be created in the form of a multi-dimensional array. Multi-dimensional arrays may be created in two or more dimensions. Conceptually, a two-dimensional array appears as a table with rows and columns.
Within the Program Data Vector the variable structure may be visualized as follows, with the rows forming the 1st dimension and the columns the 2nd dimension:
    SALE_ARRAY                       {r,1}    {r,2}    {r,3}    {r,4}    …  {r,12}
    Sales Variables        {1,c}     SALES1   SALES2   SALES3   SALES4   …  SALES12
    Expense Variables      {2,c}     EXP1     EXP2     EXP3     EXP4     …  EXP12
    Commission Variables   {3,c}     COMM1    COMM2    COMM3    COMM4    …  COMM12
The ARRAY statement to define this two-dimensional array will be:
    array sale_array {3, 12} sales1-sales12 exp1-exp12 comm1-comm12;
The array contains three sets of twelve elements. When the array is defined the subscript gives the number of rows (first dimension) and the number of columns (second dimension). The first dimension of the array is the three sets of variable groups: sales, expense, and commission. The second dimension is the 12 values within the group. When the array elements of a multi-dimensional array are used within the data step the array name and the element number for both dimensions will reference them. The reference to the sixth element for the expense group in the sales array is:
    sale_array{2,6}
Three and more dimensions can be defined as well. It should be noted that if a PROC PRINT is run, only the actual variable names are displayed instead of the array elements, so it can sometimes be difficult to visualize the logical structure.

TEMPORARY ARRAYS
A temporary array is an array that exists only for the duration of the data step where it is defined. A temporary array is useful for storing constant values that are used in calculations. In a temporary array there are no corresponding variables to identify the array elements. The elements are defined by the key word _TEMPORARY_.
When the key word _TEMPORARY_ is used, a list of temporary data elements is created in the Program Data Vector.
These elements exist only in the Program Data Vector and are similar to pseudo-variables.
    array        {1}    {2}    {3}    {4}    {5}    {6}
    Variables
    Values       0.05   0.08   0.12   0.20   0.27   0.35
One method of setting the values for a temporary array's elements is to indicate initial values in the ARRAY statement. Therefore, a temporary array of rate values might be defined as follows:
    array rate {6} _temporary_ (0.05 0.08 0.12 0.20 0.27 0.35);
The values for the temporary data elements are automatically retained across iterations of the data step, but do not appear in the output data sets. With a few exceptions, temporary data elements behave in a manner similar to variables. Temporary data elements do not have names; therefore they may only be referenced using the array reference. The asterisk subscript cannot be used when defining a temporary array, and explicit array bounds must be specified for temporary arrays.
We are now able to apply the constant values defined in the array. For example, when a customer is delinquent in payment of their account balance, a penalty is applied. The amount of the penalty depends upon the number of months that the account is delinquent. Without array processing this IF-THEN processing would be required to complete the calculation:
    if month_delinquent eq 1 then balance = balance + (balance * 0.05);
    else if month_delinquent eq 2 then balance = balance + (balance * 0.08);
    else if month_delinquent eq 3 then balance = balance + (balance * 0.12);
    else if month_delinquent eq 4 then balance = balance + (balance * 0.20);
    else if month_delinquent eq 5 then balance = balance + (balance * 0.27);
    else if month_delinquent eq 6 then balance = balance + (balance * 0.35);
By placing the penalty amounts into a temporary array, the code for calculating the new balance can be simplified. The penalty amounts have been stored in the temporary array RATE.
The new account balance with the penalty can now be calculated as:
    array rate {6} _temporary_ (0.05 0.08 0.12 0.20 0.27 0.35);
    if month_delinquent ge 1 and month_delinquent le 6 then
        balance = balance + (balance * rate{month_delinquent});
In addition to simplifying the code, the use of the temporary array also improves performance time.
Setting initial values is not required on the ARRAY statement. The values within a temporary array may be set in another manner within the data step.
    array rateb {6} _temporary_;
    do i = 1 to 6;
        rateb{i} = i * 0.5;
    end;

EXPLICIT VS. IMPLICIT SUBSCRIPTING
Earlier versions of SAS originally defined arrays in a more implicit manner as follows:
    array array-name<(index-variable)> <$> array-elements <(initial-values)>;
In an implicit array, an index variable may be indicated after the array name. This differs from the explicit array previously discussed, where a constant value or an asterisk, as the subscript, denotes the array bounds. When an implicit array is defined, processing for every element in the array may be completed with a DO-OVER statement. The variable specified as the index variable is tied to the array and should only be used as the array index.
Example:
    array item(j) $ 12 x1-x12;
    do over item;
        put item;
    end;
When referencing the array, only the array name is specified. Thus the notation is implied, rather than the more explicit mode described earlier. If a program needs to reference elements by the index variable, that can also be done, but the index must be set in a separate statement.
    array item(j) $ 12 x1-x12;
    do j = 1 to 12;
        put item;
    end;
Because the more cryptic implicit arrays are difficult to understand, explicit arrays are recommended. Implicit array support was left in SAS only to ensure older programs would continue to run.

SORTING ARRAYS
There are several new experimental call routines in SAS 9.1 that can be used to sort across a series of variables. SORTN can be used to sort numeric variables and SORTQ for character fields.
An example of sorting several numeric variables is as follows:
    data _null_;
        array xarry{6} x1-x6;
        set ds1;
        call sortn(of x1-x6);
    run;
When an observation from ds1 is processed the values brought into the Program Data Vector appear as follows:
    xarry        {1}    {2}    {3}    {4}    {5}    {6}
    Variables    x1     x2     x3     x4     x5     x6
    Values       0.27   0.12   0.20   0.08   0.35   0.05
The SORTN call routine will sort the values of the variables in ascending order and replace the Program Data Vector values with the new sorted values. Thus, after the call routine the Program Data Vector will appear as follows:
    xarry        {1}    {2}    {3}    {4}    {5}    {6}
    Variables    x1     x2     x3     x4     x5     x6
    Values       0.05   0.08   0.12   0.20   0.27   0.35
Because the values for the variables are now different, the value selected by the array reference will be affected. For instance, to calculate rate we must add the value in the array reference to 0.75 as follows:
    rate = 0.75 + xarry{i};
If the calculation is completed prior to the SORTN call routine and i is equal to 3, rate would be 0.95. On the other hand, if the same calculation were to be completed after the call routine, rate would be 0.87.

WHEN TO USE ARRAYS
It makes sense to use arrays when there are repetitive values that are related and the programmer needs to iterate through most of them. The combination of arrays and DO loops in the data step lends incredible power to programming. The fact that the variables in the array do not need to be related or even contiguous makes them even more convenient to use.

COMMON ERRORS AND MISUNDERSTANDINGS
Common errors and misunderstandings occur in array processing. This section will review several of these and how they are resolved.

INVALID INDEX RANGE
In the processing of array references, it is easy to use an index value from outside the array bounds.
A typical instance when this occurs is while looping through the array references with DO group processing.
    data dailytemp;
        set tempdata;
        array temperature_array {24} temp1-temp24;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do until (i gt 24);
            i + 1;
            celsius_array{i} = 5 / 9 * (temperature_array{i} - 32);
        end;
        i = 0;
        drop i;
    run;
In this scenario a DO-UNTIL loop is used to process the 24 different temperatures. When a DO-UNTIL loop is processed the expression is not evaluated until the bottom of the loop. After processing for i equal to 24, the expression i gt 24 is still false. Therefore, processing remains within the loop and i will be incremented to 25. Both arrays used in the calculation were defined with only 24 elements. An index value of 25 is greater than the array's upper bound. As a result, the data error message "Array subscript out of range" will be received.
There are two possible resolutions to this scenario. One possibility is to continue using the DO-UNTIL loop, but change the expression to check for I greater than 23:
    do until (i gt 23);
        i + 1;
        celsius_array{i} = 5 / 9 * (temperature_array{i} - 32);
    end;
Another possibility is to modify the loop to DO-WHILE processing. A DO-WHILE loop evaluates the expression first, and only if the expression is true does the processing within the loop continue. Because i is incremented inside the loop, the expression must stop the loop once i has reached 24. Therefore, the modified code for using a DO-WHILE loop would be:
    do while (i lt 24);
        i + 1;
        celsius_array{i} = 5 / 9 * (temperature_array{i} - 32);
    end;

FUNCTION NAME AS AN ARRAY NAME
The use of a function name as an array name on the ARRAY statement may cause unpredictable results. If a function name is used as an array name, SAS will no longer recognize the name as a function name. Therefore, rather than seeing parenthetical values as function arguments, SAS now sees those values as the index value for an array reference.
When a function name is used to define an array name SAS will provide a warning in the log.
If the function is then used within the data step, error messages may be received as a result of SAS attempting to interpret the function arguments as an array reference.
    8    data dailytemp;
    9       set tempdata;
    10      array mean {24} temp1-temp24;
    WARNING: An array is being defined with the same name as a SAS-supplied or user-defined function. Parenthesized references involving this name will be treated as array references and not function references.
    11      do i = 1 to 24;
    12         meantemp = mean(of temp1-temp24);
    ERROR: Too many array subscripts specified for array MEAN.
    13      end;
    14   run;
Avoiding function names as array names is a simple resolution in this scenario.

ARRAY REFERENCED IN MULTIPLE DATA STEPS, BUT DEFINED IN ONLY ONE
An array must be defined with an ARRAY statement within every data step where it is referenced. A sample program contains the following data steps:
    data dailytemp;
        set tempdata;
        array temperature_array {24} temp1-temp24;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do i = 1 to 24;
            celsius_array{i} = 5 / 9 * (temperature_array{i} - 32);
        end;
    run;
    data celsius;
        set dailytemp;
        do i = 1 to 24;
            if celsius_array{i} lt 0 then tempdesc = 'below freezing';
            else if celsius_array{i} gt 0 then tempdesc = 'above freezing';
        end;
    run;
The first data step contains the definition of the array CELSIUS_ARRAY on the second ARRAY statement. References to the array do not cause any issues. The second data step has two references to the array but the array has not been defined within this data step.
The resolution to this issue requires the inclusion of the ARRAY statement to define the array within every data step where it is referenced.
Therefore, the second data step would be modified as follows:
    data celsius;
        set dailytemp;
        array celsius_array {24} celsius_temp1-celsius_temp24;
        do i = 1 to 24;
            if celsius_array{i} lt 0 then tempdesc = 'below freezing';
            else if celsius_array{i} gt 0 then tempdesc = 'above freezing';
        end;
    run;

CONCLUSION
Arrays are not a mystery, but a wonderful tool for data step programmers who need to reference repetitive values.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
    Steve First
    Systems Seminar Consultants, Inc.
    2997 Yarmouth Greenway Dr
    Madison, WI 53711
    Work Phone: (608) 279-9964 x302
    Fax: (608) 278-0065

    Teresa Schudrowitz
    Systems Seminar Consultants, Inc.
    2997 Yarmouth Greenway Dr
    Madison, WI 53711
    Work Phone: (608) 279-9964 x309
    Fax: (608) 278-0065
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
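The two resolutions discussed in the paper, keeping the index inside the array bounds and defining the array in every data step that references it, can be combined in one sketch. This is an illustration under stated assumptions rather than code from the paper: the input data set tempdata is built here from made-up values, and the names hourly, h, and below are hypothetical.

```sas
/* Hypothetical input: one observation with 24 hourly Fahrenheit readings */
data tempdata;
    array hourly {24} temp1-temp24;
    do h = 1 to 24;
        hourly{h} = 20 + 2 * h;      /* made-up values */
    end;
    drop h;
run;

data dailytemp;
    set tempdata;
    array temperature_array {*} temp1-temp24;
    array celsius_array {24} celsius_temp1-celsius_temp24;
    /* DIM supplies the upper bound, so the index cannot pass the array */
    do i = 1 to dim(temperature_array);
        celsius_array{i} = 5 / 9 * (temperature_array{i} - 32);
    end;
    drop i;
run;

data celsius;
    set dailytemp;
    /* the array is defined again before it is referenced in this step */
    array celsius_array {*} celsius_temp1-celsius_temp24;
    below = 0;
    do i = 1 to dim(celsius_array);
        if celsius_array{i} lt 0 then below = below + 1;
    end;
    drop i;
run;

proc print data=celsius;
    var below;
run;
```

Using DIM as the loop bound means the loop stays correct if the number of array elements changes later.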
About the Tutorial
SAS is a leader in business analytics. Through innovative analytics, it caters to business intelligence and data management software and services. SAS transforms data into insight which can give a fresh perspective to business.
Unlike other BI tools available in the market, SAS takes an extensive programming approach to data transformation and analysis rather than a drag-drop-connect approach. This makes it stand out from the crowd with enhanced control over data manipulation. SAS has a very large number of components customized for specific industries and data analysis tasks.

Audience
This tutorial is designed for all those readers who want to read and transform raw data to produce insights for business using SAS. Readers who aspire to become Data Analysts or Data Scientists can also draw benefits from this tutorial.

Prerequisites
Before proceeding with this tutorial, you should have a basic understanding of computer programming terminology. A basic understanding of any programming language will help you understand the SAS programming concepts. Familiarity with SQL will be an added benefit.

Disclaimer & Copyright
Copyright 2016 by Tutorials Point (I) Pvt. Ltd.
All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying, distributing or republishing any contents or a part of contents of this e-book in any manner without written consent of the publisher.
We strive to update the contents of our website and tutorials as timely and as precisely as possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial.
If you discover any errors on our website or in this tutorial, ******************************************.

Table of Contents
About the Tutorial
Audience
Prerequisites
Disclaimer & Copyright
Table of Contents
1. SAS – Overview
    Uses of SAS
    Types of SAS Software
    Libraries in SAS
2. SAS – Environment
    Download SAS University Edition
    The SAS Environment
3. SAS – User Interface
    SAS Main Window
    Code Autocomplete
    Program Execution
    Program Log
    Program Result
    Program Tabs
4. SAS – Program Structure
    SAS Program Structure
    DATA Step
    PROC Step
    The OUTPUT Step
    The Complete SAS Program
    Program Output
5. SAS – Basic Syntax
    SAS Statements
    SAS Variable Names
    SAS Data Set
    SAS File Extensions
    Comments in SAS
6. SAS – Data Sets
    SAS Built-In Data Sets
    Importing External Data Sets
7. SAS – Variables
    SAS Variable Types
    Use of Variables in SAS Program
    Using the Variables
8. SAS – Strings
    Declaring String Variables
    String Functions
9. SAS – Arrays
    Accessing Array Values
    Using the OF operator
    Using the IN operator
10. SAS – Numeric Formats
    Reading Numeric formats
    Displaying Numeric formats
11. SAS – Operators
    Arithmetic Operators
    Logical Operators
    Comparison Operators
    Minimum/Maximum Operators
    Concatenation Operator
    Operators Precedence
12. SAS – Loops
    Flow Diagram
    SAS – DO Index Loop
    SAS – DO WHILE Loop
    SAS – DO UNTIL Loop
13. SAS – Decision Making
    SAS – IF Statement
    SAS – IF THEN ELSE Statement
    SAS – IF THEN ELSE IF Statement
    SAS – IF-THEN-DELETE Statement
14. SAS – Functions
    Function Categories
    Mathematical Functions
    Date and Time Functions
    Character Functions
    Truncation Functions
    Miscellaneous Functions
15. SAS – Input Methods
    List Input Method
    Named Input Method
    Column Input Method
    Formatted Input Method
16. SAS – Macros
    Macro Variables
    Local Macro Variable
    Macro Programs
    Commonly Used Macros
    Macro % RETURN
    Macro % END
17. SAS – Dates & Times
    SAS Date Informat
    SAS Date output format
SAS DATA SET OPERATIONS
18. SAS – Read Raw Data
    Reading ASCII (Text) Data Set
    Reading Delimited Data
    Reading Excel Data
    Reading Hierarchical Files
19. SAS – Write Data Sets
    PROC EXPORT
    Writing a CSV file
    Writing a Tab Delimited File
20. SAS – Concatenate Data Sets
21. SAS – Merge Data Sets
    Data Merging
22. SAS – Subsetting Data Sets
    Subsetting Variables
    Subsetting Observations
23. SAS – Sort Data Sets
    Reverse Sorting
    Sorting Multiple Variables
24. SAS – Format Data Sets
    Using PROC FORMAT
25. SAS – SQL
    SQL Create Operation
    SQL Read Operation
    SQL SELECT with WHERE Clause
    SQL UPDATE Operation
    SQL DELETE Operation
26. SAS – ODS
    Creating HTML Output
    Creating PDF Output
    Creating RTF (Word) Output
27. SAS – Simulations
SAS DATA REPRESENTATION
28. SAS – Histograms
    Simple Histogram
    Histogram with Curve Fitting
29. SAS – Bar Charts
    Simple Bar chart
    Stacked Bar chart
    Clustered Bar chart
30. SAS – Pie Charts
    Simple Pie Chart
    Pie Chart with Data Labels
    Grouped Pie Chart
31. SAS – Scatter Plots
    Simple Scatterplot
    Scatterplot with Prediction
    Scatter Matrix
32. SAS – Boxplots
    Simple Boxplot
    Boxplot in Vertical Panels
    Boxplot in Horizontal Panels
SAS BASIC STATISTICAL PROCEDURE
33. SAS – Arithmetic Mean
    Mean of a Dataset
    Mean of Select Variables
    Mean by Class
34. SAS – Standard Deviation
    Using PROC MEANS
    Using PROC SURVEYMEANS
    Using BY Option
35. SAS – Frequency Distributions
    Single Variable Frequency Distribution
    Multiple Variable Frequency Distribution
    Frequency Distribution with Weight
36. SAS – Cross Tabulations
    Cross Tabulation of 3 Variables
    Cross Tabulation of 4 Variables
37. SAS – T-tests
    Paired T-test
    Two Sample T-test
38. SAS – Correlation Analysis
    Correlation Between All Variables
    Correlation Matrix
39. SAS – Linear Regression
40. SAS – Bland-Altman Analysis
    Enhanced Model
41. SAS – Chi-Square
    Two-Way Chi-Square
42. SAS – Fisher's Exact Tests
    Applying Fisher Exact Test
43. SAS – Repeated Measure Analysis
44. SAS – One Way Anova
    Applying ANOVA
    Applying ANOVA with MEANS
45. SAS – Hypothesis Testing

1. SAS – Overview
SAS originally stood for Statistical Analysis System. Its development began in 1966 at North Carolina State University, and SAS Institute was founded in 1976. SAS is used for data management, business intelligence, predictive analysis, descriptive and prescriptive analysis, and more. Over time, many new statistical procedures and components have been introduced in the software.
With the introduction of JMP (Jump) for statistics, SAS took advantage of the graphical user interface (GUI) which was introduced by the Macintosh. JMP is mainly used for applications like Six Sigma, designs, quality control, and engineering and scientific analysis. SAS is platform independent, which means you can run SAS on operating systems such as Linux or Windows. SAS is driven by SAS programmers, who apply sequences of operations to SAS data sets to produce reports for data analysis.
Over the years SAS has added numerous solutions to its product portfolio. It has solutions for Data Governance, Data Quality, Big Data Analytics, Text Mining, Fraud Management, Health Science, etc. We can say that SAS has a solution for every business domain.
To have a glance at the list of products available you can visit SAS Components.

Uses of SAS
SAS basically works on large datasets. With the help of SAS software, you can perform various operations on data.
Some of the operations include:
∙ Data management
∙ Statistical analysis
∙ Report generation with graphics
∙ Business planning
∙ Operations research and project management
∙ Quality improvement
∙ Application development
∙ Data extraction
∙ Data transformation
∙ Data updating and modification

SAS has more than 200 components.

Types of SAS Software

Let us now look at the different types of SAS software.
∙ Windows or PC SAS
∙ SAS EG (Enterprise Guide)
∙ SAS EM (Enterprise Miner, i.e. for predictive analysis)
∙ SAS Means
∙ SAS Stats

Windows SAS is used in large organizations and in training institutes. A few organizations also use SAS on Linux, but there is no graphical user interface, so you have to write code for every query. Windows SAS includes many utilities that help programmers and reduce the time spent writing code. A SAS window has 5 parts.

Libraries in SAS

Libraries are storage locations in SAS. You can create a library and save all similar programs in it, and you can create multiple libraries. A SAS library name can be at most 8 characters long. There are two types of libraries available in SAS:

2. SAS

SAS Institute Inc. has released a free SAS University Edition, which provides a platform for learning SAS programming. It provides all the features you need to learn Base SAS programming, which in turn enables you to learn any other SAS component.

Downloading and installing SAS University Edition is simple. It is distributed as a virtual machine that runs in a virtual environment, so virtualization software must already be installed on your PC before you can run it. In this tutorial we use VMware. The steps below cover downloading, setting up the SAS environment and verifying the installation.

Download SAS University Edition

SAS University Edition is available for download at the URL SAS University Edition.
Please scroll down to read the system requirements before you begin the download.

Setup virtualization software

Scroll down on the same page to locate installation step 1. This step provides links to suitable virtualization software. If you already have one of these installed on your system, you can skip this step.

Quick start virtualization software

If you are completely new to virtualization, you can familiarize yourself with it through the guides and videos available as step 2. You can skip this step if you are already familiar with it.

Download the Zip file

In step 3, choose the version of SAS University Edition compatible with your virtualization environment. It downloads as a zip file with a name similar to unvbasicvapp__9411005__vmx__en__sp0__1.zip

Unzip the Zip file

Unzip the file and store its contents in an appropriate directory. In our case we chose the VMware zip file.

Start the VMware player (or workstation) and open the file that ends with the extension .vmx. Note the basic settings, such as the memory and hard disk space allocated to the VM. Click "Power on this virtual machine" next to the green arrow to start the virtual machine. While the SAS VM is loading, a progress screen is shown; once it is running, the VM gives a prompt to go to a URL that opens the SAS environment.

Starting SAS Studio

Open a new browser tab and load the above URL (which differs from one PC to another).
A screen appears indicating that the SAS environment is ready.

The SAS Environment

On clicking Start SAS Studio, we get the SAS environment, which by default opens in the visual programmer mode. We can change it to the SAS programmer mode by clicking on the dropdown. We are now ready to write SAS programs.

3. SAS

SAS programs are created using a user interface known as SAS Studio. In this chapter we discuss the various windows of the SAS user interface and their usage.

SAS Main Window

This is the window you see on entering the SAS environment. The Navigation Pane is on the left and is used to navigate the various programming features. The Work Area is on the right and is used for writing and executing code.

Code Autocomplete

This feature helps in getting the correct syntax of SAS keywords and also provides links to the documentation for the keywords.

Program Execution

Code is executed by pressing the run icon, which is the first icon from the left, or the F3 key.

Program Log

The log of the executed code is available under the Log tab. It lists the errors, warnings and notes about the program's execution. This is where you find the clues to troubleshoot your code.

Program Result

The result of the code execution is shown in the Results tab. By default, results are formatted as HTML tables.
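SAS DATA statement (practical edition)

Contents
1. Basic concepts of the SAS DATA step
2. Syntax of the SAS DATA step
3. An application example of the SAS DATA step

Main text

1. Basic concepts of the SAS DATA step

SAS (Statistical Analysis System) is software widely used for data processing, analysis and modeling. In SAS, the DATA step is the basic step for reading, organizing and manipulating data. It is the core of a SAS program: it can read data from a variety of sources, clean and reshape it, and store the processed data in a SAS data set.

2. Syntax of the SAS DATA step

The basic syntax of a DATA step is:
```
data dataset-name;
```
Here dataset-name is a user-chosen data set name made up of letters, digits and underscores; it must begin with a letter or an underscore. Inside the DATA step, SAS functions and statements can be used to process the data. A simple DATA step looks like this:
```
data example;
    /* comma-delimited file; FIRSTOBS=2 skips the header line */
    infile "data.csv" dlm="," firstobs=2;
    input var1 var2 var3;
    output;
run;
```
This example reads data from a CSV file named "data.csv" and stores it in a SAS data set named "example".

3. An application example of the SAS DATA step

The following DATA step reads data from a CSV file and computes the mean of the values on each line:
```
data example;
    infile "data.csv" dlm="," firstobs=2;
    input var1 var2 var3;
    /* assignment statement: mean of the three values in this observation */
    mean = mean(var1, var2, var3);
    output;
run;
```
Here we first read the data from the CSV file, then use the MEAN function in an assignment statement to compute the mean of the three values and store the result in a new variable named "mean". Finally, the processed observation is written to the SAS data set.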
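The read-then-compute pattern described above can also be tried without an external file. The following is a minimal sketch, assuming inline data; the data set name `example_inline` and the values are made up for illustration and do not come from the original note.

```sas
/* Hypothetical data set name and values, for illustration only */
data example_inline;
    /* read three numeric values from each inline data line */
    input var1 var2 var3;
    /* MEAN() averages its non-missing arguments */
    mean = mean(var1, var2, var3);
    datalines;
1 2 3
4 5 6
;
run;

/* list the resulting observations, including the new variable */
proc print data=example_inline noobs;
run;
```

An assignment statement with the MEAN function is used here because it ignores missing arguments, whereas an expression such as (var1 + var2 + var3) / 3 would return missing if any value were missing.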
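I. SAS statements

A SAS program consists of DATA steps and PROC steps. DATA steps create data sets and compute and organize data; PROC steps analyze data and produce reports.

The basic unit of the SAS language is the statement. A SAS statement generally begins with a keyword (such as DATA, PROC, INPUT, CARDS, BY), contains SAS names, special characters, operators and so on, and ends with a semicolon. SAS keywords are special words used at the start of SAS statements; every statement except the assignment, sum, comment and null statements begins with a keyword.

SAS names identify the components of a SAS program, such as variables, data sets and libraries. A SAS name consists of 1 to 8 letters, digits and underscores, and its first character must be a letter or an underscore. (The 8-character limit dates from SAS version 6; later versions allow variable and data set names of up to 32 characters, while library and file reference names are still limited to 8.) Neither SAS keywords nor SAS names are case-sensitive.

II. SAS expressions

Computation in a DATA step program is done with expressions. An expression combines constants, variables and function calls with operators and parentheses to produce a result.

Constants

SAS constants are mainly of two kinds, numeric and character; there are also notations for dates and times.

Numeric: a numeric constant can be written as an integer, a fixed-point real number or a real number in scientific notation, for example 12, -7.5, 2.5E-10.

Character: a character constant is a sequence of characters enclosed in single or double quotation marks, for example 'Beijing', "Li Ming", "李明".

Date and time: a date constant is a quoted date string followed by the letter d (upper or lower case), with no space in between. A time constant is a quoted time string followed by the letter t. A datetime constant is a quoted date and time string followed by the letters dt.
Date: '13JUL1998'd
Time: '14:20't
Datetime: '13JUL1998:14:20:32'dt

Because SAS is a data processing language and real data often contain missing values (a value that was not observed, a respondent who declined to answer, and so on), SAS uses a single period to represent the numeric missing value constant.

Variables

SAS variables have two basic types: numeric and character. Date and time variables are stored as numeric values: a date is the number of days since 1 January 1960, a time is the number of seconds since midnight, and a datetime is the number of seconds since midnight on 1 January 1960.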
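The constants and the numeric storage of dates described above can be checked in the log with a short DATA step. This is an illustrative sketch; the variable names are arbitrary and not taken from the original text.

```sas
data _null_;
    d0 = '01JAN1960'd;            /* date constant: the reference date, stored as 0 */
    d  = '13JUL1998'd;            /* stored as the number of days since 01JAN1960 */
    t  = '14:20't;                /* stored as the number of seconds since midnight */
    dt = '13JUL1998:14:20:32'dt;  /* stored as seconds since 01JAN1960:00:00:00 */
    m  = .;                       /* a single period is the numeric missing value */

    /* print the raw numbers, then the same values with formats applied */
    put d0= d= t= dt=;
    put d=date9. t=time5. dt=datetime18.;

    if m = . then put 'm is missing';
run;
```

Printing each value once without and once with a format shows that a format changes only how the number is displayed, not what is stored.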
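Overview of the SAS language

SAS provides a complete programming language. As with high-level computer languages, a SAS user only needs to know its commands, statements and simple syntax rules to do data management and analysis work. Mastering SAS programming is therefore the key step in learning SAS. SAS packages most of the commonly used complex computational algorithms as standard procedures; the user only has to give the procedure name and its required parameters. This makes SAS programming very simple.

I. SAS programs

A SAS program is an ordered collection of SAS statements. It can be divided into two parts:
1. DATA step
2. PROC step
A SAS program usually contains one DATA step and one PROC step, and sometimes several of each. The DATA step prepares data for the PROC step and places the prepared data in a data set; the PROC step processes the data in the specified data set and outputs the results.

II. SAS statements

A SAS statement begins with a SAS keyword, is followed by SAS names, special characters or operators, and ends with a semicolon. A statement specifies an operation or provides some information to the system.

1. SAS keywords
A keyword is a word to which the system has assigned a fixed meaning. In the SAS language, except for assignment, sum and comment statements, most statements begin with their keyword. DATA, FORMAT, PROC and INFILE, for example, are the keywords of the corresponding statements.

2. SAS names
The SAS names that can appear in SAS statements are variable names, data set names, format names, procedure names, option names, array names and statement labels. There are also library reference names and file reference names, which are SAS's special names for files. A SAS name is a string that begins with a letter or an underscore, followed by letters, digits or underscores, and has no more than eight characters. Blanks and special characters (such as $, @, #) are not allowed in SAS names. In addition, SAS reserves some special variable names with fixed meanings; these names begin and end with an underscore. For example, _N_ is the number of times the DATA step has iterated.

III. Notation used to describe statements

(1) Keywords are written in English; when writing a program they must be spelled exactly as given.
(2) Items in [ ] are optional.
(3) … indicates that there are multiple items.

IV. SAS data sets

A "SAS data set" is a specific kind of data file in SAS.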
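The division of labor between the two kinds of step can be sketched as follows. The data set name `scores`, its variables and its values are hypothetical and chosen only for illustration.

```sas
/* DATA step: prepare the data and store it in a data set */
data scores;
    input name $ score;
    datalines;
Li 82
Wang 91
Zhao 77
;
run;

/* PROC step: process the data set prepared above and print the results */
proc means data=scores n mean min max;
    var score;
run;
```

Each step ends at its RUN statement, so the data set written by the DATA step is complete before the procedure reads it.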
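As a small illustration of the automatic variable _N_, the sketch below copies it into an ordinary variable on every iteration of the DATA step. The names `numbered`, `x` and `obs_no` are made up for this example.

```sas
data numbered;
    input x;
    /* _N_ counts DATA step iterations; it is not written to the output
       data set, so copy it to a regular variable if it should be kept */
    obs_no = _n_;
    if _n_ = 1 then put 'first iteration of the DATA step';
    datalines;
10
20
30
;
run;

proc print data=numbered noobs;
run;
```

Because exactly one line is read per iteration here, obs_no matches the observation number; in steps that read several records per iteration or delete observations, the two can differ.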
Paper CC-17

Arrays – Data Step Efficiency

Harry Droogendyk, Stratia Consulting Inc., Lynden, ON

ABSTRACT

Arrays are a facility common to many programming languages, useful for programming efficiency. SAS® data step arrays have a number of unique characteristics that make them especially useful in enhancing your coding productivity. This presentation will provide a useful tutorial on the rationale for arrays and their definition and use.

INTRODUCTION

Most of the row-wise functionality required for data collection, reporting and analysis is handled very efficiently by the various procedures available within SAS. Column-wise processing is a different story. Data may be supplied in forms that require the user to process across the columns within a single row to achieve the desired result. Column-wise processing typically requires the user to employ the power and flexibility of the SAS data step.

Even within the SAS data step, different methods with varying degrees of coding and execution efficiency can be employed to deal with column-wise processing requirements. Data step arrays are unparalleled in their ability to provide efficient, flexible coding techniques to work with data across columns, e.g. a single row of data containing 12 columns of monthly data.

This paper will cover array definition, initialization, use and the efficiencies afforded by their use.

DEFINING ARRAYS

Array definition in SAS is quite different from that in most other computer languages. In other languages, arrays are typically a temporary container that holds single or multi-dimensional data and has convenient indexing capabilities, which may be used to iterate over the data items within or to directly reference specific items.
SAS's _temporary_ array structures are most like the definitions found in other languages, but SAS provides much more array functionality than is found in other languages.

The basic array definition statement is as follows:

    ARRAY array_name (n) <$> <length> array-elements <(initial-values)>;

- array_name       any valid SAS variable name
- (n)              number of array elements ( optional in some cases )
- $                indicates a character array
- length           length of each array element ( optional in some cases )
- array-elements   _temporary_ or SAS variable name(s) or variable lists ( optional )
- initial-values   array initial values

SAS arrays must contain either numeric or character data, not a mixture of both.

TEMPORARY ARRAY DEFINITION

Temporary arrays provide convenient tabular data stores which persist only while the data step is executing. The "variables" created by the temporary array are automatically dropped from the output dataset. Temporary arrays are often used to act as accumulators, store factors etc., when the items being referenced are not required to be stored with the data. Temporary arrays are automatically retained across data step iterations, i.e. the element values are not set to missing at the top of the data step like most other data step variables.

    array factors(12,3) _temporary_ ;
    array desc(12) $15 _temporary_ ;

The first statement defines a two-dimension numeric array. The second defines a 12 element character array, each element 15 bytes in length.

PDV VARIABLE ARRAY DEFINITION

Non-temporary data step arrays demonstrate the real flexibility and versatility of SAS arrays. These types of arrays can refer to a series of variables already present in the program data vector ( PDV ) or even create variables while permitting those variables to be referenced via array structures.
Some examples will illustrate the possibilities. Consider an existing dataset containing four columns of quarterly metrics.

In the data step below, the data step compiler recognizes the columns in the incoming dataset ( SET statement ) and adds them to the PDV. The array statement references the four quarterly variables and makes them available within an array structure simply called Q. It is not necessary to define the number of elements in the array in this case; SAS will determine the size of the array from the number of array elements defined by the qtr1 – qtr4 variable list. Note the variable name displayed when the array elements are PUT into the log. The array statement has not created any additional variables, but has merely made the pre-existing PDV variables available via a convenient array structure. The array definition can be seen as a logical structure overlaying the physical variables.

    data _null_;
        set quarterly_data;
        array q qtr1-qtr4;
        do _i = 1 to 4;
            put cust_id= @13 q(_i)=;
        end;
    run;

    cust_id=1   qtr1=185
    cust_id=1   qtr2=971
    cust_id=1   qtr3=400
    cust_id=1   qtr4=260
    cust_id=2   qtr1=922
    cust_id=2   qtr2=970
    cust_id=2   qtr3=543

In the following data step, the ARRAY statement will create the qtr1, qtr2, qtr3 and qtr4 variables in the PDV as per the array definition, since they don't already exist.

    data quarterly_data;
        length cust_id 8;
        array q 8 qtr1-qtr4;
        do cust_id = 1 to 7;
            do _i = 1 to 4;
                q(_i) = ceil(ranuni(1) * 1000);
            end;
            output;
        end;
        drop _: ;
    run;

The following statement also defines the variables, but uses the array name to provide the prefix of the PDV variable name and suffixes the name with consecutive numbers. Again, qtr1, qtr2, qtr3, qtr4 will be created at compile time.

    array qtr(4) 8;

The "QTR" array name might look familiar – that's because it's also a built-in SAS function.
The compiler takes note of that fact and warns the user that the QTR() function is unavailable in this data step because the array reference will take precedence.

    366  array qtr(4) 8;
    NOTE: The array qtr has the same name as a SAS-supplied or user-defined function. Parentheses
          following this name are treated as array references and not function references.

The first example showed the use of the qtr1 – qtr4 variable list to define the array elements. In the same way, any variable list syntax may be used:

    array desc _character_;   * character vars already defined in PDV;
    array amts _numeric_;     * numeric vars already defined in PDV;

Arrays may be made up of variables of different lengths. Note the variable list below, which specifies that all character variables between flag_x and desc ( as defined by the PDV order ) be part of the LST array, though each character variable has a different length and they are not contiguous.

    length flag_x $10 amt 8 name $20 dol 4 desc $60;
    array lst flag_x-character-desc;

An instance of a numeric array made up of various numeric variables, i.e. without a common prefix like "qtr":

    array various discount refund pre_tax after_tax payable received;

By default data step arrays are indexed starting at 1, i.e. the first element in the array above would be referenced by various(1). The array size, i.e. the number of elements, may be derived using the DIM() function. However, it's often advantageous to begin indexing an array with a value other than 1, e.g. in the event we wanted to use the age of school-age children ( 5-18 years old ) directly as an index into an array. An example will make this clearer.
Note the use of the various functions which describe the array:

- DIM()      the number of array elements
- LBOUND()   the index value of the lower array boundary
- HBOUND()   the index value of the upper array boundary

    data _null_;
        array ages (5:18) _temporary_;
        d = dim(ages);
        l = lbound(ages);
        h = hbound(ages);
        put 'Ages array has ' d ' elements, bounded by ' l ' and ' h;
        do until(done);
            set sashelp.class end = done;
            ages(age) + 1;
        end;
        do i = lbound(ages) to hbound(ages);
            put i 2. @7 ages(i);
        end;
        stop;
    run;

The log results follow, showing the array bounds and the number of children at each age.

    Ages array has 14 elements, bounded by 5 and 18
     5    .
    <snip>
    10    .
    11    2
    12    5
    13    3
    14    4
    15    4
    16    1
    17    .
    18    .

Multi-dimensional arrays are defined and dealt with in a very similar manner. Note the extra parameter in the DIM() function to refer to the 2nd dimension. In similar fashion, both LBOUND() and HBOUND() may use a 2nd function parameter to refer to a specific dimension as well. Log results follow.

    data _null_;
        array ages(5:18,2) _temporary_;
        array gender(2) $1 _temporary_ ('M','F');
        d1 = dim(ages);      * number of elements in 1st dimension;
        d2 = dim(ages,2);    * number of elements in 2nd dimension;
        l = lbound(ages);
        h = hbound(ages);
        put 'Ages array has ' d1 ' by ' d2 ' elements, bounded by ' l ' and ' h;
        do until(done);
            set sashelp.class end = done;
            if sex = 'M' then s = 1; else s = 2;
            ages(age,s) + 1;
        end;
        put 'Age' @6 'Male' @12 'Female';
        do i = lbound(ages) to hbound(ages);
            put i @7 ages(i,1) 1. @15 ages(i,2);
        end;
        stop;
    run;

    Ages array has 14 by 2 elements, bounded by 5 and 18
    Age  Male  Female
    <snip>
    10    .       .
    11    1       1
    12    3       2
    13    1       2
    14    2       2
    15    2       2
    16    1       .
    17    .       .
    18    .       .

ARRAY VALUE INITIALIZATION

Arrays may be initialized at compile time with specific values.
Since arrays based on PDV variables are really just a method of referencing "normal" variables, the elements in PDV arrays will be set to missing at the top of the data step as per the usual data step rules.

It's helpful to assign initial values to the array at compile time since that saves a step during execution. The first two array initialization statements below accomplish the same thing, i.e. (12*0) is the equivalent of twelve zeroes:

    data _null_;
        array mths1 (12) _temporary_ (12*0);
        array mths2 (12) _temporary_ (0 0 0 0 0 0 0 0 0 0 0 0);
        array mth_qtr (12) _temporary_ (3*1 3*2 3*3 3*4);
        do i = 1 to 12;
            put mths1(i)= @14 mths2(i)= @27 mth_qtr(i)=;
        end;
    run;

    mths1[1]=0   mths2[1]=0   mth_qtr[1]=1
    mths1[2]=0   mths2[2]=0   mth_qtr[2]=1
    mths1[3]=0   mths2[3]=0   mth_qtr[3]=1
    mths1[4]=0   mths2[4]=0   mth_qtr[4]=2
    <snip>
    mths1[11]=0  mths2[11]=0  mth_qtr[11]=4
    mths1[12]=0  mths2[12]=0  mth_qtr[12]=4

EFFICIENCIES

Using arrays in the data step will result in coding efficiency, more maintainable code and, at times, data-driven code. Data steps that make good candidates for array processing are those that contain "wallpaper code", i.e. a repeating pattern of code where multiple values of the same type are processed, e.g. month1 to month12 values, or calculations involving factors that can be indexed by another data value ( e.g. age ).

In addition, the SAS summary functions such as SUM() are able to receive an entire array as an input parameter, thus simplifying the code.

STRIPPING WALLPAPER

The most common use for arrays is to iterate over a series of similar variables, performing the same operation on each, without having to code each variable manually.

    data monthly_sales;
        set sales;
        if month1 = . then month1 = 0; else month1 = round(month1,.1);
        if month2 = . then month2 = 0; else month2 = round(month2,.1);
        ....
        if month11 = . then month11 = 0; else month11 = round(month11,.1);
        if month12 = . then month12 = 0; else month12 = round(month12,.1);
    run;

The use of an array reduces the lines of code from 12 to 5:

    data monthly_sales;
        set sales;
        array m month1-month12;
        do _i = 1 to dim(m);
            if m(_i) = . then m(_i) = 0; else m(_i) = round(m(_i),.1);
        end;
        drop _: ;
    run;

It may be necessary to assign a different factor to a metric depending on a third data item, e.g. dosage variance by age. Rather than coding a series of IF / THEN / ELSE or SELECT / WHEN statements, an array can be employed to look up the dosage values.

    data dosages;
        set sashelp.class;
        array dosages (5:18) _temporary_ ( .5 .5 .6 .7 1 1.2 1.3 1.5
                                           1.6 1.8 2.1 2.3 2.7 3.0 );
        dosage = dosages(age);
    run;

    proc print data = dosages noobs;
        var name age dosage;
    run;

    Name       Age    dosage
    Alfred      14      1.8
    Alice       13      1.6
    Barbara     13      1.6
    Carol       14      1.8
    Henry       14      1.8
    James       12      1.5
    Jane        12      1.5
    Janet       15      2.1
    Jeffrey     13      1.6
    John        12      1.5
    Joyce       11      1.3
    Judy        14      1.8
    Louise      12      1.5
    Mary        15      2.1
    Philip      16      2.3
    Robert      12      1.5
    Ronald      15      2.1
    Thomas      11      1.3
    William     15      2.1

FUNCTIONS WITH ARRAYS

Many SAS functions can process an entire array with a very simple coding reference, somewhat akin to how variable lists can be processed by these same functions. In the second data step of the example below, the SUM() function is coded two different ways and there's no real advantage to using an array.

    data sales;
        do cust_id = 1 to 10;
            array mth(12) 8;    * defines mth1 - mth12 to PDV ;
            do _i = 1 to 12;
                mth(_i) = ceil(ranuni(1) * 1000);
            end;
            output;
        end;
        drop _: ;
    run;

    data monthly_sales;
        set sales;
        array m mth1-mth12;
        annual_sum_array = sum(of m(*) );
        annual_sum_list  = sum(of mth: );
        drop mth: ;
    run;

Results of the array and variable list summation are identical.

But if the monthly sales variables were named JAN, FEB, MAR etc. and mixed through the dataset among other numeric variables, there would be no way to reference them via a variable list, and arrays would be the only method of referencing them efficiently, e.g.

    array m jan feb mar … dec;

SAS also provides call routines to sort array contents in ascending order: SORTC and SORTN for character and numeric arrays respectively.

    data _null_;
        array name(8) $10 ('George' 'nancy' 'Susan' 'george' 'James' 'Robert' 'Rob' 'Fred');
        call sortc(of name(*));
        put +3 name(*);
    run;

    Fred George James Rob Robert Susan george nancy

SPECIAL ARRAY FUNCTIONS

It's all well and good to define an array that contains a number of character or numeric variables in the PDV. But sometimes, when the array is processed, it's very helpful to be able to identify the characteristics of the specific variables underlying each array element. There are a number of functions available, some listed below, that surface the particulars of the variable underlying the array reference. See SAS Online Documentation for a complete list:

- VNAME()     variable name
- VLABEL()    label of the variable
- VFORMAT()   format applied to variable
- VLENGTH()   defined variable length

The real-world example below uses arrays and the VFORMAT function to create a fixed length text file.

Define the skeleton dataset containing the variables and their formats:

    data dialer_skeleton;
        format num $12. name $20. expiry $10. msg $15. rep $3.;
        array txtfile _character_;
        call symputx('dim',dim(txtfile));    * capture array length;
        stop;
    run;

Define the test data:

    data dialer_input;
        length num $10 name $20 msg $15 rep $2;
        num = '9052941234'; name = 'George Smith'; expiry = '31Oct2011'd;
        msg = '1st call'; rep = '01'; output;
        num = '5196471212'; name = 'Susan Jones'; expiry = '01Nov2011'd;
        msg = 'Second call'; rep = '01'; output;
        num = '4162235678'; name = 'Sam Brown'; expiry = '02Nov2011'd;
        msg = 'New customer'; rep = '2'; output;
        num = '6137219988'; name = 'Brian Tait'; expiry = '29Oct2011'd;
        msg = 'Attriter'; rep = '2'; output;
    run;

Create output text data.
Note the use of the VFORMAT function to retrieve the formatted length, to ensure each data item will be written with the correct length to line up properly in the fixed-width text output.

    %macro txt;
    data dialer ( drop = _: );
        if 0 then set dialer_skeleton;      * set variable lengths/formats;
        array txtfile _character_;          * make available in array;
        set dialer_input ( rename = ( expiry = _exp ));
        expiry = put(_exp,yymmddd10.);
        length record $100;
        record = putc(txtfile(1),vformat(txtfile(1)))
        %do i = 2 %to &dim;                 * loop limit from skeleton step;
            || putc(txtfile(&i),vformat(txtfile(&i)))
        %end;
        ;
        put record;
    run;
    %mend txt;
    %txt

Results:

    9052941234  George Smith        2011-10-311st call       01
    5196471212  Susan Jones         2011-11-01Second call    01
    4162235678  Sam Brown           2011-11-02New customer   2
    6137219988  Brian Tait          2011-10-29Attriter       2

The flexibility of this method comes into play when the text output requirements change. The only code that needs to be modified is the dialer_skeleton data step. Since the data step that actually creates the text output has no hard-coded variables or lengths, any changes will be picked up automatically. Less maintenance = fewer errors.

CONCLUSION

SAS data step arrays are simple to define and provide many opportunities for coding efficiency. Common physical data characteristics may be overlaid with logical array definitions which permit the use of looping constructs and functions to write efficient, flexible and maintainable code. Arrays have other uses as well, which have been covered by other fine SAS user group papers, e.g. data step transpose. Search the SAS user group proceedings for "array" for additional reading.

REFERENCES

Steve First and Teresa Schudrowitz, "Arrays Made Easy: An Introduction to Arrays and Array Processing", /proceedings/sugi30/242-30.pdf

SAS Institute 2009, "SAS 9.2 Language Reference: Dictionary", /documentation/cdl/en/lrdict/62618/HTML/default/titlepage.htm

CONTACT INFORMATION

Your comments and questions are valued and encouraged.
Contact the author at:

Harry Droogendyk
Stratia Consulting Inc.
www.stratia.ca

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.