SAS Lecture Notes: Lesson 9
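Lesson 1: Basic Concepts of the SAS Software

I. What is SAS?
SAS: Statistical Analysis System
✧ It is an integrated software system made up of multiple functional modules.
✧ Its basic component is the Base SAS module.
✧ The Base SAS module is the core of the SAS system: it handles the main data management tasks, manages the user environment, processes the user's language statements, and calls other SAS modules and products.
✧ It offers flexible extension interfaces and powerful functional modules:
SAS/STAT (statistical analysis)
SAS/GRAPH (graphics)
SAS/QC (quality control)
SAS/ETS (econometrics and time series analysis)
SAS/OR (operations research)
SAS/IML (interactive matrix language)
SAS/FSP (interactive menu system for fast data processing)
SAS/AF (interactive full-screen application facility)
Our main topics of study
✧ SAS/Base
✧ SAS/Stat
✧ SAS/Graph

II. The SAS windowing system
Editor window: for writing programs
Log window: shows how the program ran
Output window: shows the results
Explorer window: for managing SAS files. It can
◆ view SAS files
◆ create shortcuts to external files
◆ create new SAS files
◆ open a SAS file to view its contents
◆ move, copy, and delete files
◆ open related windows, for example a new library window
Results window:

III. Contents of Base SAS
● SAS language
● SAS procedures
● Macro facility
● Data step debugger
● Output delivery system

IV. Basic elements of the SAS language
✧ data set options
✧ SAS system options
✧ formats and informats (output formats and input formats)
✧ functions
✧ statements

V. Structure of SAS data
SAS data consist of rows and columns. A row is called an observation, and a column is called a variable.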
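As a minimal sketch of this row-and-column structure, the DATA step below builds a small data set and prints it. The data set name scores and the variables name and score are made up for illustration and do not come from the lecture.

```sas
/* Hypothetical example: three observations (rows), two variables (columns) */
data scores;
    input name $ score;   /* name is a character variable, score is numeric */
    datalines;
Ann 85
Bob 90
Cat 78
;
run;

/* Print the data set: each printed row is one observation */
proc print data=scores;
run;
```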
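Chapter 1

1. By default, what do the shortcut keys F1, F3, F4, F5, F6, F7, F8, F9 and Ctrl+E do?
F1: help. F3: end. F4: recall (brings back the submitted code). F5: activates the Editor window. F6: activates the Log window. F7: activates the Output window. F8: submit. F9: shows the definitions of all function keys. Ctrl+E: clears the contents of the window.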
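2. By default, what are the five functional windows of the SAS system and what does each one do? How do you define shortcut keys that activate these windows?
1) Explorer window. Purpose: the central place for accessing data.
2) Results window. Purpose: browsing and managing the output of programs.
3) Enhanced Editor window. Purpose: adds features to the plain editor window, such as defining abbreviations, showing line numbers, and expanding and collapsing sections of a program.
4) Log window. Purpose: viewing information about program runs.
5) Output window. Purpose: viewing the output of SAS programs.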
3. How do you add and remove SAS tools?
Use Tools => Customize => "Customize" tab on the menu bar to add and remove tools.
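4. What information does the SAS Log window contain?
The submitted program statements; system messages and errors; the speed and time of the program run.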
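5. Under the Display Manager System, there are four ways to issue commands for switching windows and carrying out specific functions: typing the command directly in the command box, using the pull-down menus, using the toolbar, and pressing function keys. Give examples of these usages.
Take the command that submits a program as an example. After writing the program, press F3 or F8 to submit it, or click the Submit button on the toolbar, or type submit in the command box, or choose Submit under Run on the menu bar. The submitted program will then be run.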
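6. Create a new SAS library using the menus.
On the menu bar, choose Tools => New Library; the dialog shown in the figure appears.
In Name, enter the name of the new library.
In Engine, choose an engine according to the source of the data; if you only want an ordinary library of SAS data sets on the local machine, you can choose Default.
Then select the "Enable at startup" check box. Under Library Information, click the "Browse" button next to Path to choose a folder; the Options field can be left blank. Click OK to create the new library.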
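The same kind of library can also be assigned in code with a LIBNAME statement. The following is only a sketch: the library name mylib and the folder path are hypothetical placeholders, and the folder must already exist on your machine.

```sas
/* Assign a library reference; 'mylib' and the path are hypothetical placeholders */
libname mylib 'C:\sasdata';

/* Save a small data set into the new library as mylib.test */
data mylib.test;
    x = 1;
run;

/* Release the library reference when it is no longer needed */
libname mylib clear;
```

Unlike a library created through the menu with "Enable at startup" selected, a library assigned this way lasts only for the current SAS session.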
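7. Explain the purpose of each of the following SAS commands and give an example: keys, dlglib, libname, dir, var, options, submit, recall.
keys: opens the window for defining function keys.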
TUTORIAL 9: Random and Mixed effects ANOVA

I. Random and Mixed effects ANOVA

To date we have been concerned with constructing ANOVA models in which the factors have a predetermined set of levels. This type of model is often referred to as a fixed effects model. These models are appropriate for studies where our interest centers on the effects of the specific factor levels chosen, and they are the only levels that are considered relevant.

Often the factor levels can be seen as a sample from a population of potential factor levels, and inference is desired about the population of levels. In this situation the factor is considered to be a random variable and using a fixed-effects model is no longer appropriate. In the two-way ANOVA case, there are a variety of possible scenarios that we can potentially study. When both factors are random, we use a random effects model. When one factor is random and the other is fixed, we use a mixed-effects model.

PROC GLM and PROC MIXED are two procedures in SAS designed for analyzing random and mixed effects models. For more complicated models PROC MIXED is the most appropriate choice. However, for the models we will be analyzing in this class PROC GLM will suffice.

To fit random and mixed effects models in PROC GLM we need to introduce a new statement. The RANDOM statement in PROC GLM declares that one or more effects in the model should be considered random rather than fixed. The general form of PROC GLM for fitting random and mixed-effects models is:

PROC GLM DATA = data set;
    CLASS variables;            /* Identifies the variables that divide the data set into groups. */
    MODEL response variable = explanatory variables;
    RANDOM random variables;    /* Identifies the random effects. */
RUN;

All the other statements are used in a similar manner as they were in the one-way and two-way ANOVA case. The only difference is the inclusion of the RANDOM statement.

Suppose A and B are factors in a two-way ANOVA model and y specifies the response variable.
As before, the MODEL statement is used to define the format of the model. To define a model without an interaction term we write:

MODEL y = A B;

To instead define a model with an interaction term we write:

MODEL y = A | B;

The RANDOM statement is used to define which parts of the model are considered random.

Consider the model with an interaction term included. The statement:

RANDOM A | B;

specifies that A, B and A*B are all random. This tells SAS to use a two-way random effects model. The statement:

RANDOM A A*B;

specifies that A and A*B are random, while B is fixed. This tells SAS to use a mixed-effects model.

Once you include the RANDOM statement in your code, SAS will automatically calculate the expected mean squares and use this as a guide for choosing the appropriate tests. Including the option TEST after the RANDOM statement performs hypothesis tests for each effect specified in the model, using appropriate error terms as determined by the expected mean squares.

Ex. A process engineer thinks the material used for the motor casing and the supply source of the bearings used in the motor both have an impact on the amount of motor vibration (in microns). He performs an experiment in which casings made of steel, aluminum and plastic were constructed using bearings from 5 randomly selected sources.

                               Source
            1           2           3           4           5
Steel       13.1 13.2   16.3 15.8   13.7 14.3   15.7 15.8   13.5 12.5
Aluminum    15.0 14.8   15.7 16.4   13.9 14.3   13.7 14.2   13.4 13.8
Plastic     14.0 14.3   17.2 16.7   12.4 12.3   14.4 13.9   13.2 13.1

In this problem the material used for the casing can be considered fixed. The source of the bearings is a random effect as we are interested in studying all possible sources.
We will therefore assume that the source and interaction terms are both random and use a mixed effects model.

The code for performing a mixed effects analysis can be written as follows:

DATA vibration;
    INPUT case $ source vib @@;
    DATALINES;
S 1 13.1 S 1 13.2 S 2 16.3 S 2 15.8 S 3 13.7
S 3 14.3 S 4 15.7 S 4 15.8 S 5 13.5 S 5 12.5
A 1 15.0 A 1 14.8 A 2 15.7 A 2 16.4 A 3 13.9
A 3 14.3 A 4 13.7 A 4 14.2 A 5 13.4 A 5 13.8
P 1 14.0 P 1 14.3 P 2 17.2 P 2 16.7 P 3 12.4
P 3 12.3 P 4 14.4 P 4 13.9 P 5 13.2 P 5 13.1
;
RUN;

PROC GLM;
    CLASS case source;
    MODEL vib = case | source;
    RANDOM source source*case / TEST;
RUN;

This program gives rise to the following output:

The GLM Procedure

Dependent Variable: vib

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              14    48.98466667       3.49890476     31.43      <.0001
Error              15    1.67000000        0.11133333
Corrected Total    29    50.65466667

R-Square    Coeff Var    Root MSE    vib Mean
0.967032    2.324662     0.333667    14.35333

Source         DF    Type I SS      Mean Square    F Value    Pr > F
case           2     0.70466667     0.35233333     3.16       0.0713
source         4     36.67466667    9.16866667     82.35      <.0001
case*source    8     11.60533333    1.45066667     13.03      <.0001

Source         DF    Type III SS    Mean Square    F Value    Pr > F
case           2     0.70466667     0.35233333     3.16       0.0713
source         4     36.67466667    9.16866667     82.35      <.0001
case*source    8     11.60533333    1.45066667     13.03      <.0001

Source         Type III Expected Mean Square
case           Var(Error) + 2 Var(case*source) + Q(case)
source         Var(Error) + 2 Var(case*source) + 6 Var(source)
case*source    Var(Error) + 2 Var(case*source)

Tests of Hypotheses for Mixed Model Analysis of Variance

Source    DF    Type III SS    Mean Square    F Value    Pr > F
case      2     0.704667       0.352333       0.24       0.7899
source    4     36.674667      9.168667       6.32       0.0135
Error     8     11.605333      1.450667
Error: MS(case*source)

Source               DF    Type III SS    Mean Square    F Value    Pr > F
case*source          8     11.605333      1.450667       13.03      <.0001
Error: MS(Error)     15    1.670000       0.111333

The relevant tests can be found below the table with the expected mean squares.
Studying the output, the interaction between the material and the source of bearings is a significant source of variation (F = 13.03, p-value < 0.0001). The different casings by themselves do not appear to affect the amount of vibration (F = 0.24, p-value = 0.7899), though the interpretation of this test is clouded by the significant interaction. In addition, the main effect corresponding to the source is significant (F = 6.32, p-value = 0.0135).

II. Repeated Measures ANOVA

Repeated measures designs are common in many settings (e.g. behavioral and life sciences). This type of design utilizes the same subject for each of the treatments under study. A repeated measures study may either involve several treatments or only a single treatment that is evaluated at different time points. When several measurements are taken on the same subject, the measurements tend to be correlated and this correlation needs to be accounted for in the model. Repeated measures ANOVA can be viewed as a generalization of the paired t-test. The model assumes that every pair of measurements has the same correlation coefficient across subjects and that the variances and covariances are homogeneous across time (or treatment). This specific structure for the covariance is referred to as compound symmetry. Compound symmetry is usually not a realistic assumption when dealing with measurements over time, as measurements closer together are typically more highly correlated than measurements that are far apart. There exist adjustments (e.g. Greenhouse-Geisser and Huynh-Feldt) that can be used to correct the observed significance levels for unequal correlation coefficients.

Prior to performing repeated measures ANOVA in SAS it is important that the data are organized in the appropriate format. Each row should include the repeated measurements from one subject.
The first column should contain the subject identifier, and the remaining columns should contain the repeated measurements of the response variable (e.g. y1, y2, ..., yn if there are n measurements in total on each subject). For example, the following data contain four repeated measures on 3 subjects:

1 20 24 28 28
2 15 18 23 24
3 18 19 24 23

We can read this data into a SAS data set using the following code:

DATA mydata;
    INPUT subject y1-y4;
    DATALINES;
1 20 24 28 28
2 15 18 23 24
3 18 19 24 23
;
RUN;

Note that in the INPUT statement we can refer to the four repeated measures as y1-y4, rather than listing all four variable names separately. This is especially convenient as the number of measures increases.

In order to use PROC GLM to fit a repeated measures model, we need to include a new statement and make some edits to the MODEL statement. The REPEATED statement asks SAS to provide a number of appropriate tests in the output for testing hypotheses concerning repeated measures data.

An example of a one-way repeated measures model for the data set described in the example above can be written as follows:

PROC GLM DATA = mydata;
    MODEL y1-y4 = / NOUNI;
    REPEATED factor_name;
RUN;

This code differs from a standard one-way ANOVA in a few important ways. As there is no specific group identification for the subjects, there is no need for a CLASS statement.

In the MODEL statement all of the repeated measures of the response variable are written on the left-hand side of the model equation. In addition, there are no explanatory variables on the right-hand side of the MODEL statement. The option NOUNI tells SAS not to run separate ANOVA models for each of the 4 repeated measures. This minimizes the amount of unnecessary output. Finally, the REPEATED statement tells SAS to provide the appropriate tests and to refer to the repeated factor as factor_name. Note that the name that you ultimately choose is arbitrary, and is meant to help guide you in reading the output.
The name should not be the same as any variable name that already exists in the data set being analyzed and should conform to the usual conventions of SAS variable names.

The REPEATED statement has a variety of options. The PRINTE option produces output regarding the partial correlation coefficients, as well as a test of the hypothesis that the covariance structure of the repeated measurements is such that the p-values from the F-test are valid. In particular, you can check that the compound symmetry assumption is valid by studying the partial correlation coefficients (the correlation between different measurements should be approximately equal) and the results of the test of sphericity.

Ex. In a wine-judging competition, four different wines of the same vintage were judged by six experienced judges. The order of the wine presentation was randomized for each judge and the wines were tasted blindly. Each wine was scored on a 40-point scale: the higher the score, the better the wine.

              Wine
Judge    1     2     3     4
1        20    24    28    28
2        15    18    23    24
3        18    19    24    23
4        26    26    30    30
5        22    24    28    26
6        19    21    27    25

Is there a significant difference in the mean score between the wines?

The following code can be used to answer this question:

DATA winedata;
    INPUT judge score1-score4;
    DATALINES;
1 20 24 28 28
2 15 18 23 24
3 18 19 24 23
4 26 26 30 30
5 22 24 28 26
6 19 21 27 25
;
RUN;

PROC GLM data = winedata;
    MODEL score1-score4 = / NOUNI;
    REPEATED wine / PRINTE;
RUN;

This code gives rise to the following output:

The GLM Procedure
Repeated Measures Analysis of Variance

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|

DF = 5     score1      score2      score3      score4
score1     1.000000    0.929670    0.924946    0.840418
                       0.0072      0.0082      0.0362
score2     0.929670    1.000000    0.975453    0.921635
           0.0072                  0.0009      0.0090
score3     0.924946    0.975453    1.000000    0.894396
           0.0082      0.0009                  0.0161
score4     0.840418    0.921635    0.894396    1.000000
           0.0362      0.0090      0.0161

E = Error SSCP Matrix
wine_N represents the contrast between the nth level of wine and the last

           wine_1     wine_2     wine_3
wine_1     22.0000    10.0000    8.0000
wine_2     10.0000    8.0000     6.0000
wine_3     8.0000     6.0000     7.3333

Partial Correlation Coefficients from the Error SSCP Matrix of the
Variables Defined by the Specified Transformation / Prob > |r|

DF = 5     wine_1      wine_2      wine_3
wine_1     1.000000    0.753778    0.629837
                       0.0835      0.1802
wine_2     0.753778    1.000000    0.783349
           0.0835                  0.0653
wine_3     0.629837    0.783349    1.000000
           0.1802      0.0653

Sphericity Tests

Variables                DF    Mauchly's Criterion    Chi-Square    Pr > ChiSq
Transformed Variates     5     0.1106961              8.1924883     0.1459
Orthogonal Components    5     0.3515625              3.8910912     0.5652

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of no wine Effect
H = Type III SSCP Matrix for wine
E = Error SSCP Matrix
S=1 M=0.5 N=0.5

Statistic                 Value          F Value    Num DF    Den DF    Pr > F
Wilks' Lambda             0.02314815     42.20      3         3         0.0059
Pillai's Trace            0.97685185     42.20      3         3         0.0059
Hotelling-Lawley Trace    42.20000000    42.20      3         3         0.0059
Roy's Greatest Root       42.20000000    42.20      3         3         0.0059

Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

                                                                       Adj Pr > F
Source         DF    Type III SS    Mean Square    F Value    Pr > F    G - G     H - F
wine           3     184.0000000    61.3333333     57.50      <.0001    <.0001    <.0001
Error(wine)    15    16.0000000     1.0666667

Greenhouse-Geisser Epsilon    0.6038
Huynh-Feldt Epsilon           0.9270

Studying the partial correlation coefficients does not show any great departures from compound symmetry. The sphericity tests confirm these results (p-value = 0.5652). To test for treatment effects we find that F = 57.50 (p-value < 0.0001). Hence, we can reject the null hypothesis of no difference in treatment means. The mean scores for the four wines differ.
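As a cross-check on the univariate test, the same wine data can be rearranged into "long" form (one row per judge and wine) and analyzed with judge as a blocking factor. This is a sketch rather than part of the tutorial: the data set name winelong and the variables wine and score are introduced here for illustration, and it assumes the winedata data set from the example has already been created. Because the two formulations partition the same sums of squares, the F test for wine should agree with the univariate within-subject test (F = 57.50 on 3 and 15 degrees of freedom).

```sas
/* Reshape winedata (one row per judge) into long form:
   one row per judge-wine combination.
   winelong, wine and score are illustrative names. */
data winelong;
    set winedata;
    array s{4} score1-score4;
    do wine = 1 to 4;
        score = s{wine};
        output;
    end;
    keep judge wine score;
run;

/* Treat judge as a block and wine as the treatment */
proc glm data = winelong;
    class judge wine;
    model score = judge wine;
run;
quit;
```

This form does not give the sphericity tests or the Greenhouse-Geisser and Huynh-Feldt adjustments; those still require the REPEATED statement.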
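SAS Lecture Notes: Lesson 9

I. DO loops

1. If you look back at Example 11 in Lesson 4, you will see that a DO statement must be paired with END.
All of the following are valid DO statements.

do i=5;
do i=2,3,5,7;
do i=1 to 100;
do i=1 to 100 by 2;
do i=100 to 1 by -1;
do i=1 to 5,7 to 9;
do i='01jan99'd,'25feb99'd;
do i='01jan99'd to '01jan2000'd by 1;

Example 1: Generate the sequence 1, 2, 9, 8.

data a;
    do i=1,2,9,8;
        output;
    end;
run;

Question: what happens if OUTPUT is placed after END, or if OUTPUT is removed?

Example 2: Generate the odd numbers from 1 to 20.

data a;
    do i=1 to 20 by 2;
        output;
    end;
run;

Example 3: Compute the sum of the natural numbers from 1 to 100.

data a;
    do i=1 to 100;
        n+i;
        output;
    end;
run;

Example 4: Compute the sum of the squares of the natural numbers from 1 to 100.

data a;
    do i=1 to 100;
        n+i**2;
        output;
    end;
run;

Example 5: Use a DO loop to process an array. (Arrays are covered in more depth in the next lesson.)

data a(drop=i);
    array day{7} d1-d7;
    do i=1 to 7;
        day{i}=i+1;
    end;
run;

2. The DO WHILE statement.
The WHILE expression is evaluated first; if it is true the loop body is executed, otherwise the loop exits.

Example 6

data a;
    n=0;
    do while (n<5);
        n+1;
        output;
    end;
run;

Example 7: While adding up 1 through 100, find the first running total that is greater than or equal to 2000. The last observation written holds this value.

data a;
    do i=1 to 100 while (n<2000);
        n+i;
        output;
    end;
run;

3. The DO UNTIL statement.
The loop body is executed first, and the loop exits once the UNTIL expression becomes true.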
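The notes give no example for DO UNTIL, so here is a minimal sketch modeled on Example 6 (it is not from the original lecture). Because the condition is checked at the bottom of the loop, the body always runs at least once.

```sas
/* DO UNTIL: the body runs first, then the condition is checked */
data a;
    n = 0;
    do until (n >= 5);
        n + 1;      /* sum statement: adds 1 to n */
        output;     /* writes one observation per iteration */
    end;
run;
```

This writes five observations with n = 1, 2, 3, 4, 5, the same as Example 6. The two forms differ when the condition is already satisfied at the start: with n = 5 initially, do while (n<5) would not execute the body at all, whereas do until (n>=5) would still execute it once.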
4. The DO OVER statement.
We will cover it in the next lesson.
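II. The SELECT statement

SELECT-WHEN corresponds to the switch-case statement found in general-purpose programming languages.
Let us go straight to an example.

Example 8

data a;
    set resdat.class;
    x=0;
    obs=_n_;
    select(obs);
        when(2) x=2;
        when(3,7) x=5;
        otherwise x=3;
    end;
run;

III. The RETURN statement

The RETURN statement sends the system back to the beginning of the DATA step.

Example 9: RETURN used together with IF-THEN

data a;
    input x y z;
    if x=y then return;
    s=x+y;
    cards;
1 2 3
2 2 3
;
run;

Analysis: by default, the system writes every observation to data set a. When x=y, however, the RETURN statement is executed, which means s=x+y is not executed, so s is missing for that observation.
Question: what happens if an OUTPUT statement is added after s=x+y? Keep in mind that once an explicit OUTPUT is present, the automatic output at the end of the step (at RUN) no longer takes place.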
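The question above can be explored with the following variant of Example 9, which differs only by the added OUTPUT statement.

```sas
/* Example 9 with an explicit OUTPUT added after s=x+y */
data a;
    input x y z;
    if x = y then return;   /* skips the rest of the step, including OUTPUT */
    s = x + y;
    output;                 /* explicit OUTPUT: the automatic output at the end of the step is turned off */
    cards;
1 2 3
2 2 3
;
run;
```

Data set a now contains only the first record (x=1, y=2, z=3, s=3). For the second record x=y, so RETURN is executed before OUTPUT is reached and nothing is written for it.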
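Example 10: RETURN used together with a DO loop.

data a;
    input a b c @@;
    do x=1 to 5;
        ax=a*x;
        if ax>b then return;
        output;
    end;
    cards;
1 2 3 2 6 8
;
run;

Analysis: because the DO loop contains OUTPUT, each group of input values (a, b, c) could produce 5 observations, but a RETURN has been added inside the loop.
That is, when ax>b the system executes RETURN and does not execute OUTPUT.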
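To see the effect, you can print the data set created in Example 10. The trace in the comment was worked out by hand from the code and data above.

```sas
/* Print the data set created in Example 10 */
proc print data=a;
run;

/* Trace, worked by hand:
   a=1, b=2: x=1 (ax=1) and x=2 (ax=2) are output; at x=3, ax=3 > 2, so RETURN
   a=2, b=6: x=1, 2, 3 (ax=2, 4, 6) are output; at x=4, ax=8 > 6, so RETURN
   Total: 5 observations instead of 10 */
```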
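Exercise 10: Someone deposits 500 yuan in a bank savings account. The account pays 7% interest, compounded once a year.
Use a sum statement and a loop to calculate how much money this person will have at the end of three years.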
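One possible approach is sketched below; it is not the official solution, and the data set name deposit and the variables balance and year are chosen here for illustration.

```sas
/* One possible solution sketch; names are illustrative */
data deposit;
    balance = 500;                  /* initial deposit */
    do year = 1 to 3;
        balance + balance * 0.07;   /* sum statement: add one year's interest */
        output;                     /* one observation per year */
    end;
run;

proc print data=deposit;
run;
```

The balances after years 1, 2 and 3 are 535, 572.45 and about 612.52, which matches 500 × 1.07³.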
The following exercises are provided for reference and study only; they are not assigned.
Exercise 11
Given the SAS data set SASDATA.TWO:

X    Y
--   --
5    2
3    1
5    6

The following SAS program is submitted:

data SASUSER.ONE SASUSER.TWO OTHER;
    set SASDATA.TWO;
    if X eq 5 then output SASUSER.ONE;
    if Y lt 5 then output SASUSER.TWO;
    output;
run;

What is the result?
A. data set SASUSER.ONE has 5 observations
   data set SASUSER.TWO has 5 observations
   data set WORK.OTHER has 3 observations
B. data set SASUSER.ONE has 2 observations
   data set SASUSER.TWO has 2 observations
   data set WORK.OTHER has 1 observations
C. data set SASUSER.ONE has 2 observations
   data set SASUSER.TWO has 2 observations
   data set WORK.OTHER has 5 observations
D. No data sets are output. The DATA step fails execution due to syntax errors.
Answer: A

Exercise 12
Consider the following data step:

data WORK.NEW;
    set WORK.OLD(keep=X);
    if X < 10 then X=1;
    else if X >= 10 AND X LT 20 then X=2;
    else X=3;
run;

In filtering the values of the variable X in data set WORK.OLD, what new value would be assigned to X if its original value was a missing value?
A. X would get a value of 1.
B. X would get a value of 3.
C. X would retain its original value of missing.
D. This step does not run because of syntax errors.
Answer: A

Exercise 13
The following SAS program is submitted:

data WORK.SALES;
    do Year=1 to 5;
        do Month=1 to 12;
            X + 1;
        end;
    end;
run;

How many observations are written to the WORK.SALES data set?
A. 0
B. 1
C. 5
D. 60
Answer: B
(What would the answer be if OUTPUT were added after X + 1?)

Exercise 14
The following SAS program is submitted:

data WORK.OUTDS;
    do until(Prod GT 6);
        Prod + 1;
    end;
run;

What is the value of the variable Prod in the output data set?
A. . (missing)
B. 6
C. 7
D. Undetermined, infinite loop.
Answer: C

Exercise 15
Given the SAS data set WORK.PRODUCTS:

ProdId    Price    ProductType    Sales    Returns
------    -----    -----------    -----    -------
K12S      95.50    OUTDOOR        15       2
B132S     2.99     CLOTHING       300      10
R18KY2    51.99    EQUIPMENT      25       5
3KL8BY    6.39     OUTDOOR        125      15
DY65DW    5.60     OUTDOOR        45       5
DGTY23    34.55    EQUIPMENT      67       2

The following SAS program is submitted:

data WORK.REVENUE(drop=Sales Returns Price);
    set WORK.PRODUCTS(keep=ProdId Price Sales Returns);
    Revenue=Price*(Sales-Returns);
run;

How many variables does the WORK.REVENUE data set contain?
A. 2
B. 3
C. 4
D. 6
Answer: A

Exercise 16
The following SAS program is submitted:

data WORK.PRODUCTS;
    Prod=1;
    do while(Prod LE 6);
        Prod + 1;
    end;
run;

What is the value of the variable Prod in the output data set?
A. 6
B. 7
C. 8
D. . (missing numeric)
Answer: B