SAS Exercise Collection with Program Code
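### SAS Practice Problems (Print Edition)

#### I. Basic Data Operations

1. Data import
   - Problem: Import a CSV file into SAS and list the first 5 observations.
   - Answer: Import the data with `PROC IMPORT`, then show the first 5 observations with `PROC PRINT` (for example, with the `OBS=5` data set option).
2. Data filtering
   - Problem: Select all observations in which a given column is greater than 50.
   - Answer: Use a `WHERE` statement.
3. Data grouping
   - Problem: Group the data set by a column and compute the mean of each group.
   - Answer: Use `PROC MEANS` with a `BY` statement. The data must already be sorted by the grouping variable; a `CLASS` statement gives the same group means without sorting.
4. Data sorting
   - Problem: Sort the data set by a column in ascending or descending order.
   - Answer: Use `PROC SORT`; add the `DESCENDING` keyword before the variable for descending order.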
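The four answers above can be sketched in one short program. This is only an illustration under stated assumptions: the exercises name no file or columns, so the sketch first exports the built-in `sashelp.class` table to a CSV file in the WORK directory (the file name `class_demo.csv` and the data set names are made up for the example) and then works on that.

```sas
/* Hypothetical CSV file: created here only so the example is self-contained */
%let csv = %sysfunc(pathname(work))/class_demo.csv;

proc export data=sashelp.class outfile="&csv" dbms=csv replace;
run;

/* 1. Import the CSV file and list the first 5 observations */
proc import datafile="&csv" out=work.class dbms=csv replace;
  getnames=yes;
run;

proc print data=work.class(obs=5);
run;

/* 2. Keep only the observations with Weight greater than 50 */
data work.heavy;
  set work.class;
  where weight > 50;
run;

/* 3. Mean of Height within each Sex group (CLASS needs no prior sort) */
proc means data=work.class mean;
  class sex;
  var height;
run;

/* 4. Sort by Age in descending order into a new data set */
proc sort data=work.class out=work.class_sorted;
  by descending age;
run;
```

Writing the sorted data to a separate data set with `OUT=` leaves the imported table unchanged, which makes the steps easier to rerun.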
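#### II. Descriptive Statistics

1. Univariate analysis
   - Problem: Compute the mean, median, standard deviation, and similar statistics of a column.
   - Answer: Use `PROC UNIVARIATE` for univariate descriptive statistics.
2. Frequency distribution
   - Problem: Compute the counts and relative frequencies of a column.
   - Answer: Use `PROC FREQ`.
3. Correlation analysis
   - Problem: Compute the correlation coefficient between two columns.
   - Answer: Use `PROC CORR`.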
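A minimal sketch of the three procedures, assuming the built-in `sashelp.class` table stands in for the unnamed data set of the exercises:

```sas
/* Univariate statistics (mean, median, standard deviation, ...) for Height */
proc univariate data=sashelp.class;
  var height;
run;

/* Counts and percentages for the categorical variable Sex */
proc freq data=sashelp.class;
  tables sex;
run;

/* Pearson correlation between Height and Weight */
proc corr data=sashelp.class;
  var height weight;
run;
```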
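#### III. Hypothesis Testing

1. t test
   - Problem: Test the means of two independent samples with a t test.
   - Answer: Use `PROC TTEST`, naming the grouping variable in a `CLASS` statement.
2. Analysis of variance
   - Problem: Run an analysis of variance on several groups.
   - Answer: Use `PROC ANOVA`.
3. Chi-square test
   - Problem: Run a chi-square test on categorical variables.
   - Answer: Use `PROC FREQ` with the `CHISQ` option on the `TABLES` statement.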
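The three tests could look roughly like this. The sketch assumes `sashelp.class` as example data; treating `age` as the grouping factor for the ANOVA is purely illustrative, and with such a small table the chi-square test will carry a warning about low expected cell counts.

```sas
/* Two-sample t test: compare mean Height between the two Sex groups */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;

/* One-way analysis of variance: Height across the Age groups */
proc anova data=sashelp.class;
  class age;
  model height = age;
run;
quit;

/* Chi-square test of association between Sex and Age */
proc freq data=sashelp.class;
  tables sex*age / chisq;
run;
```

`PROC ANOVA` is interactive, so the `QUIT` statement ends it explicitly.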
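#### IV. Regression Analysis

1. Simple linear regression
   - Problem: Fit a simple linear regression with one independent variable and one dependent variable.
   - Answer: Use `PROC REG`.
2. Multiple linear regression
   - Problem: Fit a multiple linear regression with several independent variables and one dependent variable.
   - Answer: Use `PROC REG` again, listing several independent variables in the `MODEL` statement.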
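As a sketch, both models can be fitted in one `PROC REG` call. The choice of `weight` as the dependent variable and of `sashelp.class` as the data is an assumption made for illustration.

```sas
proc reg data=sashelp.class;
  /* Simple linear regression: one independent variable */
  simple: model weight = height;
  /* Multiple linear regression: several independent variables */
  multiple: model weight = height age;
run;
quit;
```

The labels `simple:` and `multiple:` only name the two models in the output.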
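6. Plotting sin(x)

```sas
data about_sin;
  do x = 0 to 20 by 0.1;
    y = sin(x);
    output;
  end;
run;

proc gplot data=about_sin;
  plot y*x;
run;
quit;
```

7. For the observations in data set CLASS, determine and display the grade (variable `grade`): if Age<=4, grade="小班" (junior class); if 4<Age<=6, grade="大班" (senior class); otherwise grade="学前班" (preschool class). Do this in a DATA step with a branching statement, and name the program grade03.sas.

```sas
data grade;
  set homework.class;
  /* Declare the length first; otherwise the first (shorter) value fixes it
     and "学前班" would be truncated */
  length grade $ 12;
  select;
    when (age <= 4)     grade = "小班";
    when (4 < age <= 6) grade = "大班";
    otherwise           grade = "学前班";
  end;
run;
```

8. Write the program sum.sas to compute the sum of all odd numbers from 1 to 999.

```sas
data sum;
  s = 0;
  do n = 1 to 999 by 2;   /* n takes the values 1, 3, ..., 999 */
    s = s + n;
  end;
run;
```

9. Write the program stu_merge.sas to do the following:

- Create the data sets stu_info and stu_score shown below.
- Merge the two data sets into a new data set named student.
- Add a variable result to student: if the score is below 60, show "no pass", otherwise show "pass".
- Print the columns id, subject, score, and result of student, labelled "学号" (student ID), "科目" (subject), "分数" (score), and "结果" (result).

```sas
data stu_info;
  input id sex $ age class $;
cards;
1 boy 14 A
2 girl 15 A
3 girl 15 A
4 boy 16 B
5 boy 15 B
6 girl 15 B
;
run;

proc sort data=stu_info;
  by id;
run;

data stu_score;
  input id subject $ score;
cards;
1 Chinese 89
1 maths 79
2 Chinese 67
2 maths 84
3 Chinese 78
3 maths 83
4 Chinese 69
4 maths 85
5 Chinese 79
5 maths 69
;
run;

proc sort data=stu_score;
  by id;
run;

data student;
  merge stu_info stu_score;
  by id;
  /* id 6 has no score record; a missing score compares as below 60 */
  select;
    when (score < 60) result = "no pass";
    otherwise         result = "pass";
  end;
run;

/* The LABEL option makes PROC PRINT show the labels requested in the task */
proc print data=student label;
  var id subject score result;
  label id = "学号" subject = "科目" score = "分数" result = "结果";
run;
```

4. Write a DATA step that splits data set classbirth by sex into the data sets boy and girl, keeping only the variables name, sex, birth, and weight in both. Save the program as grade02.sas.

```sas
data boy girl;
  set Mylib.Classbirth;
  if sex = "男" then output boy;
  else output girl;
  /* The task asks for name, sex, birth, weight */
  keep name sex birth weight;
run;
```

Chapter 3 Lab Exercise 1

1. Create the data set.

```sas
/* Assumes the libref aa has already been assigned */
data aa.ex3_01;
  input time @@;
cards;
1510 1450 1480 1460 1520 1480 1490 1460
1480 1510 1530 1470 1500 1520 1510 1470
;
run;
```

2. Operation: open data set ex3_01 in SAS/INSIGHT, choose Analyze / Distribution, move time into the Y box, click the Output button, and select the basic confidence intervals under Descriptive Statistics.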
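Notes on regression analysis:

- Relationship between variables: (1) linear, either simple regression or multiple linear regression (one outcome against several variables); (2) curvilinear.
- Representativeness of the sample.
- Data: (1) accurate measurement (measurement method, instrument, skill level); (2) surveys (interviews, questionnaires). Watch for gross errors (survey or recording mistakes) and for problems in the data themselves (outliers).
- Diagnostics in regression analysis: (1) the data themselves (outliers); (2) collinearity diagnostics.
- Variable selection (8): forward, backward, and stepwise regression (for both multiple linear and multiple logistic models).
- In multiple linear regression the outcome variable is quantitative and should ideally follow a normal distribution. For a categorical outcome, whether binary or multi-category, use multiple logistic regression.
- Case 1: if the type of drug is taken into account, the design is single-factor and not single-group. Otherwise the analysis is a simple linear regression (a single group of 30 with two variables).

Suppose 30 patients with a certain disease are randomly divided into two equal groups. Group 1 is treated with drug A and group 2 with drug B. For every patient, sex, age, weight, CD34+, and mononuclear cell count (MNC) are recorded; the data are given in Table 2.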
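Table 3-29. Selected factors and measurements for patients with the same disease treated with two drugs

| Drug | No. | Sex | Age (years) | Weight (kg) | MNC (×10^8/kg) | CD34+ (×10^6/kg) |
|---|---|---|---|---|---|---|
| A | 1 | M | 31 | 60 | 4.42 | 7.07 |
| A | 2 | F | 43 | 58 | 2.67 | 1.39 |
| A | 3 | M | 55 | 58 | 4.14 | 2.15 |
| A | 4 | M | 55 | 58 | 3.23 | 1.58 |
| A | 5 | F | 35 | 60 | 2.54 | 1.09 |
| A | 6 | M | 24 | 58 | 2.37 | 1.42 |
| A | 7 | M | 37 | 60 | 2.38 | 0.48 |
| A | 8 | M | 37 | 60 | 2.58 | 1.55 |
| A | 9 | F | 43 | 60 | 4.54 | 2.95 |
| A | 10 | M | 26 | 60 | 1.24 | 0.31 |
| A | 11 | F | 38 | 68 | 2.43 | 3.43 |
| A | 12 | F | 29 | 73 | 2.16 | 1.19 |
| A | 13 | M | 46 | 73 | 3.49 | 4.36 |
| A | 14 | M | 43 | 85 | 3.06 | 5.51 |
| A | 15 | M | 46 | 85 | 2.65 | 2.41 |
| B | 1 | F | 38 | 55 | 3.86 | 4.98 |
| B | 2 | M | 16 | 46 | 6.00 | 5.88 |
| B | 3 | F | 28 | 58 | 4.57 | 3.66 |
| B | 4 | F | 30 | 60 | 3.02 | 1.96 |
| B | 5 | F | 32 | 60 | 3.75 | 2.66 |
| B | 6 | F | 38 | 60 | 5.41 | 9.20 |
| B | 7 | M | 38 | 68 | 2.68 | 3.64 |
| B | 8 | M | 38 | 68 | 2.73 | 3.06 |
| B | 9 | M | 46 | 56 | 3.99 | 3.83 |
| B | 10 | M | 46 | 56 | 3.84 | 1.15 |
| B | 11 | M | 20 | 60 | 5.79 | 6.54 |
| B | 12 | M | 20 | 60 | 5.23 | 3.14 |
| B | 13 | F | 49 | 57 | 3.42 | 2.33 |
| B | 14 | M | 36 | 67 | 4.38 | 1.93 |
| B | 15 | F | 43 | 75 | 7.60 | 8.36 |

Carry out the following statistical analyses as required, and state both the statistical and the subject-matter conclusions.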
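P265, Problem 1. Three batches of a certain type of battery were produced by three factories, A, B, and C. To compare their quality, 5 batteries were sampled at random from each batch, and testing gave the following lifetimes (h):

| A | B | C |
|---|---|---|
| 40 | 26 | 39 |
| 42 | 28 | 50 |
| 48 | 34 | 40 |
| 45 | 32 | 50 |
| 38 | 30 | 43 |

Test at significance level 0.05 whether the mean battery lifetimes differ significantly. If they do, find 95% confidence intervals for the mean differences μA - μB, μA - μC, and μB - μC.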
Code:

```sas
data l1;
  /* Each data row holds one battery from A, B, C (a = 1, 2, 3) */
  do b = 1 to 5;
    do a = 1 to 3;
      input x @@;
      output;
    end;
  end;
cards;
40 26 39 42 28 50 48 34 40 45 32 50 38 30 43
;
run;

proc anova data=l1;
  class a;
  model x = a;
run;
quit;
```

Output:

```
The SAS System                          19:15 Friday, April 9, 2012

The ANOVA Procedure

Class Level Information
Class    Levels    Values
a             3    1 2 3

Number of observations    15

Dependent Variable: x

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2       615.6000000    307.8000000      17.07    0.0003
Error              12       216.4000000     18.0333333
Corrected Total    14       832.0000000

R-Square    Coeff Var    Root MSE    x Mean
0.739904     10.88863    4.246567    39.00000

Source    DF       Anova SS    Mean Square    F Value    Pr > F
a          2    615.6000000    307.8000000      17.07    0.0003
```

Conclusion: at the 0.05 significance level, p = 0.0003 < 0.05, so the mean lifetimes of the three populations differ significantly.
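The problem also asks for 95% confidence intervals for the pairwise mean differences, which the ANOVA run on its own does not print. One possible way to get them, offered as a sketch and not as the original author's solution, is a `MEANS` statement with the `LSD` and `CLDIFF` options, which requests unadjusted pairwise t intervals built on the pooled error mean square. Levels 1, 2, 3 of `a` correspond to factories A, B, C.

```sas
data l1;
  do b = 1 to 5;
    do a = 1 to 3;
      input x @@;
      output;
    end;
  end;
cards;
40 26 39 42 28 50 48 34 40 45 32 50 38 30 43
;
run;

proc anova data=l1;
  class a;
  model x = a;
  /* Pairwise differences of the group means with 95% confidence limits */
  means a / lsd cldiff alpha=0.05;
run;
quit;
```

As a hand check, the sample means are 42.6, 30.0, and 44.4, so the differences are 12.6, -1.8, and -14.4. Each interval has half-width t(0.975; 12) × sqrt(2 × 18.0333 / 5), which is about 5.85.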
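Chongqing Medical University, Health Statistics: SAS statistical package computer lab exercises (1)

1. What are the three most commonly used SAS windows? Switch among the three basic windows and memorize the corresponding commands or function keys.

2. Type the following lines of program into the PGM window and submit them. Look at the contents of the OUTPUT and LOG windows, note what the different colours mean, and correct the program according to the messages in the log.

3. Save the program, results, and log from Exercise 2 to disk.

4. Using the program of Example 1 below, complete the tasks that follow.

Table 1. Scores of 16 students in one class in 3 subjects

Tasks: 1) create the data set; 2) print the records of students who failed at least one subject (hint: use an IF statement).

Reference program:

```sas
data a;
  /* id is read as character so that leading zeros are kept */
  input id $ sh wl bl;
cards;
083 68 71 65
084 74 61 68
085 73 75 46
086 79 80 79
087 75 71 68
084 85 85 87
085 78 79 75
086 80 76 79
087 85 80 82
088 77 71 75
089 67 73 71
080 75 81 70
118 70 54 75
083 70 66 84
084 62 73 65
099 82 70 79
;
run;

/* Keep the students with at least one score below 60 */
data b;
  set a;
  if sh < 60 or wl < 60 or bl < 60 then output;
run;

proc print data=b;
  var id sh wl bl;
run;
```

5. Create a data set from the following data.

Table 2. Sales data

| Start date | End date | Cost |
|---|---|---|
| 2005/04/28 | 25MAY2009 | $123,345,000 |
| 2005 09 18 | 05OCT2009 | $33,234,500 |
| 2007/08/12 | 22SEP2009 | $345,600 |
| 20040508 | 30JUN2009 | $432,334,500 |

Hints: use formatted input. The values are separated by blanks and must be aligned in columns. The width of each informat is counted from the position where the previous field ended, so if a value is read incorrectly, try adjusting the informat width. Displaying dates requires an output format.

- Start date: informat `yymmdd10.`
- End date: informat `date10.`
- Cost: informat `dollar12.`

Reference program:

```sas
data a;
  /* Fields occupy columns 1-10, 11-20, and 21-33 */
  input x1 yymmdd10. x2 date10. x3 dollar13.;
cards;
2005/04/28 25MAY2009 $123,345,000
2005 09 18 05OCT2009  $33,234,500
2007/08/12 22SEP2009     $345,600
20040508   30JUN2009 $432,334,500
;
run;

/* Without formats the dates print as raw SAS date values */
proc print data=a;
run;

proc print data=a;
  format x1 yymmdd10. x2 date9. x3 dollar13.;
run;
```

6. Mobile phone numbers generally follow the pattern YYY-XXXX-ZZZZ, where YYY is the number segment, XXXX is usually the area code, and ZZZZ is the individual subscriber number.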
### SAS Certificate Base Practice Questions and Detailed Answers

- Chapter 1: Basic Concepts
- Chapter 2: Referencing Files and Setting Options
- Chapter 3: Editing and Debugging SAS Programs
- Chapter 4: Creating List Reports
- Chapter 5: Creating SAS Data Sets from Raw Data
- Chapter 6: Understanding DATA Step Processing
- Chapter 7: Creating and Applying User-Defined Formats
- Chapter 8: Creating Enhanced List and Summary Reports
- Chapter 9: Producing Descriptive Statistics
- Chapter 10: Producing HTML Output
- Chapter 11: Creating and Managing Variables
- Chapter 12: Reading SAS Data Sets
- Chapter 13: Combining SAS Data Sets
- Chapter 14: Transforming Data with SAS Functions
- Chapter 15: Generating Data with DO Loops
- Chapter 16: Processing Variables with Arrays
- Chapter 17: Reading Raw Data in Fixed Fields
- Chapter 18: Reading Free-Format Data
- Chapter 19: Reading Date and Time Values
- Chapter 20: Creating a Single Observation from Multiple Records
- Chapter 21: Creating Multiple Observations from a Single Record
- Chapter 22: Reading Hierarchical Files

#### Chapter 1: Basic Concepts Answer Key

1. How many observations and variables does the data set below contain?

   a. 3 observations, 4 variables
   b. 3 observations, 3 variables
   c. 4 observations, 3 variables
   d. can't tell because some values are missing

   Correct answer: c

   Rows in the data set are called observations, and columns are called variables. Missing values don't affect the structure of the data set.

2. How many program steps are executed when the program below is processed?

   ```sas
   data user.tables;
     infile jobs;
     input date name $ job $;
   run;
   proc sort data=user.tables;
     by name;
   run;
   proc print data=user.tables;
   run;
   ```

   a. three
   b. four
   c. five
   d. six

   Correct answer: a

   When it encounters a DATA, PROC, or RUN statement, SAS stops reading statements and executes the previous step in the program. The program above contains one DATA step and two PROC steps, for a total of three program steps.

3. What type of variable is the variable AcctNum in the data set below?

   a. numeric
   b. character
   c. can be either character or numeric
   d. can't tell from the data shown

   Correct answer: b

   It must be a character variable, because the values contain letters and underscores, which are not valid characters for numeric values.

4. What type of variable is the variable Wear in the data set below?

   a. numeric
   b. character
   c. can be either character or numeric
   d. can't tell from the data shown

   Correct answer: a

   It must be a numeric variable, because the missing value is indicated by a period rather than by a blank.

5. Which of the following variable names is valid?

   a. 4BirthDate
   b. $Cost
   c. _Items_
   d. Tax-Rate

   Correct answer: c

   Variable names follow the same rules as SAS data set names. They can be 1 to 32 characters long, must begin with a letter (A–Z, either uppercase or lowercase) or an underscore, and can continue with any combination of numbers, letters, or underscores.

6. Which of the following files is a permanent SAS file?

   a. Sashelp.PrdSale
   b. Sasuser.MySales
   c. Profits.Quarter1
   d. all of the above

   Correct answer: d

   To store a file permanently in a SAS data library, you assign it a libref other than the default Work. For example, by assigning the libref Profits to a SAS data library, you specify that files within the library are to be stored until you delete them. Therefore, SAS files in the Sashelp and Sasuser libraries are permanent files.

7. In a DATA step, how can you reference a temporary SAS data set named Forecast?

   a. Forecast
   b. Work.Forecast
   c. Sales.Forecast (after assigning the libref Sales)
   d. only a and b above

   Correct answer: d

   To reference a temporary SAS file in a DATA step or PROC step, you can specify the one-level name of the file (for example, Forecast) or the two-level name using the libref Work (for example, Work.Forecast).

8. What is the default length for the numeric variable Balance?

   a. 5
   b. 6
   c. 7
   d. 8

   Correct answer: d

   The numeric variable Balance has a default length of 8. Numeric values (no matter how many digits they contain) are stored in 8 bytes of storage unless you specify a different length.

9. How many statements does the following SAS program contain?

   ```sas
   proc print data=new.prodsale
              label double;
      var state day price1 price2; where state='NC';
      label state='Name of State'; run;
   ```

   a. three
   b. four
   c. five
   d. six

   Correct answer: c

   The five statements are

   - PROC PRINT statement (two lines long)
   - VAR statement
   - WHERE statement (on the same line as the VAR statement)
   - LABEL statement
   - RUN statement (on the same line as the LABEL statement).

10. What is a SAS data library?

    a. a collection of SAS files, such as SAS data sets and catalogs
    b. in some operating environments, a physical collection of SAS files
    c. in some operating environments, a logically related collection of SAS files
    d. all of the above

    Correct answer: d

    Every SAS file is stored in a SAS data library, which is a collection of SAS files, such as SAS data sets and catalogs. In some operating environments, a SAS data library is a physical collection of files. In others, the files are only logically related.
In the Windows and UNIX environments, a SAS data library is typically a group of SAS files in the same folder or directory.

#### Chapter 2: Referencing Files and Setting Options

1. If you submit the following program, how does the output look?

   ```sas
   options pagesize=55 nonumber;
   proc tabulate data=clinic.admit;
     class actlevel;
     var age height weight;
     table actlevel,(age height weight)*mean;
   run;
   options linesize=80;
   proc means data=clinic.heart min max maxdec=1;
     var arterial heart cardiac urinary;
     class survive sex;
   run;
   ```

   a. The PROC MEANS output has a print line width of 80 characters, but the PROC TABULATE output has no print line width.
   b. The PROC TABULATE output has no page numbers, but the PROC MEANS output has page numbers.
   c. Each page of output from both PROC steps is 55 lines long and has no page numbers, and the PROC MEANS output has a print line width of 80 characters.
   d. The date does not appear on output from either PROC step.

   Correct answer: c

   When you specify a system option, it remains in effect until you change the option or end your SAS session, so both PROC steps generate output that is printed 55 lines per page with no page numbers. If you don't specify a system option, SAS uses the default value for that system option.

2. In order for the date values 05May1955 and 04Mar2046 to be read correctly, what value must the YEARCUTOFF= option have?

   a. a value between 1947 and 1954, inclusive
   b. 1955 or higher
   c. 1946 or higher
   d. any value

   Correct answer: d

   As long as you specify an informat with the correct field width for reading the entire date value, the YEARCUTOFF= option doesn't affect date values that have four-digit years.

3. When you specify an engine for a library, you are always specifying

   a. the file format for files that are stored in the library.
   b. the version of SAS that you are using.
   c. access to other software vendors' files.
   d. instructions for creating temporary SAS files.

   Correct answer: a

   A SAS engine is a set of internal instructions that SAS uses for writing to and reading from files in a SAS library. Each engine specifies the file format for files that are stored in the library, which in turn enables SAS to access files with a particular format. Some engines access SAS files, and other engines support access to other vendors' files.

4. Which statement prints a summary of all the files stored in the library named Area51?

   a. `proc contents data=area51._all_ nods;`
   b. `proc contents data=area51 _all_ nods;`
   c. `proc contents data=area51 _all_ noobs;`
   d. `proc contents data=area51 _all_.nods;`

   Correct answer: a

   To print a summary of library contents with the CONTENTS procedure, use a period to append the _ALL_ option to the libref. Adding the NODS option suppresses detailed information about the files.

5. The following PROC PRINT output was created immediately after PROC TABULATE output. Which SAS system options were specified when the report was created?

   a. OBS=, DATE, and NONUMBER
   b. PAGENO=1, and DATE
   c. NUMBER and DATE only
   d. none of the above

   Correct answer: b

   Clearly, the DATE and PAGENO= options are specified, because the page number on the output is 1 even though PROC TABULATE output was just produced. If you don't specify PAGENO=, all output in the Output window is numbered sequentially throughout your SAS session.

6. Which of the following programs correctly references a SAS data set named SalesAnalysis that is stored in a permanent SAS library?

   a.
   ```sas
   data saleslibrary.salesanalysis;
     set mydata.quarter1sales;
     if sales>100000;
   run;
   ```
   b.
   ```sas
   data mysales.totals;
     set sales_99.salesanalysis;
     if totalsales>50000;
   run;
   ```
   c.
   ```sas
   proc print data=salesanalysis.quarter1;
     var sales salesrep month;
   run;
   ```
   d.
   ```sas
   proc freq data=1999data.salesanalysis;
     tables quarter*sales;
   run;
   ```

   Correct answer: b

   Librefs must be 1 to 8 characters long, must begin with a letter or underscore, and can contain only letters, numbers, or underscores. After you assign a libref, you specify it as the first element in the two-level name for a SAS file.

7. Which time span is used to interpret two-digit year values if the YEARCUTOFF= option is set to 1950?

   a. 1950-2049
   b. 1950-2050
   c. 1949-2050
   d. 1950-2000

   Correct answer: a

   The YEARCUTOFF= option specifies which 100-year span is used to interpret two-digit year values. The default value of YEARCUTOFF= is 1920. However, you can override the default and change the value of YEARCUTOFF= to the first year of another 100-year span. If you specify YEARCUTOFF=1950, then the 100-year span will be from 1950 to 2049.

8. Assuming you are using SAS code and not special SAS windows, which one of the following statements is false?

   a. LIBNAME statements can be stored with a SAS program to reference the SAS library automatically when you submit the program.
   b. When you delete a libref, SAS no longer has access to the files in the library. However, the contents of the library still exist on your operating system.
   c. Librefs can last from one SAS session to another.
   d. You can access files that were created with other vendors' software by submitting a LIBNAME statement.

   Correct answer: c

   The LIBNAME statement is global, which means that librefs remain in effect until you modify them, cancel them, or end your SAS session. Therefore, the LIBNAME statement assigns the libref for the current SAS session only. You must assign a libref before accessing SAS files that are stored in a permanent SAS data library.

9. What does the following statement do?

   ```sas
   libname osiris spss 'c:\myfiles\sasdata\data';
   ```

   a. defines a library called Spss using the OSIRIS engine
   b. defines a library called Osiris using the SPSS engine
   c. defines two libraries called Osiris and Spss using the default engine
   d. defines the default library using the OSIRIS and SPSS engines

   Correct answer: b

   In the LIBNAME statement, you specify the library name before the engine name. Both are followed by the path.

10. What does the following OPTIONS statement do?

    ```sas
    options pagesize=15 nodate;
    ```

    a. suppresses the date and limits the page size of the log
    b. suppresses the date and limits the vertical page size for text output
    c. suppresses the date and limits the vertical page size for text and HTML output
    d. suppresses the date and limits the horizontal page size for text output

    Correct answer: b

    These options affect the format of listing output only. NODATE suppresses the date and PAGESIZE= determines the number of rows to print on the page.

#### Chapter 3: Editing and Debugging SAS Programs Answer Key

1. As you write and edit SAS programs it's a good idea to

   a. begin DATA and PROC steps in column one.
   b. indent statements within a step.
   c. begin RUN statements in column one.
   d. all of the above

   Correct answer: d

   Although you can write SAS statements in almost any format, a consistent layout enhances readability and enables you to understand the program's purpose.
It's a good idea to begin DATA and PROC steps in column one, to indent statements within a step, to begin RUN statements in column one, and to include a RUN statement after every DATA step or PROC step.

2. What usually happens when an error is detected?

   a. SAS continues processing the step.
   b. SAS continues to process the step, and the log displays messages about the error.
   c. SAS stops processing the step in which the error occurred, and the log displays messages about the error.
   d. SAS stops processing the step in which the error occurred, and the program output displays messages about the error.

   Correct answer: c

   Syntax errors generally cause SAS to stop processing the step in which the error occurred. When a program that contains an error is submitted, messages regarding the problem also appear in the SAS log. When a syntax error is detected, the SAS log displays the word ERROR, identifies the possible location of the error, and gives an explanation of the error.

3. A syntax error occurs when

   a. some data values are not appropriate for the SAS statements that are specified in a program.
   b. the form of the elements in a SAS statement is correct, but the elements are not valid for that usage.
   c. program statements do not conform to the rules of the SAS language.
   d. none of the above

   Correct answer: c

   Syntax errors are common types of errors. Some SAS system options, features of the Editor window, and the DATA step debugger can help you identify syntax errors. Other types of errors include data errors, semantic errors, and execution-time errors.

4. How can you tell whether you have specified an invalid option in a SAS program?

   a. A log message indicates an error in a statement that seems to be valid.
   b. A log message indicates that an option is not valid or not recognized.
   c. The message "PROC running" or "DATA step running" appears at the top of the active window.
   d. You can't tell until you view the output from the program.

   Correct answer: b

   When you submit a SAS statement that contains an invalid option, a log message notifies you that the option is not valid or not recognized. You should recall the program, remove or replace the invalid option, check your statement syntax as needed, and resubmit the corrected program.

5. Which of the following programs contains a syntax error?

   Correct answer: b

   The DATA step contains a misspelled keyword (dat instead of data). However, this is such a common (and easily interpretable) error that SAS produces only a warning message, not an error.

6. What does the following log indicate about your program?

   ```
   proc print data=sasuser.cargo99
   var origin dest cargorev;
   22
   76
   ERROR 22-322: Syntax error, expecting one of the following: ;, (, DATA, DOUBLE, HEADING, LABEL, N, NOOBS, OBS, ROUND, ROWS, SPLIT, STYLE, UNIFORM, WIDTH.
   ERROR 76-322: Syntax error, statement will be ignored.
   11 run;
   ```

   a. SAS identifies a syntax error at the position of the VAR statement.
   b. SAS is reading VAR as an option in the PROC PRINT statement.
   c. SAS has stopped processing the program because of errors.
   d. all of the above

   Correct answer: d

   Because there is a missing semicolon at the end of the PROC PRINT statement, SAS interprets VAR as an option in PROC PRINT and finds a syntax error at that location. SAS stops processing programs when it encounters a syntax error.

#### Chapter 4: Creating List Reports Answer Key

1. Which PROC PRINT step below creates the following output?

   Correct answer: c

   The DATA= option specifies the data set that you are listing, and the ID statement replaces the Obs column with the specified variable. The VAR statement specifies variables and controls the order in which they appear, and the WHERE statement selects rows based on a condition.
The LABEL option in the PROC PRINT statement causes the labels that are specified in the LABEL statement to be displayed.

2. Which of the following PROC PRINT steps is correct if labels are not stored with the data set?

   Correct answer: a

   You use the DATA= option to specify the data set to be printed. The LABEL option specifies that variable labels appear in output instead of variable names.

3. Which of the following statements selects from a data set only those observations for which the value of the variable Style is RANCH, SPLIT, or TWOSTORY?

   Correct answer: d

   In the WHERE statement, the IN operator enables you to select observations based on several values. You specify values in parentheses and separate them by spaces or commas. Character values must be enclosed in quotation marks and must be in the same case as in the data set.

4. If you want to sort your data and create a temporary data set named Calc to store the sorted data, which of the following steps should you submit?

   Correct answer: c

   In a PROC SORT step, you specify the DATA= option to specify the data set to sort. The OUT= option specifies an output data set. The required BY statement specifies the variable(s) to use in sorting the data.

5. Which options are used to create the following PROC PRINT output?

   13:27 Monday, March 22, 1999

   | Patient | Arterial | Heart | Cardiac | Urinary |
   |---|---|---|---|---|
   | 203 | 88 | 95 | 66 | 110 |
   | 54 | 83 | 183 | 95 | 0 |
   | 664 | 72 | 111 | 332 | 12 |
   | 210 | 74 | 97 | 369 | 0 |
   | 101 | 80 | 130 | 291 | 0 |

   a. the DATE system option and the LABEL option in PROC PRINT
   b. the DATE and NONUMBER system options and the DOUBLE and NOOBS options in PROC PRINT
   c. the DATE and NONUMBER system options and the DOUBLE option in PROC PRINT
   d. the DATE and NONUMBER system options and the NOOBS option in PROC PRINT

   Correct answer: b

   The DATE and NONUMBER system options cause the output to appear with the date but without page numbers. In the PROC PRINT step, the DOUBLE option specifies double spacing, and the NOOBS option removes the default Obs column.

6. Which of the following statements can you use in a PROC PRINT step to create this output?

   Correct answer: d

   You do not need to name the variables in a VAR statement if you specify them in the SUM statement, but you can. If you choose not to name the variables in the VAR statement as well, then the SUM statement determines the order of the variables in the output.

7. What happens if you submit the following program?

   ```sas
   proc sort data=clinic.diabetes;
   run;
   proc print data=clinic.diabetes;
     var age height weight pulse;
     where sex='F';
   run;
   ```

   a. The PROC PRINT step runs successfully, printing observations in their sorted order.
   b. The PROC SORT step permanently sorts the input data set.
   c. The PROC SORT step generates errors and stops processing, but the PROC PRINT step runs successfully, printing observations in their original (unsorted) order.
   d. The PROC SORT step runs successfully, but the PROC PRINT step generates errors and stops processing.

   Correct answer: c

   The BY statement is required in PROC SORT. Without it, the PROC SORT step fails. However, the PROC PRINT step prints the original data set as requested.

8. If you submit the following program, which output does it create?

   ```sas
   proc sort data=finance.loans out=work.loans;
     by months amount;
   run;
   proc print data=work.loans noobs;
     var months;
     sum amount payment;
     where months<360;
   run;
   ```

   Correct answer: a

   Column totals appear at the end of the report in the same format as the values of the variables, so b is incorrect. Work.Loans is sorted by Months and Amount, so c is incorrect. The program sums both Amount and Payment, so d is incorrect.

9. Choose the statement below that selects rows in which

   - the amount is less than or equal to $5000
   - the account is 101-1092 or the rate equals 0.095.

   Correct answer: c

   To ensure that the compound expression is evaluated correctly, you can use parentheses to group `account='101-1092' or rate eq 0.095`.

   | OBS | Account | Amount | Rate | Months | Payment |
   |---|---|---|---|---|---|
   | 1 | 101-1092 | $22,000 | 10.00% | 60 | $467.43 |
   | 2 | 101-1731 | $114,000 | 9.50% | 360 | $958.57 |
   | 3 | 101-1289 | $10,000 | 10.50% | 36 | $325.02 |
   | 4 | 101-3144 | $3,500 | 10.50% | 12 | $308.52 |
   | 5 | 103-1135 | $8,700 | 10.50% | 24 | $403.47 |
   | 6 | 103-1994 | $18,500 | 10.00% | 60 | $393.07 |
   | 7 | 103-2335 | $5,000 | 10.50% | 48 | $128.02 |
   | 8 | 103-3864 | $87,500 | 9.50% | 360 | $735.75 |
   | 9 | 103-3891 | $30,000 | 9.75% | 360 | $257.75 |

   For example, from the data set above, a and b above select observations 2 and 8 (those that have a rate of 0.095); c selects no observations; and d selects observations 4 and 7 (those that have an amount less than or equal to 5000).

10. What does PROC PRINT display by default?

    a. PROC PRINT does not create a default report; you must specify the rows and columns to be displayed.
    b. PROC PRINT displays all observations and variables in the data set. If you want an additional column for observation numbers, you can request it.
    c. PROC PRINT displays columns in the following order: a column for observation numbers, all character variables, and all numeric variables.
    d. PROC PRINT displays all observations and variables in the data set, a column for observation numbers on the far left, and variables in the order in which they occur in the data set.

    Correct answer: d

    You can remove the column for observation numbers. You can also specify the variables you want, and you can select observations according to conditions.

#### Chapter 5: Creating SAS Data Sets from Raw Data Answer Key

1. Which SAS statement associates the fileref Crime with the raw data file C:\States\Data\Crime?

   a. `filename crime 'c:\states\data\crime';`
   b. `filename crime c:\states\data\crime;`
   c. `fileref crime 'c:\states\data\crime';`
   d. `filename 'c:\states\data\crime' crime;`

   Correct answer: a

   Before you can read your raw data, you must reference the raw data file by creating a fileref. You assign a fileref by using a FILENAME statement in the same way that you assign a libref by using a LIBNAME statement.

2. Filerefs remain in effect until

   a. you change them.
   b. you cancel them.
   c. you end your SAS session.
   d. all of the above

   Correct answer: d

   Like LIBNAME statements, FILENAME statements are global; they remain in effect until you change them, cancel them, or end your SAS session.

3. Which statement identifies the name of a raw data file to be read with the fileref Products and specifies that the DATA step read only records 1-15?

   a. `infile products obs 15;`
   b. `infile products obs=15;`
   c. `input products obs=15;`
   d. `input products 1-15;`

   Correct answer: b

   You use an INFILE statement to specify the raw data file to be read. You can specify a fileref or an actual filename (in quotation marks). The OBS= option in the INFILE statement enables you to process only records 1 through n.

4. Which of the following programs correctly writes the observations from the data set below to a raw data file?

   Correct answer: d

   The keyword _NULL_ in the DATA statement enables you to use the power of the DATA step without actually creating a SAS data set. You use the FILE and PUT statements to write out the observations from a SAS data set to a raw data file. The FILE statement specifies the raw data file and the PUT statement describes the lines to write to the raw data file.
The filename and location that are specified in the FILE statement must be enclosed in quotation marks.

5. Which raw data file can be read using column input?

   d. all of the above

   Correct answer: b

   Column input is appropriate only in some situations. When you use column input, your data must be standard character or numeric values, and they must be in fixed fields. That is, values for a particular variable must be in the same location in all records.

6. Which program creates the output shown below?

   Correct answer: a

   The INPUT statement creates a variable using the name that you assign to each field. Therefore, when you write an INPUT statement, you need to specify the variable names exactly as you want them to appear in the SAS data set.

7. Which statement correctly reads the fields in the following order: StockNumber, Price, Item, Finish, Style?

   | Field Name | Start Column | End Column | Data Type |
   |---|---|---|---|
   | StockNumber | 1 | 3 | character |
   | Finish | 5 | 9 | character |
   | Style | 11 | 18 | character |
   | Item | 20 | 24 | character |
   | Price | 27 | 32 | numeric |

   Correct answer: b

   You can use column input to read fields in any order. You must specify the variable name to be created, identify character values with a $, and name the correct starting column and ending column for each field.

8. Which statement correctly re-defines the values of the variable Income as 100 percent higher?

   a. `income=income*1.00;`
   b. `income=income+(income*2.00);`
   c. `income=income*2;`
   d. `income=*2;`

   Correct answer: c

   To re-define the values of the variable Income in an Assignment statement, you specify the variable name on the left side of the equal sign and an appropriate expression including the variable name on the right side of the equal sign.

9. Which program correctly reads instream data?

   a.
   ```sas
   data finance.newloan;
      input datalines;
      if country='JAPAN';
      MonthAvg=amount/12;
   1998 US     CARS   194324.12
   1998 US     TRUCKS 142290.30
   1998 CANADA CARS   10483.44
   1998 CANADA TRUCKS 93543.64
   1998 MEXICO CARS   22500.57
   1998 MEXICO TRUCKS 10098.88
   1998 JAPAN  CARS   15066.43
   1998 JAPAN  TRUCKS 40700.34
   ;
   ```
   b.
   ```sas
   data finance.newloan;
      input Year 1-4 Country $ 6-11
            Vehicle $ 13-18 Amount 20-28;
      if country='JAPAN';
      MonthAvg=amount/12;
      datalines;
   run;
   ```
   c.
   ```sas
   data finance.newloan;
      input Year 1-4 Country 6-11
            Vehicle 13-18 Amount 20-28;
      if country='JAPAN';
      MonthAvg=amount/12;
      datalines;
   1998 US     CARS   194324.12
   1998 US     TRUCKS 142290.30
   1998 CANADA CARS   10483.44
   1998 CANADA TRUCKS 93543.64
   1998 MEXICO CARS   22500.57
   1998 MEXICO TRUCKS 10098.88
   1998 JAPAN  CARS   15066.43
   1998 JAPAN  TRUCKS 40700.34
   ;
   ```
   d.
   ```sas
   data finance.newloan;
      input Year 1-4 Country $ 6-11
            Vehicle $ 13-18 Amount 20-28;
      if country='JAPAN';
      MonthAvg=amount/12;
      datalines;
   1998 US     CARS   194324.12
   1998 US     TRUCKS 142290.30
   1998 CANADA CARS   10483.44
   1998 CANADA TRUCKS 93543.64
   1998 MEXICO CARS   22500.57
   1998 MEXICO TRUCKS 10098.88
   1998 JAPAN  CARS   15066.43
   1998 JAPAN  TRUCKS 40700.34
   ;
   ```

   Correct answer: d

   To read instream data, you specify a DATALINES statement and data lines, followed by a null statement (single semicolon) to indicate the end of the input data. Program a contains no DATALINES statement, and the INPUT statement doesn't specify the fields to read. Program b contains no data lines, and the INPUT statement in program c doesn't specify the necessary dollar signs for the character variables Country and Vehicle.

10. Which SAS statement subsets the raw data shown below so that only the observations in which Sex (in the second field) has a value of F are processed?

    a. `if sex=f;`
    b. `if sex=F;`
    c. `if sex='F';`
    d. a or b

    Correct answer: c

    To subset data, you can use a subsetting IF statement in any DATA step to process only those observations that meet a specified condition. Because Sex is a character variable, the value F must be enclosed in quotation marks and must be in the same case as in the data set.

#### Chapter 6: Understanding DATA Step Processing Answer Key

1. Which of the following is not created during the compilation phase?
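### SAS Preliminary Round Practice Problems

The SAS (Statistical Analysis System) preliminary round practice problems test how well contestants can use the SAS software and apply basic data analysis methods. This document answers each problem in problem-and-solution form, with the corresponding code and results.

Problem 1: Computing the BMI

The body mass index (BMI) is a widely used measure, computed as

BMI = weight (kg) / height (m)^2

Suppose a data set records the height (in cm) and weight (in kg) of 50 contestants. Write SAS code that computes the BMI of every contestant and prints the result.

Solution:

```sas
/* Only five of the contestants are listed; add the remaining records
   as further data lines. Comments must not be placed inside the data
   lines, because SAS would read them as data. */
data participants;
  input height weight;
  /* Convert height from centimetres to metres */
  height = height / 100;
  BMI = weight / (height * height);
datalines;
160 60
165 65
170 70
175 75
180 80
;
run;

proc print data=participants;
  var height weight BMI;
run;
```

Running this code produces a data set that contains the BMI. The `proc print` step displays the computed values; note that `height` is shown in metres after the conversion.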
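#### Problem 2: Filtering Data

Suppose a teammate has supplied a data set that records each participant's name (`name`), age (`age`), and score (`score`). Filter the data so that only participants who are older than 20 and whose score is greater than 80 are kept. Write SAS code that performs this filtering.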
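Solution:

```sas
data selected_participants;
    set participants;
    /* Keep only participants older than 20 with a score above 80 */
    where age > 20 and score > 80;
run;

proc print data=selected_participants;
    var name age score;
run;
```

The code reads a data set named `participants` (the same name as in Problem 1, but here it must contain the variables `name`, `age`, and `score`), saves the observations that meet both conditions in a new data set, and displays the result with `proc print`.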
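When the filtered rows only need to be displayed, the `WHERE=` data set option can apply the same condition without creating an intermediate data set. The sketch below is self-contained; the data set `participants_scores` and its four records are made-up values used purely for illustration.

```sas
/* Hypothetical sample data, not from the original exercise */
data participants_scores;
    input name $ age score;
    datalines;
Ann 19 92
Bob 22 85
Cai 25 78
Dee 21 81
;
run;

/* Apply the filter while reading the data set for printing */
proc print data=participants_scores(where=(age > 20 and score > 80)) noobs;
    var name age score;
run;
```

With these sample records, only the observations that satisfy both conditions are printed; a separate DATA step remains the better choice when the subset is reused in later steps.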
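The objects under study are known to fall into three classes, and four indicators are measured on each sample. The classes contain 7, 4, and 6 observed samples, respectively, and there are 3 additional samples whose class is to be determined (all observations are given in Table 2). Assume that all samples come from normal populations.

Table 2. Data for discriminant classification

(1) Carry out a discriminant analysis using the Mahalanobis distance method and classify the 3 unclassified samples.

(2) Carry out a discriminant analysis using another discriminant method, classify the 3 unclassified samples, and compare the results.

Solution

1. Discriminant analysis and classification

The DISCRIM procedure in SAS is used for the classification. The SAS program and results are as follows.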
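    data d510;
       input x1-x4 group @@;
       cards;
    6 -11.5 19 90 1
    -11 -18.5 25 -36 3
    90.2 -17 17 3 2
    -4 -15 13 54 1
    0 -14 20 35 2
    0.5 -11.5 19 37 3
    -10 -19 21 -42 3
    0 -23 5 -35 1
    20 -22 8 -20 3
    -100 -21.4 7 -15 1
    -100 -21.5 15 -40 2
    13 -17.2 18 2 2
    -5 -18.5 15 18 1
    10 -18 14 50 1
    -8 -14 16 56 1
    0.6 -13 26 21 3
    -40 -20 22 -50 3
    -8 -14 16 56 .
    92.2 -17 18 3 .
    -14 -18.5 25 -36 .
    ;
    proc print;
    run;
    proc discrim data=d510 simple pcov wsscp psscp wcov distance list;
       class group;
       var x1-x4;
    run;

According to the results, the squared Mahalanobis distance between classes 2 and 3 is $d^2 = 1.34$. The F statistic for testing $H_0: \mu^{(2)} = \mu^{(3)}$ is 0.63177, with p = 0.651 > 0.10. At significance level $\alpha = 0.10$, the mean vectors of populations 2 and 3 therefore do not differ significantly, which means that discriminating between classes 2 and 3 is of little practical value.

In addition, two samples are misclassified in the results: sample 8 in class 1 should belong to class 2, and sample 9 in class 2 should belong to class 1. The three unclassified samples are assigned to classes 1, 2, and 3, respectively.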
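For reference, the distance rule used in part (1) can be written out explicitly. This is the standard textbook form with a pooled covariance matrix and equal priors; the symbols $\bar{x}^{(i)}$, $S_i$, $S$, and $n_i$ are notation introduced here and do not appear in the original.

```latex
d^2(x, G_i) = \bigl(x - \bar{x}^{(i)}\bigr)^{\top} S^{-1} \bigl(x - \bar{x}^{(i)}\bigr),
\qquad i = 1, 2, 3,
\qquad
S = \frac{1}{n - k} \sum_{i=1}^{k} (n_i - 1)\, S_i ,
\quad n = 17,\ k = 3,\ (n_1, n_2, n_3) = (7, 4, 6).
```

A sample $x$ is assigned to the class $G_i$ for which $d^2(x, G_i)$ is smallest. Here $\bar{x}^{(i)}$ and $S_i$ denote the mean vector and sample covariance matrix of class $i$.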
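P. 265, Problem 1

Three batches of a certain type of battery were produced by factories A, B, and C, respectively. To compare their quality, 5 batteries were drawn at random from each batch as a sample, and testing gave the following lifetimes (h):

| A | B | C |
|---|---|---|
| 40 | 26 | 39 |
| 42 | 28 | 50 |
| 48 | 34 | 40 |
| 45 | 32 | 50 |
| 38 | 30 | 43 |

At significance level 0.05, test whether the mean battery lifetimes differ significantly. If the difference is significant, find the 95% confidence intervals for the mean differences μA - μB, μA - μC, and μB - μC.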
Code:

    data l1;
       do b = 1 to 5;
          do a = 1 to 3;
             input x @@;
             output;
          end;
       end;
       cards;
    40 26 39 42 28 50 48 34 40 45 32 50 38 30 43
    ;
    proc anova;
       class a;
       model x = a;
    run;

Output:

    The SAS System 19:15 Friday, April 9, 2012 5

The ANOVA Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| a | 3 | 1 2 3 |

Number of observations: 15

Dependent Variable: x

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 615.6000000 | 307.8000000 | 17.07 | 0.0003 |
| Error | 12 | 216.4000000 | 18.0333333 | | |
| Corrected Total | 14 | 832.0000000 | | | |

| R-Square | Coeff Var | Root MSE | x Mean |
|---|---|---|---|
| 0.739904 | 10.88863 | 4.246567 | 39.00000 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| a | 2 | 615.6000000 | 307.8000000 | 17.07 | 0.0003 |

Conclusion: at the 0.05 significance level, p = 0.0003 < 0.05, so the population means differ significantly.
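The pairwise confidence intervals requested in the problem can also be obtained within PROC ANOVA itself, using the error mean square pooled over all three groups. The following is a minimal sketch rather than the original author's program: the data set name `batteries` is an assumption, and the `MEANS` statement with the `T` and `CLDIFF` options requests unadjusted pairwise t intervals.

```sas
/* Sketch: one-way ANOVA with pairwise confidence limits.
   The data set name "batteries" is hypothetical. */
data batteries;
    do b = 1 to 5;
        do a = 1 to 3;          /* 1 = factory A, 2 = B, 3 = C */
            input x @@;
            output;
        end;
    end;
    datalines;
40 26 39 42 28 50 48 34 40 45 32 50 38 30 43
;
run;

proc anova data=batteries;
    class a;
    model x = a;
    /* T requests pairwise t tests (LSD); CLDIFF reports them as
       confidence intervals for the differences between means */
    means a / t cldiff alpha=0.05;
run;
quit;
```

With equal group sizes of 5, each such interval has half-width $t_{0.025}(12)\sqrt{2\,\mathrm{MSE}/5} \approx 2.179 \times 2.686 \approx 5.85$, using MSE = 18.0333 on 12 degrees of freedom from the ANOVA table. For μA - μB this gives 12.6 ± 5.85.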
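Code:

    data l1;   /* p. 265, problem 1: muA - muB */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 5
    40 42 48 45 38
    2 5
    26 28 34 32 30
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 5 | 37.664 | 42.6 | 47.536 | 2.3815 | 3.9749 | 11.422 | 1.7776 |
| x | 2 | 5 | 26.074 | 30 | 33.926 | 1.8946 | 3.1623 | 9.087 | 1.4142 |
| x | Diff (1-2) | | 7.3618 | 12.6 | 17.838 | 2.426 | 3.5917 | 6.8808 | 2.2716 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 8 | 5.55 | 0.0005 |
| x | Satterthwaite | Unequal | 7.62 | 5.55 | 0.0006 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 4 | 4 | 1.58 | 0.6685 |

Code:

    data l1;   /* p. 265, problem 1: muB - muC */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 5
    26 28 34 32 30
    2 5
    39 50 40 50 43
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 5 | 26.074 | 30 | 33.926 | 1.8946 | 3.1623 | 9.087 | 1.4142 |
| x | 2 | 5 | 37.795 | 44.4 | 51.005 | 3.1873 | 5.3198 | 15.287 | 2.3791 |
| x | Diff (1-2) | | -20.78 | -14.4 | -8.018 | 2.9558 | 4.3761 | 8.3835 | 2.7677 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 8 | -5.20 | 0.0008 |
| x | Satterthwaite | Unequal | 6.51 | -5.20 | 0.0016 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 4 | 4 | 2.83 | 0.3378 |

Code:

    data l1;   /* p. 265, problem 1: muA - muC */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 5
    40 42 48 45 38
    2 5
    39 50 40 50 43
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 5 | 37.664 | 42.6 | 47.536 | 2.3815 | 3.9749 | 11.422 | 1.7776 |
| x | 2 | 5 | 37.795 | 44.4 | 51.005 | 3.1873 | 5.3198 | 15.287 | 2.3791 |
| x | Diff (1-2) | | -8.648 | -1.8 | 5.0485 | 3.1718 | 4.6957 | 8.996 | 2.9698 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 8 | -0.61 | 0.5613 |
| x | Satterthwaite | Unequal | 7.41 | -0.61 | 0.5626 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 4 | 4 | 1.79 | 0.5862 |

Conclusion: the 95% confidence intervals for μA - μC, μA - μB, and μB - μC are (-7.65, 4.05), (6.75, 18.45), and (-20.25, -8.55), respectively. These intervals are based on the error mean square pooled over all three groups in the ANOVA (18.0333 on 12 degrees of freedom), so they differ slightly from the two-sample PROC TTEST limits shown above.

P. 265, Problem 2

To find the best arrangement of instruments on an aircraft control panel, three layouts were tested by observing navigators' reaction times in an emergency (in units of 1/10 second). Twenty-eight navigators were selected at random, and their reaction times under the different layouts were as follows:

| Layout | Reaction time (1/10 s) |
|---|---|
| I | 14 13 9 15 11 13 14 11 |
| II | 10 12 7 11 8 12 9 10 13 9 10 9 |
| III | 11 5 9 10 6 8 8 7 |

At significance level 0.05, test whether the reaction times for the layouts differ significantly. If they do, find the 0.95 confidence intervals for μ1 - μ2, μ1 - μ3, and μ2 - μ3.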
Code:

    data l1;   /* p. 265, problem 2 */
       input type $ n;
       do i = 1 to n;
          input x @@;
          output;
       end;
       cards;
    M1 8
    14 13 9 15 11 13 14 11
    M2 12
    10 12 7 11 8 12 9 10 13 9 10 9
    M3 8
    11 5 9 10 6 8 8 7
    ;
    proc anova;
       class type;
       model x = type;
    run;

Output:

The ANOVA Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| type | 3 | M1 M2 M3 |

Number of observations: 28

Dependent Variable: x

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 2 | 81.4285714 | 40.7142857 | 11.31 | 0.0003 |
| Error | 25 | 90.0000000 | 3.6000000 | | |
| Corrected Total | 27 | 171.4285714 | | | |

| R-Square | Coeff Var | Root MSE | x Mean |
|---|---|---|---|
| 0.475000 | 18.70643 | 1.897367 | 10.14286 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| type | 2 | 81.42857143 | 40.71428571 | 11.31 | 0.0003 |

Conclusion: at the 0.05 significance level, p = 0.0003 < 0.05, so the reaction times for the layouts differ significantly.
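Because the three groups have different sizes, the half-width of each pairwise interval depends on which two groups are compared. The worked arithmetic below is a sketch that assumes the intervals are built from the pooled error mean square in the ANOVA table above (MSE = 3.6 on 25 degrees of freedom); the group means 12.5, 10, and 8 are computed directly from the data, and $t_{0.025}(25) \approx 2.0595$ is the standard table value.

```latex
(\bar{x}_i - \bar{x}_j) \pm t_{0.025}(25)\,
\sqrt{\mathrm{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}

\mu_1 - \mu_2:\quad 2.5 \pm 2.0595\sqrt{3.6\left(\tfrac{1}{8} + \tfrac{1}{12}\right)}
  \approx 2.5 \pm 1.78 \;\Rightarrow\; (0.72,\ 4.28)

\mu_1 - \mu_3:\quad 4.5 \pm 2.0595\sqrt{3.6\left(\tfrac{1}{8} + \tfrac{1}{8}\right)}
  \approx 4.5 \pm 1.95 \;\Rightarrow\; (2.55,\ 6.45)

\mu_2 - \mu_3:\quad 2.0 \pm 2.0595\sqrt{3.6\left(\tfrac{1}{12} + \tfrac{1}{8}\right)}
  \approx 2.0 \pm 1.78 \;\Rightarrow\; (0.22,\ 3.78)
```

All three intervals exclude zero, which is consistent with the significant F test.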
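Code:

    data l1;   /* p. 265, problem 2: mu1 - mu2 */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 8
    14 13 9 15 11 13 14 11
    2 12
    10 12 7 11 8 12 9 10 13 9 10 9
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 8 | 10.828 | 12.5 | 14.172 | 1.3223 | 2 | 4.0705 | 0.7071 |
| x | 2 | 12 | 8.883 | 10 | 11.117 | 1.2454 | 1.7581 | 2.985 | 0.5075 |
| x | Diff (1-2) | | 0.7203 | 2.5 | 4.2797 | 1.4024 | 1.8559 | 2.7446 | 0.8471 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 18 | 2.95 | 0.0085 |
| x | Satterthwaite | Unequal | 13.7 | 2.87 | 0.0125 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 7 | 11 | 1.29 | 0.6738 |

Conclusion: the 0.95 confidence intervals for μ1 - μ2, μ1 - μ3, and μ2 - μ3 are (0.72, 4.28), (2.55, 6.45), and (0.22, 3.78), respectively.

Code:

    data l1;   /* p. 265, problem 2: mu1 - mu3 */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 8
    14 13 9 15 11 13 14 11
    2 8
    11 5 9 10 6 8 8 7
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 8 | 10.828 | 12.5 | 14.172 | 1.3223 | 2 | 4.0705 | 0.7071 |
| x | 2 | 8 | 6.328 | 8 | 9.672 | 1.3223 | 2 | 4.0705 | 0.7071 |
| x | Diff (1-2) | | 2.3552 | 4.5 | 6.6448 | 1.4643 | 2 | 3.1542 | 1 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 14 | 4.50 | 0.0005 |
| x | Satterthwaite | Unequal | 14 | 4.50 | 0.0005 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 7 | 7 | 1.00 | 1.0000 |

Code:

    data l1;   /* p. 265, problem 2: mu2 - mu3 */
       input lei n;
       do rep = 1 to n;
          input x @@;
          output;
       end;
       cards;
    1 12
    10 12 7 11 8 12 9 10 13 9 10 9
    2 8
    11 5 9 10 6 8 8 7
    ;
    proc ttest;
       class lei;
       var x;
    run;

Output:

The TTEST Procedure

Statistics

| Variable | lei | N | Lower CL Mean | Mean | Upper CL Mean | Lower CL Std Dev | Std Dev | Upper CL Std Dev | Std Err |
|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 12 | 8.883 | 10 | 11.117 | 1.2454 | 1.7581 | 2.985 | 0.5075 |
| x | 2 | 8 | 6.328 | 8 | 9.672 | 1.3223 | 2 | 4.0705 | 0.7071 |
| x | Diff (1-2) | | 0.2203 | 2 | 3.7797 | 1.4024 | 1.8559 | 2.7446 | 0.8471 |

T-Tests

| Variable | Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| x | Pooled | Equal | 18 | 2.36 | 0.0297 |
| x | Satterthwaite | Unequal | 13.7 | 2.30 | 0.0378 |

Equality of Variances

| Variable | Method | Num DF | Den DF | F Value | Pr > F |
|---|---|---|---|---|---|
| x | Folded F | 7 | 11 | 1.29 | 0.6738 |

P. 265, Problem 3

A pest-control station surveyed the density of pine caterpillars at 4 forest farms, examining 5 plots at each farm. The data are as follows:

| Farm | Pine caterpillar density (insects per standard plot) |
|---|---|
| 1 | 192 189 176 185 190 |
| 2 | 190 201 187 196 200 |
| 3 | 188 179 191 183 194 |
| 4 | 187 180 188 175 182 |

Determine whether the pine caterpillar densities at the 4 forest farms differ significantly, at significance level α = 0.05.

Code:

    data l1;   /* p. 265, problem 3 */
       do b = 1 to 5;
          do a = 1 to 4;
             input x @@;
             output;
          end;
       end;
       cards;
    192 190 188 187 189 201 179 180 176 187 191 188
    185 196 183 175 190 200 194 182
    ;
    proc anova;
       class a;
       model x = a;
    run;

Output:

The ANOVA Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| a | 4 | 1 2 3 4 |

Number of observations: 20

Dependent Variable: x

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 403.3500000 | 134.4500000 | 3.77 | 0.0321 |
| Error | 16 | 571.2000000 | 35.7000000 | | |
| Corrected Total | 19 | 974.5500000 | | | |

| R-Square | Coeff Var | Root MSE | x Mean |
|---|---|---|---|
| 0.413883 | 3.184091 | 5.974948 | 187.6500 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| a | 3 | 403.3500000 | 134.4500000 | 3.77 | 0.0321 |

Conclusion: at the 0.05 significance level, p = 0.0321 < 0.05, so the pine caterpillar densities at the 4 forest farms differ significantly.

P. 265, Problem 4

An experiment compared how long 4 different drugs relieve pain after surgery (h). The results are as follows:

| Drug | Duration (h) |
|---|---|
| A | 8 6 4 2 |
| B | 6 6 4 4 |
| C | 8 10 10 10 12 |
| D | 4 4 2 |

At significance level α = 0.05, test whether the durations of pain relief differ significantly among the drugs.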
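No program is given for this last problem. A minimal sketch, following the same input pattern as the unequal-group-size program for Problem 2 (a group label and count on one line, then the observations), might look like the following. The data set name `drugs` and the variable names `drug`, `n`, and `hours` are assumptions made for this illustration, and no output is shown because the original provides none.

```sas
/* Sketch only: names are hypothetical; data are taken from the problem table */
data drugs;
    input drug $ n;             /* group label and number of observations */
    do i = 1 to n;
        input hours @@;         /* read the n observations for this group */
        output;
    end;
    datalines;
A 4
8 6 4 2
B 4
6 6 4 4
C 5
8 10 10 10 12
D 3
4 4 2
;
run;

proc anova data=drugs;
    class drug;
    model hours = drug;
run;
quit;
```

The p-value in the `Pr > F` column of the resulting ANOVA table would then be compared with α = 0.05, as in the previous problems.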