Econometrics: Chapter 4 Homework
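Chapter 4: Estimation and Hypothesis Testing in the Multiple Linear Regression Model

Questions

4.1 What is a partial regression coefficient?
Answer: In the population regression function $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + u$, the coefficients $\beta_2, \ldots, \beta_k$ are called slope coefficients or partial regression coefficients. (The coefficients of the multiple sample regression function are also called partial regression coefficients.)

4.2 What is perfect multicollinearity? What is high (near-perfect) collinearity?
Answer: For explanatory variables $X_1, X_2, X_3, \ldots, X_k$, if there exist numbers $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \lambda_3 X_3 + \cdots + \lambda_k X_k = 0$, the explanatory variables are said to be perfectly multicollinear.
If the explanatory variables $X_1, X_2, X_3, \ldots, X_k$ are strongly correlated but not perfectly collinear, they are said to exhibit imperfect multicollinearity.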
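The difference between the two cases can be illustrated numerically. The sketch below uses made-up data (the arrays `x2`, `x3` and `x3_near` are hypothetical, not from the homework): with an exact linear relation the design matrix loses column rank, while with a near-exact relation it keeps full rank but the regressors are almost perfectly correlated.

```python
import numpy as np

# Hypothetical data: x3 is an exact multiple of x2, so 2*x2 - x3 = 0
# holds with weights that are not all zero (perfect multicollinearity).
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x3 = 2.0 * x2
X_perfect = np.column_stack([np.ones_like(x2), x2, x3])

# Near-collinear case: x3_near equals 2*x2 plus a small perturbation.
x3_near = 2.0 * x2 + np.array([0.01, -0.02, 0.015, -0.01, 0.005])
X_near = np.column_stack([np.ones_like(x2), x2, x3_near])

rank_perfect = np.linalg.matrix_rank(X_perfect)  # column rank is deficient
rank_near = np.linalg.matrix_rank(X_near)        # full column rank
corr_near = np.corrcoef(x2, x3_near)[0, 1]       # very close to 1, not equal to 1

print(rank_perfect, rank_near, corr_near)
```

In the first case $X'X$ is singular and the OLS coefficients are not uniquely determined; in the second they can be computed, but typically with large variances.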
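4.3 How does the meaning of a partial regression coefficient in a multiple regression differ from that of the regression coefficient in a simple (one-regressor) regression?
Answer: Similarity: both give the change in the mean of Y when X changes by one unit.
Difference: a partial regression coefficient gives the effect of one explanatory variable on the dependent variable when the other explanatory variables are held constant.
In a simple regression there are no other explanatory variables, so no such restriction is needed.

4.4 What does it mean for several variables to be "jointly significant"?
Answer: It means that the variables are significant as a group.
That is, under the hypothesis that their coefficients are all zero, the test statistic exceeds its critical value.
Intuitively, it is very unlikely that their coefficients are all zero at the same time.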
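A joint significance test of this kind is commonly carried out with an F statistic. As a sketch under the classical normal linear model assumptions (the symbols $RSS_R$, $RSS_U$, $q$ and $n$ are notation introduced here, not taken from the original answer), to test that $q$ slope coefficients are all zero in a model with $k$ estimated parameters:

```latex
F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n-k)} \sim F(q,\, n-k) \quad \text{under } H_0
```

Here $RSS_R$ is the residual sum of squares of the restricted model (the $q$ variables excluded) and $RSS_U$ that of the unrestricted model. The variables are jointly significant at level $\alpha$ when $F$ exceeds the critical value $F_\alpha(q, n-k)$.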
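Exercises

4.5 In the table below, $Y$, $X_2$ and $X_3$ denote weekly sales, weekly advertising expenditure and the average weekly income of customers, respectively (see DATA4-5).
(1) Estimate the regression equation $E(Y) = \beta_1 + \beta_2 X_2 + \beta_3 X_3$.
(2) Compute the goodness of fit.
(3) Compute the adjusted goodness of fit.
(4) Compute a confidence interval for $\beta_2$ (95% confidence level).
(5) Test the hypothesis $H_0: \beta_3 = 0$ (alternative $H_1: \beta_3 \neq 0$, 5% significance level).
(6) Test the hypothesis $H_0: \beta_3 = 0$ (alternative $H_1: \beta_3 > 0$, 5% significance level).
(7) Test the hypothesis $H_0: \beta_2 = \beta_3 = 0$ (5% significance level).
Answer: (1) From the EViews 6.0 output, $\hat\beta_1 = 109.4$, $\hat\beta_2 = 2.835714$, $\hat\beta_3 = 5.125714$.
The estimated regression equation is $E(Y) = 109.4 + 2.835714 X_2 + 5.125714 X_3$.
(2) From the output, the goodness of fit is 0.910086.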
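Quantitative Geography, 2nd Edition, Chapter 4, Question 11: Answers

1. Viewed in terms of the four major regions, the project that does not cross regions is ( )
A. South-to-North Water Diversion Project B. Qinghai-Tibet Railway (correct answer) C. West-to-East Gas Pipeline D. West-to-East Power Transmission
2. Which of the following geographic regions lie mainly in the non-monsoon zone? ( )
A. Northern region, Southern region B. Northern region, Northwest region C. Northwest region, Qinghai-Tibet region (correct answer) D. Qinghai-Tibet region, Southern region
3. The fragility of Japan's economy shows in ( )
A. Excessively fast economic growth B. Scarce mineral resources, most of which must be imported (correct answer) C. Labor shortage and a low level of technology D. Developed industry but backward agriculture
4. Which statement about the Northwest region is NOT correct? ( )
A. Its terrain consists mainly of plateaus and basins B. Its climate is arid, with widespread grassland and desert C. Most of its rivers are inland rivers D. Precipitation is low east of the Helan Mountains and higher to the west (correct answer)
5. The terrain of the Indochina Peninsula in Southeast Asia is characterized by ( )
A. High in the east, low in the west B. Broad rivers C. Alternating mountains and rivers running in parallel north to south (correct answer) D. Mountains in the north, plains in the middle, plateau in the south
6. The project related to Xinjiang that implements the "win-win for East and West" strategy is ( )
A. West-to-East Power Transmission B. West-to-East Gas Pipeline (correct answer) C. Qinghai-Tibet Railway D. South-to-North Water Diversion
7. Which description of the Northeast region is correct?
A. To protect the environment, land reclamation in the Sanjiang Plain has been halted (correct answer) B. River sections P and Q in the figure both have ice-jam floods in early spring and early winter C. Heavy industry in the three northeastern provinces relies on a concentration of scientific and technical talent D. The Sanjiang Plain borders the sea and has a temperate monsoon climate
8. Lili wrote in her summer travel diary: "That day I finally saw trees that stay green all year round and rice paddies in fragrant bloom." This reminded me of a different scene back home: "The beautiful grassland is my home; wind bends the grass and flowers are everywhere; cattle and sheep are scattered like pearls; the yurts are like white lotuses." The two regions described are ( )
A. Northern region, Southern region B. Southern region, Qinghai-Tibet region C. Southern region, Northwest region (correct answer) D. Northwest region, Qinghai-Tibet region
9. The main language and race of the people of this country are ( )
A. White, English B. White, Arabic (correct answer) C. Black, Arabic D. Black, English
10. Which description of the natural conditions for agriculture in the United States is NOT correct? ( )
A. Large area of cultivated land B. Convenient irrigation from rivers and lakes C. Located in the tropics, with good light and heat conditions (correct answer) D. Fertile soil
11. Which statement about the figure is correct?
A. The Junggar Basin lies on the first step of the terrain B. East of the Taihang Mountains is the North China Plain (correct answer) C. The Yellow River section shown in figure ③ is in the lower reaches D. South of the Qilian Mountains is the Sichuan Basin
12. Which of the following belong to the islands of Southeast Asia? ( )
A. Indochina Peninsula (correct answer) B. Diaoyu Islands C. Malay Archipelago (correct answer) D. Hainan Island
13. The main reason for the great pressure currently facing Indian agriculture is ( )
A. A small rural labor force B. Frequent floods and droughts (correct answer) C. Severe environmental pollution D. Desertification
14. (Applied question) "Why should the Qiang flute lament the willows? The spring breeze never crosses the Yumen Pass."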
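Chapter 4: Simple Linear Regression

Part 1: Learning objectives and requirements

This chapter introduces the simple linear regression model, the determination of the regression coefficients, and methods for testing the validity of the regression equation.
These methods include analysis of variance, the t test and the correlation coefficient test.
The chapter also shows how a linear model can be used for prediction and control.
The following topics should be understood and mastered:
1 The simple linear regression model
2 The method of least squares
3 The assumptions of simple linear regression
4 Analysis of variance
5 The t test
6 The correlation coefficient test
7 Interval estimation of parameters
8 Control and prediction with a linear regression equation
9 Economic interpretation of a linear regression equation

Part 2: Exercises

I. Define the following terms
1 Explanatory variable 2 Dependent (explained) variable 3 Linear regression model 4 Least squares 5 Analysis of variance 6 Parameter estimation 7 Control 8 Prediction

II. Fill in the blanks
1 A random disturbance term $\xi_t$ reflecting the influence of ( ) factors is introduced into an econometric model in order to make the model conform better to ( ) activity.
2 The reasons for introducing a random disturbance term into an econometric model can be summarized as follows: (1) the ( ) of human behavior and the ( ) of the social and natural environment determine the ( ) of economic variables themselves; (2) the effects of other economic factors omitted when the model is built are absorbed into ( ); (3) when the model is estimated, ( ) and aggregation errors are also absorbed into the random disturbance term; (4) because our knowledge is limited, the mathematical form linking ( ) and ( ) may be specified incorrectly, for example a nonlinear functional form specified as linear, and the resulting error is also included in the random disturbance term.
3 ( ) is the sum of squared deviations of the dependent variable; it measures the total variation of the dependent variable.
In terms of its sources, the total variation of the dependent variable has two components.
One is the independent variable; the other is all factors other than the independent variable.
( ) measures the dispersion of the fitted values.
It is the variation in the dependent variable caused by variation in the independent variable, also called the contribution of the independent variable to the variation of the dependent variable.
( ) measures the difference between the actual and fitted values; it is caused by factors other than the independent variable and is also called the residual.
4 A regression coefficient in the regression equation is the ( ) of the independent variable on the dependent variable.
The coefficient $\beta$ on an independent variable means that a one-unit change in that variable changes the dependent variable on average by ( ) units.
5 Linearity of the model means, with respect to the variables, ( ) of the variables in the regression model; with respect to the parameters, ( ) of the parameters in the regression model. The linearity of a linear regression model usually refers to linearity in ( ).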
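Chapter 4 Exercises

4.1 No t tests were carried out and the adjusted coefficient of determination was not reported, so the effect of degrees of freedom was ignored, which can make the conclusions unreliable.

4.3

Year    Y           GDP         CPI
2002    24430.3     120332.7    330.6
2003    34195.6     135822.8    334.6
2004    46435.8     159878.3    347.7
2005    54273.7     183084.8    353.9
2006    63376.9     211923.5    359.2
2007    73284.6     249529.9    376.5
2008    79526.5     314045.4    398.7
2009    68618.4     340902.8    395.9
2010    94699.3     401512.8    408.9
2011    113161.4    472881.6    431.0

(Y: commodity imports, 100 million yuan; GDP: gross domestic product, 100 million yuan; CPI: consumer price index, 1985 = 100.)

I. Purpose and requirements of the study

Commodity imports depend on many factors, and understanding how they change is very helpful for the import and export trade.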
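To explore and forecast changes in commodity imports, the main factors that affect them need to be analyzed quantitatively.

II. Model specification and estimation

Analysis suggests that commodity imports may be related to gross domestic product and the consumer price index.
We therefore take gross domestic product (GDP) and the consumer price index (CPI) as the main factors.
Each explanatory variable is positively correlated with commodity imports.
We specify the following econometric model:

$\ln Y_t = \beta_1 + \beta_2 \ln GDP_t + \beta_3 \ln CPI_t + u_t$

where $Y$ is commodity imports (100 million yuan), $GDP$ is gross domestic product (100 million yuan), and $CPI$ is the consumer price index (1985 = 100).
The regression coefficients on the explanatory variables are all expected to be greater than zero.
To estimate the model, the series lnY, lnGDP and lnCPI were generated in EViews from the data in the table above, and the parameters were estimated by OLS. The regression results are shown in the figure below. The estimated equation is:

$\ln Y = -3.111486 + 1.338533 \ln GDP - 0.421791 \ln CPI$
se = (0.463010) (0.088610) (0.233295)
t = (-6.720126) (15.10582) (-1.807975)
$R^2 = 0.988051$, $\bar{R}^2 = 0.987055$, $F = 992.2582$

With $R^2 = 0.988051$ and $\bar{R}^2 = 0.987055$, the coefficient of determination is very high, and the F statistic of 992.2582 is clearly significant.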
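The reported statistics can be cross-checked from one another. The sketch below is only a consistency check on the printed numbers; the sample size `n = 27` (annual data for 1985-2011) is an assumption, since only the last ten rows of the data table survive in the text.

```python
# Values copied from the regression output quoted above.
coefs = [-3.111486, 1.338533, -0.421791]
ses = [0.463010, 0.088610, 0.233295]
r2 = 0.988051

# Assumption: 27 annual observations (1985-2011) and 3 estimated parameters.
n, k = 27, 3

# t statistic = coefficient / standard error
t_stats = [b / s for b, s in zip(coefs, ses)]

# Adjusted R-squared and the overall F statistic implied by R-squared
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(t_stats, adj_r2, f_stat)
```

Under that assumption the computed values agree with the reported t statistics, adjusted $R^2$ and F statistic up to rounding. With 24 degrees of freedom the two-sided 5% critical value of the t distribution is about 2.064, so the lnCPI coefficient (|t| of about 1.81) is not individually significant at the 5% level, and its sign is negative rather than positive as expected.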
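Chapter 4 Exercises

I. Single-choice questions

1. If the regression model violates the homoskedasticity assumption, the OLS estimator is ____
A. unbiased, inefficient B. biased, inefficient C. unbiased, efficient D. biased, efficient
2. The Goldfeld-Quandt method is used to test for ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
3. The DW test is used to test for ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
4. Under heteroskedasticity, the commonly used estimation method is ____
A. first differencing B. generalized differencing C. instrumental variables D. weighted least squares
5. Which of the following correctly expresses serial autocorrelation? ____
A. $Cov(u_i, u_j) \neq 0,\ i \neq j$ B. $Cov(u_i, u_j) = 0,\ i \neq j$ C. $Cov(x_i, x_j) \neq 0,\ i \neq j$ D. $Cov(x_i, u_j) \neq 0,\ i \neq j$
6. If the regression model violates the no-autocorrelation assumption, the OLS estimator is ____
A. unbiased, inefficient B. biased, inefficient C. unbiased, efficient D. biased, efficient
7. Under autocorrelation, the commonly used estimation method is ____
A. ordinary least squares B. generalized differencing C. instrumental variables D. weighted least squares
8. The White test is mainly used to test for ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
9. The Glejser test is mainly used to test for ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
10. The simple correlation coefficient matrix method is mainly used to test for ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
11. Heteroskedasticity means ____
A. $Var(u_i) \neq \sigma^2$ B. $Var(x_i) \neq \sigma^2$ C. $Var(u_i) = \sigma^2$ D. $Var(x_i) = \sigma^2$
12. Imperfect multicollinearity means that there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$, not all zero, such that ____ (where $v$ is a random error term)
A. $\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k + v = 0$ B. $\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k = 0$ C. and D. (formulas illegible in the source)
13. Let $x_1, x_2$ be explanatory variables. Perfect multicollinearity is ____
A. $x_1 + \frac{1}{2} x_2 = 0$ B. $x_1 e^{x_2} = 0$ C. $x_1 + \frac{1}{2} x_2 + v = 0$ ($v$ is a random error term) D. $x_1 + e^{x_2} = 0$
14. Generalized differencing estimates by least squares the parameters of ____
A. $y_t = \beta_1 + \beta_2 x_t + u_t$
B. $y_{t-1} = \beta_1 + \beta_2 x_{t-1} + u_{t-1}$
C. $\rho y_{t-1} = \rho\beta_1 + \rho\beta_2 x_{t-1} + \rho u_{t-1}$
D. $y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2(x_t - \rho x_{t-1}) + (u_t - \rho u_{t-1})$
15. The DW test requires certain assumptions. Which of the following is NOT one of them? ____
A. The explanatory variables are nonstochastic B. The random error term follows a first-order autoregressive process C. The linear regression model does not include lagged endogenous variables as explanatory variables D. The linear regression model is a simple (one-regressor) regression
16. Which of the following is NOT a cause of serial autocorrelation? ____
A. Inertia of economic variables B. Lags in economic behavior C. Specification error D. Collinearity among the explanatory variables
17. In the DW test, a d statistic equal to 2 indicates ____
A. perfect positive autocorrelation B. perfect negative autocorrelation C. no autocorrelation D. no conclusion can be drawn
18. In the DW test, a d statistic equal to 4 indicates ____
A. perfect positive autocorrelation B. perfect negative autocorrelation C. no autocorrelation D. no conclusion can be drawn
19. In the DW test, a d statistic equal to 0 indicates ____
A. perfect positive autocorrelation B. perfect negative autocorrelation C. no autocorrelation D. no conclusion can be drawn
20. In the DW test, the inconclusive region is ____
A. $0 < d < d_L$, $4 - d_L < d < 4$ B. $d_U < d < 4 - d_U$ C. $d_L < d < d_U$, $4 - d_U < d < 4 - d_L$ D. None of the above
21. Among the methods for correcting serial autocorrelation, the one that can correct higher-order autocorrelation is ____
A. Computing $\hat\rho$ from the DW statistic B. The Cochrane-Orcutt method C. The Durbin two-step method D. The moving average method
22. Which of the following is NOT a cause of multicollinearity? ____
A. Most economic variables share common trends B. Heavy use of lagged variables in the model C. Improper choice of variables due to limited knowledge D. Correlation between the explanatory variables and the random error term
23. In the DW test, the region of positive autocorrelation is ____
A. $4 - d_L < d < 4$ B. $0 < d < d_L$ C. $d_U < d < 4 - d_U$ D. $d_L < d < d_U$, $4 - d_U < d < 4 - d_L$
24. Stepwise regression both tests for and corrects ____
A. heteroskedasticity B. autocorrelation C. stochastic explanatory variables D. multicollinearity
25. Let $y_i = \beta_1 + \beta_2 x_i + u_i$ with $Var(u_i) = \sigma_i^2 = \sigma^2 f(x_i)$. The correct transformation of the original model is ____
A. $y_i = \beta_1 + \beta_2 x_i + u_i$ B., C. and D. (versions of the model rescaled by functions of $f(x_i)$; the formulas are illegible in the source)
26. Which of the following is NOT a method for correcting serial autocorrelation? ____
A. Generalized differencing B. Ordinary least squares C. First differencing D. The Durbin two-step method
27. Which of the following is NOT a method for testing heteroskedasticity? ____
A. The Goldfeld-Quandt method B. The Spearman test C. The White test D. The DW test
28. In the DW test, the region of zero autocorrelation is ____
A. $4 - d_L < d < 4$ B. $0 < d < d_L$ C. $d_U < d < 4 - d_U$ D. $d_L < d < d_U$, $4 - d_U < d < 4 - d_L$
29. If the explanatory variables in the model are perfectly multicollinear, the OLS estimator of the parameters is ( )
A. unbiased B. biased C. indeterminate D. determinate
30. The model is $y = \beta_1 + \beta_2 x + u$. When its parameters are estimated with actual data, the DW statistic is 0.6453. The generalized-difference variables are ( )
A. $y_t - 0.6453 y_{t-1}$, $x_t - 0.6453 x_{t-1}$ B. $y_t - 0.6774 y_{t-1}$, $x_t - 0.6774 x_{t-1}$ C. $y_t - y_{t-1}$, $x_t - x_{t-1}$ D. $y_t - 0.05 y_{t-1}$, $x_t - 0.05 x_{t-1}$
31. When weighted least squares is applied and the transformed model is $\frac{y}{x} = \beta_1 \frac{1}{x} + \beta_2 \frac{x}{x} + \frac{u}{x}$, which of the following forms does $Var(u)$ take? ( )
A. $\sigma^2 x$ B. $\sigma^2 x^2$ C. $\sigma^2 \sqrt{x}$ D. $\sigma^2 \log(x)$
32. In a linear regression model, if the observations on explanatory variables $x_1$ and $x_2$ are proportional, i.e. $x_{1i} = k x_{2i}$ with $k$ a nonzero constant, the model exhibits ( )
A. heteroskedasticity B. multicollinearity C. serial autocorrelation D. specification error
33. If the DW statistic is close to 2, the first-order autocorrelation coefficient $\hat\rho$ of the sample regression residuals is approximately ( )
A. 0 B. -1 C. 1 D. 4

II. Multiple-choice questions

1. Methods that can detect multicollinearity include ____
A. Simple correlation coefficients B. The DW test C. The coefficient of determination test D. The variance inflation factor test E. Stepwise regression
2. Methods that can correct multicollinearity include ____
A. Increasing the sample size B. Ridge regression C. Dropping redundant variables D. Stepwise regression E. Differenced models
3. If the model exhibits heteroskedasticity, the consequences are ____
A. The parameter estimates are biased B. The variances of the parameter estimates cannot be determined correctly C. Significance tests on the variables are invalid D. Forecast precision falls E. The parameter estimates remain unbiased
4. Methods that can test for heteroskedasticity are ____
A. The Glejser test B. The White test C. Graphical methods D. The Spearman test E. The DW test F. The Goldfeld-Quandt test
5. If the model exhibits serial autocorrelation, the consequences are ____
A. The parameter estimates are biased B. The variances of the parameter estimates cannot be determined correctly C. Significance tests on the variables are invalid D. Forecast precision falls E. The parameter estimates remain unbiased
6. Methods for testing serial autocorrelation are ____
A. The Glejser test B. The White test C. Graphical methods D. The DW test E. The Goldfeld-Quandt test
7. Methods that can correct serial autocorrelation include ____
A. Weighted least squares B. The Durbin two-step method C. Generalized least squares D. First differencing E. Generalized differencing
8. The conditions for applying the Goldfeld-Quandt test are ____
A. The observations are ordered by the size of the explanatory variable B. The sample size is as large as possible C. The random error term is normally distributed D. Roughly the middle quarter of the ordered observations is deleted
9. In the DW test, the inconclusive regions are ____
A. $0 < d < d_L$ B. $d_U < d < 4 - d_U$ C. $d_L < d < d_U$ D. $4 - d_U < d < 4 - d_L$ E. $4 - d_L < d < 4$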
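Several of the questions above (for example single-choice questions 17 to 19, 30 and 33) rest on the standard approximation $d \approx 2(1 - \hat\rho)$ between the Durbin-Watson statistic and the first-order autocorrelation coefficient of the residuals. A minimal sketch, with `rho_from_dw` a hypothetical helper name introduced here for illustration:

```python
def rho_from_dw(d):
    """Approximate first-order autocorrelation implied by a Durbin-Watson d,
    using the large-sample relation d ~ 2 * (1 - rho)."""
    return 1.0 - d / 2.0

# Question 30: DW = 0.6453 implies rho-hat of about 0.677, so the
# generalized-difference variables are y_t - rho*y_{t-1} and x_t - rho*x_{t-1}.
rho_hat = rho_from_dw(0.6453)

# Limiting cases behind questions 17-19 and 33.
rho_at_2 = rho_from_dw(2.0)   # no autocorrelation
rho_at_0 = rho_from_dw(0.0)   # perfect positive autocorrelation
rho_at_4 = rho_from_dw(4.0)   # perfect negative autocorrelation

print(rho_hat, rho_at_2, rho_at_0, rho_at_4)
```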
CHAPTER 4

SOLUTIONS TO PROBLEMS

4.2 (i) and (iii) generally cause the t statistics not to have a t distribution under $H_0$. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.3. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one.

4.3 (i) While the standard error on hrsemp has not changed, the magnitude of the coefficient has increased by half. The t statistic on hrsemp has gone from about -1.47 to -2.21, so now the coefficient is statistically less than zero at the 5% level. (From Table G.2 the 5% critical value with 40 df is -1.684. The 1% critical value is -2.423, so the p-value is between .01 and .05.)

(ii) If we add and subtract $\beta_2 \log(employ)$ from the right-hand side and collect terms, we have

$\log(scrap) = \beta_0 + \beta_1 hrsemp + [\beta_2 \log(sales) - \beta_2 \log(employ)] + [\beta_2 \log(employ) + \beta_3 \log(employ)] + u$
$= \beta_0 + \beta_1 hrsemp + \beta_2 \log(sales/employ) + (\beta_2 + \beta_3) \log(employ) + u$,

where the second equality follows from the fact that $\log(sales/employ) = \log(sales) - \log(employ)$. Defining $\theta_3 \equiv \beta_2 + \beta_3$ gives the result.

(iii) No. We are interested in the coefficient on log(employ), which has a t statistic of .2, which is very small. Therefore, we conclude that the size of the firm, as measured by employees, does not matter, once we control for training and sales per employee (in a logarithmic functional form).

(iv) The null hypothesis in the model from part (ii) is $H_0: \beta_2 = -1$. The t statistic is [-.951 - (-1)]/.37 = (1 - .951)/.37 ≈ .132; this is very small, and we fail to reject whether we specify a one- or two-sided alternative.

4.4 (i) In columns (2) and (3), the coefficient on profmarg is actually negative, although its t statistic is only about -1. It appears that, once firm sales and market value have been controlled for, profit margin has no effect on CEO salary.

(ii) We use column (3), which controls for the most factors affecting salary. The t statistic on log(mktval) is about 2.05, which is just significant at the 5% level against a two-sided alternative. (We can use the standard normal critical value, 1.96.) So log(mktval) is statistically significant. Because the coefficient is an elasticity, a ceteris paribus 10% increase in market value is predicted to increase salary by 1%. This is not a huge effect, but it is not negligible, either.

(iii) These variables are individually significant at low significance levels, with $t_{ceoten}$ ≈ 3.11 and $t_{comten}$ ≈ -2.79. Other factors fixed, another year as CEO with the company increases salary by about 1.71%. On the other hand, another year with the company, but not as CEO, lowers salary by about .92%. This second finding at first seems surprising, but could be related to the "superstar" effect: firms that hire CEOs from outside the company often go after a small pool of highly regarded candidates, and salaries of these people are bid up. More non-CEO years with a company makes it less likely the person was hired as an outside superstar.

4.7 (i) .412 ± 1.96(.094), or about .228 to .596.

(ii) No, because the value .4 is well inside the 95% CI.

(iii) Yes, because 1 is well outside the 95% CI.

4.8 (i) With df = 706 - 4 = 702, we use the standard normal critical value (df = ∞ in Table G.2), which is 1.96 for a two-tailed test at the 5% level. Now $t_{educ}$ = -11.13/5.88 ≈ -1.89, so $|t_{educ}|$ = 1.89 < 1.96, and we fail to reject $H_0: \beta_{educ} = 0$ at the 5% level. Also, $t_{age}$ ≈ 1.52, so age is also statistically insignificant at the 5% level.

(ii) We need to compute the R-squared form of the F statistic for joint significance: F = [(.113 - .103)/(1 - .113)](702/2) ≈ 3.96. The 5% critical value in the $F_{2,702}$ distribution can be obtained from Table G.3b with denominator df = ∞: cv = 3.00. Therefore, educ and age are jointly significant at the 5% level (3.96 > 3.00). In fact, the p-value is about .019, and so educ and age are jointly significant at the 2% level.

(iii) Not really. These variables are jointly significant, but including them only changes the coefficient on totwrk from -.151 to -.148.

(iv) The standard t and F statistics that we used assume homoskedasticity, in addition to the other CLM assumptions. If there is heteroskedasticity in the equation, the tests are no longer valid.

4.11 (i) Holding profmarg fixed, $\Delta\widehat{rdintens} = .321\,\Delta\log(sales) = (.321/100)[100 \cdot \Delta\log(sales)] \approx .00321(\%\Delta sales)$. Therefore, if %Δsales = 10, $\Delta\widehat{rdintens} \approx .032$, or only about 3/100 of a percentage point. For such a large percentage increase in sales, this seems like a practically small effect.

(ii) $H_0: \beta_1 = 0$ versus $H_1: \beta_1 > 0$, where $\beta_1$ is the population slope on log(sales). The t statistic is .321/.216 ≈ 1.486. The 5% critical value for a one-tailed test, with df = 32 - 3 = 29, is obtained from Table G.2 as 1.699; so we cannot reject $H_0$ at the 5% level. But the 10% critical value is 1.311; since the t statistic is above this value, we reject $H_0$ in favor of $H_1$ at the 10% level.

(iii) Not really. Its t statistic is only 1.087, which is well below even the 10% critical value for a one-tailed test.

SOLUTIONS TO COMPUTER EXERCISES

C4.1 (i) Holding other factors fixed,

$\Delta voteA = \beta_1 \Delta\log(expendA) = (\beta_1/100)[100 \cdot \Delta\log(expendA)] \approx (\beta_1/100)(\%\Delta expendA)$,

where we use the fact that $100 \cdot \Delta\log(expendA) \approx \%\Delta expendA$. So $\beta_1/100$ is the (ceteris paribus) percentage point change in voteA when expendA increases by one percent.

(ii) The null hypothesis is $H_0: \beta_2 = -\beta_1$, which means a z% increase in expenditure by A and a z% increase in expenditure by B leaves voteA unchanged. We can equivalently write $H_0: \beta_1 + \beta_2 = 0$.

(iii) The estimated equation (with standard errors in parentheses below estimates) is

$\widehat{voteA} = 45.08 + 6.083 \log(expendA) - 6.615 \log(expendB) + .152\, prtystrA$
(3.93) (0.382) (0.379) (.062)
n = 173, $R^2$ = .793.

The coefficient on log(expendA) is very significant (t statistic ≈ 15.92), as is the coefficient on log(expendB) (t statistic ≈ -17.45). The estimates imply that a 10% ceteris paribus increase in spending by candidate A increases the predicted share of the vote going to A by about .61 percentage points. [Recall that, holding other factors fixed, $\Delta\widehat{voteA} \approx (6.083/100)(\%\Delta expendA)$.] Similarly, a 10% ceteris paribus increase in spending by B reduces $\widehat{voteA}$ by about .66 percentage points. These effects certainly cannot be ignored.

While the coefficients on log(expendA) and log(expendB) are of similar magnitudes (and opposite in sign, as we expect), we do not have the standard error of $\hat\beta_1 + \hat\beta_2$, which is what we would need to test the hypothesis from part (ii).

(iv) Write $\theta_1 = \beta_1 + \beta_2$, or $\beta_1 = \theta_1 - \beta_2$. Plugging this into the original equation, and rearranging, gives

$voteA = \beta_0 + \theta_1 \log(expendA) + \beta_2[\log(expendB) - \log(expendA)] + \beta_3\, prtystrA + u$.

When we estimate this equation we obtain $\hat\theta_1 \approx -.532$ and $se(\hat\theta_1) \approx .533$. The t statistic for the hypothesis in part (ii) is -.532/.533 ≈ -1. Therefore, we fail to reject $H_0: \beta_2 = -\beta_1$.

C4.3 (i) The estimated model is

$\widehat{\log(price)} = 11.67 + .000379\, sqrft + .0289\, bdrms$
(0.10) (.000043) (.0296)
n = 88, $R^2$ = .588.

Therefore, $\hat\theta_1$ = 150(.000379) + .0289 = .0858, which means that an additional 150 square foot bedroom increases the predicted price by about 8.6%.

(ii) $\beta_2 = \theta_1 - 150\beta_1$, and so

$\log(price) = \beta_0 + \beta_1 sqrft + (\theta_1 - 150\beta_1) bdrms + u$
$= \beta_0 + \beta_1 (sqrft - 150\, bdrms) + \theta_1 bdrms + u$.

(iii) From part (ii), we run the regression

log(price) on (sqrft - 150 bdrms), bdrms,

and obtain the standard error on bdrms. We already know that $\hat\theta_1$ = .0858; now we also get $se(\hat\theta_1)$ = .0268. The 95% confidence interval reported by my software package is .0326 to .1390 (or about 3.3% to 13.9%).

C4.5 (i) If we drop rbisyr the estimated equation becomes

$\widehat{\log(salary)} = 11.02 + .0677\, years + .0158\, gamesyr + .0014\, bavg + .0359\, hrunsyr$
(0.27) (.0121) (.0016) (.0011) (.0072)
n = 353, $R^2$ = .625.

Now hrunsyr is very statistically significant (t statistic ≈ 4.99), and its coefficient has increased by about two and one-half times.

(ii) The equation with runsyr, fldperc, and sbasesyr added is

$\widehat{\log(salary)} = 10.41 + .0700\, years + .0079\, gamesyr + .00053\, bavg + .0232\, hrunsyr + .0174\, runsyr + .0010\, fldperc - .0064\, sbasesyr$
(2.00) (.0120) (.0027) (.00110) (.0086) (.0051) (.0020) (.0052)
n = 353, $R^2$ = .639.

Of the three additional independent variables, only runsyr is statistically significant (t statistic = .0174/.0051 ≈ 3.41). The estimate implies that one more run per year, other factors fixed, increases predicted salary by about 1.74%, a substantial increase. The stolen bases variable even has the "wrong" sign with a t statistic of about -1.23, while fldperc has a t statistic of only .5. Most major league baseball players are pretty good fielders; in fact, the smallest fldperc is 800 (which means .800). With relatively little variation in fldperc, it is perhaps not surprising that its effect is hard to estimate.

(iii) From their t statistics, bavg, fldperc, and sbasesyr are individually insignificant. The F statistic for their joint significance (with 3 and 345 df) is about .69 with p-value ≈ .56. Therefore, these variables are jointly very insignificant.

C4.7 (i) The minimum value is 0, the maximum is 99, and the average is about 56.16.

(ii) When phsrank is added to (4.26), we get the following:

$\widehat{\log(wage)} = 1.459 - .0093\, jc + .0755\, totcoll + .0049\, exper + .00030\, phsrank$
(0.024) (.0070) (.0026) (.0002) (.00024)
n = 6,763, $R^2$ = .223.

So phsrank has a t statistic equal to only 1.25; it is not statistically significant. If we increase phsrank by 10, log(wage) is predicted to increase by (.0003)10 = .003. This implies a .3% increase in wage, which seems a modest increase given a 10 percentage point increase in phsrank. (However, the sample standard deviation of phsrank is about 24.)

(iii) Adding phsrank makes the t statistic on jc even smaller in absolute value, about 1.33, but the coefficient magnitude is similar to (4.26). Therefore, the base point remains unchanged: the return to a junior college is estimated to be somewhat smaller, but the difference is not significant at standard significance levels.

(iv) The variable id is just a worker identification number, which should be randomly assigned (at least roughly). Therefore, id should not be correlated with any variable in the regression equation. It should be insignificant when added to (4.17) or (4.26). In fact, its t statistic is about .54.

C4.9 (i) The results from the OLS regression, with standard errors in parentheses, are

$\widehat{\log(psoda)} = -1.46 + .073\, prpblck + .137 \log(income) + .380\, prppov$
(0.29) (.031) (.027) (.133)
n = 401, $R^2$ = .087.

The p-value for testing $H_0: \beta_1 = 0$ against the two-sided alternative is about .018, so that we reject $H_0$ at the 5% level but not at the 1% level.

(ii) The correlation is about -.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for $\hat\beta_{\log(income)}$ is about 5.1 and that for $\hat\beta_{prppov}$ is about 2.86 (two-sided p-value = .004).

(iii) The OLS regression results when log(hseval) is added are

$\widehat{\log(psoda)} = -.84 + .098\, prpblck - .053 \log(income) + .052\, prppov + .121 \log(hseval)$
(.29) (.029) (.038) (.134) (.018)
n = 401, $R^2$ = .184.

The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.

(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the $F_{2,396}$ statistic is about 3.52 with p-value = .030. All of the control variables, log(income), prppov, and log(hseval), are highly correlated, so it is not surprising that some are individually insignificant.

(v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if the proportion of blacks increases by .10, psoda is estimated to increase by 1%, other factors held fixed.
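The arithmetic in Problems 4.3(iv), 4.7(i) and 4.8(ii) can be reproduced in a few lines. This is only a sketch that recomputes the numbers quoted in the solutions; the helper name `f_from_r2` is introduced here for illustration and is not from the text.

```python
def f_from_r2(r2_ur, r2_r, q, df_ur):
    """R-squared form of the F statistic for q exclusion restrictions:
    F = [(R2_ur - R2_r) / (1 - R2_ur)] * (df_ur / q)."""
    return ((r2_ur - r2_r) / (1.0 - r2_ur)) * (df_ur / q)

# Problem 4.8(ii): joint significance of educ and age.
f_48 = f_from_r2(0.113, 0.103, q=2, df_ur=702)

# Problem 4.7(i): 95% confidence interval, estimate +/- 1.96 * standard error.
ci_47 = (0.412 - 1.96 * 0.094, 0.412 + 1.96 * 0.094)

# Problem 4.3(iv): t statistic for H0: beta_2 = -1.
t_43 = (-0.951 - (-1.0)) / 0.37

print(f_48, ci_47, t_43)
```

The computed F statistic exceeds the 5% critical value of 3.00 cited in the solution, the interval contains .4 but not 1, and the t statistic is far below any conventional critical value.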
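Econometrics: Chapter 4 Exercises

Chapter 4 Practice Problems

1. What is heteroskedasticity? Give examples of heteroskedasticity in economic phenomena. What is the reasoning behind tests for heteroskedasticity?

2. True or false? Briefly explain your reasoning.
(1) Under heteroskedasticity, the ordinary least squares (OLS) estimator is biased and inefficient;
(2) Under heteroskedasticity, the usual t and F tests are invalid;
(3) Under heteroskedasticity, the usual OLS estimation always overestimates the standard errors of the estimators;
(4) If the residuals from an OLS regression show a systematic pattern, the data are heteroskedastic;
(5) Under serial correlation, the OLS estimator is biased and inefficient;
(6) The first-difference transformation used to remove serial correlation assumes that the autocorrelation coefficient $\rho$ equals 1;
(7) When the error term $u_t$ of the regression model is heteroskedastic, the OLS estimator is no longer efficient;
(8) Under multicollinearity, the model parameters cannot be estimated;
(9) Under multicollinearity, the variances of the parameter estimates always increase, causing a loss of efficiency;
(10) Once the explanatory variables in the model are random variables, the basic assumptions are violated and the OLS estimator is biased and inconsistent;
(11) When the error term $u_t$ of the regression model is serially correlated, the OLS estimator is no longer unbiased.

3. Consider the consumption model $y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + u_t$,
where $y_t$ is consumption expenditure, $x_{1t}$ is personal disposable income, and $x_{2t}$ is the consumer's liquid assets.
Suppose $E(u_t) = 0$ and $Var(u_t) = \sigma^2 x_{1t}^2$ (where $\sigma^2$ is a constant).
Required: (1) Apply a suitable transformation to remove the heteroskedasticity, and prove that it does so.
(2) Write down the expressions for the parameter estimators of the model after the heteroskedasticity has been removed.

4. Briefly describe the effect of heteroskedasticity on each of the following: (1) the OLS estimators and their variances; (2) confidence intervals; (3) the use of the t and F significance tests.

5. Consider the model $Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t$, $Var(u_t) = \sigma_t^2 = \sigma^2 Z_t^2$,
where the data on $Y$, $X_1$, $X_2$ and $Z$ are known.
Given weights $w_t$, weighted least squares chooses the $\beta$'s to minimize

$RSS = \sum (w_t u_t)^2 = \sum (w_t Y_t - \beta_0 w_t - \beta_1 w_t X_{1t} - \beta_2 w_t X_{2t})^2$

(1) Find the partial derivatives of RSS with respect to $\beta_0$, $\beta_1$ and $\beta_2$ and write down the normal equations.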
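The weighted least squares criterion in problem 5 can be sketched numerically. The example below uses simulated data (all arrays and the true coefficients 2.0, 0.5 and -1.0 are made up for illustration) and assumes the weights $w_t = 1/Z_t$, the natural choice when $Var(u_t) = \sigma^2 Z_t^2$. It minimizes the weighted RSS by running OLS on the weighted variables and then checks that the normal equations hold at the solution.

```python
import numpy as np

# Simulated data (hypothetical): error standard deviation proportional to z.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
z = rng.uniform(0.5, 3.0, n)
u = rng.normal(0, 1, n) * z              # Var(u_t) = sigma^2 * z_t^2, sigma = 1
y = 2.0 + 0.5 * x1 - 1.0 * x2 + u

# Assumed weights w_t = 1 / z_t; weighted regressors are w, w*x1, w*x2.
w = 1.0 / z
Xw = np.column_stack([w, w * x1, w * x2])
yw = w * y

# Minimizing sum (w*y - b0*w - b1*w*x1 - b2*w*x2)^2 is OLS on (yw, Xw).
beta_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# Normal equations: each weighted regressor is orthogonal to the residuals.
grad = Xw.T @ (yw - Xw @ beta_hat)

print(beta_hat, grad)
```

Each entry of `grad` corresponds to one normal equation (the partial derivative of RSS with respect to one $\beta$, up to a factor of -2) and is zero up to floating-point error.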
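E4.1

VARIABLES       ahe
age             0.605
                (0.0245)
Constant        1.082
                (0.688)
Observations    7,711
R-squared       0.029
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

1. (1) Estimated intercept: 1.082
(2) Estimated slope: 0.605
Regression equation: ahe = 1.082 + 0.605*age
(3) When a worker is one year older, average hourly earnings increase by $0.605.
2. Bob: 0.605*26 + 1.082 = 16.812 (dollars)
Alexis: 0.605*30 + 1.082 = 19.232 (dollars)
Answer: Bob's predicted earnings are $16.812 per hour and Alexis's are $19.232 per hour.
3. Age does not explain most of the variation in earnings across individuals.
R-squared measures the fraction of the total variation in the dependent variable that is explained by the regressor through the regression. Here R-squared is 0.029, which is low, so age does not explain most of the variation in earnings across individuals.

E4.2

1. Answer: The two variables appear to have a weak positive relationship.
(Scatterplot of course_eval against beauty.)
2.

VARIABLES       course_eval
beauty          0.133
                (0.0550)
Constant        3.998
                (0.0449)
Observations    463
R-squared       0.036
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

(1) Estimated intercept: 3.998
Estimated slope: 0.133
Regression equation: Course_Eval = 3.998 + 0.133*beauty
(2) Estimated intercept = sample mean of Course_Eval - estimated slope * sample mean of Beauty.
The sample mean of Beauty is approximately zero, so the estimated intercept equals the sample mean of Course_Eval.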
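The two calculations used in these answers, predicting from a fitted line and recovering the intercept from sample means, can be sketched as follows. The coefficients 1.082 and 0.605 are the reported estimates; the small `x` and `y` lists are made-up numbers used only to illustrate the identity $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$, and `predict_ahe` is a hypothetical helper name.

```python
def predict_ahe(age, intercept=1.082, slope=0.605):
    """Predicted average hourly earnings from the fitted line reported above."""
    return intercept + slope * age

bob = predict_ahe(26)
alexis = predict_ahe(30)

# Hypothetical data with a regressor whose sample mean is exactly zero.
x = [-1.0, 0.0, 1.0, 2.0, -2.0]
y = [3.8, 4.0, 4.1, 4.3, 3.7]
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# OLS slope and intercept for a one-regressor model.
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
intercept = ybar - slope * xbar  # equals ybar when xbar is zero

print(bob, alexis, slope, intercept, ybar)
```

Because the regressor has mean zero in the made-up data, the fitted intercept coincides with the sample mean of the dependent variable, which is the argument used for Beauty and Course_Eval.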
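Chapter 4: Exercises with Reference Solutions

4.1 Suppose that in the model $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$ the correlation coefficient between $X_2$ and $X_3$ is zero, and someone suggests that you run the following regressions:

$Y_i = \alpha_1 + \alpha_2 X_{2i} + u_{1i}$
$Y_i = \gamma_1 + \gamma_3 X_{3i} + u_{2i}$

(1) Is it true that $\hat\alpha_2 = \hat\beta_2$ and $\hat\gamma_3 = \hat\beta_3$? Why?
(2) Will $\hat\beta_1$ equal $\hat\alpha_1$, or $\hat\gamma_1$, or some linear combination of the two?
(3) Is it true that $var(\hat\beta_2) = var(\hat\alpha_2)$ and $var(\hat\beta_3) = var(\hat\gamma_3)$?

Reference solution to Exercise 4.1:

(1) Yes, $\hat\alpha_2 = \hat\beta_2$ and $\hat\gamma_3 = \hat\beta_3$. With lowercase letters denoting deviations from sample means,

$\hat\beta_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2) - (\sum y_i x_{3i})(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}$

When the correlation coefficient between $X_2$ and $X_3$ is zero, $\sum x_{2i} x_{3i} = 0$ in deviation form, so

$\hat\beta_2 = \dfrac{(\sum y_i x_{2i})(\sum x_{3i}^2)}{(\sum x_{2i}^2)(\sum x_{3i}^2)} = \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2} = \hat\alpha_2$

Similarly, $\hat\gamma_3 = \hat\beta_3$.

(2) $\hat\beta_1$ equals a linear combination of $\hat\alpha_1$ and $\hat\gamma_1$.
Because $\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3$, with $\hat\alpha_1 = \bar{Y} - \hat\alpha_2 \bar{X}_2$ and $\hat\gamma_1 = \bar{Y} - \hat\gamma_3 \bar{X}_3$, and because $\hat\alpha_2 = \hat\beta_2$ and $\hat\gamma_3 = \hat\beta_3$, we have

$\bar{Y} - \hat\alpha_1 = \hat\alpha_2 \bar{X}_2 = \hat\beta_2 \bar{X}_2$
$\bar{Y} - \hat\gamma_1 = \hat\gamma_3 \bar{X}_3 = \hat\beta_3 \bar{X}_3$

so that

$\hat\beta_1 = \bar{Y} - \hat\beta_2 \bar{X}_2 - \hat\beta_3 \bar{X}_3 = \bar{Y} - (\bar{Y} - \hat\alpha_1) - (\bar{Y} - \hat\gamma_1) = \hat\alpha_1 + \hat\gamma_1 - \bar{Y}$

(3) Yes, $var(\hat\beta_2) = var(\hat\alpha_2)$ and $var(\hat\beta_3) = var(\hat\gamma_3)$.
Because

$var(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 (1 - r_{23}^2)}$

when $r_{23} = 0$,

$var(\hat\beta_2) = \dfrac{\sigma^2}{\sum x_{2i}^2} = var(\hat\alpha_2)$

Similarly, $var(\hat\beta_3) = var(\hat\gamma_3)$.

4.2 When choosing the "optimal" set of explanatory variables for a regression model, stepwise regression is often used.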
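The results of Exercise 4.1 can be checked on a small made-up data set (the arrays below are hypothetical and chosen so that $X_2$ and $X_3$ have exactly zero sample correlation). The sketch fits the multiple regression and the two simple regressions by least squares and compares the coefficients.

```python
import numpy as np

# Hypothetical data: deviations of x2 and x3 from their means are orthogonal.
x2 = np.array([0.0, 0.0, 2.0, 2.0])
x3 = np.array([1.0, 3.0, 1.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 7.0])
ones = np.ones_like(y)

def ols(X, y):
    """Least squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b1, b2, b3 = ols(np.column_stack([ones, x2, x3]), y)  # multiple regression
a1, a2 = ols(np.column_stack([ones, x2]), y)          # Y on X2 only
g1, g3 = ols(np.column_stack([ones, x3]), y)          # Y on X3 only

print(b1, b2, b3, a1, a2, g1, g3)
```

With these numbers the slopes from the simple regressions match the multiple-regression slopes, and the multiple-regression intercept equals $\hat\alpha_1 + \hat\gamma_1 - \bar{Y}$, as derived in parts (1) and (2).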
Southwestern University of Finance and Economics
Econometrics: Chapter 4 Homework
天姝
41112011
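Class: 2011 cohort, Financial Services and Management experimental class
Problem 1
The table gives data for the United States for 1971-1986, where Y: number of new passenger cars sold (thousands); X2: new car price index (1967 = 100); X3: consumer price index (1967 = 100); X4: personal disposable income (PDI, billions of dollars); X5: interest rate; X6: urban employed labor force (thousands).
Consider the following demand function for passenger cars:
$\ln Y_t = \beta_1 + \beta_2 \ln X_{2t} + \beta_3 \ln X_{3t} + \beta_4 \ln X_{4t} + \beta_5 \ln X_{5t} + \beta_6 \ln X_{6t} + u_t$
(1) Estimate the sample regression equation by OLS.
(2) If the model exhibits multicollinearity, estimate the auxiliary regressions and identify which variables are highly collinear.
(3) If the collinearity is severe, which variable would you drop, and why?
(4) After dropping one or more explanatory variables, what is the final demand function for passenger cars? In what respects is this model better than the original model with all the explanatory variables?
(5) Which other variables do you think could better explain the demand for cars in the United States?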
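The auxiliary regressions mentioned in part (2) regress each explanatory variable on all the others; a high $R_j^2$, or equivalently a large variance inflation factor $VIF_j = 1/(1 - R_j^2)$, flags the variables involved in the collinearity. The sketch below uses simulated regressors rather than the 1971-1986 car data, which is not reproduced in the text, and `vif` is a hypothetical helper written here for illustration.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no constant column).
    Each column is regressed on a constant and the remaining columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        dev = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (dev @ dev)   # auxiliary R-squared
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated regressors: b is nearly collinear with a, c is unrelated.
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = a + 0.05 * rng.normal(size=50)
c = rng.normal(size=50)
vifs = vif(np.column_stack([a, b, c]))

print(vifs)
```

A common rule of thumb treats a VIF above 10 as a sign of serious collinearity; in the simulated example the first two columns exceed that threshold by a wide margin while the third stays close to 1.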
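(1)
New regression equation:
$\ln Y_t = 7.142989 + 1.942652 \ln X_{2t} - 4.464915 \ln X_{3t} + 2.397534 \ln X_{4t} - 0.142488 \ln X_{6t}$
The coefficient estimates do not change much; the adjusted coefficient of multiple determination rises, the F statistic also rises, and the significance of the t statistics of the individual variables improves.
(5) Factors such as gasoline prices and the prices of foreign cars also affect the demand for cars in the United States.
(3) Choose an appropriate method to eliminate the multicollinearity and build a better regression model.
(1)
Estimated regression equation:
Y = 4968.295 + 0.557410X2 + 5.256011X3 - 0.720540X4 - 0.206228X5 - 0.446377X6 - 9.148800X7 - 0.618037X8.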