Statistics (Complete Edition)
Statistics Formula Reference
This document collects formulas commonly used in statistics, for reference in research and practice.
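Descriptive Statistics Formulas
Measures of central tendency
1. Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
2. Median: if the number of observations is odd, the median is the middle value after sorting; if it is even, the median is the average of the two middle values after sorting.
3. Mode: the value that occurs most frequently.
Measures of dispersion
1. Variance: $Var(x) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$
2. Standard deviation: $SD(x) = \sqrt{Var(x)}$
3. Range: $Range(x) = \max(x) - \min(x)$
Measures of distribution shape
1. Skewness: $\text{Skewness} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^3}{n \cdot SD(x)^3}$
2. Kurtosis: $\text{Kurtosis} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^4}{n \cdot SD(x)^4}$
Inferential Statistics Formulas
Parameter estimation
1. Standard error of the mean (standard deviation of the sampling distribution of the sample mean): $SE(\bar{x}) = \frac{SD(x)}{\sqrt{n}}$. When the population standard deviation is unknown, $SD(x)$ here is normally the sample standard deviation computed with divisor $n-1$.
2. Two-sided confidence interval for the mean: $\bar{x} \pm z_{\alpha/2} \cdot SE(\bar{x})$
3. Standard error of a proportion, where $\hat{p}$ is the sample proportion: $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
4. Two-sided confidence interval for a proportion: $\hat{p} \pm z_{\alpha/2} \cdot SE(\hat{p})$
Hypothesis testing
1. Difference between the sample mean and the population mean (t test): $t = \frac{\bar{x} - \mu}{SE(\bar{x})}$
2. Two-sided rejection-region critical values (t distribution): $t_{\text{critical}} = \pm t_{\alpha/2, df}$, with $df = n - 1$ for the one-sample test.
3. Difference between the sample proportion and the hypothesized population proportion $p_0$ (z test): $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
4. Two-sided rejection-region critical values (z distribution): $z_{\text{critical}} = \pm z_{\alpha/2}$
Regression Analysis Formulas
Simple linear regression model
1. Regression equation: $y = \beta_0 + \beta_1 x + \epsilon$
2. Linear prediction formula: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
3. Spearman's rank correlation coefficient, where $d_i$ is the difference between the two ranks of observation $i$: $r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
4. Significance test for the correlation coefficient (t test): $t = \frac{r}{\sqrt{\frac{1 - r^2}{n-2}}}$
Conclusion
This document lists formulas commonly used in statistics: measures of central tendency, dispersion and distribution shape in descriptive statistics; parameter estimation and hypothesis testing in inferential statistics; and the simple linear regression model in regression analysis.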
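The descriptive formulas and the z-based confidence interval above can be tried out with a short Python sketch. It is only an illustration: the function names `describe` and `mean_confidence_interval` and the sample values are made up for this example, and the divisor-$n$ definitions are used exactly as the formula list states them.

```python
import math
from statistics import NormalDist


def describe(xs):
    """Descriptive measures using the divisor-n definitions from the formula list."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    return {
        "n": n,
        "mean": mean,
        "var": var,
        "sd": sd,
        "range": max(xs) - min(xs),
        "skewness": sum((x - mean) ** 3 for x in xs) / (n * sd ** 3),
        "kurtosis": sum((x - mean) ** 4 for x in xs) / (n * sd ** 4),
    }


def mean_confidence_interval(xs, level=0.95):
    """Two-sided z interval: mean +/- z * SD / sqrt(n)."""
    d = describe(xs)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    se = d["sd"] / math.sqrt(d["n"])               # standard error of the mean
    return d["mean"] - z * se, d["mean"] + z * se


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(round(low, 3), round(high, 3))
```

For small samples with an unknown population standard deviation, a t critical value and the divisor $n-1$ standard deviation would normally replace the z value used here.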
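Statistics Short-Answer Questions (Complete Edition)
I. What are the meaning and the essence of statistics? (p. 2)
Meaning: the word "statistics" can have three meanings: statistical activity, statistical data, and the discipline of statistics.
Statistical activity is the activity of collecting and organizing statistical data and drawing the corresponding inferences and analyses; it is usually divided into three stages: statistical survey, statistical organization, and statistical analysis. Statistical data are data of various forms, obtained through statistical activity, that express the characteristics of the phenomenon under study. The discipline of statistics is the theory and method that guide statistical activity; it is the science of how to collect, organize, and analyse statistical data.
Essence: the essence of statistics is the thinking about why to collect statistics, what to measure, and how to measure it.
II. How are statistical data classified? What distinguishes the different types of data? (p. 7)
1. By the scale of measurement used, statistical data are divided into qualitative data and quantitative data.
Qualitative data can only express the quality or attribute of things with words or numeric codes; they are further divided into nominal data and ordinal data.
Quantitative data express the quantitative characteristics of things with numerical values; they are further divided into interval data and ratio data.
2. By their form of expression, statistical data are divided into absolute numbers, relative numbers, and averages.
Absolute numbers reflect the absolute quantity of a phenomenon or thing and have a definite unit of measurement.
Relative numbers reflect the relative quantity of a phenomenon or thing; they express a relationship by comparing two other related statistical figures.
Averages reflect the average quantity of a phenomenon or thing and express the general level of one aspect of the phenomenon.
3. By their source, statistical data are divided into observational data and experimental data.
Observational data are obtained through statistical surveys or observation and reflect quantitative characteristics of the phenomenon as it objectively exists.
Experimental data are obtained about the experimental subjects through experiments under artificially controlled conditions.
4. By their degree of processing, statistical data are divided into raw data and secondary data.
Raw data are collected directly from the survey subjects, have yet to be processed, and reflect only individual characteristics.
Secondary data, also called processed data or second-hand data, are non-original data of any kind that have already been processed and can reflect the quantitative characteristics of the population.
5. By their state in time or space, statistical data are divided into time-series data and cross-sectional data.
Time-series data are data collected on the same phenomenon at different times (the spatial state is the same, the time state differs).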
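Statistics Glossary (Complete Edition)
Chapter 1 Introduction
In statistics, a random variable is a variable whose values cannot be predicted in advance.
A population, also called the universe, is the totality of a class of things sharing some characteristic.
Each basic unit that makes up the population is called an individual.
The part of the population that is drawn from it is called a sample.
Frequency (count) is the number of times an event occurs in a given category.
Relative frequency is the number of times an event occurs divided by the total number of events, that is, the number of times a value appears divided by the total number of values in the data set.
Probability is the proportion with which a thing or event occurs in a population.
Once a particular value has been determined, it is called an observed value of the variable.
A parameter, also called a population parameter, is a statistical index that describes a population.
The characteristic values of a sample are called statistics.
Chapter 2 Statistical Tables and Charts
A statistical table is a tabular form drawn with intersecting vertical and horizontal lines, in which data are organized, classified, arranged, and entered according to set requirements.
It generally consists of a table number, a title, headings, figures, and table notes.
A statistical chart generally uses a rectangular coordinate system. The horizontal axis usually represents the groups or the independent variable x and is called the category axis.
The vertical axis represents the frequency of occurrence or the dependent variable and is called the value axis.
A chart generally consists of a chart number and title, axis labels, scales, the graphic itself, a legend, and chart notes.
A simple frequency distribution table suits data sets that are small in size and range. It is compiled from the number of times each score occurs in a column of data, or from count data.
A grouped frequency distribution table suits data sets that are large in size and range.
When the amount of data is large, all the data are first divided into several intervals; each value is then assigned to the group for its interval according to its size, the number of values in each group is counted, and the result is presented as a table.
The steps for compiling a grouped frequency distribution table are: find the range, decide the class width and number of classes, list the class intervals, tally the data, and count the frequencies.
A relative frequency distribution table expresses frequencies as proportions or percentages, while a cumulative frequency distribution table adds up the group frequencies from the bottom up or from the top down.
The cumulative frequency of the last group equals the total frequency.
A two-column (bivariate) frequency distribution table uses a single table to show the frequency distributions of two related variables.
An unequal-interval frequency distribution table suits data with unequal intervals, such as wage grades and age groups.
Note that the grouping effect is one drawback of grouped frequency distribution tables: because the raw data are no longer visible, a mean computed from such a table will differ from the mean computed from the raw data, which introduces error.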
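The compilation steps and the grouping effect described above can be illustrated with a small Python sketch. The function name `grouped_table`, the starting point, the class width, and the ten data values are all assumptions chosen for the example; the grouped mean uses class midpoints, which is one common convention.

```python
def grouped_table(data, start, width):
    """Tally data into classes [start, start+width), [start+width, ...), ...

    Returns rows of (lower limit, upper limit, frequency, cumulative frequency).
    """
    if start > min(data):
        raise ValueError("start must not exceed the smallest value")
    n_classes = int((max(data) - start) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1  # tally each value into its class
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        cumulative += count
        lower = start + i * width
        rows.append((lower, lower + width, count, cumulative))
    return rows


scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 78]  # made-up raw data
rows = grouped_table(scores, start=50, width=10)

raw_mean = sum(scores) / len(scores)
# Grouped mean: every value in a class is replaced by the class midpoint.
grouped_mean = sum(f * (lo + hi) / 2 for lo, hi, f, _ in rows) / len(scores)

for row in rows:
    print(row)
print(raw_mean, grouped_mean)
```

With these values the raw mean and the grouped mean differ by about one point, which is the kind of discrepancy the grouping effect refers to.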
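Statistics Exercises: Reference Answers
Chapter 1 Introduction
(1) Numerical variable.
(2) Categorical variable.
(3) Discrete variable.
(4) Ordinal variable.
(5) Categorical variable.
(1) The population is the set of all employee households in the city; the sample is the set of the 2,000 employee households drawn.
(2) The parameter is the annual per-capita income of all employee households in the city; the statistic is the annual per-capita income of the 2,000 employee households drawn.
(1) The population is the set of all IT practitioners.
(2) Numerical variable.
(3) Categorical variable.
(4) Cross-sectional data.
(1) The population is the set of all consumers who shop online.
(2) Categorical variable.
(3) The parameter is the average monthly spending of all online shoppers.
(4) Parameter.
(5) Inferential statistical methods.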
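Chapter 2 Data Collection
1. What is second-hand data? What should be kept in mind when using it?
Original information related to the research topic that already exists, was obtained by others through surveys and experiments, and is used by us is called "second-hand data".
When using second-hand data, pay attention to who originally collected the material, the purpose of the collection, the way it was collected, and when it was collected. Also pay attention to the definition, meaning, scope of calculation, and calculation method of the data, so as to avoid wrong use, misuse, and abuse.
When citing second-hand data, state the data source.
2. Compare the characteristics of probability sampling and non-probability sampling, and give examples of when each is appropriate.
Probability sampling draws the sample on the random principle, with known probabilities.
The probability that each unit is selected is known or can be calculated. When the sample is used to estimate a target quantity of the population, the selection probability of each sampled unit must be taken into account. The technical demands and the cost of probability sampling are both relatively high.
If the purpose of the survey is to understand and study the quantitative characteristics of the population and to obtain confidence intervals for population parameters, use probability sampling.
Non-probability sampling does not draw the sample on the random principle; instead, according to what the research purpose requires of the data, some units are drawn from the population in some chosen way and surveyed.
Non-probability sampling is simple to carry out, quick, and cheap, and does not demand much specialist sampling skill.
It suits exploratory research, where the survey results are used to discover problems and to prepare for deeper quantitative analysis.
Non-probability sampling also suits concept testing in market research.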
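As a concrete illustration of probability sampling, where every unit's chance of selection is known, the following Python sketch draws a simple random sample, a systematic sample, and a stratified sample. The function names, the population of 1,000 numbered units, the two strata, and the sample sizes are all hypothetical and chosen only for the example.

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance n/N of being selected (without replacement)."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Random start within the first interval, then every k-th unit."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting position in [0, k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, sizes, rng):
    """Independent simple random samples inside each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


rng = random.Random(0)                # fixed seed so the sketch is repeatable
population = list(range(1, 1001))     # hypothetical unit IDs 1..1000

srs = simple_random_sample(population, 50, rng)
sys_sample = systematic_sample(population, 50, rng)
strata = {"stratum_a": population[:600], "stratum_b": population[600:]}
strat = stratified_sample(strata, {"stratum_a": 60, "stratum_b": 40}, rng)

print(sorted(srs)[:5], sys_sample[:5], len(strat["stratum_a"]))
```

A non-probability sample, by contrast, would pick units by convenience or judgement, so no selection probabilities could be attached to them.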
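3. The main methods of collecting data in surveys are self-administered questionnaires, face-to-face interviews, and telephone interviews. What other methods of collecting data are there? Experiments, observation, and so on.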
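Chapter 1
1. Which of the following variables is a categorical variable? (D)
A. Age  B. Wages  C. Automobile output  D. Payment method when purchasing goods (cash, credit card, cheque)
3. Which of the following variables is a numerical variable? (A)
A. Living expenses  B. Product grade  C. Type of enterprise  D. Employees' attitude toward a reform measure of the company
4. A research department plans to draw 2,000 households from the 2 million households in a city in order to infer the annual per-capita income of all employee households in the city. The population of this study is (B)
A. the 2,000 households  B. the 2 million households  C. the per-capita income of the 2,000 households  D. the per-capita income of the 2 million households
5. A research department plans to draw 2,000 households from the 2 million households in a city in order to infer the annual per-capita income of all employee households in the city. The sample of this study is (A)
A. the 2,000 households  B. the 2 million households  C. the per-capita income of the 2,000 households  D. the per-capita income of the 2 million households
6. Which of the following is not a descriptive statistics problem? (A)
A. Making inferences about the population from sample information  B. Understanding the characteristics of the data distribution  C. Analysing the population characteristics of interest  D. Summarizing and analysing data with charts and tables
7. Which of the following statements uses inferential statistical methods? (B)
A. Using a chart to describe the educational background of a company's employees  B. Picking 36 oranges from an orchard and using their average weight to estimate the average weight of the oranges in the orchard  C. The average petrol price in a city in January  D. Randomly selecting 100 university students and calculating their average monthly living expenses
8. A recently published report states that "a sample of 150 cars shows that imported cars are priced significantly higher than domestically produced cars". This conclusion is (D)
A. a description of the sample  B. an inference about the sample  C. a description of the population  D. an inference about the population
9. To estimate the average height of high-school students nationwide, 100 secondary schools were selected from 20 cities for a survey. In this study, the sample is (D)
A. the 100 secondary schools  B. the 20 cities  C. the high-school students of the whole country  D. the high-school students of the 100 secondary schools
10. Non-numeric data that can only be assigned to one of a set of ordered categories are called (B)
A. categorical data  B. ordinal data  C. numerical data  D. numerical variables
Chapter 2
1. Drawing n elements as a sample from a population of N elements so that every element of the population has the same chance (probability) of being selected is called (A)
A. simple random sampling  B. stratified sampling  C. systematic sampling  D. cluster sampling
2. Drawing one element from the population, returning it to the population, then drawing the second element, and so on until n elements have been drawn, is called (A)
A. sampling with replacement  B. sampling without replacement  C. stratified sampling  D. cluster sampling
4. Arranging the elements of the population in some order, choosing a random starting point by some rule, and then drawing one element at every fixed interval until n elements form a sample is called (C)
A. simple random sampling  B. stratified sampling  C. systematic sampling  D. cluster sampling
5. Dividing the population into several clusters, drawing some clusters using the cluster as the sampling unit, and then observing all the elements contained in each selected cluster is called (D)
A. simple random sampling  B. stratified sampling  C. systematic sampling  D. cluster sampling
6. To survey the book-purchase spending of a school's students, 60 students are drawn from the male students and 40 from the female students. This survey method is (D)
A. simple random sampling  B. cluster sampling  C. systematic sampling  D. stratified sampling
7. To survey the book-purchase spending of a school's students, the students of 4 classes are drawn from the whole school. This survey method is (D)
A. simple random sampling  B. systematic sampling  C. stratified sampling  D. cluster sampling
8. To survey the book-purchase spending of a school's students, the list of all students is arranged in pinyin order and one student is drawn every 50 students. This survey method is (C)
A. simple random sampling  B. cluster sampling  C. systematic sampling  D. stratified sampling
Chapter 3
1. When all the categories or groups of the data are listed, the number of observations falling in a particular category or group is called the (A)
A. frequency  B. relative frequency  C. frequency distribution table  D. cumulative frequency
2. The quotient between the values of different categories in a sample is called the (D)
A. frequency  B. relative frequency  C. proportion  D. ratio
3. Which of the following charts is best suited to describing structural (composition) problems? (B)
A. Bar chart  B. Pie chart  C. Radar chart  D. Histogram
4. Which of the following charts is suited to comparing the structure of two or more samples or populations? (A)
A. Doughnut chart  B. Pie chart  C. Histogram  D. Stem-and-leaf plot
5. To compare the similarity of several samples, the appropriate chart is the (C)
A. doughnut chart  B. stem-and-leaf plot  C. radar chart  D. box plot
7. The chart drawn from five characteristic values of a data set (the maximum, the minimum, the median, and the two quartiles) to reflect the distribution of the raw data is called a (D)
A. bar chart  B. stem-and-leaf plot  C. histogram  D. box plot
Chapter 4
1. If the standard score of an observation is -2, the observation is (B)
A. 2 standard deviations above the mean  B. 2 standard deviations below the mean  C. equal to 2 times the mean  D. equal to 2 times the standard deviation
2. The empirical rule states that when a data set is symmetrically distributed, approximately (B) of the data lie within 2 standard deviations of the mean.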
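The two Chapter 4 questions rest on the standard score $z = (x - \bar{x})/s$ and on the empirical rule. A minimal Python sketch follows; the function name `z_score` and the example numbers are made up, and the share within two standard deviations is computed for an exactly normal distribution, which is the idealized case the empirical rule approximates.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical example: a score of 60 when the mean is 70 and the SD is 5.
z = z_score(60, 70, 5)

# Share of a normal distribution lying within 2 standard deviations of the mean.
std_normal = NormalDist()
share_within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

print(z, round(share_within_2sd, 4))
```

A standard score of -2 means the value is two standard deviations below the mean, and for bell-shaped data roughly 95% of values fall within two standard deviations of the mean.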
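Principles of Statistics
Chapter 1 Foundations
Section 1 The definition of statistics
Statistics is a method of obtaining information from data.
Section 2 Main statistical concepts
I. Population
The population is the totality of the objects that the statistician studies.
A descriptive measure of the population is called a parameter, such as the mean, the maximum, and the minimum.
II. Sample
A sample is a set of data drawn from the population.
A descriptive measure of the sample is a statistic.
III. Statistical inference
Statistical inference is the process of using sample data to estimate, predict, and make decisions about the population.
There are two measures of reliability: the confidence level and the significance level.
Three examples: corporate diversification strategy (the difference in performance between diversified and non-diversified firms);
ordinary students and student leaders (differences in employment and income);
male and female students (differences in grades).
Section 3 Types of data
I. Interval data
Interval data are real numbers, such as height, distance, and income.
II. Qualitative data
The values of qualitative data are categories, such as male and female.
III. Ordinal data
Ordinal data are also qualitative in form, but their values are ordered.
For example: poor, fair, good, very good, excellent.
The difference between qualitative data and ordinal data is that the values of the latter are ordered.
Section 4 Methods of describing data
I. Graphical and tabular methods
Computer commands
1. Enter or import the data into a column.
2. Select the data column.
3. Click Chart Wizard, Line, and Finish.
4. To make changes, right-click the chart and choose Chart Options.
II. Numerical methods
1. Measures of central location
(1) Arithmetic mean. Sum: SUM. Average: AVERAGE.
(2) Median: the median is computed by arranging the observations in order.
The observation in the middle position is the median.
Median: MEDIAN. With n observations, if n is odd the median is the middle value; if n is even it is the mean of the two middle values.
Mode: MODE.
Note: when there is more than one mode, Excel shows only the smallest one and does not indicate whether other modes exist.
Maximum: MAX; minimum: MIN; square root: SQRT.
Data analysis: the Analysis ToolPak is a set of statistical functions supplied with Excel and can be reached from the menu bar.
Click Tools and find "Data Analysis"; if "Data Analysis" is not there, click "Add-Ins" and select the Analysis ToolPak.
If it still cannot be loaded, find a computer that has Data Analysis installed, go to the Excel installation directory (usually C:\Program Files\Microsoft Office), open the OFFICE10 folder, copy the Library folder into the folder of the same name on your own computer, and then repeat the add-in steps above.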
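For readers who do not have Excel at hand, the measures of central location listed above have close counterparts in Python's standard `statistics` module. This is only a rough equivalent under that assumption, not a description of how Excel computes them, and the data values are made up. `multimode` is used because, as the note on the mode points out, a function that reports a single mode hides any ties.

```python
from statistics import mean, median, multimode

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # made-up observations

total = sum(data)            # counterpart of SUM
average = mean(data)         # counterpart of AVERAGE
middle = median(data)        # counterpart of MEDIAN (even n: mean of the two middle values)
modes = multimode(data)      # every value tied for the highest frequency
largest, smallest = max(data), min(data)  # counterparts of MAX and MIN

print(total, average, middle, modes, largest, smallest)
```

Here three values (3, 1 and 5) each occur twice, so reporting only one of them as "the mode" would lose information.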
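Statistics Test Bank
Chapter 1: Basic Theory and Basic Concepts of Statistics
I. Fill in the blanks
1. Statistics is the unity of statistical work, the discipline of statistics, and statistical data. Statistical data are the results of statistical work; the discipline of statistics is the summary of experience and the theoretical generalization of statistical work.
2. The specific methods of statistical research are mainly the method of mass observation, the method of statistical grouping, the method of statistical inference, and the method of composite indicators.
3. Statistical work can be divided into four stages: design, survey, organization, and analysis.
4. As the research purpose changes, the population and the individual can be converted into each other.
5. A mark is the name of a characteristic of an individual; an indicator is a concept, together with its value, that describes a quantitative characteristic of the population.
6. Variable quantitative marks and all statistical indicators are called variables; the specific values of a variable are called variable values.
7. By whether their values change continuously, variables are divided into continuous variables and discrete variables; the number of employees and the number of enterprises are discrete variables. By the factors that affect them, variables are divided into deterministic variables and random variables.
8. Socio-economic statistics is characterized by being quantitative, aggregate, social, and concrete.
9. A complete statistical indicator should include two basic parts: the indicator name and the indicator value.
10. By whether they can be expressed numerically, statistical marks are divided into qualitative marks and quantitative marks; by whether their specific values are the same across units, they are divided into variable marks and constant marks.
11. The name of a characteristic of an individual is called a mark; the name of a characteristic of the population is called an indicator.
12. Quantity indicators are expressed as absolute numbers; quality indicators are expressed as relative numbers or averages.
13. In statistics, variable quantitative marks and statistical indicators are collectively called variables.
14. When a change in the purpose and task of the statistical research turns the original population into a population unit, the original indicator correspondingly becomes a mark; the two change in the same direction.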
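II. True or false
1. The research objects of the discipline of statistics and of statistical work are exactly the same. (×)
2. To apply the method of mass observation, all or sufficiently many units of the object under study must be observed and surveyed. (√)
3. The discipline of statistics is the summary of experience and the theoretical generalization of statistical practice. (√)
4. Generally speaking, an indicator always attaches to the population, while the population unit is the direct bearer of marks. (√)
5. Quantity indicators are aggregated from quantitative marks, and quality indicators are aggregated from qualitative marks. (×)
6. A student scored 80 points in the computer examination; this is a statistical indicator value. (×)
7. Statistical data are the various data obtained in statistical surveys. (×)
8. Indicators are always expressed numerically, whereas marks cannot be expressed numerically.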
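Statistics General Review Outline (Complete Edition)
Chapter 1: Introduction
1.
1) The meaning of statistics: the word "statistics" has three meanings, namely statistical work, statistical data, and statistical science, of which statistical work is the most basic.
Without statistical work there would be no statistical data, and without rich experience of statistical practice there would be no statistical science.
2) The research object of statistics: the research object of the discipline of statistics is the regularities of statistical work, that is, the methods of collecting, organizing, and analysing statistical data; it is a methodological science.
3) The characteristics of statistics: quantitative, concrete, aggregate.
2. Basic concepts of statistics
1) Population: a population is a whole formed by many individual things joined together on the basis of some common property.
A population has three characteristics: homogeneity, largeness, and variability. Populations can be divided into finite populations and infinite populations.
2) Population unit: the individual things that make up the population are called population units.
The population and the population units are determined by the purpose of the statistical research.
3) Mark: a mark is the name of a characteristic of a population unit.
Marks can be divided into quantitative marks (answered with numbers) and qualitative marks (answered with words).
Marks can also be divided into constant marks and variable marks.
Constant mark: a characteristic shared by all population units.
It is the necessary condition for forming the population and the criterion that fixes the scope of the population.
Variable mark: a mark that necessarily differs among the units of the population.
4) Variable: variable marks include both qualitative marks and quantitative marks.
A variable quantitative mark is called a variable.
The specific values of a variable are called variable values.
A variable whose values can only be whole numbers is called a discrete variable.
A variable whose values can be divided without limit is called a continuous variable.
5) Indicators and indicator systems. Indicator: a concept that describes a quantitative characteristic of the population.
Indicator system: a series of statistical indicators linked to one another by a common research purpose.
6) Differences and connections between indicators and marks. There are two differences. First, an indicator describes a characteristic of the population, whereas a mark describes a characteristic of a population unit.
Second, indicators reflect only quantitative characteristics of the population, and every indicator must be answered with a number; marks include both those that reflect quantitative characteristics of the population unit (answered with numbers) and those that reflect qualitative characteristics of the population unit (answered with words).
The connections are mainly these: the values of many indicators are obtained by aggregating the mark values of the quantitative marks of the population units.
Although qualitative marks have no numerical value themselves, some indicators are calculated after grouping by a qualitative mark.
Because the population and the population unit can change places with the purpose of the statistical research, indicators and quantitative marks can be converted into each other under certain conditions.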
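Statistics Glossary
Definitions of terms
● Statistical work: the work of investigating socio-economic phenomena from the quantitative side; it is the process by which people collect, organize, analyse, and supply statistical data in order to understand objective things.
● Statistical data: the results of statistical work; the various aggregate numerical materials and analytical reports, obtained in statistical practice, that reflect the relevant characteristics of the object of statistical research.
● Statistics (the discipline): the systematic science that sets out statistical theory and methods; it is the theoretical generalization and scientific summary of the practice of statistical work, and the science of the theory and methods of studying, organizing, and analysing statistical data.
● Population: a whole, existing objectively, of many individual things joined together on the basis of some common property.
● Population unit: the individual things that make up the population.
● Sample: a set of some individuals drawn from the population and used to represent that population.
● Mark: the name of an attribute or characteristic of a population unit.
● Statistical indicator: describes a quantitative characteristic of the population; called an indicator for short.
It can be understood in two ways. The first is the concept that reflects a quantitative characteristic of the population of a phenomenon.
The second is that concept together with its numerical expression.
● Census: a one-off complete survey organized specially.
It is mainly used to collect statistical data that are fairly comprehensive but cannot or should not be obtained from regular surveys.
● Key-point survey: a non-complete survey in which a number of key units are selected from the units to be surveyed and investigated.
● Sample survey: also a non-complete survey, in which a certain number of units (the sample) are drawn from the population under study on the random principle and investigated, and the values of the population indicators are estimated from the values of the sample indicators.
● Typical-case survey: a very important and effective non-complete survey method.
A number of representative units (typical units) are deliberately selected from the population under study and investigated in order to understand the population in detail.
● Statistical survey: the process of collecting survey data from the survey units in a planned and organized way, using scientific methods, according to the tasks of the statistical work and the requirements of the statistical design.
● Statistical grouping: a statistical method that divides the statistical population into several component parts by certain marks, according to the needs of the statistical research.
● Distribution series: also called a frequency series; a statistical series, formed on the basis of statistical grouping, that reflects how the population units are distributed among the groups.
● Total-quantity indicator: a statistical indicator that reflects the overall scale and level of a socio-economic phenomenon.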
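Chapter 1 Introduction
I. Fill in the blanks
1. Understood from different angles, the word "statistics" has three meanings: statistical work, statistical data, and the discipline of statistics.
2. The research object of socio-economic statistics is the quantitative side of socio-economic phenomena.
3. The characteristics of a statistical population are largeness, homogeneity, and variability.
4. Marks describe the characteristics of population units and can be divided into qualitative marks and quantitative marks.
5. Statistical indicators describe the characteristics of the population. They have 6 constituent elements: indicator name, indicator value, unit of measurement, calculation method, time range, and spatial range.
6. An employee's educational level is a ________ mark; length of service is a quantitative mark.
7. The number of machines and the number of employees of an enterprise are discrete variables, while the original value of fixed assets and sales revenue are continuous variables.
8. To understand the production situation of China's dairy enterprises, the population is ________ and the population unit is each dairy enterprise.
9. To understand the equipment situation of China's dairy enterprises, the population is all dairy enterprises and the population unit is each dairy enterprise.
10. A student's sex and ethnicity are qualitative marks, while a student's height and weight are quantitative marks.
11. The full statement of the concept of a statistical indicator is: "a concept and specific value that describe a quantitative characteristic of the population of a socio-economic phenomenon".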
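12. By their nature, statistical indicators can be divided into quantity indicators and quality indicators.
II. True or false
1. As the research purpose changes, the population and the population unit can be interchanged, and so can indicators and marks. ( )
2. Zhang Ming's final mathematics score is 85 points; this is a statistical indicator. (F)
3. The characteristics of a population unit are described by indicators, and the characteristics of the population are described by marks. (F)
4. Marks can be expressed in words and can also be expressed in numbers. (T)
5. Indicators can be expressed in words and can also be expressed in numbers. (F)
6. Indicator values are obtained by aggregating and calculating mark values. (T)
7. In the national population census, "age" is a variable. (T)
8. In a survey of the study situation of the students of a class, the class name and the students' names are both variable marks. (F)
9. Zhang Ming's final mathematics score is 85 points; "score" is a continuous variable and "85 points" is a variable value. (F)
10. The name, ethnicity, age, and type of work of an enterprise's employees are all qualitative marks. (F)
11. The research object of statistics is the quantitative side of the population of socio-economic phenomena.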
Statistics Report
GROUP MEMBERS:
Shao Xiayu (SCN 20085878339)
Yang Zheng (SCN 20085878040)
Liang Han (SCN 20085878018)
Liang Zhanning (SCN 20085878087)
Zhang Yajie (SCN 20085878335)
Zhou Weifeng (SCN 20085878271)
TO: Su Nan
Class: International Trade
Date: May 23, 2010

Letter of transmittal
This report was written for HSBC's Research Headquarters. It was completed on May 20, 2010, as a team effort.

Executive summary
The report is structured into descriptive analysis, inferential analysis, and decision-making analysis. The methods used in the case are probability distributions and a quantitative approach. Our analysis of the bank RBCC shows that bankers approve personal loans without considering gender. Bankers prefer to lend to customers whose family and income are more stable. People aged between 30 and 50 also borrow more from RBCC. The more children a family has, the more debt it has. At RBCC, 15.3% of customers go into default. RBCC collects defaulted debt by approaching, in the right way, the customers who fall behind on payments. When HSBC manages its subsidiaries, it should control the subsidiaries' managers and keep full control of their shares.

Contents
Executive summary
Introduction
List of table
Assumptions
Task 1. Theoretical Statistics (Questions one to six)
Task 2: Application Statistics (Questions one to three)
Task three: Business Decision Making (Questions 1 to 7)
Conclusion
Recommendation
Reference list
Distribution the assignment
Meeting Schedule

Introduction
This report consists of three parts: theoretical statistics, application statistics, and business decision making. The theoretical statistics part mainly explains basic statistical knowledge, such as primary information, secondary information, and methods of data collection; it has six questions. The application statistics part mainly deals with the bell-shaped (normal) distribution. The business decision making part mainly deals with defaulted debt. Our report follows.

List of table

Assumptions
● All given information is true and correct.
● The data are assumed to follow a binomial distribution.
● The confidence level is 95% throughout the report.
● The average service fee is 13.8 per month in each bank.
● Different lenders have different loan coefficients and are given different interest rates.

Task 1. Theoretical Statistics
Question one: Distinguish primary sources and secondary sources.
Primary information
Primary information is information that we collect from our own experience and sort out from files. It has stronger confidentiality. First-hand information refers to literature and cultural information. The holder of primary information is the first to come into contact with it, so it also has a high degree of confidentiality.
Primary information is empirical, and its advantage is that it is lively and readable.
In the case study, the conversation of Juli Beck, Andy Beck, and Dan Beck is primary information.
Secondary data
Secondary data are data that already exist somewhere, having been collected and edited for some other purpose. Secondary sources (secondary data) are information that others collected previously for another purpose, rather than information that the researchers collected themselves for their own research.
In the case study, the summary of HSBC's history, foundation, and growth is secondary data.

Question two: Describe and justify the survey methodology used for data collection related to the case.
There are many methods of data collection, such as observation, surveys, questionnaires, online searches, and face-to-face investigation. In the case, we find online searches and face-to-face investigation.
"The current management team, led by Julia, Daniel, and Andrew Beck (the president, CFO, and COO, respectively) has engaged your consulting team to analyze data from 1,000 recent loans and make recommendations." This is primary information.
The HSBC case material has already been compiled by somebody else, so it can be classed as secondary information.
From the case, we can see that group research and phone interviews were used.

Question three: If you were asked to design the questionnaire for collecting the data, what types of question would you use? Consider the characteristics of different types of question for data analysis.
In the case, open questions outnumber closed questions. Open questions allow many kinds of answer to one question. They open up people's thinking and let us understand everyone's ideas; we can then compare the answers and choose the best one. On the other hand, they may produce unusable answers, the data are difficult and time-consuming to tabulate, and when people answer the same question in different ways we may not be able to find common ideas.

Question four: Using the different variables for RBCC, present a set of data from the Excel file rbcc.xls with appropriate charts or diagrams (at least three charts or diagrams).
a. Pareto chart of marital status (HSBC)
Marital Status   Number   Percentage
Single           288      28.80%
Married          562      56.20%
Divorced         89       8.90%
Widow            61       6.10%
Total            1000     100%
Looking at the chart, we find that married people obtain credit from HSBC most easily: about 56% of borrowers are married. Single people take out loans more easily than divorced and widowed people.
This indicates that HSBC chooses to lend to people who have a complete family, with an emphasis on stability.
b. Pie chart of borrowers' number of children
Children   People   Percentage
0          414      42%
1          199      20%
2          158      16%
3          115      11%
4          60       6%
5          40       4%
6          15       1%
Looking at the chart, we find that borrowers with no children take out loans from HSBC most easily, while borrowers with one child or two children borrow less easily. The chart indicates that the more children a borrower has, the more difficult it is to obtain a loan. Children put pressure on family finances, so the bank does not like to lend to such people.
c. Bar chart of gender
Gender   Total
Female   496
Male     504
Looking at the bar chart, we find that gender has no impact on loans.

Question five: Using "income" as the key variable for RBCC, calculate the mean, median, mode, quartiles, and measures of dispersion of the data.
We calculate a mean of $49,481, a median of $49,518, and a mode of $65,311. From these we can learn about the borrowers' level of income and purchasing capacity. Because the mean is less than the median, the distribution is left-skewed.
The maximum is $89,489 and the minimum is $9,118, so the range is $80,371. This tells us that incomes are unequal, so the company faces risk.
The standard deviation of income is 18,586.34. Because the mean differs by marital status, we compare the single and divorced groups using the coefficient of variation: CV(single) = 0.19 and CV(divorced) = 0.08, so CV(divorced) < CV(single). We conclude that, since the income data have no z-score, the bank can make more profit. Analysing the data, we find a kurtosis of -1.02219063, which indicates that the income distribution is flatter than a normal distribution.
Taking x = divorced and y = widowed, with 48 observations each, we get R = 0.08, which is not close to 1, so the two are not related.

Question six: Using the appropriate set of data, create the histogram.
Bin      Frequency   Cumulative %
5000     0           0.00%
10000    2           0.20%
15000    13          1.50%
20000    33          4.80%
25000    60          10.80%
30000    69          17.70%
35000    88          26.50%
40000    83          34.80%
45000    87          43.50%
50000    73          50.80%
55000    78          58.60%
60000    81          66.70%
65000    81          74.80%
70000    87          83.50%
75000    73          90.80%
80000    50          95.80%
85000    36          99.40%
90000    6           100.00%
Other    0           100.00%
From the chart, we find that people with an income of about $45,000 take out loans from HSBC more easily. People with incomes between $35,000 and $75,000 are the main borrowers from HSBC; people with other incomes borrow less easily.
The bank can choose to lend to people with incomes between $35,000 and $75,000, because these loans are easy to recover.

Task 2: Application Statistics
Question one
Because the economy has been in decline since 2008, commercial banks have been declining for a period. In an attempt to raise revenue, many commercial banks have increased their service charge, typically to between $10 and $20. According to the commission of banks, about 90% of commercial banks charge customers fees when they apply for a loan or open an account.
a. You select a random sample of 5 commercial banks in China. Assume that the number of the 5 commercial banks charging service fees follows a binomial distribution. What are the mean and standard deviation of the distribution?
b. Compute the probability that, of the 5 banks:
1. exactly one bank charges a service fee
2. two or fewer banks charge a service fee
3. three or more banks charge a service fee
Answer:
a. From the question, the sample size is $n = 5$ and the probability of success is $p = 0.9$. The mean is $np$ and the standard deviation is $\sqrt{np(1-p)}$. So the mean is 4.5, the variance is 0.45, and the standard deviation is about 0.67.
b. The results are:
Binomial probabilities table
x   p(x)
0   0.00001
1   0.00045
2   0.0081
3   0.0729
4   0.32805
5   0.59049
The probability that exactly one bank charges the service fee is 0.00045. A bank in that position has increased its customers' costs, which may reduce its competitiveness and lose it its competitive advantage.
The probability that two or fewer banks charge a service fee is $P(X \le 2) = 0.00001 + 0.00045 + 0.0081 = 0.00856$. Because this probability is very small, banks that want to raise their service fees need to offer customers better service or other advantages that set them apart from other banks.
The probability that three or more banks charge a service fee is $P(X \ge 3) = 0.0729 + 0.32805 + 0.59049 = 0.99144$. When most banks charge, many consumers accept the fact and pay the service fee, and the government can agree to the terms.
Each bank either charges a service fee or does not; these two outcomes are mutually exclusive and their probabilities sum to 1. In addition, the binomial distribution involves multiple trials. So the setting meets all the conditions of Bernoulli experiments.

Question two
HSBC's shares are in the Hang Seng Index. The number of shares traded daily on the Hong Kong Stock Exchange is referred to as the volume of trading. On July 30, 2009, 1.39 billion shares of stock were traded. This volume of trading is near the mean volume for the Hang Seng Index. Assume that the number of shares traded on the Hang Seng Index is normally distributed with a mean of 1.4 billion and a standard deviation of 0.15 billion.
For a randomly selected day, what is the probability that the volume of trading on the Hang Seng Index is:
a. below 1.8 billion
b. below 1.2 billion
c. above 1.0 billion
Answer:
From the condition, we know that the volume is normally distributed, so it has the characteristics of that distribution: the mean equals the median, and the distribution is symmetric and bell-shaped.
Below 1.8 billion: $z = (1.8 - 1.4)/0.15 \approx 2.67$, so the probability is about 0.996.
Below 1.2 billion: $z = (1.2 - 1.4)/0.15 \approx -1.33$, so the probability is about 0.091.
Above 1.0 billion: $z = (1.0 - 1.4)/0.15 \approx -2.67$, so the probability is about 0.996.

Question three
In this scenario, the following data represent the monthly service fee in dollars charged if a customer's savings account falls below the minimum $2,000 balance, for a sample of 10 banks for direct-deposit customers.
12, 10, 10, 12, 15, 15, 12, 12, 20, 10
a. Construct a 95% confidence interval for the population mean monthly service fee in dollars if a customer's account balance falls below the minimum required balance.
Answer:
A z-based interval needs a sample size greater than 30, but here the sample size is 10, so we use the t value to build the confidence interval, with the formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$. Analysing the data in Excel gives a mean of 12.8 and a standard deviation of 3.12.
So $\bar{x} = 12.8$ and $s = 3.12$; from the table, $t = 2.26$, and $\sqrt{n} = \sqrt{10} = 3.16$. According to $\bar{x} \pm t \frac{s}{\sqrt{n}}$, the confidence interval is $12.8 - 2.26 \times (3.12/\sqrt{10}) \le \mu \le 12.8 + 2.26 \times (3.12/\sqrt{10})$, that is, $10.57 \le \mu \le 15.03$. The assumed overall mean of 13.8 lies inside this interval.
Step 1: The null hypothesis is that the mean service fee has not changed from its previous value of 13.8.
$H_0: \mu = 13.8$ (assume it is true)
The alternative hypothesis is the opposite of the null hypothesis. Since the null hypothesis is that the mean service fee is 13.8, the alternative hypothesis is that the mean service fee is not 13.8.
$H_1: \mu \ne 13.8$
Step 2: You have selected a sample of $n = 10$. The level of significance is 0.05, with 0.025 in each tail.
Step 3: Because $\sigma$ is unknown and the sample is small, you use the t distribution and the t test statistic.
Step 4: With 0.025 in each tail and 9 degrees of freedom, the critical values of the t test statistic are -2.26 and +2.26. The rejection region is $t < -2.26$ or $t > +2.26$. The non-rejection region is $-2.26 < t < +2.26$.
Step 5: You collect the data and find a mean service fee of $\bar{x} = 12.8$; the hypothesized mean is 13.8 and the standard deviation is 3.12, so $t = (12.8 - 13.8)/(3.12/\sqrt{10}) = -1.01$.
Step 6: Since $t = -1.01 > -2.26$, the statistic falls in the non-rejection region, so we do not reject the null hypothesis: there is no evidence that the mean service fee differs from 13.8.

Task three: Business Decision Making
From all the information given, you need to consider the following problems.
Question 1. What proportion of RBCC's customers go into default?
Answer:
We can use a pivot table to calculate the result.
From the chart, the proportion is 15% : 85%. About 15% of customers are in default; their reputation suffers, and this will affect their credit in the future. The company should take measures to manage the customers who are in default. Otherwise the bank will carry a lot of bad debt, which will directly affect its economic interests.

Question 2. What criteria should RBCC use when deciding which customers are good credit risks? If you were to recommend a set of variables for RBCC to use, which variables would you use?
Answer: Let us rank the variables. Income comes first, because it is the most basic condition for a loan.
Debt comes second, because it reflects the individual's outstanding debt. Credit rating comes third, because it reflects the person's own credibility and shows whether the bank can accept the applicant. Marital status comes fourth, because it reflects the family's ability to earn income. Then come children and age, and finally gender.

Question 3. Once a loan is in default, what "script" should they use to try and collect the overdue debt?
You should include in your report statements about hypothesis-testing methodology and p-values, where appropriate.
Exhibit 1.
Answer:
The three most commonly used scripts are A, B, and C. Script A is the "Responsibility" appeal; if the customer has been through the A appeal and is still not willing to pay the debt, RBCC can reduce the customer's credit rank. Script B is the "Credit Rating" appeal; if script B does not work on the customer either, RBCC can reduce the customer's credit level. The third script, script C, focuses on the threat of legal action. Script C is the "Legal" appeal, under which RBCC can ask the court to assist with debt collection.

Question 4. Evaluate each of the following loan applicants, and make recommendations to the Becks as to which of them ought to be approved for a loan. Rank them in order of credit-worthiness and discuss your conclusions.
Subject    Marital Status   B&H Rating   Children   Age   Income    Debt       Gender
LEE        Married          A            4          24    $50,049   $92,876    Male
Ferreira   Single           B            1          34    $21,334   $139,639   Male
Aboud      Divorced         E            1          40    $49,638   $33,509    Male
Coismain   Single           C            0          27    $35,541   $25,589    Female
Arnold     Married          A            2          35    $53,269   $93,890    Female
Chandra    Widowed          D            0          69    $44,070   $41,143    Female
Manya      Divorced         E            1          36    $43,243   $29,775    Female
Bakshi     Married          C            1          32    $19,223   $18,006    Male
Paul       Married          D            3          34    $33,754   $55,331    Male
Scott      Married          B            2          29    $56,893   $44,657    Male
Answer: The B&H ratings correspond to five coefficient values: A is 0.4, B is 0.5, C is 0.6, D is 0.7, and E is 0.8.
We can analyse the applicants under five credit scenarios (borrowing 20%, 50%, 70%, 80%, and 100%).
From the table, the rating of Lee and Arnold is A. Whether they borrow on individual reputation or in other ways, their risk indicator is lower than 0.5, so they can easily get a loan from the bank. The B&H rating of Ferreira and Scott is B, which means 0.5. When they borrow against a fixed-asset mortgage, their risk indicators are 0.1, 0.25, 0.35, and 0.4; because these are lower than 0.5, the bank would be willing to lend to them, since its risk is small.
In the remaining scenario Scott's risk indicator is 0.5, but Scott is married, so the loan can be repaid from the family's income. Comparing income with debt, Scott can repay the loan easily, so the bank can lend to Scott.
Coismain and Bakshi have the same B&H rating, 0.6. From the table we conclude that in the other four scenarios their risk indicators are lower than 0.5, so the bank would be willing to lend to them.
For Chandra and Paul, when they borrow at 20%, 50%, and 70%, their risk indicators are lower than 0.5, so they can get a loan from the bank.
For Aboud and Manya, at 20% and 50% the risk indicators are 0.16 and 0.4, so they can borrow from the bank; at 80% and 100% the risk indicators are 0.64 and 0.8, which are higher than 0.6, so they can hardly borrow from the bank. At 70% their risk indicator is 0.56, so other factors need to be considered. Aboud is 40, with an income of $49,638 and debt of $33,509; the income is higher than the debt, and there is 1 child. Borrowing at 70%, Aboud can repay the loan easily, so the bank can lend to Aboud. As for Manya, the bank ought to lend to her.

Question 5. If all of these applicants were to be approved for a loan, what sort of interest rates would the Becks charge them, so that each applicant contributes 10% expected annual profit for RBCC? Make any necessary assumptions and state them clearly.
(Assume that the Becks can charge a different rate to each customer, depending on their estimated risk of default.)
Answer:
We assume that the ten people borrow as individuals, that each borrows $50,000, and that the confidence level is 95%. From their risk indicators we get a mean of 0.42 and a standard deviation of 0.1, and then their confidence interval. With a mean of 0.42 and a standard deviation of 0.1, the confidence interval is $0.35 \le \mu \le 0.49$, and the mean lies inside the interval.
The interest rate the bank charges lies between 3% and 12%. We conclude that when borrowing from the bank, applicants need to consider their risk indicators and build a high credit ranking; if they do, they can get a loan easily.
Each applicant contribute the annual profit to HSBC = 50,000* each applicants'
From this, we conclude that applicants need to build a high credit ranking when borrowing from the bank, because the higher the risk indicator, the higher the profit. The bank would lend to such people, and by doing so it can earn more profit.

Question 6. How should RBCC collect defaulted debt, and what further action does it need to take?
Answer: The further action that RBCC needs to take to collect defaulted debt is to prepare an analysis of the different accounts. According to the age of the receivable accounts, it should determine the proportion of bad debt that needs to be recovered because of default. Once a loan is in default, there are four methods that the bank most commonly uses to collect the overdue debt. The first is to phone or write to the mortgagors to tell them that a loan default may have a negative effect on their credit rating. The second is to dispose of the mortgagor's collateral. The third is to pursue the responsibilities of the relevant guarantor. The fourth is to start legal procedures and ask the court to assist with debt collection.

Question 7.
As HSBC, what action should be considered in future for supervising the subsidiaries so as to avoid large amounts of defaulted debt?
If HSBC wants to control its financial institution and avoid defaults, it can increase profits, which affects the financial institution, controls the money supply, and lowers the bank's risk. HSBC can engage people with experience to manage RBCC and improve the credit card business. Currently, some banks have tightened their risk indicator to 0.5, which avoids defaults and maintains the bank's profit. They can also check the accounts on fixed dates. HSBC would like to improve internal control in order to avoid internal conflict, and to encourage managers to understand the financial market and the business environment so that they make better decisions for RBCC.

Conclusion
This report answers a number of questions, all of which require the statistical knowledge we have learned. We use tables to analyse the data; for example, we use the Pareto table to find that married people obtain credit from HSBC most easily, and we explain defaulted debt and use the methods we studied to deal with HSBC's customers who are in default.
These issues are related to the knowledge we have learned, and we apply it flexibly to deal with them.

Recommendation
Some people have defaulted debt, which affects HSBC's profits. HSBC may raise its interest rate, because this can reduce a lot of defaulted debt: when people want to borrow money from HSBC, they will consider whether they can really afford the interest rate. When customers apply for services at HSBC, it can reduce the amount of its service charges.

Reference list
David M. Levine, Timothy C. Krehbiel, Mark L. Berenson, Business Statistics, 2006
Ken Black, Business Statistics, 2006

Distribution the assignment
Meeting Schedule
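The Task 2 calculations in this report (the binomial probabilities for the 5 banks, the normal probabilities for trading volume, and the t-based interval and test for the service fee) can be re-checked with a short Python sketch that uses only the standard library. It is a checking aid under stated assumptions rather than part of the original analysis: the variable names are made up, and the critical value 2.262 is taken from a t table for 9 degrees of freedom, as the report itself does.

```python
import math
from statistics import NormalDist, mean, stdev


def binom_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Question one: n = 5 banks, each charging a fee with probability 0.9.
n, p = 5, 0.9
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
binom_mean = n * p
binom_sd = math.sqrt(n * p * (1 - p))
p_exactly_one = pmf[1]
p_two_or_fewer = sum(pmf[:3])
p_three_or_more = sum(pmf[3:])

# Question two: trading volume ~ Normal(mean 1.4, sd 0.15), in billions of shares.
volume = NormalDist(mu=1.4, sigma=0.15)
p_below_1_8 = volume.cdf(1.8)
p_below_1_2 = volume.cdf(1.2)
p_above_1_0 = 1 - volume.cdf(1.0)

# Question three: 95% t interval and t statistic for the mean service fee.
fees = [12, 10, 10, 12, 15, 15, 12, 12, 20, 10]
T_CRIT = 2.262                      # table value, two-sided 95%, 9 degrees of freedom
x_bar, s = mean(fees), stdev(fees)  # stdev uses the n-1 divisor
se = s / math.sqrt(len(fees))
ci_low, ci_high = x_bar - T_CRIT * se, x_bar + T_CRIT * se
t_stat = (x_bar - 13.8) / se        # test of H0: mu = 13.8

print(round(p_two_or_fewer, 5), round(p_three_or_more, 5))
print(round(p_below_1_8, 4), round(p_below_1_2, 4), round(p_above_1_0, 4))
print(round(ci_low, 2), round(ci_high, 2), round(t_stat, 2))
```

Summing the table entries for x = 0, 1, 2 and for x = 3, 4, 5 gives the cumulative probabilities asked for in parts 2 and 3 of Question one.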