Subjective Bayesian Analysis: Principles and Practice
Bayesian Mixed Effects Models

1. Introduction

The Bayesian mixed effects model is a statistical modelling approach commonly used to analyse data with hierarchical structure and repeated measurements. It combines ideas from Bayesian statistics and mixed effects modelling, can model both individual-level and group-level variation, and estimates parameters through the posterior distribution. This article introduces the basic concepts of the Bayesian mixed effects model, the steps involved in building one, and its application to real data analysis. It also discusses the model's strengths and limitations, and lists some related resources for further study and exploration.
2. Foundations of Bayesian Statistics

Before introducing the Bayesian mixed effects model, we first review the basic concepts of Bayesian statistics.

2.1 Bayes' Formula

Bayes' formula is the core idea of Bayesian statistics: it describes how to update beliefs about parameters in light of observed data. Let θ be the parameter to be estimated and x the observed data. By Bayes' formula, the posterior probability can be written as

P(θ|x) = P(x|θ)P(θ) / P(x)

where P(x|θ) is the likelihood function, the probability of observing the data x given the parameter θ; P(θ) is the prior probability, representing beliefs about θ before seeing the data; and P(x) is the marginal probability of observing the data x.
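A quick numeric check of the formula may help; the prevalence, sensitivity and specificity below are chosen purely for illustration:

```latex
P(\text{ill} \mid +)
  = \frac{P(+\mid \text{ill})\,P(\text{ill})}
         {P(+\mid \text{ill})\,P(\text{ill}) + P(+\mid \text{healthy})\,P(\text{healthy})}
  = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99} \approx 0.088
```

Even a fairly accurate test leaves the posterior below 9% here, because the prior probability of illness is so small.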
2.2 Bayesian Models

Bayesian statistics treats parameters as random variables and introduces prior distributions to describe the uncertainty about them. In a Bayesian model, the posterior distribution is computed from the likelihood function and the prior distribution, yielding sharper inferences about the parameters.
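A minimal sketch of that prior-times-likelihood computation on a discrete grid (the data and the flat prior are chosen for illustration):

```python
import numpy as np

# Grid approximation of the posterior for a Bernoulli success probability theta.
theta = np.linspace(0.001, 0.999, 999)        # parameter grid
prior = np.ones_like(theta)                   # flat prior P(theta)
k, n = 7, 10                                  # observed: 7 successes in 10 trials
likelihood = theta**k * (1 - theta)**(n - k)  # P(x | theta)
unnorm = likelihood * prior
posterior = unnorm / np.trapz(unnorm, theta)  # normalise by P(x)

print(theta[np.argmax(posterior)])            # posterior mode, ~0.7
```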
Common Bayesian models include the linear regression model and the mixed effects model, among others. Of these, the mixed effects model is a method widely used in multilevel data analysis.
3. Foundations of Mixed Effects Models

The mixed effects model, also known as the hierarchical linear model, is a statistical modelling method for analysing data with hierarchical structure and repeated measurements.

3.1 Model Structure

A mixed effects model divides the data into levels and assumes that each level has its own random effects.
The basic structure of the model can be written as

y_ij = X_ij β + Z_ij b_i + ε_ij

where y_ij is the observation for the i-th individual at the j-th level; X_ij and Z_ij are the design matrices for the fixed and random effects, respectively; β is the vector of fixed-effect coefficients; b_i is the random effect for the i-th individual; and ε_ij is the error term.
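A minimal sketch of fitting a random-intercept instance of this model, assuming the PyMC library is available; the data are simulated and all names and values are illustrative:

```python
import numpy as np
import pymc as pm

# Simulated grouped data: 10 groups ("individuals"), 20 observations each.
rng = np.random.default_rng(0)
n_groups, n_per = 10, 20
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
b_true = rng.normal(0.0, 0.5, size=n_groups)
y = 1.0 + 2.0 * x + b_true[group] + rng.normal(0.0, 1.0, size=x.size)

with pm.Model():
    beta0 = pm.Normal("beta0", 0.0, 10.0)             # fixed intercept
    beta1 = pm.Normal("beta1", 0.0, 10.0)             # fixed slope
    sigma_b = pm.HalfNormal("sigma_b", 1.0)           # spread of random effects
    b = pm.Normal("b", 0.0, sigma_b, shape=n_groups)  # random intercepts b_i
    sigma = pm.HalfNormal("sigma", 1.0)               # residual scale
    mu = beta0 + beta1 * x + b[group]
    pm.Normal("y_obs", mu, sigma, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["beta1"].mean())                # close to the true slope 2.0
```

The posterior draws in idata quantify both the fixed effects β and the group-level spread σ_b at once, which is the appeal of the Bayesian treatment of this model.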
APPLICATION OF BAYESIAN REGULARIZED BP NEURAL NETWORK MODEL FOR TREND ANALYSIS, ACIDITY AND CHEMICAL COMPOSITION OF PRECIPITATION IN NORTH CAROLINA

MIN XU¹, GUANGMING ZENG¹,²,*, XINYI XU¹, GUOHE HUANG¹,², RU JIANG¹ and WEI SUN²

¹College of Environmental Science and Engineering, Hunan University, Changsha 410082, China; ²Sino-Canadian Center of Energy and Environment Research, University of Regina, Regina, SK, S4S 0A2, Canada (*author for correspondence, e-mail: zgming@, ykxumin@, Tel.: 86-731-882-2754, Fax: 86-731-882-3701)

(Received 1 August 2005; accepted 12 December 2005)

Water, Air, and Soil Pollution (2006) 172: 167-184. DOI: 10.1007/s11270-005-9068-8. © Springer 2006

Abstract. Bayesian regularized back-propagation neural network (BRBPNN) was developed for trend analysis, acidity and chemical composition of precipitation in North Carolina using precipitation chemistry data in NADP. This study included two BRBPNN application problems: (i) the relationship between precipitation acidity (pH) and other ions (NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺) was performed by BRBPNN and the achieved optimal network structure was 8-15-1. Then the relative importance index, obtained through the sum of square weights between each input neuron and the hidden layer of BRBPNN (8-15-1), indicated that the ions' contribution to the acidity declined in the order of NH₄⁺ > SO₄²⁻ > NO₃⁻; and (ii) investigations were also carried out using BRBPNN with respect to temporal variation of monthly mean NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations, and their optimal architectures for the 1990-2003 data were 4-6-1, 4-6-1 and 4-4-1, respectively. All the estimated results of the optimal BRBPNNs showed that the relationship between the acidity and other ions, or that between NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations and the precipitation amount and time variable, was obviously nonlinear, since in contrast to multiple linear regression (MLR), BRBPNN was clearly better, with less error in prediction and higher correlation coefficients. Meanwhile, the results also exhibited that BRBPNN is of automated regularization parameter selection capability and may ensure excellent fitting and robustness. Thus, this study laid the foundation for the application of BRBPNN in the analysis of acid precipitation.

Keywords: Bayesian regularized back-propagation neural network (BRBPNN), precipitation, chemical composition, temporal trend, the sum of square weights

1. Introduction

Characterization of the chemical nature of precipitation is currently under considerable investigation due to the increasing concern about man's atmospheric inputs of substances and their effects on land, surface waters, vegetation and materials. Particularly, temporal trend and chemical composition have been the subject of extensive research in North America, Canada and Japan in the past 30 years (Zeng and Flopke, 1989; Khawaja and Husain, 1990; Lim et al., 1991; Sinya et al., 2002; Grimm and Lynch, 2005).

Linear regression (LR) methods such as multiple linear regression (MLR) have been widely used to develop the model of temporal trend and chemical composition analysis in precipitation (Sinya et al., 2002; George, 2003; Aherne and Farrell, 2002; Christopher et al., 2005; Migliavacca et al., 2004; Yasushi et al., 2001). However, LR is an "ill-posed" problem in statistics and sometimes results in the instability of the models when trained with noisy data, besides the requirement of subjective decisions to be made on the part of the investigator as to the likely functional (e.g. nonlinear) relationships among variables (Burden and Winkler, 1999; 2000).
On the other hand, recently there has been increasing interest in estimating the uncertainties and nonlinearities associated with impact prediction of atmospheric deposition (Page et al., 2004). Besides precipitation amount, human activities, such as local and regional land cover and emission sources, play a role in determining the concentration at a given location that is unknown and uncertain (Grimm and Lynch, 2005). Therefore, it is of much significance that the model of temporal variation and precipitation chemistry is efficient, gives unambiguous models and doesn't depend upon any subjective decisions about the relationships among ionic concentrations.

In this study, we propose a Bayesian regularized back-propagation neural network (BRBPNN) to overcome MLR's deficiencies and investigate nonlinearity and uncertainty in acid precipitation. The network is trained through Bayesian regularized methods, a mathematical process which converts the regression into a well-behaved, "well-posed" problem. In contrast to MLR and traditional neural networks (NNs), BRBPNN performs better when the relationship between variables is nonlinear (Sovan et al., 1996; Archontoula et al., 2003) and generalizes better, because BRBPNN is of automated regularization parameter selection capability to obtain the optimal network architecture of the posterior distribution and avoid the over-fitting problem (Burden and Winkler, 1999; 2000). Thus, the main purpose of our paper is to apply the BRBPNN method to modeling the nonlinear relationship between the acidity and chemical compositions of precipitation, and to improve the accuracy of the monthly ionic concentration model used to provide precipitation estimates. Both are helpful to predict precipitation variables and interpret mechanisms of acid precipitation.

2. Theories and Methods

2.1. Theory of Bayesian Regularized BP Neural Network

Traditional NN modeling was based on back-propagation, which was created by generalizing the Widrow-Hoff learning rule to multiple-layer networks and nonlinear differentiable transfer functions. Commonly, a BPNN comprises three types of neuron layers: an input layer, one or several hidden layers and an output layer comprising one or several neurons. In most cases only one hidden layer is used (Figure 1) to limit the calculation time. Although BPNNs with biases, a sigmoid layer and a linear output layer are capable of approximating any function with a finite number of discontinuities (The MathWorks), we select the tansig and purelin transfer functions of MATLAB to improve the efficiency (Burden and Winkler, 1999; 2000).

Figure 1. Structure of the neural network used. R = number of elements in the input vector; S = number of hidden neurons; p is a vector of R input elements. The network input to the transfer function tansig is n¹ plus the bias b¹, and the network input to the transfer function purelin is n² plus the bias b². IW^{1,1} is the input weight matrix and LW^{2,1} is the layer weight matrix. a¹ = tansig(IW^{1,1}p + b¹) is the output of the hidden layer and y = a² = purelin(LW^{2,1}a¹ + b²) is the network output.

Bayesian methods are the optimal methods for solving learning problems of neural networks, which can automatically select the regularization parameters and integrate the properties of the high convergence rate of the traditional BPNN and the prior information of Bayesian statistics (Burden and Winkler, 1999; 2000; Jouko and Aki, 2001; Sun et al., 2005).

To improve the generalization ability of the network, the regularized training objective function F is defined as:

F = αE_w + βE_D    (1)

where E_w is the sum of squared network weights, E_D is the sum of squared network errors, and α and β are objective function parameters (regularization parameters). Setting the correct values for the objective parameters is the main problem with implementing regularization, and their relative size dictates the emphasis for training. Specifically, in this study the mean squared error (MSE) is chosen as a measure of the network training approximation. Consider a desired neural network with a training data set D = {(p₁,t₁), (p₂,t₂), ..., (p_i,t_i), ..., (p_n,t_n)}, where p_i is an input to the network and t_i is the corresponding target output. As each input is applied to the network, the network output is compared to the target, and the error is calculated as the difference between the target output and the network output. We want to minimize the average of the sum of these errors (namely, the MSE) through iterative network training:

MSE = (1/n) Σ_{i=1}^{n} e(i)² = (1/n) Σ_{i=1}^{n} (t(i) − a(i))²    (2)

where n is the size of the sample set, e(i) is the error and a(i) is the network output.

In the Bayesian framework the weights of the network are considered random variables, and the posterior distribution of the weights can be updated according to Bayes' rule:

P(w|D,α,β,M) = P(D|w,β,M) P(w|α,M) / P(D|α,β,M)    (3)

where M is the particular neural network model used and w is the vector of network weights. P(w|α,M) is the prior density, which represents our knowledge of the weights before any data are collected. P(D|w,β,M) is the likelihood function, which is the probability of the data occurring given the weights w. P(D|α,β,M) is a normalization factor, which guarantees that the total probability is 1. Thus, we have

Posterior = (Likelihood × Prior) / Evidence    (4)

Likelihood: a network with a specified architecture M and weights w can be viewed as making predictions about the target output as a function of input data in accordance with the probability distribution:

P(D|w,β,M) = exp(−βE_D) / Z_D(β)    (5)

where Z_D(β) is the normalization factor:

Z_D(β) = (π/β)^{n/2}    (6)

Prior: a prior probability is assigned to alternative network connection strengths w, written in the form:

P(w|α,M) = exp(−αE_w) / Z_w(α)    (7)

where Z_w(α) is the normalization factor:

Z_w(α) = (π/α)^{K/2}    (8)

Finally, the posterior probability of the network connections w is:

P(w|D,α,β,M) = exp(−(αE_w + βE_D)) / Z_F(α,β) = exp(−F(w)) / Z_F(α,β)    (9)

Setting the regularization parameters α and β. The regularization parameters α and β determine the complexity of the model M. We now apply Bayes' rule to optimize the objective function parameters α and β. Here we have

P(α,β|D,M) = P(D|α,β,M) P(α,β|M) / P(D|M)    (10)

If we assume a uniform prior density P(α,β|M) for the regularization parameters α and β, then maximizing the posterior is achieved by maximizing the likelihood function P(D|α,β,M). We also notice that the likelihood function P(D|α,β,M) on the right side of Equation (10) is the normalization factor for Equation (3).
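A toy numerical sketch of minimizing the objective in Equation (1), assuming a linear model in place of the full BP network and fixed α and β (MATLAB's "trainbr", used later in this paper, instead re-estimates them via Equation (14)):

```python
import numpy as np

# Gradient descent on F = alpha*E_w + beta*E_D (Eq. 1) for y = x @ w.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
t = x @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

alpha, beta = 0.01, 1.0          # regularization parameters, held fixed here
w = np.zeros(3)
for _ in range(2000):
    e = t - x @ w                # network errors
    grad_E_D = -2 * x.T @ e      # gradient of the sum of squared errors
    grad_E_w = 2 * w             # gradient of the sum of squared weights
    w -= 1e-3 * (beta * grad_E_D + alpha * grad_E_w)

print(w)                         # slightly shrunk toward zero relative to OLS
```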
According to Foresee and Hagan (1997), we have:

P(D|α,β,M) = P(D|w,β,M) P(w|α,M) / P(w|D,α,β,M) = Z_F(α,β) / (Z_w(α) Z_D(β))    (11)

In Equation (11), the only unknown part is Z_F(α,β). Since the objective function has the shape of a quadratic in a small area surrounding the minimum point, we can expand F(w) around the minimum point of the posterior density w^MP, where the gradient is zero. Solving for the normalizing constant yields:

Z_F(α,β) = (2π)^{K/2} det^{−1/2}(H) exp(−F(w^MP))    (12)

where H is the Hessian matrix of the objective function:

H = β∇²E_D + α∇²E_w    (13)

Substituting Equation (12) into Equation (11), we can find the optimal values for α and β at the minimum point by taking the derivative of the log of Equation (11) with respect to each and setting them equal to zero:

α^MP = γ / (2E_w(w^MP))  and  β^MP = (n − γ) / (2E_D(w^MP))    (14)

where γ = K − 2α^MP tr((H^MP)^{−1}) is the number of effective parameters, n is the size of the sample set, and K is the total number of parameters in the network. The number of effective parameters is a measure of how many parameters in the network are effectively used in reducing the error function; it can range from zero to K. After training, we need to do the following checks: (i) if γ is very close to K, the network may not be large enough to properly represent the true function. In this case, we simply add more hidden neurons and retrain the network to make a larger network. If the larger network has the same final γ, then the smaller network was large enough; and (ii) if the network is sufficiently large, then a second larger network will achieve comparable values for γ.

The Bayesian optimization of the regularization parameters requires the computation of the Hessian matrix of the objective function F(w) at the minimum point w^MP. To overcome this problem, the Gauss-Newton approximation to the Hessian matrix has been proposed by Foresee and Hagan (1997). Here are the steps required for Bayesian optimization of the regularization parameters: (i) initialize α, β and the weights; after the first training step, the objective function parameters will recover from the initial setting; (ii) take one step of the Levenberg-Marquardt algorithm to minimize the objective function F(w); (iii) compute γ using the Gauss-Newton approximation to the Hessian matrix in the Levenberg-Marquardt training algorithm; (iv) compute new estimates for the objective function parameters α and β; and (v) iterate steps (ii) through (iv) until convergence.

2.2. Weight Calculation of the Network

Generally, one of the difficult research topics of the BRBPNN model is how to obtain effective information from a neural network. To a certain extent, the network weights and biases can reflect the complex nonlinear relationships between the input variables and the output variable. When the output layer involves only one neuron, the influences of the input variables on the output variable are directly presented in the influences of the input parameters upon the network. Simultaneously, through the connections along the paths from the input layer to the hidden layer and from the hidden layer to the output layer, one can study how the input variables act on the hidden layer, which can be considered as the impact of the input variables on the output variable. According to Joseph et al. (2003), the relative importance of an individual input variable upon the output variable can be expressed as:

I = Σ_{j=1}^{S} ABS(w_ji) / (Σ_{i=1}^{Num} Σ_{j=1}^{S} ABS(w_ji))    (15)

where w_ji is the connection weight from input neuron i to hidden neuron j, ABS is the absolute value function, and Num and S are the numbers of input variables and hidden neurons, respectively.

2.3. Multiple Linear Regression

This study attempts to ascertain whether BRBPNNs are preferable to the MLR models widely used in the past for temporal variation of acid precipitation (Buishand et al., 1988; Dana and Easter, 1987; MAP3S/RAINE, 1982). MLR employs the following regression model:

Y_i = a₀ + a cos(2πi/12 − φ) + bi + cP_i + e_i,  i = 1, 2, ..., 12N    (16)

where N represents the number of years in the time series. In this case, Y_i is the natural logarithm of the monthly mean concentration (mg/L) in precipitation for the i-th month. The term a₀ represents the intercept. P_i represents the natural logarithm of the precipitation amount (ml) for the i-th month. The term bi, where i (month) goes from 1 to 12N, represents the monotonic trend in concentration in precipitation over time. To facilitate the estimation of the coefficients a₀, a, b, c and φ, following Buishand et al. (1988) and John et al. (2000), the reparameterized MLR model was established and the final form of Equation (16) becomes:

Y_i = a₀ + α cos(2πi/12) + β sin(2πi/12) + bi + cP_i + e_i,  i = 1, 2, ..., 12N    (17)

where α = a cos φ and β = a sin φ. The regression coefficients a₀, α, β, b and c in Equation (17) are estimated using the ordinary least squares method.

2.4. Data Set Selection

The precipitation chemistry data used are derived from NADP (the National Atmospheric Deposition Program), a nationwide precipitation collection network founded in 1978. Monthly precipitation information on nine species (pH, NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺) and precipitation amount in 1990-2003 was collected at Clinton Crops Research Station (NC35), North Carolina. Information on the data validation can be found at the NADP website.

The BRBPNN advantages are that it is able to produce models that are robust and well matched to the data. At the end of training, a Bayesian regularized neural network has optimal generalization qualities and thus there is no need for a test set (MacKay, 1992; 1995). Husmeier et al. (1999) have also shown theoretically and by example that in a Bayesian regularized neural network the training and test set performance do not differ significantly. Thus, this study need not select a test set, and only the training set problem remains.

(i) Training set of BRBPNN between precipitation acidity and other ions. With regard to the relationship between precipitation acidity and other ions, the input neurons are taken from the monthly concentrations of NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺, and precipitation acidity (pH) is regarded as the output of the network.

(ii) Training set of BRBPNN for temporal trend analysis. Based on the weight calculations of BRBPNN between precipitation acidity and other ions, this study simulates the temporal trend of the three main ions using BRBPNN and MLR, respectively. In Equation (17) of MLR, a₀, α, β, b and c are the estimated coefficients and i, P_i, cos(2πi/12) and sin(2πi/12) are the independent variables. To achieve satisfactory fitting results with the BRBPNN model, we similarly employ the four items (i, P_i, cos(2πi/12) and sin(2πi/12)) as the input neurons of BRBPNN, the availability of which will be proved in the following.
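A small sketch of estimating Equation (17) by ordinary least squares on synthetic monthly data (all values are illustrative):

```python
import numpy as np

# Fit Y = a0 + alpha*cos(2*pi*i/12) + beta*sin(2*pi*i/12) + b*i + c*P (Eq. 17).
rng = np.random.default_rng(2)
N = 14                                    # years, e.g. 1990-2003
i = np.arange(1, 12 * N + 1)              # month index 1..12N
logP = rng.normal(4.0, 0.3, size=i.size)  # log precipitation amount (synthetic)
Y = (0.5 + 0.3 * np.cos(2 * np.pi * i / 12) - 0.1 * np.sin(2 * np.pi * i / 12)
     + 0.002 * i - 0.4 * logP + 0.1 * rng.normal(size=i.size))

X = np.column_stack([np.ones_like(i, dtype=float),
                     np.cos(2 * np.pi * i / 12),
                     np.sin(2 * np.pi * i / 12),
                     i.astype(float),
                     logP])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a0, alpha, beta, b, c = coef
print(f"trend b={b:.4f}, seasonal amplitude a={np.hypot(alpha, beta):.3f}")
```

The seasonal amplitude a is recovered from the reparameterization as a = sqrt(α² + β²), which is why Equation (17) is easier to fit than Equation (16).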
2.5. Software and Method

MLR is carried out with the SPSS 11.0 software. BRBPNN is implemented in the neural network toolbox of MATLAB 6.5 for the algorithm described in Section 2.1. Concretely, the BRBPNN algorithm is implemented through the "trainbr" network training function in the MATLAB toolbox, which updates the weights and biases according to Levenberg-Marquardt optimization. The function minimizes both squared errors and weights, provides the number of network parameters being effectively used by the network, and then determines the correct combination so as to produce a network that generalizes well. The training is stopped if the maximum number of epochs is reached, the performance has been minimized to a suitably small goal, or the performance gradient falls below a suitable target. Each of these targets and goals is set at the default values of the MATLAB implementation if we do not set them artificially. To eliminate the guesswork required in determining the optimum network size, the training should be carried out many times to ensure convergence.

3. Results and Discussion

3.1. Correlation Coefficients of Precipitation Ions

Table I shows the correlation coefficients for the ion components and precipitation amount at NC35, which illustrates that the acidity of precipitation results from the integrative interactions of anions and cations and mainly depends upon four species, i.e. SO₄²⁻, NO₃⁻, Ca²⁺ and NH₄⁺. In particular, pH is strongly correlated with SO₄²⁻ and NO₃⁻, with correlation coefficients of −0.708 and −0.629, respectively. In addition, all the ionic species have a negative correlation with precipitation amount, which accords with the theory that the higher the precipitation amount, the lower the ionic concentration (Li, 1999).

TABLE I. Correlation coefficients of precipitation ions

          Ca²⁺    Mg²⁺    K⁺      Na⁺     NH₄⁺    NO₃⁻    Cl⁻     SO₄²⁻    pH      Precip. amount
Ca²⁺      1.000   0.462   0.548   0.349   0.449   0.627   0.349   0.654   −0.342   −0.369
Mg²⁺              1.000   0.381   0.980   0.051   0.132   0.980   0.123    0.006   −0.303
K⁺                        1.000   0.320   0.248   0.226   0.327   0.316   −0.024   −0.237
Na⁺                               1.000  −0.031   0.021   0.992   0.021    0.074   −0.272
NH₄⁺                                      1.000   0.733   0.011   0.610   −0.106   −0.140
NO₃⁻                                              1.000   0.050   0.912   −0.629   −0.258
Cl⁻                                                       1.000   0.049    0.075   −0.265
SO₄²⁻                                                             1.000   −0.708   −0.245
pH                                                                         1.000    0.132
Precip. amount                                                                      1.000

3.2. Relationship between pH and Chemical Compositions

3.2.1. BRBPNN Structure and Robustness

For the BRBPNN of the relationship between pH and chemical compositions, the number of input neurons is determined by the number of selected input variables, comprising the eight ions NH₄⁺, NO₃⁻, SO₄²⁻, Ca²⁺, Mg²⁺, K⁺, Cl⁻ and Na⁺, and the output neuron includes only pH. Generally, the number of hidden neurons for a traditional BPNN is roughly estimated by investigating the effects of the repeatedly trained network, but BRBPNN can automatically search the optimal network parameters in the posterior distribution (MacKay, 1992; Foresee and Hagan, 1997). Based on the algorithm of Sections 2.1 and 2.5, the "trainbr" network training function is used to implement BRBPNNs with a tansig hidden layer and a purelin output layer. To acquire the optimal architecture, the BRBPNNs are trained independently 20 times to eliminate spurious effects caused by the random set of initial weights, and the network training is stopped when the maximum number of repetitions reaches 3000 epochs. The number of hidden neurons (S) is increased from 1 to 20, and the BRBPNNs are retrained until the network performance (the number of effective parameters, MSE, E_w and E_D, etc.) remains approximately the same.

In order to determine the optimal BRBPNN structure, Figure 2 summarizes the results of training many different networks of the 8-S-1 architecture for the relationship between pH and the chemical constituents of precipitation. It describes how the MSE and the number of effective parameters change along with the number of hidden neurons (S). When S is less than 15, the number of effective parameters becomes bigger and the MSE becomes smaller as S increases. But it is noted that when S is larger than 15, the MSE and the number of effective parameters are roughly constant for any network. This is the minimum number of hidden neurons required to properly represent the true function. From Figure 2, the number of hidden neurons (S) can increase up to 20, but the MSE and the number of effective parameters are still roughly equal to those of the network with 15 hidden neurons, which suggests that BRBPNN is robust. Therefore, using the BRBPNN technique, we can determine the optimal size 8-15-1 of the neural network.

Figure 2. Changes of optimal BRBPNNs along with the number of hidden neurons.

Figure 3. Comparison of calculations between BRBPNN (8-15-1) and MLR.

3.2.2. Prediction Results Comparison

Figure 3 illustrates the output response of the BRBPNN (8-15-1), with quite a good fit. Obviously, the calculations of BRBPNN (8-15-1) have a much higher correlation coefficient (R² = 0.968) and are more concentrated near the isoline than those of MLR. In contrast, in the previous studies of the relationships between the acidity and other ions by MLR, most of the average regression R² values reach less than 0.769 (Yu et al., 1998; Baez et al., 1997; Li, 1999). Additionally, Figures 2 and 3 show that any BRBPNN of 8-S-1 architecture has better approximating qualities. Even if S is equal to 1, the MSE of BRBPNN (8-1-1) is much smaller and superior to that of MLR. Thus, we can judge that there are strong nonlinear relationships between the acidity and the other ion concentrations, which cannot be explained by MLR, and that it may be quite reasonable to apply a neural network methodology to interpret the nonlinear mechanisms between the acidity and the other input variables.

TABLE II. Sum of square weights (SSW) and the relative importance (I) from input neurons to the hidden layer

         Ca²⁺     Mg²⁺     K⁺       Na⁺      NH₄⁺      NO₃⁻     Cl⁻      SO₄²⁻
SSW      2.9589   2.7575   1.7417   0.8805   10.4063   4.0828   1.3771   5.2050
I (%)    10.06    9.38     5.92     2.99     35.38     13.88    4.68     17.70

3.2.3. Weight Interpretation for the Acidity of Precipitation

To interpret the weights of the optimal BRBPNN (8-15-1), Equation (15) is used to evaluate the significance of each individual input variable, and the calculations are given in Table II. Among the eight inputs of BRBPNN (8-15-1), NH₄⁺, SO₄²⁻, NO₃⁻, Ca²⁺ and Mg²⁺ comparatively have greater impacts upon the network, which indicates that these five factors are of more significance for the acidity. Table II shows that NH₄⁺ contributes by far the most (35.38%) to the acidity prediction, while SO₄²⁻ and NO₃⁻ contribute 17.70% and 13.88%, respectively. On the other hand, Ca²⁺ and Mg²⁺ contribute 10.06% and 9.38%, respectively.

3.3. Temporal Trend Analysis

3.3.1. Determination of BRBPNN Structure

Universally, there have always been low fitting results in the analysis of temporal trend estimation in precipitation. For example, the regression R² values of NH₄⁺ and NO₃⁻ for the Chesapeake Bay Watershed in Grimm and Lynch (2005) are 0.3148 and 0.4940, and the R² values of SO₄²⁻, NH₄⁺ and NO₃⁻ for Japan in Sinya et al. (2002) are 0.4205, 0.4323 and 0.4519, respectively. This study also applies BRBPNN to estimate the temporal trend of precipitation chemistry. According to the weight results, we select NH₄⁺, SO₄²⁻ and NO₃⁻ to predict temporal trends
using BRBPNN. The four items (i, P_i, cos(2πi/12) and sin(2πi/12)) in Equation (17) are assumed as the input neurons of the BRBPNNs. Specially, two periods (i.e. 1990-1996 and 1990-2003) of input variables for the NH₄⁺ temporal trend using BRBPNN are selected to compare with the past MLR results of NH₄⁺ trend analysis in 1990-1996 (John et al., 2000).

Similar to Figure 2, with 20 training runs and a maximum of 3000 epochs, Figure 4 summarizes the results of training many different networks of the 4-S-1 architecture to approximate the temporal variation of the three ions, and shows how the MSE and the number of effective parameters evolve with the number of hidden neurons (S). It has been found that the MSE and the number of effective parameters converge and stabilize as S gradually increases. For the 1990-2003 data, when the number of hidden neurons (S) is increased up to 10, we find that the minimum numbers of hidden neurons required to properly represent the accurate function and achieve satisfactory results are at least 6, 6 and 4 for the trend analysis of NH₄⁺, SO₄²⁻ and NO₃⁻, respectively. Thus, the best BRBPNN structures for NH₄⁺, SO₄²⁻ and NO₃⁻ are 4-6-1, 4-6-1 and 4-4-1, respectively. Additionally, for the NH₄⁺ data in 1990-1996, the optimal one is BRBPNN (4-10-1), which differs from the BRBPNN (4-6-1) of the 1990-2003 data and indicates that the optimal BRBPNN architecture changes when different data are input.

Figure 4. Changes of optimal BRBPNNs along with the number of hidden neurons for different ions. (a: the period 1990-2003; b: the period 1990-1996.)

3.3.2. Comparison between BRBPNN and MLR

Figures 5-8 summarize the comparison results of the trend analysis for the different ions using BRBPNN and MLR, respectively. In particular, for Figure 5, John et al. (2000) found that the R² of NH₄⁺ through MLR Equation (17) is just 0.530 for the 1990-1996 data at NC35; but if the BRBPNN method is utilized to train the same 1990-1996 data, R² can reach 0.760. This shows that it is indispensable to consider the characteristics of nonlinearity in the NH₄⁺ trend analysis, which can make up for the insufficiencies of MLR to some extent. Figures 6-8 demonstrate the pervasive feasibility and applicability of the BRBPNN model in the temporal trend analysis of NH₄⁺, SO₄²⁻ and NO₃⁻, which reflects the nonlinear properties and is much more precise than MLR.

3.3.3. Temporal Trend Prediction

Using the above optimal BRBPNNs of the ion components, we can obtain the optimal prediction results of the ionic temporal trends. Figures 9-12 illustrate the typical seasonal cycle of monthly NH₄⁺, SO₄²⁻ and NO₃⁻ concentrations at NC35, in agreement with the trend of John et al. (2000).

Figure 5. Comparison of NH₄⁺ calculations between BRBPNN (4-10-1) and MLR in 1990-1996.

Figure 6. Comparison of NH₄⁺ calculations between BRBPNN (4-6-1) and MLR in 1990-2003.

Figure 7. Comparison of SO₄²⁻ calculations between BRBPNN (4-6-1) and MLR in 1990-2003.

Based on Figure 9, the estimated increase of NH₄⁺ concentration in precipitation for the 1990-1996 data corresponds to an annual increase of approximately 11.12%, which is slightly higher than the 9.5% obtained by the MLR of John et al. (2000). Here, we can confirm that the results of BRBPNN are more reasonable and impersonal because BRBPNN considers nonlinear characteristics. In contrast with

Figure 8. Comparison of NO₃⁻ calculations between BRBPNN (4-4-1) and MLR in 1990-2003.

Figure 9. Temporal trend in the natural log (log NH₄⁺) of NH₄⁺ concentration in 1990-1996. (Dots (o) represent monitored values. The solid and dashed lines represent, respectively, the predicted values and the estimated trend given by the BRBPNN method.)

Figure 10. Temporal trend in the natural log (log NH₄⁺) of NH₄⁺ concentration in 1990-2003. (Dots (o) represent monitored values. The solid and dashed lines represent, respectively, the predicted values and the estimated trend given by the BRBPNN method.)
Journal of Computer Research and Development, 42(9): 1527-1532, 2005. ISSN 1000-1239/CN 11-1777/TP. Received 2003-11-13; revised 2004-04-05. Supported by the National Natural Science Foundation of China (60175011, 60375011), the Natural Science Foundation of Anhui Province (03042207), and the Anhui Province Foundation for Outstanding Young Scientists (04042044).

Research on Explanation Function for Reason Conclusions with Bayesian Network

Wang Ronggui, Zhang Yousheng, Gao Jun, and Peng Qingsong (College of Computer and Information, Hefei University of Technology, Hefei 230009) (wangrgui@mail.hf.ah.cn)

Abstract: In this paper, an explanation function for Bayesian networks is presented. With it, the degree, direction and paths of the effect of evidence on a reasoning conclusion can be explained. A necessity factor and a sufficiency factor are designed as measures to evaluate the effect of evidence on the posterior distribution. By qualitatively analysing the character of the network structure, the nodes relevant to the reasoning conclusion are found out; based on those nodes, and combined with quantitative analysis, the sub-chains that make up the effect paths are found out as well. These sub-chains are evaluated to generate and explain the effect paths. Experimental results show the effectiveness of the explanation function.

Key words: Bayesian network; posterior distribution; 1-norm; effect degree; effect direction; effect path

CLC number: TP181

1. Introduction

The knowledge representation and inference algorithms of the Bayesian network model [1,2] are based on the joint probability distribution, so explanations cannot be generated automatically by translating an inference chain, as in systems such as MYCIN [3,4]. Research on explanation mechanisms for the conclusions of Bayesian network inference is becoming an active topic [5].

The core problem in explaining Bayesian network inference is to find an appropriate way to evaluate the size of the effect of evidence on the posterior distribution of a node of interest, and the paths along which that effect travels [2,6]. This paper proposes a technique called the deletion method to study this problem, and on this basis builds an explanation method for Bayesian network inference. Experimental results verify the effectiveness of the method.

2. The Degree of Effect of Evidence on a Conclusion

Bayesian network inference computes, given the states of a set of evidence nodes E, the probability distribution of each remaining (non-evidence) node X in the network, that is, the posterior distribution P(X|E). We now introduce the deletion method to analyse the degree of effect of evidence on an inference conclusion: to examine the effect of one or more pieces of evidence on the conclusion, delete them from the evidence set (i.e., treat them as non-evidence nodes in the Bayesian network model) and compute the change in the conclusion.

For a particular non-evidence node X in the Bayesian network model, the inference conclusion under evidence set E is the posterior distribution P(X|E). To examine the effect of a piece of evidence L in E on P(X|E), delete L from E to obtain the set E−L and the corresponding posterior distribution P(X|E−L). The difference between the two is measured by the 1-norm of the vector difference P(X|E) − P(X|E−L), written M(P(X|E), P(X|E−L)):

M(P(X|E), P(X|E−L)) = Σᵢ |pᵢ − qᵢ|    (1)

where P(X = xᵢ|E) = pᵢ, P(X = xᵢ|E−L) = qᵢ, i = 1, 2, ..., m, and m is the number of states of X.

Let

ν(L,X) = M(P(X|E), P(X|E−L)) / M(P(X|E), P(X))

where ν(L,X) is called the necessity factor of L for X. It represents the proportion of L's necessity within the evidence set E. Let the threshold for ν(L,X) be τ = 1/(|E|+1), where |E| is the cardinality of the set E. If ν(L,X) > τ, then L has significant necessity for the inference conclusion P(X|E).

Figure 1 shows a simplified Bayesian network about a student's academic performance, called the score network. It consists of a directed acyclic graph and six marginal or conditional probability matrices. The variables are IQ (X1), effort (X2), test-taking ability (X3), knowledge mastery (X4), exam score (X5) and homework score (X6). The independence and conditional independence relations implied by the network structure are: P(X1,X2) = P(X1)P(X2); P(X3|X1,X2) = P(X3|X1); P(X4|X1,X2,X3) = P(X4|X1,X2); P(X5|X1,X2,X3,X4) = P(X5|X3,X4); P(X6|X1,X2,X3,X4,X5) = P(X6|X4). The corresponding knowledge representation is

P(X1,X2,X3,X4,X5,X6) = P(X1)P(X2)P(X3|X1)P(X4|X1,X2)P(X5|X3,X4)P(X6|X4)

Fig. 1. Bayesian network for student scores.

Let the evidence set be E = {X1 = "high", X2 = "diligent"}, and suppose the posterior distribution P(X5|E) is to be explained. To examine the effect of the evidence "X2 = diligent" in E on P(X5|E), take L = "X2 = diligent", so E−L = {X1 = "high"}. The computed results (Table 1) show that both high IQ and diligence are important for achieving an excellent score.

Table 1. The necessity factor of P(X5|E) for L

Probability distribution of X5:   A        B        C        D        E
P(X5|E)                           0.5529   0.2888   0.0934   0.0458   0.0191
P(X5)                             0.2398   0.2226   0.2472   0.2147   0.0756
P(X5|E−L)                         0.3740   0.2475   0.1756   0.1435   0.0594
1-norms: M(P(X5|E), P(X5)) = 0.5896; M(P(X5|E), P(X5|E−L)) = 0.3014.
Effect factor: ν(L, X5) = 51.12%.

The value of ν(L,X5) in Table 1 indicates how much would be lost in the inference conclusion if the state of X2 were unknown. If ν(L,X) is large, then the evidence L has large necessity within the evidence set E for forming the inference conclusion P(X|E), and naturally also has a large effect on P(X|E). However, if ν(L,X) is small, one cannot thereby conclude that the effect of L on P(X|E) is small, because L's effect on the inference conclusion may overlap with that of other evidence in the evidence set.

Suppose one more piece of evidence is added to the original evidence set: homework score (X6) = "excellent", so that E = {X1 = "high", X2 = "diligent", X6 = "excellent"}, and again the posterior distribution P(X5|E) is to be explained. To examine the effect of the evidence "X2 = diligent" in E on P(X5|E), take L = X2 = "diligent", so E−L = {X1 = "high", X6 = "excellent"}. The computed results are shown in Table 2. This time the necessity factor ν(L,X5) of L for X5 is small, which says that leaving the state of X2 unknown (not knowing whether the student is diligent) has little effect on the inference conclusion. Because the effect of the evidence "X6 = excellent" on the conclusion overlaps with that of the evidence "X2 = diligent", a small ν(L,X5) does not mean that the effect of "X2 = diligent" alone on the conclusion is unimportant.

To remove the interference of this overlap with the explanation, we now look for the degree of sufficiency of L for the inference conclusion P(X|E). If all evidence nodes other than L are treated as non-evidence nodes, the posterior distribution of X changes from P(X|E) to P(X|L), and the corresponding necessity factor is ν(E−L, X), which measures the degree of necessity of all evidence other than L, i.e. E−L, for forming the inference conclusion. Let

σ(L,X) = 1 − ν(E−L, X)

where σ(L,X) is called the sufficiency factor of L for X. As Table 2 shows, the sufficiency factor σ(L,X5) of L for X5 is fairly large, reflecting the importance of diligence in learning.

Table 2. The effect factors of P(X5|E) for L

Probability distribution of X5:   A        B        C        D        E
P(X5|E)                           0.5760   0.2941   0.0829   0.0332   0.0139
P(X5)                             0.2398   0.2226   0.2472   0.2147   0.0756
P(X5|E−L)                         0.5364   0.2849   0.1010   0.0548   0.0228
P(X5|L)                           0.3663   0.2686   0.1843   0.1347   0.0460
1-norms: M(P(X5|E), P(X5)) = 0.8152; M(P(X5|E), P(X5|E−L)) = 0.0974; M(P(X5|E), P(X5|L)) = 0.4702.
Effect factors: σ(L, X5) = 42.32%; ν(L, X5) = 11.95%; ν(E−L, X5) = 57.68%.

Using the three quantities τ, ν(L,X) and σ(L,X), evidence can be classified according to the rules shown in Table 3 into four types: key evidence, necessity evidence, important evidence and minor evidence. For example, with the threshold τ = 1/4, Tables 2 and 3 give the following explanation: in the evidence set {IQ (X1) = "high", effort (X2) = "diligent", homework score (X6) = "excellent"}, the evidence "effort (X2) = diligent" is important evidence, and the effects of "effort (X2) = diligent" and "homework score (X6) = excellent" on the inference conclusion overlap, for if there were no evidence "homework score (X6) = excellent", then "effort (X2) = diligent" would have significant necessity. Hence, for the evidence set {IQ (X1) = "high", effort (X2) = "diligent"}, "effort (X2) = diligent" is key evidence.

Table 3. Explanation rules for the degree of effect on the inference conclusion

ν(L,X) | σ(L,X) | Effect of L on X, i.e. on the inference conclusion P(X|E) | Type
> τ | > τ | L makes an important contribution to the inference conclusion, and this contribution cannot be substituted by other evidence in E. | Key evidence
> τ | < τ | Though L itself does not contribute much to the inference conclusion, it is important in E, because without L the contribution of the other evidence in E would decrease remarkably. | Necessity evidence
< τ | > τ | L makes an important contribution to the inference conclusion, while part of the contribution overlaps with that of other evidence in E, so L can be substituted by other evidence in E. | Important evidence
< τ | < τ | L itself does not contribute much to the inference conclusion and can be substituted by other evidence in E; it is unimportant evidence in E. | Minor evidence

3. Effect Paths of Evidence on a Conclusion

3.1. The Bayesian Network Generated by P(X|E)

By the relationship between the structural features of a Bayesian network and conditional independence, the inference conclusion P(X|E) generally depends on only part of the nodes in the network. To find the paths along which evidence influences the conclusion, we need only examine the nodes possibly involved in computing P(X|E). For Bayesian networks of larger scale, deleting the nodes irrelevant to the computation of P(X|E) reduces the complexity of the problem. It is not hard to prove that a node N is irrelevant to the computation of P(X|E) if it satisfies any of the following three conditions: (1) N is neither X nor a predecessor of X, and is neither an evidence node nor a predecessor of an evidence node; (2) N and X are d-separated [7,8] by E or by elements of E; (3) every path connecting N and X contains a node irrelevant to the computation of P(X|E).

Deleting from the network structure all nodes irrelevant to the computation of P(X|E) yields a new Bayesian network model, called here the Bayesian network generated by P(X|E). Its network structure is the simplified structure, and its conditional probability distributions are the corresponding distributions in the original Bayesian network model.

3.2. Generating Effect Paths

We now need to find, in the structure of the Bayesian network generated by P(X|E), all directed paths (directed chains) connecting L and X. A directed path connecting L and X is a sequence of distinct nodes {X1, X2, ..., XK} satisfying the following four conditions: (1) the start and end of the sequence correspond to L and X, i.e. X1 = L and XK = X; (2) for Xi and Xi+1, there is a directed edge from Xi to Xi+1 or from Xi+1 to Xi; (3) every node in {X1, X2, ..., XK} is relevant to the computation of P(X|E); (4) the path contains no evidence node other than L. A depth-first exhaustive search algorithm was designed to find the directed paths connecting L and X.

For a singly connected network structure, there is a unique connecting path between any two nodes, and an explanation of that path can be generated directly by analysing the changes in the probability distributions of the nodes along it. For a multiply connected network structure, there may be several directed paths connecting L and X; their degrees of effect on P(X|E) are not necessarily the same, and the effect paths may overlap. The connecting paths are therefore split appropriately into sub-chains, the degree of effect of each sub-chain on the inference conclusion P(X|E) is analysed quantitatively, and the effect sub-chains are generated. The concrete splitting algorithm is simple and is not detailed here. To pick out, from these candidate sub-chains, the sub-chains through which L actually affects X, further quantitative analysis of the sub-chains is needed.

3.3. Explaining Effect Paths

If a sub-chain lies on every directed path connecting L and X, it is evidently a necessary passage for L to affect X; such a sub-chain is called a key sub-chain. For all non-key sub-chains, the deletion method can be used to analyse their degree of effect on the inference conclusion: to examine the effect of a non-key sub-chain on the inference conclusion P(X|E), delete the part between its two endpoints from the structure of the Bayesian network generated by P(X|E), and then measure and analyse the change in P(X|E) to generate the explanation of that sub-chain.
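A minimal sketch of the 1-norm measure of Equation (1) and the necessity factor defined in Section 2; the posterior vectors here are illustrative stand-ins for the output of a Bayesian-network inference engine:

```python
import numpy as np

# 1-norm distance between two distributions over the states of X (Eq. 1),
# and the necessity factor nu(L, X) built from it.
def one_norm(p, q):
    return float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def necessity_factor(post_E, post_E_minus_L, prior):
    # nu(L, X) = M(P(X|E), P(X|E-L)) / M(P(X|E), P(X))
    return one_norm(post_E, post_E_minus_L) / one_norm(post_E, prior)

post_E   = [0.60, 0.25, 0.10, 0.05]   # P(X | E), illustrative
post_EmL = [0.40, 0.30, 0.20, 0.10]   # P(X | E - L), illustrative
prior    = [0.25, 0.25, 0.25, 0.25]   # P(X), illustrative
print(necessity_factor(post_E, post_EmL, prior))   # nu(L, X), here ~0.57
# The sufficiency factor follows as sigma(L, X) = 1 - nu(E - L, X).
```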
Using the deletion method to analyse a sub-chain's effect on the inference conclusion requires handling two problems: (1) deleting the part between the two endpoints of a sub-chain changes the network structure, so the number of parents of an endpoint of the sub-chain may decrease and its conditional probability distribution changes; how should the new conditional distribution be computed? (2) deleting a sub-chain from the network structure amounts to adding conditional independence assumptions and changes the structure of the knowledge base, which may change the prior probability P(X) of a network node X; how should this change in the prior be handled?

Suppose node A in the Bayesian network has parents B0, B1, ..., BK, and parent B0 of A is deleted. By the normalization property of probability,

P(A|B1, ..., BK) = Σ_{B0} P(A, B0|B1, B2, ..., BK) = Σ_{B0} P(B0|B1, B2, ..., BK) P(A|B0, B1, B2, ..., BK)    (3)

The conditional probability distribution of node A after deleting B0 can be computed by this formula, where P(A|B0, B1, B2, ..., BK) is the conditional distribution of node A before B0 is deleted, and P(B0|B1, B2, ..., BK) can be computed from the joint probability distribution determined by the Bayesian network:

P(B0|B1, B2, ..., BK) = P(B0, B1, B2, ..., BK) / P(B1, B2, ..., BK)

When the part between the two endpoints of an effect sub-chain C is deleted from the Bayesian network model generated by P(X|E), and the number of parents of an endpoint changes, the new conditional probability distributions can be computed by Equation (3), forming a new Bayesian network model. From this model, the new prior distribution P*(X) of node X and the new posterior distribution P*(X|E) serving as the inference conclusion can be computed, and further the sufficiency factor σ*(L,X) of L for X based on this model.

The difference between P(X) and P*(X) is measured by the 1-norm M(P(X), P*(X)) of their difference. Let d(L,X) = σ(L,X) − σ*(L,X). According to the explanation and presentation rules shown in Table 4, examining the sizes of d(L,X) and M(P(X), P*(X)) separately generates the explanation of the effect sub-chain C, which is then marked with the corresponding colour. A directed path composed of sub-chains of the same colour is an effect path, and the type of the effect path is determined by its colour. The thresholds τ₁ and τ₂ in Table 4 need to be chosen empirically.

Table 4. Explanation rules for the effect path of evidence L on the inference conclusion

d(L,X) | M(P,P*) | Colour | Explanation | Type
> τ₁ | < τ₂ | Bulky black | The effect of the change in the knowledge-base structure is not obvious, so the difference d(L,X) is mainly the probabilistic information propagated by the sub-chain; this sub-chain transfers the main probabilistic information. | Belongs to main path
> τ₁ | > τ₂ | Black | M(P,P*) being obvious, the difference d(L,X) contains part of the effect of the change in the knowledge-base structure; however, it cannot be assured that the sub-chain transfers most of the information. | Belongs to main path
< τ₁ | < τ₂ | Bulky grey | This sub-chain transfers minor probabilistic information. | Not on main path
< τ₁ | > τ₂ | Colourless | The effect of the knowledge-base change is obvious while d(L,X) is minor; this situation is peculiar. |

4. Application Example: Explaining Effect Paths in the ALARM Model

The ALARM model was built by Beinlich et al. [9] to monitor the condition of a patient under anesthesia and the working state of the associated medical equipment. It has 46 edges and 37 nodes, including 8 diagnostic nodes (output nodes of interest), 16 evidence nodes and 13 intermediate nodes. Figure 2 shows the network structure of the model; in this section, N_j denotes the j-th node in the figure. Details such as the medical meaning, the states, and the conditional or prior probability distributions of each node can be obtained from the web pages [10,11].

Let L = "N13 = 0". First, all nodes relevant to the computation of P(X|E) are found, forming the Bayesian network generated by P(X|E) (shown in Figure 3(a)). Then all directed paths connecting L and X are found; there are three: {N13, N36, N24}, {N13, N22, N35, N36, N24} and {N13, N23, N35, N36, N24}. These directed paths form a network structure relating L and X, shown in Figure 3(b). This structure is split into the sub-chains of the directed paths; there are five: C1 = {N36, N24}, C2 = {N13, N36}, C3 = {N35, N36}, C4 = {N13, N22, N35} and C5 = {N13, N23, N35}. The probability distribution of node N23 is almost unchanged, satisfying the condition of Proposition 2, so C5 is not an effect sub-chain. The remaining four are the possible effect sub-chains, and C1 is a key sub-chain. Finally, the values of (d(L,X), M(P(X), P*(X))) are computed for the effect sub-chains C2, C3 and C4; the results are (48.82%, 0.0213), (2.35%, 0.0324) and (2.12%, 0.0532), respectively, and the explanation of the effect paths is generated according to Table 4, as shown in Figure 3(c).

The evidence "N13 = 0" affects node N24 along two effect paths: {N13, N36, N24} and {N13, N22, N35, N36, N24}. Of these, {N13, N36, N24} is the main effect path and {N13, N22, N35, N36, N24} is a secondary effect path. The evidence N13 acts on N24 mainly through N36, while nodes N22 and N35 have little effect on N24. In fact, node N36 represents the ventilation state of the oxygen tube, whose causal relationship with node N24 is obviously very close.

Fig. 3. Finding and explaining the effect path: (a) nodes related to the computation of P(X|E); (b) directed paths connecting L and X; (c) the effect path of L to X.

Fig. 2. Network structure of the ALARM model.

5. Conclusion

Using classical probability theory to handle uncertain information and knowledge faces two main difficulties [4]: (1) the trade-off between the amount of computation and the precision of the probabilistic model is hard to make; (2) probability theory is based on an axiomatic system whose mode of reasoning differs considerably from human thinking, making it difficult to build explanation mechanisms for probability-based intelligent systems. For these reasons, people have generally used generalized probabilistic methods (such as the subjective Bayes method and certainty factors) or other heuristic methods (Dempster-Shafer evidence theory, fuzzy theory, etc.) to handle uncertain information and knowledge in intelligent information processing. The Bayesian network model overcame the first difficulty through the ingenious use of conditional independence [1], which restored confidence in classical probability theory and methods and has led to their revival in intelligent information processing over the past decade or so. The research in this paper shows that intelligent information systems based on Bayesian networks can be given an effective explanation mechanism that explains the degree and the paths of the effect of evidence on reasoning conclusions.

References

[1] J. Pearl. Probabilistic Reasoning in Expert Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988
[2] S. L. Lauritzen, D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 1988, 50(2): 157-224
[3] C. Elsaesser, et al. Explanation of probabilistic inference. In: Proc. Conf. Uncertainty in Artificial Intelligence. Amsterdam, Holland: Elsevier Science Publishers, 1989. 319-328
[4] G. Shafer, J. Pearl, eds. Readings in Uncertain Reasoning. San Mateo, CA: Morgan Kaufmann, 1990
[5] D. Madigan, et al. Graphical explanations in belief networks. Journal of Computational and Graphic Statistics, 1997, 6(2): 160-181
[6] U. Chajewska, J. Y. Halpern. Defining explanation in probabilistic systems. In: Proc. 13th Conf. Uncertainty in Artificial Intelligence. San Francisco: Morgan Kaufmann, 1997. 62-71
[7] D. Geiger, et al. d-Separation: From theorems to algorithms. The 5th Workshop on Uncertainty in Artificial Intelligence, Windsor, Ontario, 1989
[8] D. Geiger, T. Verma, J. Pearl. Identifying independence in Bayesian networks. Networks, 1990, 20(2): 507-534
[9] I. A. Beinlich, et al. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In: Proc. the 2nd European Conf. Artificial Intelligence in Medical Care. Berlin: Springer-Verlag, 1989. 247-256
[10] I. A. Beinlich. A logical alarm reduction mechanism. http://ww netlib alarm.htm, 2002-10
[11] I. A. Beinlich. A medical diagnostic alarm message system. http://ww netlib ALARM.dnet, 2002-10

Wang Ronggui, born in 1966. Received his Ph.D. degree in 2004. His current research interests include intelligent information processing, knowledge engineering, Bayesian networks, and image understanding.

Zhang Yousheng, born in 1941. Professor and Ph.D. supervisor at Hefei University of Technology since 1994. His current research interests include artificial intelligence and its application in image recognition and understanding.

Gao Jun, born in 1963. Professor and Ph.D. supervisor at Hefei University of Technology since 2000. His current research interests include image processing, pattern recognition, neural network theory and applications, optoelectronic information processing, and intelligent information processing.

Peng Qingsong, born in 1975. Received his Ph.D. degree in 2004. His current research interests include artificial intelligence and its application in image recognition and understanding.

Research Background

Bayesian networks can be used to compute uncertain information and knowledge. The joint probability distribution is used to represent knowledge, to enhance consistency and to improve the reasonability of inference conclusions, and the conditional independence contained in graphical models is used to decrease the complexity of the joint probability distribution. However, the inference conclusion of a Bayesian network takes the form of a posterior probability distribution and, more seriously, it cannot be explained directly by translating an inference chain or inference network. So the explanation mechanism of the Bayesian network is a research topic that is worthy of study. We have researched and realized a kind of explanation mechanism and have provided a new solution for the study of explanation mechanisms of Bayesian networks. The research is supported by the National Natural Science Foundation of China (No. 60175011, No. 60375011).
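As a closing illustration for the paper above, here is a small sketch of the parent-removal computation in its Equation (3); the axis conventions, array shapes and numbers are assumptions chosen for illustration, not the authors' implementation:

```python
import numpy as np

# Removing parent B0 from node A's conditional table (Eq. 3):
# P(A | B1..BK) = sum_{b0} P(b0 | B1..BK) * P(A | b0, B1..BK).
# Assumed axis order: cpt_A[a, b0, b1, ..., bK].
def drop_parent(cpt_A: np.ndarray, p_b0_given_rest: np.ndarray) -> np.ndarray:
    # p_b0_given_rest[b0, b1, ..., bK] comes from the joint distribution
    # encoded by the network, as the paper notes.
    return np.einsum('ab...,b...->a...', cpt_A, p_b0_given_rest)

# Tiny example: node A with parents B0 and B1 (all binary).
cpt_A = np.array([[[0.9, 0.6], [0.3, 0.2]],
                  [[0.1, 0.4], [0.7, 0.8]]])   # P(A | B0, B1)
p_b0 = np.array([[0.5, 0.2], [0.5, 0.8]])      # P(B0 | B1)
print(drop_parent(cpt_A, p_b0))                # P(A | B1); columns sum to 1
```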
Bayesian Analysis (2006) 1, Number 3, pp. 403-420

Subjective Bayesian Analysis: Principles and Practice

Michael Goldstein*

Abstract. We address the position of subjectivism within Bayesian statistics. We argue, first, that the subjectivist Bayes approach is the only feasible method for tackling many important practical problems. Second, we describe the essential role of the subjectivist approach in scientific analysis. Third, we consider possible modifications to the Bayesian approach from a subjectivist viewpoint. Finally, we address the issue of pragmatism in implementing the subjectivist approach.

Keywords: coherency, exchangeability, physical model analysis, high reliability testing, objective Bayes, temporal sure preference

1 Introduction

The subjective Bayesian approach is based on a very simple collection of ideas. You are uncertain about many things in the world. You can quantify your uncertainties as probabilities for the quantities you are interested in, and conditional probabilities for observations you might make given the things you are interested in. When data arrives, Bayes theorem tells you how to move from your prior probabilities to new conditional probabilities for the quantities of interest. If you need to make decisions, then you may also specify a utility function, given which your preferred decision is that which maximises expected utility with respect to your conditional probability distribution.

There are many compelling accounts explaining how and why this view should form the basis for statistical methodology; see, for example, Lindley (2000) and the accompanying discussion. Careful treatments of the Bayesian approach are given in, for example, Bernardo and Smith (1994), O'Hagan and Forster (2004) and Robert (2001). In particular, Lad (1996) provides an excellent introduction to the subjectivist viewpoint, with a wide-ranging collection of references to the development of this position.

Moving from principles to practice can prove very challenging and so there are many flavours of Bayesianism reflecting the technical challenges and requirements of different fields. In particular, a form of Bayesian statistics, termed "objective Bayes", aims to gain the formal advantages arising from the structural clarity of the Bayesian approach without paying the "price" of introducing subjectivity into statistical analysis. Such attempts raise important questions as to the role of subjectivism in Bayesian statistics.
This account is my subjective take on the issue of subjectivism. My treatment is split into four parts. First, the subjectivist Bayes approach is the only feasible method for tackling many important practical problems, and in Section 2 I'll give examples to illustrate this. Next, in Section 3, I'll look at scientific analyses, where the role of subjectivity is more controversial, and argue the necessity of the subjective formulation in this context. In Section 4, I'll consider how well the Bayes approach stands up to scrutiny from the subjective viewpoint itself. In Section 5, I'll discuss the issue of pragmatism in implementing the subjectivist approach. In conclusion, I'll comment on general implications for developing the full potential of the subjectivist approach to Bayesian analysis.

*Department of Mathematical Sciences, University of Durham, UK, :8000/stats/people/mg/mg.html
© 2006 International Society for Bayesian Analysis

2 Applied subjectivism

Among the most important growth areas for Bayesian methodology are those applications that are so complicated that there is no obvious way even to formulate a more traditional analysis. Such applications are widespread; for many examples, consult the series of Bayesian case studies volumes from the Carnegie Mellon conference series. Here are just a couple of areas that I have been personally involved in, with colleagues at Durham, chosen so that I can discuss, from the inside, the central role played by subjectivity.

2.1 High reliability testing for complex systems

Suppose that we want to test some very complicated system - a large software system would be a good example of this. Software testing is a crucial component of the software creation cycle, employing large numbers of people and consuming much of the software budget. However, while there is a great deal of practical expertise in the software testing community, there is little rigorous guidance for the basic questions of software testing, namely how much testing a system needs, and how to design an efficient test suite for this purpose. Though the number of tests that we could, in principle, carry out is enormous, each test has non-trivial costs, both in time and money, and we must plan testing (and retesting given each fault we uncover) to a tight time/money budget. How can we design and analyse an optimal test suite for the system?

This is an obvious example of a Bayesian application waiting to happen. There is enormous uncertainty and we are forced to extrapolate beliefs about the results of all the tests that we have not carried out from the outcomes of the relatively small number of tests that we do carry out. There is a considerable amount of prior knowledge carried by the testers who are familiar with the ways in which this software is likely to fail, both from general considerations and from testing and field reports for earlier generations of the software. The expertise of the testers therefore lies in the informed nature of the prior beliefs that they hold. However, this expertise does not extend to an ability to analyse, without any formal support tools, the conditional effect of test observations on their prior beliefs, still less to an ability to design a test system to extract optimum information from this extremely complex and interconnected probabilistic system.

A Bayesian approach proceeds as follows. First, we construct a Bayesian belief net.
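Before the structure of such a net is described below, here is a toy sketch of the sort of inference it supports: a single hypothetical fault type with exchangeable tests, not the structure of the industrial system discussed in this section.

```python
# Hypothetical two-node fragment of a testing belief net: a single fault
# type F (present with prior probability p) and n exchangeable tests that
# each miss the fault with probability m when present, and never fail
# spuriously. What matters for test-suite design is P(F | all tests pass).
def p_fault_given_all_pass(p: float, m: float, n: int) -> float:
    # Bayes' theorem with likelihood m**n for "all n tests pass" given F.
    return p * m**n / (p * m**n + (1 - p))

print(p_fault_given_all_pass(p=0.2, m=0.5, n=8))  # residual fault risk ~0.001
```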
In principle, this methodology, by combining Bayesian belief networks with optimal experimental design, is massively more efficient and flexible than current approaches. Is the approach practical? From our experiences working with an industrial partner, I would say definitely yes. A general overview of the approach that we developed is given in Wooff et al. (2002). As an indication of the potential increase in efficiency, we found, in one case study, that Bayesian automatic design provided eight tests which together were more efficient than 233 tests designed by the original testing team, and identified additional tests that were appropriate for checking areas of functionality that had not been covered by the original test suite. This is not a criticism of the testers, who were very experienced, but simply illustrates that optimal multi-factor probabilistic design is very difficult. The value of the subjectivist approach lies in translating the complicated but informal generalised uncertainty judgements of the experts into a language which allows for precise and rigorous analysis. In system testing, the careful use of this language offers enormous potential for clarity and efficiency gains.

Of course, there are many issues that must be sorted out before such benefits can be realised, from the construction of user-friendly interfaces for building the models to (a much larger obstacle!) the culture change required to recognise and routinely exploit such methods. However, the subjective Bayes approach does provide a complete framework for quantifying and managing the uncertainties of high-reliability testing. It is hard to imagine any other approach which could do so.

2.2 Complex physical systems

Many large physical systems are studied through a combination of mathematical modelling, computer simulation and matching against past physical data, which can, hopefully, be used to extrapolate future system behaviour; for example, this accounts for much of what we claim to know about the nature and pace of global climate change.
Such analysis is riddled with uncertainty. In climate modelling, each computer simulation can take between days and months, and requires many input parameters to be set, whose values are unknown. Therefore, we may view computer simulations with varied choices of input parameters as a small sample of evaluations from a very high dimensional unknown function. The only way to learn about the input parameters is by matching simulator output to historical data, which is, itself, observed with error. Finally, and often most important, the computer simulator is just a model, and we need to consider the ways in which the model and reality may differ.

Again, the subjectivist Bayesian approach offers a framework for specifying and synthesising all of the uncertainties in the problem. There is a wide literature on the probabilistic treatment of computer models; a good starting point with a wide collection of references is the recent volume Santner et al. (2003). Our experience at Durham started with work on oil reservoir simulators, which are constructed to help with all the problems involved in efficient management of reservoirs. Typically, these are very high dimensional computer models which are very slow to evaluate. The approach that we employed for reservoir uncertainty analysis was based on representing the reservoir simulator by an emulator. This is a probabilistic description of our beliefs about the value of the simulator at each input value. This is combined with statements of uncertainty about the input values, about the discrepancy between the model and the reservoir and about the measurement uncertainty associated with the historical data. This completely specified stochastic system provides a formal framework allowing us to synthesise expert elicitation, historical data and a careful choice of simulator runs. While there are many challenging technical issues arising from the size and complexity of the system, this specification does allow us to identify "correct" settings for simulator inputs (often termed history matching in the oil industry), see Craig et al. (1996), and to assess uncertainty for forecasts of future behaviour of the physical system, see Craig et al. (2001). Our approach relies on a Bayes linear foundation (which I'll discuss in Section 4) to handle the technical difficulties involved with the high dimensional analysis; for a full Bayes approach for related problems, see Kennedy and O'Hagan (2001).
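As a toy illustration of these ideas, the sketch below fits a simple Gaussian-process-style emulator to a handful of runs of a one-dimensional stand-in "simulator", then history-matches by ruling out input values whose emulated output lies too far from an observed history, allowing for emulator uncertainty, model discrepancy and measurement error. The simulator, the kernel and every variance are invented, and a real analysis is high dimensional and proceeds over several waves of runs.

    import numpy as np

    def simulator(x):                  # stand-in; real evaluations are very costly
        return np.sin(3.0 * x) + 0.5 * x

    X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # a small designed set of runs
    y = simulator(X)

    def k(a, b, sig2=1.0, ell=0.3):    # squared-exponential covariance
        return sig2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

    K_inv = np.linalg.inv(k(X, X) + 1e-8 * np.eye(len(X)))   # jitter for stability

    def emulate(xs):                   # emulator mean and variance at new inputs
        Ks = k(xs, X)
        mean = Ks @ K_inv @ y
        var = 1.0 - np.einsum('ij,jk,ik->i', Ks, K_inv, Ks)
        return mean, np.maximum(var, 0.0)

    z, var_e, var_d = 0.9, 0.01, 0.04  # invented history, measurement-error and
                                       # model-discrepancy variances
    xs = np.linspace(0.0, 1.0, 201)
    mean, var = emulate(xs)
    implausibility = np.abs(mean - z) / np.sqrt(var + var_e + var_d)
    print(xs[implausibility < 3.0])    # inputs not ruled out at three sigma

The structure is the important point: beliefs about the simulator (the emulator), about the discrepancy between simulator and system, and about measurement error are combined into a single uncertainty statement about which inputs remain plausible.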
Our approach has been implemented in software employed by users in the oil industry, through our collaborators ESL (Energy SciTech Limited). This means that we get to keep track, just a little, of how the approach works in practice. Here's an example of the type of success which ESL has reported to us. They were asked to match an oil field containing 650 wells, based on one million plus grid cells (for each of which permeability, porosity, fault lines, etc. are unknown inputs). Finding the previous best history match had taken one man-year of effort. Our Bayesian approach, starting from scratch, found a match using 32 runs (each lasting 4 hours and automatically chosen by the software), with a fourfold improvement according to the oil company measure of match quality. This kind of performance is impressive, although, of course, these remain very hard problems and much must still be done to make the approach more flexible, tractable and reliable.

Applications such as these make it clear that careful representation of subjective beliefs can give much improved performance in tasks that people are already trying to do. There is an enormous territory where subjective Bayes methods are the only feasible way forward. This is not to discount the large amount of work that must often be done to bring an application into Bayes form, but simply to observe that for such applications there are no real alternatives. In such cases, the benefits from the Bayesian formulation are potentially very great and clearly demonstrable. The only remaining issue, therefore, is whether such benefits outweigh the efforts required to achieve them. This "pain to gain" ratio is crucial to the success of subjective Bayes applications. When the answer really matters, such as for global climate change, the pain threshold would have to be very high indeed to dissuade us from the analysis.

By explicitly introducing our uncertainty about the ways in which our models fall short of reality, the subjective Bayes analysis also does something new and important. Only technical experts are concerned with how climate models behave, while everybody has an interest in how global climate will actually change. For example, the Guardian newspaper leader on "Burying Carbon" (Feb 3, 2005) tells us that "the chances of the Gulf Stream - the Atlantic thermohaline circulation that keeps Britain warm - shutting down are now thought to be greater than 50%." This sounds like something we should know. However, I am reasonably confident that no climate scientist has actually carried out an uncertainty analysis which would be sufficient to provide a logical bedrock for such a statement. We can only use the analysis of a global climate model to guide rational policy towards climate change if we can construct a statement of uncertainty about the relation between analysis from the climate model and the behaviour of the real climate. To further complicate the assessment, there are many models for climate change in current use, all of whose analyses should be synthesised as the basis of any judgements about actual climate change. Specifying beliefs about the discrepancy between models and reality is unfamiliar and difficult. However, we cannot avoid this task if we want our statements to carry weight in the real world. A general framework for making such specifications is described in Goldstein and Rougier (2005).

3 Scientific subjectivism

3.1 The role of subjectivism in scientific enquiry

In the kind of applications we've discussed so far, the only serious issues about the role of subjectivity are pragmatic ones. Each aspect of the specification, whether part of the "likelihood function" or the "prior distribution," encodes a collection of subjective judgements. The value of the Bayesian approach lies first in providing a language within which we can express all these judgements and second in providing a calculus for analysing these judgements.

Controversy over the role of subjectivity tends to occur in those areas of scientific experimentation where we do appear to have a greater choice of statistical approaches.
Laying aside the obvious consideration that any choice of analysis is the result of a host of subjective choices, there are, essentially, two types of objections to the explicit use of subjective judgements: those of principle, namely that subjective judgements have no place in scientific analyses; and those of practice, namely that the pain to gain ratio is just too high.

These are deep issues which have received much attention; a good starting place for discussion of the role of Bayesian analysis in traditional science is Howson and Urbach (1989). Much of the argument can be captured in simple examples. Here's one such, versions of which are often used to introduce the Bayesian idea to people who already have some familiarity with traditional statistical analysis.

First, we can imagine carrying out Fisher's famous tea-tasting experiment. Here an individual, Joan say, claims to be able to tell whether the milk or the tea has been added first in a cup of tea. We perform the experiment of preparing ten cups of tea, choosing each time on a coin flip whether to add the milk or tea first. Joan then tastes each cup and gives an opinion as to which ingredient was added first. We count the number, X, of correct assessments. Suppose, for example, that X = 9.

Now compare the tea-tasting experiment to an experiment where an individual, Harry say, claims to have ESP as demonstrated by being able to forecast the outcome of fair coin flips. We test Harry by getting forecasts for ten flips. Let X be the number of correct forecasts. Suppose that, again, X = 9.

Within the traditional view of statistics, we might accept the same formalism for the two experiments, namely that, for each experiment, each assessment is independent with probability p of success. In each case, X has a binomial distribution with parameters 10 and p, where p = 1/2 corresponds to pure guessing. Within the traditional approach, the likelihood is the same, the point null is the same if we carry out a test for whether p = 1/2, and confidence intervals for p will be the same. However, even without carrying out formal calculations, I would be fairly convinced of Joan's tea-tasting powers while remaining unconvinced that Harry has ESP. You might decide differently, but that is because you might make different prior judgements.
This is what the Bayesian approach adds. First, we require our prior probability, g say, that Harry or Joan is guessing. Then, if not guessing, we need to specify a prior distribution q over possible values of p. Given g, q, we can use Bayes theorem to update our probability that Harry or Joan is just guessing and, if not guessing, we can update our prior distribution over p.
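Here is a minimal sketch of that calculation for the data X = 9 out of n = 10, using a mixture prior: guessing (p = 1/2) with probability g, and otherwise p drawn from a Beta distribution playing the role of q. The particular values of g and q used for Joan and Harry are invented for illustration.

    from scipy.stats import binom, beta
    from scipy.integrate import quad

    n, x = 10, 9

    def posterior_prob_guessing(g, a, b):
        # Prior: guessing (p = 1/2) with probability g; otherwise p ~ Beta(a, b).
        like_guess = binom.pmf(x, n, 0.5)
        like_skill = quad(lambda p: binom.pmf(x, n, p) * beta.pdf(p, a, b), 0, 1)[0]
        return g * like_guess / (g * like_guess + (1 - g) * like_skill)

    # Joan: tea-tasting skill is quite plausible, and skilled tasters do well.
    print(posterior_prob_guessing(g=0.5, a=8, b=2))      # about 0.04
    # Harry: ESP is very implausible a priori.
    print(posterior_prob_guessing(g=0.999, a=8, b=2))    # about 0.98

The likelihood and the data are identical in the two calls; the sharply different posterior probabilities of guessing come entirely from the prior judgements g and q.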
We may further clarify the Bayesian account by giving a more careful description of our uncertainty within each experiment based on our judgements of exchangeability for the individual outcomes. This allows us to replace our judgements about the abstract model parameter p with judgements about observable experimental outcomes as the basis for the analysis.

Therefore, the Bayes approach shows us exactly how and where to input our prior judgements. We have moved away from a traditional view of a statistical analysis, which attempts to express what we may learn about some aspect of reality by analysing an individual data set. Instead, the Bayesian analysis expresses our current state of belief based on combining information from the data in question with whatever other knowledge we consider relevant.

The ESP experiment is particularly revealing for this discussion. I used to use it routinely for teaching purposes, considering that it was sufficiently unlikely that Harry would actually possess ESP that the comparison with the tea-tasting experiment would be self-evident. I eventually came to realise that some of my students considered it perfectly reasonable that Harry might possess such powers. While writing this article, I tried googling "belief in ESP" over the net, which makes for some intriguing reading. Here's a particularly relevant discussion from an article in the September 2002 issue of Scientific American, by Michael Shermer, titled "Smart People Believe Weird Things". After noting that, for example, around 60% of college graduates appear to believe in ESP, Shermer reports the results of a study that found "no correlation between science knowledge (facts about the world) and paranormal beliefs." The authors, W. Richard Walker, Steven J. Hoekstra and Rodney J. Vogl, concluded: "Students that scored well on these [science knowledge] tests were no more or less sceptical of pseudo-scientific claims than students that scored very poorly. Apparently, the students were not able to apply their scientific knowledge to evaluate these pseudo-scientific claims. We suggest that this inability stems in part from the way that science is traditionally presented to students: Students are taught what to think but not how to think." Shermer continues as follows: "To attenuate these paranormal belief statistics, we need to teach that science is not a database of unconnected factoids but a set of methods designed to describe and interpret phenomena, past or present, aimed at building a testable body of knowledge open to rejection or confirmation."

The subjective Bayesian approach may be viewed as a formal method for connecting experimental factoids. Rather than treating each data set as though it has no wider context, and carrying out each statistical analysis just as though this were the first investigation that had ever been carried out of any relevance to the questions at issue, we consider instead how the data in question adds to, or changes, our beliefs about these questions.

If we think about the ESP experiment in this way, then we should expand the problem description to reflect this requirement. Here is a minimum that I should consider. First, I would need to assess my probability for E, the event that ESP is a real phenomenon that at least some people possess. This is the event that joins my analysis of Harry's performance with my generalised knowledge of the scientific phenomenon at issue. Conditional on E, I should evaluate my probability for J, the event that Harry possesses ESP. Conditional on J and on J complement, I should evaluate my probabilities for G, the event that Harry is just guessing, and C, the event that either the experiment is flawed or Harry is, somehow, cheating; for example, the coin might be heads-biased and Harry mostly calls heads. This is the event that captures my generalised knowledge of the reliability of experimental procedures in this area. If there is either cheating or ESP, I need a probability distribution over the magnitude of the effect.
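A minimal sketch of this expanded description might run as follows, with every probability invented purely for illustration, and with the distribution over effect magnitudes collapsed to a single invented value per explanation to keep the sketch short.

    from scipy.stats import binom

    n, x = 10, 9

    p_E = 0.01           # ESP is a real phenomenon (invented)
    p_J_given_E = 0.05   # Harry has ESP, given that ESP exists (invented)
    p_C = 0.05           # flawed experiment or cheating (invented)

    # Mutually exclusive explanations: prior weight and per-flip success rate.
    hypotheses = {
        "esp":      (p_E * p_J_given_E,                   0.8),
        "cheating": ((1 - p_E * p_J_given_E) * p_C,       0.9),
        "guessing": ((1 - p_E * p_J_given_E) * (1 - p_C), 0.5),
    }

    marginal = sum(w * binom.pmf(x, n, p) for w, p in hypotheses.values())
    posterior = {h: w * binom.pmf(x, n, p) / marginal
                 for h, (w, p) in hypotheses.items()}
    print(posterior)   # cheating ~ 0.67, guessing ~ 0.32, esp ~ 0.005

Under these judgements, nine correct calls out of ten mainly shift belief towards a flawed experiment rather than towards ESP, which is exactly the kind of conclusion the formalism is designed to expose.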
What do we achieve by this formalism? First, this gives me a way of assessing my actual posterior probability for whether Harry has ESP. Second, if I can lay out the considerations that I use in a transparent way, it is easy for you to see how your conclusions might differ from mine. If we disagree as to whether Harry has ESP, then we can trace this disagreement back to differing probabilities for the general phenomenon, in this case ESP, or different judgements about particulars of the experiment, such as Harry's possible ability at sleight of hand. More generally, by considering the range of prior judgements that might reasonably be made, I can distinguish between the extent to which the experiment might convince me as to Harry's ESP, and the effect it might have on others. I could even determine how large and how stringently controlled an experiment would need to be in order to have a chance of convincing me of Harry's powers. More generally, how large would the experiment need to be to convince the wider community?

The above example provides a simple version of a general template for any scientific Bayesian analysis. There are scientific questions at issue. Beliefs about these issues require prior specification. Then we must consider the relevance of the scientific formulation to the current experiment along with all the possible flaws in the experiment which would invalidate the analysis. Finally, a likelihood must be specified, expressing data variability given the hypotheses of interest.

There are two versions of the subsequent analysis. First, you may only want to know how to revise your own beliefs given the data. Such private analyses are quite common. Many scientists carry out at least a rough Bayes assessment of their results, even if they never make such analysis public.

Second, you may wish to publish your results, to contribute to, or even to settle, a scientific issue. It may be that you can construct a prior specification that is very widely supported. Alternatively, it may be that, as with the ESP experiment, no such generally agreed prior specification may be made. Indeed, the disagreement between experts may be precisely what the experiment is attempting to resolve. Therefore, our Bayesian analysis of an experiment should begin with a probabilistic description whose qualitative form can be agreed on by everyone. This means that all features, in the prior and the likelihood, that cause substantial disagreement should have explicit form in the representation, so that differing judgements can be expressed over them. There is a rich literature on elicitation, dealing with how generalised expert knowledge may be converted into probabilistic form; for a recent overview, see Garthwaite et al. (2005). As with each other aspect of the scientific argument, such elicitation has two aims: first, to obtain sensible prior values and, second, to make clear the scientific basis for assigning these values. Statistical aspects of the representation may employ standard data sharing methodologies such as meta-analysis, multi-level modelling and empirical Bayes, provided all the relevant judgements are well sourced. We can then produce the range of posterior judgements, given the data, which correspond to the range of "reasonable" prior judgements held within the scientific community. We may argue that a scientific case is "proven" if the evidence should be convincing given any reasonable assignment of prior beliefs. Otherwise, we can assess the extent to which the community might still differ given the evidence. We should make this analysis at the planning stage in order to design experiments that can be decisive for the scientific community or to conclude that no such experiments are feasible.
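Continuing the invented ESP numbers from the sketch above, this range-of-priors analysis can itself be sketched: sweep each contested prior judgement over the interval that reasonable members of the community might hold, and report the resulting band of posterior probabilities. The grids below are invented; the point is the shape of the analysis, not the values.

    from itertools import product
    from scipy.stats import binom

    n, x = 10, 9

    def posterior_esp(p_esp_harry, p_cheat, p_succ_esp=0.8, p_succ_cheat=0.9):
        # Posterior probability that Harry has ESP under one set of judgements.
        hyps = [(p_esp_harry, p_succ_esp),
                ((1 - p_esp_harry) * p_cheat, p_succ_cheat),
                ((1 - p_esp_harry) * (1 - p_cheat), 0.5)]
        lik = [w * binom.pmf(x, n, p) for w, p in hyps]
        return lik[0] / sum(lik)

    esp_grid = [1e-6, 1e-4, 1e-2]     # "reasonable" priors that Harry has ESP
    cheat_grid = [0.01, 0.05, 0.2]    # "reasonable" priors for a flawed experiment

    posts = [posterior_esp(a, c) for a, c in product(esp_grid, cheat_grid)]
    print(min(posts), max(posts))
    # If even the most ESP-friendly reasonable prior leaves the posterior small,
    # this experiment cannot settle the question for the community.

Such a sweep, carried out at the planning stage with the community's actual range of judgements, is what tells us whether a proposed experiment can be decisive.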
All of this is clear in principle, though implementation of the program may be difficult in individual cases. Each uncertainty statement is a well sourced statement of belief by an individual. If individual judgements differ and if this is relevant, then such differences are reflected in the analysis. In practice, it is unusual to find such a subjectivist approach within scientific analysis. Let us therefore consider objections and alternatives to the subjective Bayesian approach.

3.2 Objections and alternatives to scientific subjectivism

The principled objection to Bayesian subjectivism is that the subjective Bayesian approach answers problems wrongly, because of unnecessary and unhelpful appeals to arbitrary prior assumptions, which should have no place in scientific analyses. Individual subjective reasoning is inappropriate for reaching objective scientific conclusions, which form the basis of consensus within the scientific community.

This objection would have more force if there were a logically acceptable alternative. I do not here want to dwell on the difficulties in interpretation of the core concepts of more traditional inference, such as significance and coverage properties: a valid confidence interval may be empty, for example when constructed by the intersection of a series of repeated confidence intervals; a statistically significant result obtained with high power may be almost certainly false; and so forth. Further, I do not know of any way to construct even the basic building blocks of the inference, such as the relative frequency probabilities that we must use if we reject the subjective interpretation, that will stand up to proper logical scrutiny. Instead, let us address the principled objection directly.

We cannot consider whether the Bayes approach is appropriate without first clarifying the objectives of the analysis. When we discussed the analysis of physical models, we made the fundamental distinction between analysis of the model and analysis of the physical system. Analysing various models may give us insights but at some point these insights must be integrated into statements of uncertainty about the system itself. Analysing experimental data is essentially the same. We must be clear as to whether we are analysing the experiment or the problem.

In the ESP experiment, the question is whether Harry has ESP, or, possibly, whether ESP exists at all. If we analyse the experimental data as part of a wider effort to address our uncertainty about these questions, then external judgements are clearly relevant. As described above, the beliefs that are analysed may be those of an individual, if that individual can make a compelling argument for the rationality of a particular belief specification, or instead we may analyse the collection of beliefs held by informed individuals in the community. The Bayes analysis is appropriate for this task, as it is concerned to evaluate the relevant kinds of uncertainty judgements, namely the uncertainties over the quantities that we want to learn about, given the quantities that we observe, based on careful foundational arguments using ideas such as coherence and exchangeability to show why this is the unavoidable way to analyse our actual uncertainties.

On the other hand, suppose that, for now, we only want to analyse the data from this individual experiment. Our goal, therefore, cannot be to consider directly the basic question about the existence of ESP. Indeed, it is hard to say exactly what our goal is, which is why there often is so much confusion in discussions between proponents of different approaches. All that we can say informally is that the purpose of such analysis is to provide information which will be helpful at some future time for whoever does attempt to address the real questions of interest. We are now in the same position as the modeller; we have great freedom in carrying out our analyses but we must be modest in the claims that we make for them.