Exact Path Integral Quantization of 2-D Dilaton Gravity
- 格式:pdf
- 大小:81.63 KB
- 文档页数:4
A new goodness offit test:the reversed Berk-Jones statisticLeah Jager1and Jon A.Wellner2University of WashingtonJanuary23,2004;revised July22and29,2005AbstractOwen inverted a goodness offit statistic due to Berk and Jones to obtain confidence bands for a distribution function using No´e’s recursion.As argued by Owen,the resulting bands are narrower in the tails and wider in the center than the classical Kolmogorov-Smirnov bands and have certain advantages related to the optimality theory connected with the test statistic proposed by Berk and Jones.In this article we introduce a closely related statistic,the“reversed Berk-Jones statistic”which differs from the Berk and Jones statistic essentially because of the asymmetry of Kullback-Leibler information in its two arguments.We parallel the development of Owen for the new statistic,giving a method for constructing the confidence bands using the recursion formulas of No´e to compute rectangle probabilities for order statistics.Along the way we uncover some difficulties in Owen’s calculations and give appropriate corrections.We also compare the exclusion probabilites(corresponding to the power of the tests)of our new bands with the (corrected version of)Owen’s bands for a simple Lehmann type alternative considered by Owen and show that our bands are preferable over a certain range of alternatives.1IntroductionConsider the classical goodness-of-fit testing problem:based on X1,...,X n i.i.d.F,testH0:F(x)=F0(x)for all x∈I R(1) versusH1:F(x)=F0(x)for some x∈I R(2) where F0is afixed continuous distribution function.Berk and Jones(1979)introduced the test statistic R n,which is defined asR n=sup−∞<x<∞K(F n(x),F0(x)),(3)whereK(x,y)=x log x1−y,(4)and F n is the empirical distribution function of the X i’s,given byF n(x)=12Exact quantiles of the null distribution of˜R n forfinite n2.1Exact null distribution of˜R2Proposition1.Under the null hypothesis,P(˜R2≤x)=r2x,0≤x≤log2(7) where0≤r x≤1is the unique solution of(1−r x)log(1−r x)+(1+r x)log(1+r x)=2x.(8) Proof.Without loss of generality we can take F0to be the uniform distribution on[0,1],F0(x)=x, 0≤x≤1.Note that˜R2=supX(1)≤x<X(2)K(x,F2(x))=max{K(X(1),12)}=max{K(X1,12)}where X1,X2∼Uniform[0,1]are independent.Thus we calculateP(K(U,12solves K(u x,12solves K(l x,12−t,12+t,12,it is clear that u x−12−l x,or u x=1−l x.Hence it follows that(9)equals P(l x≤U≤1−l x)=P(12+v x)=2v x(10)where v x=12]satisfies K(12)=x.But this means that r x=2v x satisfies(8).The conclusion follows sinceP(˜R2≤x)=P(max{K(X1,12)}≤x)=P(K(U,12((1−√1−α)+(1+√1−α)).(12)The0.95quantile is then˜λ0.952=0.625251.The0.99quantile is˜λ0.992=0.675634.2.2Quantiles of the null distribution of˜R n for n>2Owen(1995)computed exact quantiles of the Berk-Jones statistic under the null distribution for finite n using a recursion of No´e(1972).Using an analagous method,we compute exact quantiles of the reversed statistic using this recursion.We want to calculate˜λ1−αn such that P(˜R n≤˜λ1−αn)=1−α.With this˜λ1−αn we can form1−αconfidence bands for F byfinding˜L n(x)and˜H n(x)(depending on the data)such that P(˜R n≤˜λ1−αn)=P(˜L n(x)≤F(x)≤˜H n(x),x∈R).We can rewrite this probability in terms of the order statistics,and then use the recursions due to No´e(1972)to compute it.Our procedure is as follows.We want tofind˜λ1−αn for a given confidence interval correspondingto1−α.Given this˜λ1−αn ,we calculate a confidence band of the form{˜a i,i=1,···,n}and{˜b i,i=1,2,···,n}such that P(˜R n≤˜λ1−αn)=P(˜a i<X(i)≤˜b i,i=1,2,···,n).To see how to calculate{˜a i}and{˜b i},we look at the reversed statistic,˜R n,itself.Similarly to the n=2case above,we can separate the event[˜R n≤˜λn]into parts associated with each order statistic.Now˜Rn=supX(1)≤x<X(n)K(x,F n(x))=K(X(1),1n),K(X(i),in).So the event[˜R n≤˜λn]is equivalent to the intersection of the eventsK(X(1),1n ),K(X(i),in)≤˜λn.(15)Here we have managed to divide the event into smaller events relating to each order statistic separately.In order to compute thefinite sample quantiles,we are looking for{˜a i}n i=1and{˜b i}n i=1such that P(˜R n≤˜λn)=P(˜a i<X(i)≤˜b i,1≤i≤n).(Note that we have deliberately chosen somewhat different notation from Owen(1995).)Splitting the event[˜R n≤˜λn]into events(13),(14),and (15),we can define{˜a i}and{˜b i}in terms of these smaller events.From(13),K(X(1),1n)≤˜λn},(16)˜a1=min{x|K(x,1n)≤˜λn},˜a n=min{x|K(x,n−1Finally,the event(14)yields that for2≤i≤n−1,˜bi=max{x|max{K(x,i−1n)}≤˜λn}=max{x|K(x,i−1n)≤˜λn},(18)˜a i=min{x|max{K(x,i−1n)}≤˜λn}=min{x|K(x,i−1n)≤˜λn}.(19)The above equations for2≤i≤n−1are not as easy to deal with as those for the cases wherei=1and i=n.However,by noticing the relationship between K(x,i−1n ),we cansimplify further.To do this we use the following claim.Claim1.Let˜λn>0.Then for anyfixed y1and y2such that0<y1<y2<1,(i)max{x|K(x,y1)≤˜λn,K(x,y2)≤˜λn}=max{x|K(x,y1)≤˜λn},(ii)min{x|K(x,y1)≤˜λn,K(x,y2)≤˜λn}=min{x|K(x,y2)≤˜λn}, provided{x|K(x,y1)≤˜λn,K(x,y2)≤˜λn}is not empty.Proof.(i)First,note that∂y−log1−x∂xK(x,y1)>∂n)≤˜λn},(20)˜a i=min{x|K(x,iWe can further simplify the calculation by noticing that˜a i=1−˜b n−i+1for1≤i≤n.This is shown in the following two claims.Claim2.Letλ>0.Then for anyfixed y,max{x|K(x,y)≤˜λ}=1−min{x|K(x,1−y)≤˜λ}(22) Proof.Fix y.Now∂y−log1−x)≤˜λn}ni=1−max{x|K(x,1−)≤˜λn}n=1−˜b n−i+1.The cases i=1and i=n are trivial.2 Now that we have defined{˜b i}for all i,and thus defined{˜a i},we can calculate P(˜R n≤˜λn)= P(˜a i<X(i)≤˜b i,1≤i≤n)by a recursion due to No´e(1972).The computing process involved in finding the values of{˜a i},{˜b i},and˜λn follows the method outlined in Owen(1995),using the Van Wijngaarden-Decker-Brent method tofirstfind the{˜a i}and{˜b i}corresponding to a particular˜λnand then reapplying the same Van Wijngaarden-Decker-Brent method again to solve for the˜λ1−αn associated with the1−αquantile.There is,however,one slight complication in these calculations compared to the method outlined in Owen(1995).When calculating confidence bands by inverting the Berk-Jones statistic,we lookat K(x,y)as a function of y for afixed x.For each x∈(0,1),K(x,y)is a continuous function of y with a minimum of0at y=x that tends to∞as y→0or y→1.Therefore for anyλ>0there exists a y∗such that K(x,y∗)=˜λ.For the reversed statistic,however,we look at K(x,y)as a function of x for afixed y.Again, K(x,y)is a continuous function of x with a minimum of0at x=y.But K(0,y)=log1<∞.So we are not guaranteed that for each˜λ>0there exists an x∗such ythat K(x∗,y)=˜λ.Thus care must be taken when looking at˜b i as defined in(20).If there is no x∗satisfying K(x∗,i−1Proof.Note thatR n =sup 0≤x ≤1K (F n (x ),x )=max 1≤i ≤n{K (i −1n,X (i ))}.Thus for n =1we have (interpreting 0log 0=0)R 1=log1X 1.(25)It follows thatP (R 1≤x )=P (log1X 1≤x )=P (e −x≤X 1≤1−e −x )=(1−2e −x )1[log 2,∞)(x ).2Knowing the exact distribution of R 1allows us to calculate the exact quantiles for n =1.We do this by solvingP (R 1≤λ1−α1)=1−α(26)for λ1−α1given a value of 1−α.This implies that the 1−αquantile λ1−α1of R 1is given by λ1−α1=−logαn,X (i )),K (i1−X (1),K (1n,X (i )),K (in,X (n )),log11−X (1),K (1n ,X (i )),K (in,X (n )),log1Again we are looking for numbers{a i}n i=1and{b i}n i=1(which depend also on n andλn, dependence suppressed in the notation)such thatP(R n≤λn)=P(a i<X(i)≤b i,1≤i≤n).Splitting the event[R n≤λn]into events(27),(28),and(29),we can define{a i}and{b i}in terms of these smaller events,as we did in the case of the reversed statistic.From(27),we see thatb1=max{x|max{log 1n,x)}≤λn}=max{x|log 1n,x)≤λn},a1=min{x|max{log 1n,x)}≤λn}=min{x|log 1n,x)≤λn}.Similarly,because of(29),we haveb n=max{x|max{K(n−1x}≤λn}=max{x|K(n−1x≤λn},a n=min{x|max{K(n−1x}≤λn}=min{x|K(n−1x≤λn}.Finally,event(28)gives us that for2≤i≤n−1,b i=max{x|max{K(i−1n,x)}≤λn}=max{x|K(i−1n,x)≤λn}(30)a i=min{x|max{K(i−1n,x)}≤λn}=min{x|K(i−1n,x)≤λn}.(31)Again,we can simplify these expressions for a i and b i by noticing a few things.We begin with the following claim.Claim4.Letλn>0.Then for anyfixed x1and x2such that0<x1<x2<1,(i)max{y|K(x1,y)≤λn,K(x2,y)≤λn}=max{y|K(x1,y)≤λn},(ii)min{y|K(x1,y)≤λn,K(x2,y)≤λn}=min{y|K(x2,y)≤λn}, provided{y|K(x1,y)≤λn,K(x2,y)≤λn}is not empty.Proof.(i)First,note that∂y(1−y).So K decreases in y on the interval[0,x),has a minimum of0at y=x,and increases in y on the interval(x,1].This means that max{y|K(x,y)≤λn}will occur on the interval(x,1].Nowfix x1and x2such that x1<x2.Then K(x1,y)and K(x2,y)will have a point of intersection at c in the interval(x1,x2).That is,K(x1,c)=K(x2,c).Now K(x1,y)is increasing in y on the interval(c,x2],while K(x2,y)is decreasing in y on this same interval.So K(x2,y)<K(x1,y)on(c,x2].But∂∂y K(x2,y)for all y.So K(x2,y)<K(x1,y)for all y in(c,1].There are three cases to consider.First,supposeλn>K(x1,c),where again,c is the point of intersection.Then max{y|K(x1,y)≤λn,K(x2,y)≤λn}>c.Since we have shown that K(x2,y)< K(x1,y)for all y>c,the maximum y value where both K(x1,y)≤λn and K(x2,y)≤λn is the same as the maximum y for which K(x1,y)≤λn.So max{y|K(x1,y)≤λn,K(x2,y)≤λn}= max{y|K(x1,y)≤λn}.Now supposeλn=K(x1,c).Then max{y|K(x1,y)≤λn,K(x2,y)≤λn}=c= max{y|K(x1,y)≤λn}.Finally,supposeλn<K(x1,y).Then there is no y in the interval[0,1]that satisfies max{y|K(x1,y)≤λn,K(x2,y)≤λn},since K(x1,y)>λn for y≥c and K(x2,y)>λn for y≤c.(ii)The result for the minimum is proved in a similar way.2 This claim allows us to simplify(30)and(31)to be(for2≤i≤n−1)b i=max{x|K(i−1n,x)≤λn}.(33) We can also simplify the cases where i=1and i=n.These cases can be written asb1=max{x|log1n,x)≤λn},(35)b n=max{x|K(n−1x≤λn}=e−λn.(37) The rationale for this is as follows.First the i=1case.Notice that log11−yis increasing on (0,c)and K(x,y)is decreasing on this interval,log1∂y log11−y>1y)=∂1−y>K(x,y)onthe interval(c,1).This gives the result for b1.The case for i=n is similar,except that log11−y case.Finally,as in the case of the reversed statistic described above,we see that we can once again define the{a i}in terms of the{b i}as a i=1−b n−i+1for1≤i≤n.The following two claims (analogous to claims2and3in the case of the reversed statistic)show this.Claim5.Letλ>0.Then for anyfixed x,max{y|K(x,y)≤λ}=1−min{y|K(1−x,y)≤λ}(38)Proof.Fix x.Now∂y(1−y).So K decreases in y on the interval[0,x),has a minimum of0at y=x,and increases in y on the interval(x,1].This means that max{y|K(x,y)≤λ}will occur on the interval(x,1],while min{y|K(x,y)≤λ}will occur on the interval[0,x).Also,notice that for all x and y,K(x,y)=K(1−x,1−y).Now,since K(x,y)→∞as x→1for afixed y,there exists a y∗in(x,1]such that K(x,y∗)=λ. So y∗=max{y|K(x,y)≤λ}.But K(1−x,1−y∗)=λas well.Now1−y∗is in the interval [0,1−x).So1−y∗=min{y|K(1−x,y)≤λ}.Thus the result is proved.2 Claim6.For1≤i≤n,a i=1−b n−i+1.(39) Proof.From(32)and(33)we have that for2≤i≤n−1a i=min{x|K(in,x)≤λn}by Claim5=1−max{x|K(n−iP(L i−1<X(i)≤H i,i=1,2,···,n).As defined by Owen,in the two displays below his formula (7),page517,{H i}and{L i}becomeH i=max{x|K(in ,x)≤λn},1≤i≤n−1,L0=0,and then Owen(1995)claims in his formula(9)page518that these are linked to the event {R n≤λn}by{R n≤λn}={L i−1≤X(i)≤H i:i=1,...,n}.(42) But in fact,(42)is false.Note that the event on the right side of(42)implies thatR n≥K((n−1)/n,H n−ǫ)=K((n−1)/n,1−ǫ)→∞asǫ↓0.Thus{R n≤λn}⊂{L i−1≤X(i)≤H i:i=1,...,n}(43) with strict inclusion since the event on the right side allows R n=∞.Moreover,note that R n= R n(X1,...,X n)>λ.95n also at the points X(i)=H i with i<n since K((i−1)/n,H i)>λ.95n. Our definition of{b i}isb1=max{x|log1n,x)≤λn},2≤i≤n.Note that Owen’s H i’s involve the maximum x such that K(in ,x)≤λn.Thus it follows that b i=H i−1for i=2,...,n−1,and similarly a i=1−b n−i+1=1−H n−i=L i,i=2,...,n−1by virtue of Owen’s relation L i=1−H n−i.Thus we claim the correct event identity is:{R n≤λn}={a i<X(i)≤b i,i=1,...,n}(44) where a i=L i,i=1,...,n−1,a n=e−λn,b i=H i−1,i=2,...,n,b1=1−e−λn The following table illustrates the situation numerically for n=4.Table1:Numerical illustration of order statistic bounds,n=4J-W bounds J-W boundsa1=.002737a1=.001340L1=.002737L1=.001340a3=.147088a3=.114653L3=.147088L3=.114653H1=.852912H1=.885347b2=.852912b2=.885347H3=.997263H3=.998660b4=.997263b4=.998660Actual coverage=.901771Actual coverage=.95 wrongλcorrectλcorrect bounds correct bounds Thefirst column of table1gives the constants L i and H i involved in the right side of Owen’s claimed event identity(42)corresponding to hisλ.954=.9149....The“actual coverage”in this column is the probability of the event on the right side of(42)and(43).[Note thatthis probability,.99841...,is not.95.It seems that in his computer program Owen used{a1,...,a n}={L1,...,L n−1,e−λn}in place of L0,...,L n−1.When we make this replacementwefind P(M4)≡P(∩4i=1[a i≤X(i)≤H i])=.95when a i and H i are determined by Owen’sλ.954.But this latter event M4also satisfies[R4≤λ.954]⊂M4.]The second column of table1gives the constants{a i}and{b i}corresponding to Owen’sλ.954=.9149....The resulting probability of thetwo equal events in(44)is0.9017...<.95.(This makes sense since the four dimensional rectangle involved on the right side of(44)is strictly contained in the rectangle on the right side of(42),even with the L i’s replaced by a i’s as in Owen’s program.)The third column of table1gives the constants L i and H i corresponding to ourλ.954=1.092....This is the correctλ.954,but Owen’s bounds L i and H i are still wrong,so equality in(42)fails and the resulting probability of the eventon right side of(43)is.9995...(or.9747...if we use the a i’s in place of L0,...,L4as the lower boundsas in Owen’s program).Finally,column4of table1gives our{a i}and{b i}corresponding to ourλ.954,and the probability of the event on the left side of(43)and both terms in(44)is.95.Table2compares theλ0.95n values calculated using Owen’s definition of the{H i}’s and thosecalculated using our{b i}’s with exact results for n=2,and simulation results for3≤n≤20 and selected larger values of n.The Monte Carlo simulations were carried out by simulating the Berk-Jones statistic100,000times and taking the0.95quantile(or95000th order statistic)of the simulated values.In each case,thefinite sample quantile calculated according to the{b i}found by our method (which is similar to the method used for the reversed statistic)agrees more closely with the simulated result.Finally,we determine the confidence bandsL n(x)=ni=0l i1(X(i),X(i+1)](x),andH n(x)=ni=0h i1[X(i),X(i+1))(x),where X(0)≡−∞and X(n+1)=∞by convention.For x∈(X(i),X(i+1))we have F n(x)=i/n and it is clear that the event[R n≤λ1−αn]restricts F(x)only by K(i/n,F(x))≤λ1−αn.Henceh i=max{p|K(i/n,p)≤λ1−αn},whilel i=min{p|K(i/n,p)≤λ1−αn}.From(32)and(36)it follows that h i=b i+1,i∈{0,...,n−1},and from(35)and(33)we have l i=a i,i∈{1,...,n}.Furthermore h n=1and l0=0(trivially).Table2:Comparison of0.95quantiles of the Berk-Jones statistic with simulation2 1.67031.90032 1.67117.90058 2.024950 2.027693 1.176631.90122 1.17665.90122 1.414108 1.4136240.914983.901950.915054.90198 1.092493 1.0890750.751753.901840.751894.902630.8927880.89133760.639718.903850.639889.903970.7562510.75572570.557816.903500.557992.903590.6567880.65978580.495200.905950.495369.906030.5809900.57874890.445698.907670.445852.907760.5212420.519253100.405531.906600.405670.906500.4728950.473739 110.372252.906460.372376.906610.4329430.433429 120.344209.905940.344319.906050.3993580.402910 130.320240.908650.320337.908760.3707180.370418 140.299506.909030.299592.909080.3459950.344839 150.281384.910730.281461.910410.3244320.322943 160.265404.909260.265473.909350.3054560.306919 170.251203.911450.251265.911560.2886220.287871 180.238495.911720.238551.911800.2735850.272886 190.227054.912120.227104.912190.2600690.258733 200.216696.911420.216742.911340.2478530.249022 500.093344.919850.0933644.919880.1042390.103634 1000.048899.921860.0489062.923330.0537660.053617 5000.010631.930430.0106328.930290.0113810.011379 10000.005466.932510.0054659.931450.005804.00580896As in Owen(1995)and Owen(2001),we give approximation formulas for the0.95and0.99 quantiles of the Berk-Jones statistic which are polynomial in log n.These formulas compare to (10)-(13)in Owen(1995)and to Table7.1,page159in Owen(2001).Wefind that Owen’s exact and approximate critical values for bands with claimed confidence coefficient.95have true coverage ranging from about.90to.93for sample sizes between3and1000;see Table2for estimated coverage probabilities(with105monte-carlo samples).λ0.95 n .=1n(3.7752+0.5062log n−0.0417(log n)2+0.0016(log n)3),100<n≤1000.(46)λ0.99n.=1n(5.6392+0.4018log n−0.0183(log n)2),100<n≤1000.(48)4Power considerations4.1Power heuristicsHere we more specifically address issues of power.We are able to get some qualitative ideas of the behavior of these test statistics against different alternatives by considering the functions K(F0(x),F(x))and K(F(x),F0(x))pointwise in x,rather than taking the supremum over all x.Consider the distribution functionsF1(x)=1x(49)andF2(x)=e−(1x0.00.20.40.60.8 1.00.00.20.40.60.81.Figure 1:Extreme distribution functions F 1(solid line)and F 2(dashed line).than the null,we are actually more interested in the power behavior for alternatives which are slightly different from the null distribution.For example,alternatives which are more moderately stochastically larger or smaller than F 0.Natural alternatives to consider are those of the form F c 0,for different values of c ∈(0,∞).For values of c >1,this distribution is stochastically larger than F 0.For values of c <1,this distribution is stochastically smaller than F 0.Based on the behavior of the functions g 1,g 2,˜g 1,and ˜g 2,we would guess that in this case as well,the dual statistic would be more powerful against stochastically larger alternatives (c >1),while the Berk-Jones statistic would remain more powerful for stochastically smaller alternatives (c <1).4.2Power calculationsTo test our conjectures about power,we use the same algorithm by No ´e (1972)to calculate the probability that F 1,F 2,and F c are contained in the 95%confidence band for F 0.Figure 3plots these probabilities for F c against c for different sample sizes.The curves for sample size n =20can be compared to Figure 5in Owen (1995).The line representing the Berk-Jones statistic is the same as that which Owen calls the curve for the nonparametric likelihood bands.We see that the reverse Berk-Jones statistic has greater power than both the Berk-Jones statistic and the Kolmogorov-Smirnov statistic for values of c >1.5ExamplesFigure 4shows the empirical distribution function of the velocities of 82galaxies from the Corona Borealis region along with 95%confidence intervals generated by inverting both the Berk-Jonesx0.00.20.40.60.8 1.00.00.20.40.60.81.0(a)x0.00.20.40.60.8 1.00.00.20.40.60.81.0(b)Figure 2:(a)The functions g 1(solid line)and ˜g 1(dashed line).(b)The functions g 2(solid line)and ˜g 2(dashed line).0.00010.00100.01000.10001.0000Figure 3:The the 95%confidence bands for F 0(vertical distribution F 0.statistic and the reversed statistic.This data appears in Table 1of Roeder (1990).This figure can be compared to Figures 1and 2in Owen (1995).Comparison shows that the confidence band based on the reversed Berk-Jones statistic are narrower at the tails than the one based on the Kolmogorov-Smirnov statistic.Also,there are slight differences between the confidence band based on the reversed statistic compared to the Berk-Jones statistic.In the region of the lower tail,the band based on the reversed statistic is shifted slightly downward,while in the region of the upper tail,this band is shifted slightly upward.This behavior is more noticable when looking at equally spaced data points.Figure 5shows these same 95%confidence bands for n =20equispaced observations.Velocity (km/sec)50001000015000200002500030000350000.00.20.40.60.81.0Figure 4:The empirical CDF of the velocities of 82galaxies in the Corona Borealis Region (dark solid line)and 95%confidence bands obtained by inverting the Berk-Jones statistic (dashed line)and the reversed Berk-Jones statistic (solid line)Equispaced Data51015200.00.20.40.60.81.0Figure 5:The empirical CDF of 20equally spaced data points (dark solid line)and 95%confidence bands obtained by inverting the Berk-Jones statistic (dashed line)and the reversed Berk-Jones statistic (solid line)Finally,Figure 6gives a comparison of Owen’s bands based on the Berk-Jones statistic to our bands based on the same statistic;as argued above,Owen’s bands do not have the correct coverage probability.Equispaced Data51015200.00.20.40.60.81.0Figure 6:Comparison of Owen’s bands based on the Berk-Jones statistic (dashed line)to our bands based on the Berk-Jones statistic (solid line)for 20equally spaced data pointsThe C and R programs used to carry out the computations presented here are available (in several forms)at/jaw/RESEARCH/SOFTWARE/software.list.html .Acknowledgements:We owe thanks to Art Owen for sharing his C programs used to carry out the computations for his 1995paper.Those programs were used as a starting point for the programs used here.We also owe thanks to Art for several helpful discussions.Mame Astou Diouf pointed out several typographical errors in the first version.ReferencesBerk,R.H.and Jones,D.H.(1979).Goodness-of-fit test statistics that dominate the Kolmogorov statistics.Zeitschrift f¨u r Wahrscheinlichkeitstheorie und Verwandte Gebiete47,47-59.No´e,M.(1972).The calculation of distributions of two-sided Kolmogorov-Smirnov type statistics.Annals of Mathematical Statistics43,58-64.Owen,A.B.(1995).Nonparametric likelihood confidence bands for a distribution function.Journal of the American Statistical Association90,516-521.Owen,A.B.(2001).Empirical Likelihood.Chapman&Hall/CRC,Boca Raton.Roeder,K.(1990).Density estimation with confidence sets exemplified by superclusters and voids in the galaxies.Journal of the American Statistical Association85,617-624. Wellner,J.A.and Koltchinskii,V.(2003).A note on the asymptotic distribution of Berk-Jones type statistics under the null hypothesis.High Dimensional Probability III,321-332.Birkh¨a user,Basel(2003).University of WashingtonStatisticsBox354322Seattle,Washington98195-4322e-mail:leah@University of WashingtonStatisticsBox354322Seattle,Washington98195-4322e-mail:jaw@21。
Using XPSPEAK Version 4.1 November 2000Contents Page Number XPS Peak Fitting Program for WIN95/98 XPSPEAK Version 4.1 (1)Program Installation (1)Introduction (1)First Version (1)Version 2.0 (1)Version 3.0 (1)Version 3.1 (2)Version 4.0 (2)Version 4.1 (2)Future Versions (2)General Information (from R. Kwok) (3)Using XPS Peak (3)Overview of Processing (3)Appearance (4)Opening Files (4)Opening a Kratos (*.des) text file (4)Opening Multiple Kratos (*.des) text files (5)Saving Files (6)Region Parameters (6)Loading Region Parameters (6)Saving Parameters (6)Available Backgrounds (6)Averaging (7)Shirley + Linear Background (7)Tougaard (8)Adding/Adjusting the Background (8)Adding/Adjusting Peaks (9)Peak Types: p, d and f (10)Peak Constraints (11)Peak Parameters (11)Peak Function (12)Region Shift (13)Optimisation (14)Print/Export (15)Export (15)Program Options (15)Compatibility (16)File I/O (16)Limitations (17)Cautions for Peak Fitting (17)Sample Files: (17)gaas.xps (17)Cu2p_bg.xps (18)Kratos.des (18)ASCII.prn (18)Other Files (18)XPS Peak Fitting Program for WIN95/98 XPSPEAKVersion 4.1Program InstallationXPS Peak is freeware. Please ask RCSMS lab staff for a copy of the zipped 3.3MB file, if you would like your own copyUnzip the XPSPEA4.ZIP file and run Setup.exe in Win 95 or Win 98.Note: I haven’t successfully installed XPSPEAK on Win 95 machines unless they have been running Windows 95c – CMH.IntroductionRaymond Kwok, the author of XPSPEAK had spent >1000 hours on XPS peak fitting when he was a graduate student. During that time, he dreamed of many features in the XPS peak fitting software that could help obtain more information from the XPS peaks and reduce processing time.Most of the information in this users guide has come directly from the readme.doc file, automatically installed with XPSPEAK4.1First VersionIn 1994, Dr Kwok wrote a program that converted the Kratos XPS spectral files to ASCII data. Once this program was finished, he found that the program could be easily converted to a peak fitting program. Then he added the dreamed features into the program, e.g.∙ A better way to locate a point at a noise baseline for the Shirley background calculations∙Combine the two peaks of 2p3/2 and 2p1/2∙Fit different XPS regions at the same timeVersion 2.0After the first version and Version 2.0, many people emailed Dr Kwok and gave additional suggestions. He also found other features that could be put into the program.Version 3.0The major change in Version 3.0 is the addition of Newton’s Method for optimisation∙Newton’s method can greatly reduce the optimisation time for multiple region peak fitting.Version 3.11. Removed all the run-time errors that were reported2. A Shirley + Linear background was added3. The Export to Clipboard function was added as requested by a user∙Some other minor graphical features were addedVersion 4.0Added:1. The asymmetrical peak function. See note below2. Three additional file formats for importing data∙ A few minor adjustmentsThe addition of the Asymmetrical Peak Function required the peak function to be changed from the Gaussian-Lorentzian product function to the Gaussian-Lorentzian sum function. Calculation of the asymmetrical function using the Gaussian-Lorentzian product function was too difficult to implement. The software of some instruments uses the sum function, while others use the product function, so both functions are available in XPSPEAK.See Peak Function, (Page 12) for details of how to set this up.Note:If the selection is the sum function, when the user opens a *.xps file that was optimised using the Gaussian-Lorentzian product function, you have to re-optimise the spectra using the Gaussian-Lorentzian sum function with a different %Gaussian-Lorentzian value.Version 4.1Version 4.1 has only two changes.1. In version 4.0, the printed characters were inverted, a problem that wasdue to Visual Basic. After about half year, a patch was received from Microsoft, and the problem was solved by simply recompiling the program2. The import of multiple region VAMAS file format was addedFuture VersionsThe author believes the program has some weakness in the background subtraction routines. Extensive literature examination will be required in order to revise them. Dr Kwok intends to do that for the next version.General Information (from R. Kwok)This version of the program was written in Visual Basic 6.0 and uses 32 bit processes. This is freeware. You may ask for the source program if you really want to. I hope this program will be useful for people without modern XPS software. I also hope that the new features in this program can be adopted by the XPS manufacturers in the later versions of their software.If you have any questions/suggestions, please send an email to me.Raymund W.M. KwokDepartment of ChemistryThe Chinese University of Hong KongShatin, Hong KongTel: (852)-2609-6261Fax:(852)-2603-5057email: rmkwok@.hkI would like to thank the comments and suggestions from many people. For the completion of Version 4.0, I would like to think Dr. Bernard J. Flinn for the routine of reading Leybold ascii format, Prof. Igor Bello and Kelvin Dickinson for providing me the VAMAS files VG systems, and my graduate students for testing the program. I hope I will add other features into the program in the near future.R Kwok.Using XPS PeakOverview of Processing1. Open Required Files∙See Opening Files (Page 4)2. Make sure background is there/suitable∙See Adding/Adjusting the Background, (Page 8)3. Add/adjust peaks as necessary∙See Adding/Adjusting Peaks, (Page 9), and Peak Parameters, (Page 11)4. Save file∙See Saving Files, (Page 6)5. Export if necessary∙See Print/Export, (Page 15)AppearanceXPSPEAK opens with two windows, one above the other, which look like this:∙The top window opens and displays the active scan, adds or adjusts a background, adds peaks, and loads and saves parameters.∙The lower window allows peak processing and re-opening and saving dataOpening FilesOpening a Kratos (*.des) text file1. Make sure your data files have been converted to text files. See the backof the Vision Software manual for details of how to do this. Remember, from the original experiment files, each region of each file will now be a separate file.2. From the Data menu of the upper window, choose Import (Kratos)∙Choose directory∙Double click on the file of interest∙The spectra open with all previous processing INCLUDEDOpening Multiple Kratos (*.des) text files∙You can open up a maximum of 10 files together.1. Open the first file as above∙Opens in the first region (1)2. In the XPS Peak Processing (lower) window, left click on 2(secondregion), which makes this region active3. Open the second file as in Step2, Opening a Kratos (*.des) text file,(Page 4)∙Opens in the second region (2)∙You can only have one description for all the files that are open. Edit with a click in the Description box4. Open further files by clicking on the next available region number thenfollowing the above step.∙You can only have one description for all the files that are open. Edit with a click in the Description boxDescriptionBox 2∙To open a file that has already been processed and saved using XPSPEAK, click on the Open XPS button in the lower window. Choose directory and file as normal∙The program can store all the peak information into a *.XPS file for later use. See below.Saving Files1. To save a file click on the Save XPS button in the lower window2. Choose Directory3. Type in a suitable file name4. Click OK∙Everything that is open will be saved in this file∙The program can also store/read the peak parameter files (*.RPA)so that you do not need to re-type all the parameters again for a similar spectrum.Region ParametersRegion Parameters are the boundaries or limits you have used to set up the background and peaks for your files. These values can be saved as a file of the type *.rpa.Note that these Region Parameters are completely different from the mathematical parameters described in Peak Parameters, (Page 11) Loading Region Parameters1. From the Parameters menu in the upper window, click on Load RegionParameters2. Choose directory and file name3. Click on Open buttonSaving Parameters1. From the Parameters menu in the XPS Peak Fit (Upper) window, clickon Save Region Parameters2. Choose directory and file name3. Click on the Save buttonAvailable BackgroundsThis program provides the background choices of∙Shirley∙Linear∙TougaardAveraging∙ Averaging at the end points of the background can reduce the time tofind a point at the middle of a noisy baseline∙ The program includes the choices of None (1 point), 3, 5, 7, and 9point average∙ This will average the intensities around the binding energy youselect.Shirley + Linear Background1. The Shirley + Linear background has been added for slopingbackgrounds∙ The "Shirley + Linear" background is the Shirley background plus astraight line with starting point at the low BE end-point and with a slope value∙ If the slope value is zero , the original Shirley calculation is used∙ If the slope value is positive , the straight line has higher values atthe high BE side, which can be used for spectra with higher background intensities at the high BE side∙ Similarly, a negative slope value can be used for a spectrum withlower background intensities at the high BE side2. The Optimization button may be used when the Shirley background is higher at some point than the signal intensities∙ The program will increase the slope value until the Shirleybackground is below the signal intensities∙ Please see the example below - Cu2p_bg.xps - which showsbackground subtraction using the Shirley method (This spectrum was sent to Dr Kwok by Dr. Roland Schlesinger).∙ A shows the problematic background when the Shirley backgroundis higher than the signal intensities. In the Shirley calculation routine, some negative values were generated and resulted in a non-monotonic increase background∙ B shows a "Shirley + Linear" background. The slope value was inputby trial-and-error until the background was lower than the signal intensities∙ C was obtained using the optimisation routineA slope = 0B slope = 11C slope = 15.17Note: The background subtraction calculation cannot completely remove the background signals. For quantitative studies, the best procedure is "consistency". See Future Versions, (Page 2).TougaardFor a Tougaard background, the program can optimise the B1 parameter by minimising the "square of the difference" of the intensities of ten data points in the high binding energy side of the range with the intensities of the calculated background.Adding/Adjusting the BackgroundNote: The Background MUST be correct before Peaks can be added. As with all backgrounds, the range needs to include as much of your peak as possible and as little of anything else as possible.1. Make sure the file of interest is open and the appropriate region is active2. Click on Background in the upper window∙The Region 0 box comes up, which contains the information about the background3. Adjust the following as necessary. See Note.∙High BE (This value needs to be within the range of your data) ∙Low BE (This value needs to be within the range of your data) NOTE: High and Low BE are not automatically within the range of your data. CHECK CAREFULLY THAT BOTH ENDS OF THE BACKGROUND ARE INSIDE THE EDGE OF YOUR DATA. Nothing will happen otherwise.∙No. of Ave. Pts at end-points. See Averaging, (Page 7)∙Background Type∙Note for Shirley + Linear:To perform the Shirley + Linear Optimisation routine:a) Have the file of interest openb) From the upper window, click on Backgroundc) In the resulting box, change or optimise the Shirley + LinearSlope as desired∙Using Optimize in the Shirley + Linear window can cause problems. Adjust manually if necessary3. Click on Accept when satisfiedAdding/Adjusting PeaksNote: The Background MUST be correct before peaks can be added. Nothing will happen otherwise. See previous section.∙To add a peak, from the Region Window, click on Add Peak ∙The peak window appears∙This may be adjusted as below using the Peak Window which will have opened automaticallyIn the XPS Peak Processing (lower) window, there will be a list of Regions, which are all the open files, and beside each of these will be numbers representing the synthetic peaks included in that region.Regions(files)SyntheticPeaks1. Click on a region number to activate that region∙The active region will be displayed in the upper window2. Click on a peak number to start adjusting the parameters for that peak.∙The Processing window for that peak will open3. Click off Fix to adjust the following using the maximum/minimum arrowkeys provided:∙Peak Type. (i.e. orbital – s, p, d, f)∙S.O.S (Δ eV between the two halves of the peak)∙Position∙FWHM∙Area∙%Lorenzian-Gaussian∙See the notes for explanations of how Asymmetry works.4. Click on Accept when satisfiedPeak Types: p, d and f.1. Each of these peaks combines the two splitting peaks2. The FWHM is the same for both the splitting peaks, e.g. a p-type peakwith FWHM=0.7eV is the combination of a p3/2 with FWHM at 0.7eV anda p1/2 with FWHM at 0.7eV, and with an area ratio of 2 to 13. If the theoretical area ratio is not true for the split peaks, the old way ofsetting two s-type peaks and adding the constraints should be used.∙The S.O.S. stands for spin orbital splitting.Note: The FWHM of the p, d or f peaks are the FWHM of the p3/2,d5/2 or f7/2, respectively. The FWHM of the combined peaks (e.g. combination of p3/2and p1/2) is shown in the actual FWHM in the Peak Parameter Window.Peak Constraints1. Each parameter can be referenced to the same type of parameter inother peaks. For example, for four peaks (Peak #0, 1, 2 and 3) with known relative peak positions (0.5eV between adjacent peaks), the following can be used∙Position: Peak 1 = Peak 0 + 0.5eV∙Position: Peak 2 = Peak 1 + 0.5eV∙Position: Peak 3 = Peak 2 + 0.5eV2. You may reference to any peak except with looped references.3. The optimisation of the %GL value is allowed in this program.∙ A suggestion to use this feature is to find a nice peak for a certain setting of your instrument and optimise the %GL for this peak.∙Fix the %GL in the later peak fitting process when the same instrument settings were used.4. This version also includes the setting of the upper and lower bounds foreach parameter.Peak ParametersThis program uses the following asymmetric Gaussian-Lorentzian sumThe program also uses the following symmetrical Gaussian-Lorentzian product functionPeak FunctionNote:If the selection is the sum function, when the user opens a *.xps file that was optimised using the Gaussian-Lorentzian product function, you have to re-optimise the spectra using the Gaussian-Lorentzian sum function with a different %Gaussian-Lorentzian value.∙You can choose the function type you want1. From the lower window, click on the Options button∙The peak parameters box comes up∙Select GL sum for the Gaussian-Lorentzian sum function∙Select GL product for the Gaussian-Lorentzian product function. 2. For the Gaussian-Lorentzian sum function, each peak can have sixparameters∙Peak Position∙Area∙FWHM∙%Gaussian-Lorentzian∙TS∙TLIf anyone knows what TS or TL might be, please let me know. Thanks, CMH3. Each peak in the Gaussian-Lorentzian product function can have fourparameters∙Peak Position∙Area∙FWHM∙%Gaussian-LorentzianSince peak area relates to the atomic concentration directly, we use it as a peak parameter and the peak height will not be shown to the user.Note:For asymmetric peaks, the FWHM only refers to the half of the peak that is symmetrical. The actual FWHM of the peak is calculated numerically and is shown after the actual FWHM in the Peak Parameter Window. If the asymmetric peak is a doublet (p, d or f type peak), the actual FWHM is the FWHM of the doublet.Region ShiftA Region Shift parameter was added under the Parameters menu∙Use this parameter to compensate for the charging effect, the fermi level shift or any change in the system work function∙This value will be added to all the peak positions in the region for fitting purposes.An example:∙ A polymer surface is positively charged and all the peaks are shifted to the high binding energy by +0.5eV, e.g. aliphatic carbon at 285.0eV shifts to 285.5eV∙When the Region Shift parameter is set to +0.5eV, 0.5eV will be added to all the peak positions in the region during peak fitting, but the listed peak positions are not changed, e.g. 285.0eV for aliphatic carbon. Note: I have tried this without any actual shift taking place. If someone finds out how to perform this operation, please let me know. Thanks, CMH.In the meantime, I suggest you do the shift before converting your files from the Vision Software format.OptimisationYou can optimise:1. A single peak parameter∙Use the Optimize button beside the parameter in the Peak Fitting window2. The peak (the peak position, area, FWHM, and the %GL if the "fix" box isnot ticked)∙Use the Optimize Peak button at the base of the Peak Fitting window3. A single region (all the parameters of all the peaks in that region if the"fix" box is not ticked)∙Use the Optimize Region menu (button) in the upper window4. All the regions∙Use the Optimize All button in the lower window∙During any type of optimisation, you can press the "Stop Fitting" button and the program will stop the process in the next cycle.Print/ExportIn the XPS Peak Fit or Region window, From the Data menu, choose Export or Print options as desiredExport∙The program can export the ASCII file of spectrum (*.DAT) for making high quality figures using other software (e.g. SigmaPlot)∙It can export the parameters (*.PAR) for further calculations (e.g. use Excel for atomic ratio calculations)∙It can also copy the spectral image to the system clipboard so that the spectral image can be pasted into a document (e.g. MS WORD). Program Options1. The %tolerance allows the optimisation routine to stop if the change inthe difference after one loop is less that the %tolerance2. The default setting of the optimisation is Newton's method∙This method requires a delta value for the optimisation calculations ∙You may need to change the value in some cases, but the existing setting is enough for most data.3. For the binary search method, it searches the best fit for each parameterin up to four levels of value ranges∙For example, for a peak position, in first level, it calculates the chi^2 when the peak position is changed by +2eV, +1.5eV, +1eV, +0.5eV,-0.5eV, -1eV, -1.5eV, and -2eV (range 2eV, step 0.5eV) ∙Then, it selects the position value that gives the lowest chi^2∙In the second level, it searches the best values in the range +0.4eV, +0.3eV, +0.2eV, +0.1eV, -0.1eV, -0.2eV, -0.3eV, and -0.4eV (range0.4eV, step 0.1eV)∙In the third level, it selects the best value in +0.09eV, +0.08eV, ...+0.01eV, -0.01eV, ...-0.09eV∙This will give the best value with two digits after decimal∙Level 4 is not used in the default setting∙The range setting and the number of levels in the option window can be changed if needed.4. The Newton's Method or Binary Search Method can be selected byclicking the "use" selection box of that method.5. The selection of the peak function is also in the Options window.6. The user can save/read the option parameters with the file extension*.opa∙The program reads the default.opa file at start up. Therefore, the user can customize the program options by saving the selectionsinto the default.opa file.CompatibilityThe program can read:∙Kratos text (*.des) files together with the peak fitting parameters in the file∙The ASCII files exported from Phi's Multiplex software∙The ASCII files of Leybold's software∙The VAMAS file format∙For the Phi, Leybold and VAMAS formats, multiple regions can be read∙For the Phi format, if the description contains a comma ",", the program will give an error. (If you get the error, you may use any texteditor to remove the comma)The program can also import ASCII files in the following format:Binding Energy Value 1 Intensity Value 1Binding Energy Value 2 Intensity Value 2etc etc∙The B.E. list must be in ascending or descending order, and the separation of adjacent B.E.s must be the same∙The file cannot have other lines before and after the data∙Sometimes, TAB may cause a reading error.File I/OThe file format of XPSPEAK 4.1 is different from XPSPEAK 3.1, 3.0 and 2.0 ∙XPSPEAK 4.1 can read the file format of XPSPEAK 3.1, 3.0 and 2.0, but not the reverse∙File format of 4.1 is the same as that of 4.0.LimitationsThis program limits the:∙Maximum number of points for each spectrum to 5000∙Maximum of peaks for all the regions to 51∙For each region, the maximum number of peaks is 10. Cautions for Peak FittingSome graduate students believe that the fitting parameters for the best fitted spectrum is the "final answer". This is definitely not true. Adding enough peaks can always fit a spectrum∙Peak fitting only assists the verification of a model∙The user must have a model in mind before adding peaks to the spectrum!Sample Files:gaas.xpsThis file contains 10 spectra1. Use Open XPS to retrieve the file. It includes ten regions∙1-4 for Ga 3d∙5-8 for Ga 3d∙9-10 for S 2p2. For the Ga 3d and As 3d, the peaks are d-type with s.o.s. = 0.3 and 0.9respectively3. Regions 4 and 8 are the sample just after S-treatment4. Other regions are after annealing5. Peak width of Ga 3d and As 3d are constrained to those in regions 1 and56. The fermi level shift of each region was determined using the As 3d5/2peak and the value was put into the "Region Shift" of each region7. Since the region shift takes into account the Fermi level shift, the peakpositions can be easily referenced for the same chemical components in different regions, i.e.∙Peak#1, 3, 5 of Ga 3d are set equal to Peak#0∙Peak#8, 9, 10 of As 3d are set equal to Peak#78. Note that the %GL value of the peaks is 27% using the GL sum functionin Version 4.0, while it is 80% using the GL product function in previous versions.18 Cu2p_bg.xpsThis spectrum was sent to me by Dr. Roland Schlesinger. It shows a background subtraction using the Shirley + Linear method∙See Shirley + Linear Background, (Page 7)Kratos.des∙This file shows a Kratos *.des file∙This is the format your files should be in if they have come from the Kratos instrument∙Use import Kratos to retrieve the file. See Opening Files, (Page 4)∙Note that the four peaks are all s-type∙You may delete peak 2, 4 and change the peak 1,3 to d-type with s.o.s. = 0.7. You may also read in the parameter file: as3d.rpa. ASCII.prn∙This shows an ASCII file∙Use import ASCII to retrieve the file∙It is a As 3d spectrum of GaAs∙In order to fit the spectrum, you need to first add the background and then add two d-type peaks with s.o.s.=0.7∙You may also read in the parameter file: as3d.rpa.Other Files(We don’t have an instrument that produces these files at Auckland University., but you may wish to look at them anyway. See the readme.doc file for more info.)1. Phi.asc2. Leybold.asc3. VAMAS.txt4. VAMASmult.txtHave Fun! July 1, 1999.。