1. Yu and Moyeed (2001) Bayesian quantile regression
Nonlinear Analysis:Real World Applications7(2006)1104–1118/locate/naAnalysis of a predator–prey model with modified Leslie–Gower and Holling-type II schemes with time delayA.F.Nindjin a,M.A.Aziz-Alaoui b,∗,M.Cadivel ba Laboratoire de Mathématiques Appliquées,Universitéde Cocody,22BP582,Abidjan22,Côte d’Ivoire,Franceb Laboratoire de Mathématiques Appliquées,Universitédu Havre,25rue Philippe Lebon,B.P.540,76058Le Havre Cedex,FranceReceived17July2005;accepted7October2005AbstractTwo-dimensional delayed continuous time dynamical system modeling a predator–prey food chain,and based on a modified version of Holling type-II scheme is investigated.By constructing a Liapunov function,we obtain a sufficient condition for global stability of the positive equilibrium.We also present some related qualitative results for this system.᭧2005Elsevier Ltd.All rights reserved.Keywords:Time delay;Boundedness;Permanent;Local stability;Global stability;Liapunov functional1.IntroductionThe dynamic relationship between predators and their prey has long been and will continue to be one of dominant themes in both ecology and mathematical ecology due to its universal existence and importance.A major trend in theoretical work on prey–predator dynamics has been to derive more realistic models,trying to keep to maximum the unavoidable increase in complexity of their mathematics.In this optic,recently[2],see also[1,5,6]has proposed afirst study of two-dimensional system of autonomous differential equation modeling a predator prey system.This model incorporates a modified version of Leslie–Gower functional response as well as that of the Holling-type II.They consider the following model⎧⎪⎪⎨⎪⎪⎩˙x=a1−bx−c1yx+k1x,˙y=a2−c2yx+k2y(1)with the initial conditions x(0)>0and y(0)>0.This two species food chain model describes a prey population x which serves as food for a predator y.The model parameters a1,a2,b,c1,c2,k1and k2are assuming only positive values.These parameters are defined as follows:a1is the growth rate of prey x,b measures the strength of competition among individuals of species x,c1∗Corresponding author.Tel./fax:+133232744.E-mail address:Aziz-Alaoui@univ-lehavre.fr(M.A.Aziz-Alaoui).1468-1218/$-see front matter᭧2005Elsevier Ltd.All rights reserved.doi:10.1016/j.nonrwa.2005.10.003A.F.Nindjin et al./Nonlinear Analysis:Real World Applications7(2006)1104–11181105 is the maximum value of the per capita reduction rate of x due to y,k1(respectively,k2)measures the extent to which environment provides protection to prey x(respectively,to the predator y),a2describes the growth rate of y,and c2 has a similar meaning to c1.It wasfirst motivated more by the mathematics analysis interest than by its realism as a model of any particular natural dynamical system.However,there may be situations in which the interaction between species is modelized by systems with such a functional response.It may,for example,be considered as a representation of an insect pest–spider food chain.Furthermore,it is afirst step towards a predator–prey model(of Holling–Tanner type)with inverse trophic relation and time delay,that is where the prey eaten by the mature predator can consume the immature predators.Let us mention that thefirst equation of system(1)is standard.By contrast,the second equation is absolutely not standard.This intactness model contains a modified Leslie–Gower term,the second term on the right-hand side in the second equation of(1).The last depicts the loss in the predator population.The Leslie–Gower formulation is based on the 
assumption that reduction in a predator population has a reciprocal relationship with per capita availability of its preferred food.Indeed,Leslie introduced a predator prey model where the carrying capacity of the predator environment is proportional to the number of prey.He stresses the fact that there are upper limits to the rates of increase of both prey x and predator y,which are not recognized in the Lotka–V olterra model.In case of continuous time,the considerations lead to the following:d y d t =a2y1−yx,in which the growth of the predator population is of logistic form,i.e.d y d t =a2y1−yC.Here,“C”measures the carry capacity set by the environmental resources and is proportional to prey abundance,C= x, where is the conversion factor of prey into predators.The term y/ x of this equation is called the Leslie–Gower term.It measures the loss in the predator population due to the rarity(per capita y/x)of its favorite food.In the case of severe scarcity,y can switch over to other population,but its growth will be limited by the fact that its most favorite food,the prey x,is not available in abundance.The situation can be taken care of by adding a positive constant to the denominator,hence the equation above becomes,d y d t =a2y1−yand thus,d y d t =y⎛⎜⎝a2−a2.yx+d⎞⎟⎠that is the second equation of system(1).In this paper,we introduce time delays in model(1),which is a more realistic approach to the understanding of predator–prey dynamics.Time delay plays an important role in many biological dynamical systems,being particularly relevant in ecology,where time delays have been recognized to contribute critically to the stable or unstable outcome of prey densities due to predation.Therefore,it is interesting and important to study the following delayed modified Leslie–Gower and Holling-Type-II schemes:⎧⎪⎪⎨⎪⎪⎩˙x(t)=a1−bx(t)−c1y(t)x(t)+k1x(t),˙y(t)=a2−c2y(t−r)x(t−r)+k2y(t)(2)for all t>0.Here,we incorporate a single discrete delay r>0in the negative feedback of the predator’s density.1106 A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–1118Let us denote by R 2+the nonnegative quadrant and by int (R 2+)the positive quadrant.For ∈[−r,0],we use the following conventional notation:x t ( )=x( +t).Then the initial conditions for this system take the formx 0( )= 1( ),y 0( )= 2( )(3)for all ∈[−r,0],where ( 1, 2)∈C([−r,0],R 2+),x(0)= 1(0)>0and y(0)= 2(0)>0.It is well known that the question of global stability of the positive steady state in a predator–prey system,with a single discrete delay in the predator equation without instantaneous negative feedback,remains a challenge,see [3,5,7].Our main purpose is to present some results about the global stability analysis on a system with delay containing modified Leslie–Gower and Holling-Type-II terms.This paper is organized as follows.In the next section,we present some preliminary results on the boundedness of solutions for system (2)–(3).Next,we study some equilibria properties for this system and give a permanence result.In Section 5,the analysis of the global stability is made for a boundary equilibrium and sufficient conditions are provided for the positive equilibrium of both instantaneous system (1)and system with delay (2)–(3)to be globally asymptotically stable.Finally,a discussion which includes local stability results for system (2)–(3)is given.2.PreliminariesIn this section,we present some preliminary results on the boundedness of solutions for system (2)–(3).We consider (x,y)a noncontinuable solution,see [4],of system 
(2)–(3),defined on [−r,A [,where A ∈]0,+∞].Lemma 1.The positive quadrant int (R 2+)is invariant for system (2).Proof.We have to show that for all t ∈[0,A [,x(t)>0and y(t)>0.Suppose that is not true.Then,there exists 0<T <A such that for all t ∈[0,T [,x(t)>0and y(t)>0,and either x(T )=0or y(T )=0.For all t ∈[0,T [,we havex(t)=x(0)expt0a 1−bx(s)−c 1y(s)x(s)+k 1d s (4)andy(t)=y(0)expt0a 2−c 2y(s −r)x(s −r)+k 2d s .(5)As (x,y)is defined and continuous on [−r,T ],there is a M 0such that for all t ∈[−r,T [,x(t)=x(0)expt0a 1−bx(s)−c 1y(s)x(s)+k 1d s x(0)exp (−T M)andy(t)=y(0)expt0a 2−c 2y(s −r)x(s −r)+k 2d sy(0)exp (−T M).Taking the limit,as t →T ,we obtainx(T ) x(0)exp (−T M)>0andy(T ) y(0)exp (−T M)>0,which contradicts the fact that either x(T )=0or y(T )=0.So,for all t ∈[0,A [,x(t)>0and y(t)>0.A.F.Nindjin et al./Nonlinear Analysis:Real World Applications7(2006)1104–11181107 Lemma2.For system(2)–(3),A=+∞andlim supt→+∞x(t) K(6) andlim supt→+∞y(t) L(7) where K=a1/b and L=a2/c2(K+k2)e a2r.Proof.From thefirst equation of system(2)–(3),we have for all t∈[0,A[,˙x(t)<x(t)(a1−bx(t)).A standard comparison argument shows that for all t∈[0,A[,x(t) ˜x(t)where˜x is the solution of the following ordinary differential equation˙˜x(t)=˜x(t)(a1−b˜x(t)),˜x(0)=x(0)>0.As lim t→+∞˜x(t)=a1/b,then˜x and thus x is bounded on[0,A[.Moreover,from Eq.(5),we can define y on all interval[kr,(k+1)r],with k∈N,and it is easy to see that y is bounded on[0,A[if A<+∞.Then A=+∞,see [4,Theorem2.4].Now,as for all t 0,x(t) ˜x(t),thenlim sup t→+∞x(t) lim supt→+∞˜x(t)=K.From the predator equation,we have˙y(t)<a2y(t),hence,for t>r,y(t) y(t−r)e a2r,which is equivalent,for t>r,toy(t−r) y(t)e−a2r.(8) Moreover,for any >1,there exists positive T ,such that for t>T ,x(t)< K.According to(8),we have,fort>T +r,˙y(t)<y(t)a2−c2e−a2rK+k2y(t),which implies by the same arguments use for x that,lim supt→+∞y(t) L ,where L =a2/c2( K+k2)e a2r.Conclusion of this lemma holds by letting →1.3.EquilibriaIn this section we study some equilibria properties of system(2)–(3).These steady states are determined analytically by setting˙x=˙y=0.They are independent of the delay r.It is easy to verify that this system has three trivial boundary equilibria,E0=(0,0),E1=(a1/b,0)and E2(0,a2k2/c2).1108 A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–1118Proposition 3.System (2)–(3)has a unique interior equilibrium E ∗=(x ∗,y ∗)(i.e.x ∗>0and y ∗>0)if the following condition holdsa 2k 2c 2<a 1k 1c 1.(9)Proof.From system (2)–(3),such a point satisfies(a 1−bx ∗)(x ∗+k 1)=c 1y ∗(10)andy ∗=a 2(x ∗+k 2)c 2.(11)If (9)holds,this system has two solutions (x ∗+,y ∗+)and (x ∗−,y ∗−)given by ⎧⎪⎪⎨⎪⎪⎩x ∗±=12c 2b (−(c 1a 2−a 1c 2+c 2bk 1)± 1/2),y ∗±=a 2(x ∗±+k 2)c 2,where =(c 1a 2−a 1c 2+c 2bk 1)2−4c 2b(c 1a 2k 2−c 2a 1k 1)>0.Moreover,it is easy to see that,x ∗+>0and x ∗−<0. 
Linear analysis of system (2)–(3)shows that point E 0is unstable (it repels in both x and y directions)and point E 1is also unstable (it attracts in the x -direction but repels in the y -direction).For r >0,the characteristic equation of the linearized system at E 2takes the formP 2( )+Q 2( )e − r =0,whereP 2( )= 2−A ,Q 2( )=a 2( −A),andA =a 1−c 1a 2k 2c 2k 1.Let us defineF 2(y)=|P 2(iy)|2−|Q 2(iy)|2.It is easy to verify that the equation F 2(y)=0has one positive root.Therefore,if E 2is unstable for r =0,it will remain so for all r >0,and if it is stable for r =0,there is a positive constant r 2,such that for r >r 2,E 2becomes unstable.It is easy to verify that for r =0,E 2is asymptotically stable if a 2k 2/c 2>a 1k 1/c 1,stable (but not asymptotically)if a 2k 2/c 2=a 1k 1/c 1and unstable if a 2k 2/c 2<a 1k 1/c 1.4.Permanence resultsDefinition 4.System (2)–(3)is said to be permanent,see [4],if there exist , ,0< < ,independent of the initial condition,such that for all solutions of this system,min lim inf t →+∞x(t),lim inf t →+∞y(t)A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–11181109andmaxlim sup t →+∞x(t),lim sup t →+∞y(t).Theorem 5.System (2)–(3)is permanent ifL <a 1k 1c 1.(12)Proof.By Lemma (2),there is a =max {K,L }>0independent of the initial condition such thatmax lim sup t →+∞x(t),lim sup t →+∞y(t) .We only need to show that there is a >0,independent of the initial data,such thatmin lim inf t →+∞x(t),lim inf t →+∞y(t) .It is easy to see that,for system (2)–(3),for any >1and for t large enough,we have y(t)< L .Thus,we obtain˙x >xa 1−bx −c 1 Lk 1 .By standard comparison arguments,it follows thatlim inf t →+∞x(t)1ba 1−c 1 L k 1 and letting →1,we obtainlim inf t →+∞x(t)1ba 1−c 1Lk 1.(13)Let us denote by N 1=1/b(a 1−c 1L/k 1).If (12)is satisfied,N 1>0.From (13)and Lemma 2and for any >1,there exists a positive constant,T ,such that for t >T ,x(t)>N 1/ and y(t)< L .Then,for t >T +r ,we have˙y(t)>y(t)a 2−c 2N 1+ k 2y(t −r) .(14)On the one hand,for t >T +r ,these inequalities lead to˙y(t)>−2c 2LN 1+ k 2y(t),which involves,for t >T +r ,y(t −r)<y(t)exp2c 2LN 12r .(15)On the other hand,from (14)and (15),we have for t >T +r ,˙y(t)>y(t)a 2−c 2N 1+ k 2exp 2c 2LN 1+ k 2r y(t) which yieldslim inf t →+∞y(t)a 2(N 1+ k 2)2exp − 2c 2LN 12r =y .1110 A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–1118Letting →1,we getlim inf t →+∞y(t) a 2(N 1+k 2)c 2exp −c 2LN 1+k 2r=y 1.Let be =min {N 1,y 1}>0.Then we have shown that system (2)–(3)is permanent.5.Global stability analysis 5.1.Stability of E 2Theorem 6.If a 1−c 1y 1(k 1<0,then E 2is globally asymptotically stable for system (2)–(3).Proof.As lim inf t →+∞y(t) y 1and lim sup t →+∞x(t) K ,from the prey’s equation we obtain:∀ 1,∃T >0,∀t >T ,˙x(t)<x(t)a 1−bx −c 1y 1(k 1+ K).Asa 1−c 1y 1(k 1+K)<0,there exists >1such thata 1−c 1y 1(k 1<0.Then,by standard comparison arguments,it follows that lim sup t →+∞x(t) 0and thus,lim t →+∞x(t)=0.The -limit set of every solution with positive initial conditions is then contained in {(0,y),y 0}.Now from (7),we obviously obtain ⊂{(0,y),0 y L }.As E 0/∈ ,(E 0is unstable is repels in both x and y directions)and as is nonempty closed and invariant set,therefore ={E 2}. 
5.2.Stability of E ∗without delayFirst,we give some sufficient conditions which insure that the steady state in the instantaneous system,i.e.without time delay,is globally asymptotically stable.Theorem 7.The interior equilibrium E ∗is globally asymptotically stable ifa 1+c 1<b(k 1+x ∗),(16)a 1a 2<bk 2(c 2−a 2).(17)Proof.The proof is based on constructing a suitable Lyapunov function.We defineV (x,y)=(x −x ∗)−x ∗ln x x ∗ + (y −y ∗)−y ∗ln yy ∗ ,where =k 2c 1/k 1a 2.A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–11181111This function is defined and continuous on int(R 2+).It is obvious that the function V is zero at the equilibrium E∗and is positive for all other values of x and y ,and thus,E ∗is the global minimum of V .The time derivative of V along a solution of system (2)–(3)is given byd V d t =(x −x ∗) a 1−bx −c 1y x +k 1 + (y −y ∗) a 2−c 2y x +k 2.Centering d V /d t on the positive equilibrium,we getd V d t =−b +a 1−bx ∗x +k 1 (x −x ∗)2−c 2x +k 2(y −y ∗)2+a 2 x +k 2−c 1x +k 1 (x −x ∗)(y −y ∗)= −b +a 1−bx ∗x +k 1(x −x ∗)2−c 2x +k 2(y −y ∗)2(18)+a 2x +k 2+c 1x +k 1 (x −x ∗)2+(y −y ∗)22. −b +a 1−bx ∗k 1+c 1k1(x −x ∗)2+−c 2 x +k 2+c 1k 1(y −y ∗)2 −b +a 1−bx ∗k 1+c 1k 1(x −x ∗)2+c 1k 1(x +k 2)x +k 2−c 2k 2a 2 (y −y ∗)2.(19)From (16),we obtain−b +a 1−bx ∗k 1+c 1k 1<0.From (6)and (17),there exists >1and T >0,such thatK +k 2−c 2k 2a 2<0,and for t >T ,d V d t−b +a 1−bx ∗k 1+c 1k 1(x −x ∗)2+c 1k 1(x +k 2)K +k 2−c 2k 2a 2(y −y ∗)2.Thus,d V /d t is negative definite provided that (16)and (17)holds true.Finally,E ∗is globally asymptoticallystable.If in particular,we suppose that environment provides the same protection to both prey and predator (i.e.k 1=k 2)then Theorem 7can be simplified as follows.Corollary 8.The interior equilibrium E ∗is globally asymptotically stable ifk 1=k 2(20)anda 1<b(k 1+x ∗).(21)1112 A.F .Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–1118Proof.From (18),we haved Vd t =−b +a 1−bx ∗x +k 1 (x −x ∗)2−c 2 x +k 2(y −y ∗)2+c 1(k 2−k 1)xk 1(x +k 2)(x +k 1) (x −x ∗)(y −y ∗) −b +a 1−bx ∗k 1(x −x ∗)2−c 2x +k 2(y −y ∗)2+c 1(k 2−k 1)xk 1(x +k 2)(x +k 1)(x −x ∗)(y −y ∗),which is negative definite provided that (20)and (21)holds true and thus E ∗is globally asymptotically stable.5.3.Stability of E ∗with delayIn this subsection we shall give a result on the global asymptotic stability of the positive equilibrium for the delayed system.Theorem 9.Assume that parameters of system (2)–(3)satisfya 1k 1c 1>maxa 12c 2,32a 2k 2c 2.(22)Then ,for b large enough ,there exists r 0>0such that ,for r ∈[0,r 0],the interior equilibrium E ∗is globally asymp-totically stable in R 2+.Proof.First of all,we rewrite Eqs.(2)to center it on its positive equilibrium.By using the following change of variables,X(t)=lnx(t)x ∗ andY (t)=lny(t)y ∗,the system becomes⎧⎪⎨⎪⎩˙X(t)=x ∗ −b +a 1−bx ∗x(t)+k 1 (e X(t)−1)−c 1y ∗x(t)+k 1(e Y (t)−1),˙Y (t)=−c 2y ∗x(t −r)+k 2(e Y (t −r)−1)+a 2x ∗x(t −r)+k 2(e X(t −r)−1).(23)According to the global existence of solutions established in Lemma 2,we can assume that the initial data exists on [−2r,0](this can be done by changing initial time).Now,let >1be fixed and let us define the following Liapunov functional V :C ([−2r ;0],R 2)→R ,V ( 1, 2)= 1(0)0(e u −1)d u + 2(0)(e u−1)d u +a 2x ∗2k 2 0−r (e 1(u)−1)2d u+c 2y ∗2k 2 0−r 0v e 2(s) c 2y ∗k 2(e 2(s −r)−1)2+a 2x ∗k 2(e 1(s −r)−1)2 d s d v +c 22k 2r L 0−r c 2y ∗k 2(e 2(s)−1)2+a 2x ∗k 2(e 1(s)−1)2 d s .Let us define the continuous and nondecreasing function u :R +→R +byu(x)=e x −x −1.A.F 
.Nindjin et al./Nonlinear Analysis:Real World Applications 7(2006)1104–11181113We have u(0)=0,u(x)>0for x >0andu(| (0)|) u( 1(0))+u( 2(0)) V ( 1, 2),where =( 1, 2)and |·|denotes the infinity norm in R 2.Let v :R +→R +be define byv(x)=2u(x)+(e x −1)2 a 2x ∗r 2k 2+ c 2y ∗k 2+a 2x ∗k 2 c 2y ∗r 24k 2e x +c 22k 2r 2 L .It is clear that v is a continuous and nondecreasing function and satisfies v(0)=0,v(x)>0andV ( 1, 2) v( ),where · denotes the infinity norm in C ([−2r ;0],R 2).Now,let (X,Y )be a solution of (23)and let us compute˙V(23),the time derivative of V along the solutions of (23).First,we start with the functionV 1( 1, 2)= 1(0)0(e u −1)d u + 2(0)0(e u −1)d u +a 2x ∗2k 2 0−r (e 1(u)−1)2d u .We haveV 1(X t ,Y t )=X(t)(e u−1)d u +Y (t)(e u−1)d u +a 2x ∗2k 2t t −r(e X(u)−1)2d u .Then,˙V1|(23)=(e X(t)−1)˙X(t)+(eY (t)−1)˙Y (t)+a 2x ∗2k 2((e X(t)−1)2−(e X(t −r)−1)2).System (23)gives us˙V 1|(23)=x ∗ −b +a 1−bx ∗x(t)+k 1 (e X(t)−1)2−c 1y ∗x(t)+k 1(e Y (t)−1)(e X(t)−1)−c 2y∗x(t −r)+k 2(e Y (t)−1)(e Y (t −r)−1)+a 2x ∗x(t −r)+k 2(e Y (t)−1)(e X(t −r)−1)+a 2x ∗2k 2((e X(t)−1)2−(e X(t −r)−1)2).By using several times the obvious inequalities of type−c 1y ∗x(t)+k 1(e Y (t)−1)(e X(t)−1) c 1y ∗x(t)+k 1 (e Y (t)−1)22+(e X(t)−1)22c 1y ∗2k 1((e Y (t)−1)2+(e X(t)−1)2),we get˙V1|(23)x ∗−b +a 1−bx ∗k 1+a 22k 2 +c 1y ∗2k 1 (e X(t)−1)2+c 1y ∗2k 1+a 2x ∗2k 2 (e Y (t)−1)2−c 2y ∗x(t −r)+k 2(e Y (t)−1)(e Y (t −r)−1).As,e Y(t−r)=e Y(t)− tt−re Y(s)˙Y(s)d s,we obtain˙V1|(23)x∗−b+a1−bx∗k1+a22k2+c1y∗2k1(e X(t)−1)2+c1y∗2k1+a2x∗2k2−c2y∗x(t−r)+k2(e Y(t)−1)2+c2y∗x(t−r)+k2(e Y(t)−1)tt−re Y(s)˙Y(s)d s.(24)From(23),we have(e Y(t)−1) tt−re Y(s)˙Y(s)d s= tt−re Y(s)−c2y∗x(s−r)+k2(e Y(t)−1)(e Y(s−r)−1)d s +tt−re Y(s)a2x∗x(s−r)+k2(e Y(t)−1)(e X(s−r)−1)d sc2y∗2k2+a2x∗2k2tt−re Y(s)d s.(e Y(t)−1)2+c2y∗2k2tt−re Y(s)(e Y(s−r)−1)2d s+a2x∗2k2tt−re Y(s)(e X(s−r)−1)2d sand as y∗e Y(s) L for s large enough,we obtain,for t large enough,(e Y(t)−1) tt−re Y(s)˙Y(s)d s Lrc22k2+a2x∗2k2y∗(e Y(t)−1)2+ tt−re Y(s)c2y∗2k2(e Y(s−r)−1)2+a2x∗2k2(e X(s−r)−1)2d sand as x(s−r) L for s large enough,(24)becomes,for t large enough,˙V1|(23)x∗−b+a1−bx∗k1+a22k2+c1y∗2k1(e X(t)−1)2+c1y∗2k1+a2x∗2k2−c2y∗x(t−r)+k2+c2Lrk2c2y∗2k2+a2x∗2k2(e Y(t)−1)2+c2y∗k2tt−re Y(s)c2y∗2k2(e Y(s−r)−1)2+a2x∗2k2(e X(s−r)−1)2d s.(25)The next step consists in computation of the time derivative along the solution of(23),of the termV2( 1, 2)=c2y∗2k2−rve 2(s)c2y∗k2(e 2(s−r)−1)2+a2x∗k2(e 1(s−r)−1)2d s d v+c22k2r L−rc2y∗k2(e 2(s)−1)2+a2x∗k2(e 1(s)−1)2d s.We haveV2(X t,Y t)=c2y∗2k2tt−rtve Y(s)c2y∗k2(e Y(s−r)−1)2+a2x∗k2(e X(s−r)−1)2d s d v+c22k2r Ltt−rc2y∗k2(e Y(s)−1)2+a2x∗k2(e X(s)−1)2d s.Then,˙V2|(23)=c2y∗2k2e Y(t)c2y∗k2(e Y(t−r)−1)2+a2x∗k2(e X(t−r)−1)2r−c2y∗2k2tt−re Y(s)c2y∗k2(e Y(s−r)−1)2+a2x∗k2(e X(s−r)−1)2d s+c22k2r Lc2y∗k2(e Y(t)−1)2+a2x∗k2(e X(t)−1)2−c22k2r Lc2y∗k2(e Y(t−r)−1)2+a2x∗k2(e X(t−r)−1)2=c2r2k2c2y∗k2(e Y(t−r)−1)2+a2x∗k2(e X(t−r)−1)2(y∗e Y(t)− L)−c2y ∗2k2 tt−re Y(s)c2y∗k2(e Y(s−r)−1)2+a2x∗k2(e X(s−r)−1)2d s+c22k2r Lc2y∗k2(e Y(t)−1)2+a2x∗k2(e X(t)−1)2.For t large enough,we have y∗e Y(t)− L<0and thus,˙V2|(23) −c2y∗2k2tt−re Y(s)c2y∗k2(e Y(s−r)−1)2+a2x∗k2(e X(s−r)−1)2d s+c22k2r Lc2y∗k2(e Y(t)−1)2+a2x∗k2(e X(t)−1)2.This inequality and(25)lead to,for t large enough,˙V| (23)x∗−b+a1−bx∗k1+a22k2+c1y∗2k1(e X(t)−1)2+c1y∗2k1+a2x∗2k2−c2y∗x(t−r)+k2+c2Lrk2c2y∗2k2+a2x∗2k2(e Y(t)−1)2+c22k2r Lc2y∗k2(e Y(t)−1)2+a2x∗k2(e X(t)−1)2−bx∗(k1+x∗)k1+a1x∗k1+a2x∗2k2+c1y∗2k1+c2a2x∗r L2k22(e X(t)−1)2+c1y∗2k1+a2x∗2k2−c2y∗x(t−r)+k2+c2Lrk2c2y∗k2+a2x∗2k2(e Y(t)−1)2.Now,if−bx 
∗(k1+x∗)k1+a1x∗k1+a2x∗2k2+c1y∗2k1<0(26)andc1y∗2k1+a2x∗2k2−c2y∗x(t−r)+k2<0(27)then,for r small enough,we can conclude that ˙V|(23)is negative definite.Hence V satisfies all the assumptions of Corollary 5.2in [4]and the theorem follows.We now study when inequalities (26)and (27)hold.From (10),we have −bx ∗(k 1+x ∗)=c 1(y ∗−a 1(k 1+x ∗)/c 1)and using (11),(26)becomes32c 1k 1a 2(x ∗+k 2)c 2−a 1+a 2x ∗2k 2<0which is rewritten as x ∗a 2k 1c 1k 2+3a 2c 2<2a 1k 1c 1−3a 2k 2c 2.(28)As x ∗ a 1b ,then the following inequalitya 1a 2k 1c 1k 2+3a 2c 2 <b 2a 1k 1c 1−3a 2k 2c 2(29)implies (28).Now,if the following inequality holdsc 1y ∗2k 1+a 2x ∗2k 2−c 2y ∗a 1b+k 2<0,(30)then,for t large enough,(27)holds ing (11),(30)is reformulated asc 1y ∗2k 1+c 2y ∗−a 2k 22k 2<c 2y ∗a1b+k 2c 12k 1+c 22k 2<c 2a 1b+k 2+a 22y ∗c 12k 1+c 22k 2<c 2a 1b +k 2+c 22(x ∗+k 2)c 1k 1<c 2⎛⎜⎝2a 1b +k 2+1x ∗2−1k 2⎞⎟⎠.As x ∗ a 1/b ,then the last inequality is satisfied ifc 1k 1<c 2⎛⎜⎝3a 1b +k 2−1k 2⎞⎟⎠(31)that is ifa 1c 1k 2k 1+a 1c 2<b k 2k 1(2c 2k 1−c 1).(32)In conclusion,ifa 1k 1c 1>maxa 12c 2,32a 2k 2c 2and for b large enough,there exists a unique interior equilibrium and for r small enough it is globally asymptoticallystable.6.DiscussionIt is interesting to discuss the effect of time delay r on the stability of the positive equilibrium of system (2)–(3).We assume that positive equilibrium E ∗exists for this system.Linearizing system (2)–(3)at E ∗,we obtain˙X(t)=A 11X(t)+A 12Y (t),˙Y (t)=A 21X(t −r)+A 22Y (t −r),(33)whereA 11=−bx ∗+c 1x ∗y ∗(x ∗+k 1)2,A 12=−c 1x ∗x ∗+k 1,A 21=c 2(y ∗)2(x ∗+k 2)2=a 22c 2andA 22=−c 2y ∗x ∗2=−a 2.The characteristic equation for (33)takes the formP ( )+Q( )e − r =0(34)in whichP ( )= 2−A 11 andQ( )=−A 22 +(A 11A 22−A 12A 21).When r =0,we observe that the jacobian matrix of the linearized system isJ = A 11A 12A 21A 22 .One can verify that the positive equilibrium E ∗is stable for r =0provided thata 1<bk 1.(35)Indeed,when (35)holds,then it is easy to verify thatT r(J )=A 11+A 22<0andDet (J )=A 11A 22−A 12A 21>0.By denotingF (y)=|P (iy)|2−|Q(iy)|2,we haveF (y)=y 4+(A 211−A 222)y 2−(A 11A 22−A 12A 21)2.If (35)holds,then =0is not a solution of (34).It is also easy to verify that if (35)holds,then equation F (y)=0has at least one positive root.By applying standard theorem on the zeros of transcendental equation,see [4,Theorem4.1],we see that there is a positive constant r0(which can be evaluated explicitly),such that for r>r0,E∗becomes unstable.Then,the global stability of E∗involves restrictions on length of time delay r.Therefore,it is obvious that time delay has a destabilized effect on the positive equilibrium of system(2)–(3).References[1]M.A.Aziz-Alaoui,Study of Leslie–Gower-type tritrophic population,Chaos Solitons Fractals14(8)(2002)1275–1293.[2]M.A.Aziz-Alaoui,M.Daher Okiye,Boundeness and global stability for a predator–prey model with modified Leslie–Gower and Holling-typeIIschemes,Appl.Math.Lett.16(2003)1069–1075.[3]E.Beretta,Y.Kuang,Global analyses in some delayed ratio-depended predator–prey systems,Nonlinear Anal.Theory Methods Appl.32(3)(1998)381–408.[4]Y.Kuang,Delay Differential Equations,with Applications in Population Dynamics,Academic Press,New York,1993.[5]R.K.Upadhyay,V.Rai,Crisis-limited chaotic dynamics in ecological systems,Chaos Solitons Fractals12(2)(2001)205–218.[6]R.K.Upadhyay,S.R.K.Iyengar,Effect of seasonality on the dynamics of2and3species prey–predator system,Nonlinear Anal.:Real WorldAppl.6(2005)509–530.[7]R.Xu,M.A.J.Chaplain,Persistence and global stability in a delayed 
predator–prey system with Michaelis–Menten type functional response,put.130(2002)441–455.。
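The delayed model (2) discussed in the paper above can be explored numerically. The sketch below is not from the paper: it integrates system (2) with a fixed-step Euler scheme and a constant initial history on [-r, 0], and every parameter value is an illustrative assumption, chosen only to be positive as the model requires.

```python
import numpy as np

# Assumed, purely illustrative parameters for system (2); the paper only requires positivity.
a1, b, c1, k1 = 1.0, 0.5, 1.0, 1.0
a2, c2, k2 = 0.8, 1.0, 1.0
r, dt, T = 0.5, 0.001, 50.0            # delay, Euler step, time horizon

n_delay = int(round(r / dt))
n_steps = int(round(T / dt))
x = np.empty(n_steps + 1)
y = np.empty(n_steps + 1)
x[0], y[0] = 0.6, 0.4                  # constant initial history phi1, phi2 > 0 on [-r, 0]

for n in range(n_steps):
    if n >= n_delay:                   # delayed state x(t - r), y(t - r)
        x_del, y_del = x[n - n_delay], y[n - n_delay]
    else:                              # still inside the constant initial history
        x_del, y_del = x[0], y[0]
    dx = (a1 - b * x[n] - c1 * y[n] / (x[n] + k1)) * x[n]
    dy = (a2 - c2 * y_del / (x_del + k2)) * y[n]
    x[n + 1] = x[n] + dt * dx
    y[n + 1] = y[n] + dt * dy

print(f"x(T) ~ {x[-1]:.4f}, y(T) ~ {y[-1]:.4f}")
```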
Bayesian Mixed Effects Models

1. Introduction
A Bayesian mixed effects model is a statistical modeling approach commonly used to analyze data with hierarchical structure and repeated measurements.
The model combines ideas from Bayesian statistics and mixed effects models: it can model both individual-level and group-level variation, and it estimates parameters through the posterior distribution.
This note introduces the basic concepts of Bayesian mixed effects models, the modeling workflow, and their use in practical data analysis.
It also discusses the strengths and limitations of the model and points to some related resources for further study.
2. Foundations of Bayesian Statistics
Before introducing the Bayesian mixed effects model, we first review the basic concepts of Bayesian statistics.

2.1 Bayes' Formula
Bayes' formula is the core idea of Bayesian statistics: it describes how beliefs about a parameter are updated in light of observed data.
Let θ be the parameter to be estimated and x the observed data. By Bayes' formula the posterior probability is

    P(θ | x) = P(x | θ) P(θ) / P(x),

where P(x | θ) is the likelihood function, the probability of observing the data x given the parameter θ; P(θ) is the prior probability, representing the beliefs about θ before seeing the data; and P(x) is the marginal probability of observing the data x.
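As a concrete illustration of this formula (not part of the original note), the following sketch approximates the posterior of a coin-bias parameter θ on a grid, using a uniform prior and a binomial likelihood; the data (7 heads in 10 flips) are invented for the example.

```python
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.001, 0.999, 999)         # grid of candidate parameter values
prior = np.full_like(theta, 1.0 / theta.size)  # flat prior P(theta)

heads, flips = 7, 10                           # hypothetical observed data x
likelihood = binom.pmf(heads, flips, theta)    # P(x | theta) at every grid point

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()  # dividing by the sum approximates P(x)

print("posterior mean of theta:", float((theta * posterior).sum()))
```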
2.2 Bayesian Models
Bayesian statistics treats parameters as random variables and introduces prior distributions to describe the uncertainty about them.
In a Bayesian model, the posterior distribution is computed from the likelihood function and the prior distribution, which yields sharper inference about the parameters.
Common Bayesian models include linear regression models and mixed effects models.
Among these, the mixed effects model is widely used in multilevel data analysis.

3. Foundations of Mixed Effects Models
A mixed effects model, also known as a hierarchical linear model, is a statistical modeling method for analyzing data with hierarchical structure and repeated measurements.

3.1 Model Structure
A mixed effects model divides the data into levels and assumes that each level has its own random effects.
The basic structure of the model can be written as

    y_ij = X_ij β + Z_ij b_i + ε_ij,

where y_ij is the observation of individual i at level j; X_ij and Z_ij are the design matrices for the fixed effects and the random effects, respectively; β is the vector of fixed-effect coefficients; b_i is the random effect of individual i; and ε_ij is the error term.
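To make the notation concrete, here is a small simulation sketch (not from the original note) that generates data with exactly this structure; the dimensions, coefficient values, and variance parameters are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_obs = 30, 8          # hypothetical numbers of individuals and observations per individual

beta = np.array([1.0, -0.5])          # fixed-effect coefficients (assumed)
sigma_b, sigma_e = 0.7, 0.3           # random-effect and error standard deviations (assumed)

data = []
for i in range(n_individuals):
    b_i = rng.normal(0.0, sigma_b, size=1)        # random effect b_i for individual i
    for j in range(n_obs):
        X_ij = np.array([1.0, rng.normal()])      # fixed-effect design row: intercept + covariate
        Z_ij = np.array([1.0])                    # random-effect design row: random intercept
        eps_ij = rng.normal(0.0, sigma_e)         # error term
        y_ij = X_ij @ beta + Z_ij @ b_i + eps_ij  # y_ij = X_ij beta + Z_ij b_i + eps_ij
        data.append((i, j, float(y_ij)))

print(data[:3])
```

In a fully Bayesian treatment one would additionally place priors on β, σ_b, and σ_e and sample their joint posterior, for example with an MCMC library.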
Bayes' Formula
王社英
November 7, 2015

Abstract: I had never really understood Bayes' formula; it is said to be important and widely used, so I went back to the textbooks, summarized how computations with it are carried out, and tried to understand what the theorem means.

Contents: 1 Bayes' formula; 2 Computation; 3 Interpretation

1 Bayes' formula
Theorem 1.1. Let E be a random experiment and let B, A_1, A_2, ..., A_n be events of E satisfying:
(1) P(A_i) > 0, i = 1, 2, ..., n;
(2) the events A_1, A_2, ..., A_n form a partition of the sample space;
(3) P(B) > 0.
Then

    P(A_i | B) = P(A_i B) / P(B) = P(A_i) P(B | A_i) / Σ_{j=1}^{n} P(A_j) P(B | A_j),   i = 1, 2, ..., n.   (1)

Equation (1) is easily proved from the definition of conditional probability and the law of total probability. Equation (1) is called Bayes' formula, also known as the formula of inverse probability. It is a famous formula in probability theory, first put forward by the English scholar Bayes.
贱2计算方法贝叶斯公式的计算可以画一个图,或者叫做概率树贱P 质A 2贩P 质贖B|A 2贩P 质B |A 2贩P 质A 1贩P 质贖B|A 1贩P 质B |A 1贩贝叶斯公式的计算就是两层贮•第一层的子树数目不定,最常见的是两个;•第二层的子树是确定的,就是两个;•要把第二层的位置放整齐贬含有B 全部放在上面贮那么,贝叶斯公式的计算,就可以流程化了。
P 质A i 贩P 质B |A i 贩 n j =1P 质A j 贩P 质B |A j 贩,i 贽贱,贲,···,n.质贲贩可以把P 质A i 贩P 质B |A i 贩视为根节点P 质A i 贩与子节点P 质B |A i 贩的乘积。
把P 质A i 贩从上到下依次计算。
当我们计算时,•如果计算事件B 发生了,那么所有出现贖B的项可以全部忽略贻•如果计算事件贖B发生了,那么所有出现B 的项可以全部忽略如果我们定义向量•x 贽质P 质A 1贩,P 质A 2贩,···,P 质A n 贩贩贻•y 贽质P 质B |A 1贩,P 质B |A 2贩,···,P 质B |A n 贩贩贻•¯y 贽质P 质贖B|A 1贩,P 质贖B |A 2贩,···,P 质贖B |A n 贩贩贮那么贝叶斯公式的计算可以用向量表示如下贺贲•事件B已经发生的情况下,事件A i发生的概率更新为贺P质A i贩贽x i·y ix·y,i贽贱,贲,...,n.•事件贖B已经发生的情况下,事件A i发生的概率更新为贺P质A i贩贽x i·贖y i x·¯y3意义赛贱贬走赡赧赥赳贲贰购贲贳赝在全概率公式和贝叶斯公式中,如果我们把事件B看成“结果”,而把事件A1,A2,···,A n看成导致结果发生的可能“原因”,则可以形象的把全概率公式看成“由原因推结果”贻而贝叶斯公式恰好相反,其作用在于“由结果找原因”贮在贝叶斯公式中,称P质A i贩为事件A i的先验概率,称P质A i|B贩为事件A i的后验概率,贝叶斯公式是专门用于计算后验概率的贮也就是说,在没有更多的信息质不知事件B是否发生贩的情况下,人们对诸事件A i,A2,···,A n发生的可能性有一个最初的认识贮当有了新的信息质知道事件B已经发生贩贬人们对A i,A2,···,A n发生的可能性大小就有了新的估计贮下面的例子很好的说明了这一点贮例3.1.伊索寓言“孩子与狼”讲的是一个小孩每天到山上放羊,山里有狼出没贮有一天,他闲得无聊在山上喊:“狼来了!狼来了!”,山下的村民闻声便去打狼,可是到了山上发现并没有狼贮第二天仍是如此贮第三天狼真的来了,可是无论小孩怎么叫喊,也没有人来救他,因为前两次他说了谎,人们不再相信他了贮现在用贝叶斯公式来分析寓言中村民对这个小孩的可信程度是如何下降的贮首先记A贽{小孩说谎}贬B贽村民相信小孩的话质即小孩可信贩贮不妨设村民过去对这个小孩的印象为P质B贩贽贰.贸,P质贖B贩贽贰.贲,现在用贝叶斯公式来求P质B|A贩,即小孩说了一次谎后,村民对他可信程度的改变贮在贝叶斯公式中我们要用到概率P质A|B贩和P质A|贖B贩贬这两个概念的含义是:前者为“可信”的孩子说谎的可能性,后者为“不可信”的孩子说谎的可能性贮在此不妨设P质A|B贩贽贰.贱,P质A|贖B贩贽贰.贵.第一次村民上山打狼,发现狼没有来,即小孩说了谎贮村民根据这个信息,对这个小孩的可信程度改变为质由贝叶斯公式贩P质B|A贩贽P质B贩P质A|B贩P质B贩P质A|B贩贫P质贖B贩P质A|贖B贩贽贰.贸×贰.贱贰.贸×贰.贱贫贰.贲×贰.贵贽贰.贴贴贴,贳这表明村民上了一次当后,对这个小孩的可信程度由原来的贰.贸调整为贰.贴贴贴贬即P质B贩贽贰.贴贴贴,P质贖B贩贽贰.贵贵贶,在此基础上,我们再一次利用贝叶斯公式来计算P质B|A贩贬即小孩第二次说谎后,村民对他的可信度改变为P质B|A贩贽贰.贴贴贴×贰.贱贰.贴贴贴×贰.贱贫贰.贵贵贶贽贰.贱贳贸,这表明村民经过两次上当,对这个小孩的可信程度由原来的贰.贸下降到了贰.贱贳贸贬如此低的可信度,村名们听到第三次呼叫时怎么会再上山打狼呢贮贝叶斯公式解释了一种直观的现象,一个说谎的人真的说谎了,他的可信度更低,一个被检测有病的人真的有病了,那么以后就对诊断结果会更相信。
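The vector form above translates directly into code. Here is a minimal sketch (not part of the original note) that replays the numbers of Example 3.1, treating {trustworthy, untrustworthy} as the partition and "the boy lies" as the observed event.

```python
import numpy as np

def bayes_update(x, y):
    """Posterior over the partition after the event occurs: x_i * y_i / (x . y)."""
    return x * y / np.dot(x, y)

x = np.array([0.8, 0.2])   # priors: P(trustworthy), P(untrustworthy)
y = np.array([0.1, 0.5])   # P(lie | trustworthy), P(lie | untrustworthy)

x = bayes_update(x, y)     # after the first lie
print(x)                   # approximately [0.444, 0.556]
x = bayes_update(x, y)     # after the second lie
print(x)                   # approximately [0.138, 0.862]
```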
The Cross-Section of V olatility and Expected Returns∗Andrew Ang†Columbia University,USC and NBERRobert J.Hodrick‡Columbia University and NBERYuhang Xing§Rice UniversityXiaoyan Zhang¶Cornell UniversityThis Version:9August,2004∗We thank Joe Chen,Mike Chernov,Miguel Ferreira,Jeff Fleming,Chris Lamoureux,Jun Liu,Lau-rie Hodrick,Paul Hribar,Jun Pan,Matt Rhodes-Kropf,Steve Ross,David Weinbaum,and Lu Zhang for helpful discussions.We also received valuable comments from seminar participants at an NBER Asset Pricing meeting,Campbell and Company,Columbia University,Cornell University,Hong Kong University,Rice University,UCLA,and the University of Rochester.We thank Tim Bollerslev,Joe Chen,Miguel Ferreira,Kenneth French,Anna Scherbina,and Tyler Shumway for kindly providing data. We especially thank an anonymous referee and Rob Stambaugh,the editor,for helpful suggestions that greatly improved the article.Andrew Ang and Bob Hodrick both acknowledge support from the NSF.†Marshall School of Business,USC,701Exposition Blvd,Room701,Los Angeles,CA90089.Ph: 2137405615,Email:aa610@,WWW:/∼aa610.‡Columbia Business School,3022Broadway Uris Hall,New York,NY10027.Ph:(212)854-0406, Email:rh169@,WWW:/∼rh169.§Jones School of Management,Rice University,Rm230,MS531,6100Main Street,Houston TX 77004.Ph:(713)348-4167,Email:yxing@;WWW:/yxing ¶336Sage Hall,Johnson Graduate School of Management,Cornell University,Ithaca NY14850. Ph:(607)255-8729Email:xz69@,WWW:/faculty/pro-files/xZhang/AbstractWe examine the pricing of aggregate volatility risk in the cross-section of stock returns. Consistent with theory,wefind that stocks with high sensitivities to innovations in aggregate volatility have low average returns.In addition,wefind that stocks with high idiosyncratic volatility relative to the Fama and French(1993)model have abysmally low average returns. 
This phenomenon cannot be explained by exposure to aggregate volatility risk.Size,book-to-market,momentum,and liquidity effects cannot account for either the low average returns earned by stocks with high exposure to systematic volatility risk or for the low average returns of stocks with high idiosyncratic volatility.1IntroductionIt is well known that the volatility of stock returns varies over time.While considerable research has examined the time-series relation between the volatility of the market and the expected re-turn on the market(see,among others,Campbell and Hentschel(1992),and Glosten,Jagan-nathan and Runkle(1993)),the question of how aggregate volatility affects the cross-section of expected stock returns has received less attention.Time-varying market volatility induces changes in the investment opportunity set by changing the expectation of future market returns, or by changing the risk-return trade-off.If the volatility of the market return is a systematic risk factor,an APT or factor model predicts that aggregate volatility should also be priced in the cross-section of stocks.Hence,stocks with different sensitivities to innovations in aggregate volatility should have different expected returns.Thefirst goal of this paper is to provide a systematic investigation of how the stochastic volatility of the market is priced in the cross-section of expected stock returns.We want to de-termine if the volatility of the market is a priced risk factor and estimate the price of aggregate volatility risk.Many option studies have estimated a negative price of risk for market volatil-ity using options on an aggregate market index or options on individual stocks.1Using the cross-section of stock returns,rather than options on the market,allows us to create portfolios of stocks that have different sensitivities to innovations in market volatility.If the price of ag-gregate volatility risk is negative,stocks with large,positive sensitivities to volatility risk should have low average ing the cross-section of stock returns also allows us to easily con-trol for a battery of cross-sectional effects,like the size and value factors of Fama and French (1993),the momentum effect of Jegadeesh and Titman(1993),and the effect of liquidity risk documented by P´a stor and Stambaugh(2003).Option pricing studies do not control for these cross-sectional risk factors.Wefind that innovations in aggregate volatility carry a statistically significant negative price of risk of approximately-1%per annum.Economic theory provides several reasons why the price of risk of innovations in market volatility should be negative.For example,Campbell (1993and1996)and Chen(2002)show that investors want to hedge against changes in mar-ket volatility,because increasing volatility represents a deterioration in investment opportuni-ties.Risk averse agents demand stocks that hedge against this risk.Periods of high volatility also tend to coincide with downward market movements(see French,Schwert and Stambaugh (1987),and Campbell and Hentschel(1992)).As Bakshi and Kapadia(2003)comment,assets 1See,among others,Jackwerth and Rubinstein(1996),Bakshi,Cao and Chen(2000),Chernov and Ghysels (2000),Burashi and Jackwerth(2001),Coval and Shumway(2001),Benzoni(2002),Jones(2003),Pan(2002), Bakshi and Kapadia(2003),Eraker,Johannes and Polson(2003),and Carr and Wu(2003).with high sensitivities to market volatility risk provide hedges against market downside risk. 
The higher demand for assets with high systematic volatility loadings increases their price and lowers their average return.Finally,stocks that do badly when volatility increases tend to have negatively skewed returns over intermediate horizons,while stocks that do well when volatil-ity rises tend to have positively skewed returns.If investors have preferences over coskewness (see Harvey and Siddique(2000)),stocks that have high sensitivities to innovations in market volatility are attractive and have low returns.2The second goal of the paper is to examine the cross-sectional relationship between id-iosyncratic volatility and expected returns,where idiosyncratic volatility is defined relative to the standard Fama and French(1993)model.3If the Fama-French model is correct,forming portfolios by sorting on idiosyncratic volatility will obviously provide no difference in average returns.Nevertheless,if the Fama-French model is false,sorting in this way potentially provides a set of assets that may have different exposures to aggregate volatility and hence different aver-age returns.Our logic is the following.If aggregate volatility is a risk factor that is orthogonal to existing risk factors,the sensitivity of stocks to aggregate volatility times the movement in aggregate volatility will show up in the residuals of the Fama-French model.Firms with greater sensitivities to aggregate volatility should therefore have larger idiosyncratic volatilities relative to the Fama-French model,everything else being equal.Differences in the volatilities offirms’true idiosyncratic errors,which are not priced,will make this relation noisy.We should be able to average out this noise by constructing portfolios of stocks to reveal that larger idiosyncratic volatilities relative to the Fama-French model correspond to greater sensitivities to movements in aggregate volatility and thus different average returns,if aggregate volatility risk is priced.While high exposure to aggregate volatility risk tends to produce low expected returns,some economic theories suggest that idiosyncratic volatility should be positively related to expected returns.If investors demand compensation for not being able to diversify risk(see Malkiel and Xu(2002),and Jones and Rhodes-Kropf(2003)),then agents will demand a premium for holding stocks with high idiosyncratic volatility.Merton(1987)suggests that in an information-segmented market,firms with largerfirm-specific variances require higher average returns to compensate investors for holding imperfectly diversified portfolios.Some behavioral models, 2Bates(2001)and Vayanos(2004)provide recent structural models whose reduced form factor structures have a negative risk premium for volatility risk.3Recent studies examining total or idiosyncratic volatility focus on the average level offirm-level volatility. 
For example,Campbell,Lettau,Malkiel and Xu(2001),and Xu and Malkiel(2003)document that idiosyncratic volatility has increased over time.Brown and Ferreira(2003)and Goyal and Santa-Clara(2003)argue that id-iosyncratic volatility has positive predictive power for excess market returns,but this is disputed by Bali,Cakici, Yan and Zhang(2004).like Barberis and Huang(2001),also predict that higher idiosyncratic volatility stocks should earn higher expected returns.Our results are directly opposite to these theories.Wefind that stocks with high idiosyncratic volatility have low average returns.There is a strongly significant difference of-1.06%per month between the average returns of the quintile portfolio with the highest idiosyncratic volatility stocks and the quintile portfolio with the lowest idiosyncratic volatility stocks.In contrast to our results,earlier researchers either found a significantly positive relation between idiosyncratic volatility and average returns,or they failed tofind any statistically sig-nificant relation between idiosyncratic volatility and average returns.For example,Lintner (1965)shows that idiosyncratic volatility carries a positive coefficient in cross-sectional regres-sions.Lehmann(1990)alsofinds a statistically significant,positive coefficient on idiosyncratic volatility over his full sample period.Similarly,Tinic and West(1986)and Malkiel and Xu (2002)unambiguouslyfind that portfolios with higher idiosyncratic volatility have higher av-erage returns,but they do not report any significance levels for their idiosyncratic volatility premiums.On the other hand,Longstaff(1989)finds that a cross-sectional regression coeffi-cient on total variance for size-sorted portfolios carries an insignificant negative sign.The difference between our results and the results of past studies is that the past literature either does not examine idiosyncratic volatility at thefirm level or does not directly sort stocks into portfolios ranked on this measure of interest.For example,Tinic and West(1986)work only with20portfolios sorted on market beta,while Malkiel and Xu(2002)work only with 100portfolios sorted on market beta and size.Malkiel and Xu(2002)only use the idiosyncratic volatility of one of the100beta/size portfolios to which a stock belongs to proxy for that stock’s idiosyncratic risk and,thus,do not examinefirm-level idiosyncratic volatility.Hence,by not di-rectly computing differences in average returns between stocks with low and high idiosyncratic volatilities,previous studies miss the strong negative relation between idiosyncratic volatility and average returns that wefind.The low average returns to stocks with high idiosyncratic volatilities could arise because stocks with high idiosyncratic volatilities may have high exposure to aggregate volatility risk, which lowers their average returns.We investigate this issue andfind that this is not a complete explanation.Our idiosyncratic volatility results are also robust to controlling for value,size, liquidity,volume,dispersion of analysts’forecasts,and momentum effects.Wefind the effect robust to different formation periods for computing idiosyncratic volatility and for different holding periods.The effect also persists in both bull and bear markets,recessions and expan-sions,and volatile and stable periods.Hence,our results on idiosyncratic volatility represent a substantive puzzle.The rest of this paper is organized as follows.In Section2,we examine how aggregate volatility is priced in the cross-section of stock 
returns.Section3documents thatfirms with high idiosyncratic volatility have very low average returns.Finally,Section4concludes.2Pricing Systematic Volatility in the Cross-Section2.1Theoretical MotivationWhen investment opportunities vary over time,the multi-factor models of Merton(1973)and Ross(1976)show that risk premia are associated with the conditional covariances between as-set returns and innovations in state variables that describe the time-variation of the investment opportunities.Campbell’s(1993and1996)version of the Intertemporal CAPM(I-CAPM) shows that investors care about risks from the market return and from changes in forecasts of future market returns.When the representative agent is more risk averse than log utility,assets that covary positively with good news about future expected returns on the market have higher average returns.These assets command a risk premium because they reduce a consumer’s abil-ity to hedge against a deterioration in investment opportunities.The intuition from Campbell’s model is that risk-averse investors want to hedge against changes in aggregate volatility because volatility positively affects future expected market returns,as in Merton(1973).However,in Campbell’s set-up,there is no direct role forfluctuations in market volatility to affect the expected returns of assets because Campbell’s model is premised on homoskedastic-ity.Chen(2002)extends Campbell’s model to a heteroskedastic environment which allows for both time-varying covariances and stochastic market volatility.Chen shows that risk-averse in-vestors also want to directly hedge against changes in future market volatility.In Chen’s model, an asset’s expected return depends on risk from the market return,changes in forecasts of future market returns,and changes in forecasts of future market volatilities.For an investor more risk averse than log utility,Chen shows that an asset that has a positive covariance between its return and a variable that positively forecasts future market volatilities causes that asset to have a lower expected return.This effect arises because risk-averse investors reduce current consumption to increase precautionary savings in the presence of increased uncertainty about market returns.Motivated by these multi-factor models,we study how exposure to market volatility risk is priced in the cross-section of stock returns.A true conditional multi-factor representation of expected returns in the cross-section would take the following form:r i t+1=a it+βim,t(r mt+1−γm,t)+βiv,t(v t+1−γv,t)+Kk=1βik,t(f k,t+1−γk,t),(1)where r it+1is the excess return on stock i,βim,tis the loading on the excess market return,βiv,tis the asset’s sensitivity to volatility risk,and theβik,tcoefficients for k=1...K representloadings on other risk factors.In the full conditional setting in equation(1),factor loadings, conditional means of factors,and factor premiums potentially vary over time.The model inequation(1)is written in terms of factor innovations,so r mt+1−γm,t represents the innovation in the market return,v t+1−γv,t represents the innovation in the factor reflecting aggregate volatility risk,and innovations to the other factors are represented by f k,t+1−γk,t.The conditional mean of the market and aggregate volatility are denoted byγm,t andγv,t,respectively,while the conditional mean of the other factors are denoted byγk,t.In equilibrium,the conditional mean of stock i is given by:a i t =E t(r it+1)=βim,tλm,t+βiv,tλv,t+Kk=1βik,tλk,t,(2)whereλm,t is the price of risk of the market 
factor,λv,t is the price of aggregate volatility risk, and theλk,t are prices of risk of the other factors.Note that only if a factor is traded is the conditional mean of a factor equal to its conditional price of risk.The main prediction from the factor model setting of equation(1)that we examine is that stocks with different loadings on aggregate volatility risk have different average returns.4How-ever,the true model in equation(1)is infeasible to examine because the true set of factors is unknown and the true conditional factor loadings are unobservable.Hence,we do not attempt to directly use equation(1)in our empirical work.Instead,we simplify the full model in equation (1),which we now detail.2.2The Empirical FrameworkTo investigate how aggregate volatility risk is priced in the cross-section of equity returns we make the following simplifying assumptions to the full specification in equation(1).First,we use observable proxies for the market factor and the factor representing aggregate volatility risk. We use the CRSP value-weighted market index to proxy for the market factor.To proxy innova-tions in aggregate volatility,(v t+1−γv,t),we use changes in the V IX index from the Chicago 4While an I-CAPM implies joint time-series as well as cross-sectional predictability,we do not examine time-series predictability of asset returns by systematic volatility.Time-varying volatility risk generates intertemporal hedging demands in partial equilibrium asset allocation problems.In a partial equilibrium setting,Liu(2001)and Chacko and Viceira(2003)examine how volatility risk affects the portfolio allocation of stocks and risk-free assets, while Liu and Pan(2003)show how investors can optimally exploit the variation in volatility with options.Guo and Whitelaw(2003)examine the intertemporal components of time-varying systematic volatility in a Campbell (1993and1996)equilibrium I-CAPM.Board Options Exchange(CBOE).5Second,we reduce the number of factors in equation(1) to just the market factor and the proxy for aggregate volatility risk.Finally,to capture the con-ditional nature of the true model,we use short intervals,one month of daily data,to take into account possible time-variation of the factor loadings.We discuss each of these simplifications in turn.Innovations in the V IX IndexThe V IX index is constructed so that it represents the implied volatility of a synthetic at-the-money option contract on the S&P100index that has a maturity of one month.It is constructed from eight S&P100index puts and calls and takes into account the American features of the option contracts,discrete cash dividends and microstructure frictions such as bid-ask spreads (see Whaley(2000)for further details).6Figure1plots the V IX index from January1986to December2000.The mean level of the daily V IX series is20.5%,and its standard deviation is7.85%.Because the V IX index is highly serially correlated with afirst-order autocorrelation of 0.94,we measure daily innovations in aggregate volatility by using daily changes in V IX, which we denote as∆V IX.Dailyfirst differences in V IX have an effective mean of zero(less than0.0001),a standard deviation of2.65%,and also have negligible serial correlation(the first-order autocorrelation of∆V IX is-0.0001).As part of our robustness checks in Section 2.3,we also measure innovations in V IX by specifying a stationary time-series model for the conditional mean of V IX andfind our results to be similar to using simplefirst differences. 
While∆V IX seems an ideal proxy for innovations in volatility risk because the V IX index is representative of traded option securities whose prices directly reflect volatility risk,there are two main caveats with using V IX to represent observable market volatility.Thefirst concern is that the V IX index is the implied volatility from the Black-Scholes 5In previous versions of this paper,we also considered sample volatility,following Schwert and Stambaugh (1987);a range-based estimate,following Alizadeh,Brandt and Diebold(2002);and a high-frequency estima-tor of volatility from Andersen,Bollerslev and Diebold(2003).Using these measures to proxy for innovations in aggregate volatility produces little spread in cross-sectional average returns.These tables are available upon request.6On September22,2003,the CBOE implemented a new formula and methodology to construct its volatility index.The new index is based on the S&P500(rather than the S&P100)and takes into account a broader range of strike prices rather than using only at-the-money option contracts.The CBOE now uses V IX to refer to this new index.We use the old index(denoted by the ticker V XO).We do not use the new index because it has been constructed by back-filling only to1990,whereas the V XO is available in real-time from1986.The CBOE continues to make both volatility indices available.The correlation between the new and the old CBOE volatility series is98%from1990-2000,but the series that we use has a slightly broader range.(1973)model,and we know that the Black-Scholes model is an approximation.If the true stochastic environment is characterized by stochastic volatility and jumps,∆V IX will reflect total quadratic variation in both diffusion and jump components(see,for example,Pan(2002)). 
Although Bates(2000)argues that implied volatilities computed taking into account jump risk are very close to original Black-Scholes implied volatilities,jump risk may be priced differ-ently from volatility risk.Our analysis does not separate jump risk from diffusion risk,so our aggregate volatility risk may include jump risk components.A more serious reservation about the V IX index is that V IX combines both stochastic volatility and the stochastic volatility risk premium.Only if the risk premium is zero or constant would∆V IX be a pure proxy for the innovation in aggregate volatility.Decomposing∆V IX into the true innovation in volatility and the volatility risk premium can only be done by writing down a formal model.The form of the risk premium depends on the parameterization of the price of volatility risk,the number of factors and the evolution of those factors.Each different model specification implies a different risk premium.For example,many stochastic volatility option pricing models assume that the volatility risk premium can be parameterized as a linear function of volatility(see,for example,Chernov and Ghysels(2000),Benzoni(2002),and Jones(2003)).This may or may not be a good approximation to the true price of risk.Rather than imposing a structural form,we use an unadulterated∆V IX series.An advantage of this approach is that our analysis is simple to replicate.The Pre-Formation RegressionOur goal is to test if stocks with different sensitivities to aggregate volatility innovations(prox-ied by∆V IX)have different average returns.To measure the sensitivity to aggregate volatility innovations,we reduce the number of factors in the full specification in equation(1)to two,the market factor and∆V IX.A two-factor pricing kernel with the market return and stochastic volatility as factors is also the standard set-up commonly assumed by many stochastic option pricing studies(see,for example,Heston,1993).Hence,the empirical model that we examine is:r i t =β0+βiMKT·MKT t+βi∆V IX·∆V IX t+εit,(3)where MKT is the market excess return,∆V IX is the instrument we use for innovations inthe aggregate volatility factor,andβiMKT andβi∆V IXare loadings on market risk and aggregatevolatility risk,respectively.Previous empirical studies suggest that there are other cross-sectional factors that have ex-planatory power for the cross-section of returns,such as the size and value factors of the Fama and French(1993)three-factor model(hereafter FF-3).We do not directly model these effectsin equation(3),because controlling for other factors in constructing portfolios based on equa-tion(3)may add a lot of noise.Although we keep the number of regressors in our pre-formation portfolio regressions to a minimum,we are careful to ensure that we control for the FF-3factors and other cross-sectional factors in assessing how volatility risk is priced using post-formation regression tests.We construct a set of assets that are sufficiently disperse in exposure to aggregate volatility innovations by sortingfirms on∆V IX loadings over the past month using the regression(3) with daily data.We run the regression for all stocks on AMEX,NASDAQ and the NYSE,with more than17daily observations.In a setting where coefficients potentially vary over time,a 1-month window with daily data is a natural compromise between estimating coefficients with a reasonable degree of precision and pinning down conditional coefficients in an environment with time-varying factor loadings.P´a stor and Stambaugh(2003),among others,also use daily 
data with a1-month window in similar settings.At the end of each month,we sort stocks into quintiles,based on the value of the realizedβ∆V IX coefficients over the past month.Firms in quintile1have the lowest coefficients,whilefirms in quintile5have the highestβ∆V IX loadings. Within each quintile portfolio,we value-weight the stocks.We link the returns across time to form one series of post-ranking returns for each quintile portfolio.Table1reports various summary statistics for quintile portfolios sorted by pastβ∆V IX over the previous month using equation(3).Thefirst two columns report the mean and standard deviation of monthly total,not excess,simple returns.In thefirst column under the heading ‘Factor Loadings,’we report the pre-formationβ∆V IX coefficients,which are computed at the beginning of each month for each portfolio and are value-weighted.The column reports the time-series average of the pre-formationβ∆V IX loadings across the whole sample.By con-struction,since the portfolios are formed by ranking on pastβ∆V IX,the pre-formationβ∆V IX loadings monotonically increase from-2.09for portfolio1to2.18for portfolio5.The columns labelled‘CAPM Alpha’and‘FF-3Alpha’report the time-series alphas of these portfolios relative to the CAPM and to the FF-3model,respectfully.Consistent with the negative price of systematic volatility risk found by the option pricing studies,we see lower average raw returns,CAPM alphas,and FF-3alphas with higher past loadings ofβ∆V IX.All the differences between quintile portfolios5and1are significant at the1%level,and a joint test for the alphas equal to zero rejects at the5%level for both the CAPM and the FF-3model.In particular,the5-1spread in average returns between the quintile portfolios with the highest and lowestβ∆V IX coefficients is-1.04%per month.Controlling for the MKT factor exacerbates the5-1spread to-1.15%per month,while controlling for the FF-3model decreases the5-1 spread to-0.83%per month.Requirements for a Factor Risk ExplanationWhile the differences in average returns and alphas corresponding to differentβ∆V IX loadings are very impressive,we cannot yet claim that these differences are due to systematic volatility risk.We will examine the premium for aggregate volatility within the framework of an uncon-ditional factor model.There are two requirements that must hold in order to make a case for a factor risk-based explanation.First,a factor model implies that there should be contemporane-ous patterns between factor loadings and average returns.For example,in a standard CAPM, stocks that covary strongly with the market factor should,on average,earn high returns over the same period.To test a factor model,Black,Jensen and Scholes(1972),Fama and French(1992 and1993),Jagannathan and Wang(1996),and P´a stor and Stambaugh(2003),among others,all form portfolios using various pre-formation criteria,but examine post-ranking factor loadings that are computed over the full sample period.While theβ∆V IX loadings show very strong patterns of future returns,they represent past covariation with innovations in market volatility. 
We must show that the portfolios in Table1also exhibit high loadings with volatility risk over the same period used to compute the alphas.To construct our portfolios,we took∆V IX to proxy for the innovation in aggregate volatil-ity at a daily frequency.However,at the standard monthly frequency,which is the frequency of the ex-post returns for the alphas reported in Table1,using the change in V IX is a poor approximation for innovations in aggregate volatility.This is because at lower frequencies,the effect of the conditional mean of V IX plays an important role in determining the unanticipated change in V IX.In contrast,the high persistence of the V IX series at a daily frequency means that thefirst difference of V IX is a suitable proxy for the innovation in aggregate volatility. Hence,we should not measure ex-post exposure to aggregate volatility risk by looking at how the portfolios in Table1correlate ex-post with monthly changes in V IX.To measure ex-post exposure to aggregate volatility risk at a monthly frequency,we follow Breeden,Gibbons and Litzenberger(1989)and construct an ex-post factor that mimics aggre-gate volatility risk.We term this mimicking factor F V IX.We construct the tracking portfolio so that it is the portfolio of asset returns maximally correlated with realized innovations in volatility using a set of basis assets.This allows us to examine the contemporaneous relation-ship between factor loadings and average returns.The major advantage of using F V IX to measure aggregate volatility risk is that we can construct a good approximation for innovations in market volatility at any frequency.In particular,the factor mimicking aggregate volatility innovations allows us to proxy aggregate volatility risk at the monthly frequency by simply cumulating daily returns over the month on the underlying base assets used to construct the mimicking factor.This is a much simpler method for measuring aggregate volatility innova-。
Bayes' Theorem: Principles and Applications

Bayes' theorem is a result in probability theory concerning the conditional probabilities and the marginal probability distributions of random variables.
In some interpretations of probability, Bayes' theorem tells us how to revise existing beliefs in the light of new evidence.
In general, the probability of event A conditional on event B (having occurred) differs from the probability of B conditional on A; the two are, however, related in a definite way, and Bayes' theorem is the statement of that relationship.
The Bayes method is a pattern-classification approach used when the prior probabilities and the conditional probabilities are known; the classification of a sample to be labeled depends on the whole body of samples in each class region.
As a general principle, Bayes' theorem is valid under all interpretations of probability; however, frequentists and Bayesians differ on how the probability of a random event should be assigned in applications: frequentists assign probabilities according to the frequency with which the event occurs, or the number of occurrences in the overall sample, whereas Bayesians assign probabilities to uncertain propositions.
This outlook gives Bayesians more occasions to apply Bayes' theorem. The weak point of the Bayes method is that, in practice, the probability distributions of the class populations and the probability distribution (or density) functions of the samples in each class are usually unknown.
Obtaining them requires a sufficiently large sample.
In addition, the Bayes method requires the topic words that represent a text to be mutually independent, a condition that real texts rarely satisfy, so in practice the method often falls short of its theoretical optimum.
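The independence assumption just mentioned is exactly the one used by a naive Bayes text classifier. The following is a minimal sketch; the vocabulary, word counts, and equal class priors are all made up for illustration.

```python
import numpy as np

# Hypothetical word-count matrices for two classes (rows = training documents, columns = vocabulary words).
spam = np.array([[2, 0, 1], [3, 1, 0]])
ham  = np.array([[0, 2, 2], [1, 3, 1]])

def class_log_score(counts, doc, log_prior, alpha=1.0):
    """log P(class) + sum_w doc_w * log P(w | class), with Laplace smoothing alpha."""
    word_totals = counts.sum(axis=0) + alpha
    word_probs = word_totals / word_totals.sum()   # estimated P(w | class)
    return log_prior + np.dot(doc, np.log(word_probs))

doc = np.array([1, 0, 2])                          # word counts of the document to classify
log_prior = np.log(0.5)                            # equal class priors assumed
scores = [class_log_score(spam, doc, log_prior), class_log_score(ham, doc, log_prior)]
print("predicted class:", ["spam", "ham"][int(np.argmax(scores))])
```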
1. Bayes' rule. The task of machine learning: given training data D, determine the best hypothesis in the hypothesis space H. Best hypothesis: one approach is to define it as the most probable hypothesis given the data D together with any prior knowledge about the prior probabilities of the various hypotheses in H. Bayesian theory provides a way to compute the probability of a hypothesis from its prior probability, the probability of observing particular data given the hypothesis, and the data actually observed.

2. Prior and posterior probability. Let P(h) denote the initial probability that hypothesis h holds, before any training data have been seen. P(h) is called the prior probability of h. The prior reflects background knowledge about the chance that h is a correct hypothesis; if no such prior knowledge is available, each candidate hypothesis can simply be assigned the same prior probability. Similarly, P(D) denotes the prior probability of the training data D, and P(D|h) the probability of D given that h holds.
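Combining these quantities via Bayes' rule gives the posterior of a hypothesis; the maximum a posteriori (MAP) notation below is introduced here for illustration and is not defined elsewhere in this text:

\[
P(h \mid D) \;=\; \frac{P(D \mid h)\,P(h)}{P(D)},
\qquad
h_{\mathrm{MAP}} \;=\; \operatorname*{arg\,max}_{h \in H}\, P(D \mid h)\,P(h).
\]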
A - E∙Odd Aalen(英语:Odd Aalen) (1947–)∙戈特弗里德·阿亨瓦尔(英语:Gottfried Achenwall) (1719–1772) ∙亚伯拉罕·曼尼·艾德斯坦(英语:Abraham Manie Adelstein) (1916–1992)∙约翰·艾奇逊(英语:John Aitchison) (1926–)∙亚历山大·艾特肯(英语:Alexander Aitken) (1895–1967)∙赤池弘次 (1927–)∙奥斯卡·安德尔森(英语:Oskar Anderson) (1887–1960)∙彼得·阿米蒂奇(英语:Peter Armitage) (1924–)∙Raghu Raj Bahadur(英语:Raghu Raj Bahadur) (1924-1997)∙乔治·阿尔弗雷德·巴纳德(英语:George Alfred Barnard) (1915–2002)∙莫里斯·史蒂文森·巴特利特(英语:M. S. Bartlett) (1910–2002) ∙Debabrata Basu(英语:Debabrata Basu) (1924–2001)∙劳伦斯·巴克斯特(英语:Laurence Baxter) (1954–1996)∙迈克尔·巴克斯特(英语:Michael Baxter) (1956-)∙托马斯·贝叶斯 (1702–1761)∙唐·贝瑞(英语:Don Berry (statistician))∙艾伦·伯恩鲍姆(英语:Allan Birnbaum) (1923–1976)∙戴维·布莱克威尔(英语:David Blackwell) (1919–)∙拉迪斯劳斯·博特基威茨(英语:Ladislaus Bortkiewicz) (1868–1931) ∙钱德拉·鲍斯(英语:Raj Chandra Bose) (1901–1987)∙亨利·伯顿利(Henry Bottomley) (1963–)∙亚瑟·里昂·鲍利(英语:Arthur Lyon Bowley) (1869–1957)∙George Box(英语:George Box) (1919–)∙Bento de Jesus Caraça(英语:Bento de Jesus Caraça) (1901–1948) ∙伊恩·卡索耳(英语:Ian Castles)∙乔治·查尔默斯(英语:George Chalmers) (1742–1825)∙戴维·高恩·钱珀瑙恩(英语:D. G. Champernowne) (1912–2000) ∙卡尔·沙利叶(英语:Carl Charlier) (1862–1934)∙切比雪夫 (1821–1894)∙赫尔曼·切尔诺夫(英语:Herman Chernoff) (1923–)∙Alexey Chervonenkis(英语:Alexey Chervonenkis) (1938–)∙Alexander Alexandrovich Chuprov(英语:Alexander Alexandrovich Chuprov) (1874–1926)∙科林·克拉克(英语:Colin Clark) (1905–1989)∙理查德·威廉·巴恩斯·克拉克(英语:Richard W. B. Clarke) (1910–1975)∙罗伯特·汉密尔顿·考特(英语:Robert H. Coats) (1874–1960) ∙科克伦(英语:William Gemmell Cochran) (1909–1980)∙Timothy Augustine Coghlan(英语:Timothy Augustine Coghlan) (1856–1926)∙勒拿·库克(英语:Len Cook) (1949–)∙Gauss Moutinho Cordeiro(英语:Gauss Moutinho Cordeiro) (1952–) ∙大卫·科斯(英语:David Cox (statistician)) (1924–)∙Gertrude Mary Cox(英语:Gertrude Mary Cox) (1900–1978)∙理查德·思雷尔克德·考克斯(英语:Richard Threlkeld Cox) (1898–1991)∙哈拉尔德·克拉梅尔 (1893 – 1985)∙菲利普·大卫(英语:Philip Dawid) (1946–)∙德福内梯(英语:Bruno de Finetti) (1906–1985)∙爱德华兹·戴明 (1900–1993)∙戴康尼斯 (1945–)∙理查德·多尔 (1912–2005)∙Karen Dunnell(英语:Karen Dunnell) (1946–)∙ A. Ross Eckler(英语:A. Ross Eckler) (1901–1991)∙ A. Ross Eckler, Jr.(英语:A. Ross Eckler, Jr.) (1927–)∙弗雷德里克·莫尔顿·艾登(英语:Frederick Morton Eden) (1766–1809)∙弗朗西斯·伊西德罗·埃奇沃思(英语:Francis Ysidro Edgeworth)(1845–1926)∙Anthony William Fairbank Edwards(英语:A. W. F. Edwards) (1935–) ∙邱吉尔·艾森哈特(英语:Churchill Eisenhart) (1913–1994)∙恩格尔 (1821–1896)∙安格那·克瑞乐普·尔蓝(英语:Agner Krarup Erlang) (1878–1929) ∙陈希孺(1934-2005)(数理统计学家)[编辑] F - J∙威廉·法尔(英语:William Farr) (1807–1883)∙费希纳(英语:Gustav Fechner) (1801–1887)∙费利奇(英语:Ivan Fellegi) (1935–)∙霍华德·芬克尔(英语:Howard Finkel) (1950-)∙欧文·费雪 (1867–1947)∙罗纳德·费雪 (1890–1962)∙William Fleetwood(英语:William Fleetwood) (1656–1723)∙约翰·福克斯(英语:John Fox (statistician)) (1945–)∙Arnoldo Frigessi(英语:Arnoldo Frigessi) (1959–)∙乔治·盖洛普(英语:George Gallup) (1901–1984)∙法兰西斯·高尔顿 (1822–1911)∙Seymour Geisser(英语:Seymour Geisser) (1929–2004)∙Lyndhurst Giblin(英语:Lyndhurst Giblin) (1872–1951)∙罗伯特·吉芬 (1837–1910)∙科拉多·基尼(英语:Corrado Gini) (1884–1965)∙格拉斯 (统计学家)(英语:Gene V. Glass) (1940–)∙本杰明·贡培兹(英语:Benjamin Gompertz) (1779–1865)∙I. J. Good(英语:I. J. 
Good) (1916–)∙Phillip Good(英语:Phillip Good)(1937–)∙威廉·戈塞 (1876–1937) - 以其笔名“学生氏”(Student)著称∙约翰·格朗特(英语:John Graunt) (1620–1674)∙格林伍德(英语:Major Greenwood) (1880–1949)∙Emil Julius Gumbel(英语:Emil Julius Gumbel) (1891–1966)∙Dimitrie Gusti(英语:Dimitrie Gusti) (1880–1955)∙Pierre Gy(英语:Pierre Gy) (1924–)∙Steven Haberman(英语:Steven Haberman) (1951–)∙安德斯·哈尔德 (1913–)∙Jotun Hein(英语:Jotun Hein) (1956–)∙赫尔默特(英语:Friedrich Robert Helmert) (1843–1917)∙吉姆·亨迪(英语:Jim Hendy) (1905-1961)∙杰克·希伯特(英语:Jack Hibbert) (1932–2005)∙Joseph Hilbe(英语:Joseph Hilbe) (1944-)∙奥斯汀·布莱德福·希尔(英语:Austin Bradford Hill) (1897–1991) ∙Nils Lid Hjort(英语:Nils Lid Hjort) (1953–)∙Wassily Hoeffding(英语:Wassily Hoeffding) (1914–1991)∙赫尔曼·荷勒里斯(英语:Herman Hollerith) (1860–1929)∙蒂姆·霍尔特(英语:Tim Holt (statistician)) (1943–)∙Reginald Hawthorn Hooker(英语:Reginald Hawthorn Hooker) (1867–1944)∙哈罗德·霍特林 (1895–1973)∙达莱尔.哈夫(英语:Darrell Huff) (1913–2001)∙Ion Ionescu de la Brad(英语:Ion Ionescu de la Brad) (1818–1891) ∙约瑟夫·奥斯卡·欧文(英语:Joseph Oscar Irwin) (1898–1982) ∙石川馨 (1915–1989)∙比尔·詹姆斯(英语:Bill James) (1949-)∙埃德温·汤普森·杰尼斯(英语:Edwin Thompson Jaynes) (1922–1998) ∙Gwilym Jenkins(英语:Gwilym Jenkins) (1933–1982)∙William H. Jefferys(英语:William H. Jefferys) (1940–)∙哈罗德·杰弗里斯(英语:Harold Jeffreys) (1891–1989)∙ E. Morton Jellinek(英语:E. Morton Jellinek) (1890–1963)∙威廉姆·斯坦利·杰文斯 (1835–1882)∙诺曼·劳埃德·约翰逊(英语:Norman Lloyd Johnson) (1917–2004) ∙约瑟夫·M·朱兰(英语:Joseph M. Juran) (1904–)∙James Jurin(英语:James Jurin) (1684-1750)∙方开泰∙郭祖超(1912-)[编辑] K - N∙大卫·乔治·肯德尔(英语:David George Kendall) (1918–)∙莫里斯·肯得尔(英语:Maurice Kendall) (1907–1983)∙约翰·景文(英语:John Kingman) (1939–)∙莱斯利·基什(英语:Leslie Kish) (1910–2000)∙安德雷·柯尔莫哥洛夫 (1903–1987)∙Jaromír Korčák(英语:Jaromír Korčák) (1895–1989)∙Joseph Kruskal(英语:Joseph Kruskal) (1929–)∙William Kruskal(英语:William Kruskal) (1919–2005)∙蓝光国(英语:Gordon Lan)∙埃蒂恩·拉斯贝尔(英语:Etienne Laspeyres) (1834–1913)∙约翰·劳(英语:John Law (economist)) (1671–1729)∙Lucien le Cam(英语:Lucien le Cam) (1924–2000)∙威尔赫姆·莱克希斯(英语:Wilhelm Lexis) (1837–1914)∙Jarl Waldemar Lindeberg(英语:Jarl Waldemar Lindeberg) (1876–1932)∙Dennis Lindley(英语:Dennis Lindley) (1923–)∙Yuri Vladimirovich Linnik(英语:Yuri Vladimirovich Linnik) (1915–1972)∙阿弗雷德·洛特卡 (1880–1949)∙菲利普·伦德伯格(英语:Filip Lundberg) (1876–1965)∙Prasanta Chandra Mahalanobis(英语:Prasanta Chandra Mahalanobis)(1893–1972)∙南希·曼特尔(英语:Nathan Mantel) (1919–2002)∙唐纳德·麦夸尔特(英语:Donald Marquardt) (1929–1997)∙Motosaburo Masuyama(英语:Motosaburo Masuyama) (1912–2005) ∙Anderson Gray McKendrick(英语:Anderson Gray McKendrick) (1876–1943)∙克劳斯·莫瑟(英语:Claus Moser, Baron Moser) (1922–)∙菲德里克·莫斯提勒(英语:Frederick Mosteller) (1916–2006) ∙威廉·纽马奇(英语:William Newmarch) (1820–1882)∙耶日·奈曼(英语:Jerzy Neyman) (1894–1981)∙弗罗伦斯·南丁格尔 (1820–1910)∙刘大钧∙孟晓犁 (1963–)[编辑] P - T∙伊曼纽尔·帕尔逊(英语:Emanuel Parzen) (1929–∙伊根·皮尔逊(英语:Egon Pearson) (1895–1980)∙卡尔·皮尔逊 (1857–1936)∙威廉·配第 (1623–1687)∙ E.J.G. Pitman(英语:E.J.G. Pitman) (1897–1993)∙威廉·普莱菲(英语:William Playfair) (1759–1823)∙乔治·波利亚 (1887–1985)∙Alwyn Pritchard(英语:Alwyn Pritchard) (1947–)∙凯特勒 (1796–1874)∙霍华德·瑞法(英语:Howard Raiffa)∙Ştefan Ralescu(英语:Ştefan Ralescu) (1952-)∙ C.R. 劳(英语:C.R. Rao) (1920–)∙Georg Rasch(英语:Georg Rasch) (1901–1980)∙Šarūnas Raudys(英语:Šarūnas Raudys) (1941-)∙Olav Reiersøl(英语:Olav Reiersøl) (1908–2001)∙Brian Ripley(英语:Brian Ripley) (1952–)∙赫伯特·罗宾斯(英语:Herbert Robbins) (1922–2001)∙Irving Rosenwater(英语:Irving Rosenwater) (1932–2006)∙S. N. Roy(英语:S. N. 
Roy) (1906–1966)∙Jeff Sagarin(英语:Jeff Sagarin)∙Leonard Jimmie Savage(英语:Leonard Jimmie Savage) (1917–1971) ∙罗伯特·施赖弗(英语:Robert Schlaifer) (1915–1994)∙阿瑟·舒斯特(英语:Arthur Schuster) (1851–1934)∙豪威·斯奇瓦布(英语:Howie Schwab)∙Tore Schweder(英语:Tore Schweder) (1943–)∙伊丽莎白·斯科特(英语:Elizabeth Scott) (1917–1988)∙Pyotr Semenov-Tyan-Shansky(英语:Pyotr Semenov-Tyan-Shansky)(1827–1914)∙劳埃德·夏普利(英语:Lloyd Shapley) (1923–)∙William Fleetwood Sheppard(英语:William Fleetwood Sheppard)(1863–1936)∙沃特·阿曼德·休哈特(英语:Walter A. Shewhart) (1891–1967) ∙赫伯特·希雪(英语:Herbert Sichel) (1915–1995)∙约翰·辛克莱(英语:John Sinclair (writer)) (1754–1835)∙尤金·斯勒茨基(英语:Eugen Slutsky) (1880–1948)∙阿德里安·史密斯(英语:Adrian Smith (academic))∙Cedric Smith(英语:Cedric Smith) (1917–2002)∙斯内德克(英语:George W. Snedecor) (1881–1974)∙卡尔斯奈德(英语:Carl Snyder) (1869–1946)∙查尔斯·斯皮尔曼(英语:Charles Spearman) (1863–1945)∙David Spiegelhalter(英语:David Spiegelhalter) (1953–)∙乔赛亚·斯坦普(英语:Josiah Stamp, 1st Baron Stamp) (1880–1941) ∙斯蒂芬·施蒂茨勒(英语:Stephen Stigler)∙Dietrich Stoyan(英语:Dietrich Stoyan) (1940–)∙田口玄一 (1924–)∙Thorvald N. Thiele(英语:Thorvald N. Thiele) (1838–1910)∙约翰·塔基(英语:John Tukey) (1915–2000)[编辑] U - Z∙杰西卡·乌兹(英语:Jessica Utts)∙弗拉基米尔·万普尼克 (1935 - )∙亚伯拉罕·瓦尔德(英语:Abraham Wald) (1902–1950)∙克里斯·华莱仕(英语:Chris Wallace (computer scientist)) (1933–2004)∙Waloddi Weibull(英语:Waloddi Weibull) (1887–1979)∙阿诺德·温斯托克(英语:Arnold Weinstock) (1924–2002)∙Harald Ludvig Westergaard(英语:Harald Ludvig Westergaard) (1853–1936)∙Peter Whittle(英语:Peter Whittle) (1927–)∙威尔科克森(英语:Frank Wilcoxon) (1892–1965)∙Martin Wilk(英语:Martin Wilk) (1922–)∙Samuel Stanley Wilks(英语:Samuel Stanley Wilks) (1906–1964) ∙约翰·维希特(英语:John Wishart (statistician)) (1898–1956) ∙Herman Wold(英语:Herman Wold) (1908–1992)∙伍佛维兹(英语:Jacob Wolfowitz) (1910–1981)∙Frank Yates(英语:Frank Yates) (1902–1994)∙Arthur Young(英语:Arthur Young (writer)) (1741–1820)∙犹勒(英语:Udny Yule) (1871–1951)∙齐普夫(英语:George Kingsley Zipf) (1902–1950)∙阿诺德·泽尔纳(英语:Arnold Zellner) (1927–)∙许宝騄(1910-1970)∙薛仲三。
Bayesian surface estimation forwhite light interferometryMichael Hissmann∗Fred A.Hamprecht†2004/07/05AbstractIn conventional white light interferometry(WLI)surface estimation, data acquisition is followed by a preprocessing step in which the3D(space-time)data is reduced to a2D surface or height map.Finally,a postprocessing step eliminates the height outliers that arise from thepreprocessing under ordinary experimental conditions.We introduce a Bayesian approach that unifies pre-and postprocessing by considering simultaneously both the full3D data set and knowledgeconcerning the surface smoothness.This knowledge is coded into the priorprobability of local height configurations.The surface is estimated as themode of the marginal posterior at each pixel.An adept formulation ofthe prior allows for the exact computation of the estimate,obliviatingthe need to sample from the posterior using Markov Chain Monte Carlo(MCMC)methods.A complete surface can thus be obtained in3–30s.A quantitative comparison with(adaptive)medianfiltering shows thatall three approaches decimate outliers,but that the Bayesian estimationleads to smaller average absolute errors.For slow scanning speeds andgood raw data,this is due to a reduced tendency to oversmooth;whilefor poor input data,the enhancement is explained by the more completeexploitation of the observations.1Introduction1.1Significance of white light interferometryWhite light interferometry(WLI)has become an important visual inspection tool infields which require high precision surface height maps.With its short coherence length,WLIfills the precision gap between mechanical ball testing devices(>μm scale)and laser interferometers(<μm scale).While mechanical devices produce1D height profiles only,interferometers can acquire a2D height map in one take,which both opens new analytic possibilities and raises new questions concerning the standardized characterization of surfaces.∗M.Hissmann is with the Robert Bosch GmbH,Stuttgart,Germany(e-mail: michael.hissmann@).†F.A.Hamprecht is with the Interdisciplinary Center for Scientific Computing,University of Heidelberg,Germany(e-mail:fred.hamprecht@iwr.uni-heidelberg.de).1The development of white light interferometers for industry wasfirst guidedby the requirements of semiconductor wafer inspection.In recent years,espe-cially in the automotive sector,the precision of mechanical component man-ufacture has exceeded that of mechanical testers.Machined surfaces that areto be inspected typically exhibit a roughness with a standard deviation above λ/2≈0.4μm(λ:center wavelength of the interferometer)and can usually be considered optically rough.For in-line quality control during the manufacturing process,the require-ments often include robust data acquisition and processing at high speed,whereas a height resolution near or beyond the roughness limit is not needed.In this paper,we propose a new approach for the denoising of WLI data which,although of general applicability,is particularly well suited for the sce-nario depicted above.1.2Data acquisitionFor our purpose,we consider a white light interferometer as a conventional two-arm Michelson interferometer with a broad-band light source.One arm is delimited by the surface under investigation,the other by the reference object, usually a plane mirror.The change of the interference pattern caused by a systematic variation of the length of one of the optical paths is recorded in full spatial detail by means of a camera.The recorded dataset consists of a time series for each camera 
pixel.Each time series contains noise as well as an oscillatory signal centered around the surface height at that pixel,which we aim to detect.WLIs have been proposed as measuring devices in[1],[2]and [3].Recent technical developments address curved surfaces of high industrial relevance[4],[5].The ideal WLI signal has the following form for each pixel(cf.[1]for details):I(z)=¯I+A(z−z0)cos(ϕ(z)).(1) Here,¯I denotes the incoherent intensity at a pixel and A(z)the envelope of the interference signal.Its width is inversely proportional to the spectral width of the light source.For commonly used LED sources,it is approximately a Gaussian centered around z0,the height relative to the reference surface that is to be estimated.The cosine term with phaseϕ(z)comes from the fast inter-ference modulation and is sometimes denoted“inner oscillation”of the signal. Especially for rough surfaces,which the backscattered intensity follows an expo-nential probability distribution[6],the amplitude is often very low.In addition, the signal is degraded with photonic and electronics noise,seefigure1.1.3Conventional height map estimationThe scanning procedure yields a3D data set,consisting of a time series for each pixel in a2D plane.The task then is to embed a surface into the data, thus collapsing each pixel’s height dimension into a unique height value.This surface is subject to restrictions imposed by both the data and prior knowledge concerning the smoothness.In conventional WLI processing,thefinal height map is obtained in three stages:21.The interferometer is translated perpendicularly to the surface under in-vestigation,while its CCD-sensor records the optical interference pattern in a(time/height)series of frames.2.In the preprocessing stage,the center of the interference pattern(centralfringe)is estimated,separately,for each pixel’s time series and stored in a2D matrix,the height map.3.In the postprocessing stage,outliers in the height map are detected,andpossibly corrected,using prior knowledge.Most tools available for gray-value image processing can be adopted for this task.In this paper,we propose an approach completely different in nature(section 2),and compare it to alternative methods detailed in section3.Results are presented in section4.2Bayesian surface estimationWe use a Bayesian approach to amalgamate the last two steps in the scheme above:We aim to reconcile all locally available evidence(that is,the entire noisy time series of a pixel,along with those of its neighbors)with our prior knowledge in a single step.This is in contrast to all previous work,which lets3a postprocessing stage make do with what reduced information is passed on by the preprocessing stage.2.1BackgroundExtending the approach by Hartvig and Jensen[7]from spatially resolved binary to spatially resolved time series data,we aim to accommodate information from both observations and prior knowledge in a Bayesian framework as follows: Consider a set S of pixel sites and a pixel index j∈S.The possible height values h j for each pixel in the height map are from afinite,discrete set˜H,which for convenience we map to H={1,...,h max}.A2D height map h={h j}is apoint in the product space H1×···×H|S|.During the scan process,the camera acquires a3D data set composed of intensities for all pixels and all scan steps/height values.We denote these 3D recording x={x j}with each x j a1D time series of h max intensities. 
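A small synthetic example of the signal model in eq. (1) may help fix ideas before turning to the estimation machinery. This is only a sketch: the background level, envelope width, fringe period, and noise level are made-up illustration values in arbitrary units, not instrument calibrations.

```python
import numpy as np

def wli_signal(z, z0, incoherent=100.0, amplitude=20.0, width=2.0, period=1.6, noise=2.0, rng=None):
    """Synthetic time series following eq. (1): an incoherent background plus a
    Gaussian-enveloped cosine ("inner oscillation") centred on the true height z0."""
    rng = np.random.default_rng() if rng is None else rng
    envelope = amplitude * np.exp(-0.5 * ((z - z0) / width) ** 2)
    fringe = np.cos(2.0 * np.pi * (z - z0) / period)
    return incoherent + envelope * fringe + rng.normal(0.0, noise, size=z.shape)

# One pixel's scan of 40 frames with the surface located at frame 25.
z = np.arange(40.0)
series = wli_signal(z, z0=25.0)
```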
We assume the scanning process is stable in the sense that the height-to-index mapping ˜H → H is linear, so that we can calculate height values directly from the framerate and interframe distance of our interferometer. We will not address interpolation between height values of two adjacent camera frames, although especially for higher speeds, interpolation to a finer discretization is a useful and viable extension (cf. [8], [9] for interpolation strategies in preprocessing).

We look for a 2D height map estimate ĥ, given as the marginal posterior mode estimate (MPME) of the conditional probability density P(h|x). According to the Bayes paradigm, for each pixel j,

\[ P(h_j \mid x_j) \;\sim\; f(x_j \mid h_j)\,P(h_j) \tag{2} \]

P(h_j|x_j) is the a posteriori probability of a height value h_j given the data x_j. The likelihood f(x_j|h_j) describes the probability of the observations given a height value and is determined by the technical characteristics of the interferometer. It plays the role of a data term, linking our estimate to the observations. The a priori probability P(h_j) carries our prior knowledge on the surface, e.g. in the form of local smoothness constraints.

Let C indicate the union of a pixel and its immediate neighborhood. For convenience, we index the elements of C with 0, ..., k = |C|−1, and reserve the index 0 for the central pixel. Throughout the later examples, k = 8 is used and C includes the 8 nearest neighbors of the central pixel. We may write:

\[ P(h_C \mid x_C) \;\sim\; f(x_C \mid h_C)\,P(h_C) \tag{3} \]

with h_C the configuration of height values and x_C all observations in C. To find the marginal posterior probability of a height value for the central pixel given all observations in the surroundings, we sum over all possible height configurations of the neighbors:

\[ P(h_0 \mid x_C) \;=\; \sum_{h_1=1}^{h_{\max}} \cdots \sum_{h_k=1}^{h_{\max}} P(h_C \mid x_C) \tag{4} \]

If the aperture is chosen such that the subjective speckle size matches the camera's pixel dimensions, the likelihoods at neighboring pixels are approximately, but not entirely, independent. Assuming independence, we can factorize the likelihood and obtain

\[ P(h_0 \mid x_C) \;\sim\; \sum_{h_1=1}^{h_{\max}} \cdots \sum_{h_k=1}^{h_{\max}} f(x_C \mid h_C)\,P(h_C) \tag{5} \]

\[ \phantom{P(h_0 \mid x_C)} \;\sim\; f(x_0 \mid h_0) \sum_{h_1=1}^{h_{\max}} \cdots \sum_{h_k=1}^{h_{\max}} \Bigl[\prod_{j=1}^{k} f(x_j \mid h_j)\Bigr] P(h_C) \tag{6} \]

The vast number of possible height configurations in the neighborhood is reflected in the nested summations above. It is obvious that a direct evaluation is computationally prohibitive, even for small neighborhoods: for h_max = 1000 height steps in an 8-pixel neighborhood (k = 8), we face of the order of |H|^{k+1} = 10^{27} operations to find a height estimate for each pixel, provided the likelihoods have been precalculated. However, the estimate can be transferred to the realm of the feasible for a limited class of prior probabilities, by using a manipulation proposed by Hartvig and Jensen [7] and detailed below.

In our application, we wish to cast our prior knowledge, that the surfaces under investigation exhibit smoothness on a macroscopic scale, into a prior probability. A simple way of expressing this knowledge is to assume that those configurations, in which no neighbor has a vertical distance larger than δ from the central height value, have a larger probability than the others:

\[ P(h_C) \;=\; \begin{cases} q_1 & \text{if } h_1,\ldots,h_k \in [h_0-\delta,\, h_0+\delta] \\ q_0 & \text{otherwise} \end{cases} \tag{7} \]

q_0 and q_1 obey the following normalization condition (which neglects minor boundary effects):

\[ 1 \;=\; q_1\,(2\delta+1)^k \cdot h_{\max} \;+\; q_0\,\bigl(h_{\max}^{\,k+1} - (2\delta+1)^k \cdot h_{\max}\bigr) \tag{8} \]

The enormity of the configuration space leads to small and very small numerical values for q_1 and q_0. Since we are only interested in the mode and to avoid numerical instabilities, both can be rescaled.
The algorithm proposed thus far depends on two parameters, δ and the ratio q_0/q_1. We suggest choosing δ to be of the same order as the surface roughness. The best ratio q_0/q_1 can be found through screening a small range of values typical for the specific surface. See section 4 and table 1 for the appropriate values for the sample piece depicted in figure 2. Using the prior in eq. (7), we can write

\[ P(h_0 \mid x_C) \;\sim\; f(x_0 \mid h_0)\left( \sum_{h_1=1}^{h_{\max}} \cdots \sum_{h_k=1}^{h_{\max}} \prod_{j=1}^{k} f(x_j \mid h_j)\,q_0 \;+\; \sum_{h_1=h_0-\delta}^{h_0+\delta} \cdots \sum_{h_k=h_0-\delta}^{h_0+\delta} \prod_{j=1}^{k} f(x_j \mid h_j)\,(q_1-q_0) \right) \tag{9} \]

It is now possible to make use of the following equality:

\[ \sum_{h_1} \cdots \sum_{h_k} \prod_{j=1}^{k} f(x_j \mid h_j) \;=\; \prod_{j=1}^{k} \sum_{h} f(x_j \mid h), \tag{10} \]

which can be proven easily via induction over k. We can now rearrange P(h_0|x_C) to obtain a computable formula:

\[ P(h_0 \mid x_C) \;\sim\; f(x_0 \mid h_0)\left( q_0 \prod_{j=1}^{k} \sum_{h=1}^{h_{\max}} f(x_j \mid h) \;+\; (q_1-q_0) \prod_{j=1}^{k} \sum_{h=h_0-\delta}^{h_0+\delta} f(x_j \mid h) \right) \tag{11} \]

The number of operations required for each pixel is significantly reduced to the order of k · h_max = 8000 in our example. The simple a priori density of eq. (7) allows further speed-up by implementing a sliding summation and other efficient routines. Our final implementation reaches practicable execution times (see section 4).

Note that any prior that reduces to a fixed number of constants independent of the likelihood can be used to transform the nested a posteriori expression. While less restrictive priors from this class, investigated in [7], may better reflect our prior knowledge on the surface, they also reduce the achievable computational simplification. Also note that we can compute our estimate exactly, avoiding the sampling of the posterior by means of Gibbs or Metropolis sampling that is customary in spatial Bayesian inference [10], [11].

Likelihood functions. There are two major paths to a likelihood function f(x_j|h_j): On one hand, the likelihood can be obtained from a physical modeling of the WLI signal formation process. This strategy is the most accurate, but is hampered by our restricted knowledge on the optical scattering processes and the technical inaccuracies of the interferometer's components. In addition, a proper modeling will take into account the unknown phase of the inner oscillation, which leads to a large computational burden. On the other hand, there is the empirical route to a likelihood function. This phenomenological approach mimics the characteristics of a properly derived likelihood function. An ideal function would feature a strong response in the vicinity of the true height value and a weak response to the noise elsewhere in the time series. The methods proposed and established by various authors (an overview is given by [12], [13]) for interferogram preprocessing are a good approximation. For the results presented in the next sections, we chose the second path and used a quasi-likelihood obtained from low-pass filtered finite differences of the time series in a pixel.
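A minimal NumPy sketch of the per-pixel computation in eq. (11) might look as follows. The array layout, the boundary clipping of the δ-window, and the default q0/q1 values are illustrative choices, not the authors' C++ implementation.

```python
import numpy as np

def marginal_posterior_mode(lik, delta, q0=1e-4, q1=1.0):
    """Per-pixel marginal posterior mode, following eq. (11).

    lik   : (k+1, h_max) array of likelihoods f(x_j | h); row 0 is the centre
            pixel, rows 1..k its neighbours (all values assumed precomputed).
    delta : half-width of the smoothness window of the prior in eq. (7).
    Returns the height index h0 maximising P(h0 | x_C) up to a constant factor.
    """
    centre, neigh = lik[0], lik[1:]
    h_max = lik.shape[1]
    # First term of eq. (11): q0 * prod_j sum_h f(x_j | h); independent of h0.
    term0 = q0 * np.prod(neigh.sum(axis=1))
    # Sliding window sums over h0-delta .. h0+delta via a cumulative sum.
    csum = np.concatenate([np.zeros((neigh.shape[0], 1)), np.cumsum(neigh, axis=1)], axis=1)
    post = np.empty(h_max)
    for h0 in range(h_max):
        lo, hi = max(h0 - delta, 0), min(h0 + delta + 1, h_max)
        window = csum[:, hi] - csum[:, lo]          # per-neighbour window sums
        post[h0] = centre[h0] * (term0 + (q1 - q0) * np.prod(window))
    return int(np.argmax(post))
```

The cumulative-sum trick plays the role of the "sliding summation" mentioned above, keeping the cost per pixel of the order of k · h_max operations.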
3 Methods

3.1 Data acquisition

We recorded interference patterns of a part of a turned steel piece, 19.5 mm in diameter, that features three circular steps of 20 μm height difference each, produced using a new tool, and additionally smaller tracks, obtained using a worn turning tool (see figure 2). This surface exhibits various characteristics that are typically found in the industrial application of WLI.

[Figure 2: Photograph (full view) of the piece (diameter: 19.5 mm) used in the subsequent analyses.]

To challenge the algorithms, we recorded interferograms at scan speeds of 14 μm/s (which, at a framerate of 50 Hz, corresponds to the Nyquist frequency of the inner oscillation of the interferogram), 28 μm/s, 56 μm/s, 84 μm/s and 112 μm/s (which corresponds to an 8-fold subsampling). Two exemplary results are shown in figures 3 and 4. These results were obtained with a C++ implementation of the algorithm. On a 1.2 GHz P IIIm machine, the processing time varied between 31 s for 14 μm/s and 2.8 s for 112 μm/s, including the raw data input from hard disk.

3.2 Height map estimation

We compare the Bayesian method introduced here with other established approaches detailed in the following. For all evaluations, a 3×3 pixel neighborhood or filter mask was chosen.

Preprocessing only. A pixelwise detection of interference patterns with no postprocessing yields, in the case of rough surfaces, a height map which is heavily contaminated by outliers that render further analysis difficult or impossible. These results are included only to allow for an unbiased comparison of the remaining methods.

Median filter mask. Filtering the noisy height map obtained from preprocessing with a median filter mask is simple and a common approach. As we are dealing mainly with high amplitude outliers, the median filter seems appropriate. The rank order filter provides optimal robustness against outliers (see e.g. [14]) and preserves edges well. On the other hand, this filter does not adapt to the quality of the original data and so tends to oversmoothing in areas with little or no outliers.

[Figure 3: A height map of the sample piece obtained with the Bayesian algorithm at 14 μm/s. Scale: 0.28 μm/frame.]

Outlier-sensitive median filter. The basic idea is to replace only suspected outliers with the median of the neighborhood, and to leave all other height values unscathed. The median filter itself provides a way of detecting outliers: if applied to a central pixel of height h_0, it yields h̃_0 = med{h_0, ..., h_k}. For an outlying central height h_0, the difference |h_0 − h̃_0| is outside the usual range which reflects the height variations of a rough surface. An optimum distinction requires a threshold that adapts locally to the fraction of outliers. In a study on breakdown points, Hampel [15] proposed to identify outliers by their larger-than-normal variability, based on

\[ |h_0 - \tilde{h}_0| \;\ge\; c \cdot \mathrm{mad}\{h_0, \ldots, h_k\}. \tag{12} \]

Here, mad{} denotes the median of absolute deviations (MAD) of the sample. The parameter c can be optimized by simulation [16]. We chose c to minimize the average absolute error of the estimate (see section 4).
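For comparison with the Bayesian estimator, a sketch of this outlier-sensitive median filter could look as follows. The 3×3 window, the default c, and the choice to leave border pixels untouched are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def adaptive_median(height_map, c=3.0):
    """Hampel-style outlier replacement on a 2D height map (cf. eq. (12)).

    A pixel is replaced by its 3x3 neighbourhood median only when
    |h0 - median| >= c * MAD of the neighbourhood; otherwise it is kept as is.
    """
    h = np.asarray(height_map, dtype=float)
    out = h.copy()
    for i in range(1, h.shape[0] - 1):
        for j in range(1, h.shape[1] - 1):
            window = h[i - 1:i + 2, j - 1:j + 2].ravel()
            med = np.median(window)
            mad = np.median(np.abs(window - med))
            if np.abs(h[i, j] - med) >= c * mad:
                out[i, j] = med
    return out
```

Setting c = 0 makes the condition always true, so the filter degenerates into the plain median filter, which matches the behaviour reported for the fastest scan speeds below.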
[Figure 4: A height map of the sample piece obtained with the Bayesian algorithm at 112 μm/s. Scale: 2.24 μm/frame. Note the coarse height steps here.]

3.3 Reference height map

No other measurement method delivers height maps that are directly comparable to those from WLI. In spite of first efforts to establish a correspondence between results from WLI and tactile devices [17], their relation is as yet unresolved for real-world surfaces. Instead, we recurred to a partially recursive argument by constructing a reference height map from repeated high-quality WLI measurements. To this end, we performed a series of 25 measurements under optimum conditions (balanced illumination, minimal vibrations, constant temperature and a slower scan speed of 2 μm/s). The data of one recording was spoilt and had to be removed. The remaining time series were free of deterioration, according to visual inspection of the raw data.

We applied the sliding average of absolute finite differences as preprocessing algorithm, which is a recommended linear filtering algorithm [13], to produce height maps ĥ_l. Overall shifts of the maps were detected with the image-wise median value and then corrected by moving the height maps to a common reference height. Tilts and potential higher order deformations were not accounted for. The pixelwise median height map h_med,

\[ h_{\mathrm{med}} \;=\; \mathrm{med}\{\hat{h}_1, \ldots, \hat{h}_{24}\}, \tag{13} \]

exhibits a small number of invalid pixels, as expected for rough surfaces. These sites could easily be detected by the large variation of height values reported from preprocessing. The median absolute deviation over 24 recordings gives a robust estimate of this variation:

\[ \mathrm{mad}\{h_{\mathrm{med}}\} \;=\; \mathrm{med}\{|\hat{h}_1 - h_{\mathrm{med}}|, \ldots, |\hat{h}_{24} - h_{\mathrm{med}}|\} \tag{14} \]

By heuristics, we marked those sites having a mad ≥ 30 as defective and replaced their values by the median of the immediate neighborhood:

\[ h_{\mathrm{ref}} \;=\; \begin{cases} \mathrm{med}_{3\times 3}\, h_{\mathrm{med}} & \text{for } \mathrm{mad}\{h_{\mathrm{med}}\} \ge 30\ \text{fr.} \\ h_{\mathrm{med}} & \text{otherwise} \end{cases} \tag{15} \]

The reference map h_ref is depicted in figure 5.

[Figure 5: The reference height map.]

4 Results

Four algorithms were compared in the benchmarking:

1. Sliding average of absolute finite differences, the standard preprocessing for rough surface WLI,
2. the same sliding average, followed by a 3×3 median filter mask postprocessing,
3. the sliding average followed by the adaptive 3×3 median filter mask (section 3.2), and
4. the Bayesian algorithm with a 3×3 neighborhood, as described in section 2.

Table 1: Optimum parameters for the algorithms under comparison by the measure of eq. (16). The median filter has no parameter other than the mask size.

scanning speed | h_max  | preprocessing window size | adaptive median threshold c | Bayesian δ | Bayesian q_0/q_1
14 μm/s        | 289 fr | 9 fr                      | 15.4                        | 6 fr       | 10^-2
28 μm/s        | 144 fr | 5 fr                      | 10.7                        | 4 fr       | 10^-2
56 μm/s        |  72 fr | 3 fr                      |  0.5                        | 4 fr       | 10^-4
84 μm/s        |  49 fr | 2 fr                      |  0                          | 5 fr       | 10^-4
112 μm/s       |  37 fr | 2 fr                      |  0                          | 3 fr       | 10^-4

Table 2: Absolute error per pixel (E_pp) for the algorithms under comparison. For the reasons given in section 4, the signal to noise ratio of the raw data is lowest for 84 μm/s.

scanning speed | preprocessing only | median filter | adaptive median | Bayesian estimation
14 μm/s        | 0.55 μm            | 0.69 μm       | 0.56 μm         | 0.45 μm
28 μm/s        | 0.71 μm            | 0.78 μm       | 0.72 μm         | 0.68 μm
56 μm/s        | 1.27 μm            | 1.02 μm       | 1.02 μm         | 0.89 μm
84 μm/s        | 4.84 μm            | 2.43 μm       | 2.43 μm         | 1.59 μm
112 μm/s       | 2.69 μm            | 1.64 μm       | 1.64 μm         | 1.17 μm
The implementation of the sliding average algorithm and the setting for its parameter (size of sliding mean window) follows ref. [13]. The benchmarking data (see section 3.1) were obtained under real-world conditions with a suboptimal illumination, and sequences of 25 recordings were acquired during normal business hours in our laboratory. We computed a scalar error estimate to optimize the respective parameters of all algorithms, which can be found in table 1, and to provide a coarse quality measure. The 25 height maps calculated with each algorithm were rescaled to the scale of the reference map and any overall height shift was removed by comparison of the image-wise median value with that of the reference map. The mean absolute difference per pixel E_pp between the 25 estimated height maps and the reference map is not a robust estimator and thus penalizes left-over outliers:

\[ E_{pp} \;=\; \frac{1}{25} \cdot \frac{1}{|S|} \sum_{l=1}^{25} \sum_{j=1}^{|S|} \bigl|h_j^{\mathrm{ref}} - \hat{h}_j^{\,l}\bigr| \tag{16} \]

Table 2 gives the results for optimized parameters. For 84 and 112 μm/s, the optimal adaptive median filter is not adaptive, the reasons being that oversmoothing does not occur with large interframe distances and that outliers become frequent.

[Figure 6: histogram(preprocessing only) (see section 3) of interferograms acquired at 0.56 μm/frame. Logarithmized occupancies of the histogram are plotted as a function of estimation error and variability of a height estimate in repeated trials; see text for details.]

Not surprisingly, the error generally increases with higher scan speeds, both due to the more difficult localization of the interference pattern in the subsampled signals, and the coarser discretization of permissible height values. Also, the higher density of erroneous pixels leads to more and larger clusters that cannot be corrected with the 3×3 filter masks used here. The results for 84 μm/s are, however, particularly inaccurate. The reason is that this sampling rate corresponds almost exactly to four entire periods of the interferogram, thus leading to a weak response in a preprocessing based on finite differences. This is not the case with the data obtained at 112 μm/s, which therefore leads to better height estimates.

For a more detailed analysis of the estimation error, we present 2D histograms in which the distribution of estimation errors is given as a function of error magnitude and the difficulty of obtaining a correct estimate for a pixel. The latter was measured by the pixelwise spread of the estimates ĥ_l obtained by preprocessing only. This spread was, in turn, quantified by the median absolute deviation (MAD, see section 3.2) of the estimates. These histograms can be computed for each estimator, and we will refer to those as histogram(preprocessing only), histogram(median filter), etc.

Figure 6 shows histogram(preprocessing only). The strong peak in the lower left corner indicates that the vast majority of pixels has a variability of about 5-10 frames and is estimated with an absolute error of 5-10 frames. The horizontal ridge contains all pixels of high MAD, which are nonetheless estimated accurately sometimes. The vertical ridge stems from pixels which give outlying height estimates in a number of measurements smaller than the break-point of the MAD-estimator.

For a direct comparison of the different methods we propose to inspect the difference of two histograms. Figure 7 shows the small-error range of histogram(median filter) – histogram(preprocessing only). The median filter eliminates outliers; however, these are outnumbered by far by the remaining pixels and do not contribute significantly to the distribution shown.
Instead, the figure shows the impact of the median filter on "ordinary" pixels, most of which are affected by the filtering operation, and mostly adversely: excessive smoothing leads to a deterioration of height estimates from the preprocessing, as manifested by the large depletion and accumulation, respectively, in the absolute error ranges around 4 and 9 frames. In summary, the median filter smoothes in regions in which it is not necessary and thus "destroys" the microstructure of the surface. Inspection of histogram(median) (not shown) reveals that the bulk of the distribution resides in an error range of 3−5 frames, which hints at a surface roughness of the magnitude of 1.5−3 μm on the scale of 3×3 pixels, that is, on a horizontal scale of about 100 μm.

The adaptive median filter based on Hampel's outlier detector reduces oversmoothing; the effect is already visible in table 2 for slow scanning speeds, and is also pronounced in histogram(adaptive median filter) – histogram(preprocessing), see figure 8. Note that the color scales in figures 7 through 9 are different: the adaptive median filter changes fewer height values than the median filter. Figure 9 shows that Bayesian surface estimation introduces the least bias. Since the histograms are sparsely populated for large MAD-values, only a summary analysis is reasonable: Figure 10 shows the (cumulative) "marginal distribution" for MAD ≥ 20 frames (it is not a proper distribution since the histogram difference images attain values below zero and integrate to zero). The profiles show that Bayesian denoising reduces errors also in the high-variability regime.

5 Conclusion

Bayesian inference can accommodate both experimental observations and smoothness assumptions in the estimation process, without imposing a pipeline structure in which the postprocessing loses sight of the original data. If the smoothness assumptions can be expressed in a simple prior on the local height configurations, full Bayesian inference without recourse to stochastic sampling is possible. The procedure introduced in section 2 gives estimates that are more accurate than the (adaptive) median filtering we compare to, in the sense of minimizing the average absolute error. The difference in accuracy is more pronounced in the case of weak signal to noise ratios and fast scanning speeds, cf. the last two lines in table 2. The computational cost of the method presented here is of the order of seconds and increases with the scan length h_max.

[Figure 7: Error histogram of height maps obtained from median filtering minus error histogram of height maps obtained from preprocessing. In short: histogram(median filter) – histogram(preprocessing). Scale: 0.56 μm/frame.]

[Figure 8: Scale: 0.56 μm/frame.]

In summary, if the measurements are of high quality and a simple postprocessing is desired, we recommend the adaptive median filter. For a more accurate description of the microtopology, or if the input data is of low quality, Bayesian surface estimation appears a better choice.

Acknowledgment

The authors would like to thank Chr. Hennig (Seminar for Statistics, ETH Zürich) for pointing out the possibility of using the Hampel detector in an adaptive filtering scheme. Support from Robert Bosch GmbH is gratefully acknowledged.

References

[1] Thomas Dresel, Gerd Häusler, and Holger Venzke. Three-dimensional sensing of rough surfaces by coherence radar. Appl. Opt., 31(7):919–925, 1992.
[2] James C. Wyant. How to extend interferometry for rough-surface tests. Laser Focus World, 29(9):131–135, 1993.
[3] Peter de Groot and Leslie Deck. Three-dimensional imaging by sub-Nyquist sampling of white-light interferograms. Opt. Lett., 18(17):1462–1464, 1993.
[Figure 9: Scale: 0.56 μm/frame.]

[4] David J. Aziz. Interferometric measurement of surface roughness in engine cylinder walls. Opt. Eng., 37(5):1429–1434, 1998.
[5] Peter de Groot and Xavier Colonna de Lega. Valve cone measurement using white light interference microscopy in a spherical measurement geometry. Opt. Eng., 42(5):1232–1237, 2003.
[6] D. W. Goodman. Statistical Properties of Laser Speckle Patterns. 2nd edition, 1984.
[7] Niels Væver Hartvig and Jens Ledet Jensen. Spatial mixture modeling of fMRI data. Hum. Brain Mapping, 11:233–248, 2000.
[8] Akiko Harasaki, Joanna Schmit, and James C. Wyant. Improved vertical-scanning interferometry. Appl. Opt., 39(13):2107–2115, 2000.
[9] Min-Cheol Park and Seung-Woo Kim. Direct quadratic polynomial fitting for fringe peak detection of white light scanning interferograms. Opt. Eng., 39(4):952–959, 2000.
[10] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. 6(6):721–741, 1984.
[11] Gerhard Winkler. Image Analysis, Random Fields and Markov Chain Monte Carlo Methods. Springer, Berlin, Heidelberg, New York, 2nd edition, 2003.
The Signal and the Noise: a classic of quantitative forecasting

In the most recent presidential campaign, Nate Silver correctly predicted the result in every state, having called only one state wrong in 2008. In 2012 he predicted Obama's popular vote to within a tenth of a percentage point of the actual count. His forecasting ability seems uncanny. In his early and continuously updated forecasts of an Obama victory he was ahead of the many polling organizations and of my political-science colleagues. His book The Signal and the Noise is a deserved best seller, yet it may surprise the people who buy it. Only one short chapter concerns election forecasting, and it is treated more briefly than football, the weather, or chess. What he has in fact written is a rigorous treatise on forecasting, with no theoretical mathematics, aimed at the general reader.

Silver's book ranges widely, covering predictions from poker to the weather, from earthquakes to terrorism. We learn that although we collect statistics on every player on a baseball field in greater volume than for almost any other human activity, experienced scouts still predict player performance considerably better than computers do. Because poker is part luck and part skill, professional players make their living by playing against wealthy amateurs. The chapter on earthquake prediction is longer; from it we learn that although we are fairly good at estimating earthquake hazards, earthquakes themselves "simply cannot be predicted." Economists are similarly cautious, since they rarely manage to predict the next year's economic growth accurately. Their forecasting models may be elegant, Silver says, but "the raw data they work from are not strong enough."

The most striking success story in forecasting is predicting where a hurricane will strike: over the past 25 years, the ability to pinpoint where a hurricane will make landfall has improved twelvefold. At the same time, Silver notes, weather broadcasters deliberately overstate the chance of rain, because they know listeners are grateful when it does not rain after all and they did not need an umbrella. He does not, however, set aside "highly mathematical and data-driven techniques"; he cautions climate modelers against publicizing precise figures for changes in temperature and sea level. He also describes attending a conference on terrorism at which a Coca-Cola marketing executive and a dating-service consultant were asked how one might identify suicide bombers.

Massive amounts of information have created this era of data explosion. Silver relays IBM's estimate that 2.5 quintillion new bytes (a byte being the eight-bit binary sequence a computer uses to encode one text character) are generated every day, recording everything from the brand of toothpaste you bought yesterday to where you were when you phoned a friend this morning.
s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 
0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? 
We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. 
For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. (1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). 
Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). 
Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al.[1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quan-tifying the uncertainty that results when one tries to infer general patterns from a particu-lar sample of an overall population. As men-tioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data),one can find patterns that appear to be statis-tically significant but, in fact, are not. Clearly,this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct rele-vance to KDD. Thus, data mining is a legiti-mate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical as-pects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree pos-sible) the entire process of data analysis and the statistician’s “art” of hypothesis selection.Data mining is a step in the KDD process that consists of ap-plying data analysis and discovery al-gorithms that produce a par-ticular enu-meration ofpatterns (or models)over the data.Articles40AI MAGAZINEly understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).Here, data are a set of facts (for example, cases in a database), and pattern is an expres-sion in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; find-ing structure from data; or, in general, mak-ing any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple itera-tions. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the av-erage value of a set of numbers.The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and poten-tially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possi-ble to define measures of certainty (for exam-ple, estimated prediction accuracy on new data) or utility (for example, gain, perhaps indollars saved because of better predictions orspeedup in response time of a system). No-tions such as novelty and understandabilityare much more subjective. In certain contexts,understandability can be estimated by sim-plicity (for example, the number of bits to de-scribe a pattern). 
An important notion, called interestingness (for example, see Silberschatz and Tuzhilin [1995] and Piatetsky-Shapiro and Matheus [1994]), is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Interestingness functions can be defined explicitly or can be manifested implicitly through an ordering placed by the KDD system on the discovered patterns or models.

Given these notions, we can consider a pattern to be knowledge if it exceeds some interestingness threshold, which is by no means an attempt to define knowledge in the philosophical or even the popular view. As a matter of fact, knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data. Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that can be explored by a data-mining algorithm.

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process (figure 1) includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process also includes all the additional steps described in the next section.

The notion of an overall user-driven process is not unique to KDD: analogous proposals have been put forward both in statistics (Hand 1994) and in machine learning (Brodley and Smyth 1996).

The KDD Process

The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:

First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.

Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method. For example, summarization, classification, regression, clustering, and so on, are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).

Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).

Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.

Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

Figure 1. An Overview of the Steps That Compose the KDD Process.

The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1. Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice.

Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component, which has, by far, received the most attention in the literature.

The Data-Mining Step of the KDD Process

The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algorithms that incorporate these methods.

The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification, the system is limited to verifying the user's hypothesis. With discovery, the system autonomously finds new patterns. We further subdivide the discovery goal into prediction, where the system finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presentation to a user in a human-understandable form. In this article, we are primarily concerned with discovery-oriented data mining.

Data mining involves fitting models to, or determining patterns from, observed data.
The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the overall, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeterministic effects in the model, whereas a logical model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applications given the typical presence of uncertainty in real-world data-generating processes.

Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewildering to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fundamental techniques. The actual underlying model representation being used by a particular method typically comes from a composition of a small number of well-known options: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primarily in the goodness-of-fit criterion used to evaluate model fit or in the search method used to find a good fit.

In our brief overview of data-mining methods, we try in particular to convey the notion that most (if not all) methods can be viewed as extensions or hybrids of a few basic techniques and principles. We first discuss the primary methods of data mining and then show that the data-mining methods can be viewed as consisting of three primary algorithmic components: (1) model representation, (2) model evaluation, and (3) search. In the discussion of KDD and data-mining methods, we use a simple example to make some of the notions more concrete. Figure 2 shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past. The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, and so on). The data have been classified into two classes: (1) the x's represent persons who have defaulted on their loans and (2) the o's represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent a historical data set that can contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications, there are typically many more dimensions (as many as several hundreds) and many more data points (many thousands or even millions).

Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.
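To make the loan example concrete, the following sketch fits a small decision tree to an invented stand-in for the data of figure 2 (the actual 23 cases are not reproduced in the article text, so the points below are fabricated for illustration, and scikit-learn is assumed to be available). The tree produces axis-parallel income/debt thresholds of the kind a rule- or tree-based data-mining method would extract from such data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented stand-in for figure 2: (income, debt) pairs with a default label.
# High debt relative to income tends to go with default (label 1).
X = np.array([
    [20, 30], [25, 28], [30, 35], [35, 40], [22, 25], [40, 45], [28, 33],  # defaulted
    [50, 10], [60, 15], [55, 12], [70, 20], [65, 18], [80, 25], [75, 22],  # good standing
    [45, 30], [48, 27],                                                    # good standing
])
y = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# A shallow tree keeps the extracted pattern simple enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["income", "debt"]))

# The printed rules are an "understandable pattern" in the article's sense,
# e.g. a threshold on income (and possibly debt) separating the two classes.
print(tree.predict([[52, 40]]))  # classify a new applicant (income=52, debt=40)
```

Swapping the tree for a linear model or a nearest-neighbor rule changes only the model representation and the search; the decomposition into representation, evaluation, and search described above stays the same.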
Extensive-Form Games: IntroductionÙIntroduction______________________________________________________________________1 Game Trees______________________________________________________________________1 Information sets___________________________________________________________________6 Games of perfect recall_____________________________________________________________9 Arborescences___________________________________________________________________12 Simultaneity_____________________________________________________________________13 Divide and conquer: subgames______________________________________________________13IntroductionWhen we model a strategic economic situation we want to capture as much of the relevant detail as tractably possible. A game can have a complex temporal and information structure; and this structure could well be very significant to understanding the way the game will be played. These structures are not acknowledged explicitly in the game’s strategic form, so we seek a more inclusive formulation. It would be desirable to include at least the following: 1) the set of players, 2) who moves when and under what circumstances, 3) what actions are available to a player when she is called upon to move, 4) what she knows when she is called upon to move, and 5) what payoff each player receives when the game is played in a particular way. Components 2 and 4are additions to the strategic-form description; component 3 typically involves more specification in the extensive form than in the strategic form.We can incorporate all of these features within an extensive-form description of the game. The foundation of the extensive form is a game tree. First I’ll discuss what a tree is and then describe the additional specifications and interpretations we need to make in order to transform it into a full-fledged extensive-form description of the game. We’ll discuss additional restrictions upon the information structure to reflect a common assumption, viz. perfect recall, which asserts that players don’t forget. We’ll discuss the relation between this traditional, graph-theoretic exposition and a more recent one based on the arborescence. After that we’ll learn how to ease our analysis of complicated games by breaking them up into simpler subgames.Game TreesWe’ll construct our definition of a tree by first defining more primitive graph-theoretic entities. (Iyanaga and Kawada [1980: 234–235]). A graph G=(V,E) consists of a finite set V of vertices and a finite set E of edges of V. An edge e˙E is an unordered pair of distinct elements of V; e.g. the edge e=(v,v’)joins the two vertices v,v’˙V.Ù© 1997 by Jim Ratliff , <jim@>, </gametheory>.Consider an alternating sequence of vertices and edges of a graph G, P={v0,e1,v1,e2,…,e n,v n}. P is a path if 1 each edge e i joins its neighbors v i¥1 and v i and 2 no edge is encountered more than once (i.e.e i≠e j when i≠j).1 In this case we say that v i and v j, for i,j˙{1,…,n}, are connected by P and that P runs from v0to v n.2 (Note that a path may encounter the same vertex more than once.) A path is closed if it begins and ends with the same vertex, i.e. v0=v n. A closed path which never reencounters a vertex, except for the first, is called a circuit. I.e. a closed path P is a circuit if v i≠v j whenever i≠j and {i,j}≠{0,n}. In other words a circuit is a single “loop.” A graph G is connected if for every pair of distinct vertices v,v’˙V, v≠v’, there exists a path in G which connects v and v’.Now we’re ready to define a tree. 
A tree is a connected graph which contains no circuit.3 In game theory we often refer to the vertices as nodes and, consistent with our dendro-metaphor, to edges as branches.4Because a tree is a connected graph, any node can be reached from any other node by traversing a sequence of branches. Because it has no circuits, a tree is unicursal: for any pair of nodes, there is exactly one path which runs from the first to the second.Example: graphs and treesFigure 1: A graph.The sets of vertices and edges in Figure 1 form a graph. The edges depicted graphically are defined bye1=(v0,v1),e2=(v0,v2),e3=(v1,v2),e4=(v1,v3),e5=(v2,v4),e6=(v3,v4),e7=(v2,v3).The following two alternating sequences of vertices and edges are paths:{v0,e1,v1}, {v0,e2,v2,e5,v4,e6,v3,e7,v2,e3,v1}.Note in the second case that the vertex v2 is encountered twice. The sequence {v0,e1,v2} is not a path. Neither is1Let e=(v,v’) and e’=(v’,v) be edges, where v and v’ are vertices. Because edges are unordered pairs, we have e=e’.2This “runs from… to” terminology is not standard, but I find it useful for expository purposes here.3I.e., no circuit can be constructed from vertices and edges belonging to the graph.4My treatment roughly follows that of Kuhn [1953], which is the hugely influential, seminal paper concerning extensive-form games.{v0,e1,v1,e3,v2,e7,v3,e4,v1,e3,v2},because the edge e3 is encountered twice.The two sequences{v0,e2,v2,e5,v4,e6,v3,e4,v1,e1,v0}, {v0,e1,v1,e3,v2,e7,v3,e6,v4,e5,v2,e2,v0}are closed paths. The first is a circuit; the second is not, because the vertex v2 is encountered twice. Figure 2 shows two connected graphs. The first is a tree; the second is not.Figure 2: Two connected graphs: (a) a tree and (b) a nontree.Now we want to enhance our tree to become a game tree. We henceforth assume that the tree has at least two vertices. We bestow one vertex of the tree with the honor of being the initial node, O. (See Figure 3.) This is where the game begins. It is a decision node.wFigure 3: A game tree.If a noninitial node x is incident with two or more branches, it is also a decision node. Let XÓV be the set of decision nodes. (In Figure 3, X={O,å,∫,∂}.) One of these branches leads to the node; the remaining branches lead away from the node. Since branches are unordered pairs of nodes, how do youknow which branch leads to the node and which lead away? It’s simple: For some noninitial decision node x˙X\{O}, find the unique path which runs from the initial node O to node x. That path will contain exactly one of the branches incident with x; this branch is the one which leads to x. The other branches incident at v are the branches which lead away from x. For example, in Figure 3, there are three edges incident at node ∂: g, U, and D. The unique path running from the initial node to node ∂ is {O,L,å,g,∂}. This path contains only the g edge incident at ∂; therefore g is the branch leading to ∂ and U and D are the edges leading away.We interpret each edge e˙E as an action. Let A be the set of all actions and let f:ÙE§A be the function which assigns an action to each edge. The branches which lead away from a decision node represent the actions available at that node. We require that, for a given decision node x˙X, the function f assign a unique action to each branch leading away from x. (Otherwise it would not be well specified what path would result from a particular action taken at x.) 
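Before moving on to actions at decision nodes, note that the graph-theoretic definitions above (connectedness, circuits, and hence the tree property) are easy to check mechanically. The sketch below is plain Python; the vertex and edge data come from the Figure 1 example in the text, and the second call uses a subset of those edges, chosen here only to show a graph that does satisfy the tree definition.

```python
from collections import defaultdict

def is_tree(vertices, edges):
    """A tree is a connected graph containing no circuit."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()

    def dfs(node, parent):
        # Returns False if a circuit is found while exploring from `node`.
        seen.add(node)
        for nxt in adj[node]:
            if nxt == parent:
                continue          # the branch we just came in on
            if nxt in seen:
                return False      # reached an already-visited vertex: a circuit
            if not dfs(nxt, node):
                return False
        return True

    start = next(iter(vertices))
    acyclic = dfs(start, None)
    connected = seen == set(vertices)
    return connected and acyclic

# The graph of Figure 1: edges e1..e7 as listed in the text.
V = {"v0", "v1", "v2", "v3", "v4"}
E = [("v0", "v1"), ("v0", "v2"), ("v1", "v2"), ("v1", "v3"),
     ("v2", "v4"), ("v3", "v4"), ("v2", "v3")]

print(is_tree(V, E))   # False: the graph contains circuits, so it is not a tree
print(is_tree(V, [("v0", "v1"), ("v0", "v2"), ("v1", "v3"), ("v2", "v4")]))  # True
```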
The actions available at any decision node x˙X are given by the set Aªxº=fª{e˙E:Ùe leads away from x}º.5 (So A is a correspondence A:ÙXéA.) In Figure 3, AªOº={L,R}, Aªåº=Aª∫º={g,s}, and Aª∂º={U,D}. More than one edge may correspond to the same action (as long as any such pair of edges does not lead away from the same node). For example, in Figure 3, there are two edges which correspond to the action g. In what follows I will often refer to an edge e˙E by its associated action fªeº.6Any noninitial node with only one incident branch necessarily has no actions available. Such a node is a terminal node. Let ZÓV be the set of terminal nodes. The tree in Figure 3 has five terminal nodes: Z={z1,z2,z3,z4,z5}. The decision and terminal nodes partition the game tree’s nodes: i.e. V=X¨Z and XËZ=õ. The game ends whenever a terminal node is reached. We identify the terminal nodes with outcomes of the game. (Sometimes outcome is used to refer not just to a terminal node but to the unique path from the initial node to that terminal node.)So we have established that the game begins at the initial node and play ends at a terminal node. In what order are the intermediate decisions made? Consider two distinct nodes v,v’˙V, v≠v’. We say that v precedes v’, or that v is a predecessor of v’, if v lies on the unique path which runs from the initial node O to v’. We write vüv’. Equivalently, we say that v’ is a successor of v, and write v’öv. For example, in Figure 3, the unique path running from the initial node O to ∂ is {O,L,å,g,∂}. This path contains å but not contain ∫; therefore å is a predecessor of ∂, but ∫ is not.Predecession, then, defines a binary relation ü on the set of nodes V, viz.ü={(v,v’)˙V2:Ùvüv’}.7,8(1) 5Recall that if f:ÙX§Y is a function and SÓX, then fªSº is the image of S under f, i.e. fªSº={fªxº:Ùx˙S}.6For example, in the previous paragraph I referred to the edge connecting å and ∂ by its action g.7 A binary relation R on a set X can be thought of as a subset of the Cartesian product of X with itself, i.e. RÓX˜X, which includes allthose pairs (x,y)˙X2 such that xRy. I.e. (x,y)˙R⁄xRy.8The succession binary relation ö is the inverse relation of ü; i.e. Åv,v’˙V, vüv’⁄v’öv; it could also be called the dual order (except for a minor problem that ü is irreflexive and therefore not actually providing an ordering). (See later footnote.)It is intuitively obvious—and you can show rigorously—that the predecession relation is irreflexive (Åv˙V, not vüv), asymmetric [vüv’flnotÙÙ(v’üv)], and transitive [(vüv’ and v’üv”) fl vüv”]. This binary relation need not be complete; there may exist a pair of nodes v,v’˙V such that neither vüv’ nor v’üv. For example in Figure 3, å and ∫ are unordered by precedence. A partial listing of the precedence relations in Figure 3 is: Oüz3, åüz2, and åü∂.The initial node is a predecessor of every noninitial node and has no predecessor itself. For every noninitial node x˙V\{O}, let Pªxº be the set of predecessors of x; i.e. Pªxº={v˙V:Ùvüx}. For example in Figure 3, Pª∂º={O,å} and Pªz4º={O,∫}. Although predecession provides only a partial ordering in general on the set V of all nodes, you can show that it totally orders the set Pªxº of predecessors of any noninitial node x˙V\{O}. (I.e. if v,v’˙Pªxº and v≠v’, then either vüv’ or v’üv.9)10We say that v is the immediate predecessor of v’ if there is an action available at v (i.e. a branch incident at v which leads away from the initial node) which leads directly to v’. 
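Because the tree is unicursal, the predecessor set of a node can be read off the unique path from the initial node. The sketch below (plain Python) computes P[x] and the immediate predecessor of x from a parent map. The branch structure loosely follows Figure 3, with ASCII names standing in for the node labels (a for the first player-2 node, b for the second, d for the later player-1 node); the placement of the terminal labels z1 through z5 is my guess, since the figure itself is not reproduced in the text.

```python
# Parent map for a Figure 3-like game tree (terminal placement partly assumed):
# O --L--> a,  O --R--> b,  a --g--> d,  a --s--> z2,
# b --g--> z3, b --s--> z4, d --U--> z1, d --D--> z5.
parent = {
    "a": "O", "b": "O",
    "d": "a", "z2": "a",
    "z3": "b", "z4": "b",
    "z1": "d", "z5": "d",
}

def predecessors(node):
    """P[x]: every node on the unique path from the initial node O to x, except x itself."""
    preds = []
    while node in parent:
        node = parent[node]
        preds.append(node)
    return list(reversed(preds))   # ordered from the initial node down toward x

def immediate_predecessor(node):
    """The maximal element of P[x] under precedence: the last node before x."""
    return parent.get(node)        # None for the initial node O

print(predecessors("d"))           # ['O', 'a']  -- matches P[the d node] = {O, a} in the text
print(predecessors("z4"))          # ['O', 'b']  -- matches P[z4] = {O, b}
print(immediate_predecessor("d"))  # 'a', as stated in the text
```

Note that the predecessor list returned here is already totally ordered by precedence, which is exactly the property of P[x] discussed next.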
(This requires that vüv’and that a single edge joins v and v’. You can prove that each noninitial node has a unique immediate predecessor.) Alternatively we could say that v is the immediate predecessor of v’ if v=maxüÙPªv’º, which means that all other predecessors of v’ must also precede v.11 In Figure 3, for example, O is the immediate predecessor of å but not of ∂, and å is the immediate predecessor of ∂.We have thus far identified the locations in the tree at which decisions are made—the decision nodes—and created a structure which relates the various decisions by specifying what actions lead to which other decisions and in what temporal order. But we still haven’t said who makes which decisions. We retain our standard player set I={1,…,n}, but we also allow Nature to be player 0 when required. (We will see later that Nature can be given the job of introducing new information into the game.)So now we assign to each decision node x˙X exactly one player i˙I¨{0}. We specify this assignment by a node-ownership function ô:ÙX§(I¨{0}) where each decision node x˙X is assigned to the player specified by ôªxº. The set of decision nodes assigned to player i˙I¨{0} is X i={x˙X:Ùôªxº=i}. The node-ownership function ô thus induces a player partitionÙ={X0,X1,…,X n} of the set X of decision nodes.12 We say that a node belongs to a particular player or that a player owns a particular node.Every terminal node z˙Z represents some outcome which affects the players. Perhaps it’s success for 9Note that if a node x˙V has only one predecessor, so that Pªxº is a singleton set, then Pªxº is trivially totally ordered by the binary relation ü.10In my experience it is conventional to require that a binary relation R on a set X be reflexive (i.e. Åx˙X, xRx) in order that it impose a preordering upon X. (See Debreu [1959: 7] and Iyanaga and Kawada [1980: 958, 960].) Kreps [1990: 364] explicitly weakens this notion by defining that “a set is totally ordered by a binary relation if any two distinct elements of the set are ordered in one direction or the other.” [emphasis added] This allows the concept of totally ordered to be usefully applied to irreflexive binary relations. Note that for reflexive binary relations the two definitions are equivalent. Kreps [1988: 8] provides alternative terminology for this concept by saying that a binary relation R on a set X is weakly connected if for all x,y˙X, either x=y, xRy, or yRx.11Because Pªv’º is totally ordered by the binary relation ü, we can arrange its elements such that v1üv2üÚüv k. Then v k=maxüÙPªv’º.12 A partition of a set A is a collection of cells A1,…,A n, such that 1 each A iÓA, 2 the collection is exhaustive, i.e. A1¨A2¨Ú¨A n=A,and 3 the cells are mutually exclusive, i.e. when i≠j, A iËA j=õ.one and failure for another; the outcomes may represent profit levels for each firm or lengths of prison sentences for each criminal suspect. Whatever the outcomes are, they are events over which each player has preferences. We assume that each player has von Neumann-Morgenstern preferences, which can be represented by a utility function µi :ÙZ§Â with the expected-utility property, and seeks to maximize her expected utility.13Figure 4 shows an extensive-form game elaborated from the tree in Figure 3. I have indicated the player partition, which assigns players to nodes, by numerical labels and assigned payoff vectors to each terminal node. 
By convention the upper entry in each payoff vector is the first player’s utility derived from that outcome; the second entry is the second player’s utility. The player partition is given by X 1={O ,∂} and X 2={å,∫}. (Nature does not appear in this game.)Information setsIn the first paragraph I identified five aspects of an economic scenario we’d capture using the extensive form. So far we’ve captured the set of players, who moves when and under what conditions, what actions are available to a player when it’s her turn to move, and what payoffs are received at the end of the game. The remaining challenge is to represent the information which a player knows when called upon to move.What kind of information do I have in mind? We need to be concerned about things which have changed during the course of the game as a result of the actions of the players.14 The key to representing information in the game tree is realizing the connection between nodes and history . At any point in the play of the game the history is a record of who did what when. Remember that game trees are13I’m using µi to represent the utility to player i of an outcome z˙Z because I want to reserve the symbol u i for later use as her utility to a strategy profile. (We will see that an outcome z˙Z could be the result of any of several strategy profiles.)14If information is entering the game in some other fashion, this can be modeled by the addition of Nature as a nonstrategic player who chooses her strategy according to a probability distribution. The revelation of Nature’s moves then communicates the exogenouslygenerated information to the players.Figure 4: An extensive-form game indicating the player partition and payoffs as a function of outcome.unicursal—there’s precisely one path which runs from the initial node to any given node. Therefore, if you know the node you have reached, you also know precisely the history of play that got you there.15,16We can divide extensive-form games into two classes: In games of perfect information, whenever it is your turn to move you know precisely where in the tree you are located. Therefore you know the history of play completely. In games of imperfect information, you might be uncertain about the history of play when you are called on to play. This uncertainty about history corresponds to uncertainty about which of several nodes you might be at.To express this uncertainty we employ the information set. An information set hÓX is a set of decision nodes. In the course of playing the game the players’ actions may cause one of your nodes to be reached. When this occurs, you might not know precisely which of your nodes has been reached. This uncertainty is captured by saying that you only know that a particular information set has been reached; all you know is that you’ve reached one of the nodes x˙h in that information set. It is as if you are sent a message, whenever it is your move, specifying the information set you have reached.There are some obvious restrictions we need to impose on the composition of information sets. First, each decision node must belong to exactly one information set. (If a node belonged to more than one information set, it wouldn’t be well defined what message its owner should receive when that node is reached.) Therefore the set H of all information sets forms a partition of the decision nodes: X=U h˙HÙh and Åh,h’˙H, hËh’=õ. 
We can also define an information-set membership function h:ÙX§H, which specifies an information set h˙H for each decision node x˙X according to hªxº=h⁄x˙h.Secondly, all of the nodes in an information set must belong to the same player. (I.e. Åh˙H,Åx,x’˙h, ôªxº=ôªx’º.) Otherwise, two or more players may think it is their turn to move.17 (Therefore the information partition H is a refinement of the player partition Ù={X0,X1,…,X n}, because for each information set h˙H there exists a player i˙I¨{0} such that this information set is entirely contained within player i’s set of nodes, i.e. such that hÓX i.18)We can partition the set H of information sets into sets of information sets which belong to each player: For every player i˙I¨{0}, the set H i={h˙H:ÙÅx˙h,ôªxº=i} contains the information sets 15Note that this wouldn’t be the case if we were playing the game represented by the nontree of Figure 1b. If we had reached the bottom node, we wouldn’t know whether we had arrived there via the clockwise or counterclockwise path from the top initial node.16You might object that in the real world there often are many different ways to reach the same result. Such a situation can be modeled within this framework. You would still have a different terminal node for each of these histories; but you would assign the same payoff vector to each of these terminal nodes.17“But what’s wrong with that? Perhaps it’s a simultaneous move game. Then it would be perfectly kosher to have two players believing it’s their turn,” you say. In an extensive-form representation a simultaneous-move game is modeled as a sequential game where the second mover doesn’t get to observe the first mover’s choice at least until after the second mover has played. We’ll discuss simultaneity further later.18One partition is a refinement of a second partition if each cell of the first is a subset of some cell of the second. More formally…. Let ˜={M i}i˙{1,…,m} and ˆ={N i}i˙{1,…,n}be two partitions. M is a refinement of N, denoted ˜üˆ, if ÅM i˙˜, ‰N j˙ˆsuch that M iÓN j. In such a case we say that ˆ is coarser than ˜.owned by player i. Therefore, for all i˙I¨{0}, U h˙HÙh=X i.iIn general, for any information set h˙H, ôªhºfi{ôªxº:Ùx˙h} would refer to the image of the set of nodes h under the node-ownership function ô; i.e. ôªhº would be the set of players who owned one or more nodes in h. However, since all nodes in h are owned by the same player, we see that ôªhº is single valued; i.e. Åh˙H, #ôªhº=1. Therefore we can also consider, with some abuse of notation, ô to be an information-set ownership function ôªhº={i˙I¨{0}:ÙÅx˙h,ôªxº=i}. As with decision nodes, we say that an information set belongs to a particular player or that a player owns a particular information set. Thirdly, every node in an information set must have the same set of available actions. Otherwise the player wouldn’t be sure which actions were truly available to her. I.e. Åh˙H, Åx,x’˙h, Aªxº=Aªx’). It is useful then to refer to the set of actions available at an information set h˙H, viz. the image of h under the correspondence A: Aªhº=U x˙hÙAªxº, because for all nodes in the information set Aªhº is the set of actions available there: Åx˙h, Aªxº=Aªhº.We can summarize these last two restrictions byÅh˙H, Åx,x’˙h, ôªxº=ôªx’º and Aªxº=Aªx’º.(2) An information set may be a singleton—it may consist of only a single node. If every information set of every player is a singleton, we have a game of perfect information; i.e. Åh˙H, #h=1, or equivalently: Åi˙I¨{0}, Åh˙H i, #h=1. 
If any information set contains more than one node—i.e.‰i˙I¨{0}, ‰h˙H i, such that #h>1—then the game is one of imperfect information. When an information set contains more than one node, we indicate this graphically on the extensive form by connecting all of the nodes in that information set with a dashed line. Singleton information sets require no special ornamentation. For example, the game in Figure 4 has perfect information; therefore H1={{O},{∂}}, H2={{å},{∫}}.(3) Return to the perfect information game of Figure 4. If we change the information structure so that player 1’s first move is unobserved by player 2 by the time player 2 makes his first move, then he doesn’t know at which of his two nodes he is located. We indicate this information imperfection by connecting these two nodes with a dashed line. (See Figure 5a.) Each of player 1’s two nodes belongs to its own information set. In this game then there are three information sets: two belonging to the first player and one belonging to the second. Note that, as required, player 2 has the same two actions, viz. g and s, available at both nodes of his information set; Aªåº=Aª∫º={g,s}. Contrast each player’s collection of information sets in this game, viz.H1={{O},{∂}}, H2={{å,∫}},(3) with those given in (3) for the game in Figure 4.The pseudo–extensive form of Figure 5b has two violations of the restrictions upon information sets. One information set consists of nodes belonging to different players. (Consider the information set h={å,∫}, where ôªåº=2 but ôª∫º=3.) A second information set has different actions available to itsplayer at different nodes. (Consider the information set h’={∂,©}, where Aª∂º={U,D} but Aª©º={L,C,R}.)In summary, then, an extensive-form game Ì is defined by a tree (V,E), a distinguished vertex O , a set of actions A , an action-for-edge assignment function f:ÙE§A , an information partition H of the decision nodes X , a player set I , a node-ownership function ô:ÙX§I¨{0}, and player payoff functions µi :ÙZ§Â for i˙I . From this definition the following entities can be derived: the set X of decision nodes,the set Z of terminal nodes, the set of actions Aªxº available at each decision node x˙X , the player partition Ù, and the precedence relation ü on V . It is further required that, for each information set h˙H ,every node in h shares the same player-owner and shares the same set of available actions: Åh˙H ,Åx,x’˙H , ôªxº=ôªx’º and Aªxº=Aªx’º.Games of perfect recallWe’ve established some restrictions on what nodes we can include in the same information set: We can only draw dotted lines which connect nodes which belong to the same player and at which the same set of actions is available. Now I’ll present two extensive-form fragments which might suggest additional restrictions.In Figure 6a, at node å player 1 can choose Left and turn the move over to player 2 or she can choose Right and immediately move again. If she chooses Right, she reaches node ∫, which is in the same information set as the node å from which she had just come. Note what this particular information set construction implies: When player 1 chose Right at node å, she realized what she was doing. When she then reaches ∫, she doesn’t know whether she’s at ∫ or back at å. Therefore she no longer knows that she chose Right when at å; she has forgotten the action she just took.It is typical in game theory to restrict ourselves to games of perfect recall . 
In such games a player always know what actions she has taken and never forgets anything she ever knew. How do we(a)(b)Figure 5: A game of (a) imperfect information and (b) implausible information.formalize this restriction?19 We can rule out the game fragment in Figure 6a with the following requirement: If two nodes are in the same information set, then neither can precede the other. (Åh˙H ,Åx,x’˙h , neither xüx’ nor x’üx .)However, that requirement isn’t sufficient. Consider Figure 6b. Nodes ∂ and © are both in the same information set. There are two ways player 1 can reach that information set: She can be at node å and choose stop, or she can start at node ∫ and play Up. Note that å and ∫ are not in the same information set; therefore they represent different information—different histories. If player 1 reaches å she knows that she reached å but never passed through ∫; if she reaches ∫ she knows she reached ∫ but never passed through å. But if she reaches the {©,∂} information set via node å, say, she cannot possibly know that she didn’t pass through ∫. (If she did know this, then she would know for certain that she is at node ∂. But that would violate the information set’s knowledge restriction that she doesn’t know which of the two nodes ∂ and © she is at.) Therefore she has forgotten something she knew earlier; so this would be a violation of perfect recall.The extensive-form fragment of Figure 6b doesn’t violate the prohibition that members of the same information set should not be ordered by precedence; so we need a stronger restriction. The formal characterization of this restriction is extremely arcane and abstruse.20 I’ll try to motivate it the best I can.Consider two nodes ∂ and © which belong to the same information set for player 1. Now consider a player-1 node å which precedes ∂ when player 1 chooses the action a at å. (See Figure 7a.21) When19There are two reasons to be interested in perfect recall. The first is that most if not all games we would be interested in analyzing would satisfy this property. (But for precisely this reason there would be little point in formalizing the restriction.) More important theoretically, however, is that the restriction to games of perfect recall allows a simplification in the analysis of these games: We will see later that in games of perfect recall there is an equivalence between mixed strategies and behavior strategies such that we can choose whichever analytical perspective is more convenient.20See Fudenberg and Tirole [1991: 81], Kreps and Wilson [1982: 867], Kreps [1990: 374–375], and Myerson [1991: 43–44]. The property so defined will prove sufficient for a later demonstration of the equivalence between mixed and behavior strategies. Kuhn [1953: 213] defines a weaker property, which he also calls perfect recall. Although this is a misnomer, his property is necessary and21∂but not necessarily an immediate predecessor.(a)(b)Figure 6: Two game fragments with forgetful players.。
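The restrictions on information sets collected above (same owner at every node, same available actions at every node, and, as a first necessary condition for perfect recall, no two nodes in the same set ordered by precedence) are all mechanically checkable. Below is a minimal sketch in plain Python; the ownership function, action correspondence, and predecessor sets are supplied as dictionaries, and the ASCII node names follow the earlier sketches.

```python
def check_information_sets(info_sets, owner, actions, preds):
    """Verify the stated restrictions on a list of information sets.

    info_sets : list of sets of decision nodes
    owner[x]  : player who owns decision node x
    actions[x]: set of actions available at x
    preds[x]  : set of (strict) predecessors of x
    """
    problems = []
    for h in info_sets:
        if len({owner[x] for x in h}) > 1:
            problems.append(f"{h}: nodes owned by different players")
        if len({frozenset(actions[x]) for x in h}) > 1:
            problems.append(f"{h}: different action sets at different nodes")
        # Necessary condition for perfect recall: no node in h precedes another.
        if any(x != y and x in preds[y] for x in h for y in h):
            problems.append(f"{h}: one node precedes another (the player forgets her own move)")
    return problems

# The game of Figure 5a: player 2 cannot distinguish his two nodes (a and b).
owner   = {"O": 1, "a": 2, "b": 2, "d": 1}
actions = {"O": {"L", "R"}, "a": {"g", "s"}, "b": {"g", "s"}, "d": {"U", "D"}}
preds   = {"O": set(), "a": {"O"}, "b": {"O"}, "d": {"O", "a"}}
info_sets = [{"O"}, {"a", "b"}, {"d"}]

print(check_information_sets(info_sets, owner, actions, preds))   # []  -- no violations
```

The pseudo-extensive form of Figure 5b would fail both of the first two checks: its set containing the player-2 and player-3 nodes mixes owners, and its other set offers {U, D} at one node but {L, C, R} at the other. The Figure 6a fragment would fail the precedence check.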
Probability and Statistics (Taught Entirely in English): Course Syllabus

Course Title: Probability and Statistics (English)
Total Hours: 48
Credits: 2.5
Prerequisite Courses: Calculus, Linear Algebra
Offering Department: Department of Mathematics, School of Science, Shanghai Jiao Tong University
Textbook: Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 4th ed. (English edition), China Machine Press (机械工业出版社), 2012

References:
[1] M.H. DeGroot, M.J. Schervish, Probability and Statistics, 3rd ed. Boston, MA; London: Addison-Wesley, 2002
[2] Jay L. Devore, Probability and Statistics, 5th ed. Higher Education Press, 2010
[3] H. Jeffreys, Theory of Probability, 3rd ed. Oxford: Oxford University Press, 1998
[4] J.T. McClave, T. Sincich, A First Course in Statistics, 7th ed. Upper Saddle River, NJ: Prentice Hall; London: Prentice-Hall International, 2000
[5] S.M. Ross, Introduction to Probability and Statistics for Engineers and Scientists, 2nd ed. San Diego, CA; London: Harcourt/Academic, 2000
[6] V.K. Rohatgi, A.K. Md. Ehsanes Saleh, An Introduction to Probability and Statistics, 2nd ed. New York; Chichester: Wiley, 2001

Probability and Statistics is a mathematical discipline that studies the regularities of random phenomena from a quantitative point of view. It is widely applied in industrial and agricultural production and in science and technology, and it interacts closely with the other branches of mathematics.
A Bayesian Approach to Multiple Fuzzy Hypothesis Testing for the Parameter of the Эрланга Distribution
杨淳偓 (Yang Chunwo); 魏立力 (Wei Lili)

Abstract: Fuzzy hypothesis testing is an important setting in applied statistical inference and statistical decision making in which fuzzy concepts must be handled. For the fuzzy hypothesis testing problem concerning the parameter of the Эрланга distribution, this paper studies the Bayesian solution under a specified loss function and illustrates it with numerical examples. Both the Jeffreys prior and the conjugate prior are considered as prior distributions.
%Fuzzy hypotheses testing is an important case in statistical inference and statistical decision dealing with fuzzy concepts .The purpose of this paper is to give the Bayesian rule to multiple fuzzy hypothesis testing for Эрлангаdistribution under the given loss function , with both Jeffreys prior distribution and conjugate prior distribution , and two examples is given to illustrate this method .【期刊名称】《佳木斯大学学报(自然科学版)》【年(卷),期】2014(000)002【总页数】5页(P303-307)【关键词】艾拉姆咖分布;模糊假设;贝叶斯检验【作者】杨淳偓;魏立力【作者单位】宁夏大学数学计算机学院,宁夏银川 750021;宁夏大学数学计算机学院,宁夏银川 750021【正文语种】中文【中图分类】O212.1;O212.8;O1590 引言假设检验作为应用统计推断和统计决策中一个非常重要的内容,已发展了一套比较完善的理论[1].然而,如果在假设或观测值中引入模糊概念,则将面临很多新颖而又有趣的问题.近年来,许多学者致力于模糊假设检验问题的研究[2~11],其中一个重要的方面是把模糊集理论与贝叶斯方法结合起来.Delgado等人[2]利用模糊集理论与贝叶斯方法,得到原假设的一个模糊拒绝域.Casals[3]研究了与[2]相同的问题,但其考虑的观测信息也是模糊的.Taheri[4~5]等人研究了观测数据是精确的而假设是模糊的模糊假设的贝叶斯检验.Saade[6]在决策过程中应用了模糊化的贝叶斯准则研究了二重假设检验问题及模糊似然函数.对于经典的统计学问题,国内有不少学者在研究,但是只有部分学者将模糊数学与统计学结合起来做研究.魏立力等人[7~8]将模糊性引入到多重假设中,从贝叶斯决策的角度出发,在定数截尾样本情形下,研究了两参数指数分布中尺度参数的多重模糊假设检验的贝叶斯方法.沈红菊等人[9]研究了单参数指数分布中参数的多重模糊假设的贝叶斯检验.马翠玲等人[10]研究了两参数威布尔分布中尺度参数的多重模糊假设的贝叶斯检验.夏亚峰等人[11]研究了两参数指数_威布尔分布中参数的多重模糊假设的贝叶斯检验.艾拉姆咖分布[12]是俄罗斯在研究武器装备的维修时间时引入的,此分布在装备维修理论中占有非常重要的地位,其模糊假设通常有很好的解释.本文在精确样本情形下,考虑艾拉姆咖分布参数的多重模糊假设的贝叶斯检验.其中,使用的先验分布为Jeffreys先验和共轭先验分布,并且给出了数值例子.1 预备知识首先引入一些基本概念.为了避免可能的争议,在这里强调:先验分布是关于假设的真实性的先验信息,而隶属函数仅是对于假设本身所描述概念的不确定性的一个反设Θ为我们所考虑的参数空间,通常是Rn的一个子集.定义1设是以参数空间Θ为论域的k个模糊子集,其隶属函数分别为H0(θ),H1(θ),…,Hk-1(θ),则以下假设称为一个k重模糊假设(k≥2,且为自然数),记为(H0,H1,…,Hk-1).记A={a0,a1,…,ak-1}为行动空间,其中ai表示接受Hi(i=0,1,…,k-1)的行动.设X=(X1,X2,…,Xn)是简单随机样本,观测值为x=(x1,x2,…,xn),其中X的概率函数为f(x|θ),θ∈Θ是未知参数,且有先验分布π(θ),现要对 k 重模糊假设 H0,H1,…,Hk-1 做出判决,即选择行动空间A中的一个行动.由贝叶斯公式可知,在得到样本x后,参数θ的后验分布为其中,m(x)= ∫Θπ(θ)f(x|θ)dθ为样本X的边缘分布.损失函数是指L(θ,a):Θ × A→R+,L(θ,a)表示参数真值为θ时采取行动a所造成的损失.从样本空间X={x=(x1,x2,…,xn)}到行动空间A上的一个映射δ(x)称为一个判决函数.设D表示所有判决函数组成的判决函数类.其中,损失函数对样本分布的期望值称为δ(·)的风险函数.风险函数对先验分布π(θ)的期望.称为δ(·)的贝叶斯风险.如果在决策函数类D中存在这样的决策函数δ*(·),使得.则称δ*(·)为贝叶斯风险准则下的最优决策函数[13].损失函数L(θ,δ(x))关于后验分布π(θ|x)的期望值称为决策函数δ(·)的后验风险.使后验风险达到最小的决策函数称为后验风险准则下的最优决策函数.对于给定的决策问题和决策函数类D,如果给定的先验分布π(θ)使得贝叶斯风险B(π,δ)满足,则贝叶斯风险准则和后验风险准则等价[14].2 艾拉姆咖分布参数多重模糊假设检验的贝叶斯方法假设总体的寿命分布为艾拉姆咖分布,其分布函数为密度函数为其中t0是参数.若令,则艾拉姆咖分布函数为密度函数为现在从总体X中抽取n个样本进行寿命试验.引理1 θ的Jeffreys先验为证明: 由Jeffreys先验的定义直接可证.此时,后验分布为易见,这是倒咖玛分布I引理2 倒咖玛分布为θ的共轭先验分布.如果取倒咖玛分布IGa(σ,λ)为θ的先验分布,则θ的后验分布为IGa(σ′,λ′),其中σ′= σ +2n,λ′=证明: 若θ的先验分布π(θ)=IGa(σ,λ),则θ的后验分布为容易看出,这是倒咖玛分布现在我们从总体X中抽取n个样本进行寿命试验.为了研究k重模糊假设检验问题(H0,H1,…,Hk-1),使用如下的损失函数[15]:其中gi(θ)(i=0,1,…,k-1)为任意非负函数,gi(θ)的选取取决于试验者对错误接受Hi这一假设的敏感程度.定理1 对于k重模糊假设检验问题(H0,H1,…Hk-1),如果取损失函数(7),则接受Hi当且仅当其中是统计量,它的值域是T.证明: 考虑决策函数δ(x)的贝叶斯风险由贝叶斯公式f(x|θ)π(θ)= π(θ|x)m(x),m(x)为X的边缘密度函数,因而由Fubini定理,可得其中如果定理中的(8)式成立,则因而B(π,ai)≤B(π,aj),j≠i.即ai(接受Hi)为贝叶斯检验法则.证毕.特别的,如果在定理中gi≡C(常数),i=0,1,…,k-1,则式(8)等价于假如把Hi理解为模糊事件,则(9)式说明贝叶斯检验法则为接受使后验概率最大的假设Hi,这正是所谓的后验概率准则.推论1 若取Jeffreys先验πJ∝1/θ,θ> 0,则式(8),(9)分别等价于推论2 若取倒咖玛分布为的共轭先验分布,则θ的后验分布也是倒咖玛分布,IGa(σ′,λ′),其中那么式(8),(9)分别等价于3 数值算例例1 设X=(X1,X2,…,X10)是来自艾拉姆咖分布(5)的简单样本,考虑如下三重模糊假设:H0:θ比10较小,H1:θ大约为10,H2:θ比10 较大其隶属函数分别为取gi(θ)≡C(常数),记,则对于 Jeffreys先验,由推论1,需要计算对共轭先验分布IGa(σ,λ),超参数σ,λ需要事先确定,我们这里假设σ =0.5,λ =5,则由推论2,需要计算对T=120,140,160,180,200,220,240 分别计算αi(T)和αi′(T)i=0,1,2,结果列入表 1.表1 例1计算及检验结果36 α1(T) 4.77E- 27 4.42E- 28 5.33E- 29 6.57E-30 8.80E- 31 1.64E- 31 1.93E-32 α2(T) 1.00E- 27 6.85E- 29 4.75E- 29 3.04E- 30 6.85E- 31 1.26E- 31 4.05E- 32贝叶斯检验接受H0 接受H0 接受H1 接受H1 接受H1 接受H1 接受H2 α0′(T) 4.52E- 26 1.69E- 27 8.46E-29 5.21E- 30 3.73E- 31 2.99E- 32 2.61E-33 α1′(T) 1.02E- 26 9.04E-28 9.06E- 29 1.02E- 29 1.25E- 30 1.68E- 31 2.42E-32 α2′(T) 3.55E-28 6.54E- 29 1.29E- 29 2.66E- 30 5.70E- 32 1.28E- 31 3.01E- 
32贝叶斯检验接受H0 接受H0 接受H1 接受H1 接受H1 接受H1 接受H 120 140 160180 200 220 240 α0(T) 9.42E- 27 4.61E- 28 2.80E- 29 1.99E- 30 1.59E-31 1.38E- 32 3.40E-T 2通过表1可以看到,使用Jeffreys先验和共轭先验分布,其决策基本相同,说明贝叶斯检验法则对两种先验分布不敏感,具有一定稳健性.例2 (续例1)对于例1中的三重假设,选取损失函数为对于Jeffreys先验,由推论1,需要计算对共轭先验分布IGa(σ,λ),超参数σ,λ需要事先确定,我们这里假设σ =0.5,λ =5,则由推论2,需要计算对T=120,140,160,180,200,220,240 分别计算α(ai|T)和α′(ai|T)i=0,1,2,结果列入表2.比较表1和表2易知,当T=140时,检验结果发生了改变,这是因为我们选取了g0=20,g1=9,g2=9.说明我们对错误接受H0这一行动比较敏感,不能轻易接受H0.表2 例2计算及检验结果8 3.13E-29 5.78E-30 α(a1|T) 2.37E-24 8.79E-26 4.77E-27 3.79E-28 4.53E-29 7.60E-30 1.59E-30 α(a2|T) 2.84E-241.28E-25 8.44E-27 7.32E-28 7.71E-29 9.35E-30 1.26E-30贝叶斯检验接受H0 接受H1 接受H1 接受H1 接受H1 接受H1 接受H2 α'(a0|T)2.11E-25 1.94E-26 2.07E-27 2.56E-283.65E-29 3.75E-30 1.08E-30 α'(a1|T)4.10E-25 1.58E-26 8.78E-28 7.08E-29 8.49E-30 1.42E-30 2.95E-31 α'(a2|T) 4.99E-25 2.34E-26 1.54E-27 1.38E-28 1.47E-29 1.78E-302.41E-31贝叶斯检验接受H0 接受H1 接受H1 接受H1 接受H1 接受H1 接受H 120 140 160 180 200 220 240 α(a0|T) 1.12E-24 1.02E-25 1.09E-261.35E-27 1.92E-2 T 2参考文献:[1]Lehmann E L.Testing Statisticalhypotheses(Second Ddition)[M].New York:John Wiley& Sons,1986.[2]Delgado M,Verdegay J L,Vila M A.Testing Fuzzy Hypotheses:a Bayesian Approach[A].Gupt a M M,Kandel A,Bandler W,Kiszka JB.Approximate Reasoning in Expert Systems[C].Amsterdam:Elsevier,1985:307-316.[3]Casals M R.Bayesian Testing of Fuzzy Parametric Hypotheses from Fuzzy Information[J].RAIRO Operations Research,1993,27(2):189-199.[4]Taheri S M,Behboodian J.A Bayesian Approach to Fuzzy Hypotheses Testing[J].Fuzzy Sets and Systems,2001,123(1):39-48.[5]Taheri S M,Behboodian J.Fuzzy Hypotheses Testing with Fuzzy Data:a Bayesian Approach[A].Lecture Notes in Artifical Intelligence2275[C].Springer,2002:527-533[6]Saade J J.Extension of Fuzzy Hypothesis Testing with Hybrid Data.Fuzzy Sets Systems,1994,63:57-71.[7]魏立力,张文修.两参数指数分布模型多重模糊假设检验的贝叶斯方法[J].系统工程,2002,20(2):1-5.[8]魏立力,张文修.多重模糊假设检验的贝叶斯方法[J].工程数学报,2004,21(3):400-416.[9]沈菊红,魏立力.指数分布模型模糊假设检验的贝叶斯方法[J].宁夏大学学报:自然科学版,2005,26(1):26-29.[10]马翠玲,张德存,李彪.两参数威布尔分布模型多重模糊假设检验的贝叶斯方法[J].海军航空工程学院学报,2009,24(1):97-100.[11]夏亚峰,王瑞.两参数指数-威布尔分布模型多重模糊假设检验的贝叶斯方法[J].甘肃科学学报,2011,24(3):1-5.[12]吕会强,高连华,陈春良.艾拉姆咖分布及其在保障性数据分析中的应用[J].装甲兵工程学院学报,2002,16(3):48-52.[13]陈希孺.数理统计引论[M].北京:科学出版社,1997.[14]范金成,吴可法,统计推断导引[M].北京:科学出版社,2001.。
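In the special case where the weights g_i in the loss function are constant, the paper's decision rule reduces to accepting the fuzzy hypothesis with the largest posterior weight, that is, the largest integral of the membership function H_i(theta) against the posterior density of theta. The sketch below (Python with NumPy and SciPy assumed) illustrates that rule. The inverse-gamma posterior uses the conjugate shape update sigma' = sigma + 2n stated in the paper; the scale update lambda' = lambda + T is an assumption, since that formula is truncated in the extracted text, and the ramp/triangular membership functions for "smaller than 10", "about 10", and "larger than 10" are invented stand-ins because the paper's exact membership functions did not survive extraction.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Posterior for theta under the conjugate inverse-gamma prior IGa(sigma, lam):
# shape sigma' = sigma + 2n (as stated in the paper); the scale update
# lam' = lam + T is an assumption, with T the observed statistic.
sigma, lam = 0.5, 5.0
n, T = 10, 160
posterior = stats.invgamma(a=sigma + 2 * n, scale=lam + T)

# Illustrative membership functions for the three fuzzy hypotheses
# (H0: theta smaller than 10, H1: theta about 10, H2: theta larger than 10).
H = [
    lambda t: np.clip((10 - t) / 5, 0, 1),          # H0: "smaller than 10"
    lambda t: np.clip(1 - abs(t - 10) / 5, 0, 1),   # H1: "about 10"
    lambda t: np.clip((t - 10) / 5, 0, 1),          # H2: "larger than 10"
]

# Posterior weight of each fuzzy hypothesis: integral of H_i against the posterior.
weights = [quad(lambda t, Hi=Hi: Hi(t) * posterior.pdf(t), 0, np.inf)[0] for Hi in H]
accept = int(np.argmax(weights))
print([round(w, 4) for w in weights], "-> accept H%d" % accept)
```

With nonconstant weights g_i, as in the paper's Example 2 (which penalizes wrongly accepting H0 more heavily by taking g0 = 20 and g1 = g2 = 9), the integrands are weighted according to the loss function before the comparison, so the decision can change even for the same observed T.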