State determination and iterative algorithms
arXiv:quant-ph/0511167v1 16 Nov 2005

Calculating state-to-state transition probabilities within TDDFT

Nina Rohringer, Simone Peter, Joachim Burgdörfer
Institute for Theoretical Physics, Vienna University of Technology, A-1040 Vienna, Austria

Abstract

The determination of the elements of the S-matrix within the framework of time-dependent density-functional theory (TDDFT) has remained a widely open question. We explore two different methods to calculate state-to-state transition probabilities. The first method closely follows the extraction of the S-matrix from the time-dependent Hartree-Fock approximation. This method suffers from cross-channel correlations resulting in oscillating transition probabilities in the asymptotic channels. An alternative method is proposed which corresponds to an implicit functional of the time-dependent density. It gives rise to stable and accurate transition probabilities. An exactly solvable two-electron system serves as benchmark for a quantitative test.

As a matter of principle, time-dependent density-functional theory [1] provides a highly efficient method to solve the time-dependent quantum many-body problem. It directly yields the time-dependent one-particle density $n(\vec r, t)$ of the many-body system. All physical observables of the quantum system can, in principle, be determined from the density. In practice there are two essential ingredients to a TDDFT calculation. First, an approximation to the time-dependent exchange-correlation potential $V_{xc}[n](\vec r, t)$ has to be found which, via the non-interacting Kohn-Sham system, determines the evolution of the density. The second ingredient is a set of functionals that allow the extraction of physical observables from the density. For some observables, such as the ground-state energy, extraction is straightforward within ground-state density functional theory [2]. Excited-state spectra have been obtained from linear-response functionals [3,4]. Beyond linear response, the time-dependent dipole moment, which governs the emission of high-harmonic radiation, can be determined directly from $n(\vec r, t)$. Ionization probabilities can be extracted approximately by identifying the integrated density beyond a certain critical distance from the bound system with the flux of ionized particles [5,6]. On the most fundamental level, however, state-to-state transition probabilities contain the full information on the response of a many-body system to an external perturbation. One example are the bound-bound transition amplitudes required, e.g., in coherent-control calculations of laser-matter interactions within TDDFT [7], currently a hot topic since attosecond laser pulses allow the control of electron dynamics.
This poses one fundamental question: how can state-to-state transition probabilities be extracted from TDDFT? The ultimate goal of the study of the, in general, non-linear response of the many-body system to a time-dependent perturbation is the determination of the state-to-state transition amplitude

S_{if} = \lim_{t \to \infty} \langle \chi_f | U(t, -t) | \chi_i \rangle ,   (0.1)

where $|\chi_{i,f}\rangle$ are the initial (final) channel states of the system prior to (i) and after (f) the perturbation, and $U(t_1, t_2)$ is the time-evolution operator of the system. The challenge is, thus, to construct a functional $S_{i,f}[n]$ that allows one to extract $S_{if}$ from TDDFT. The present paper addresses methods to extract transition probabilities between discrete states of the many-body system. As a point of reference, we first investigate the evaluation of Eq. (0.1) employing Kohn-Sham orbitals in close analogy to the time-dependent Hartree-Fock (TDHF) method (see [8,9,10,11] and references therein). This method would be the equivalent of at least the zeroth order of a time-dependent many-body perturbation theory (S-matrix theory) in which the time-dependent non-interacting Kohn-Sham system is considered as the unperturbed system. It involves three approximations: the initial state, the final state and the many-body propagator are each approximated by their TDDFT equivalents. We encounter similar conceptual problems ("cross-channel correlations") as TDHF does. We then formulate a novel functional $S_{i,f}[n]$ which is shown to be free of these deficiencies. We test the method with the help of an exactly solvable two-electron model.

We consider an interacting N-electron system with Hamiltonian $H_0$ and stationary eigenstates $\chi_{i,f}$ which is subject to a perturbation $V(t)$, switched on at time $t=0$ and switched off at time $t=\tau$. The initial state of the system $|\chi_i\rangle$ is assumed to be the ground state and evolves according to the time-dependent many-body Schrödinger equation (in a.u.)

i \frac{\partial}{\partial t} |\Psi(t)\rangle = [H_0 + V(t)]\, |\Psi(t)\rangle .   (0.2)

The S-matrix element is the projection of the evolved state onto the eigenstates of the unperturbed system,

S_{i,f} = \lim_{t \to \infty} \langle \chi_f | \Psi(t) \rangle .   (0.3)

For later reference we note that the time evolution of the projection amplitude for $t > \tau$ is given in terms of the eigenenergies of the asymptotic final states, $\varepsilon_f$, by

\langle \chi_f | \Psi(t) \rangle = \exp(-i \varepsilon_f (t - \tau)) \, \langle \chi_f | \Psi(\tau) \rangle .   (0.4)

Since the perturbation vanishes for $t > \tau$, the state-to-state transition probability is given by $P_{i,f} = |S_{i,f}|^2 = |\langle \chi_f | \Psi(\tau) \rangle|^2$.

Within TDDFT, the time-dependent density is represented through the time-dependent Kohn-Sham spin-orbitals $\Phi_{\sigma,j}(\vec r, t)$ as

n(\vec r, t) = \sum_{\sigma=\uparrow,\downarrow} n_\sigma(\vec r, t) = \sum_{\sigma=\uparrow,\downarrow} \sum_{j=1}^{N_\sigma} |\Phi_{\sigma,j}(\vec r, t)|^2 ,   (0.5)

where $N_\sigma$ denotes the number of electrons of spin $\sigma$. The one-particle spin-orbitals $\Phi_{\sigma,j}(\vec r, t)$ evolve according to the time-dependent Kohn-Sham equation governed by the one-particle Kohn-Sham Hamiltonian

H^{KS}_\sigma[n_\uparrow, n_\downarrow] = -\tfrac{1}{2}\nabla^2 + \ldots   (0.6)

[...] is imposed on $\chi_f$ when the evolution is calculated by forward-propagation. The simplest choice for channel states $|\chi_f\rangle$ are Kohn-Sham Slater determinants built up from occupied and virtual Kohn-Sham orbitals $|\Phi_{\sigma,j}\rangle$ of the ground-state DFT problem, i.e.

S_{if} \approx \lim_{t \to \infty} \langle \chi_f^{DFT} | \Psi^{TDDFT}(t) \rangle .   (0.7)

Reliable transition probabilities can only be expected if both the time-dependent and the stationary Kohn-Sham Slater determinants are good approximations to the time-dependent and stationary many-body wavefunctions, respectively. Excited states often show a higher degree of correlation, and a single Kohn-Sham Slater determinant is then no longer a satisfactory approximation. In those cases, more elaborate final-state channel functions are needed. Alternatively, configuration-interaction (CI) or multi-configuration Hartree-Fock (MCHF) wavefunctions can be employed [14,15].
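For concreteness, the projection in Eq. (0.7) amounts to a grid quadrature of the overlap between the TDDFT product wavefunction and a channel determinant. Below is a minimal Python/numpy sketch of that projection for a 1D two-electron system; the Gaussian orbital is a placeholder stand-in, not an actual Kohn-Sham orbital from the paper's calculation.

```python
import numpy as np

# Sketch of Eq. (0.7): overlap of the TDDFT product wavefunction
# Phi(x1,t)Phi(x2,t) with a channel determinant on a 1D two-electron grid.
x = np.linspace(-10.0, 10.0, 400)
dx = x[1] - x[0]

def projection_probability(psi_t, chi_f, dx):
    """P_if = |<chi_f|Psi(tau)>|^2 as a two-dimensional grid quadrature."""
    amplitude = np.sum(np.conj(chi_f) * psi_t) * dx**2
    return np.abs(amplitude) ** 2

phi = np.exp(-0.5 * x**2)                # placeholder orbital, not the paper's
phi /= np.sqrt(np.sum(phi**2) * dx)      # normalize on the grid
psi_t = np.outer(phi, phi)               # doubly occupied KS orbital at t = tau
chi_f = np.outer(phi, phi)               # trial channel determinant
print(projection_probability(psi_t, chi_f, dx))   # -> 1.0 for identical states
```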
Within TDDFT, linear-response theory allows the calculation of the excitation spectrum by including particle-hole excitations [3,4]. As a by-product, improved excited states, i.e., the particle-hole reduced density matrix, are generated in terms of an expansion in single-particle excitations of Kohn-Sham orbitals. In the present case of a two-electron system this approach should give an improved wavefunction compared to the initial single Kohn-Sham Slater determinant [16]. As discussed below, however, this approach suffers from the same shortcomings as Eq. (0.7).

We therefore introduce a new functional which depends only on the time-dependent density. For simplicity, and in line with the present model system, the derivation is presented for two-electron systems. Generalizations to arbitrary many-electron systems are straightforward. The starting point is the expansion of the exact time-dependent wavefunction in terms of a complete set of final-state wavefunctions,

\Psi(\vec r_1, \vec r_2, t) = \sum_f \langle \chi_f | \Psi(t) \rangle \, \chi_f(\vec r_1, \vec r_2) .   (0.8)

Using the stationary one-particle reduced density matrix

\rho^{(1)}_{f',f}(\vec r) = 2 \int d\vec r_2 \, \chi^*_{f'}(\vec r, \vec r_2) \, \chi_f(\vec r, \vec r_2)   (0.9)

and the time-dependent transition density matrix defined by

T_{f',f}(t) = \langle \chi_{f'} | \Psi(t) \rangle^* \, \langle \chi_f | \Psi(t) \rangle ,   (0.10)

the exact time-dependent density is given by

n(\vec r, t) = \sum_{f,f'} T_{f',f}(t) \, \rho^{(1)}_{f',f}(\vec r) = \mathrm{Tr}\big[ T(t)\, \rho^{(1)}(\vec r) \big] .   (0.11)

[FIG. 1 (caption, beginning lost): [...] state $|N_{cm}=1, n_{rel}=0\rangle$ (figure b) and second excited state $|N_{cm}=2, n_{rel}=0\rangle$ of the exact calculation (full line) and the TDDFT calculation (dashed line) for a confining frequency $\omega=0.25$. TDDFT occupation probabilities are obtained using the approximate S-matrix, Eq. (0.7); final channel states are Slater determinants of Kohn-Sham orbitals $|0,0\rangle$, $|0,1\rangle$ and $|0,2\rangle$. Parameters of the laser pulse: $F_0=0.07$, $\omega_L=0.1839$, $\tau=168$.]

The transition density matrix and, in particular, the transition probabilities $|S_{i,f}|^2$ can be directly determined by inversion of Eq. (0.11). Our primary interest lies in the diagonal elements ($f'=f$) of the transition density matrix, $T_{f',f}(t) \to S^*_{i,f'}(t) S_{i,f}(t)$, at times after the switch-off of the external perturbation, $t > \tau$. In this case the inversion problem of dimension $N_F \times N_F$ ($N_F$: dimension of the truncated final-state space considered) can be drastically simplified. Using Eq. (0.4), for non-degenerate final states ($\varepsilon_f \neq \varepsilon_{f'}$) the transition probabilities $T_{ff}$ can be extracted from a time average over an interval $(t-\tau)\,|\varepsilon_{f'}-\varepsilon_f| \gg 2\pi$,

\bar n(\vec r, t) := \frac{1}{t-\tau} \int_\tau^t n(\vec r, t') \, dt' ,   (0.12)

leading to the read-out functional

\lim_{t \to \infty} \bar n(\vec r, t) = \sum_f \rho^{(1)}_{f,f}(\vec r) \, |S_{fi}|^2 .   (0.13)

Unlike Eq. (0.11), Eq. (0.13) requires only an $N_F$-dimensional inversion. In practice, the application of Eq. (0.13) requires evaluating the final-state densities $\rho^{(1)}_{f,f}(\vec r)$ at $N_F$ distinct points $\vec r_j$, $j = 1, \ldots, N_F$, chosen so that the matrix $R_{f,j} := \rho^{(1)}_{f,f}(\vec r_j)$ does not become near-singular and remains invertible. The state-to-state transition probabilities then become

|S_{i,f}|^2 = \lim_{t \to \infty} \sum_{j=1}^{N_F} \bar n(\vec r_j, t) \, R^{-1}_{f,j} , \qquad f = 1, \ldots, N_F .   (0.14)
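The inversion in Eq. (0.14) is an ordinary linear solve once the time-averaged density and the final-state densities are tabulated at the $N_F$ sample points. A minimal numpy sketch, with hypothetical numbers in place of actual densities:

```python
import numpy as np

# Sketch of the read-out functional, Eqs. (0.12)-(0.14): given the
# time-averaged density nbar at N_F sample points r_j and the diagonal
# final-state densities rho_f(r_j), solve for the probabilities |S_if|^2.
# All numbers below are illustrative placeholders.

def transition_probabilities(rho, nbar):
    """rho[f, j] = rho^(1)_{f,f}(r_j), nbar[j] = time-averaged density.

    Solves nbar = rho^T @ P for P[f] = |S_if|^2 (Eqs. 0.13/0.14); the
    sample points must keep rho well-conditioned, as the paper notes."""
    return np.linalg.solve(rho.T, nbar)

# Toy example: two final states, two sample points.
rho = np.array([[1.2, 0.4],    # state f=0 density at r_0, r_1
                [0.3, 0.9]])   # state f=1 density at r_0, r_1
P_true = np.array([0.7, 0.3])
nbar = rho.T @ P_true
print(transition_probabilities(rho, nbar))  # -> [0.7, 0.3]
```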
We have tested the functionals (Eqs. (0.7) and (0.14)) for an exactly solvable system of two electrons confined to a harmonic quantum dot [17,18]. In its present 1D version, the electron-electron interaction is replaced by a soft Coulomb potential [19]. The Hamiltonian of the system is given by

\hat H(t) = \sum_{i=1,2} \left[ \frac{\hat p_i^2}{2} + \frac{\omega^2}{2} x_i^2 - F(t)\, x_i \right] + \frac{1}{\sqrt{b + (x_1 - x_2)^2}} ,   (0.15)

where $x_i$ and $p_i$ are the coordinates and momenta of electron $i$ ($i=1,2$). The softening parameter $b$ is set to $b=0.55$. Similar systems describing helium in one dimension have been studied in the past [5,15,19]. The laser field $F(t)$, with driving frequency $\omega_L = 0.1839$ and peak field amplitude $F_0 = 0.07$, is treated in dipole approximation. A pulse length $\tau = 168$ was chosen, with a two-cycle turn-on, a two-cycle flat top and a two-cycle turn-off.

Introducing center-of-mass (c.o.m.) coordinates $R = x_1 + x_2$ and relative coordinates $r = x_1 - x_2$, the Schrödinger equation (0.2) can be separated, since the Hamiltonian of Eq. (0.15) splits into $\hat H = \hat H_{rel} + \hat H_{cm}(t)$. The eigenstates of the unperturbed system are characterized by the set of quantum numbers $(N_{cm}, n_{rel})$, the numbers of nodes of the c.o.m. and relative wavefunctions. The initial state is the spin-singlet ground state $|N_{cm}=0, n_{rel}=0\rangle$. The time dependence of the total Hamiltonian is confined to the c.o.m. Hamiltonian. The exact time-dependent wavefunction therefore separates into a time-dependent c.o.m. part and a time-independent relative part, $\Psi(r, R, t) = g(r)\, h(R, t)$. Since the system starts out from the ground state, $h(R, t)$ represents a coherent state of the one-dimensional harmonic oscillator driven by an external electric field. The dynamics of the density of this model system is subject to the harmonic potential theorem [20,21], i.e., the density is rigidly shifted without any distortion.

In the case of a two-electron spin-singlet system evolving from the ground state, the time-dependent Kohn-Sham scheme consists in solving the Kohn-Sham equation for one doubly-occupied Kohn-Sham orbital $\Phi(x, t)$.

[FIG. 2: Comparison of exact occupation probabilities (solid line) and transition probabilities obtained by inversion of Eq. (0.14) (dashed line) for the ground state (figure a), first excited state (figure b) and second excited state (figure c).]

For spin-unpolarized two-particle systems the exact $V_{xc}(t)$ can be constructed by using the exact density derived from the Schrödinger equation and inverting the Kohn-Sham equation [22,23]. Since the harmonic two-electron quantum dot satisfies the harmonic potential theorem, the time dependence of the exchange-correlation potential is a rigid shift and the exact $V_{xc}(t)$ is easily constructed. We also employed the adiabatic local spin-density approximation (ALSDA) with self-interaction correction (SIC) [24]. Since no reliable correlation potential is available for a one-dimensional electron system, we only consider exchange. The $L^1$ norm of the deviation between the exact and the TDDFT density is about 0.2 for the highly correlated system with $\omega=0.25$. The ground-state Kohn-Sham equation generates a set of excited virtual Kohn-Sham orbitals $|n\rangle$.

Figure 1 shows a comparison of exact occupation probabilities and those obtained from the approximate S-matrix (Eq. (0.7)) by projecting $\Psi^{TDDFT}(t) = \Phi(x_1, t)\,\Phi(x_2, t)$ onto Kohn-Sham Slater determinants. Shown are the occupations of the ground state (figure a, projection onto the Kohn-Sham determinant $|0,0\rangle$), the first excited state (figure b, projection onto $|0,1\rangle$) and the second excited state (figure c, projection onto $|0,2\rangle$). The second excited state $|N_{cm}=2, n_{rel}=0\rangle$ requires a configuration mixture of at least two Kohn-Sham Slater determinants to be well represented (configurations $|0,2\rangle$ and $|1,1\rangle$). Neither the projection onto a single Kohn-Sham configuration state $|0,2\rangle$ (figure 1c) nor the projection onto the exact excited state (not shown) yields satisfactory transition probabilities. After the switch-off of the laser, the density undergoes oscillations resulting in time-dependent Hartree and exchange-correlation potentials which give rise to oscillations in the occupation probabilities. These are the signatures of the "spurious cross-channel correlations" well known from TDHF [9,10]. For the present system, exact excited final states can be easily calculated. We have checked that cross-channel correlations persist when we project onto exact exit-channel states rather than Kohn-Sham determinants. Moreover, we also tested channel states obtained
from a TDDFT linear-response equation. In this equation the exchange-correlation kernel of TDDFT was approximated by the exchange-only time-dependent optimized effective potential [4,25], and the exact Kohn-Sham orbitals obtained from the exact ground-state exchange-correlation potential were used. The resulting excited-state wavefunctions, however, do not differ significantly from the single Kohn-Sham Slater determinants, although the excitation energies are considerably improved compared to the Kohn-Sham energy differences. With presently available approximations to the exchange-correlation kernel, TDDFT linear-response theory does not provide improved excited-state wavefunctions. We thus conclude that the projection amplitude of Eq. (0.7) is not well suited to determine an approximate S-matrix within TDDFT.

To test the newly proposed read-out functional, which depends only on the density, we use the exact time-dependent density $n(x, t)$. In this way, errors due to the approximate exchange-correlation potential can be ruled out and the quality of the proposed functional for state-to-state transition probabilities can be assessed directly. The sums in Eqs. (0.12), (0.13) and (0.14) are truncated after the second excited state. Figure 2 shows a comparison of transition probabilities obtained by projecting the wavefunction according to Eq. (0.3) (solid line) and by the new density functional of Eq. (0.14) (dashed line) for the three lowest-lying states. In the limit $t \to \infty$, i.e., as the averaging interval in Eq. (0.12) increases, the transition probability converges within the numerical accuracy towards the exact result. Numerical errors are due to the truncation of the sum over final states in Eq. (0.14). Note that simply averaging over the cross-channel correlations in Eq. (0.7) (see figure 1) would lead, in general, to incorrect results. Similarly good agreement is found for other initial states and other systems [26].

In conclusion, we have investigated two different methods to extract state-to-state transition amplitudes from TDDFT calculations. In the first method, the correlated many-body wavefunction is approximated by a Slater determinant of the time-dependent Kohn-Sham orbitals. This approximate wavefunction is projected onto appropriate final states (exact states or Kohn-Sham configuration states). The resulting state-to-state transition probabilities suffer from oscillations after the switch-off of the external perturbation. The second read-out functional for state-to-state transition probabilities involves the time-dependent density directly and thus represents a well-suited density functional within the framework of TDDFT. The problem of cross-channel correlations is avoided and well-defined transition probabilities can be determined in the asymptotic limit $t \to \infty$. First results obtained by evaluating the read-out functional with densities resulting from the exact solution of a one-dimensional two-electron Schrödinger equation are very promising.
The application of the read-out functional to double excitations of helium is currently being investigated.

Work supported by FWF-SFB016.

[1] E. Runge, E. K. U. Gross, Phys. Rev. Lett. 52, 997 (1984).
[2] P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
[3] M. E. Casida, in Recent Developments and Applications of Modern Density Functional Theory, edited by J. M. Seminario (Elsevier, Amsterdam, 1996), p. 391.
[4] M. Petersilka, E. K. U. Gross, K. Burke, Int. J. Quant. Chem. 80, 534 (2000).
[5] D. G. Lappas, R. van Leeuwen, J. Phys. B: At. Mol. Opt. Phys. 31, L249 (1998).
[6] M. Petersilka, E. K. U. Gross, Laser Physics 9, 105 (1999).
[7] J. Werschnik, E. K. U. Gross, J. Chem. Phys. 123, 62206 (2005).
[8] P. Bonche, S. Koonin, J. W. Negele, Phys. Rev. C 13, 1226 (1976).
[9] J. J. Griffin, P. C. Lichtner, M. Dworzecka, Phys. Rev. C 21, 1351 (1980).
[10] Y. Alhassid, S. E. Koonin, Phys. Rev. C 23, 1590 (1981).
[11] J. W. Negele, Rev. Mod. Phys. 54, 913 (1982).
[12] A. Görling, M. Levy, Phys. Rev. B 47, 13105 (1993).
[13] A. Görling, M. Levy, Phys. Rev. A 50, 196 (1994).
[14] M. H. Beck, A. Jäckle, G. A. Worth and H.-D. Meyer, Physics Reports 324, 1 (2000).
[15] J. Zanghellini, M. Kitzler, T. Brabec, A. Scrinzi, J. Phys. B: At. Mol. Opt. Phys. 37, 763 (2004).
[16] Angel Rubio, private communication.
[17] M. Taut, Phys. Rev. A 48, 3561 (1993).
[18] Laufer, J. B. Krieger, Phys. Rev. A 33, 1480 (1986).
[19] R. Grobe, J. H. Eberly, Phys. Rev. A 48, 4664 (1993).
[20] J. F. Dobson, Phys. Rev. Lett. 73, 2244 (1994).
[21] G. Vignale, Phys. Rev. Lett. 74, 3233 (1995).
[22] I. D'Amico, G. Vignale, Phys. Rev. B 59, 7876 (1999).
[23] M. Lein, S. Kümmel, Phys. Rev. Lett. 94, 143003 (2005).
[24] J. P. Perdew, A. Zunger, Phys. Rev. B 23, 5048 (1982).
[25] C. A. Ullrich, U. J. Gossmann, E. K. U. Gross, Phys. Rev. Lett. 74, 872 (1995).
[26] N. Rohringer, S. Peter, J. Burgdörfer, to be published.
Third-order iterative methods for multiple roots of nonlinear equations

Li Fuxiang; Tian Jinyan; Fei Zhaofu; Gong Yinghai; Cao Zuobao

Abstract: To improve the convergence speed and computational efficiency of solving nonlinear equations, an iterative method for multiple roots of nonlinear equations based on Newton's method is proposed. The method assumes that the multiplicity of the root is known; the iteration formulas are given separately for the cases of odd and even multiplicity. Both formulas require only three function evaluations per iteration (including first-derivative values) and completely avoid second-derivative evaluations, while achieving third-order convergence. Numerical experiments verify the effectiveness of the method. It enriches the toolbox of root-finding methods for nonlinear equations and has both theoretical and practical value.

Journal: Journal of Harbin University of Science and Technology
Year (volume), issue: 2013, 18(6)
Pages: 4 (pp. 117-120)
Keywords: nonlinear equation; Newton's method; multiple root; iterative method; third-order convergence
Authors: Li Fuxiang; Tian Jinyan; Fei Zhaofu; Gong Yinghai; Cao Zuobao
Affiliation: School of Applied Sciences, Harbin University of Science and Technology, Harbin 150080, China
Language: Chinese
Classification: O24

Solving the nonlinear equation f(x) = 0 is an evergreen research topic in mathematics because of its wide applications in scientific research, production and daily life, and iterative methods are among the most commonly used tools for it. This paper focuses on the case of multiple roots. Iterative methods for multiple roots of nonlinear equations usually assume that the multiplicity is known; the most classical such method is the second-order Newton iteration. To improve its convergence speed, a series of higher-order iterative methods have been widely studied. Most of these are third-order convergent and fall roughly into two classes. One class, called one-point methods, requires three function evaluations per iteration, namely f, f', and f''; the Euler-Chebyshev method [1-2], Halley's method [3], the Chun-Neta method [4], the Biazar-Ghanbari method [5], and Osada's method [6] all belong to this class. Compared with the classical Newton method these methods greatly improve convergence efficiency, but every iteration requires a second-derivative evaluation, which undoubtedly increases the complexity of the iteration. The other class, called multipoint methods (to which the method of this paper belongs), can be divided into two subclasses: one requires two function values and one first-derivative value per iteration, e.g., the methods of [7], [9] and [10]; the other requires two first-derivative values and one function value, e.g., the methods of [11-14]. These methods not only raise the convergence rate but also keep the computational cost well under control. Beyond these there are fourth-order iterative methods, e.g., those of [15-18], which raise the convergence order to four at the cost of one additional evaluation of f or f'.

Motivated by the work above, and to improve the convergence speed and computational efficiency of solving nonlinear equations, this paper proposes an iterative method for multiple roots of nonlinear equations based on Newton's method. The iteration formulas are given separately for odd and even multiplicity; both require only three function evaluations per iteration (including first-derivative values), completely avoid second-derivative evaluations, and achieve third-order convergence.

As mentioned in the introduction, the method of this paper likewise assumes that the multiplicity m is known. When m is odd, the iteration takes the form given in Eq. (1); when m is even, the form given in Eq. (2). From the iteration formulas one sees that, whether the multiplicity is odd or even, each iteration requires the computation of only three function values. We next give the convergence proof.

Theorem: Let x* in D be a root of multiplicity m of a sufficiently smooth real function f: D (a subset of R) -> R on an open interval D. If x0 in D is sufficiently close to x*, then the iterative method determined by Eqs. (1) and (2) is at least third-order convergent.

Note that, regardless of whether the multiplicity m is odd or even, the third-order iterative method determined by Eqs. (1) and (2) requires only three function evaluations per iteration. Its efficiency index is p^(1/c), where p denotes the convergence order and c the number of function evaluations required per iteration.

We now apply the proposed method to the following numerical examples. To demonstrate its properties and feasibility, we select four methods from the two classes of third-order methods mentioned in the introduction for comparison with the method of this paper (denoted M): SM (a one-point method, requiring f, f' and f'') denotes the method of [14]; HMM (requiring two values of f and one of f') is from [10]; LM-1 (same evaluations as HMM) and LM-2 (requiring one value of f and two of f') are both from [12]. All computations were carried out with the software Mathematica, with |x_n - x*| <= 10^(-100) as the stopping criterion. The numerical results are presented in Table 1, where e_n is given to five significant figures and n denotes the number of iterations.

2) f2(x) = e^(x-2) + 1 - x; f2(x) = 0 has a double root x* = 2; the initial value is x0 = 2.4.
3) f3(x) = sin[(x-3)^5]; f3(x) = 0 has a root x* = 3 of multiplicity 5; the initial value is x0 = 2.5.
4) f4(x) = (x-1)(e^(x-1) - 1); f4(x) = 0 has a double root x* = 1; the initial value is x0 = 1.5.

Table 1 shows that the method M constructed in this paper converges as quickly as the other methods of the same convergence order, SM, HMM, LM-1 and LM-2. The examples clearly confirm that the proposed method is third-order convergent.

Tian Jinyan (b. 1988), female, master's student; Fei Zhaofu (b. 1989), male, master's student.

References:
[1] TRAUB J F. Iterative Methods for the Solution of Equations [M]. Englewood Cliffs, NJ: Prentice-Hall, 1964: 10-85.
[2] NETA B. New Third Order Nonlinear Solvers for Multiple Roots [J]. Appl. Math. Comput., 2008, 202: 162-170.
[3] HALLEY E. A New Exact and Easy Method of Finding the Roots of Equations Generally and without any Previous Reduction [J]. Phil. Trans. Roy. Soc. Lond., 1694, 18: 136-148.
[4] CHUN C, NETA B. A Third Order Modification of Newton's Method for Multiple Roots [J]. Appl. Math. Comput., 2009, 211: 474-479.
[5] BIAZAR J, GHANBARI B. A New Third-order Family of Nonlinear Solvers for Multiple Roots [J]. Comput. Math. Appl., 2010, 59: 3315-3319.
[6] OSADA N. An Optimal Multiple Root-finding Method of Order Three [J]. Comput. Appl. Math., 1994, 51: 131-133.
[7] DONG C. A Basic Theorem of Constructing an Iterative Formula of the Higher Order for Computing Multiple Roots of an Equation [J]. Math. Numer. Sinica, 1982, 11: 445-450.
[8] NETA B. New Third Order Nonlinear Solvers for Multiple Roots [J]. Appl. Math. Comput., 2008, 202: 162-170.
[9] CHUN C, BAE H, NETA B. New Families of Nonlinear Third-order Solvers for Finding Multiple Roots [J]. Comput. Math. Appl., 2009, 57: 1574-1582.
[10] HOMEIER H H H. On Newton-type Methods for Multiple Roots with Cubic Convergence [J]. Comput. Appl. Math., 2009, 231: 249-254.
[11] DONG C. A Family of Multipoint Iterative Functions for Finding Multiple Roots of Equations [J]. Int. J. Comput. Math., 1987, 21: 363-367.
[12] LI S, LI H, CHENG L. Some Second-derivative-free Variants of Halley's Method for Multiple Roots [J]. Appl. Math. Comput., 2009, 215: 2192-2198.
[13] SHARMA J R, SHARMA R. New Third and
Fourth Order Nonlinear Solvers for Computing Multiple Roots [J]. Appl. Math. Comput., 2011, 217: 9756-9764.
[14] SHARMA J R, SHARMA R. Modified Chebyshev-Halley Type Method and Its Variants for Computing Multiple Roots [J]. Numer. Algor., 2012, 61: 567-578.
[15] LI S, CHENG L, NETA B. Some Fourth-order Nonlinear Solvers with Closed Formulae for Multiple Roots [J]. Comput. Math. Appl., 2010, 59: 126-135.
[16] NETA B. Extension of Murakami's High-order Nonlinear Solver to Multiple Roots [J]. Int. J. Comput. Math., 2010, 87: 1023-1031.
[17] NETA B, JOHNSON A N. High Order Nonlinear Solver for Multiple Roots [J]. Comput. Math. Appl., 2008, 55: 2012-2017.
[18] SHARMA J R, SHARMA R. New Third and Fourth Order Nonlinear Solvers for Computing Multiple Roots [J]. Appl. Math. Comput., 2011, 217: 9756-9764.
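The paper's own schemes (1)-(2) are not reproduced in the text above, so the following Python sketch shows only the classical baseline they improve upon: the modified Newton iteration x_{n+1} = x_n - m f(x_n)/f'(x_n), which is second-order convergent for a root of known multiplicity m. It is applied here to test problem 3); the tolerance and iteration cap are arbitrary choices, not taken from the paper.

```python
import math

def modified_newton(f, fprime, x0, m, tol=1e-12, max_iter=100):
    """Classical modified Newton step for a root of known multiplicity m:
    x_{n+1} = x_n - m * f(x_n) / f'(x_n). Second order; the paper's
    schemes (1)-(2) achieve third order with three evaluations per step."""
    x = x0
    for n in range(max_iter):
        step = m * f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x, n + 1
    return x, max_iter

# Test problem 3) from the paper: f3(x) = sin((x-3)^5), 5-fold root x* = 3.
f = lambda x: math.sin((x - 3) ** 5)
fp = lambda x: 5 * (x - 3) ** 4 * math.cos((x - 3) ** 5)
print(modified_newton(f, fp, x0=2.5, m=5))   # -> (approx. 3.0, iterations)
```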
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

John Lafferty (LAFFERTY@), Andrew McCallum (MCCALLUM@), Fernando Pereira (FPEREIRA@)
WhizBang! Labs - Research, 4616 Henry Street, Pittsburgh, PA 15213 USA
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213 USA
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104 USA

Abstract

We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

1. Introduction

The need to segment and label sequences arises in many different problems in several scientific fields. Hidden Markov models (HMMs) and stochastic grammars are well understood and widely used probabilistic models for such problems. In computational biology, HMMs and stochastic grammars have been successfully used to align biological sequences, find sequences homologous to a known evolutionary family, and analyze RNA secondary structure (Durbin et al., 1998). In computational linguistics and computer science, HMMs and stochastic grammars have been applied to a wide variety of problems in text and speech processing, including topic segmentation, part-of-speech (POS) tagging, information extraction, and syntactic disambiguation (Manning & Schütze, 1999).

HMMs and stochastic grammars are generative models, assigning a joint probability to paired observation and label sequences; the parameters are typically trained to maximize the joint likelihood of training examples. To define a joint probability over observation and label sequences, a generative model needs to enumerate all possible observation sequences, typically requiring a representation in which observations are task-appropriate atomic entities, such as words or nucleotides. In particular, it is not practical to represent multiple interacting features or long-range dependencies of the observations, since the inference problem for such models is intractable.

This difficulty is one of the main motivations for looking at conditional models as an alternative. A conditional model specifies the probabilities of possible label sequences given an observation sequence. Therefore, it does not expend modeling effort on the observations, which at test time are fixed anyway. Furthermore, the conditional probability of the label sequence can depend on arbitrary, non-independent features of the observation sequence without forcing the model to account for the distribution of those dependencies. The chosen features may represent attributes at different levels of granularity of the same observations (for example, words and characters in English text), or aggregate properties of the observation sequence (for instance, text layout). The probability of a transition between labels may depend not only on the current observation, but also on past and future observations, if available. In contrast, generative models must make very strict independence assumptions on the
observations, for instance conditional independence given the labels, to achieve tractability.

Maximum entropy Markov models (MEMMs) are conditional probabilistic sequence models that attain all of the above advantages (McCallum et al., 2000). In MEMMs, each source state¹ has an exponential model that takes the observation features as input, and outputs a distribution over possible next states. These exponential models are trained by an appropriate iterative scaling method in the maximum entropy framework. Previously published experimental results show MEMMs increasing recall and doubling precision relative to HMMs in a FAQ segmentation task.

¹Output labels are associated with states; it is possible for several states to have the same label, but for simplicity in the rest of this paper we assume a one-to-one correspondence.

MEMMs and other non-generative finite-state models based on next-state classifiers, such as discriminative Markov models (Bottou, 1991), share a weakness we call here the label bias problem: the transitions leaving a given state compete only against each other, rather than against all other transitions in the model. In probabilistic terms, transition scores are the conditional probabilities of possible next states given the current state and the observation sequence. This per-state normalization of transition scores implies a "conservation of score mass" (Bottou, 1991) whereby all the mass that arrives at a state must be distributed among the possible successor states. An observation can affect which destination states get the mass, but not how much total mass to pass on. This causes a bias toward states with fewer outgoing transitions. In the extreme case, a state with a single outgoing transition effectively ignores the observation. In those cases, unlike in HMMs, Viterbi decoding cannot downgrade a branch based on observations after the branch point, and models with state-transition structures that have sparsely connected chains of states are not properly handled. The Markovian assumptions in MEMMs and similar state-conditional models insulate decisions at one state from future decisions in a way that does not match the actual dependencies between consecutive states.

This paper introduces conditional random fields (CRFs), a sequence modeling framework that has all the advantages of MEMMs but also solves the label bias problem in a principled way. The critical difference between CRFs and MEMMs is that a MEMM uses per-state exponential models for the conditional probabilities of next states given the current state, while a CRF has a single exponential model for the joint probability of the entire sequence of labels given the observation sequence. Therefore, the weights of different features at different states can be traded off against each other.

We can also think of a CRF as a finite state model with unnormalized transition probabilities. However, unlike some other weighted finite-state approaches (LeCun et al., 1998), CRFs assign a well-defined probability distribution over possible labelings, trained by maximum likelihood or MAP estimation. Furthermore, the loss function is convex,² guaranteeing convergence to the global optimum. CRFs also generalize easily to analogues of stochastic context-free grammars that would be useful in such problems as RNA secondary structure prediction and natural language processing.

²In the case of fully observable states, as we are discussing here; if several states have the same label, the usual local maxima of Baum-Welch arise.

[Figure 1. Label bias example, after (Bottou, 1991). For conciseness, we place observation-label pairs on
transitions rather than states; the symbol '_' represents the null output label.]

We present the model, describe two training procedures and sketch a proof of convergence. We also give experimental results on synthetic data showing that CRFs solve the classical version of the label bias problem, and, more significantly, that CRFs perform better than HMMs and MEMMs when the true data distribution has higher-order dependencies than the model, as is often the case in practice. Finally, we confirm these results as well as the claimed advantages of conditional models by evaluating HMMs, MEMMs and CRFs with identical state structure on a part-of-speech tagging task.

2. The Label Bias Problem

Classical probabilistic automata (Paz, 1971), discriminative Markov models (Bottou, 1991), maximum entropy taggers (Ratnaparkhi, 1996), and MEMMs, as well as non-probabilistic sequence tagging and segmentation models with independently trained next-state classifiers (Punyakanok & Roth, 2001) are all potential victims of the label bias problem.

For example, Figure 1 represents a simple finite-state model designed to distinguish between the two words rib and rob. Suppose that the observation sequence is r i b. In the first time step, r matches both transitions from the start state, so the probability mass gets distributed roughly equally among those two transitions. Next we observe i. Both states 1 and 4 have only one outgoing transition. State 1 has seen this observation often in training; state 4 has almost never seen this observation; but like state 1, state 4 has no choice but to pass all its mass to its single outgoing transition, since it is not generating the observation, only conditioning on it. Thus, states with a single outgoing transition effectively ignore their observations. More generally, states with low-entropy next-state distributions will take little notice of observations. Returning to the example, the top path and the bottom path will be about equally likely, independently of the observation sequence. If one of the two words is slightly more common in the training set, the transitions out of the start state will slightly prefer its corresponding transition, and that word's state sequence will always win. This behavior is demonstrated experimentally in Section 5.
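The arithmetic of this example can be made explicit in a few lines of Python; the scores below are hypothetical, chosen only to show how per-state normalization discards the evidence carried by the observation i.

```python
# Toy numeric illustration of label bias (hypothetical scores, not data
# from the paper). With per-state normalization, states 1 and 4 each have
# a single successor, so they must pass on all incoming mass no matter
# which symbol is observed.

# P(branch | start, 'r'): 'r' matches both branches equally.
p_start = {'rib': 0.5, 'rob': 0.5}

# P(next | state, obs): each middle state has ONE outgoing transition, so
# per-state normalization forces probability 1.0 for every observation,
# even though state 4 almost never saw 'i' in training.
p_state1 = lambda obs: 1.0
p_state4 = lambda obs: 1.0

# Path scores for the observation sequence r-i-b:
score_rib = p_start['rib'] * p_state1('i') * 1.0
score_rob = p_start['rob'] * p_state4('i') * 1.0
print(score_rib, score_rob)   # 0.5 0.5 -- the observation 'i' changed nothing
```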
Léon Bottou (1991) discussed two solutions for the label bias problem. One is to change the state-transition structure of the model. In the above example we could collapse states 1 and 4, and delay the branching until we get a discriminating observation. This operation is a special case of determinization (Mohri, 1997), but determinization of weighted finite-state machines is not always possible, and even when possible, it may lead to combinatorial explosion. The other solution mentioned is to start with a fully-connected model and let the training procedure figure out a good structure. But that would preclude the use of prior structural knowledge that has proven so valuable in information extraction tasks (Freitag & McCallum, 2000).

Proper solutions require models that account for whole state sequences at once by letting some transitions "vote" more strongly than others depending on the corresponding observations. This implies that score mass will not be conserved, but instead individual transitions can "amplify" or "dampen" the mass they receive. In the above example, the transitions from the start state would have a very weak effect on path score, while the transitions from states 1 and 4 would have much stronger effects, amplifying or damping depending on the actual observation, and a proportionally higher contribution to the selection of the Viterbi path.³

In the related work section we discuss other heuristic model classes that account for state sequences globally rather than locally. To the best of our knowledge, CRFs are the only model class that does this in a purely probabilistic setting, with guaranteed global maximum likelihood convergence.

³Weighted determinization and minimization techniques shift transition weights while preserving overall path weight (Mohri, 2000); their connection to this discussion deserves further study.

3. Conditional Random Fields

In what follows, $X$ is a random variable over data sequences to be labeled, and $Y$ is a random variable over corresponding label sequences. All components $Y_i$ of $Y$ are assumed to range over a finite label alphabet $\mathcal{Y}$. For example, $X$ might range over natural language sentences and $Y$ range over part-of-speech taggings of those sentences, with $\mathcal{Y}$ the set of possible part-of-speech tags. The random variables $X$ and $Y$ are jointly distributed, but in a discriminative framework we construct a conditional model $p(Y|X)$ from paired observation and label sequences, and do not explicitly model the marginal $p(X)$.

Definition: Let $G = (V, E)$ be a graph such that $Y = (Y_v)_{v \in V}$, so that $Y$ is indexed by the vertices of $G$. Then $(X, Y)$ is a conditional random field in case, when conditioned on $X$, the random variables $Y_v$ obey the Markov property with respect to the graph: $p(Y_v \,|\, X, Y_w, w \neq v) = p(Y_v \,|\, X, Y_w, w \sim v)$, where $w \sim v$ means that $w$ and $v$ are neighbors in $G$.

Thus, a CRF is a random field globally conditioned on the observation $X$. Throughout the paper we tacitly assume that the graph $G$ is fixed. In the simplest and most important example for modeling sequences, $G$ is a simple chain or line: $G = (V = \{1, 2, \ldots, m\},\; E = \{(i, i+1)\})$.

$X$ may also have a natural graph structure; yet in general it is not necessary to assume that $X$ and $Y$ have the same graphical structure, or even that $X$ has any graphical structure at all. However, in this paper we will be most concerned with sequences $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$.

If the graph $G$ of $Y$ is a tree (of which a chain is the simplest example), its cliques are the edges and vertices. Therefore, by the fundamental theorem of random fields (Hammersley & Clifford, 1971), the joint distribution over the label sequence $Y$ given $X$ has the form

p_\theta(y \,|\, x) \propto \exp\Big( \sum_{e \in E,\, k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\, k} \mu_k g_k(v, y|_v, x) \Big) ,   (1)

where $x$ is a data sequence, $y$ a label sequence, and $y|_S$ is the set of components of $y$ associated with the vertices in subgraph $S$. We assume that the features $f_k$ and $g_k$ are given and fixed. For example, a Boolean vertex feature $g_k$ might be true if the word $X_i$ is upper case and the tag $Y_i$ is "proper noun."

The parameter estimation problem is to determine the parameters $\theta = (\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots)$ from training data $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^N$ with empirical distribution $\tilde p(x, y)$. In Section 4 we describe an iterative scaling algorithm that maximizes the log-likelihood objective function

\mathcal{O}(\theta) = \sum_{i=1}^N \log p_\theta(y^{(i)} \,|\, x^{(i)}) .

As a particular case, we can construct an HMM-like CRF by defining one feature for each state pair $(y', y)$, and one feature for each state-observation pair $(y, x)$:

f_{y',y}(\langle u,v \rangle, y|_{\langle u,v \rangle}, x) = \delta(y_u, y')\,\delta(y_v, y) , \qquad g_{y,x}(v, y|_v, x) = \delta(y_v, y)\,\delta(x_v, x) .
The corresponding parameters $\lambda_{y',y}$ and $\mu_{y,x}$ play a similar role to the (logarithms of the) usual HMM parameters $p(y'|y)$ and $p(x|y)$. Boltzmann chain models (Saul & Jordan, 1996; MacKay, 1996) have a similar form but use a single normalization constant to yield a joint distribution, whereas CRFs use the observation-dependent normalization $Z(x)$ for conditional distributions.

Although it encompasses HMM-like models, the class of conditional random fields is much more expressive, because it allows arbitrary dependencies on the observation sequence.

[Figure 2. Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequences. An open circle indicates that the variable is not generated by the model.]

In addition, the features do not need to specify completely a state or observation, so one might expect that the model can be estimated from less training data. Another attractive property is the convexity of the loss function; indeed, CRFs share all of the convexity properties of general maximum entropy models.

For the remainder of the paper we assume that the dependencies of $Y$, conditioned on $X$, form a chain. To simplify some expressions, we add special start and stop states $Y_0 = \text{start}$ and $Y_{n+1} = \text{stop}$. Thus, we will be using the graphical structure shown in Figure 2.

For a chain structure, the conditional probability of a label sequence can be expressed concisely in matrix form, which will be useful in describing the parameter estimation and inference algorithms in Section 4. Suppose that $p_\theta(Y|X)$ is a CRF given by (1). For each position $i$ in the observation sequence $x$, we define a $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix random variable $M_i(x) = [M_i(y', y \,|\, x)]$ by

M_i(y', y \,|\, x) = \exp\Big( \sum_k \lambda_k f_k(e_i, (y', y), x) + \sum_k \mu_k g_k(v_i, y, x) \Big) ,

where $e_i$ is the edge with labels $(Y_{i-1}, Y_i)$ and $v_i$ is the vertex with label $Y_i$. In contrast to generative models, conditional models like CRFs do not need to enumerate over all possible observation sequences, and therefore these matrices can be computed directly as needed from a given training or test observation sequence $x$ and the parameter vector $\theta$. Then the normalization (partition function) $Z(x)$ is the $(\text{start}, \text{stop})$ entry of the product of these matrices:

Z(x) = \big( M_1(x)\, M_2(x) \cdots M_{n+1}(x) \big)_{\text{start},\, \text{stop}} .

Using this notation, the conditional probability of a label sequence $y$ is written as

p_\theta(y \,|\, x) = \frac{\prod_{i=1}^{n+1} M_i(y_{i-1}, y_i \,|\, x)}{Z(x)} ,

where $y_0 = \text{start}$ and $y_{n+1} = \text{stop}$.
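A minimal numpy sketch of this matrix form, with random placeholder scores standing in for trained feature weights:

```python
import numpy as np

# Matrix form of a chain CRF: per-position matrices M_i hold
# exp(feature scores). Labels are augmented with 'start' (index 0) and
# 'stop' (index L-1); the scores here are random placeholders.

rng = np.random.default_rng(42)
L, n = 5, 4                              # label alphabet size, sequence length
M = [np.exp(rng.normal(size=(L, L))) for _ in range(n + 1)]   # M_1 .. M_{n+1}

# Z(x) = (M_1 M_2 ... M_{n+1})[start, stop]
prod = M[0]
for Mi in M[1:]:
    prod = prod @ Mi
Z = prod[0, L - 1]

# p(y|x) = prod_i M_i(y_{i-1}, y_i | x) / Z(x), with y_0=start, y_{n+1}=stop
y = [1, 2, 3, 1]                         # one label path over interior labels
path = [0] + y + [L - 1]
score = 1.0
for i in range(n + 1):
    score *= M[i][path[i], path[i + 1]]
print(score / Z)                         # conditional probability of this path
```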
4. Parameter Estimation for CRFs

We now describe two iterative scaling algorithms to find the parameter vector $\theta$ that maximizes the log-likelihood of the training data. Both algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997); the proof technique based on auxiliary functions can be extended to show convergence of the algorithms for CRFs.

Iterative scaling algorithms update the weights as $\lambda_k \leftarrow \lambda_k + \delta\lambda_k$ and $\mu_k \leftarrow \mu_k + \delta\mu_k$ for appropriately chosen $\delta\lambda_k$ and $\delta\mu_k$. In particular, the IIS update $\delta\lambda_k$ for an edge feature $f_k$ is the solution of

\tilde E[f_k] \overset{\text{def}}{=} \sum_{x,y} \tilde p(x, y) \sum_i f_k(e_i, y|_{e_i}, x) = \sum_{x,y} \tilde p(x)\, p_\theta(y \,|\, x) \sum_i f_k(e_i, y|_{e_i}, x)\, e^{\delta\lambda_k T(x, y)} ,

where $T(x, y)$ is the total feature count

T(x, y) \overset{\text{def}}{=} \sum_{i,k} f_k(e_i, y|_{e_i}, x) + \sum_{i,k} g_k(v_i, y|_{v_i}, x) .

The equations for the vertex feature updates $\delta\mu_k$ have similar form.

However, efficiently computing the exponential sums on the right-hand sides of these equations is problematic, because $T(x, y)$ is a global property of $(x, y)$, and dynamic programming will sum over sequences with potentially varying $T$. To deal with this, the first algorithm, Algorithm S, uses a "slack feature." The second, Algorithm T, keeps track of partial totals.

For Algorithm S, we define the slack feature by

s(x, y) \overset{\text{def}}{=} S - \sum_i \sum_k f_k(e_i, y|_{e_i}, x) - \sum_i \sum_k g_k(v_i, y|_{v_i}, x) ,

where $S$ is a constant chosen so that $s(x^{(i)}, y) \ge 0$ for all $y$ and all observation vectors $x^{(i)}$ in the training set, thus making $T(x, y) = S$. Feature $s$ is "global," that is, it does not correspond to any particular edge or vertex.

For each index $i$ we now define the forward vectors $\alpha_i(x)$ with base case

\alpha_0(y \,|\, x) = 1 \text{ if } y = \text{start}, \; 0 \text{ otherwise}

and recurrence $\alpha_i(x) = \alpha_{i-1}(x)\, M_i(x)$. Similarly, the backward vectors $\beta_i(x)$ are defined by

\beta_{n+1}(y \,|\, x) = 1 \text{ if } y = \text{stop}, \; 0 \text{ otherwise}

and $\beta_i(x)^\top = M_{i+1}(x)\, \beta_{i+1}(x)$. With these definitions, the update equations are

\delta\lambda_k = \frac{1}{S} \log \frac{\tilde E f_k}{E f_k} , \qquad \delta\mu_k = \frac{1}{S} \log \frac{\tilde E g_k}{E g_k} ,

where

E f_k = \sum_x \tilde p(x) \sum_{i=1}^{n+1} \sum_{y', y} f_k(e_i, (y', y), x)\, \frac{\alpha_{i-1}(y' \,|\, x)\, M_i(y', y \,|\, x)\, \beta_i(y \,|\, x)}{Z(x)} ,

and analogously for $E g_k$. The factors involving the forward and backward vectors in the above equations have the same meaning as for standard hidden Markov models. For example,

p_\theta(Y_i = y \,|\, x) = \frac{\alpha_i(y \,|\, x)\, \beta_i(y \,|\, x)}{Z(x)}

is the marginal probability of label $Y_i = y$ given that the observation sequence is $x$. This algorithm is closely related to the algorithm of Darroch and Ratcliff (1972), and to MART algorithms used in image reconstruction.

The constant $S$ in Algorithm S can be quite large, since in practice it is proportional to the length of the longest training observation sequence. As a result, the algorithm may converge slowly, taking very small steps toward the maximum in each iteration. If the length of the observations $x^{(i)}$ and the number of active features varies greatly, a faster-converging algorithm can be obtained by keeping track of feature totals for each observation sequence separately.

Let $T(x) \overset{\text{def}}{=} \max_y T(x, y)$. Algorithm T accumulates feature expectations into counters indexed by $T(x)$. More specifically, we use the forward-backward recurrences just introduced to compute the expectations $a_{k,t}$ of feature $f_k$ and $b_{k,t}$ of feature $g_k$ given that $T(x) = t$. Then our parameter updates are $\delta\lambda_k = \log \beta_k$ and $\delta\mu_k = \log \gamma_k$, where $\beta_k$ and $\gamma_k$ are the unique positive roots of the following polynomial equations

\sum_{t=0}^{T_{\max}} a_{k,t}\, \beta_k^t = \tilde E f_k , \qquad \sum_{t=0}^{T_{\max}} b_{k,t}\, \gamma_k^t = \tilde E g_k ,   (2)

which can be easily computed by Newton's method.

A single iteration of Algorithm S and Algorithm T has roughly the same time and space complexity as the well known Baum-Welch algorithm for HMMs. To prove convergence of our algorithms, we can derive an auxiliary function to bound the change in likelihood from below; this method is developed in detail by Della Pietra et al. (1997). The full proof is somewhat detailed; however, here we give an idea of how to derive the auxiliary function. To simplify notation, we assume only edge features $f_k$ with parameters $\lambda_k$. Given two parameter settings $\theta$ and $\theta' = \theta + \delta\theta$, we bound from below the change in the objective function with an auxiliary function $A(\theta', \theta)$, where the inequalities in its derivation follow from the convexity of $\log$ and $\exp$. Differentiating the auxiliary function with respect to $\delta\lambda_k$ and setting the result to zero yields equation (2).
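The forward-backward recurrences and the per-position marginals can be sketched in a few lines of numpy; as before, the matrices are random placeholders rather than a trained CRF.

```python
import numpy as np

# Forward-backward recurrences used in Algorithms S and T:
#   alpha_0 = e_start,   alpha_i = alpha_{i-1} M_i
#   beta_{n+1} = e_stop, beta_i  = M_{i+1} beta_{i+1}
#   p(Y_i = y | x) = alpha_i(y) * beta_i(y) / Z(x)

def forward_backward(M, start, stop):
    L = M[0].shape[0]
    N = len(M)                               # N = n+1 matrices M_1 .. M_{n+1}
    alpha = np.zeros((N + 1, L)); alpha[0, start] = 1.0
    beta = np.zeros((N + 1, L));  beta[N, stop] = 1.0
    for i in range(1, N + 1):
        alpha[i] = alpha[i - 1] @ M[i - 1]
    for i in range(N - 1, -1, -1):
        beta[i] = M[i] @ beta[i + 1]
    Z = alpha[N, stop]                       # equals beta[0, start]
    return alpha, beta, Z

rng = np.random.default_rng(7)
L, n = 5, 4                                  # placeholder sizes, random scores
M = [np.exp(rng.normal(size=(L, L))) for _ in range(n + 1)]
alpha, beta, Z = forward_backward(M, start=0, stop=L - 1)
marginal_2 = alpha[2] * beta[2] / Z          # p(Y_2 = y | x) for every label y
print(Z, marginal_2.sum())                   # the marginals at a position sum to 1
```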
5. Experiments

We first discuss two sets of experiments with synthetic data that highlight the differences between CRFs and MEMMs. The first experiments are a direct verification of the label bias problem discussed in Section 2. In the second set of experiments, we generate synthetic data using randomly chosen hidden Markov models, each of which is a mixture of a first-order and a second-order model. Competing models are then trained and compared on test data. As the data becomes more second-order, the test error rates of the trained models increase. This experiment corresponds to the common modeling practice of approximating complex local and long-range dependencies, as occur in natural data, by small-order Markov models.

[Figure 3. Plots of error rates for HMMs, CRFs, and MEMMs on randomly generated synthetic data sets, as described in Section 5.2. As the data becomes "more second order," the error rates of the test models increase. As shown in the left plot, the CRF typically significantly outperforms the MEMM. The center plot shows that the HMM outperforms the MEMM. In the right plot, each open square represents a data set with $\alpha < 1/2$, and a solid circle indicates a data set with $\alpha \ge 1/2$. The plot shows that when the data is mostly second order ($\alpha \ge 1/2$), the discriminatively trained CRF typically outperforms the HMM. These experiments are not designed to demonstrate the advantages of the additional representational power of CRFs and MEMMs relative to HMMs.]

Our results clearly indicate that even when the models are parameterized in exactly the same way, CRFs are more robust to inaccurate modeling assumptions than MEMMs or HMMs, and resolve the label bias problem, which affects the performance of MEMMs. To avoid confusion of different effects, the MEMMs and CRFs in these experiments do not use overlapping features of the observations. Finally, in a set of POS tagging experiments, we confirm the advantage of CRFs over MEMMs. We also show that the addition of overlapping features to CRFs and MEMMs allows them to perform much better than HMMs, as already shown for MEMMs by McCallum et al. (2000).

5.1 Modeling label bias

We generate data from a simple HMM which encodes a noisy version of the finite-state network in Figure 1. Each state emits its designated symbol with probability 29/32 and any of the other symbols with probability 1/32. We train both an MEMM and a CRF with the same topologies on the data generated by the HMM. The observation features are simply the identity of the observation symbols. In a typical run using 2,000 training and 500 test samples, trained to convergence of the iterative scaling algorithm, the CRF error is 4.6% while the MEMM error is 42%, showing that the MEMM fails to discriminate between the two branches.

5.2 Modeling mixed-order sources

For these results, we use five labels, a-e ($|\mathcal{Y}| = 5$), and 26 observation values, A-Z ($|\mathcal{X}| = 26$); however, the results were qualitatively the same over a range of sizes for $|\mathcal{Y}|$ and $|\mathcal{X}|$. We generate data from a mixed-order HMM with state transition probabilities given by

p_\alpha(y_i \,|\, y_{i-1}, y_{i-2}) = \alpha\, p_2(y_i \,|\, y_{i-1}, y_{i-2}) + (1 - \alpha)\, p_1(y_i \,|\, y_{i-1})

and, similarly, emission probabilities given by

p_\alpha(x_i \,|\, y_i, x_{i-1}) = \alpha\, p_2(x_i \,|\, y_i, x_{i-1}) + (1 - \alpha)\, p_1(x_i \,|\, y_i) .

Thus, for $\alpha = 0$ we have a standard first-order HMM.
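A short Python sketch of this mixed-order generator follows; the dense random tables are placeholders, whereas the paper additionally imposes sparsity constraints described next.

```python
import numpy as np

# Sampling from the Section 5.2 mixed-order source: labels and observations
# are drawn from a mixture of first- and second-order conditional tables.

rng = np.random.default_rng(1)
Y, X, alpha, T = 5, 26, 0.5, 25      # labels, observations, mixing weight, length

def random_cpt(shape):
    """Random conditional probability table, normalized over the last axis."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p1_t = random_cpt((Y, Y))            # p1(y_i | y_{i-1})
p2_t = random_cpt((Y, Y, Y))         # p2(y_i | y_{i-2}, y_{i-1})
p1_e = random_cpt((Y, X))            # p1(x_i | y_i)
p2_e = random_cpt((Y, X, X))         # p2(x_i | y_i, x_{i-1})

y = [int(rng.integers(Y)), int(rng.integers(Y))]             # seed two labels
x = [int(rng.choice(X, p=p1_e[y[0]])), int(rng.choice(X, p=p1_e[y[1]]))]
for i in range(2, T):
    pt = alpha * p2_t[y[-2], y[-1]] + (1 - alpha) * p1_t[y[-1]]
    y.append(int(rng.choice(Y, p=pt)))
    pe = alpha * p2_e[y[-1], x[-1]] + (1 - alpha) * p1_e[y[-1]]
    x.append(int(rng.choice(X, p=pe)))
print(y)
print(x)
```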
In order to limit the size of the Bayes error rate for the resulting models, the conditional probability tables $p_2$ are constrained to be sparse. In particular, $p_2(y_i \,|\, y_{i-1}, y_{i-2})$ can have at most two nonzero entries for each conditioning pair, and $p_2(x_i \,|\, y_i, x_{i-1})$ can have at most three nonzero entries for each conditioning pair. For each randomly generated model, a sample of 1,000 sequences of length 25 is generated for training and testing.

On each randomly generated training set, a CRF is trained using Algorithm S. (Note that since the length of the sequences and the number of active features is constant, Algorithms S and T are identical.) The algorithm is fairly slow to converge, typically taking approximately 500 iterations for the model to stabilize. On the 500 MHz Pentium PC used in our experiments, each iteration takes approximately 0.2 seconds. On the same data an MEMM is trained using iterative scaling, which does not require forward-backward calculations, and is thus more efficient. The MEMM training converges more quickly, stabilizing after approximately 100 iterations. For each model, the Viterbi algorithm is used to label a test set; the experimental results do not significantly change when using forward-backward decoding to minimize the per-symbol error rate.

The results of several runs are presented in Figure 3. Each plot compares two classes of models, with each point indicating the error rate for a single test set. As $\alpha$ increases, the error rates generally increase, as the first-order models fail to fit the second-order data. The figure compares models sharing one parameterization; results for alternative parameterizations are qualitatively the same. As shown in the first graph, the CRF generally outperforms the MEMM, often by a wide margin of 10%-20% relative error. (The points at very small error rates, where the MEMM does better than the CRF, are suspected to be the result of an insufficient number of training iterations for the CRF.)

model    error    oov error
HMM      5.69%    45.99%
MEMM     6.37%    54.61%
CRF      5.55%    48.05%
MEMM+    4.81%    26.99%
CRF+     4.27%    23.76%
+ Using spelling features

Figure 4. Per-word error rates for POS tagging on the Penn treebank, using first-order models trained on 50% of the 1.1 million word corpus. The out-of-vocabulary (oov) rate is 5.45%.

5.3 POS tagging experiments

To confirm our synthetic data results, we also compared HMMs, MEMMs and CRFs on Penn treebank POS tagging, where each word in a given input sentence must be labeled with one of 45 syntactic tags.

We carried out two sets of experiments with this natural language data. First, we trained first-order HMM, MEMM, and CRF models as in the synthetic data experiments, introducing parameters $\mu_{y,x}$ for each tag-word pair and $\lambda_{y',y}$ for each tag-tag pair in the training set. The results are consistent with what is observed on synthetic data: the HMM outperforms the MEMM, as a consequence of the label bias problem, while the CRF outperforms the HMM. The error rates for training runs using a 50%-50% train-test split are shown in Figure 4; the results are qualitatively similar for other splits of the data. The error rates on out-of-vocabulary (oov) words, which are not observed in the training set, are reported separately.

In the second set of experiments, we take advantage of the power of conditional models by adding a small set of orthographic features: whether a spelling begins with a number or upper case letter, whether it contains a hyphen, and whether it ends in one of the following suffixes: -ing, -ogy, -ed, -s, -ly, -ion, -tion, -ity, -ies. Here we find, as expected, that both the MEMM and the CRF benefit significantly from the use of these features, with the overall error rate reduced by around 25%, and the out-of-vocabulary error rate reduced by around 50%.

One usually starts training from the all-zero parameter vector, corresponding to the uniform distribution. However, for these datasets, CRF training with that
initialization is much slower than MEMM training. Fortunately, we can use the optimal MEMM parameter vector as a starting point for training the corresponding CRF. In Figure 4, MEMM+ was trained to convergence in around 100 iterations. Its parameters were then used to initialize the training of CRF+, which converged in 1,000 iterations. In contrast, training of the same CRF from the uniform distribution had not converged even after 2,000 iterations.

6. Further Aspects of CRFs

Many further aspects of CRFs are attractive for applications and deserve further study. In this section we briefly mention just two.

Conditional random fields can be trained using the exponential loss objective function used by the AdaBoost algorithm (Freund & Schapire, 1997). Typically, boosting is applied to classification problems with a small, fixed number of classes; applications of boosting to sequence labeling have treated each label as a separate classification problem (Abney et al., 1999). However, it is possible to apply the parallel update algorithm of Collins et al. (2000) to optimize the per-sequence exponential loss. This requires a forward-backward algorithm to compute efficiently certain feature expectations, along the lines of Algorithm T, except that each feature requires a separate set of forward and backward accumulators.

Another attractive aspect of CRFs is that one can implement efficient feature selection and feature induction algorithms for them. That is, rather than specifying in advance which features of $(X, Y)$ to use, we could start from feature-generating rules and evaluate the benefit of generated features automatically on data. In particular, the feature induction algorithms presented in Della Pietra et al. (1997) can be adapted to fit the dynamic programming techniques of conditional random fields.

7. Related Work and Conclusions

As far as we know, the present work is the first to combine the benefits of conditional models with the global normalization of random field models. Other applications of exponential models in sequence modeling have either attempted to build generative models (Rosenfeld, 1997), which involve a hard normalization problem, or adopted local conditional models (Berger et al., 1996; Ratnaparkhi, 1996; McCallum et al., 2000) that may suffer from label bias.

Non-probabilistic local decision models have also been widely used in segmentation and tagging (Brill, 1995; Roth, 1998; Abney et al., 1999). Because of the computational complexity of global training, these models are only trained to minimize the error of individual label decisions assuming that neighboring labels are correctly set. Label bias would be expected to be a problem here too.

An alternative approach to discriminative modeling of sequence labeling is to use a permissive generative model, which can only model local dependencies, to produce a list of candidates, and then use a more global discriminative model to rerank those candidates. This approach is standard in large-vocabulary speech recognition (Schwartz & Austin, 1993), and has also been proposed for parsing (Collins, 2000). However, these methods fail when the correct output is pruned away in the first pass.
Algorand principles

Algorand is an algorithm built on decentralized blockchain technology. Its core idea is to use interoperating cryptographic primitives to achieve complete consistency across the entire ecosystem of a decentralized blockchain.

Algorand employs a mechanism known as a randomized interaction algorithm, which divides all transactions into two categories: proposals and votes.

The algorithm arranges all transactions in sequence. In each round, one participant is randomly selected as the leader, and that leader puts a proposal to the rest of the network.

The proposal consists of two parts: the content of a transaction, and a random puzzle.

The other participants must then answer the puzzle posed by the leader; in essence this is a consensus process, solved by everyone together.
Algorand is a probability-based distributed consensus protocol designed to solve the double-spending and forking problems in blockchain networks.

Algorand adopts a Pure Proof of Stake (PPoS) model and uses an algorithm called hierarchical random selection (Hierarchical Byzantine Agreement) to ensure a high degree of security and fast transaction confirmation.

Its core principles come down to four key points:

1. Verifiable random functions (VRF): Algorand introduces verifiable random functions to guarantee that the process of generating random numbers is public, verifiable, and resistant to manipulation. The use of such random functions makes the selection of block proposers and voters unpredictable, which prevents potential attacks (see the sketch after this list).

2. Randomized proof of stake (randPoS): Algorand elects stakeholders through randomized proof of stake, ensuring that the stakeholders in the network are chosen at random, with no obvious opportunity for manipulation. This proof mechanism is similar to traditional PoS (Proof of Stake), but improves on it in randomness and fairness.

3. Hierarchical random selection (HBA): Algorand uses an algorithm called hierarchical random selection to carry out the network's consensus process. This algorithm ensures that consensus can still be reached even when the network is partitioned or nodes fail, safeguarding the network's security and availability.

4. Balanced consensus: Algorand's consensus algorithm aims to deliver high throughput and low latency for the blockchain network.
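As a toy illustration of the VRF-based selection idea from point 1, the Python sketch below selects committee members by comparing a hash-derived pseudo-random number against a stake-proportional threshold. Plain SHA-256 stands in for the VRF, and the names, threshold rule and committee size are illustrative assumptions; real Algorand uses an elliptic-curve VRF keyed by each user's secret key, so the draw is both unpredictable and publicly verifiable, together with a more refined binomial selection rule.

```python
import hashlib

# Toy VRF-style sortition: a user is "selected" when a pseudo-random
# number derived from public inputs falls below a threshold proportional
# to their stake. NOT a real VRF -- only an illustration of the idea.

def pseudo_vrf(seed: bytes, role: bytes, user: bytes) -> float:
    """Map inputs to a deterministic pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(seed + role + user).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def is_selected(seed, role, user, stake, total_stake, committee_size):
    # Expected number of selections across all stake equals committee_size.
    threshold = committee_size * stake / total_stake
    return pseudo_vrf(seed, role, user) < threshold

stakes = {b"alice": 300, b"bob": 100, b"carol": 600}   # hypothetical stakes
total = sum(stakes.values())
seed, role = b"round-42", b"proposer"
for user, stake in stakes.items():
    print(user, is_selected(seed, role, user, stake, total, committee_size=1))
```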