A metric for covariance matrices
Package 'sEparaTe' (August 18, 2023)

Title: Maximum Likelihood Estimation and Likelihood Ratio Test Functions for Separable Variance-Covariance Structures
Version: 0.3.2
Maintainer: Timothy Schwinghamer <***************************.CA>
Description: Maximum likelihood estimation of the parameters of matrix and 3rd-order tensor normal distributions with unstructured factor variance-covariance matrices (two procedures), and unbiased modified likelihood ratio testing of simple and double separability for variance-covariance structures (two procedures). References: Dutilleul P. (1999) <doi:10.1080/00949659908811970>, Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.cam.2012.09.017>, and Manceur AM, Dutilleul P. (2013) <doi:10.1016/j.spl.2012.10.020>.
Depends: R (>= 4.3.0)
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.0
NeedsCompilation: no
Author: Ameur Manceur [aut], Timothy Schwinghamer [aut, cre], Pierre Dutilleul [aut, cph]
Repository: CRAN
Date/Publication: 2023-08-18 07:50:02 UTC

R topics documented: data2d, data3d, lrt2d_svc, lrt3d_svc, mle2d_svc, mle3d_svc, sEparaTe

data2d: Two dimensional data set

Description
An i.i.d. random sample of size 7 from a 2 x 3 matrix normal distribution, for a small numerical example of the use of the functions mle2d_svc and lrt2d_svc from the sEparaTe package.

Usage
data2d

Format
A data frame (excluding the headings) with 42 lines of data and 4 variables:
K: an integer ranging from 1 to 7, indexing the members of the i.i.d. random sample from the 2 x 3 matrix normal distribution
Id1: an integer ranging from 1 to 2, indexing the rows of the matrix normal distribution
Id2: an integer ranging from 1 to 3, indexing the columns of the matrix normal distribution
value2d: the sample data for the observed variable

data3d: Three dimensional data set

Description
An i.i.d. random sample of size 13 from a 2 x 3 x 2 tensor normal distribution, for a small numerical example of the use of the functions mle3d_svc and lrt3d_svc from the sEparaTe package.

Usage
data3d

Format
A data frame (excluding the headings) with 156 lines of data and 5 variables:
K: an integer ranging from 1 to 13, indexing the members of the i.i.d. random sample from the 2 x 3 x 2 tensor normal distribution
Id3: an integer ranging from 1 to 2, indexing the rows of the 3rd-order tensor normal distribution
Id4: an integer ranging from 1 to 3, indexing the columns of the 3rd-order tensor normal distribution
Id5: an integer ranging from 1 to 2, indexing the edges of the 3rd-order tensor normal distribution
value3d: the sample data for the observed variable
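For orientation, the two separability structures that these data sets illustrate can be written compactly. The identities below are standard for matrix and tensor normal distributions and are supplied as background; they are not taken from the package manual. For a matrix normal observation X with row factor variance-covariance matrix U1 and column factor U2, and a 3rd-order tensor normal observation with an additional edge factor U3,

$$\operatorname{Cov}[\operatorname{vec}(X)] = U_2 \otimes U_1 \quad\text{(simple separability)}, \qquad \operatorname{Cov}[\operatorname{vec}(\mathcal{X})] = U_3 \otimes U_2 \otimes U_1 \quad\text{(double separability)}.$$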
lrt2d_svc: Unbiased modified likelihood ratio test for simple separability of a variance-covariance matrix

Description
A likelihood ratio test (LRT) for simple separability of a variance-covariance matrix, modified to be unbiased in finite samples. The modification is a penalty-based homothetic transformation of the LRT statistic. The penalty value is optimized for a given mean model, which is left unstructured here. In the required function, the Id1 and Id2 variables correspond to the row and column subscripts, respectively; "value2d" refers to the observed variable.

Usage
lrt2d_svc(value2d, Id1, Id2, subject, data_2d, eps, maxiter, startmat, sign.level, n.simul)

Arguments
value2d: from the formula value2d ~ Id1 + Id2
Id1: from the formula value2d ~ Id1 + Id2
Id2: from the formula value2d ~ Id1 + Id2
subject: the replicate, also called the subject or individual, the first column in the matrix (2d) data file
data_2d: the name of the matrix data
eps: the threshold in the stopping criterion for the iterative mle algorithm (estimation)
maxiter: the maximum number of iterations for the mle algorithm (estimation)
startmat: the value of the second factor variance-covariance matrix used for initialization, i.e., to start the mle algorithm (estimation) and obtain the initial estimate of the first factor variance-covariance matrix
sign.level: the significance level, or rejection rate, in the testing of the null hypothesis of simple separability for a variance-covariance structure, when the unbiased modified LRT is used, i.e., the critical value in the chi-square test is derived by simulations from the sampling distribution of the LRT statistic
n.simul: the number of simulations used to build the sampling distribution of the LRT statistic under the null hypothesis, using the same characteristics as the i.i.d. random sample from a matrix normal distribution

Output
"Convergence", TRUE or FALSE
"chi.df", the theoretical number of degrees of freedom of the asymptotic chi-square distribution that would apply to the unmodified LRT statistic for simple separability of a variance-covariance structure
"Lambda", the observed value of the unmodified LRT statistic
"critical.value", the critical value at the specified significance level for the chi-square distribution with "chi.df" degrees of freedom
"Decision.lambda", indicates whether or not the null hypothesis of separability was rejected, based on the theoretical (biased unmodified) LRT statistic
"Simulation.critical.value", the critical value at the specified significance level that is derived from the sampling distribution of the unbiased modified LRT statistic
"Decision.lambda.simulation", the decision (acceptance/rejection) regarding the null hypothesis of simple separability, made using the unbiased modified LRT
"Penalty", the optimized penalty value used in the homothetic transformation between the biased unmodified and unbiased modified LRT statistics
"U1hat", the estimated variance-covariance matrix for the rows
"Standardized_U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
"U2hat", the estimated variance-covariance matrix for the columns
"Standardized_U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
"Shat", the sample variance-covariance matrix computed from the vectorized data matrices

References
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

Examples
output <- lrt2d_svc(data2d$value2d, data2d$Id1, data2d$Id2, data2d$K,
                    data_2d = data2d, n.simul = 100)
output
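A quick illustration of why the two standardizations above go in opposite directions (dividing U1hat, multiplying U2hat): the factors of a Kronecker product are identified only up to a reciprocal rescaling. This is my own sketch, not package code; the matrices are arbitrary.

  U1hat <- matrix(c(2, 1, 1, 3), nrow = 2)
  U2hat <- matrix(c(1.0, 0.2, 0.1,
                    0.2, 2.0, 0.3,
                    0.1, 0.3, 1.5), nrow = 3)
  # Rescale the two factors in opposite directions by U1hat[1, 1]
  Standardized_U1hat <- U1hat / U1hat[1, 1]
  Standardized_U2hat <- U2hat * U1hat[1, 1]
  # The implied separable covariance is unchanged:
  all.equal(Standardized_U2hat %x% Standardized_U1hat, U2hat %x% U1hat)  # TRUE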
lrt3d_svc: An unbiased modified likelihood ratio test for double separability of a variance-covariance structure

Description
A likelihood ratio test (LRT) for double separability of a variance-covariance structure, modified to be unbiased in finite samples. The modification is a penalty-based homothetic transformation of the LRT statistic. The penalty value is optimized for a given mean model, which is left unstructured here. In the required function, the Id3, Id4 and Id5 variables correspond to the row, column and edge subscripts, respectively; "value3d" refers to the observed variable.

Usage
lrt3d_svc(value3d, Id3, Id4, Id5, subject, data_3d, eps, maxiter, startmatU2, startmatU3, sign.level, n.simul)

Arguments
value3d: from the formula value3d ~ Id3 + Id4 + Id5
Id3: from the formula value3d ~ Id3 + Id4 + Id5
Id4: from the formula value3d ~ Id3 + Id4 + Id5
Id5: from the formula value3d ~ Id3 + Id4 + Id5
subject: the replicate, also called individual
data_3d: the name of the tensor data
eps: the threshold in the stopping criterion for the iterative mle algorithm (estimation)
maxiter: the maximum number of iterations for the mle algorithm (estimation)
startmatU2: the value of the second factor variance-covariance matrix used for initialization
startmatU3: the value of the third factor variance-covariance matrix used for initialization, i.e., startmatU3 together with startmatU2 are used to start the mle algorithm (estimation) and obtain the initial estimate of the first factor variance-covariance matrix U1
sign.level: the significance level, or rejection rate, in the testing of the null hypothesis of double separability for a variance-covariance structure, when the unbiased modified LRT is used, i.e., the critical value in the chi-square test is derived by simulations from the sampling distribution of the LRT statistic
n.simul: the number of simulations used to build the sampling distribution of the LRT statistic under the null hypothesis, using the same characteristics as the i.i.d. random sample from a tensor normal distribution

Output
"Convergence", TRUE or FALSE
"chi.df", the theoretical number of degrees of freedom of the asymptotic chi-square distribution that would apply to the unmodified LRT statistic for double separability of a variance-covariance structure
"Lambda", the observed value of the unmodified LRT statistic
"critical.value", the critical value at the specified significance level for the chi-square distribution with "chi.df" degrees of freedom
"Decision.lambda", the decision (acceptance/rejection) regarding the null hypothesis of double separability, made using the theoretical (biased unmodified) LRT
"Simulation.critical.value", the critical value at the specified significance level that is derived from the sampling distribution of the unbiased modified LRT statistic
"Decision.lambda.simulation", the decision (acceptance/rejection) regarding the null hypothesis of double separability, made using the unbiased modified LRT
"Penalty", the optimized penalty value used in the homothetic transformation between the biased unmodified and unbiased modified LRT statistics
"U1hat", the estimated variance-covariance matrix for the rows
"Standardized_U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
"U2hat", the estimated variance-covariance matrix for the columns
"Standardized_U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
"U3hat", the estimated variance-covariance matrix for the edges
"Shat", the sample variance-covariance matrix computed from the vectorized data tensors

References
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

Examples
output <- lrt3d_svc(data3d$value3d, data3d$Id3, data3d$Id4, data3d$Id5,
                    data3d$K, data_3d = data3d, n.simul = 100)
output
mle2d_svc: Maximum likelihood estimation of the parameters of a matrix normal distribution

Description
Maximum likelihood estimation for the parameters of a matrix normal distribution X, which is characterized by a simply separable variance-covariance structure. In the general case, which is the case considered here, two unstructured factor variance-covariance matrices determine the covariability of random matrix entries, depending on the row (one factor matrix) and the column (the other factor matrix) where two X-entries are. In the required function, the Id1 and Id2 variables correspond to the row and column subscripts, respectively; "value2d" indicates the observed variable.

Usage
mle2d_svc(value2d, Id1, Id2, subject, data_2d, eps, maxiter, startmat)

Arguments
value2d: from the formula value2d ~ Id1 + Id2
Id1: from the formula value2d ~ Id1 + Id2
Id2: from the formula value2d ~ Id1 + Id2
subject: the replicate, also called individual
data_2d: the name of the matrix data
eps: the threshold in the stopping criterion for the iterative mle algorithm
maxiter: the maximum number of iterations for the iterative mle algorithm
startmat: the value of the second factor variance-covariance matrix used for initialization, i.e., to start the algorithm and obtain the initial estimate of the first factor variance-covariance matrix

Output
"Convergence", TRUE or FALSE
"Iter", the number of iterations needed for the mle algorithm to converge
"Xmeanhat", the estimated mean matrix (i.e., the sample mean)
"First", the row subscript, or the second column in the data file
"U1hat", the estimated variance-covariance matrix for the rows
"Standardized.U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
"Second", the column subscript, or the third column in the data file
"U2hat", the estimated variance-covariance matrix for the columns
"Standardized.U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
"Shat", the sample variance-covariance matrix computed from the vectorized data matrices

References
Dutilleul P. 1990. Apport en analyse spectrale d'un periodogramme modifie et modelisation des series chronologiques avec repetitions en vue de leur comparaison en frequence. D.Sc. Dissertation, Universite catholique de Louvain, Departement de mathematique.
Dutilleul P. 1999. The mle algorithm for the matrix normal distribution. Journal of Statistical Computation and Simulation 64: 105-123.

Examples
output <- mle2d_svc(data2d$value2d, data2d$Id1, data2d$Id2, data2d$K, data_2d = data2d)
output
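The functions above expect data in the long format described under data2d (columns K, Id1, Id2, value2d). As a hypothetical illustration (mine, not from the manual), an array of i.i.d. 2 x 3 matrix observations can be reshaped into that format as follows; expand.grid() varies its first factor fastest, matching R's column-major as.vector() ordering:

  set.seed(1)
  X <- array(rnorm(2 * 3 * 7), dim = c(2, 3, 7))   # 7 draws of a 2 x 3 matrix
  long <- expand.grid(Id1 = 1:2, Id2 = 1:3, K = 1:7)
  long$value2d <- as.vector(X)                     # Id1 varies fastest, then Id2, then K
  head(long)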
mle3d_svc: Maximum likelihood estimation of the parameters of a 3rd-order tensor normal distribution

Description
Maximum likelihood estimation for the parameters of a 3rd-order tensor normal distribution X, which is characterized by a doubly separable variance-covariance structure. In the general case, which is the case considered here, three unstructured factor variance-covariance matrices determine the covariability of random tensor entries, depending on the row (one factor matrix), the column (another factor matrix) and the edge (remaining factor matrix) where two X-entries are. In the required function, the Id3, Id4 and Id5 variables correspond to the row, column and edge subscripts, respectively; "value3d" indicates the observed variable.

Usage
mle3d_svc(value3d, Id3, Id4, Id5, subject, data_3d, eps, maxiter, startmatU2, startmatU3)

Arguments
value3d: from the formula value3d ~ Id3 + Id4 + Id5
Id3: from the formula value3d ~ Id3 + Id4 + Id5
Id4: from the formula value3d ~ Id3 + Id4 + Id5
Id5: from the formula value3d ~ Id3 + Id4 + Id5
subject: the replicate, also called individual
data_3d: the name of the tensor data
eps: the threshold in the stopping criterion for the iterative mle algorithm
maxiter: the maximum number of iterations for the iterative mle algorithm
startmatU2: the value of the second factor variance-covariance matrix used for initialization
startmatU3: the value of the third factor variance-covariance matrix used for initialization, i.e., startmatU3 together with startmatU2 are used to start the algorithm and obtain the initial estimate of the first factor variance-covariance matrix U1

Output
"Convergence", TRUE or FALSE
"Iter", the number of iterations needed for the mle algorithm to converge
"Xmeanhat", the estimated mean tensor (i.e., the sample mean)
"First", the row subscript, or the second column in the data file
"U1hat", the estimated variance-covariance matrix for the rows
"Standardized.U1hat", the standardized estimated variance-covariance matrix for the rows; the standardization is performed by dividing each entry of U1hat by entry (1,1) of U1hat
"Second", the column subscript, or the third column in the data file
"U2hat", the estimated variance-covariance matrix for the columns
"Standardized.U2hat", the standardized estimated variance-covariance matrix for the columns; the standardization is performed by multiplying each entry of U2hat by entry (1,1) of U1hat
"Third", the edge subscript, or the fourth column in the data file
"U3hat", the estimated variance-covariance matrix for the edges
"Shat", the sample variance-covariance matrix computed from the vectorized data tensors

Reference
Manceur AM, Dutilleul P. 2013. Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. Journal of Computational and Applied Mathematics 239: 37-49.

Examples
output <- mle3d_svc(data3d$value3d, data3d$Id3, data3d$Id4, data3d$Id5, data3d$K, data_3d = data3d)
output

sEparaTe: MLE and LRT functions for separable variance-covariance structures

Description
A package for maximum likelihood estimation (MLE) of the parameters of matrix and 3rd-order tensor normal distributions with unstructured factor variance-covariance matrices (two procedures), and for unbiased modified likelihood ratio testing (LRT) of simple and double separability for variance-covariance structures (two procedures).

Functions
mle2d_svc, for maximum likelihood estimation of the parameters of a matrix normal distribution
mle3d_svc, for maximum likelihood estimation of the parameters of a 3rd-order tensor normal distribution
lrt2d_svc, for the unbiased modified likelihood ratio test of simple separability for a variance-covariance structure
lrt3d_svc, for the unbiased modified likelihood ratio test of double separability for a variance-covariance structure

Data
data2d, a two-dimensional data set
data3d, a three-dimensional data set

References
Dutilleul P. 1999. The mle algorithm for the matrix normal distribution. Journal of Statistical Computation and Simulation 64: 105-123.
Manceur AM, Dutilleul P. 2013. Maximum likelihood estimation for the tensor normal distribution: Algorithm, minimum sample size, and empirical bias and dispersion. Journal of Computational and Applied Mathematics 239: 37-49.
Manceur AM, Dutilleul P. 2013. Unbiased modified likelihood ratio tests for simple and double separability of a variance-covariance structure. Statistics and Probability Letters 83: 631-636.

Index
datasets: data2d, data3d
data2d, data3d, lrt2d_svc, lrt3d_svc, mle2d_svc, mle3d_svc, sEparaTe
Package 'bvarsv' (October 12, 2022)

Type: Package
Title: Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters
Version: 1.1
Date: 2015-10-29
Author: Fabian Krueger
Maintainer: Fabian Krueger <**************************>
Description: R/C++ implementation of the model proposed by Primiceri ("Time Varying Structural Vector Autoregressions and Monetary Policy", Review of Economic Studies, 2005), with functionality for computing posterior predictive distributions and impulse responses.
License: GPL (>= 2)
Imports: Rcpp (>= 0.11.0)
LinkingTo: Rcpp, RcppArmadillo
URL: https://sites.google.com/site/fk83research/code
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2015-11-25 14:40:22

R topics documented: bvarsv-package, bvar.sv.tvp, Example data sets, helpers, impulse.responses, sim.var1.sv.tvp

bvarsv-package: Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters

Description
R/C++ implementation of the Primiceri (2005) model, which allows for both stochastic volatility and time-varying regression parameters. The package contains functions for computing posterior predictive distributions and impulse responses from the model, based on an input data set.

Details
Package: bvarsv. Type: Package. Version: 1.0. Date: 2014-08-14. License: GPL (>= 2). URL: https://sites.google.com/site/fk83research/code

Author(s)
Fabian Krueger <**************************>, based on Matlab code by Dimitris Korobilis (see Koop and Korobilis, 2010).

References
The code incorporates the recent corrigendum by Del Negro and Primiceri (2015), which points to an error in the original MCMC algorithm of Primiceri (2005).
Del Negro, M. and Primiceri, G.E. (2015). 'Time Varying Structural Vector Autoregressions and Monetary Policy: A Corrigendum', Review of Economic Studies 82, 1342-1345.
Koop, G. and D. Korobilis (2010): 'Bayesian Multivariate Time Series Methods for Empirical Macroeconomics', Foundations and Trends in Econometrics 3, 267-358. Accompanying Matlab code available at https://sites.google.com/site/dimitriskorobilis/matlab.
Primiceri, G.E. (2005): 'Time Varying Structural Vector Autoregressions and Monetary Policy', Review of Economic Studies 72, 821-852.

Examples
## Not run:
# Load US macro data
data(usmacro)
# Estimate trivariate model using Primiceri's prior choices (default settings)
set.seed(5813)
bv <- bvar.sv.tvp(usmacro)
## End(Not run)
bvar.sv.tvp: Bayesian Analysis of a Vector Autoregressive Model with Stochastic Volatility and Time-Varying Parameters

Description
Bayesian estimation of the flexible VAR model by Primiceri (2005), which allows for both stochastic volatility and time drift in the model parameters.

Usage
bvar.sv.tvp(Y, p = 1, tau = 40, nf = 10, pdrift = TRUE, nrep = 50000,
            nburn = 5000, thinfac = 10, itprint = 10000, save.parameters = TRUE,
            k_B = 4, k_A = 4, k_sig = 1, k_Q = 0.01, k_S = 0.1, k_W = 0.01,
            pQ = NULL, pW = NULL, pS = NULL)

Arguments
Y: Matrix of data, where rows represent time and columns are different variables. Y must have at least two columns.
p: Lag length, greater or equal than 1 (the default)
tau: Length of the training sample used for determining prior parameters via least squares (LS). That is, data in Y[1:tau, ] are used for estimating prior parameters via LS; formal Bayesian analysis is then performed for data in Y[(tau+1):nrow(Y), ].
nf: Number of future time periods for which forecasts are computed (integer, 1 or greater, defaults to 10).
pdrift: Dummy, indicates whether or not to account for parameter drift when simulating forecasts (defaults to TRUE).
nrep: Number of MCMC draws excluding burn-in (defaults to 50000)
nburn: Number of MCMC draws used to initialize the sampler (defaults to 5000). These draws do not enter the computation of posterior moments, forecasts etc.
thinfac: Thinning factor for MCMC output. Defaults to 10, which means that the forecast sequences (fc.mdraws, fc.vdraws, fc.ydraws, see below) contain only every tenth draw of the original sequence. Set thinfac to one to obtain the full MCMC sequence.
itprint: Print every itprint-th iteration. Defaults to 10000. Set to very large value to omit printing altogether.
save.parameters: If set to TRUE, parameter draws are saved in lists (these can be very large). Defaults to TRUE.
k_B, k_A, k_sig, k_Q, k_W, k_S, pQ, pW, pS: Quantities which enter the prior distributions, see the links below for details. Defaults to the exact values used in the original article by Primiceri.

Value
Beta.postmean: Posterior means of coefficients. This is an array of dimension [M, Mp+1, T], where T denotes the number of time periods (= number of rows of Y), and M denotes the number of system variables (= number of columns of Y). The submatrix [, , t] represents the coefficient matrix at time t. The intercept vector is stacked in the first column; the p coefficient matrices of dimension [M, M] are placed next to it.
H.postmean: Posterior means of error term covariance matrices. This is an array of dimension [M, M, T]. The submatrix [, , t] represents the covariance matrix at time t.
Q.postmean, S.postmean, W.postmean: Posterior means of various covariance matrices.
fc.mdraws: Draws for the forecast mean vector at various horizons (three-dimensional array, where the first dimension corresponds to system variables, the second to forecast horizons, and the third to MCMC draws). Note: The third dimension will be equal to nrep/thinfac, apart from possible rounding issues.
fc.vdraws: Draws for the forecast covariance matrix. Design similar to fc.mdraws, except that the first array dimension contains the lower-diagonal elements of the forecast covariance matrix.
fc.ydraws: Simulated future observations. Design analogous to fc.mdraws.
Beta.draws, H.draws: Matrices of parameter draws, can be used for computing impulse responses later on (see impulse.responses), and accessed via the helper function parameter.draws. These outputs are generated only if save.parameters has been set to TRUE.

Author(s)
Fabian Krueger, based on Matlab code by Dimitris Korobilis (see Koop and Korobilis, 2010). Incorporates the corrigendum by Del Negro and Primiceri (2015), which points to an error in the original MCMC algorithm of Primiceri (2005).

References
Del Negro, M. and Primiceri, G.E. (2015). 'Time Varying Structural Vector Autoregressions and Monetary Policy: A Corrigendum', Review of Economic Studies 82, 1342-1345.
Koop, G. and D. Korobilis (2010): 'Bayesian Multivariate Time Series Methods for Empirical Macroeconomics', Foundations and Trends in Econometrics 3, 267-358. Accompanying Matlab code available at https://sites.google.com/site/dimitriskorobilis/matlab.
Primiceri, G.E. (2005): 'Time Varying Structural Vector Autoregressions and Monetary Policy', Review of Economic Studies 72, 821-852.

See Also
The helper functions predictive.density and predictive.draws provide simple access to the forecast distribution produced by bvar.sv.tvp. Impulse responses can be computed using impulse.responses. For detailed examples and explanations, see the accompanying pdf file hosted at https://sites.google.com/site/fk83research/code.

Examples
## Not run:
# Load US macro data
data(usmacro)
# Estimate trivariate BVAR using default settings
set.seed(5813)
bv <- bvar.sv.tvp(usmacro)
## End(Not run)
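Given the array layouts documented above, posterior forecast summaries can be extracted with apply(). This is a hypothetical usage sketch (mine, not from the manual), assuming a fitted object bv as in the example:

## Not run:
# Posterior median point forecasts: one row per variable, one column per horizon
point_fc <- apply(bv$fc.ydraws, c(1, 2), median)
# 90 percent predictive intervals from the same draws
bands_fc <- apply(bv$fc.ydraws, c(1, 2), quantile, probs = c(0.05, 0.95))
## End(Not run)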
Example data sets: US Macroeconomic Time Series

Description
Inflation rate, unemployment rate and treasury bill interest rate for the US, as used by Primiceri (2005). Whereas usmacro covers the time period studied by Primiceri (1953:Q1 to 2001:Q3), usmacro.update updates the data until 2015:Q2.

Format
Multiple time series (mts) object, series names: 'inf', 'une', and 'tbi'.

Source
Inflation data provided by Federal Reserve Bank of Philadelphia (2015): 'Real-Time Data Research Center', https://www.philadelphiafed.org/research-and-data/real-time-center/real-time-data/data-files/. Accessed: 2015-10-29. The inflation rate is the year-over-year log growth rate of the GDP price index. We use the 2001:Q4 vintage of the price index for usmacro, and the 2015:Q3 vintage for usmacro.update.
Unemployment and Treasury Bill: Federal Reserve Bank of St. Louis (2015): 'Federal Reserve Economic Data', http://research.stlouisfed.org/fred2/. Accessed: 2015-10-29. The two series have the identifiers 'UNRATE' and 'TB3MS'. For each quarter, we compute simple averages over three monthly observations.
Disclaimer: Please note that the providers of the original data cannot take responsibility for the data posted here, nor can they answer any questions about them. Users should consult their respective websites for the official and most recent version of the data.

References
Primiceri, G.E. (2005): 'Time Varying Structural Vector Autoregressions and Monetary Policy', Review of Economic Studies 72, 821-852.

Examples
## Not run:
# Load and plot data
data(usmacro)
plot(usmacro)
## End(Not run)

helpers: Helper Functions to Access BVAR Forecast Distributions and Parameter Draws

Description
Functions to extract a univariate posterior predictive distribution from a model fit generated by bvar.sv.tvp.

Usage
predictive.density(fit, v = 1, h = 1, cdf = FALSE)
predictive.draws(fit, v = 1, h = 1)
parameter.draws(fit, type = "lag1", row = 1, col = 1)

Arguments
fit: List, model fit generated by bvar.sv.tvp
v: Index for variable of interest. Must be in line with the specification of fit.
h: Index for forecast horizon of interest. Must be in line with the specification of fit.
cdf: Set to TRUE to return cumulative distribution function, set to FALSE to return probability density function
type: Character string, used to specify output for function parameter.draws. Setting to "intercept" returns parameter draws for the intercept vector. Setting to one of "lag1", ..., "lagX" (where X is the lag order used in fit) returns parameter draws from the autoregressive coefficient matrices. Setting to "vcv" returns draws for the elements of the residual variance-covariance matrix.
row, col: Row and column index for the parameter for which parameter.draws should return posterior draws. That is, the function returns the [row, col] element of the matrix specified by type. Note that col is irrelevant if type = "intercept" has been chosen.

Value
predictive.density returns a function f(z), which yields the value(s) of the predictive density at point(s) z. This function exploits conditional normality of the model, given the posterior draws of the parameters.
predictive.draws returns a list containing vectors of MCMC draws, more specifically:
y: Draws from the predictand itself
m: Mean of the normal distribution for the predictand in each draw
v: Variance of the normal distribution for the predictand in each draw
Both outputs should be closely in line with each other (apart from a small amount of sampling noise), see the link below for details.
parameter.draws returns posterior draws for a single (scalar) parameter of the model fitted by bvar.sv.tvp. The output is a matrix, with rows representing MCMC draws, and columns representing time.

Author(s)
Fabian Krueger

See Also
For examples and background, see the accompanying pdf file hosted at https://sites.google.com/site/fk83research/code.

Examples
## Not run:
# Load US macro data
data(usmacro)
# Estimate trivariate BVAR using default settings
set.seed(5813)
bv <- bvar.sv.tvp(usmacro)
# Construct predictive density function for the second variable (inflation), one period ahead
f <- predictive.density(bv, v = 2, h = 1)
# Plot the density for a grid of values
grid <- seq(-2, 5, by = 0.05)
plot(x = grid, y = f(grid), type = "l")
# Cross-check: Extract MCMC sample for the same variable and horizon
smp <- predictive.draws(bv, v = 2, h = 1)
# Add density estimate to plot
lines(density(smp$y), col = "green")
## End(Not run)
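The "conditional normality" remark above has a simple Rao-Blackwellized form. The notation here is mine, inferred from the m and v outputs of predictive.draws rather than copied from the package documentation. With N retained draws,

$$ \hat f(z) \;=\; \frac{1}{N}\sum_{i=1}^{N}\phi\!\big(z;\,m^{(i)},\,v^{(i)}\big), $$

where $\phi(\cdot\,;m,v)$ is the normal density with mean m and variance v. The y draws are in turn simulated from these component normals, which is why the two outputs should agree up to sampling noise.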
impulse.responses: Compute Impulse Response Function from a Fitted Model

Description
Computes impulse response functions (IRFs) from a model fit produced by bvar.sv.tvp. The IRF describes how a variable responds to a shock in another variable, in the periods following the shock. To enable simple handling, this function computes IRFs for only one pair of variables that must be specified in advance (see impulse.variable and response.variable below).

Usage
impulse.responses(fit, impulse.variable = 1, response.variable = 2,
                  t = NULL, nhor = 20, scenario = 2, draw.plot = TRUE)

Arguments
fit: Model fit produced by bvar.sv.tvp, with the option save.parameters set to TRUE.
impulse.variable: Variable which experiences the shock.
response.variable: Variable which (possibly) responds to the shock.
t: Time point from which parameter matrices are to be taken. Defaults to most recent time point.
nhor: Maximal time between impulse and response (defaults to 20).
scenario: If 1, there is no orthogonalization, and the shock size corresponds to one unit of the impulse variable. If scenario is either 2 (the default) or 3, the error term variance-covariance matrix is orthogonalized via Cholesky decomposition. For scenario = 2, the Cholesky decomposition of the error term VCV matrix at time point t is used. scenario = 3 is the variant used in Del Negro and Primiceri (2015). Here, the diagonal elements are set to their averages over time, whereas the off-diagonal elements are specific to time t. See the notes below for further information.
draw.plot: If TRUE (the default): Produces a plot showing the 5, 25, 50, 75 and 95 percent quantiles of the simulated impulse responses.

Value
List of two elements:
contemporaneous: Contemporaneous impulse responses (vector of simulation draws).
irf: Matrix of simulated impulse responses, where rows represent simulation draws, and columns represent the number of time periods after the shock (1 in first column, nhor in last column).

Note
If scenario is set to either 2 or 3, the Cholesky transform (transpose of chol) is used to produce the orthogonal impulse responses. See Hamilton (1994), Section 11.4, and particularly Equation [11.4.22]. As discussed by Hamilton, the ordering of the system variables matters, and should be considered carefully. The magnitude of the shock (impulse) corresponds to one standard deviation of the error term.
If scenario = 1, the function simply outputs the matrices of the model's moving average representation, see Equation [11.4.1] in Hamilton (1994). The scenario considered here may be unrealistic, in that an isolated shock may be unlikely. The magnitude of the shock (impulse) corresponds to one unit of the error term.
Further supporting information is available at https://sites.google.com/site/FK83research/code.

Author(s)
Fabian Krueger

References
Hamilton, J.D. (1994): Time Series Analysis, Princeton University Press.
Del Negro, M. and Primiceri, G.E. (2015). 'Time Varying Structural Vector Autoregressions and Monetary Policy: A Corrigendum', Review of Economic Studies 82, 1342-1345. Supplementary material available at /content/82/4/1342/suppl/DC1 (accessed: 2015-11-17).

Examples
## Not run:
data(usmacro)
set.seed(5813)
# Run BVAR; save parameters
fit <- bvar.sv.tvp(usmacro, save.parameters = TRUE)
# Impulse responses
impulse.responses(fit)
## End(Not run)
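For readers who want the algebra behind scenarios 2 and 3: this is the standard recursive-identification identity (cf. Hamilton 1994), written in my own notation rather than taken from the package. With error covariance $\Sigma_t = P_t P_t^{\mathsf T}$ (Cholesky factor $P_t$ lower triangular) and moving-average matrices $\Psi_h$, the response of variable j at horizon h to an orthogonalized shock in variable i is

$$ \mathrm{IRF}_{j \leftarrow i}(h) \;=\; \big(\Psi_h P_t\big)_{ji}, $$

so the impulse is column i of $P_t$, a one-standard-deviation orthogonalized shock, which is why the ordering of the system variables matters.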
sim.var1.sv.tvp: Simulate from a VAR(1) with Stochastic Volatility and Time-Varying Parameters

Description
Simulate from a VAR(1) with Stochastic Volatility and Time-Varying Parameters

Usage
sim.var1.sv.tvp(B0 = NULL, A0 = NULL, Sig0 = NULL, Q = NULL, S = NULL,
                W = NULL, t = 500, init = 1000)

Arguments
B0: Initial values of mean parameters: Matrix of dimension [M, M+1], where the first column holds the intercept vector and the other columns hold the matrix of first-order autoregressive coefficients. By default (NULL), B0 corresponds to M = 2 uncorrelated zero-mean processes with moderate persistence (first-order autocorrelation of 0.6).
A0: Initial values for (transformed) error correlation parameters: Vector of length 0.5*M*(M-1). Defaults to a vector of zeros.
Sig0: Initial values for log error term volatility parameters: Vector of length M. Defaults to a vector of zeros.
Q, S, W: Covariance matrices for the innovation terms in the time-varying parameters (B, A, Sig). The matrices are symmetric, with dimensions equal to the number of elements in B, A and Sig, respectively. Default to diagonal matrices with very small terms (1e-10) on the main diagonal. This corresponds to essentially no time variation in the parameters and error term matrix elements.
t: Number of time periods to simulate.
init: Number of draws to initialize simulation (to decrease the impact of starting values).

Value
data: Simulated data, with rows corresponding to time and columns corresponding to the M system variables.
Beta: Array of dimension [M, M+1, t]. Submatrix [, , l] holds the parameter matrix for time period l.
H: Array of dimension [M, M, t]. Submatrix [, , l] holds the error term covariance matrix for period l.

Note
The choice of 'reasonable' values for the elements of Q, S and W requires some care. If the elements of these matrices are too large, parameter variation can easily become excessive. Too large elements of Q can lead the parameter matrix B into regions which correspond to explosive processes. Too large elements in S and (especially) W may lead to excessive error term variances.

Author(s)
Fabian Krueger

References
Primiceri, G.E. (2005): 'Time Varying Structural Vector Autoregressions and Monetary Policy', Review of Economic Studies 72, 821-852.

See Also
bvar.sv.tvp can be used to fit a model on data generated by sim.var1.sv.tvp. This can be a useful way to analyze the performance of the estimation methods.

Examples
## Not run:
# Generate data from a model with moderate time variation in the parameters
# and error term variances
set.seed(5813)
sim <- sim.var1.sv.tvp(Q = 1e-5*diag(6), S = 1e-5*diag(1), W = 1e-5*diag(2))
# Plot both series
matplot(sim$data, type = "l")
# Plot AR(1) parameters of both equations
matplot(cbind(sim$Beta[1,2,], sim$Beta[2,3,]), type = "l")
## End(Not run)

Index
datasets: Example data sets
forecasting methods: bvar.sv.tvp, sim.var1.sv.tvp
helpers: helpers
impulse response analysis: impulse.responses
package: bvarsv-package
Other entries: bvar.sv.tvp, bvarsv (bvarsv-package), chol, Example data sets, helpers, impulse.responses, parameter.draws (helpers), predictive.density (helpers), predictive.draws (helpers), sim.var1.sv.tvp, usmacro (Example data sets)
An Image Quality Assessment Method Based on Riemannian-Metric Structural Similarity

WANG Ling; LIU Hao

Abstract: What matters most in an image is its structure, and the human visual system obtains image information from that structure. To evaluate image quality better, this paper builds on the SSM model and proposes a structural similarity measure based on a Riemannian metric, carrying out the analysis with images placed on a Riemannian manifold. Experimental results show that the proposed method discriminates better than the SSM model and conforms more closely to the characteristics of the human visual system.

Journal: Journal of Xi'an University of Arts and Science (Natural Science Edition)
Year (volume), issue: 2014, 17(1)
Pages: 4 (pp. 62-65)
Keywords: image quality assessment; structural similarity; structure tensor; Riemannian metric
Authors' affiliation: School of Science, Xi'an University of Architecture and Technology, Xi'an 710055, China
Language of the original text: Chinese
Chinese Library Classification: O29

Images are an important source of the information humans acquire about the external environment, which makes vision the most important part of the human perceptual system: about 80% of the information humans receive from the outside world arrives through the visual system. Image and video information can be perceived and understood by the human visual system and can also be processed by computers, but in terms of capability the human visual system still far outperforms computer processing. To meet the needs of social development, one would like to exploit the neural mechanisms of the human visual system, combined with recent research results in mathematics, to establish new computational procedures, so that computers can take over perception, processing and interpretation and can observe and understand the world as people do.

In recent years many researchers have studied image quality on the basis of human visual characteristics. Wen Xu et al. proposed an error-segmentation metric [3], Jayant N.J. et al. gave a perceptual metric [4], and Wang et al., arguing that the human eye perceives an image chiefly through its structural information, proposed the structural similarity (SSM) index for image quality assessment [5]. Although the SSM algorithm is simple and its evaluation performance is good, it cannot assess the quality of severely blurred images. This paper proposes an image quality assessment method based on structural similarity with a Riemannian metric (DSSM).

1 Structural similarity

The structural similarity approach compares images in three respects: luminance, contrast and structure. Let x be the original image and y the reconstructed image, and denote the luminance, contrast and structure comparison functions by l(x, y), c(x, y) and s(x, y):

$$ l(x,y)=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1},\qquad c(x,y)=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2},\qquad s(x,y)=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, \tag{1} $$

where C1, C2 and C3 are very small positive constants that keep the denominators of l(x, y), c(x, y) and s(x, y) away from zero, and C3 = C2/2. The structural similarity of x and y is then

$$ \mathrm{SSM}(x,y) = l(x,y)\,c(x,y)\,s(x,y). $$

The higher the structural similarity, the closer x and y are. To keep l(x, y), c(x, y) and s(x, y) between 0 and 1, the absolute values of μxμy and σxy are used.

2 The structure tensor and the Riemannian metric

The structure tensor carries a manifold structure, so the geodesic distance on the manifold can be used to measure the distance between edge structure tensors. Structure tensors have been applied successfully to low-level vision problems such as corner detection [1] and optical-flow computation [2]. We construct a second-order structure tensor.

2.1 The structure tensor

Let I be the original image. The local second-order structure tensor at a point of the image is defined as [1-2]

$$ J_\sigma = G_\sigma * \big(\nabla I\,\nabla I^{\mathsf T}\big) = G_\sigma * \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}, $$

where * denotes convolution, I_x and I_y are the partial derivatives of the image in the x and y directions, and G_σ is a Gaussian of scale σ:

$$ G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\!\Big(-\frac{x^2+y^2}{2\sigma^2}\Big). $$

The structure tensor represents the local orientation through its eigenvalues and eigenvectors. Since the gray-level values of an image are indispensable in image processing, we also construct a structure tensor that combines gray-level and gradient information, built from the gray level I at the point (x, y) together with the partial derivatives I_x and I_y.

2.2 The distance between structure tensors on a Riemannian manifold

In general a Riemannian manifold is a differentiable manifold in which every tangent space carries a Riemannian metric ⟨y, y⟩, which yields a norm ‖y‖; the distance between X and Y can be measured through this norm. A structure tensor is a symmetric positive definite matrix, and such matrices form a Riemannian manifold. Following [6] we adopt the affine-invariant Riemannian metric, under which the exponential map at X is [7]

$$ \exp_X(y) = X^{1/2}\exp\big(X^{-1/2}\,y\,X^{-1/2}\big)X^{1/2}, $$

so that the logarithmic map is [7]

$$ \log_X(Y) = X^{1/2}\log\big(X^{-1/2}\,Y\,X^{-1/2}\big)X^{1/2}. $$

Substituting the exponential map into the expression for the logarithmic map gives the geodesic distance between two tensors,

$$ d(X,Y) = \big\|\log\big(X^{-1/2}\,Y\,X^{-1/2}\big)\big\|_F, $$

which is equivalent to

$$ d(X,Y) = \Big(\sum_k \ln^2 \lambda_k\Big)^{1/2}, $$

where the λ_k are the generalized eigenvalues of X and Y.

2.3 Structural similarity with the Riemannian metric

The structure tensor has a further useful property: it is invariant to scale changes and depends only on the chosen features, which makes it convenient for designing similarity algorithms. Moreover, because it incorporates gradient information, the structure tensor is also fairly invariant to illumination changes. Let I be the original image with pixels I_i, and Y the reconstructed image with pixels Y_i. A structure tensor is built from the original image and another from the reconstructed image as in Section 2.1. The Riemannian distance between the two gray-level tensors [9] then yields a similarity g(x, y) between the two images. Replacing s(x, y) in Eq. (1) by the Riemannian-metric similarity g(x, y) gives the Riemannian-metric structural similarity

$$ \mathrm{DSSM}(x,y) = l(x,y)\,c(x,y)\,g(x,y). $$

3 Experiments

To check the correctness and effectiveness of the method, an original image was corrupted with noise in Matlab: (1) the original image; (2) Gaussian noise with zero mean and variance 0.02; (3) 6% salt-and-pepper noise; (4) multiplicative noise with variance 0.02. The MAE, PSNR, SSM and DSSM values of the three noisy images were computed and are listed in Table 1. Table 1 shows that the DSSM index distinguishes the images better than the SSM index and has higher precision.

4 Conclusion

In the method above, the two tensors are formed from the image gray level and its partial derivatives in the x and y directions. These tensors turn the image space into a Riemannian manifold, on which the distance between two tensors is determined and a similarity between them is defined; in this way the structural features of an image are expressed better. Using the viewpoint and methods of Riemannian geometry, this paper constructed a metric tensor containing image information, placed images on a Riemannian manifold, discussed the similarity of two images on that manifold, and introduced this similarity into the SSM model proposed by Wang et al., enabling it to read the structural information of an image better. Experiments verified the effectiveness of the model: it conforms better to the characteristics of human vision and can evaluate image quality well.

References
[1] HARRIS C, STEPHENS M J. A combined corner and edge detector[C]// Alvey Conference, 1988: 147-152.
[2] WANG Hai-yun, MA Kai-kuang. Accurate optical flow estimation using adaptive scale-space and 3D structure tensor[C]// ICIP (2), 2002: 301-304.
[3] WEN Xu. Picture quality evaluation based on error segmentation[J]. SPIE, 1994, 81: 1454.
[4] JAYANT N J. Signal compression based on models of human perception[J]. Proc. IEEE, 1993, 81: 1385-1422.
[5] WANG Z, SIMONCELLI E P, BOVIK A C. Multi-scale structural similarity for image quality[C]// Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, 2003, 2: 1398-1402.
[6] PENNEC X, FILLARD P, AYACHE N. A Riemannian framework for tensor computing[J]. International Journal of Computer Vision, 2006, 66(1): 41-66.
[7] TUZEL O, PORIKLI F M, MEER P. Human detection via classification on Riemannian manifolds[C]// CVPR, 2007: 1-8.
[8] TUZEL O, PORIKLI F, MEER P. Region covariance: A fast descriptor for detection and classification[C]// Proc. European Conf. on Computer Vision, Graz, Austria, 2006, 2: 589-600.
[9] FORSTNER W, MOONEN B. A metric for covariance matrices[R]. Technical report, Dept. of Geodesy and Geoinformatics, Stuttgart University, 1999.
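As a small illustration of the geodesic distance used above (my own sketch, not code from the paper): for symmetric positive definite X and Y, the generalized eigenvalues can be obtained as the eigenvalues of X^{-1}Y, giving d(X, Y) = sqrt(sum of squared log eigenvalues).

  riem_dist <- function(X, Y) {
    # Generalized eigenvalues of (Y, X): eigenvalues of solve(X) %*% Y.
    # They are real and positive when X and Y are symmetric positive definite.
    lambda <- Re(eigen(solve(X) %*% Y, only.values = TRUE)$values)
    sqrt(sum(log(lambda)^2))
  }
  X <- matrix(c(2.0, 0.3, 0.3, 1.0), 2, 2)
  Y <- matrix(c(1.0, 0.1, 0.1, 1.5), 2, 2)
  riem_dist(X, Y)   # distance between the two tensors
  riem_dist(X, X)   # 0: the measure behaves as a true distance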
Classic Frequently Used Example Sentences

Contents

Notes
Frequently used verbs
I. Neutral verbs
1. give, study, establish, propose, provide (said of a paper, etc.)
2. obtain, derive, draw (a conclusion, etc.) from ...
3. concentrate on, emphasize, stress, focus on, devote effort to
4. use, employ, adopt, take
5. construct, form, constitute, consist of, be composed of
6. cover, include
7. contain, include, involve
8. consider, find, observe
9. be based on, be built on the basis of ...
10. lie in
11. put, place
12. affect, influence
13. consider, take into account
14. return to, trace back to, review
15. seek, intend
16. determine, decide, make a decision
17. characterize, describe, formulate, depict, narrate, state
18. indicate, show, point out, specify, mark
19. mean, infer, imply, suggest
20. describe, characterize, understand
21. it should be pointed out / emphasized / noted that
22. recommend, suggest, advise
23. demonstrate, exhibit, display
24. control, manage, supervise, arrange
25. make, render
26. extend, expand
27. change, alter, vary, modify
28. contribute, account for, donate
29. continue, maintain
30. approximate
31. approach, access, enter
32. become
33. trend, tendency, direction of development (change)
II. Positive verbs
1. guarantee, ensure, warrant
3. prove, verify, demonstrate, exemplify
4. try one's best, as far as possible
5. strive, attempt
6. give, propose, provide, grant, supply
7. can, enable, be able to, be capable of
8. increase, grow, enhance, strengthen
9. surpass, exceed, outnumber
10. level, competent, high-level
11. have, enjoy, be allowed to have, possess, carry
12. take effect (on), be effective, work (run) well
13. optimize
14. support, favor, recommend, like, prefer
15. expect, hope, count on, be promising
16. improve, benefit, develop, run soundly
17. carry out, execute, realize, implement, accomplish
18. solve, overcome, break through, avoid
19. make ... simple (easy), concise, convenient
20. advantages, interests, benefits
21. be valuable, of theoretical value, of practical (engineering) value
Frequently used nouns
I. Neutral: simple nouns; noun forms of verbs; gerunds
II. Positive: simple nouns; noun forms of verbs; gerunds
III. Negative: simple nouns; noun forms of verbs; gerunds
Frequently used conjunctions
I. Comparison, likeness and linkage
1. like, such as, for example, just as
2. relate to, be relevant to, unite, connect, associate, relation
3. be similar to, resemble, the same as
4. compare, comparison, contrast
5. better than, superior to, exceed, higher than, no less than
6. worse than, inferior to, not as good as, less than
II. Cause, purpose, result, aim, viewpoint, angle
1. because of, owing to, in view of, thanks to, attributed to
2. therefore, hence
4. aims, purposes
5. from the viewpoint of ..., from the angle of ..., in the sense of ...
Frequently used phrases and idioms, adverbs and prepositions
1. at the forefront of ..., in the field of ...
2. within the framework of ...
3. in advance, beforehand, prior to, before ..., previous
4. suitable for, applicable to, feasible
5. important, useful, essential, key, beneficial, instrumental
6. remaining, the rest, left over
7. detailed, in detail
8. at/in ... (speed, order, size, step length, font, etc.)
9. as far as ... is concerned, in terms of, with respect to
10. tend to, be apt to
11. acceptable
12. direct, straightforward, obvious, trivial, easy
13. available, obtainable, free
14. upper (half) part, lower (half) part, upper left (right) part, lower left (right) part
15. slight, slightly
16. obviously, evident
17. a large amount of, abundant
18. how
19. however ..., no matter how ..., whatever
Grammar and special structures and usages
1. uses of the present participle
2. uses of the past participle
3. uses of the infinitive: as object, as postposed attributive
4. abbreviations, shortened forms, elliptical sentences
5. uses of special symbols
6. special sentence patterns
7. (special) grammatical structures: absolute constructions, the subjunctive mood, etc.
Negative statements
I. Forms of negation
1. formal negation with "not" and "no"
2. negation through prepositions
3. negation through verbs
4. negation through phrases
5. negation through adjective phrases
6. negation through adjectives, adverbs and their comparatives
7. negation through prefixes and suffixes
8. negation through conjunctions
II. Negative verbs
1. appear, occur, encounter, suffer
2. implicate, involve, be caught up in, include
3. prevent
4. lead to, cause, incur, be plagued by
5. cover up, block, conceal, hide
6. deceive, be deceived
7. neglect, ignore, omit, avoid
8. except ..., in addition to ...
9. exclude, remove, delete, take away, move
10. lower, reduce, degrade, deteriorate, decrease
11. fail, become invalid, discard
12. distort, misinterpret, twist
13. misuse, confuse, blindly
14. require, need, be necessary, necessities, must
III. Negative phrases, nouns, adjectives, prepositions, conjunctions
1. inconvenience, trouble, burden
2. cost, expensive, at the expense of ..., at the cost of ...
3. risk, take risks
4. challenge
5. shortcomings, defects, limitations, disadvantages
6. difficulties, trouble, obstacles, losses (adverse outcomes)
7. harsh, demanding, restricted, limited
8. poor, the worst
9. although, no matter, regardless of
IV. Contradiction (often used in proofs by contradiction)
V. Difference, distinction, different from ...
Figures, tables, examples
1. figures
2. tables
3. examples
Concluding parts of a paper
1. experience, lessons
2. summary, generalization, report, conclusions
3. future work (research), open problems
4. appendix
5. thanks, gratitude
6. citation (of references), reference
Specialized knowledge
I. Control
II. Measurement
III. Neural networks
1. data and their processing
2. network structures and algorithms
3. training of neural networks
4. bias and accuracy of neural network training
5. convergence (stability) of neural network training
6. approximation performance and characteristics of neural networks
Common mathematical expressions
1. dimension of a vector, space, or system
2. differentiation, taking derivatives, elementary transformations, differentiable, derivative
3. solving equations, giving the solution of ...
4. spanning a vector space, taking the rank
5. distance, metric
6. square root
7. interval
8. precision, accuracy, exactness
9. premise, precondition, necessary and sufficient condition
10. under the circumstances / condition / background / premise of ...; background, situation, premise
11. satisfy a condition, satisfy a requirement, a condition holds, a conclusion holds
12. possibility, probability, percentage
13. (taking a) difference, distance, difference value
14. substitute into, replace
15. iterate
16. partition, classify, group, decompose
17. step by step, pointwise, gradually
18. be equivalent to, be equal to
19. converge, convergence rate
20. within finitely many steps
21. computation, computational cost

Notes
1. The "classic frequently used example sentences" also include classic phrases and classic collocations in their annotations.
2. All example sentences are taken from original American (or British) English materials (papers or books) and are fully trustworthy as models for imitation.
Factored Semi-Tied Covariance Matrices

M.J.F. Gales
Cambridge University Engineering Department
Trumpington Street, Cambridge CB2 1PZ, United Kingdom
mjfg@

Abstract

A new form of covariance modelling for Gaussian mixture models and hidden Markov models is presented. This is an extension to an efficient form of covariance modelling used in speech recognition, semi-tied covariance matrices. In the standard form of semi-tied covariance matrices the covariance matrix is decomposed into a highly shared decorrelating transform and a component-specific diagonal covariance matrix. The use of a factored decorrelating transform is presented in this paper. This factoring effectively increases the number of possible transforms without increasing the number of free parameters. Maximum likelihood estimation schemes for all the model parameters are presented, including the component/transform assignment, transform and component parameters. This new model form is evaluated on a large vocabulary speech recognition task. It is shown that using this factored form of covariance modelling reduces the word error rate.

1 Introduction

A standard problem in machine learning is how to efficiently model correlations in multi-dimensional data. Solutions should be efficient both in terms of number of model parameters and cost of the likelihood calculation. For speech recognition this is particularly important due to the large number of Gaussian components used, typically in the tens of thousands, and the relatively large dimensionality of the data, typically 30-60.

The following generative model has been used in speech recognition:

$$ o(\tau) = C\, z(\tau), \tag{1} $$

$$ z(\tau) = \begin{bmatrix} x(\tau) \\ n(\tau) \end{bmatrix}, \tag{2} $$

where x(τ) is the underlying speech signal, C is the observation transformation matrix, x(τ) is generated by a hidden Markov model (HMM) with a diagonal-covariance Gaussian mixture model (GMM) to model each state, and n(τ) is usually assumed to be generated by a GMM which is common to all HMMs. (Although it is not strictly necessary to use diagonal covariance matrices, these currently dominate applications in speech recognition; x(τ) could also be generated by a simple GMM.) This differs from the static linear Gaussian models presented in [7] in two important ways. First, x(τ) is generated by either an HMM or GMM, rather than a simple Gaussian distribution. The second difference is that the "noise" is now restricted to the null space of the signal. This type of system can be considered to have two streams. The first stream, the dimensions associated with x(τ), is the set of discriminating, useful, dimensions. The second stream, the dimensions associated with n(τ), is the set of non-discriminating, nuisance, dimensions. Linear discriminant analysis (LDA) and heteroscedastic LDA (HLDA) [5] are both based on this form of generative model. When the dimensionality of the nuisance dimensions is reduced to zero this generative model becomes equivalent to a semi-tied covariance matrix system [3] with a single, global, semi-tied class.

This generative model has a clear advantage during recognition compared to the standard linear Gaussian models [2] in the reduction in the computational cost of the likelihood calculation. The likelihood for component m may be computed as

$$ \mathcal{L}\big(o(\tau); m\big) = \lvert \det A \rvert\; \mathcal{N}\big(A\, o(\tau);\; \mu_m,\; \Sigma^{\mathrm{diag}}_m\big), \tag{3} $$

where A = C^{-1} is the decorrelating transform and μ_m and Σ_m^{diag} are the component mean and diagonal covariance matrix in the transformed space; since A o(τ) can be cached and shared across components, the per-component cost is that of a diagonal Gaussian.

Notation: capital bold letters refer to matrices, e.g. A; bold letters refer to vectors, e.g. a; and scalars are not bold, e.g. a. When referring to elements of a matrix or vector, subscripts are used: a_i is row i of matrix A, a_{ij} is the element in row i and column j of A, and a_i is element i of vector a. Diagonal matrices are indicated by diag. Where multiple streams are used this is indicated by, for example, A^{(s)}, a d_s x d matrix (d is the dimensionality of the feature vector and d_s is the size of stream s). Where subsets of the diagonal matrices are specified, the matrices are square diagonal. A^T is the transpose of the matrix and det(A) is the determinant of the matrix.
Increasing the number of transforms increases the number of model parameters to be estimated, hence reducing the robustness of the estimates. There is a corresponding increase in the computational cost during recognition. In the limit there is a single transform per component, the standard full-covariance matrix case. The approach adopted in this paper is to factor the transform into multiple streams. Each component can then use a different transform for each stream. Hence instead of using an assignment variable an assignment vector is used. In order to maintain the efficient likelihood computation of equation 3, A, rather than A^{-1}, must be factored into rows. This is a partitioning of the feature space into a set of observation streams. In common with other factoring schemes this dramatically increases the effective number of transforms from which each component may select without increasing the number of transform parameters. Though this paper only considers factoring semi-tied covariance matrices, the extension to the "projection" schemes presented in [2] is straightforward.

This paper describes how to estimate the set of transforms and determine which subspaces a particular component should use. The next section describes how to assign components to transforms and, given this assignment, how to estimate the appropriate transforms. Some initial experiments on a large vocabulary speech recognition task are presented in the following section.

2 Factored Semi-Tied Covariance Matrices

In order to factor semi-tied covariance matrices the inverse of the observation transformation for a component is broken into multiple streams. The feature space of each stream is then determined by selecting from an inventory of possible transforms. Consider the case where there are S streams. The effective full covariance matrix of component m, Σ_m, may be written as Σ_m = A_m^{-1} Σ_m^{diag} A_m^{-T}, where the form of A_m is restricted so that

$$ A_m = \begin{bmatrix} A_1^{(r_{m1})} \\ \vdots \\ A_S^{(r_{mS})} \end{bmatrix} \tag{4} $$

and r_m is the S-dimensional assignment vector for component m. The complete set of model parameters consists of the standard model parameters, the component means, variances and weights, and, additionally, the set of transforms for each stream (R_s is the number of transforms associated with stream s) and the assignment vector for each component. Note that the semi-tied covariance matrix scheme is the case when S = 1. The likelihood is efficiently estimated by storing transformed observations for each stream transform, i.e. A_s^{(r)} o(τ). The model parameters are estimated using ML training on a labelled set of training data. The likelihood of the training data may be written as

$$ \mathcal{L}(\mathbf{O};\mathcal{M}) = \sum_{\theta\in\Theta}\; \prod_{\tau} \Big( a_{\theta(\tau-1)\theta(\tau)} \sum_{m\in M_{\theta(\tau)}} c_m\, \lvert\det A_m\rvert\, \mathcal{N}\big(A_m o(\tau);\, \mu_m,\, \Sigma_m^{\mathrm{diag}}\big) \Big), \tag{5} $$

where Θ is the set of all valid state sequences according to the transcription for the data, θ(τ) is the state at time τ of the current path, M_j is the set of Gaussian components belonging to state j, and c_m is the prior of component m. Directly optimising equation 5 is a very large optimisation task, as there are typically millions of model parameters. Alternatively, as is common with standard HMM training, an EM-based approach is used. The posterior probability of a particular component, m, generating the observation at a given time instance is denoted as γ_m(τ). This may be simply found using the forward-backward algorithm [6] and the old set of model parameters. The new set of model parameters will be denoted with a hat. The component priors and HMM transition matrices are estimated in the standard fashion [6].
Directly optimising the auxiliary function for the model parameters is computationally expensive [3] and does not allow the embedding of the assignment process. Instead a simple iterative optimisation scheme is used as follows:

1. Estimate the within-class covariance matrix for each Gaussian component in the system, W_m, using the values of γ_m(τ). Initialise the set of assignment vectors and the set of transforms for each stream.
2. Using the current estimates of the transforms and assignment vectors, obtain the ML estimate of the set of component-specific diagonal covariance matrices, incorporating the appropriate parameter tying as required.
3. Estimate the new set of transforms, using the current set of component covariance matrices and assignment vectors.
4. Update the set of assignment variables for each component, given the current set of model transforms.
5. Go to (2) until convergence, or an appropriate stopping criterion is satisfied. Otherwise update the component-specific diagonal covariance matrices and the component means using the latest transforms and assignment variables.

There are three distinct optimisation problems within this task. First the ML estimate of the set of component-specific diagonal covariance matrices is required. Second, the new set of transforms must be estimated. Finally the new set of assignment vectors is required. The ML estimation of the component-specific variances (and means) under a transformation is a standard problem, e.g. for the semi-tied case see [3], and is not described further. The ML estimation of the transforms and assignment variables is described below.

The transforms are estimated in an iterative fashion. The proposed scheme is derived by modifying the standard semi-tied covariance optimisation equation in [3]. A row-by-row optimisation is used. Consider row i of stream s of transform r: ignoring constant scalings and elements independent of this row, the auxiliary function can be written in terms of this row alone, involving the cofactors of each component's composite transform A_m and an occupancy-weighted outer-product statistic. The main cost in computing the gradient is calculating the cofactors for each component. Having computed the gradient, the Hessian may also be simply calculated.

The assignment vectors are updated stream by stream. For each component the assignment of stream s is updated by hypothesising each transform in turn, with the hypothesised assignment vector agreeing with the current one in all other streams, i.e.

the hypothesised assignment of stream s' is j if s' = s, and r_{ms'} otherwise, (12)

and the transform chosen is the one whose hypothesised composite transform (through its determinant and the implied diagonal variances) maximises the component's contribution to the auxiliary function. As the assignment depends on the cofactors, which themselves depend on the other stream assignments for that component, an iterative scheme is required. In practice this was found to converge rapidly.
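For intuition, the single-transform (semi-tied) version of the row optimisation referred to above has a closed form, given in [3]. The following summary uses my notation and sketches that standard result; it is not a reproduction of the factored equations (6)-(11) of this paper. For row a_i of A with cofactor row c_i,

$$ \mathcal{Q}(a_i) = \beta \log\big( (c_i a_i^{\mathsf T})^2 \big) - a_i\, G^{(i)} a_i^{\mathsf T}, \qquad G^{(i)} = \sum_m \frac{\beta_m}{\sigma_{m,i}^2}\, W_m, $$

$$ \hat a_i = c_i\, G^{(i)-1} \sqrt{ \frac{\beta}{\,c_i\, G^{(i)-1} c_i^{\mathsf T}} }, $$

where β_m = Σ_τ γ_m(τ) and β = Σ_m β_m. In the factored case the determinant, and hence the cofactor row, differs from component to component, so this closed form no longer applies and the gradient/Hessian iteration described above is used instead.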
3 Results and Discussion

An initial investigation of the use of factored semi-tied covariance matrices was carried out on a large-vocabulary speaker-independent continuous-speech recognition task. The recognition experiments were performed on the 1994 ARPA Hub 1 data (the H1 task), an unlimited vocabulary task. The results were averaged over the development and evaluation data. Note that no tuning on the "development" data was performed. The baseline system used for the recognition task was a gender-independent cross-word-triphone mixture-Gaussian tied-state HMM system. For details of the system see [8]. The total number of phones (counting silence as a separate phone) was 46, from which 6399 distinct context states were formed. The speech was parameterised into a 39-dimensional feature vector.

The set of baseline experiments with semi-tied covariance matrices (S = 1) used "expert" knowledge to determine the transform classes. Two sets were used. The first was based on phone-level transforms, where all components of all states from the same phone shared the same class (phone classes). The second used an individual transform per state (state classes). In addition a global transform (global class) and a full-covariance matrix system (comp class) were tested. Two systems were examined, a four-Gaussian-components-per-state system and a twelve-component system. The twelve-component system is the standard system described in [8]. In both cases a diagonal covariance matrix system (labelled none) was generated in the standard HTK fashion [9]. These systems were then used to generate the initial alignments to build the semi-tied systems. An additional iteration of Baum-Welch estimation was then performed.

Three forms of assignment training were compared: the previously described expert system and two ML-based schemes, standard and factored. The standard scheme used a single stream (S = 1), which is similar to the scheme described in [3]. The factored scheme used the new approach described in this paper with a separate stream for each of the elements of the feature vector (S = 39).

Table 1: System performance on the 1994 ARPA H1 task, word error rate (%), four-component system

Transform classes    Assignment scheme    WER (%)
none (diagonal)      -                    11.11
global               -                    10.34
phone                expert               10.04
state                expert                9.20
comp (full)          -                     9.22
phone                standard              9.73
phone                factored              9.48

The four-component state-class system significantly (at the 95% level) outperformed the standard twelve-component system (9.71%). The expert phone system shows around a 9% degradation in performance compared to the state system, but used less than a hundredth of the number of transforms (46 versus 6399). Using the standard ML assignment scheme with initial phone classes, S = 1, reduced the error rate of the phone system by around 3% over the expert system. The factored scheme, S = 39, achieved further reductions in error rate. A 5% reduction in word error rate was achieved over the expert system, which is significant at the 95% level.

The twelve-component system behaved similarly. The use of a global semi-tied transform significantly reduced the error rate by around 9% relative. Increasing the number of transforms using the expert assignment showed no reduction in error rate. Again, using the phone-level system and training the component-transform assignments, with either the standard or the factored scheme, reduced the word error rate. Using the factored semi-tied transforms (S = 39) significantly reduced the error rate, by around 5%, compared to the expert systems.

4 Conclusions

This paper has presented a new form of semi-tied covariance matrix, the factored semi-tied covariance matrix. The theory for estimating these transforms has been developed and implemented on a large vocabulary speech recognition task. On this task the use of these factored transforms was found to decrease the word error rate by around 5% over using a single transform, or multiple transforms, where the assignments are expertly determined.
The improvement was significant at the 95% level. In future work the problems of determining the required number of transforms for each of the streams, and how to determine the appropriate dimensions, will be investigated.

References
[1] A P Dempster, N M Laird, and D B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1-38, 1977.
[2] M J F Gales. Maximum likelihood multiple projection schemes for hidden Markov models. Technical Report CUED/F-INFENG/TR365, Cambridge University, 1999. Available via anonymous ftp.
[3] M J F Gales. Semi-tied covariance matrices for hidden Markov models. IEEE Transactions Speech and Audio Processing, 7:272-281, 1999.
[4] N K Goel and R Gopinath. Multiple linear transforms. In Proceedings ICASSP, 2001. To appear.
[5] N Kumar. Investigation of Silicon-Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition. PhD thesis, Johns Hopkins University, 1997.
[6] L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, February 1989.
[7] S Roweis and Z Ghahramani. A unifying review of linear Gaussian models. Neural Computation, 11:305-345, 1999.
[8] P C Woodland, J J Odell, V Valtchev, and S J Young. The development of the 1994 HTK large vocabulary speech recognition system. In Proceedings ARPA Workshop on Spoken Language Systems Technology, pages 104-109, 1995.
[9] S J Young, J Jansen, J Odell, D Ollason, and P Woodland. The HTK Book (for HTK Version 2.0). Cambridge University, 1996.
Package 'MGMM' (September 30, 2023)

Title: Missingness Aware Gaussian Mixture Models
Date: 2023-09-30
Version: 1.0.1.1
Description: Parameter estimation and classification for Gaussian Mixture Models (GMMs) in the presence of missing data. This package complements existing implementations by allowing for both missing elements in the input vectors and full (as opposed to strictly diagonal) covariance matrices. Estimation is performed using an expectation conditional maximization algorithm that accounts for missingness of both the cluster assignments and the vector components. The output includes the marginal cluster membership probabilities; the mean and covariance of each cluster; the posterior probabilities of cluster membership; and a completed version of the input data, with missing values imputed to their posterior expectations. For additional details, please see McCaw ZR, Julienne H, Aschard H. "Fitting Gaussian mixture models on incomplete data." <doi:10.1186/s12859-022-04740-9>.
Depends: R (>= 3.5.0)
License: GPL-3
Encoding: UTF-8
LinkingTo: Rcpp, RcppArmadillo
Imports: cluster, methods, mvnfast, plyr, Rcpp (>= 1.0.3), stats
Suggests: testthat (>= 3.0.0), knitr, rmarkdown, withr
VignetteBuilder: knitr
RoxygenNote: 7.2.3
Config/testthat/edition: 3
NeedsCompilation: yes
Author: Zachary McCaw [aut, cre] (<https://orcid.org/0000-0002-2006-9828>)
Maintainer: Zachary McCaw <*********************.edu>
Repository: CRAN
Date/Publication: 2023-09-30 15:42:38 UTC

R topics documented: CalHar, ChooseK, ClustQual, CombineMIs, DavBou, FitGMM, FitMix, FitMVN, GenImputation, logLik.mix, logLik.mvn, mean.mix, mean.mvn, mix-class, MixUpdateMeans, mvn-class, PartitionData, print.mix, print.mvn, ReconstituteData, rGMM, show,mix-method, show,mvn-method, vcov.mix, vcov.mvn

CalHar: Calinski-Harabaz Index

Description
Calculates the Calinski-Harabaz index.

Usage
CalHar(data, assign, means)

Arguments
data: Observations.
assign: Assignments.
means: List of cluster means.

Value
Scalar metric.
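For reference, the Calinski-Harabasz index has the standard textbook definition below; this is supplied as background and is not copied from the package source. With n observations, k clusters, between-cluster scatter B and within-cluster scatter W,

$$ \mathrm{CH} = \frac{\operatorname{tr}(B)/(k-1)}{\operatorname{tr}(W)/(n-k)}, \qquad B=\sum_{j=1}^{k} n_j (\bar x_j - \bar x)(\bar x_j - \bar x)^{\mathsf T},\quad W=\sum_{j=1}^{k}\sum_{i\in C_j} (x_i - \bar x_j)(x_i - \bar x_j)^{\mathsf T}. $$

Higher values indicate more compact, better-separated clusters, consistent with the ClustQual entry below.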
ChooseK: Cluster Number Selection

Description
Function to choose the number of clusters k. Examines cluster numbers between k0 and k1. For each cluster number, generates boot bootstrap data sets, fits the Gaussian Mixture Model (FitGMM), and calculates quality metrics (ClustQual). For each metric, determines the optimal cluster number k_opt, and the k_1SE, the smallest cluster number whose quality is within 1 SE of the optimum.

Usage
ChooseK(data, k0 = 2, k1 = NULL, boot = 100, init_means = NULL, fix_means = FALSE, init_covs = NULL, init_props = NULL, maxit = 10, eps = 1e-04, report = TRUE)

Arguments
data: Numeric data matrix.
k0: Minimum number of clusters.
k1: Maximum number of clusters.
boot: Bootstrap replicates.
init_means: Optional list of initial mean vectors.
fix_means: Fix the means to their starting value? Must provide initial values.
init_covs: Optional list of initial covariance matrices.
init_props: Optional vector of initial cluster proportions.
maxit: Maximum number of EM iterations.
eps: Minimum acceptable increment in the EM objective.
report: Report bootstrap progress?

Value
List containing Choices, the recommended number of clusters according to each quality metric, and Results, the mean and standard error of the quality metrics at each cluster number evaluated.

See Also
See ClustQual for evaluating cluster quality, and FitGMM for estimating the GMM with a specified cluster number.

Examples
set.seed(100)
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
data <- rGMM(n = 500, d = 2, k = 4, means = mean_list)
choose_k <- ChooseK(data, k0 = 2, k1 = 6, boot = 10)
choose_k$Choices

ClustQual: Cluster Quality

Description
Evaluates cluster quality. Returns the following metrics:
- BIC: Bayesian Information Criterion, lower value indicates better clustering quality.
- CHI: Calinski-Harabaz Index, higher value indicates better clustering quality.
- DBI: Davies-Bouldin Index, lower value indicates better clustering quality.
- SIL: Silhouette Width, higher value indicates better clustering quality.

Usage
ClustQual(fit)

Arguments
fit: Object of class mix.

Value
List containing the cluster quality metrics.

See Also
See ChooseK for using quality metrics to choose the cluster number.

Examples
set.seed(100)
# Data generation
mean_list <- list(c(2, 2, 2), c(-2, 2, 2), c(2, -2, 2), c(2, 2, -2))
data <- rGMM(n = 500, d = 3, k = 4, means = mean_list)
fit <- FitGMM(data, k = 4)
# Clustering quality
cluster_qual <- ClustQual(fit)

CombineMIs: Combine Multiple Imputations

Description
Combines point estimates and standard errors across multiple imputations.

Usage
CombineMIs(points, covs)

Arguments
points: List of point estimates, potentially vector valued.
covs: List of sampling covariances, potentially matrix valued.

Value
List containing the final point estimate (`point`) and sampling covariance (`cov`).

Examples
set.seed(100)
# Generate data and introduce missingness.
data <- rGMM(n = 25, d = 2, k = 1)
data[1, 1] <- NA
data[2, 2] <- NA
data[3, ] <- NA
# Fit GMM.
fit <- FitGMM(data)
# Lists to store summary statistics.
points <- list()
covs <- list()
# Perform 50 multiple imputations.
# For each, calculate the marginal mean and its sampling variance.
for (i in seq_len(50)) {
  imputed <- GenImputation(fit)
  points[[i]] <- apply(imputed, 2, mean)
  covs[[i]] <- cov(imputed) / nrow(imputed)
}
# Combine summary statistics across imputations.
results <- CombineMIs(points, covs)

DavBou: Davies-Bouldin Index

Description
Calculates the Davies-Bouldin index.

Usage
DavBou(data, assign, means)

Arguments
data: Observations.
assign: Assignments.
means: List of cluster means.

Value
Scalar index.
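A sketch of ours showing DavBou applied to a fitted model rather than to ground truth; it assumes the documented mix-class slots (Data, Assignments, Means; see the mix-class entry below) are accessible with R's @ operator, as usual for S4 classes.

set.seed(102)
data <- rGMM(n = 300, d = 2, k = 3, means = list(c(0, 3), c(3, -3), c(-3, -3)))
fit <- FitGMM(data, k = 3)
DavBou(fit@Data, assign = fit@Assignments, means = fit@Means)  # lower is better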
FitGMM: Estimate Multivariate Normal Mixture

Description
Given an n x d matrix of random vectors, estimates the parameters of a Gaussian Mixture Model (GMM). Accommodates arbitrary patterns of missingness at random (MAR) in the input vectors.

Usage
FitGMM(data, k = 1, init_means = NULL, fix_means = FALSE, init_covs = NULL, init_props = NULL, maxit = 100, eps = 1e-06, report = TRUE)

Arguments
data: Numeric data matrix.
k: Number of mixture components. Defaults to 1.
init_means: Optional list of initial mean vectors.
fix_means: Fix the means to their starting value? Must provide initial values.
init_covs: Optional list of initial covariance matrices.
init_props: Optional vector of initial cluster proportions.
maxit: Maximum number of EM iterations.
eps: Minimum acceptable increment in the EM objective.
report: Report fitting progress?

Details
Initial values for the cluster means, covariances, and proportions are specified using M0, S0, and pi0, respectively. If the data contain complete observations, i.e. observations with no missing elements, then fit.GMM will attempt to initialize these parameters internally using K-means. If the data contain no complete observations, then initial values are required for M0, S0, and pi0.

Value
- For a single component, an object of class mvn, containing the estimated mean and covariance, the final objective function, and the imputed data.
- For a multicomponent model (k > 1), an object of class mix, containing the estimated means, covariances, cluster proportions, cluster responsibilities, and observation assignments.

See Also
See rGMM for data generation, and ChooseK for selecting the number of clusters.

Examples
# Single component without missingness
# Bivariate normal observations
sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = sigma)
fit <- FitGMM(data, k = 1)

# Two components without missingness
# Trivariate normal observations
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
sigma <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = sigma)
fit <- FitGMM(data, k = 2)

# Four components with missingness
# Bivariate normal observations
# Note: Fitting is slow.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
sigma <- 0.5 * diag(2)
data <- rGMM(n = 1000, d = 2, k = 4, pi = c(0.35, 0.15, 0.15, 0.35), miss = 0.1, means = mean_list, covs = sigma)
fit <- FitGMM(data, k = 4)

FitMix: Fit Multivariate Mixture Distribution

Description
Given a matrix of random vectors, estimates the parameters for a mixture of multivariate normal distributions. Accommodates arbitrary patterns of missingness, provided the elements are missing at random (MAR).

Usage
FitMix(data, k = 2, init_means = NULL, fix_means = FALSE, init_covs = NULL, init_props = NULL, maxit = 100, eps = 1e-06, report = FALSE)

Arguments
data: Numeric data matrix.
k: Number of mixture components. Defaults to 2.
init_means: Optional list of initial mean vectors.
fix_means: Fix means to their starting values? Must initialize.
init_covs: Optional list of initial covariance matrices.
init_props: Optional vector of initial cluster proportions.
maxit: Maximum number of EM iterations.
eps: Minimum acceptable increment in the EM objective.
report: Report fitting progress?

Value
Object of class mix.

FitMVN: Fit Multivariate Normal Distribution

Description
Given a matrix of n x d-dimensional random vectors, possibly containing missing elements, estimates the mean and covariance of the best fitting multivariate normal distribution.

Usage
FitMVN(data, init_mean = NULL, fix_mean = FALSE, init_cov = NULL, maxit = 100, eps = 1e-06, report = TRUE)

Arguments
data: Numeric data matrix.
init_mean: Optional initial mean vector.
fix_mean: Fix the mean to its starting value? Must initialize.
init_cov: Optional initial covariance matrix.
maxit: Maximum number of EM iterations.
eps: Minimum acceptable increment in the EM objective.
report: Report fitting progress?

Value
An object of class mvn.

GenImputation: Generate Imputation

Description
Generates a stochastic imputation of a data set from a fitted model.

Usage
GenImputation(fit)

Arguments
fit: Fitted model.

Value
Numeric matrix with missing values imputed.

Examples
set.seed(100)
# Generate data and introduce missingness.
data <- rGMM(n = 25, d = 2, k = 1)
data[1, 1] <- NA
data[2, 2] <- NA
data[3, ] <- NA
# Fit GMM.
fit <- FitGMM(data)
# Generate imputation.
imputed <- GenImputation(fit)

logLik.mix: Log Likelihood for Fitted GMM

Description
Log likelihood for a fitted GMM.

Usage
## S3 method for class 'mix'
logLik(object, ...)

Arguments
object: A mix object.
...: Unused.

logLik.mvn: Log Likelihood for Fitted MVN Model

Description
Log likelihood for a fitted MVN model.

Usage
## S3 method for class 'mvn'
logLik(object, ...)

Arguments
object: A mvn object.
...: Unused.

mean.mix: Mean for Fitted GMM

Description
Mean for a fitted GMM.

Usage
## S3 method for class 'mix'
mean(x, ...)

Arguments
x: A mix object.
...: Unused.

mean.mvn: Mean for Fitted MVN Model

Description
Mean for a fitted MVN model.

Usage
## S3 method for class 'mvn'
mean(x, ...)

Arguments
x: A mvn object.
...: Unused.
mix-class: Mixture Model Class

Description
Defines a class to hold Gaussian Mixture Models.

Slots
Assignments: Maximum a posteriori assignments.
Completed: Completed data, with missing values imputed to their posterior expectations.
Components: Number of components.
Covariances: List of fitted cluster covariance matrices.
Data: Original data, with missing values present.
Density: Density of each component at each example.
Means: List of fitted cluster means.
Objective: Final value of the EM objective.
Proportions: Fitted cluster proportions.
Responsibilities: Posterior membership probabilities for each example.

MixUpdateMeans: Mean Update for Mixture of MVNs with Missingness

Description
Mean update for a mixture of MVNs with missingness.

Usage
MixUpdateMeans(split_data, means, covs, gamma)

Arguments
split_data: Data partitioned by missingness.
means: List of component means.
covs: List of component covariances.
gamma: List of component responsibilities.

Value
List containing the updated component means.

mvn-class: Multivariate Normal Model Class

Description
Defines a class to hold multivariate normal models.

Slots
Completed: Completed data, with missing values imputed to their posterior expectations.
Covariance: Fitted covariance matrix.
Data: Original data, with missing values present.
Mean: Fitted mean vector.
Objective: Final value of the EM objective.

PartitionData: Partition Data by Missingness Pattern

Description
Returns a list with the input data split into separate matrices for complete cases, incomplete cases, and empty cases.

Usage
PartitionData(data)

Arguments
data: Data.frame.

Value
List containing:
- The original row and column names: `orig_row_names`, `orig_col_names`.
- The original row and column numbers: `n_row` and `n_col`.
- The complete cases `data_comp`.
- The incomplete cases `data_incomp`.
- The empty cases `data_empty`.
- Counts of complete `n0`, incomplete `n1`, and empty `n2` cases.
- Initial order of the observations `init_order`.

print.mix: Print for Fitted GMM

Description
Print method for objects of class mix.

Usage
## S3 method for class 'mix'
print(x, ...)

Arguments
x: A mix object.
...: Unused.

print.mvn: Print for Fitted MVN Model

Description
Print method for objects of class mvn.

Usage
## S3 method for class 'mvn'
print(x, ...)

Arguments
x: A mvn object.
...: Unused.

ReconstituteData: Reconstitute Data

Description
Reassembles a data matrix split by missingness pattern.

Usage
ReconstituteData(split_data)

Arguments
split_data: Split data, as returned by PartitionData.

Value
Numeric matrix.
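A round-trip sketch of ours for PartitionData and ReconstituteData; we assume the `init_order` field recorded by PartitionData lets ReconstituteData restore the original row layout, and we clear the row names first because data.frame rejects duplicates.

set.seed(103)
data <- rGMM(n = 10, d = 3, k = 1)
rownames(data) <- NULL        # rGMM labels rows by cluster; data.frame needs unique names
data[2, 1] <- NA              # an incomplete case
data[5, ]  <- NA              # an empty case
split_data <- PartitionData(data.frame(data))
restored <- ReconstituteData(split_data)
dim(restored)                 # same dimensions as the input matrix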
rGMM: Generate Data from Gaussian Mixture Models

Description
Generates an n x d matrix of multivariate normal random vectors with observations (examples) as rows. If k = 1, all observations belong to the same cluster. If k > 1, the observations are generated via a two-step procedure. First, the cluster membership is drawn from a multinomial distribution, with mixture proportions specified by pi. Conditional on cluster membership, the observation is drawn from a multivariate normal distribution, with cluster-specific mean and covariance. The cluster means are provided using means, and the cluster covariance matrices are provided using covs. If miss > 0, missingness is introduced, completely at random, by setting that proportion of elements in the data matrix to NA.

Usage
rGMM(n, d = 2, k = 1, pi = NULL, miss = 0, means = NULL, covs = NULL)

Arguments
n: Observations (rows).
d: Observation dimension (columns).
k: Number of mixture components. Defaults to 1.
pi: Mixture proportions. If omitted, components are assumed equiprobable.
miss: Proportion of elements missing, miss in [0, 1).
means: Either a prototype mean vector, or a list of mean vectors. Defaults to the zero vector.
covs: Either a prototype covariance matrix, or a list of covariance matrices. Defaults to the identity matrix.

Value
Numeric matrix with observations as rows. Row numbers specify the true cluster assignments.

See Also
For estimation, see FitGMM.

Examples
set.seed(100)
# Single component without missingness.
# Bivariate normal observations.
cov <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = cov)

# Two components without missingness.
# Trivariate normal observations.
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
cov <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = cov)

# Four components with missingness.
# Bivariate normal observations.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
cov <- 0.5 * diag(2)
data <- rGMM(n = 1000, d = 2, k = 4, pi = c(0.35, 0.15, 0.15, 0.35), miss = 0.1, means = mean_list, covs = cov)

show,mix-method: Show for Fitted Mixture Models

Description
Show method for fitted mixture models.

Usage
## S4 method for signature 'mix'
show(object)

Arguments
object: A mix object.

show,mvn-method: Show for Multivariate Normal Models

Description
Show method for multivariate normal models.

Usage
## S4 method for signature 'mvn'
show(object)

Arguments
object: A mvn object.

vcov.mix: Covariance for Fitted GMM

Description
Covariance for a fitted GMM.

Usage
## S3 method for class 'mix'
vcov(object, ...)

Arguments
object: A mix object.
...: Unused.

vcov.mvn: Covariance for Fitted MVN Model

Description
Covariance for a fitted MVN model.

Usage
## S3 method for class 'mvn'
vcov(object, ...)

Arguments
object: A mvn object.
...: Unused.
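A short sketch of ours chaining the documented S3 accessors on a fitted mvn object; everything used here appears in the entries above.

set.seed(104)
data <- rGMM(n = 100, d = 2, k = 1)
data[1, 1] <- NA
fit <- FitMVN(data)
mean(fit)     # fitted mean vector
vcov(fit)     # fitted covariance matrix
logLik(fit)   # log likelihood of the fitted model
print(fit)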
Style-conscious quadratic field classifier

Sriharsha Veeramachaneni, Hiromichi Fujisawa, Cheng-Lin Liu and George Nagy
Rensselaer Polytechnic Institute, Troy, NY, USA
Hitachi Central Research Laboratory, Tokyo, Japan
E-mail: veeras@

Abstract[1]

When patterns occur in the form of groups generated by the same source, distinctions between sources can be exploited to improve accuracy. We present a method for exploiting such 'style' consistency using a quadratic discriminant. We show that under reasonable assumptions on the feature and class distributions, the estimation of style parameters is simple and accurate.

1. Introduction

It is well known that an optical character recognition system which is tailored to a certain font (machine print) or personalized to a particular writer (handwriting) has higher accuracy on input drawn from that particular font or writer than one which is trained on multiple fonts or multiple writers. Confusions between classes across fonts or writers cause this increase in error rate. If, during classification, the identity of the font or the writer is known, a mixed font/writer classifier can perform font/writer-specific classification, leading to higher accuracy. However, this information is often unknown and difficult to determine at operation time.

Nevertheless, if the input patterns appear in the form of isogenous fields, that is, groups of multiple patterns (henceforth called singlets) belonging to the same font or generated by the same writer, classification accuracy can be improved. This dependence or correlation between features of co-occurring patterns is called style context. Style context can exist independently of linguistic context (dependence between class labels of co-occurring patterns) and hence can be exploited concurrently. Methods to utilize linguistic context to improve word recognition accuracy have been well studied [12][9]. Many multi-source recognition methods attempt to improve accuracy by extracting features that are invariant to source-specific peculiarities such as size, slant, skew, etc. [13].

Bazzi et al. observe that, due to the assumption that successive observations are statistically independent, HMM methods have a higher error rate for infrequent styles [1].

[1] Submitted to ICPR'02.
They propose modifying the mixture of styles in the training data to overcome this effect.

Cha and Srihari propose a dichotomizer that classifies a pair of documents into the categories 'same author' or 'different author' [2]. This is irrespective of the classes that appear on the documents, which is the converse of our problem, where it is known that co-occurring patterns are from the same writer and the problem is to identify the class irrespective of the writer.

Nagy suggests that even in the absence of any linguistic context, spatial context, i.e., the stationary nature of typeset, typeface and shape deformations over a long sequence of symbols, can be used to improve classification accuracy [9]. Sarkar models styles by multi-modal distributions with two layers of hidden mixtures corresponding to styles and variants [11]. The mixing parameters are estimated using the EM algorithm. Although under the assumptions made this approach is optimal, the complex EM estimation stage is prone to small-sample errors. Under essentially the same assumptions as ours, Kawatani attempts to utilize style consistency by identifying 'unnaturalness' in input patterns based on other patterns by the same writer [5]. This is done by finding the distance to other patterns classified into the same class, or correlations with patterns classified into different classes. Our method combines the above heuristic criteria into one classification decision using all the information available. Koshinaka et al. model style context as the dependence between subcategory labels of adjacent symbols [7].

In many OCR applications quadratic discriminant methods provide a robust yet versatile means for classification. In order to reduce the effect of small-sample estimation of covariance matrices on accuracy, various methods to smooth the estimates, such as the Modified Quadratic Discriminant Function [6] and the Rectified Discriminant Function [10], have been proposed.

2. Mathematical formulation

Consider the problem of classifying an isogenous field x = (x_1^T, ..., x_p^T)^T (each x_t represents the feature measurements for one of the p patterns in the field) generated by one of S sources (writers, fonts, etc.). Let C be the set of singlet-class labels, and let c_t represent the class identity of the t-th pattern of the field. We make the following assumptions on the class and feature distributions.

1. P(c_1, ..., c_p | s) = P(c_1, ..., c_p). That is, any linguistic context is source independent.
2. p(x_1, ..., x_p | s, c_1, ..., c_p) = prod_t p(x_t | s, c_t). The features of each pattern in the field are class-conditionally independent of the features of every other pattern in the same field.

Under the above assumptions, for any function g we have, for t = 1, ..., p,

E[g(x_t) | c_1, ..., c_p] = sum_s P(s) E[g(x_t) | s, c_t].   (1)

This result will be useful while deriving the formulae for the field-class-conditional means and covariance matrices.

2.1. Expressions for means and covariances

Under the assumption of field-class-conditionally normally distributed features we can use a quadratic discriminant function for field classification. We will now derive the formulae for the field-class-conditional means and covariance matrices. For now we set p = 2.

Let (i, j) be a field-class label. The mean vector for the field class is given by

mu_ij = E[(x_1^T, x_2^T)^T | i, j] = (mu_i^T, mu_j^T)^T.   (2)

Thus the field-class-conditional mean vector can be constructed by concatenating the component singlet-class-conditional mean vectors.

Let us compute the field-class-conditional covariance matrix for the class (i, j), which we will denote by Sigma_ij:

Sigma_ij = [[Sigma_ii, T_ij], [T_ij^T, Sigma_jj]],   (3)

where Sigma_ii and Sigma_jj are the class-conditional singlet covariance matrices of classes i and j, and T_ij is the cross covariance between the two patterns. Thus Sigma_ij can be written as a block matrix where the diagonal blocks are just the class-conditional singlet covariance matrices. Also, note that the above derivations can be generalized to longer fields to yield field-class-conditional means and covariance matrices that can be constructed from the singlet-class-conditional means and from the Sigma blocks and the T blocks (since T_ji = T_ij^T). For example, the field-feature mean and covariance matrix for class (i, j, k) are given by

mu_ijk = (mu_i^T, mu_j^T, mu_k^T)^T;
Sigma_ijk = [[Sigma_ii, T_ij, T_ik], [T_ij^T, Sigma_jj, T_jk], [T_ik^T, T_jk^T, Sigma_kk]].

Hence the computation of field-class-conditional covariance matrices for long fields is no more complicated than for p = 2.

2.2. Estimation of the covariance matrices

As mentioned previously, the matrices Sigma_ii are the class-conditional singlet covariance matrices. We have, from Equation 1,

Sigma_ii = sum_s P(s) E[x x^T | s, i] - mu_i mu_i^T.   (4)

Hence the singlet covariance matrices can be computed from the weighted sum of the source-conditional 'power' matrices and the singlet class means. In general, the accurate estimation of the off-diagonal matrices T_ij requires a large number of field samples for each field class. We will show that under the assumptions made at the beginning of Section 2 the computation of the cross covariance matrices is simplified. We have

E[x_1 x_2^T | i, j] = sum_s P(s) mu_i^(s) (mu_j^(s))^T.   (5)

Thus the cross covariance matrices can be computed from the source-specific singlet-class means mu_i^(s) = E[x | s, i] as follows:

T_ij = sum_s P(s) mu_i^(s) (mu_j^(s))^T - mu_i mu_j^T.   (6)

Hence the estimates of the cross covariance matrices are believed to be accurate.

2.3. Classification

Having obtained the estimates for the field-class-specific means and covariance matrices, we can classify the input fields using a straightforward quadratic classifier. If all the field classes are a priori equally likely, the quadratic discriminant function (QDF) for field class (i, j) is given by

g_ij(x) = (x - mu_ij)^T Sigma_ij^{-1} (x - mu_ij) + ln |Sigma_ij|.   (7)

The field feature vector is assigned the class label which yields the minimum discriminant value.

We will now show that when the off-diagonal blocks in the field-class covariance matrices are zero matrices, i.e., T_ij = 0, the quadratic field classifier is equivalent to the quadratic singlet classifier. Such a scenario results from the presence of only one source, or when the source-specific class means are the same for all sources. In that case

g_ij(x) = g_i(x_1) + g_j(x_2),   (8)

where g_i and g_j are the singlet QDFs. This implies that the field class that minimizes g_ij is the one formed by the singlet classes that minimize g_i and g_j respectively. Hence, any style information used to improve singlet classification is derived from the variation of the class means between sources. This is intuitively appealing, since within-source variation does not contribute any style information.

We note that the quadratic field classifier is computationally expensive, due to the exponential increase in the number of field classes with the field length, and also because the discriminant computation involves larger matrices and vectors.
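To make Sections 2.1 to 2.3 concrete, here is a small numeric sketch in R; the notation, the toy numbers and the two-source/two-class setup are ours, not the authors'. It assembles the field mean by concatenation (Equation 2), builds the field covariance from the singlet blocks and the cross-covariance of Equation 6, and evaluates the QDF of Equation 7 for every field class.

set.seed(1)
p_s <- c(0.5, 0.5)                          # source priors P(s)
m <- list(list(c(0, 0), c(3, 3)),           # source 1: class means
          list(c(1, 0), c(4, 2)))           # source 2: class means
sig <- 0.5 * diag(2)                        # within-source covariance (shared here)

# Singlet class means and covariances (Equations 1 and 4).
m_bar <- lapply(1:2, function(i) p_s[1] * m[[1]][[i]] + p_s[2] * m[[2]][[i]])
S <- lapply(1:2, function(i)
  sig + Reduce(`+`, lapply(1:2, function(s)
    p_s[s] * tcrossprod(m[[s]][[i]] - m_bar[[i]])))) # within + between-source scatter

# Cross-covariance blocks (Equation 6).
Tblk <- function(i, j)
  Reduce(`+`, lapply(1:2, function(s)
    p_s[s] * tcrossprod(m[[s]][[i]], m[[s]][[j]]))) -
  tcrossprod(m_bar[[i]], m_bar[[j]])

# Field mean, field covariance, and QDF score (Equations 2, 3, 7).
field_qdf <- function(y, i, j) {
  mu  <- c(m_bar[[i]], m_bar[[j]])
  Sig <- rbind(cbind(S[[i]], Tblk(i, j)), cbind(t(Tblk(i, j)), S[[j]]))
  r   <- y - mu
  drop(t(r) %*% solve(Sig, r)) + as.numeric(determinant(Sig)$modulus)
}

# Classify a field drawn from source 2 with true classes (1, 2).
y <- c(m[[2]][[1]], m[[2]][[2]]) + rnorm(4, sd = 0.3)
scores <- outer(1:2, 1:2, Vectorize(function(i, j) field_qdf(y, i, j)))
which(scores == min(scores), arr.ind = TRUE)  # the minimizing (i, j) is the decision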
2.4. Covariance matrix smoothing

In the absence of sufficient training samples, the estimation error of the covariance matrices degrades the accuracy of the QDF classifier. In order to alleviate the adverse effect of small-sample estimation on accuracy, Friedman proposed the Regularized Discriminant Function (RDF) [3]. In RDF, the covariance matrix of a class is an interpolation between the estimated covariance matrix and the identity matrix,

Sigma_i(gamma) = (1 - gamma) Sigma_i + gamma sigma^2 I,   (9)

where gamma is the smoothing parameter and sigma^2 a scale factor. For the field-class covariance matrices, RDF was generalized as follows. The singlet-class covariance matrices (the diagonal blocks in the field covariance matrices) were modified according to Equation 9, and the off-diagonal blocks were modified correspondingly.
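A sketch of the RDF-style smoothing of Section 2.4, in R. The exact interpolation constants are garbled in our copy, so two assumptions are made and marked below: the scale sigma^2 is taken as the average variance tr(Sigma)/d, and the off-diagonal blocks are shrunk by the same factor (1 - gamma).

rdf_smooth <- function(Sig, gamma = 0.2) {
  d <- nrow(Sig)
  sigma2 <- sum(diag(Sig)) / d                    # assumed scale: average variance
  (1 - gamma) * Sig + gamma * sigma2 * diag(d)    # Equation 9
}
smooth_field_cov <- function(Sig_field, d, gamma = 0.2) {
  a <- 1:d; b <- (d + 1):(2 * d)
  Sig_field[a, a] <- rdf_smooth(Sig_field[a, a], gamma)
  Sig_field[b, b] <- rdf_smooth(Sig_field[b, b], gamma)
  Sig_field[a, b] <- (1 - gamma) * Sig_field[a, b]  # assumed off-diagonal rule
  Sig_field[b, a] <- t(Sig_field[a, b])
  Sig_field
}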
3. Experiments

We used the databases SD3 and SD7, which are contained in the NIST Special Database SD19 [4]. The database contains handwritten numeral samples labeled by writer and class. SD3 was the training data released for the First Census OCR Systems Conference and SD7 was used as the test data. We constructed four datasets, two from each of SD3 and SD7, as shown in Table 1. Since we compute the field-class-conditional covariance matrices from source-specific class-conditional matrices, we require that each writer have at least two samples for each singlet class. We therefore deleted all writers not satisfying this criterion from the training sets. Some test fields are shown in Figure 1 along with their true class labels and the classification results.

Table 1. Handwritten numeral datasets
Dataset   | Writers         | Number of samples
SD3-Train | 0-399 (395)     | 42698
SD7-Train | 2100-2199 (99)  | 11495
SD3-Test  | 400-799 (399)   | 42821
SD7-Test  | 2200-2299 (100) | 11660

We extracted 100 blurred directional (chaincode) features from each sample [8]. We then computed the principal components of the SD3-Train + SD7-Train data, onto which the features of all samples were projected to obtain 100 principal-component features for each sample. The samples of each writer in the test sets were randomly permuted to simulate fields.

Figure 1. Some test fields with recognition results. For the three fields shown, the true labels are (8,8), (2,3) and (1,3); singlet classification yields (2,8), (2,5) and (6,3), while field classification recovers (8,8), (2,3) and (1,3).

The accuracy of the QDF and RDF classifiers was tested on various data sets for all 100 and for the top 50 principal-component (PCA) features. The smoothing parameter for RDF was fixed for all experiments. The results are presented in Table 2.

Table 2. Number of character errors for various experiments (training set = SD3-Train + SD7-Train); for each classifier, the first column is the singlet classifier and the second the field classifier.
Test set              | Features | QDF singlet | QDF field | RDF singlet | RDF field
SD3-Test (42821)      | Top 50   | 543         | 528       | 440         | 427
SD7-Test (11660)      | Top 50   | 456         | 448       | 398         | 388
SD3-Test + SD7-Test   | Top 50   | 999         | 976       | 838         | 815
SD3-Test (42821)      | All 100  | 746         | 712       | 396         | 372
SD7-Test (11660)      | All 100  | 551         | 534       | 369         | 352
SD3-Test + SD7-Test   | All 100  | 1297        | 1246      | 765         | 724

The field classifier consistently outperforms the singlet classifier. The results also indicate that the regularization of the covariance matrices improves accuracy.

4. Discussion and future work

We have presented a model of style context in co-occurring patterns using second-order correlations. We introduced a methodology to exploit such correlations to improve classification accuracy. We have demonstrated the efficacy of the scheme on handwritten data. We have also shown that our classifier is easy to train and is a natural extension of the widely used quadratic singlet classifier. As mentioned earlier, although the extension of the method to longer fields is theoretically possible, computational and storage constraints restrict its application. We are currently studying schemes to extend the pair classifier to longer fields without the need to store or manipulate large field-covariance matrices.

References
[1] I. Bazzi, R. Schwartz, and J. Makhoul. An omnifont open-vocabulary OCR system for English and Arabic. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-21(6):495-504, June 1999.
[2] S. Cha and S. Srihari. Writer identification: Statistical analysis and dichotomizer. In Proceedings of the International Workshop on Structural and Syntactical Pattern Recognition, pages 123-132, 2000.
[3] H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):166-175, 1989.
[4] P. Grother. Handprinted forms and character database, NIST Special Database 19, March 1995. Technical report and CD-ROM.
[5] T. Kawatani. Character recognition performance improvement using personal handwriting characteristics. In Proceedings of the Third International Conference on Document Analysis and Recognition, volume 1, pages 98-103, 1995.
[6] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake. Modified quadratic discriminant functions and the application to Chinese character recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(1):149-153, January 1987.
[7] T. Koshinaka, D. Nishikawa, and K. Yamada. A stochastic model for handwritten word recognition using context dependency between character patterns. In Proceedings of the Sixth International Conference on Document Analysis and Recognition, pages 154-158, 2001.
[8] C. Liu, H. Sako, and H. Fujisawa. Performance evaluation of pattern classifiers for handwritten character recognition. International Journal on Document Analysis and Recognition, in press, 2002.
[9] G. Nagy. Teaching a computer to read. In Proceedings of the Eleventh International Conference on Pattern Recognition, volume 2, pages 225-229, The Hague, 1992.
[10] M. Sakai, M. Yoneda, and H. Hase. A new robust quadratic discriminant function. In Proceedings of the Fourteenth International Conference on Pattern Recognition, volume 1, pages 99-102, 1998.
[11] P. Sarkar and G. Nagy. Style consistency in isogenous patterns. In Proceedings of the Sixth International Conference on Document Analysis and Recognition, pages 1169-1174, 2001.
[12] R. Shingal and G. Toussaint. Experiments in text recognition with the modified Viterbi algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2):164-172, April 1979.
[13] M. Shridhar and F. Kimura. Character recognition performance improvement using personal handwriting characteristics. In IEEE International Conference on Systems, Man and Cybernetics, volume 3, pages 2341-2346, 1995.
A Metric for Covariance Matrices

Wolfgang Förstner and Boudewijn Moonen

"These theorems lead us to consider the theory of curved surfaces from a new point of view, where a wide, still entirely uncultivated field opens itself to investigation ... one thus understands that two essentially different kinds of relations are to be distinguished: on the one hand those that presuppose a particular form of the surface in space, on the other hand those that are independent of the various forms the surface may take. It is the latter that are spoken of here ... one easily sees, however, that to this class belongs the consideration of figures constructed on the surface, ..., the joining of points by shortest lines∗], and the like. All such investigations must start from the premise that the nature of the curved surface in itself is given by the expression of an indeterminate line element in the form √(E dp² + 2F dp dq + G dq²) ..."

Carl Friedrich Gauss

∗] emphasized by the authors

Abstract

The paper presents a metric for positive definite covariance matrices. It is a natural expression involving traces and joint eigenvalues of the matrices. It is shown to be the distance coming from a canonical invariant Riemannian metric on the space Sym+(n,R) of real symmetric positive definite matrices.

In contrast to known measures, collected e.g. in Grafarend 1972, the metric is invariant under affine transformations and inversion. It can be used for evaluating covariance matrices or for the optimization of measurement designs.

Keywords: Covariance matrices, metric, Lie groups, Riemannian manifolds, exponential mapping, symmetric spaces

1 Background

The optimization of geodetic networks is a classical problem that gained large attention in the 70s. In 1972 E. W. Grafarend put together the current knowledge of network design, datum transformations and artificial covariance matrices using covariance functions in his classical monograph [?]; see also [?]. One critical part was the development of a suitable measure for comparing two covariance matrices. Grafarend listed a dozen measures. Assuming a completely isotropic network, represented by a unit matrix as covariance matrix, the measures depended on the eigenvalues of the covariance matrix. In 1983, 11 years later, at the Aalborg workshop on 'Survey Control Networks', Schmidt [?] used these measures for finding optimal networks. The visualization of the error ellipses for a single point leading to the same deviation from an ideal covariance structure revealed deficiencies of these measures: e.g., the trace of the eigenvalues of the covariance matrix as quality measure would allow a totally flat error ellipse to be as good as a circular ellipse, more even, as good as the flat error ellipse rotated by 90°.

Based on an information theoretic point of view, where the information of a Gaussian variable increases with ln σ², the first author guessed the squared sum d² = Σ_i ln² λ_i of the logarithms of the eigenvalues to be a better measure, as deviations in both directions would be punished by the same amount if measured in percent, i.e. relative to the given variances. He formulated the conjecture that the distance measure d would be a metric. Only in case d were a metric would comparing two covariance matrices A and B with a covariance matrix C allow one to state that one of the two is better than the other. Extensive simulations by K. Ballein [?] substantiated this conjecture, as no case was found where the triangle inequality was violated.
In 1995, 12 years later, taking up the problem within image processing, the first author proved the validity of the conjecture for 2×2 matrices [?]. For this case the measure had already been proposed by V. V. Kavrajski [?] for evaluating the isotropy of map projections. However, the proof could not be generalized to higher dimensions. Using classical results from linear algebra and differential geometry, the second author proved the distance d to be a metric for general positive definite symmetric matrices. An extended proof can be found in [?]. This paper states the problem and presents the two proofs, for 2×2 matrices and for the general case. Giving two proofs for n = 2 may be justified by the two very different approaches to the problem.

2 Motivation

Comparing covariance matrices is a basic task in mensuration design. The idea, going back to Baarda 1973 [?], is to compare the variances of arbitrary functions f = eᵀx, on one hand determined with a given covariance matrix C and on the other hand determined with a reference or criterion matrix H. One requirement would be that the variance σ²_f(C) of f when calculated with C is always smaller than the variance σ²_f(H) of f when calculated with H. This means

eᵀCe ≤ eᵀHe for all e ≠ 0,

or, for the Rayleigh ratio,

0 ≤ λ(e) = (eᵀCe)/(eᵀHe) ≤ 1 for all e ≠ 0.

The maximum λ, obtained from (1/2) ∂λ(e)/∂e = 0 ↔ λHe − Ce = (λH − C)e = 0, is the maximum eigenvalue λ_max(CH⁻¹) of the generalized eigenvalue problem

|λH − C| = 0.   (1)

Observe that λeᵀHe − eᵀCe = eᵀ(λH − C)e = 0 for e ≠ 0 is only fulfilled if (1) holds. The eigenvalues of (1) are non-negative if the two matrices are positive semidefinite. This suggests that the eigenvalues of CH⁻¹ capture the difference in form of C and H completely. The requirement λ_max ≤ 1 can be visualized by stating that the (error) ellipsoid xᵀC⁻¹x = c lies completely within the (error) ellipsoid xᵀH⁻¹x = c.

The statistical interpretation of the ellipses results from the assumption, motivated by the principle of maximum entropy, that the stochastic variables are normally distributed, thus having density

p(x) = (2π)^{−n/2} (det Σ)^{−1/2} exp(−(1/2) xᵀΣ⁻¹x)

with covariance matrix Σ. Isolines of constant density are ellipses with semiaxes proportional to the square roots of the eigenvalues of Σ. The ratio

λ_max = max_e σ²_f(C) / σ²_f(H)

thus gives the worst case for the ratio of the variances when calculated with the covariances C and H respectively.
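This worst-case ratio is easy to compute numerically; the following R sketch (our code, in our own notation) uses the fact that the roots of det(λH − C) = 0 are the ordinary eigenvalues of H⁻¹C.

joint_eigen <- function(C, H)
  Re(eigen(solve(H, C), only.values = TRUE)$values)  # roots of det(lambda*H - C) = 0

C <- matrix(c(2.0, 0.5, 0.5, 1.0), nrow = 2)
H <- diag(2)
max(joint_eigen(C, H))   # worst-case variance ratio over all directions e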
Instead of requiring the worst precision to be better than a specification, one could also require the covariance matrix C to be closest to H in some sense. Let us for a moment assume H = I. Simple examples for measuring the difference in form of C compared to I are the trace

tr C = Σ_{i=1}^n λ_i(C)   (2)

or the determinant

det C = Π_{i=1}^n λ_i(C).   (3)

These classical measures are invariant with respect to rotations or affine transformations of the coordinate system. Visualizing covariance matrices of equal trace or determinant can use the eigenvalues. Restricting to n = 2, in a 2D coordinate system (λ₁, λ₂), covariance matrices of equal trace tr(C) = c_tr are characterized by the straight line λ₁ = c_tr − λ₂, or σ₁² = c_tr − σ₂². Covariance matrices of equal determinant det(C) = c_det are determined by the hyperbola λ₁ = c_det/λ₂, or σ₁² = c_det/σ₂². Obviously, in both cases covariance matrices with a very flat form of the corresponding error ellipse eᵀCe = c are allowed. E.g., if one requires c_tr = 2, then the pair (0.04, 1.96), with a ratio of semiaxes √(1.96/0.04) = 7, is evaluated as being similar to the unit circle. The determinant measure is even more unfavourable: when requiring c_det = 1, even a pair (0.02, 50.0), with a ratio of semiaxes 50, is called similar to the unit circle.

However, it would be desirable that the similarity between two covariance matrices reflects the deviation in variance in both directions according to the ratio of the variances. Thus deviations in variance by a factor f should be evaluated equally as deviations by a factor 1/f, a factor f = 1 of course indicating no difference. Thus other measures capturing the anisotropy, such as (1 − λ₁)² + (1 − λ₂)², not being invariant to inversion, cannot be used.

The conditions can be fulfilled by using the sum of the squared logarithms of the eigenvalues. Thus we propose the distance measure

d(A, B) = √( Σ_{i=1}^n ln² λ_i(A, B) )   (4)

between symmetric positive definite matrices A and B, with the eigenvalues λ_i(A, B) from |λA − B| = 0. The logarithm guarantees that deviations are measured as factors, whereas the squaring guarantees that factors f and 1/f are evaluated equally. Summing squares is done in close resemblance to the Euclidean metric. This note wants to discuss the properties of d:

• d is invariant with respect to affine transformations of the coordinate system.
• d is invariant with respect to an inversion of the matrices.
• It is claimed that d is a metric. Thus:
(i) positivity: d(A,B) ≥ 0, and d(A,B) = 0 only if A = B;
(ii) symmetry: d(A,B) = d(B,A);
(iii) triangle inequality: d(A,B) + d(A,C) ≥ d(B,C).

The proof for n = 2 is given in the next section. The proof for general n is sketched in the subsequent Sections 4 to 6.

3 Invariance Properties

3.1 Affine Transformations

Assume the n×n matrix X to be regular. Then the distance d(A′, B′) of the transformed matrices

A′ = XAXᵀ,  B′ = XBXᵀ

is invariant w.r.t. X.

Proof: We immediately obtain

λ(A′, B′) = λ(XAXᵀ, XBXᵀ) = λ(XAXᵀ(XBXᵀ)⁻¹) = λ(XAXᵀ(Xᵀ)⁻¹B⁻¹X⁻¹) = λ(XAB⁻¹X⁻¹) = λ(AB⁻¹) = λ(A, B).

Comment: A′ and B′ can be interpreted as covariance matrices of y = Xx in case A and B are the covariance matrices of x. Changing the coordinate system does not change the evaluation of covariance matrices. Obviously this invariance relies only on the properties of the eigenvalues, and actually was the basis for Baarda's evaluation scheme using so-called S-transformations.

3.2 Inversion

The distance is invariant under inversion of the matrices.

Proof: We obtain

d²(A⁻¹, B⁻¹) = d²(A⁻¹B) = Σ_{i=1}^n ln² λ_i(A⁻¹B) = Σ_{i=1}^n ln²[λ_i⁻¹(AB⁻¹)] = Σ_{i=1}^n (−ln λ_i(AB⁻¹))² = Σ_{i=1}^n ln² λ_i(AB⁻¹) = d²(AB⁻¹) = d²(A, B).

Comment: A⁻¹ and B⁻¹ can be interpreted as weight matrices of x if one chooses σ₀² = 1. Here essential use is made of the property λ(A) = λ⁻¹(A⁻¹). The proof shows that also individual inversions of eigenvalues do not change the value of the distance measure, as required.
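A sketch of ours implementing the proposed measure (4) and spot-checking the two invariances numerically; the helper rspd builds random symmetric positive definite test matrices and is our own device, not part of the paper.

cov_dist <- function(A, B) {
  lam <- Re(eigen(solve(A, B), only.values = TRUE)$values)  # roots of det(lambda*A - B) = 0
  sqrt(sum(log(lam)^2))
}
rspd <- function(n) { M <- matrix(rnorm(n * n), n, n); crossprod(M) + diag(n) }

set.seed(42)
A <- rspd(3); B <- rspd(3)
X <- matrix(rnorm(9), 3, 3)                      # almost surely regular
c(cov_dist(A, B),
  cov_dist(X %*% A %*% t(X), X %*% B %*% t(X)),  # affine invariance
  cov_dist(solve(A), solve(B)))                  # inversion invariance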
3.3 d is a Distance Measure

We show that d is a distance measure; thus the first two criteria for a metric hold in general.

ad (i): d ≥ 0 is trivial from the definition of d, keeping in mind that the eigenvalues are all positive. Proof of d = 0 ↔ A = B:
←: If A = B then d = 0. Proof: From λ(AB⁻¹) = λ(I) follows λ_i = 1 for all i, thus d = 0.
→: If d = 0 then A = B. Proof: From d = 0 follows λ_i(AB⁻¹) = 1 for all i, thus AB⁻¹ = I, from which A = B follows.

ad (ii): As (AB⁻¹)⁻¹ = BA⁻¹, symmetry follows from the inversion invariance.

3.4 Triangle Inequality

For d to provide a metric on the symmetric positive definite matrices, the triangle inequality must hold. Assume three n×n matrices with the following structure:

• The first matrix is the unit matrix: A = I.
• The second matrix is a diagonal matrix with entries e^{b_i}, thus B = Diag(e^{b_i}).
• The third matrix is a general matrix with eigenvalues e^{c_i} and modal matrix R, thus C = R Diag(e^{c_i}) Rᵀ.

This setup can be chosen without loss of generality, as A and B can be orthogonalized simultaneously [?]. The triangle inequality can be written in the following form, which reveals three terms:

s := d(A,B) + d(A,C) − d(B,C) = d_c + d_b − d_a ≥ 0.   (5)

The idea of the proof is the following:

(i) We first use the fact that d_b and d_c are independent of the rotation R: s(R) = d_c + d_b − d_a(R).
(ii) In case R = I, the correctness of (5) results from the triangle inequality in Rⁿ. This even holds for any permutation P(i) of the indices i of the eigenvalues λ_i of BC⁻¹. There exists a permutation P_max for which d_a is maximum, thus s(R) is a minimum.
(iii) We then want to show, and this is the crucial part, that any rotation R ≠ I leads to a decrease of d_a(R), thus to an increase of s(R), keeping the triangle inequality valid.

3.4.1 Distances d_c and d_b

The distances d_c and d_b are given by

d_c² = Σ_{i=1}^n b_i²,  d_b² = Σ_{i=1}^n c_i².

The special definition of the matrices B and C now shows itself to be useful. The last expression results from the fact that the eigenvalues of CA⁻¹ = C are independent of rotations R.

3.4.2 Triangle Inequality for No Rotation

In case of no rotation the eigenvalues of BC⁻¹ are e^{b_i}/e^{c_i}. Therefore the distance d_a yields

d_a² = Σ_{i=1}^n (ln(e^{b_i}/e^{c_i}))² = Σ_{i=1}^n (b_i − c_i)².

With the vectors b = (b_i) and c = (c_i), the triangle inequality in Rⁿ yields |c| + |b| − |b − c| ≥ 0, or

s = √(Σ_i c_i²) + √(Σ_i b_i²) − √(Σ_i (b_i − c_i)²) ≥ 0.

For any permutation P(i) we also get

s = √(Σ_i c²_{P(i)}) + √(Σ_i b_i²) − √(Σ_i (b_i − c_{P(i)})²) ≥ 0,   (6)

which guarantees that there is a permutation P_max(i) for which s in (6) is minimum.
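In the spirit of Ballein's simulations mentioned in Section 1, the triangle inequality can be spot-checked on random triples; this sketch of ours reuses cov_dist and rspd from the sketch above.

set.seed(7)
violated <- replicate(10000, {
  A <- rspd(3); B <- rspd(3); C <- rspd(3)
  cov_dist(A, B) + cov_dist(A, C) < cov_dist(B, C) - 1e-9
})
any(violated)   # FALSE in our runs, consistent with the conjecture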
3.4.3 d is a Metric for 2×2 Matrices

We now want to show that the triangle inequality holds for 2×2 matrices. Thus we only need to show that s(R(φ)) is monotonous with φ in [0..π/2], or equivalently that d_a(R) is monotonous. We assume (observe the change of notation in the entries b_i and c_i of the matrices)

B = Diag(b₁, b₂),  b₁ > 0, b₂ > 0.   (7)

With

x = sin φ   (8)

the rotation R(x) = R(φ) is represented as

R(x) = [[√(1−x²), x], [−x, √(1−x²)]],

thus only values x ∈ [0, 1] need to be investigated. With the diagonal matrix Diag(c₁, c₂), containing the positive eigenvalues

c₁ > 0, c₂ > 0,   (9)

this leads to the general matrix

C = R Diag(c₁, c₂) Rᵀ = [[c₁x² + c₂(1−x²), −x√(1−x²)(c₂−c₁)], [−x√(1−x²)(c₂−c₁), c₁(1−x²) + c₂x²]].

The eigenvalues of CB⁻¹ are (from Maple)

λ₁ = (u(x) + √v(x)) / (2b₁b₂) ≥ 0,  λ₂ = (u(x) − √v(x)) / (2b₁b₂) ≥ 0

with the discriminant

v(x) = u(x)² − w ≥ 0

and

u(x) = (b₁c₁ + b₂c₂)(1−x²) + (b₁c₂ + b₂c₁)x² ≥ 0,   (10)
w = 4b₁b₂c₁c₂ ≥ 0,   (11)

the last inequalities holding due to (7) and (9). The distance

d_a(x) = √(ln² λ₁(x) + ln² λ₂(x)),

which depends on x, has first derivative

∂d_a(x)/∂x = [2x(b₂−b₁)(c₂−c₁)] / [√v(x) d_a(x)] · [ ln((u(x) − √v(x))/(2b₁b₂)) − ln((u(x) + √v(x))/(2b₁b₂)) ]
           = [2x(b₂−b₁)(c₂−c₁) · ln((u(x) − √v(x))/(u(x) + √v(x)))] / [√v(x) d_a(x)].   (12)

For fixed b₁, b₂, c₁ and c₂ this expression does not change sign for x ∈ [0, 1]. This is because the discriminant v(x) = u²(x) − w (cf. (10), (11)) is always positive, due to

v(0) = (b₁c₁ − b₂c₂)² ≥ 0,  v(1) = (b₂c₁ − b₁c₂)² ≥ 0,
∂v(x)/∂x = −4x(b₂−b₁)(c₂−c₁) u(x) with u(x) ≥ 0 (cf. (10)),

thus v(x) being monotonous. Furthermore, √v(x) is always smaller than u(x) (cf. (10), (11)); thus the logarithmic expression is always negative. As the triangle inequality is fulfilled at the extremes of the interval [0, 1], it is fulfilled for all x, thus for all rotations.

Comment: When substituting x = sin φ (cf. (8)), the first derivative (12) is of the form ∂d_a(φ)/∂φ = sin φ · f(φ) with a symmetric function f(φ) = f(−φ). Thus the derivative is skew-symmetric w.r.t. (0,0), indicating d_a to be symmetric, d_a(φ) = d_a(−φ), which is to be expected.

3.4.4 d is Claimed to be a Metric for n×n Matrices

The proof of the metric properties of d for 2×2 matrices suggests that in the general case of n×n matrices any rotation away from the worst permutation of the indices (cf. (6)) results in an increase of the value s. The proof for the case n = 2 can be used to show that, starting with the worst permutation of the indices, any single rotation around one of the axes leads to a monotonous change of s. Therefore, for proving the case of general n, one would have to show that any combination of two rotations away from the worst permutation leads to a monotonous change of s, allowing any permutation to be reached by a rotation while increasing s(R).
Completing this line of proof has not been achieved so far.

4 Restating the Problem

"Brevity is the soul of wit." (German proverb)

Let

M(n,R) := {A = (a_ij) | 1 ≤ i,j ≤ n, a_ij ∈ R}

be the space of real n×n matrices, and let

S+ := Sym+(n,R) := {A ∈ M(n,R) | A = Aᵀ, A > 0}

be the subspace of real, symmetric, positive definite matrices. Recall that any symmetric matrix A can be substituted into a function f: R → R, which gives a symmetric matrix f(A) commuting with all matrices commuting with A. In particular, a symmetric matrix A has an exponential exp(A), and a symmetric positive definite matrix A has a logarithm ln(A), and these assignments are inverse to each other. A symmetric positive definite matrix A also has a unique square root √A, which is of the same type. Define, for A, B ∈ Sym+(n,R), their distance d(A,B) ≥ 0 by

d²(A,B) := tr ln²(√A⁻¹ B √A⁻¹),   (13)

where tr denotes the trace. In particular, this shows that

d(A,B) ≥ 0,  d(A,B) = 0 ⟺ A = B.

In more down-to-earth terms:

d(A,B) = √( Σ_{i=1}^n ln² λ_i(A,B) ),   (14)

where λ₁(A,B), ..., λ_n(A,B) are the joint eigenvalues of A and B, i.e. the roots of the equation det(λA − B) = 0. This is the proposal of [?], i.e. of equation (4) above. (To see why these two definitions coincide, note that λA − B = √A(λE − √A⁻¹ B √A⁻¹)√A, so that the joint eigenvalues λ_i(A,B) are just the eigenvalues of the real symmetric positive definite matrix √A⁻¹ B √A⁻¹; in particular, they are positive real numbers, and so the definition (13) makes sense.) The equation (14) shows that d is invariant under congruence transformations with X ∈ GL(n,R), where GL(n,R) is the group of regular linear transformations of Rⁿ:

∀X ∈ GL(n,R): d(A,B) = d(XAXᵀ, XBXᵀ)   (15)

(since det(λA − B) and det X(λA − B)Xᵀ have the same roots); this is not easily seen from definition (13). It also shows that

d(A,B) = d(B,A),  d(A,B) = d(A⁻¹, B⁻¹).
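The agreement of (13) and (14) can also be verified numerically. This sketch of ours assumes the R package expm (not referenced in the paper) for the matrix square root and logarithm, and reuses cov_dist and rspd from the sketches in Section 3.

library(expm)
d13 <- function(A, B) {
  S <- solve(sqrtm(A))            # A^{-1/2}
  L <- logm(S %*% B %*% S)        # ln(sqrt(A)^{-1} B sqrt(A)^{-1})
  sqrt(sum(diag(L %*% L)))        # sqrt(tr ln^2(...)), i.e. definition (13)
}
A <- rspd(3); B <- rspd(3)
all.equal(d13(A, B), cov_dist(A, B))   # TRUE up to numerical error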
follows,“differentiable”will always mean“infinitely many times differentiable”,i.e.of class C∞),a Riemannian metric is the assignment to any p∈M of a Euclidean scalar product −|− p in the tangent space T p M depending differentiably on p. Technically,it is a differentiable positive definite section of the second symmetric power S2T∗M of the cotangent bundle,or a positive definite symmetric2-tensor.In classical terms,it is given in local coordinates(U,x)as the“square of the line element”or“first fundamental form”d s2=g ij(x)d x i d x j(16) (Einstein summation convention:repeated lower and upper indices are summed over).Here the g ij are differentiable functions(the metric coefficients)subjected to the transformation ruleg ij(x)=g kl(y(x))∂y k∂x i∂y l∂x j.A differentiable manifold together with a Riemannian metric is called a Riemannian manifold .Given a piecewise differentiable path c :[a,b ]−→M in M ,its length L [c ]is defined to beL [c ]:=b a ˙c (t ) c (t )d t ,where for X ∈T p M we have X p := X |X p ,the Euclidean norm associated to the scalar product in T p M given by the Riemannian metric.In local coordinatesL [c ]=b a g ij (c (t ))˙c i (t )˙c j (t )d t.Given p ,q ∈M ,the distance d (p,q )associated to a given Riemannian metric then is defined to bed (p,q ):=inf cL [c ],(17)the infimum running over all piecewise differentiable paths c joining p to q .This defines indeed a distance:Proposition.The distance defined by (??)on a connected Riemannian manifold is a metric in the sense of metric spaces,i.e.defines a map d :M ×M −→R satisfying(i)d (p,q )≥0,d (p,q )=0⇐⇒p =q(ii)d (p,q )=d (q,p )(iii)d (p,r )≤d (p,q )+d (q,r ).An indication of proof will be given in the next section.So the central issue here is the fact that a Riemannian metric is the differential substrate of a distance and,in turn,defines a distance by integration.This is the most important way of constructing distances,which is the fundamental discovery of Gaußand Riemann .In our case,this paradigm is realized in the following way.The space Sym +(n,R )is a differentiable manifold of dimension n (n +1)/2,more specifically,it is an open cone in the vector spaceSym (n,R ):= A ∈M (n,R ) A =A Tof all real symmetric n ×n -matrices.Thus the tangent space T A Sym +(n,R )to Sym +(n,R )at a point A ∈Sym +(n,R )is just given asT A Sym +(n,R )=Sym (n,R ).The tangent space T A SSym +(n,R )to SSym (n,R )at a point A ∈SSym (n,R )is just given asT A SSym (n,R )=Sym 0(n,R ):={X ∈Sym (n,R )|tr (X )=0},the space of traceless symmetric matrices.Now note that there is a natural action of GL (n,R )on S +,namely,as already referred to above,by congruence transformations:X ∈GL (n,R )acts via A →XAX T .If one regards A as the matrix corresponding to a bilinear form with respect to a given basis,this action represents a change of basis.This action is transitive,and the isotropy subgroup at E ∈S +is just the orthogonal group O (n ):X ∈GL (n,R ) XEX T =E = X ∈GL (n,R ) XX T =E =O (n ),and so S +can be identified with the homogeneous space GL (n,R )/O (n )upon which GL (n,R )acts by left translations (its geometric significance being that its points parametrize the possible scalar products in R n )†].In general,a homogeneous space is a differentiable manifold with a transitive action of a Lie group G ,whence it has a representation as a quotient M =G/H with H a closed subgroup of G .The most natural Riemannian metrics in this case are then those for which the group G acts by isometries,or,in other words,which are invariant under the action of G ;e.g.the 
Looking out for these metrics, Theorem 1 will be a consequence of the following theorem:

Theorem 2.
(i) The Riemannian metrics g on Sym+(n,R) invariant under congruence transformations with matrices X ∈ GL(n,R) are in one-to-one correspondence with positive definite quadratic forms Q on T_E Sym+(n,R) = Sym(n,R) invariant under conjugation with orthogonal matrices, the correspondence being given by

g_A(X,Y) = B(√A⁻¹ X √A⁻¹, √A⁻¹ Y √A⁻¹),

where A ∈ Sym+(n,R), X, Y ∈ Sym(n,R) = T_A Sym+(n,R), and B is the symmetric positive bilinear form corresponding to Q.
(ii) The corresponding distance d_Q is invariant under congruence transformations and inversion, i.e. satisfies

d_Q(A,B) = d_Q(XAXᵀ, XBXᵀ) and d_Q(A,B) = d_Q(A⁻¹,B⁻¹)

for all A, B ∈ Sym+(n,R), X ∈ GL(n,R), and is given by

d_Q²(A,B) = (1/4) Q( ln(√A⁻¹ B √A⁻¹) ).   (18)

(iii) In particular, the distance in Theorem 1 is given by the invariant Riemannian metric corresponding to the canonical non-degenerate bilinear form‡]

∀X, Y ∈ M(n,R): B(X,Y) := 4 tr(XY)

on M(n,R), restricted to Sym(n,R) = T_E Sym+(n,R), i.e. to the quadratic form

∀X ∈ Sym(n,R): Q(X) := 4 tr X²

on Sym(n,R). As a Riemannian metric it is, in classical notation,

ds² = 4 tr( (√X⁻¹ dX √X⁻¹)² ) = 4 tr( (X⁻¹ dX)² ),   (19)

where X = (X_ij) is the matrix of the natural coordinates on Sym+(n,R) and dX = (dX_ij) is a matrix of differentials.

The same conclusions hold for the space SSym(n,R) of real symmetric positive definite matrices of determinant one, when one replaces the general linear group GL(n,R) with the special linear group SL(n,R), the n×n matrices of determinant one, and the space of real symmetric matrices Sym(n,R) with the space Sym0(n,R) of real symmetric traceless matrices.

†] A concrete map p: G → S+ achieving this identification is given by p(A) := AAᵀ; it is surjective and satisfies p(XA) = X p(A) Xᵀ. The fact that p is an identification map is then equivalent to the polar decomposition: any regular matrix can be uniquely written as the product of a positive definite symmetric matrix and an orthogonal matrix. This generalizes the representation z = re^{iφ} of a nonzero complex number z.
‡] the famous Cartan-Killing form of Lie group theory
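The following property is not stated in this excerpt, but it is a standard fact about this invariant metric, which we assume here for illustration: the geodesic from A to B is γ(t) = √A (√A⁻¹ B √A⁻¹)^t √A, and the distance grows linearly along it. The sketch again assumes the expm package, with cov_dist and rspd as in the earlier sketches.

geodesic <- function(A, B, t) {
  SA <- sqrtm(A); S <- solve(SA)
  SA %*% expm(t * logm(S %*% B %*% S)) %*% SA   # gamma(t), assumed geodesic
}
A <- rspd(3); B <- rspd(3)
all.equal(cov_dist(A, geodesic(A, B, 0.3)), 0.3 * cov_dist(A, B))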
Remark 2. Although the expression (19) appears to be explicit in the coordinates, it seems to be of no use for analyzing the properties of the corresponding Riemannian metric, since the operations of inverting, squaring, and taking the trace give, in the general case, untractable expressions. In particular, it apparently is of no help in deriving the expression (18) for the associated distance by direct elementary means.

There is, however, one interesting case where it can be checked to give a very classical expression; this is the case n = 2. In this case one has

SSym+(2,R) = { [[x, y], [y, z]] | x, y, z ∈ R, xz − y² = 1, x > 0 },

a hyperboloid in 3-space. This is classically known as a candidate for a model of the hyperbolic plane. In fact, in this case one may show by explicit computation that the metric (19) restricts to the classical hyperbolic metric, and that the corresponding distance just gives one of the classical formulas for the hyperbolic distance. For details, see [?].

Of course, the next question is which invariant metrics there are. This question, too, can be answered:

Addendum.
(i) The positive definite quadratic forms Q on Sym(n,R) invariant under conjugation with orthogonal matrices are of the form

Q(X) = α tr X² + β tr(X)²,  α > 0, β > −α/n.

(ii) The positive definite quadratic forms Q on Sym0(n,R) invariant under conjugation with orthogonal matrices are unique up to a positive scalar and hence of the form

Q(X) = α tr X²,  α > 0.

In particular, the Riemannian metric (19) corresponds to the case α = 1, β = 0. Since from the point of view of this classification all these metrics stand on an equal footing, it would be interesting to know by which naturality requirements this choice can be singled out.

6 The proofs

To put this result into proper perspective and to cut a long story short, let us very briefly summarize why Theorem 2, and consequently Theorem 1, are true. First, however, we indicate a proof of the Proposition above, since it is on this Proposition that our approach to the triangle inequality for the distance defined by (17) is based.

The fact that d(p,q) ≥ 0 and the symmetry of d are immediate from the definitions. There remains to show d(p,q) = 0 ⟹ p = q and the triangle inequality. For given p ∈ M, choose a coordinate neighbourhood U ≅ Rⁿ around p such that p corresponds to 0 ∈ Rⁿ. We then have the expression (16) for the given metric in U. Moreover, we have in U the standard Euclidean metric

ds²_E := δ_ij dxⁱ dxʲ = Σ_{i=1}^n (dxⁱ)².

Let ‖−‖ denote the norm belonging to the given Riemannian metric in U and |−| the norm given by the standard Euclidean metric. For r > 0 let

B(p;r) := {x ∈ Rⁿ | |x| ≤ r}

be the standard closed ball with radius r around p = 0, and

S(p;r) := {x ∈ Rⁿ | |x| = r}

its boundary, the sphere of radius r around p ∈ Rⁿ. As a continuous function U × Rⁿ → R, the norm ‖−‖ takes its minimum m > 0 and its maximum M > 0 on the compact set B(p;1) × S(p;1). It follows that we have

∀q ∈ B(p;1), X ∈ Rⁿ: m|X| ≤ ‖X‖_q ≤ M|X|

by homogeneity of the norm, and so, by integrating and taking the infimum,

∀q ∈ B(p;1): m d_E(p,q) ≤ d(p,q) ≤ M d_E(p,q),   (20)

where d_E(p,q) = |q − p| is the Euclidean distance. If q ∉ B(p;1), then any path c joining p to q meets the boundary S(p;1) in some point r, from which follows L[c] ≥ L[c′] ≥ d(p,r) ≥ m (where c′ denotes the part of c joining p to r for the first time, say), whence d(p,q) ≥ m. In other words, if d(p,q) < m we have q ∈ B(p;1), where we can apply (20). If now d(p,q) = 0, then surely d(p,q) < m, and then by (20)

m d_E(p,q) ≤ d(p,q) = 0,

whence d_E(p,q) = 0, which implies p = q.

For the triangle inequality, let c be a path joining p to q and d a path joining q to r. Let c∗d be the composite path joining p to r. Then L[c∗d] = L[c] + L[d]. Taking the infimum on the left hand side over all paths joining p to r gives d(p,r) ≤ L[c] + L[d]. Taking on the right hand side first the infimum over all paths joining p to q and subsequently over all paths joining q to r then gives d(p,r) ≤ d(p,q) + d(q,r), which is the triangle inequality.

Remark 3. In particular, (20) shows that the metric topology induced by the distance d on a connected Riemannian manifold coincides with the given manifold topology.

Now to the proof of Theorem 2. Recall the terminology of [?], Chapter X: Let G be a Lie group with Lie algebra g, and H ⊆ G a closed Lie subgroup corresponding to the Lie subalgebra h ⊆ g. Let M be the homogeneous space M = G/H. Then G operates as a symmetry group on M by left translations. M has the distinguished point o = eH = H, corresponding to the coset of the unit element e ∈ G, with tangent space T_oM = g/h. This homogeneous space is called reductive if g splits as a direct vector space sum g = h ⊕ m for a linear subspace m ⊆ g such that m is invariant under the adjoint action Ad: H → GL(g). Then canonically T_oM = m.
In our situation, G = GL(n,R) and H = O(n). Then g = M(n,R), the full n×n matrices, and h = Asym(n,R), the antisymmetric matrices. As is well known, M(n,R) = Asym(n,R) ⊕ Sym(n,R), since any matrix X splits into the sum of its antisymmetric and symmetric parts via

X = (X − Xᵀ)/2 + (X + Xᵀ)/2.

The adjoint action of O ∈ O(n) on M(n,R) is given by X ↦ OXO⁻¹ = OXOᵀ and clearly preserves Sym(n,R). So Sym+(n,R) is a reductive homogeneous space. We now have the following facts from the general theory:

a) On a reductive homogeneous space there is a distinguished connection invariant under the action of G, called the natural torsion free connection in [?]. It is uniquely characterized by the following properties ([?], Chapter X, Theorem 2.10).