Separable nonlinear least squares: the variable projection method and its applications
Package 'OptM' — October 12, 2022

Title: Estimating the Optimal Number of Migration Edges from 'Treemix'
Version: 0.1.6
Description: The popular population genetic software 'Treemix' by Pickrell and Pritchard (2012) <DOI:10.1371/journal.pgen.1002967> estimates the number of migration edges on a population tree. However, it can be difficult to determine the number of migration edges to include. Previously, it was customary to stop adding migration edges when 99.8% of the variation in the data was explained, but 'OptM' automates this process using an ad hoc statistic based on the second-order rate of change in the log-likelihood. 'OptM' also has added functionality for various threshold modeling to compare with the ad hoc statistic.
Maintainer: Robert Fitak <*****************>
Author: Robert Fitak [aut, cre]
Depends: R (>= 3.2.2)
License: GPL (>= 2)
Encoding: UTF-8
NeedsCompilation: no
Imports: SiZer (>= 0.1-4), stats, splines, grDevices, boot (>= 1.3-20)
Date: 2021-09-30
Repository: CRAN
RoxygenNote: 7.1.1.9001
Date/Publication: 2021-09-30 20:40:05 UTC

optM: optM function

Description
Load a folder of .llik files from the program Treemix and determine the optimal number of migration edges to include.

Usage
optM(folder, orientagraph = F, tsv = NULL, method = "Evanno", ignore = NULL, thresh = 0.05, ...)

Arguments
folder: A character string of the path to a directory containing .llik, .cov.gz and .modelcov.gz files produced by Treemix.
orientagraph: A logical indicating whether the files were produced from Treemix (FALSE) or OrientAGraph (TRUE). Default = F.
tsv: A string defining the name of the tab-delimited output file. If NULL (default), then no data file is produced.
method: A string containing the method to use, either "Evanno", "linear", or "SiZer". Default is "Evanno".
ignore: A numeric vector of whole numbers indicating migration edges to ignore. Useful when running Treemix on a prebuilt tree (ignore = 0). Default is NULL.
thresh: A numeric value between 0 and 1 for the threshold to use for the proportion of increase in likelihood that defines when a plateau is reached. Default is 0.05 (5%); only applicable for method = "linear".
...: Other options sent to the function "SiZer" - see the R package 'SiZer'.

Value
If method = "Evanno": a data frame with 17 columns summarizing the results for each migration edge (rows). The columns are: "m" - number of migration edges from the model; "runs" - number of iterations for "m"; "mean(Lm)" - mean log-likelihood across runs; "sd(Lm)" - standard deviation of the log-likelihood across runs; "min(Lm)" / "max(Lm)" - minimum / maximum log-likelihood across runs; "L'(m)" - first-order rate of change in the log-likelihood; "sdL'(m)", "minL'(m)", "maxL'(m)" - standard deviation, minimum, and maximum of the first-order rate of change; "L''(m)" - second-order rate of change in the log-likelihood; "sdL''(m)", "minL''(m)", "maxL''(m)" - standard deviation, minimum, and maximum of the second-order rate of change; "Deltam" - the ad hoc deltaM statistic (second-order rate of change in the log-likelihood); "mean(f)" - mean proportion of variation explained by the models; "sd(f)" - standard deviation of the proportion of variation explained by the models.
If method = "linear": a list containing 5 elements: $out - a data frame with the name of each model, the degrees of freedom (df), the Akaike information criterion (AIC), the deltaAIC, and the optimal estimate for m based on the model; $PiecewiseLinear - the piecewise linear model object; $BentCable - the bent cable model object; $SimpleExponential - the simple exponential model object; $NonLinearLeastSquares - the NLS model object.
If method = "SiZer": an object of class "SiZer" (see the R package 'SiZer' for more information).

Examples
# Load a folder of simulated test data for m = 3
folder <- system.file("extdata", package = "OptM")
test.optM = optM(folder)
# To view the various linear modeling estimates:
# test.linear = optM(folder, method = "linear")
# To view the results from the SiZer package:
# test.sizer = optM(folder, method = "SiZer")

plot_optM: plot_optM function

Description
Plotting the optM results. This function visualizes the output of optM, including the amount of total variation explained across each value of the migration rate.

Usage
plot_optM(input, method = "Evanno", plot = TRUE, pdf = NULL)

Arguments
input: An object produced by the function 'optM'.
method: A string containing the method to use, either "Evanno", "linear", or "SiZer". Default is "Evanno", but needs to match that used in 'optM'.
plot: Logical of whether or not to display the plot.
pdf: String of the file name to save the resulting pdf plot. If NULL, no file is saved. Default is NULL.

Value
A plot or pdf of a plot.

Examples
# Load a folder of simulated test data for m = 3
folder <- system.file("extdata", package = "OptM")
# Run the Evanno method and plot the results
test.optM = optM(folder)
plot_optM(test.optM, method = "Evanno")
# To view the various linear modeling estimates and plot:
# test.linear = optM(folder, method = "linear")
# plot_optM(test.linear, method = "linear")
# To view the results from the SiZer package:
# test.sizer = optM(folder, method = "SiZer")
# plot_optM(test.sizer, method = "SiZer")
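For readers who want to see how an Evanno-style ad hoc statistic of this kind is computed, the sketch below re-implements the idea in Python from the column definitions above. It is an illustration only, not OptM's actual R source: the exact normalization of Deltam is assumed here to follow Evanno et al.'s Delta K (absolute second-order difference of the mean log-likelihood divided by its standard deviation), and the function and variable names are hypothetical.

import numpy as np

# log_liks maps each tested number of migration edges m to an array of
# log-likelihoods across replicate Treemix runs (read from the .llik files).
def delta_m(log_liks):
    ms = sorted(log_liks)
    mean = {m: np.mean(log_liks[m]) for m in ms}
    sd = {m: np.std(log_liks[m], ddof=1) for m in ms}
    stats = {}
    for m in ms[1:-1]:
        # second-order rate of change in the log-likelihood, L''(m)
        second = abs(mean[m + 1] - 2.0 * mean[m] + mean[m - 1])
        stats[m] = second / sd[m]  # Evanno-style Deltam (assumed normalization)
    return stats

The chosen m would then be the one maximizing Deltam, by analogy with choosing K at the peak of Delta K.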
Strength Analysis of Guss Asphalt Mixtures Based on Mohr-Coulomb Theory

Wang Min
(Chongqing Zhixiang Paving Technology Engineering Co., Ltd., Chongqing 401336; China Merchants Chongqing Communications Research & Design Institute Co., Ltd., Chongqing 401336)

Abstract: To reveal the material characteristics and strength formation mechanism of guss asphalt (GA) mixtures, the strength parameters of guss asphalt concrete (cohesion c, internal friction angle φ, and maximum shear stress τmax) were computed on the basis of Mohr-Coulomb theory from uniaxial penetration tests and unconfined compressive strength tests. By examining the influence of binder content and temperature on these strength parameters, the strength mechanism of GA and its differences from conventional asphalt mixtures were analyzed. The results show that the strength of guss asphalt concrete is strongly affected by temperature and binder content; its shear strength at 60 °C is higher than that of conventional asphalt mixtures, and this strength derives mainly from the cohesion provided by the high-viscosity modified binder and the high filler-to-binder ratio. The shear strength analysis agrees with the evaluation based on penetration value and its increment, indicating that penetration value and its increment are better suited to evaluating the high-temperature stability of guss asphalt mixtures and can characterize the strength of this type of mixture.

Journal: Journal of Wuhan University of Technology (Transportation Science & Engineering), 2015, 39(6): 1149-1152, 1156
Keywords: guss asphalt mixture; strength mechanism; Mohr-Coulomb theory; uniaxial penetration test; shear strength
CLC number: U444

0 Introduction

Guss asphalt (GA) mixtures are flowable: placement generally requires no compaction, only simple spreading and leveling. They are characterized by "three highs" — high filler content, high binder content, and high mixing temperature — and the abundant binder and fine aggregate leave the coarse aggregate in a suspended state [1], so their structure differs from that of ordinary hot-rolled asphalt mixtures. Compared with ordinary mixtures, GA offers low air voids, excellent imperviousness, strong deformation compatibility, long fatigue life, and good integrity [2].

Because of its high binder content, the industry has long worried about the high-temperature strength of GA. When wheel-tracking dynamic stability is used to test its high-temperature bearing capacity, the result is only about 30% of that of a conventional modified asphalt mixture and about 20% of that of SMA; yet when penetration value and its increment are used as the index, SMA and GA can be very close at the same temperature. Likewise, in actual bridge applications, GA deck pavement systems have performed well under China's severe traffic conditions [3], which has made the strength mechanism of GA of great interest to researchers.

In recent years, the structural characteristics of GA have been explored from several angles. Jin et al [4] used dynamic shear rheology to analyze the correlation between the generalized shear modulus of the asphalt mastic and the high-temperature deformation of GA, and discussed the suitability of this index for evaluating the high-temperature performance of guss mastic. Yang et al [5] carried out triaxial repeated-load tests at different load levels and analyzed the development of permanent and viscoelastic deformation, finding that under half-sine repeated loading the deformation of GA agrees well with the Burgers model. Zhang et al [6] derived a viscoelastic constitutive model for the high-temperature deformation of guss asphalt concrete from a modified Burgers model, which can predict the development of permanent deformation of GA at high temperature under certain conditions. Yang and You [7] used nonlinear least-squares regression to build predictive equations for dynamic modulus and phase angle and analyzed the moduli of several mixtures including GA. These studies focus on the correlation between theoretical models and mixture strength indices; many others address GA gradation, material performance, and evaluation systems [8-11], but the relationship between structural characteristics and strength formation, and the differences from other asphalt mixtures, have not been analyzed. This paper therefore takes Mohr-Coulomb strength theory as its basis and uses uniaxial penetration tests to analyze the influence of binder content, temperature, and mixture type on the strength parameters of GA, revealing its strength mechanism and its differences from conventional asphalt mixtures, and providing theoretical support for the design and performance evaluation of GA pavement systems on steel bridge decks.

1 Strength mechanism of asphalt mixtures and test methods

At normal and high temperatures, insufficient internal shear strength of an asphalt mixture produces failures such as shoving and rutting. According to Mohr-Coulomb strength theory, the shear strength of a material comes from friction and cohesion and can be expressed as

τ = c + σ·tan φ    (1)

where τ is the shear strength, MPa; c the cohesion of the material, MPa; σ the normal compressive stress, MPa; and φ the internal friction angle, (°).

The shear strength parameters of asphalt mixtures are usually determined by borrowing geotechnical test methods: triaxial tests are run, Mohr circles and their envelope are drawn, and c and φ are determined approximately from the linear relation (1), see Fig. 1.

Fig. 1 Mohr circles and envelope of Mohr-Coulomb strength theory

As research deepened, it became clear that pavement engineering differs considerably from geotechnical engineering. Since neither the stress distribution nor the confining pressure of a triaxial test matches an actual pavement, this study adopts the uniaxial penetration test proposed by Tongji University to analyze the strength mechanism of GA; its advantage is that the loading is close to that of a real pavement. A penetration model representing the actual stress state was built with finite elements, and the principal stresses at the location of maximum shear stress were solved for an indenter pressure of 1 MPa; these serve as the basic shear-strength parameters (see Table 1). Multiplying these basic parameters by the measured penetration strength then gives the principal stresses and shear stress in the specimen. To solve for the cohesion c and internal friction angle φ, a set of unconfined compressive strength tests is also run. With these two data sets, Mohr circles are drawn and, from the geometry, c and φ based on the penetration and unconfined compression tests are derived as

sin φ = (σg1 − σg3 − σu) / (σg1 + σg3 − σu),  c = σu (1 − sin φ) / (2 cos φ)

where σg1 is the first principal stress obtained by multiplying the uniaxial penetration strength by the shear-strength parameter; σg3 the third principal stress obtained in the same way; and σu the unconfined compressive strength.

Table 1 Basic shear-strength parameters of the mixtures
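The geometric step just described — recovering c and φ from the two Mohr circles given by the penetration test and the unconfined compression test — reduces to the common-tangent construction for two circles. The sketch below (Python) implements only that generic construction; the function name mohr_coulomb and the example stresses are invented, and the finite-element shear-strength parameters of Table 1, whose values are not preserved here, would be applied to the penetration strength beforehand.

import numpy as np

# Common tangent tau = c + sigma*tan(phi) of two Mohr circles at failure.
# (sig1_a, sig3_a): principal stresses from the uniaxial penetration test, MPa.
# (sig1_b, sig3_b): principal stresses from unconfined compression (sigma_u, 0), MPa.
def mohr_coulomb(sig1_a, sig3_a, sig1_b, sig3_b):
    m_a, r_a = (sig1_a + sig3_a) / 2.0, (sig1_a - sig3_a) / 2.0  # center, radius
    m_b, r_b = (sig1_b + sig3_b) / 2.0, (sig1_b - sig3_b) / 2.0
    # Tangency of the envelope to each circle requires r = c*cos(phi) + m*sin(phi).
    sin_phi = (r_a - r_b) / (m_a - m_b)
    phi = np.arcsin(sin_phi)
    c = (r_a - m_a * sin_phi) / np.cos(phi)
    return c, np.degrees(phi)

# Example with made-up stresses: penetration-derived (2.4, 0.4) MPa and an
# unconfined compressive strength of 1.2 MPa give c ~ 0.35 MPa, phi = 30 deg.
c, phi = mohr_coulomb(2.4, 0.4, 1.2, 0.0)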
2 Raw materials and mix proportions

2.1 Main raw materials and technical indices

The study used modified binders produced in-house with Korean SK asphalt as the base: the guss asphalt mixture used a lake asphalt composite modified binder, while the SMA and AC used a high-elasticity modified binder. Their main performance indices are given in Table 2.

Table 2 Main performance indices of the asphalt binders

The coarse aggregate was basalt, the fine aggregate limestone, and the filler limestone powder; aggregate and filler properties met current industry specifications.

2.2 Mix design

The aggregate gradations of the guss asphalt mixture (GA10), the stone mastic asphalt (SMA10), and the dense-graded asphalt mixture (AC10) are shown in Fig. 2. The three gradations differ greatly: GA10 has more fine aggregate and a finer gradation; SMA10 has more coarse aggregate and a coarser gradation; AC10 lies between the two.

Fig. 2 Aggregate gradation curves of the three mixtures

Following the Standard Test Methods of Bitumen and Bituminous Mixtures for Highway Engineering (JTG E20-2011), the optimum binder-aggregate ratios of SMA10 and AC10 were determined with the high-elasticity modified binder. Following the Technical Guide for Design and Construction of Steel Box Girder Bridge Deck Pavement, the optimum binder-aggregate ratio of GA10 was determined with the lake asphalt composite modified binder. The basic properties of the three mixtures at their optimum binder contents are given in Table 3.

Table 3 Physical properties of the different asphalt mixtures (the flexural strain was measured on 300 mm × 100 mm × 50 mm beams)

Table 3 and Fig. 2 show that GA10 differs considerably from the conventional mixtures SMA10 and AC10 in material composition, basic properties, and evaluation methods; in particular, the three high-temperature stability indices (stability, penetration value and its increment, and dynamic stability) lead to inconsistent conclusions. These three mixture types therefore serve to verify the differences among them and to reveal the characteristics of guss asphalt materials.

3 Test program and data analysis

Based on the structural features of GA, its strength parameters were measured and computed for different binder contents and temperatures. The shear strength parameters of the stone mastic asphalt and the dense-graded mixture were also measured and compared with those of GA, and on this basis the strength formation mechanism of GA is discussed.

3.1 Influence of binder content on the strength of GA

With the lake asphalt composite modified binder and the design gradation, specimens at the optimum binder-aggregate ratio and at ±0.5% were tested at 60 °C for uniaxial penetration strength and unconfined compressive strength, and the strength parameters were computed (Table 4).

Table 4 Strength test results and computed parameters of GA at different binder-aggregate ratios (60 °C)

Table 4 shows that as binder content increases, the surplus binder inside the mixture increases, the binder film thickens, the friction between aggregates decreases, and the internal friction angle decreases. At the same time, because the relative binder content increases — that is, the filler-to-binder ratio decreases — the cohesion of the mastic, and hence of the mixture, decreases slightly. Consequently, the shear strength of guss asphalt concrete drops substantially with increasing binder content. When the binder film is near its optimum thickness and fluctuates within a small range, the cohesion depends mainly on binder viscosity and does not increase with binder content.

3.2 Influence of temperature on the strength of GA

With the lake asphalt composite modified binder at the optimum binder-aggregate ratio, uniaxial penetration strength and unconfined compressive strength were measured at different temperatures and the strength parameters computed (Figs. 3-5).

Fig. 3 Shear strength versus temperature
Fig. 4 Cohesion c versus temperature
Fig. 5 Internal friction angle φ versus temperature

Figs. 3 and 4 show that temperature strongly affects the strength and strength parameters of GA. As temperature rises, binder viscosity drops sharply, the mixture changes from an elasto-plastic body to a visco-plastic one, its resistance to external load decreases, and shear strength and cohesion fall rapidly. The internal friction angle relates mainly to the distribution of the mineral aggregate (the gradation); although higher temperature changes the slip state between aggregates, in a binder-rich suspended-dense structure such as GA10 the inter-aggregate friction is barely affected, so the cumulative change is only 0.5°. Because binder viscosity is highly temperature-sensitive, the cohesion of the mixture decreases substantially as temperature rises.

3.3 Influence of mixture type on strength

Using the preliminary aggregate proportions and optimum binder-aggregate ratios of GA10, SMA10, and AC10, the strengths of the three mixtures were tested at 60 °C and the parameters computed (Table 5).

Table 5 Strength test results and computed parameters of the different mixtures (60 °C)

The trends across the three mixture types differ considerably from those caused by binder content and temperature: the shear strength τmax of GA10 is the best of the three. Two points explain this.

1) The filler-to-binder ratio of GA is 2.8-3.5, versus 1.5-2.0 for SMA10 and 0.2-0.5 for AC10. The higher this ratio, the stronger the adsorption of binder by the filler and the less free binder remains, so the asphalt mastic, as the continuous phase, is at its most stable; that is, the cohesion increases markedly.

2) Among the three mixtures, SMA10 has the most pronounced stone skeleton, hence the largest internal friction angle. GA10 and AC10 are both suspended-dense structures, but with an essential difference: in the three-level spatial network of an asphalt mixture, AC10 has fine aggregate plus mastic as the continuous phase and coarse aggregate as the dispersed phase, whereas GA10 has the mastic alone as the continuous phase with both coarse and fine aggregate dispersed in it. In AC10 the aggregates are therefore in closer contact, interlock more effectively, and give a somewhat larger internal friction angle.

4 Conclusions

From the structural characteristics of GA and the analysis of its shear strength parameters under different conditions, the main conclusions are:

1) GA differs from conventional asphalt mixtures in material composition, so its performance characteristics and evaluation indices also differ.
2) Binder content and temperature are important factors for the shear strength of GA: as binder-aggregate ratio and temperature increase, shear strength drops substantially, and the internal friction angle and cohesion change as well; temperature has the more pronounced effect.
3) The comparison of the three mixtures shows that GA has good shear resistance, deriving mainly from the high viscosity of its binder and from its low air voids and good integrity (few micro-defects).
4) The shear strengths of the three mixtures vary consistently with penetration value and its increment, so it is reasonable to evaluate the bearing capacity (strength) of GA by penetration value and its increment. This also explains why bridge deck pavements whose lower course is GA with a very low dynamic stability nevertheless show no rutting-type high-temperature distress.

References
[1] Yang Jun, Pan Youqiang, Deng Xuejun. Performance of guss asphalt concrete for bridge deck pavement [J]. Communications and Transportation Engineering, 2007(1): 49-53.
[2] Wang Min, Zhang Hua. Application and development of guss asphalt concrete pavement technology [J]. Transportation Enterprise Management, 2009(10): 98-100.
[3] Wang Min, Zhang Hua. Comprehensive analysis of characteristics and design requirements of steel deck pavement [J]. World Bridges, 2013(1): 39-41.
[4] Jin Lei, Qian Zhendong, Zheng Yu. High-temperature performance and evaluation index of guss asphalt mastic based on the DMA method [J]. Journal of Southeast University, 2014(5): 1062-1066.
[5] Yang Yuming, Qian Zhendong, Hu Jing. Viscoelastic characteristics of guss asphalt mixture under repeated loading [J]. Journal of Southeast University, 2014(2): 391-395.
[6] Zhang Shunxian, Zhang Xiaoning, Wei Jianming, et al. Nonlinear constitutive model of high-temperature viscoelastic deformation of GA concrete [J]. Journal of Chongqing University, 2013, 36(2): 92-95.
[7] Yang X, You Z. New predictive equations for dynamic modulus and phase angle using a nonlinear least-squares regression model [J]. Journal of Materials in Civil Engineering, 2014(10): 1061-1065.
[8] Wan Taotao. Optimized design of the British mastic asphalt (MA) bridge deck pavement mixture [J]. Journal of China & Foreign Highway, 2012, 32(6): 278-282.
[9] Wang Hongchang, Li Guofen. Mix design of guss asphalt concrete for the Fourth Nanjing Yangtze River Bridge [J]. Highway, 2012(8): 50-54.
[10] Sun Lijun. Structural Behavior Theory of Asphalt Pavements [M]. Beijing: China Communications Press, 2005.
Partial Least Squares (PLS): An Introduction

Brief introduction

Partial least squares (PLS) is a multivariate statistical data analysis method first proposed in 1983 by S. Wold and C. Albano. In recent decades it has developed rapidly in theory, methods, and applications.

For a long time, the boundary between model-fitting methods and exploratory, understanding-oriented methods was quite sharp. PLS combines the two organically: in a single algorithm it simultaneously performs regression (multiple linear regression), simplification of the data structure (principal component analysis), and analysis of the correlation between two sets of variables (canonical correlation analysis). This is a leap forward in multivariate statistical data analysis.

The importance of the PLS method in statistical applications is reflected in the following points: PLS regression models multiple response variables on multiple predictors; it can solve many problems that ordinary regression cannot; and it has been called a second-generation regression method because it integrates several data analysis techniques in one.

The main purpose of principal component regression (PCR) is to extract the relevant information hidden in the matrix X and use it to predict the values of the variables Y. This guarantees that only informative components are used and noise is discarded, improving the quality of the prediction model. PCR still has a defect, however: when some useful variables correlate only weakly with the leading components, they are easily left out during component selection, lowering the reliability of the final prediction model, and examining every possible choice of components is too laborious. PLS regression solves this problem. It decomposes the variables X and Y jointly, extracting components (usually called factors) from X and Y simultaneously, and then orders the factors by the correlation between them, from largest to smallest. To build a model, one then simply chooses how many factors participate in the modeling.

The basic concept

PLS regression is an extension of the multiple linear regression model. In its simplest form, a linear model describes the relationship between a response variable Y and the predictor variables X:

Y = b0 + b1 X1 + b2 X2 + ... + bp Xp,

where b0 is the intercept and the bi (i = 1, ..., p) are the regression coefficients estimated from the data points.
For example, one can think of a person's weight as a function of height and gender, estimate the regression coefficients from sample points, and then predict someone's weight from measured height and gender. For many data analysis methods, the central problem is to describe the data accurately and to make reasonable predictions for new observations. Extensions of the multiple linear regression model for more complex data analysis problems — discriminant analysis, principal component regression, canonical correlation analysis, and so on — are multivariate statistical methods built on the multiple linear regression model. These multivariate methods share two important constraints on the data:

- the factors are extracted from the matrices X'X and Y'Y, so the factors can never express the correlation between the variables X and Y at the same time;
- the number of prediction equations can never exceed the number of Y and X variables.

PLS regression, unlike multiple linear regression and its extensions, does not need these data constraints. In PLS regression, the prediction equations are described by factors extracted from the matrix Y'XX'Y; to be more representative, the number of prediction equations extracted may even exceed the number of X and Y variables. In short, PLS regression may be the least restrictive of all multivariate calibration methods, and this flexibility makes it suitable for many situations where traditional multivariate calibration fails, for example when there are fewer observations than predictor variables. Moreover, PLS regression can serve as an exploratory analysis tool, selecting suitable predictor variables and removing noise before a traditional linear regression model is applied. PLS regression is therefore widely used for modeling in many fields — chemistry, medicine, economics, psychology, pharmaceutical science, and more — and its tolerance of arbitrarily specified sets of variables is an especially prominent advantage. In chemometrics, PLS regression has become a standard multivariate modeling tool.

The calculation process

The basic model. As a multiple linear regression method, the main purpose of PLS regression is to build the linear model Y = XB + E, where Y is an n × m response matrix (n observations of m variables), X is an n × p predictor matrix, B is a p × m matrix of regression coefficients, and E is a noise (calibration-error) term of the same dimensions as Y. Usually the variables X and Y are standardized before the calculation, i.e., centered by subtracting their means and scaled by dividing by their standard deviations.

Like principal component regression, PLS regression uses factor scores — linear combinations of the original predictor variables — as the basis of the model, so the factor scores used to build the prediction model must be linearly independent.
For example, suppose we have a set of response variables Y and a large number of predictor variables X, some of which are severely collinear. Using factor extraction we compute the factor score matrix T = XW for an appropriate weight matrix W, and then fit the linear regression model Y = TQ + E, where Q is the matrix of regression coefficients (loadings) of T and E an error term. Once Q has been computed, this is equivalent to Y = XB + E with B = WQ, which can be used directly as the prediction model.

PLS regression and principal component regression differ in how the factor scores are extracted. In short, the weight matrix W of principal component regression reflects the covariance structure among the predictor variables X, while the weight matrix W of PLS regression reflects the covariance between the predictors X and the responses Y.

To fit the model, PLS regression produces a p × c weight matrix W whose columns generate the columns of the n × c factor score matrix T = XW. These weights are computed so that the covariance between the factor scores and the responses is maximized. Ordinary least-squares regression of Y on T yields Q, the loadings (or weights) of Y, giving the regression equation Y = TQ + E; once Q is computed we obtain Y = XB + E with B = WQ, and the final prediction model is built.

The nonlinear iterative partial least squares (NIPALS) method

A standard algorithm for computing PLS regression is nonlinear iterative partial least squares (NIPALS). There are many variants of this algorithm, some standardized and some not. The following version is considered among the most efficient. With the number of factors c given, set A0 = X'Y, M0 = X'X, C0 = I, and for h = 1, ..., c:

1. compute qh, the dominant eigenvector of Ah'Ah;
2. wh = Ch Ah qh, wh = wh/||wh||; store wh as a column of W;
3. ph = Mh wh, ch = wh' Mh wh, ph = ph/ch; store ph as a column of P;
4. qh = Ah' wh / ch; store qh as a column of Q;
5. Ah+1 = Ah − ch ph qh', Mh+1 = Mh − ch ph ph', Ch+1 = Ch − wh ph'.

The factor score matrix T can then be computed as T = XW, and the PLS regression coefficients as B = WQ'.

The SIMPLS algorithm

Another method to estimate the regression components of partial least squares is the SIMPLS algorithm. Set A0 = X'Y, M0 = X'X, C0 = I, and for h = 1, ..., c:

1. compute qh, the dominant eigenvector of Ah'Ah;
2. wh = Ah qh, ch = wh' Mh wh, wh = wh/√ch; store wh as a column of W;
3. ph = Mh wh; store ph as a column of P;
4. qh = Ah' wh; store qh as a column of Q;
5. vh = Ch ph, vh = vh/||vh||;
6. Ch+1 = Ch − vh vh', Mh+1 = Mh − ph ph', Ah+1 = Ch+1 Ah.

As with NIPALS, T is computed by the formula T = XW and B by the formula B = WQ'.
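To make the SIMPLS recursion above concrete, here is a minimal Python/NumPy sketch that follows the listed steps literally (A0 = X'Y, M0 = X'X, C0 = I, deflation Ah+1 = Ch+1 Ah). The function name simpls is ours; the code assumes Y is a two-dimensional (n × m) array and is written for clarity rather than numerical robustness — production work would use an established PLS implementation.

import numpy as np

def simpls(X, Y, n_components):
    """PLS regression coefficients B (Y ~ X @ B) via the SIMPLS recursion."""
    X = X - X.mean(axis=0)             # center, as the text recommends
    Y = Y - Y.mean(axis=0)
    n_x, m = X.shape[1], Y.shape[1]
    A = X.T @ Y                        # A0 = X'Y
    M = X.T @ X                        # M0 = X'X
    C = np.eye(n_x)                    # C0 = I
    W = np.zeros((n_x, n_components))
    P = np.zeros((n_x, n_components))
    Q = np.zeros((m, n_components))
    for h in range(n_components):
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        q = Vt[0]                      # dominant eigenvector of A'A
        w = A @ q
        w = w / np.sqrt(w @ (M @ w))   # ch = w'Mw; normalize w as in the text
        p_h = M @ w
        q_h = A.T @ w
        v = C @ p_h
        v = v / np.linalg.norm(v)
        C = C - np.outer(v, v)         # C_{h+1} = C_h - v v'
        M = M - np.outer(p_h, p_h)     # M_{h+1} = M_h - p p'
        A = C @ A                      # A_{h+1} = C_{h+1} A_h
        W[:, h], P[:, h], Q[:, h] = w, p_h, q_h
    T = X @ W                          # factor scores
    B = W @ Q.T                        # B = WQ'
    return B, W, P, Q, T

Predictions for new (centered) data are then Xnew @ B plus the mean of Y, exactly as in the model Y = XB + E above.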
Received: 2006-09-29. About the author: Liao Jianghui (1982-), male, from Nanchang, Jiangxi; student, School of Economics and Trade, South China University of Technology; research interests: technological innovation and industrial organization.

An Analysis of Parameter Estimation Methods for the Bass Model

Liao Jianghui
(School of Economics and Trade, South China University of Technology, Guangzhou 510006)

Abstract: Parameter estimation is an important issue in the study of models of technological innovation diffusion. This paper briefly discusses several common parameter estimation methods for the Bass model, so that when the model is applied, a suitable method can be chosen for the situation at hand and accurate estimates obtained.
Keywords: diffusion of technological innovation; Bass model; parameter estimation
CLC number: F220.4  Document code: A  Article ID: 1672-5557(2006)S2-0111-02

Introduction

Besides factors such as capital and labor, economic growth is driven to a large extent by technological innovation.
In other words, technological innovation is itself an important productive force.
In the study of technological innovation, the diffusion of innovations has attracted more and more attention and has become a research hotspot at home and abroad.
Among the many theories, the school represented by Bass stands out: by constructing differential equations, these researchers built differential dynamic models of diffusion speed to reveal the mechanism of technology diffusion, and obtained a series of theoretical results together with many empirical case studies.
The Bass model was created in 1969, when Bass generalized from the diffusion of 11 categories of consumer durables in the United States, and many firms have since used it successfully for research on innovation diffusion and for sales forecasting.
After more than 30 years of development, the Bass model has become one of the principal models for studying the diffusion of technological innovation, and later researchers have kept modifying it to increase its explanatory power, for example by relaxing its assumptions or making it more flexible.
To understand the Bass model more deeply and apply it better in practice, this paper briefly discusses the use of its parameter estimation methods.
1. Structure of the Bass model

First, a brief account of the basic structure of the Bass model.
The Bass model assumes that, at time t, the proportion of future adopters of the new product is a linear function of the proportion who have already adopted it; this assumption is the mathematical basis of the Bass model.
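In the usual notation this assumption reads f(t)/(1 − F(t)) = p + q·F(t), where F(t) is the cumulative fraction of adopters, p the coefficient of innovation, and q the coefficient of imitation (Bass 1969). As a small preview of the estimation methods the paper goes on to discuss, here is a minimal Python sketch of the discrete-analogue ordinary-least-squares estimation: period adoptions are regressed on cumulative adopters and their square, and (m, p, q) are recovered from the fitted coefficients. The function name bass_ols is ours and the scheme is one common estimator, not the only one.

import numpy as np

# Discrete Bass analogue: S_t = a + b*N_{t-1} + c*N_{t-1}^2 with
# a = p*m, b = q - p, c = -q/m, where S_t are adoptions in period t
# and N_{t-1} is the cumulative number of adopters before period t.
def bass_ols(S, N_prev):
    X = np.column_stack([np.ones_like(N_prev), N_prev, N_prev ** 2])
    a, b, c = np.linalg.lstsq(X, S, rcond=None)[0]
    # market potential m is the positive root of c*m^2 + b*m + a = 0
    m = (-b - np.sqrt(b * b - 4.0 * a * c)) / (2.0 * c)
    p = a / m
    q = p + b
    return m, p, q

Because this fits a discrete approximation, it can be unstable on short series — one motivation for the alternative estimators (e.g. nonlinear least squares and maximum likelihood) that the literature proposes.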
Science & Technology Vision

0 Introduction

Robert G. Brown [1] first proposed the concept of exponential smoothing in his 1959 book Statistical Forecasting for Inventory Control; the method is commonly used in production forecasting and in short- and medium-term forecasting of economic trends.
The smoothing coefficient α is the key to whether the trend values computed by exponential smoothing match reality, because α determines both how fast the prediction model responds to changes in the time series and its ability to smooth out errors. The size of α sets the weight of each observation in the exponentially smoothed value, balancing the influence of the observations from different periods.
Many scholars have proposed optimal estimation methods for the smoothing coefficient α, the principle being to minimize the error between predicted and observed values [3-4]. This paper proposes a new way of determining the optimal smoothing coefficient, based on a search algorithm with relaxation properties drawn from nonlinear least squares.
1 The single exponential smoothing model

1.1 The level exponential smoothing model

Let the observations of a level (stationary) time series be y_1, y_2, ..., y_(t−1), y_t. The single exponential smoothing forecast is computed as [5]:

ŷ_(t+1) = S_t^(1) = α y_t + (1 − α) S_(t−1)^(1) = α (y_t − S_(t−1)^(1)) + S_(t−1)^(1)    (1)

where y_t is the observation at time t (t = 1, 2, ...); ŷ_(t+1) is the forecast (or fitted value) for time t + 1; S_t^(1) is the single exponential smoothing value at time t; and α is the smoothing coefficient, with 0 < α < 1. Equation (1) shows that the next forecast ŷ_(t+1) is obtained by correcting the current forecast S_(t−1)^(1) with the current forecast error y_t − S_(t−1)^(1); the size of α determines how strongly the model corrects its errors.
Expanding (1) [2]:

S_t^(1) = α y_t + (1 − α)[α y_(t−1) + (1 − α) S_(t−2)^(1)]
        = α y_t + α(1 − α) y_(t−1) + (1 − α)² S_(t−2)^(1)
        = α y_t + α(1 − α) y_(t−1) + ... + α(1 − α)^(t−1) y_1 + (1 − α)^t S_0^(1)
        = α Σ_{j=0}^{t−1} (1 − α)^j y_(t−j) + (1 − α)^t S_0^(1)    (2)

When the data are plentiful and t tends to infinity, (1 − α)^t gradually approaches zero, so the influence of the initial value S_0^(1) on S_t^(1) vanishes in the smoothing process, and (2) takes the form

S_t^(1) = α Σ_{j=0}^{∞} (1 − α)^j y_(t−j)    (3)

This paper discusses only the optimal estimation of the smoothing coefficient α in single exponential smoothing, and the model used requires the time series to be stationary, i.e., a level series; all subsequent discussion is based on forecast equation (1).
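The optimal-α criterion described above — minimize the sum of squared one-step forecast errors — is easy to state in code. The following Python sketch uses SciPy's bounded scalar minimizer as a stand-in for the relaxation-type search algorithm proposed in this paper (which is not reproduced here); it initializes S_0^(1) with the first observation, and the data series is invented.

import numpy as np
from scipy.optimize import minimize_scalar

def one_step_forecasts(y, alpha):
    """One-step forecasts from eq. (1), taking S0 equal to the first observation."""
    s = y[0]
    preds = np.empty(len(y))
    for t, obs in enumerate(y):
        preds[t] = s                        # forecast made for period t
        s = alpha * obs + (1.0 - alpha) * s # eq. (1) update
    return preds

def sse(alpha, y):
    return float(np.sum((y - one_step_forecasts(y, alpha)) ** 2))

y = np.array([50.0, 52.0, 51.0, 53.0, 55.0, 54.0, 56.0, 57.0])  # made-up level series
best = minimize_scalar(sse, bounds=(1e-3, 1.0 - 1e-3), args=(y,), method="bounded")
alpha_opt = best.x  # smoothing coefficient minimizing the squared forecast error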
Phasor analysis of fluorescence lifetime data and its applications*

Lin Danying†, Niu Jingjing, Liu Xiongbo, Zhang Xiao, Zhang Jiao, Yu Bin, Qu Junle‡
(Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen 518060)
(Received 15 April 2020; revised manuscript received 14 May 2020)

Fluorescence lifetime imaging microscopy (FLIM) offers high specificity, high sensitivity, and quantitative measurement, and is widely used in biomedicine, materials science, and other fields. To make FLIM better suited to fast analysis of high-throughput data, a variety of new lifetime-analysis algorithms have emerged in recent years. Among them, phasor analysis (PA) obtains lifetimes by converting time-domain fitting into direct computation in the frequency domain; compared with traditional least-squares fitting it is simpler and faster, works at low photon counts, and lends itself to data visualization and cluster analysis, so it is increasingly favored by researchers. This paper explains the principle and use of phasor analysis in detail and, on that basis, reviews its applications in measuring cellular metabolic state, studying protein interactions, measuring the cellular microenvironment, assisting pathological diagnosis, and improving the resolution of super-resolution imaging, highlighting the advantages of the PA method in these FLIM applications as a useful reference for related fields. Finally, future directions for phasor analysis of fluorescence lifetime data and its applications are discussed.

Keywords: fluorescence lifetime imaging microscopy, data analysis, phasor analysis, biomedical applications
PACS: 87.64.-t, 87.64.M-, 87.64.kv, 87.85.Pq  DOI: 10.7498/aps.69.20200554

* Supported by the National Key R&D Program of China (2017YFA0700500), the National Natural Science Foundation of China (61775144, 61975131, 61620106016, 61525503, 61835009), the (Key) Project of Science and Technology Innovation of Higher Education of Guangdong Province (2016KCXTD007), the Natural Science Foundation of Guangdong Province (2018A030313362), and the Shenzhen Basic Research Program (JCYJ20170818144012025, JCYJ20170818141701667, JCYJ20170412105003520).
† Corresponding author. E-mail: dylin@
‡ Corresponding author. E-mail: jlqu@
© 2020 Chinese Physical Society

1 Introduction

With the rapid development of fluorescence labeling, fluorescence microscopy has become an indispensable tool in biomedical research. Fluorescence lifetime, an important parameter of fluorescence, reflects the de-excitation rate of a fluorophore returning from the excited state to the ground state. It is an intrinsic property of the fluorophore, and its changes sensitively reflect changes in the fluorophore's microenvironment — temperature, viscosity, pH, ion concentrations, and so on in the cellular milieu [1]. Because lifetime measurements are insensitive to probe concentration, excitation intensity, and photobleaching, they are more readily quantitative than intensity measurements, so FLIM, which maps the distribution and changes of fluorescence lifetime in a sample, is often used for quantitative measurement of biophysical and biochemical parameters in cells [2,3], or for aerosol viscosity measurement [4]. Moreover, many fluorophores have similar emission spectra; if they are used to label different structures in a sample they are hard to separate spectrally, but their different lifetimes allow FLIM to distinguish them, enabling simultaneous labeling and imaging of multiple structures. These features give FLIM an increasingly broad range of applications in the life sciences.

In general, FLIM measures lifetime in either the frequency domain or the time domain. Frequency-domain methods excite the sample with periodically modulated continuous-wave light and compute the lifetime from the amplitude and phase changes of the fluorescence relative to the excitation. Time-domain methods excite the sample with high-repetition-rate femtosecond pulses and record the fluorescence decay after each pulse by gating, streak-camera, or time-correlated single photon counting (TCSPC) techniques, then fit or compute the lifetime. Either way, processing of the lifetime data is a crucial step. For example, the most common treatment of time-domain data is nonlinear least-squares (NL-LS) fitting: a decay model (single-, double-, or multi-exponential) is chosen and fitted to each pixel's measured decay to extract the component lifetimes. This requires an appropriate model and enough photons per pixel (usually >1000) for an effective fit [5], which correspondingly restricts FLIM applications. TCSPC-FLIM, in particular, must usually image under weak excitation to avoid photon pile-up (more than one photon detected between two adjacent pulses), so collecting enough photons per pixel lengthens the acquisition; a single lifetime image then takes a relatively long time, limiting imaging speed and ruling out applications where lifetimes change quickly.

To address these problems, researchers have developed other algorithms, such as phasor analysis (PA) [6], maximum likelihood estimation (MLE) [7], the first moment (M1) [8], Bayesian analysis (BA) [9], and compressed sensing (CS) [10], which lower the photon-count requirement of lifetime analysis and thereby indirectly raise FLIM imaging speed [11]. PA transforms the time-domain information into the frequency domain and computes lifetimes directly — simply, quickly, without fitting — and gives fairly accurate values even at low photon counts, making it suitable for fast FLIM. Moreover, the phasor plot generated by PA places pixels with similar decay characteristics in neighboring regions, forming clusters, which makes visualization and cluster analysis convenient; FLIM combined with PA (phasor-FLIM) is therefore increasingly popular and widely applied in the life sciences and biomedicine.

This paper first explains the principle and use of phasor-FLIM in detail, then reviews recent progress in its application to cellular metabolic measurement, protein interactions, cellular microenvironment, pathological diagnosis, and super-resolution imaging, and finally discusses its prospects.

2 Basic principle of phasor-FLIM

As noted above, FLIM lifetime measurement falls into frequency-domain and time-domain classes [1,6]. The two are mathematically Fourier transforms of each other, but they acquire lifetime information differently and yield data of different content and form, so the processing generally differs too. Frequency-domain methods usually excite the sample with sinusoidally modulated continuous light and detect fluorescence at the same frequency, but, because of the lifetime, with reduced modulation depth and delayed phase; the lifetime is computed from the change in modulation depth and the phase delay relative to the excitation. Time-domain methods excite with high-repetition-rate femtosecond pulses and record the decay directly or indirectly (gating, streak camera, TCSPC), obtaining fluorescence intensity (photon counts) versus time, from which the lifetime is usually obtained by curve fitting.

PA was first used on frequency-domain FLIM data, with the phasor built directly from the measured demodulation factor and phase delay [12]. PA applies equally to time-domain FLIM data, but the time-domain decay must first be transformed into the frequency domain. Since TCSPC-FLIM is currently the most widely used time-domain technique, PA applications to it are the most reported. The following sections introduce the principle of PA lifetime analysis for both classes, and outline typical application strategies based on the features of the phasor plot.

2.1 Frequency-domain FLIM and its phasor analysis

In frequency-domain FLIM, the excitation is usually a sinusoidally modulated continuous source such as a laser, xenon lamp, or LED:

E(t) = E(0)[1 + M_E sin(ωt)]    (1)

where E(0) and E(t) are the excitation intensity at time zero and time t, M_E = a/A is the modulation depth of the excitation (a and A are its amplitude and mean intensity, Fig. 1(a)), and ω is the angular modulation frequency. The fluorescence excited in a labeled sample is modulated at the same frequency as the excitation, but with reduced modulation depth and a certain phase delay.
If the fluorescence decay follows a single-exponential law, the fluorescence intensity can be written

I(t) = I(0)[1 + M_F sin(ωt − φ)]    (2)

where, correspondingly, I(0) and I(t) are the fluorescence intensity at time zero and time t, M_F = b/B is the modulation depth of the fluorescence (b and B are its amplitude and mean intensity, Fig. 1(a)), and φ is the phase delay (phase shift). Measuring the phase shift φ and the demodulation factor M = M_F/M_E yields the corresponding lifetimes:

τ_φ = (1/ω) tan φ    (3)
τ_m = (1/ω) √(1/M² − 1)    (4)

It can be shown [12,13] that when the decay is single-exponential (I(t) = I(0) e^(−t/τ), with lifetime τ) the lifetimes τ_φ and τ_m from (3) and (4) are equal, so in principle only one of them need be computed. If the decay is a combination of two or more exponential processes (double- or multi-exponential), the situation is more complicated: one usually measures several pairs φ_i and M_i at different modulation frequencies ω_i, computes an equivalent phase shift and demodulation factor, and then solves for the lifetimes, where f_i is the fraction of the total intensity measured at the i-th modulation frequency.

In 1984, Jameson et al [13] gave a geometric representation of frequency-domain FLIM data using the notion of a phasor: for each pixel, the demodulation factor M and the phase shift φ define a phasor with modulus M and argument φ, so phasors and pixels correspond one to one, and the endpoint of a phasor carries all the lifetime information of a pixel (Fig. 1(b)). Its components along the real and imaginary axes, in Weber's notation [14], are

G = M cos φ,  S = M sin φ    (7)

For a single-exponential decay, (3) and (4) give cos φ = M, hence

(G − 1/2)² + S² = (1/2)²    (8)

that is, the phasor endpoint with coordinates (G, S) is constrained to the semicircle of radius 0.5 centered at (0.5, 0). Each point of the semicircle represents a different lifetime, decreasing from left to right: (1, 0) corresponds to a lifetime near zero and (0, 0) to an infinitely long lifetime, as in Fig. 1(b).

If the decay at a pixel is a superposition of several single-exponential processes, I(t) = I(0) Σ_j α_j e^(−t/τ_j), where τ_j is the lifetime of the j-th component and α_j its fraction, then by the linear additivity of the components the phasor endpoint lies inside the semicircle [15], with

G = Σ_j f_j / (1 + (ωτ_j)²),  S = Σ_j f_j ωτ_j / (1 + (ωτ_j)²)    (9)

where f_j denotes the fractional contribution of component j to the total intensity. Geometrically, the phasor endpoint of a multi-exponential decay lies within the set spanned by the endpoints of its single-exponential components. In Fig. 1(c), the phasor endpoint of a double-exponential decay (blue) falls inside the semicircle, on the segment joining the endpoints of its two single-exponential components (green), and its distances to the two ends (p1, p2) are determined by the fractions (a1, a2) of the two components.

Fig. 1 Fluorescence lifetime measurement and phasor analysis (PA): (a) principle of the frequency-domain method; (b) lifetime phasor of a single-exponential decay; (c) lifetime phasor of a double-exponential decay; (d) principle of time-correlated single photon counting (TCSPC).

2.2 Time-domain FLIM and its phasor analysis

PA applies equally to the analysis of time-domain FLIM data. We take the currently most widespread TCSPC-FLIM as the example. As in Fig. 1(d), TCSPC treats each excitation pulse as one signal period; within each period, when the first fluorescence photon is detected, a count is added to the time channel corresponding to its arrival time. Accumulating over many periods builds a photon-count histogram reflecting the decay, from which the lifetime is determined.

In 2008, Digman et al [16] applied Weber's 1981 Fourier analysis of the fluorescence pulse response [14] to TCSPC, converting the full decay information of each pixel into a single point on the phasor plot:

G(ω) = ∫₀ᵀ I(t) cos(nωt) dt / ∫₀ᵀ I(t) dt,  S(ω) = ∫₀ᵀ I(t) sin(nωt) dt / ∫₀ᵀ I(t) dt    (10)

where G(ω) and S(ω), the real and imaginary parts of the Fourier spectrum of the fluorescence pulse response, serve here as the two phasor components; I(t) is the fluorescence intensity at time t after the pulse, ω the angular frequency of the pulsed laser, T the period (in practice the dynamic range), and n the harmonic order, usually the fundamental, n = 1. Because TCSPC records the decay histogram in N discrete time channels t_k, the integrals become sums:

G = Σ_{k=1}^{N} C_k cos(nωt_k) / I,  S = Σ_{k=1}^{N} C_k sin(nωt_k) / I    (11)

where C_k is the photon count recorded in channel k and I = Σ_{k=1}^{N} C_k is the total count over all channels.

For a single-exponential pulse response, substituting into (11) gives

G(ω) = 1 / (1 + (ωτ)²),  S(ω) = ωτ / (1 + (ωτ)²)    (12)

so G and S still satisfy (8): for single-exponential decays the endpoint (G, S) is constrained to the semicircle of radius 0.5 centered at (0.5, 0), consistent with the frequency-domain phasor plot. If the pulse response is multi-exponential, the same linear additivity places the endpoint (G, S) inside the semicircle, and the endpoint of a double-exponential decay again lies on the segment joining the endpoints of its two single-exponential components, consistent with the frequency-domain result.

2.3 Features of the phasor plot and typical application strategies

As stated above, for single-exponential decays the phasor endpoints (G, S) are constrained to the semicircle of radius 0.5 centered at (0.5, 0), while multi-exponential endpoints lie inside the semicircle — specifically, within the set spanned by the endpoints of their single-exponential components. For a double-exponential decay the endpoint lies on the segment joining the two component endpoints, at a position set by their fractions. These features give the phasor plot great advantages for quantitative analysis, visualization, and clustering of lifetime data.

The fluorescence lifetime reflects the de-excitation rate from the excited state, so it changes sensitively when the microenvironment of the excited fluorophore changes or when the fluorophore interacts and exchanges energy with other molecules. Many fluorophores therefore have two or more decay rates corresponding to different states. For example, intracellular reduced nicotinamide adenine dinucleotide (NADH) has a lifetime of a few hundred picoseconds in the free state (mainly in the cytoplasm, τ1) and of several nanoseconds when bound to proteins (mainly in mitochondria, τ2) [6]. In 2005 Bird et al [17] blocked cellular respiration with potassium cyanide (KCN), a metabolic drug that perturbs the NADH/NAD+ ratio, and showed that the proportions of free and bound NADH in cells are related to the NADH/NAD+ ratio, proposing that FLIM measurement of the free/bound NADH ratio can indirectly report the intracellular NADH/NAD+ ratio. NADH and its oxidized form NAD+ participate in many intracellular redox reactions as coenzymes and interconvert; their balance reflects the ratio of oxidative phosphorylation to glycolysis, so the NADH/NAD+ ratio and its changes can monitor shifts in cellular metabolism. Traditional assays of the intracellular NADH/NAD+ ratio — enzymatic cycling, capillary electrophoresis, mass spectrometry — apply only to cell extracts. Direct intracellular detection can use intensity imaging, because NADH fluoresces and NAD+ does not; but probe-concentration inhomogeneity and the different quantum yields of free and bound NADH usually make intensity-based NADH/NAD+ measurements quite inaccurate. The work of Bird et al [17] offered a new route to quantitative monitoring of NADH/NAD+ in live cells: measure the free/bound NADH ratio quantitatively by FLIM. Their study also showed that when the NADH/NAD+ ratio changes, the lifetimes of free and bound NADH (τ1 and τ2) stay almost constant; what changes are their respective fractions (a1 and a2).
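Equation (11) is a one-line computation per pixel, and the fraction analysis just described is a distance ratio on the phasor plot. The sketch below (Python) shows both; the function names (tcspc_phasor, two_component_fractions) and all numerical values are ours, and with a finite time window the check against (12) is only approximate because the decay is truncated.

import numpy as np

def tcspc_phasor(counts, t, omega, n=1):
    """Phasor coordinates (G, S) of one pixel's decay histogram, eq. (11)."""
    total = counts.sum()
    G = np.sum(counts * np.cos(n * omega * t)) / total
    S = np.sum(counts * np.sin(n * omega * t)) / total
    return G, S

def two_component_fractions(G, S, P1, P2):
    """Fractional intensities of two known components, assuming the measured
    point lies on the segment joining the component phasors P1 and P2."""
    d_to_2 = np.hypot(G - P2[0], S - P2[1])
    full = np.hypot(P1[0] - P2[0], P1[1] - P2[1])
    f1 = d_to_2 / full
    return f1, 1.0 - f1

tau = 2.5e-9                          # invented 2.5 ns lifetime
omega = 2.0 * np.pi * 80e6            # invented 80 MHz pulse repetition rate
t = np.linspace(0.0, 12.5e-9, 256)    # time channels over one period
counts = 1000.0 * np.exp(-t / tau)    # ideal single-exponential histogram
G, S = tcspc_phasor(counts, t, omega)
# eq. (12) predicts G ~ 1/(1 + (omega*tau)^2), S ~ omega*tau/(1 + (omega*tau)^2),
# i.e. a point on the universal semicircle of eq. (8).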
The linear relation between the phasor of a multi-exponential decay and the phasors of its single-exponential components makes phasor-FLIM especially suitable, once the component lifetimes are known, for quantifying the fractions of multiple components, particularly two. A typical application of phasor-FLIM is thus to quantify the fractions of the two NADH lifetime components from the position of the phasor endpoint, and so measure cellular metabolic state [18-27]. In addition, FLIM-FRET experiments that probe protein interactions via fluorescence resonance energy transfer (FRET), and FLIM studies of changes in cellular microenvironment parameters, also involve double- or even multi-exponential decays and changes in component fractions, so phasor-FLIM reports in these applications have multiplied in recent years as well [28-34].

Second, in the phasor plot generated by PA, pixels with similar decay characteristics appear at nearby positions, forming clusters (Figs. 2(a) and 2(b)) called "phasor clusters". One cluster corresponds to many pixels with similar decays, but those pixels need not be contiguous in the sample; the cluster has no direct link to their spatial distribution. Exploiting this, one can select a cluster of interest on the phasor plot and, via the correspondence, map its phasor endpoints back to pixel positions, conveniently marking out the regions of the sample with similar lifetime characteristics (Fig. 2(c)), and then run cluster analysis on those regions, which greatly facilitates many biomedical studies [18-20]. Assigning different pseudo-colors to the pixels of different clusters also provides a convenient multicolor display of lifetime images (Fig. 2(d)), which can assist pathological assessment [35-37]. Recently, this PA-based clustering of lifetime distributions has also been applied to improving the resolution of super-resolution imaging based on stimulated emission depletion (STED) [38-40].

Fig. 2 Application workflow of phasor-FLIM: (a) fluorescence intensity image containing unprocessed lifetime information; (b) lifetime phasor plot obtained by PA; (c) direct analysis of the lifetime phasors; (d) phasor-mapped FLIM image based on phasor clustering and pseudo-color assignment.

In summary, current phasor-FLIM applications can be represented as in Fig. 3. Examples of application progress follow.

Fig. 3 Application classification diagram of phasor-FLIM (metabolism, FRET, microenvironment, pathology, p-STED).

3 Progress in phasor-FLIM applications

3.1 NADH-based studies of cellular metabolism

Cellular energy comes mainly from respiration. Normal cells, and normally differentiated cells, metabolize by oxidative phosphorylation when oxygen is available and by glycolysis under hypoxia. Highly proliferative cells (such as cancer cells or stem cells) instead choose glycolysis as the main energy source even when oxygen is plentiful — the "Warburg effect" [41]. This difference in metabolic mode leads to differences in the concentration and state of NADH between the two. Hence, building on the different lifetimes of the two NADH states described earlier, two-photon FLIM with NADH as an endogenous fluorescent marker is often used to study metabolic differences among normal cells, stem cells, cancer cells, and cells in other diseases; phasor-FLIM conveniently provides single-cell metabolic phenotyping in live tissue and holds promise for studies of differentiation and proliferation and for mechanistic studies and diagnosis of disease, with several important advances already made.

Differentiation and proliferation shift the balance between glycolysis and oxidative phosphorylation, so metabolic changes can report differentiation and proliferation status. Stringari et al [18-20] performed FLIM of various endogenous fluorophores including NADH and used PA to segment the images, distinguishing collagen, retinol, retinoic acid, and free and bound NADH. Imaging the small intestine by two-photon FLIM and clustering the resulting phasor clusters, they identified highly proliferative intestinal stem cells and classified stem cells versus differentiated progeny by metabolic state, to monitor physiological and pathological processes associated with metabolic change. Lee et al [21] imaged white blood cells and leukemia cells: because leukemia cells proliferate rapidly and glycolysis dominates, their phasor cluster shifts toward short lifetimes, and PA quantification of the change in free/bound NADH ratio allows rapid screening and separation of leukemia cells from blood. Compared with conventional molecular diagnostics, this phasor-FLIM single-cell screen is cell-friendly and has potential for clinical blood screening. The extracellular matrix (ECM) is an essential, dynamic component of all tissues and directly influences cell behavior through mechanical and biochemical signals; ECM changes can alter tissue homeostasis and potentially promote cell transformation and tumorigenesis. Romero-Lopez et al [22] seeded normal cells in ECM extracted from normal human colon and from colon tumors metastatic to liver, and used phasor-FLIM to quantify the free/bound NADH ratio, finding higher free NADH levels and higher glycolytic rates in cells seeded in tumor ECM than in normal ECM, indicating an important role of the ECM in the growth of cancer cells and their associated vasculature.

Because the pathogenesis of many diseases is closely tied to cellular metabolism, phasor-FLIM also has many uses in mechanistic studies and diagnosis. Huntington's disease (HD) is an autosomal neurodegenerative disease whose main pathogenic mechanism is disturbed energy metabolism. Sameni et al [23] used phasor-FLIM to quantify changes in the free/bound NADH fraction in HD as an indirect measure of metabolic change in live cells, showing that the metabolic disturbance in HD is increased glycolysis, leading to oxidative stress and cell death; this quantitative approach can screen HD tissue and potentially screen drugs, which matters for diagnosis and therapy. Alzheimer's disease (AD), a neurodegenerative disease common in the elderly, is associated with reduced antioxidant protection and mitochondrial dysfunction. Dong et al [24,25] used two-photon FLIM with PA to quantify free NADH in hippocampal neurons of aged mice, finding that free NADH decreases with age as oxidative phosphorylation becomes dominant and AD worsens, and that reductive treatment restores free NADH levels in neurons of aged and AD mice. Hato et al [26] similarly imaged live mouse kidney by two-photon FLIM and used PA to analyze metabolic changes in the kidney, providing a way to study renal metabolism in disease. Datta et al [27] imaged human induced pluripotent stem cell-derived cardiomyocytes (hiPS-CMs) by two-photon FLIM to detect metabolic changes under the pathological stimuli of hypoxia and the mitochondrial toxin potassium cyanide.
As shown in Fig. 4, under hypoxia and drug stimulation the lifetime phasor distribution of hiPS-CMs shifts toward free NADH, the metabolic mode turning to glycolysis. This non-invasive imaging technique helps in studying the pathogenesis and treatment of heart disease [27].

Fig. 4 Phasor-FLIM analysis of the change in NADH/NAD+ ratio in cells stimulated by hypoxia and the mitochondrial toxin potassium cyanide, used to study the shift of metabolic state [27].

3.2 FLIM-FRET studies of protein interactions

When the emission spectrum of one fluorophore (the donor) overlaps the excitation spectrum of another (the acceptor) and the two are suitably close (generally 6-10 nm), the donor's excitation energy can induce acceptor fluorescence while the donor's own fluorescence is attenuated — the FRET effect. Because the degree of the effect (the FRET efficiency) depends strongly on the donor-acceptor distance, FRET is often used to study protein-protein interactions: the donor and acceptor fluorophores are attached to two protein molecules, and the measured FRET efficiency reflects the distance between the proteins, hence whether and how strongly they interact. From the lifetime point of view, when FRET occurs the donor transfers energy to the acceptor and the donor lifetime shortens. Since FLIM measurement is unaffected by photobleaching and similar factors, measuring the FRET efficiency by FLIM is more accurate than intensity-based methods, and FLIM-FRET is widely used to study protein interactions. The donor lifetimes in the FRET and no-FRET states (very high and very low FRET efficiency) can be taken as fixed, acting as two single-exponential components whose phasors lie on the semicircle. When part of the donor population undergoes FRET, the overall decay is double-exponential and the phasor endpoint lies on the segment joining those two single-component endpoints, at distances set by the two fractions, from which the FRET efficiency can be conveniently computed. In 2012 and 2013, Hinde et al [28,29] used phasor-FLIM and FRET to monitor changes in the interaction of a small G protein with the RBD protein under lysophosphatidic acid (LPA) stimulation: although the intensity images showed no clear difference, the FLIM images and the PA quantification both showed that the FRET efficiency between the tag fluorophores of the small G protein and RBD, ECFP and Citrine, increased significantly under LPA stimulation (Fig. 5). Lou et al [30] used phasor-FLIM FRET efficiencies measured in HeLa cells co-expressing the fluorescent proteins H2B-eGFP and H2B-mCherry to quantify nucleosome compaction, as a quantitative readout of DNA damage.
INSTITUTE OF PHYSICS PUBLISHING — INVERSE PROBLEMS
Inverse Problems 19 (2003) R1-R26  PII: S0266-5611(03)52278-X

TOPICAL REVIEW

Separable nonlinear least squares: the variable projection method and its applications

Gene Golub¹ and Victor Pereyra²
¹ Scientific Computing and Computational Mathematics, Stanford University, Stanford, CA, USA
² Weidlinger Associates, 4410 El Camino Real, Los Altos, CA, USA

Received 8 August 2002
Published 14 February 2003
Online at /IP/19/R1

Abstract

In this paper we review 30 years of developments and applications of the variable projection method for solving separable nonlinear least-squares problems. These are problems for which the model function is a linear combination of nonlinear functions. Taking advantage of this special structure, the method of variable projections eliminates the linear variables, obtaining a somewhat more complicated function that involves only the nonlinear parameters. This procedure not only reduces the dimension of the parameter space but also results in a better-conditioned problem. The same optimization method applied to the original and reduced problems will always converge faster for the latter. We present first a historical account of the basic theoretical work and its various computer implementations, and then report on a variety of applications from electrical engineering, medical and biological imaging, chemistry, robotics, vision, and environmental sciences. An extensive bibliography is included. The method is particularly well suited for solving real and complex exponential model fitting problems, which are pervasive in their applications and are notoriously hard to solve.

I. Introduction and historical background

1. Introduction

We consider in this paper nonlinear data fitting problems which have as their underlying model a linear combination of nonlinear functions. More generally, one can also consider that there are two sets of unknown parameters, where one set is dependent on the other and can be explicitly eliminated. Models of this type are very common and we will show a variety of applications in different fields. Inasmuch as many inverse problems can be viewed as nonlinear data fitting problems, this material will be of interest to a wide cross-section of researchers and practitioners in parameter, material or system identification, signal analysis, the analysis of spectral data, medical and biological imaging, neural networks, robotics, telecommunications, and model order reduction, to name but a few.

The authors published in April 1973 the paper [45] on 'The differentiation of pseudoinverses and separable nonlinear least squares'. This was work initiated in 1971, motivated by and generalizing earlier work of Guttman et al [49], which in turn elaborated on and generalized work in Hugo Scolnik's Doctoral Thesis produced at the University of Zurich [112, 113]. Scolnik's original work dealt with the case in which the nonlinear functions depended on one variable each and were of exponential type (t^(α_i)). In [49] this was extended to general functions of one variable, while in [45] functions of several variables were considered.
In this last paper the authors also developed a differential calculus for projectors and pseudoinverses, and proved that the separation-of-variables approach led to the same solution set as that of the original problem. Then Ruhe and Wedin [101] extended these ideas to the more general case of two arbitrary sets of variables.

The separability aspect comes from the idea of eliminating the linear variables (i.e., either the coefficients in the linear combinations or one set of the nonlinear variables), and then minimizing the resulting variable projection (VP) functional that depends only on the remaining variables.

Improvements in the algorithm, most especially a simplification due to Kaufman [57], extended by Ruhe and Wedin [101] to the general nonlinear case, made the cost per iteration for the reduced functional comparable to that for the full functional, as proven by the complexity analysis of Ruhe and Wedin.

At first it was thought that the main benefit of the elimination of one set of parameters was that fewer variables needed to be estimated. Numerical experience showed, however, that in most cases the reduced problem converged in fewer iterations than if the same minimization algorithm were used for the full functional. This was later substantiated theoretically for the Gauss-Newton (GN) method by the work of Ruhe and Wedin [101], where the asymptotic rates of convergence for the Golub-Pereyra (GP), the Kaufman variant, the NIPALS algorithm, and the unreduced functional are compared by studying the eigenvalues of the corresponding fixed point iteration functions. Since the reduced functional was more complicated, this reduction in the number of iterations did not guarantee that the total computing time was smaller, but there were classes of problems for which the reduction was dramatic, and in fact, it was clearly beneficial to use the VP formulation. Exponential fitting was one such problem. See for instance [45, 102].

The main contributions of the GP paper were to give a clean, matrix-based formulation of the problem, a differential calculus for matrix functions, orthogonal projectors, and generalized inverses, and a modern (for the early 1970s) and detailed algorithm for dealing with the complexities arising from the VP formulation, that included an early efficient implementation of an adaptive Levenberg-Marquardt (LM) algorithm. They also showed that the reduced problem had the same minima as the original one, provided that the linear parameters were recovered by solving an appropriate linear least-squares problem.

We believe that an important part of the impact of this work has come from the fact that a usable, public computer implementation of the algorithm was made available. The 1973 paper is evenly divided between theoretical derivation and implementation details. In a Stanford report with the same name [44], we included a listing of the program VARPRO that actually implemented the VP algorithm and produced the numerical results in the paper. Variants of this original code are still in use in some of the applications that we mention below.

The purpose of this paper is to take the story from where we left it after our second paper in 1976 [46], which already contains details on a number of related contributions, mostly clustered around the early 1970s. In fact, what we want to stress here is the surprising richness of applications of this idea and its impact in a number of very different fields, with lively developments carrying over to this century. Thus, we will classify the applications that we have collected through the years roughly by field, giving a more detailed description of the basic problem solved for some selected items and pointing out whatever new insights of general value have been discovered. We hope that this strategy will help different practitioners find clustered the material of most interest to them, while also calling their attention to possible cross-pollination. We do not attempt to be comprehensive, and we refer the reader to the excellent bibliographies of many of the quoted papers for connected contributions. The selection is more by expediency than by any attempt to single out some contributions and slight others.

Some interesting trends are observed: during the first few years most of the contributions relate to enhancements, modifications, theoretical results, and comparisons. Some of the early applications occur in the area of signal localization, which is still one of the most active fields today. This is not totally surprising given the recent explosion in new telecommunication and mobile applications. Another very active field is that of the modelling and interpretation of nuclear magnetic resonance data, where VARPRO has a stellar position. A more recent interesting field of application is that of neural network training.

We will use repeatedly some acronyms that we define here for further reference: LLS, linear least squares; NLLS, nonlinear least squares; SNLLS, separable nonlinear least squares; VP, variable projection; SVD, singular value decomposition.

2. Separable nonlinear least squares and the variable projection method

Given a set of observations {y_i}, a separable nonlinear least-squares problem is defined in [45] as one for which the model is a linear combination of nonlinear functions that can depend on multiple parameters, and for which the i-th component of the residual vector is written as

r_i(a, α) = y_i − Σ_{j=1}^{n} a_j φ_j(α; t_i).

Here the t_i are independent variables associated with the observations y_i, while the a_j and the k-dimensional vector α are the parameters to be determined by minimizing the functional ‖r(a, α)‖₂², where the components of r(a, α) are r_i(a, α), and ‖·‖₂ stands for the l₂ vector norm. We can write this functional using matrix notation as

‖r(a, α)‖₂² = ‖y − Φ(α) a‖₂²,

where the columns of the matrix Φ(α) correspond to the nonlinear functions φ_j(α; t_i) of the k parameters α evaluated at all the t_i values, and the vectors a and y represent the linear parameters and the observations respectively.

Now it is easy to see that if we knew the nonlinear parameters α, then the linear parameters a could be obtained by solving the linear least-squares problem

a = Φ(α)⁺ y,    (1)

which stands for the minimum-norm solution of the linear least-squares problem for fixed α, where Φ(α)⁺ is the Moore-Penrose generalized inverse of Φ(α). On replacing this a in the original functional, the minimization problem takes the form

min_α ½ ‖(I − Φ(α)Φ(α)⁺) y‖₂²,    (2)

where the linear parameters have been eliminated.
the applications that we have collected through the years roughly byfield,giving a more detailed description of the basic problem solved for some selected items and pointing out whatever new insights of general value have been discovered.We hope that this strategy will help different practitionersfind clustered the material of most interest to them,while also calling their attention to possible cross-pollination.We do not attempt to be comprehensive,and we refer the reader to the excellent bibliographies of many of the quoted papers for connected contributions.The selection is more by expediency than by any attempt to single out some contributions and slight others.Some interesting trends are observed:during thefirst few years most of the contributions relate to enhancements,modifications,theoretical results,and comparisons.Some of the early applications occur in the area of signal localization,which is still one of the most activefields today.This is not totally surprising given the recent explosion in new telecommunication and mobile applications.Another very activefield is that of the modelling and interpretation of nuclear magnetic resonance data,where V ARPRO has a stellar position.A more recent interestingfield of application is that of neural network training.We will use repeatedly some acronyms that we define here for further reference:LLS, linear least squares;NLLS,nonlinear least squares;SNLLS,separable nonlinear least squares; VP,variable projection;SVD,singular value decomposition.2.Separable nonlinear least squares and the variable projection methodGiven a set of observations{y i},a separable nonlinear least-squares problem is defined in[45] as one for which the model is a linear combination of nonlinear functions that can depend on multiple parameters,and for which the i th component of the residual vector is written asr i(a,α)=y i−nj=1a jφj(α;t i).Here the t i are independent variables associated with the observations y i,while the a j and the k-dimensional vectorαare the parameters to be determined by minimizing the functional r(a,α) 22,where the components of r(a,α)are r i(a,α),and · 2stands for the l2vector norm.We can write this functional using matrix notation asr(a,α) 22= y−Φ(α)a 22,where the columns of the matrixΦ(α)correspond to the nonlinear functionsφj(α;t i)of the k parametersαevaluated at all the t i-values,and the vectors a and y represent the linear parameters and the observations respectively.Now it is easy to see that if we knew the nonlinear parametersα,then the linear parameters a could be obtained by solving the linear least-squares problem:a=Φ(α)+y,(1) which stands for the minimum-norm solution of the linear least-squares problem forfixedα, whereΦ(α)+is the Moore–Penrose generalized inverse ofΦ(α).On replacing this a in the original functional,the minimization problem takes the formmin α12(I−Φ(α)Φ(α)+)y 22,(2)where the linear parameters have been eliminated.We define r2(α)=(I−Φ(α)Φ(α)+)y,which will be called the VP of y.Its name stems from the fact that the matrix in parentheses is the projector on the orthogonal complement of the column space ofΦ(α),which we will denote in what follows by P⊥Φ(α).We will also referto12 r2(α) 22as the VP functional.This is a more powerful paradigm than the simple idea of alternating between minimizationof the two sets of variables(such as the NIPALS algorithm of Wold and Lyttkens[139]),which can be proven theoretically and practically not to result,in general,in the same enhanced performance.In summary,the VP algorithm 
3. Numerical methods for nonlinear least-squares problems

General numerical optimization methods can be used to solve NLLS problems, but it pays to take into consideration the special form of the goal functional (a sum of squares), just as it pays to take advantage of the special form of separable problems. We review briefly some of the basic concepts that lead to the main numerical methods used today in standard and SNLLS problems. We assume in what follows that the model functions φ_j(α; t) are twice differentiable with respect to α.

A fundamental quantity for any optimization method for NLLS that uses derivatives is the Jacobian matrix J(α) of the vector of residuals r₂(α). This appears when calculating the gradient of the VP functional,

∇ ½‖r₂(α)‖₂² = J(α)ᵀ r₂(α),

while its Hessian is given by

∇² ½‖r₂(α)‖₂² = J(α)ᵀ J(α) + Σ_{i=1}^{n} r₂ᵢ(α) ∇² r₂ᵢ(α).

The Gauss-Newton (GN) method for nonlinear least squares can be viewed as a quasi-Newton method in which the Hessian is approximated by J(α)ᵀJ(α), while the Levenberg-Marquardt (LM) enhancement adds a positive definite matrix and a coefficient λ in order to combat ill-conditioning. GN is very effective for small-residual problems, since in that case the neglected term is unimportant.

By using a trust-region strategy for step control, needed to stabilize the plain GN and LM iterations and making them more globally convergent, one obtains Moré's implementation [76]. See also [93] for an early proof of convergence of the GN algorithm, [90] for an early proof of convergence for LM, and the GP paper for a detailed implementation of an adaptive LM method for SNLLS problems. In [91], Pereyra also describes a detailed implementation of an SVD-based trust-region LM algorithm for NLLS problems that appear in large-scale seismic inversion tasks. These are notoriously ill-conditioned problems and often outright singular. It would be worthwhile to consider such an algorithm for the regularization of ill-conditioned SNLLS problems.
Instead of describing the implementation in the GP paper we go directly to that of Kaufman [57], as described in Gay and Kaufman [41]. In this discussion we omit the α-dependence in all the matrices in order to lighten the notation. They first observe that GP have proven that the full Moore-Penrose pseudoinverse is not necessary to represent the orthogonal projector, and that a symmetric generalized inverse Φ⁻ satisfying only ΦΦ⁻Φ = Φ and (ΦΦ⁻)ᵀ = ΦΦ⁻ suffices. Thus, the j-th column of the Jacobian J can be written as

J_j = −[ P⊥_Φ (∂Φ/∂α_j) Φ⁻ + ( P⊥_Φ (∂Φ/∂α_j) Φ⁻ )ᵀ ] y.

Kaufman's simplification for the VP functional consists in approximating the Hessian in the GN method by using only the first term in the Jacobian formula,

K_j = −( P⊥_Φ (∂Φ/∂α_j) Φ⁻ ) y,

thus saving in the numerical linear algebra involved at the cost of marginally more function and gradient evaluations. It has been extensively demonstrated, as we indicate below, that savings of up to 25% are achieved by this simplification, making the VPK method as cost efficient per iteration as working with the full functional. Kaufman's argument to justify the small impact of her simplification is that if one writes J = K + L, where

L_j = −( P⊥_Φ (∂Φ/∂α_j) Φ⁻ )ᵀ y,

then

JᵀJ = KᵀK + LᵀL + KᵀL + LᵀK = KᵀK + LᵀL,

since K lies in the null space of Φᵀ while L lies in the range of Φ, so only the term LᵀL is being dropped from the GN approximation to the Hessian. We point to the reference above for the linear algebra aspects involved in the Kaufman simplification, which combine to produce the quoted 25% reduction in time per iteration.
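The formulas above translate directly into code for the exponential model of the previous sketch, where ∂Φ/∂α_j has a single nonzero column. The fragment below (Python, with the function name vp_jacobian ours) builds the full GP Jacobian column by column and marks where the Kaufman approximation drops the transposed term; it is an illustration with the Moore-Penrose inverse standing in for Φ⁻, not the Gay-Kaufman implementation, and it can be checked against finite differences of the vp_residual function of the earlier sketch.

import numpy as np

def vp_jacobian(alpha, t, y, kaufman=False):
    """Jacobian of r2(alpha) for the model Phi_ij = exp(-alpha_j * t_i)."""
    Phi = np.exp(-np.outer(t, alpha))
    Phi_minus = np.linalg.pinv(Phi)             # Moore-Penrose inverse as Phi^-
    P_perp = np.eye(len(t)) - Phi @ Phi_minus   # projector onto the orthogonal
                                                # complement of the column space
    J = np.zeros((len(t), len(alpha)))
    for j, a_j in enumerate(alpha):
        dPhi = np.zeros_like(Phi)
        dPhi[:, j] = -t * np.exp(-a_j * t)      # only column j depends on alpha_j
        K_j = P_perp @ dPhi @ Phi_minus         # first term (kept by Kaufman)
        if kaufman:
            J[:, j] = -K_j @ y                  # Kaufman's simplified column
        else:
            J[:, j] = -(K_j + K_j.T) @ y        # full GP Jacobian column
    return J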
of this problem.See also[27]for further refinements.Parks[87]considered the basic theory for separable nonlinear programming problems in the spirit of Ruhe and Wedin,that she called reducible.Then she went on to consider specific special cases,including the one corresponding to the equality constrained problem studied by Kaufman and Pereyra,that she called semilinear.More recently,Conn et al[25]elaborated on this idea in the context of trust-region algorithms for general nonlinear programming problems,where a subset of variables occur in some predictable functional form that can be used to optimize them more economically.6.Various implementationsThe V ARPRO program[44]was written by Pereyra.At Stanford University,John Bolstad streamlined the code and improved the documentation of V ARPRO under the guidance of Gene Golub.He also included the Kaufman simplification and the calculation and output of the covariance matrix,a very important addition that is frequently missing in least-squares codes written by numerical analysts.Another graduate student at Stanford,Randy LeVeque, wrote a modified version based on the original V ARPRO code called V ARP2which extends the original code to problems with multiple right-hand sides.Both V ARPRO and V ARP2have been publicly available for a long time in netlib,on the port library compiled by David Gay[40].There one can alsofind versions of the Gay and Kaufman implementation for unconstrained(nsf.f)and bound-constrained SNLLS problems,which include the option of usingfinite differences to approximate the derivatives.A FORTRAN90version of V ARPRO can be found in Alan Miller’s repository[75].Sagara and Fukushima[106]use parameter continuation to solve SNLLS problems and report reasonable success in increasing the domain of convergence for certain complicated exponential–polynomialfitting problems.An advocate of V ARPRO,Bert Rust,brought the code early on to Oak Ridge National Laboratory and the National Bureau of Standards(NBS,now known as NIST(National Institute of Standards and Technology)),where it has been used through the years in many applications, some of which we mention below.Rall and Funderlic[97]wrote an interactive front end atOak Ridge National Laboratory,INV AR,to facilitate the use of V ARPRO and to add more statistics.They also added afinite-difference evaluation of the derivatives as an ter this code was improved at NBS,including graphical output[140],and it is still in use at NIST.Another group of scientists who adopted V ARPRO were those involved in analysing in vivo magnetic resonance spectra(MRS);this line of research began with the work of van der Veen et al[130]at Delft University in The Netherlands.A version of V ARPRO can be found in Magnetic Resonance User Interface(MRUI)[81], a widely used system for MR imaging maintained by the Universidad Aut´o noma de Barcelona, Spain(a version in MATLAB[82]is also available).This is mostly a non-profit effortfinanced through grants of the European Union under the project on advanced signal processing for medical magnetic resonance imaging and spectroscopy(TMR-CT970160).More than300 research groups in40countries have licensed MRUI.7.Performance and comparisonsIn the original paper we showed the main performance characteristics of the VP method in several problems involving exponentials,Gaussians,and a rational model forfitting iron M¨o ssbauer spectra,as compared to using the full functional.Four different methods were considered:PRAXIS,a general optimization code that does not use derivatives 
produced by Richard Brent[17],a GN method with step control,a Marquardt-type code with adaptive choice of the regularization parameter that we developed,and a variable metric code produced by M Osborne.The conclusions were in line with what was known at the time and they are still valid today:GN was fastest when it worked,requiring a good initial estimate,while the variable metric code was not competitive.Brent’s code is recommended if analytical derivatives are a problem,but otherwise it usually requires more function evaluations.Since the cost per iteration was higher for the VP functional,reduction in the number of iterations alone was not a guarantee of less computer time,as shown in some of the examples.Again,these results were dependent on the method used,and for the same problems with the same initial values andfinal error we obtained different comparative performances for different methods.In these examples,the VP approach consistently required fewer iterations,including problems in which it converged while the iteration for the full functional diverged.For the exponentialfitting problem,the VP approach was consistently faster,and the results for GN and Marquardt were comparable and best by far.Since the Kaufman simplification would give an additional25%edge to the VP method,we see that by adjusting the computer times accordingly in[45],in all cases considered the time performance of VPK is better than that corresponding to the minimization of the full functional.As mentioned above,this has also been confirmed theoretically in the paper by Ruhe and Wedin.That is to say,when VP is combined with the Kaufman modification,the costs per iteration for the full and reduced functionals are similar,debunking the original notion that the VP functional was considerably more complicated and therefore more expensive to calculate than the original one.Krogh[64],Kaufman[57],Corradi[26],Gay and Kaufman[41],and Nielsen[84],among others,had similar experiences.The most comprehensive independent studies are those of Corradi who also considered problems with noise,and Gay and Kaufman,who were concerned with proving that the Kaufman modification did not alter the robustness or number of iterations necessary.Unfortunately Corradi does not report computer times.See also the section on medical and biological applications for a discussion of some detailed comparisons within magnetic resonance spectroscopy applications.II.Applications8.Applications to numerical analysis,mathematics and optimizationOne of the uses of the derivative of projectors and generalized inverses,as indicated in[45,131], is in the study of sensitivity to errors in solving linear least-squares problems.For instance, the usual perturbation analysis can be extended to the rank-deficient case(for rank-preserving perturbations):if A( )=A+ B and A = B =1,we can estimate the error in the least-squares solution of A( )x( )=b,for small .By using a Taylor expansion,A+( )−A+(0)= DA+(0)B+O( 2),and the formula from[45]for the Fr´e chet derivative of the generalized inverse with respect to ,DA+(0)=−A+BA++A+A+T B T P⊥A+QB T A+T A+,where Q stands for the projector on the orthogonal complement of the row space of A and we have used the fact that DA(0)=B,we obtain the classical estimate,x( )−x(0) 2 A+ x(0) + A+ 2 r +O( 2),where r=b−Ax is the residual vector.Additional perturbation analyses of this kind are considered in[33,47,48,118].Koliha[63]has extended the differentiability results of the Moore–Penrose inverse in[45] to C*-algebras.In[123]Trosset 
In [123], Trosset considers the problem of computing distances between convex sets of positive definite matrices as a reducible programming problem in the sense of Parks, and takes good advantage of the separability of the variables. Some of the results of [45, 46] are used by Bramley and Winnicka [16] when solving linear inequalities, and by Byers et al [20] when looking for the nearest non-regular pencil of matrices.

Lanzkron and Rose [67] discuss approximate nonlinear elimination in detail. This is the problem of solving large-scale, sparse, nonlinear algebraic equations by a nonlinear Gauss-Seidel method: there is an outer Gauss-Seidel iteration and an inner iterative solver for single nonlinear equations (or systems, in the block case), and the old question of how to manage the inner iterations so that the outer iterations preserve their good properties (see [89, section 3] for an early solution) is revisited; a minimal code sketch of this inner-outer pattern appears below. They also discuss the problem of slow convergence, associated perhaps with poor scaling of certain sets of variables, and consider separating those variables out in order to improve the overall convergence. This is then connected to SNLLS problems and the VP approach.
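The sketch below is our illustration of nonlinear Gauss-Seidel with inexact inner solves, not the algorithm of [67], and the toy system is hypothetical: each outer sweep updates one unknown at a time by a few Newton steps on its own equation, holding the others fixed.

    import numpy as np

    def nonlinear_gauss_seidel(F, x, inner_steps=2, sweeps=50, tol=1e-10):
        # F(x) returns the residual vector; during the inner solve,
        # component i of F is treated as an equation in x[i] alone.
        for _ in range(sweeps):
            for i in range(x.size):
                for _ in range(inner_steps):     # inexact inner Newton
                    fi = F(x)[i]
                    h = 1e-7
                    xh = x.copy()
                    xh[i] += h
                    dfi = (F(xh)[i] - fi) / h    # dF_i/dx_i by differencing
                    if dfi != 0.0:
                        x[i] -= fi / dfi
            if np.linalg.norm(F(x)) < tol:       # outer convergence test
                break
        return x

    # Toy coupled system: x0 = cos(x1), x1 = 0.5*sin(x0).
    F = lambda x: np.array([x[0] - np.cos(x[1]), x[1] - 0.5 * np.sin(x[0])])
    print(nonlinear_gauss_seidel(F, np.array([0.0, 0.0])))

How many inner steps to take per sweep is exactly the trade-off analysed in [67]: too few and the outer iteration may lose its convergence properties, too many and work is wasted on subproblems far from the solution.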
9. Applications to chemistry

The moment-singularity method for the calculation of densities of state in the vicinity of van Hove singularities is considered by Prais et al [94, 137]. Because the asymptotic behaviour of the modified moments is related to the singular behaviour of the density, information about the locations and functional forms of the singularities can be determined directly from the moments themselves. In the 1974 paper a least-squares method for calculating the locations and exponents of the singularities is described. This NLLS problem is separable, and the authors report good success with the 'very fast' VARPRO code when using reasonable initial estimates for the locations of the singularities. Several numerical results and comparisons are reported. In the 1986 paper they refine the method and still make good use of VARPRO, reporting that this code leads to an accurate determination of the singularities when as few as 20 moments are used. Several new calculations, comparisons and validations are included.

Lee and Baeck [68] demonstrate, using a highly positive uranium ion as a test case, that the exact relationship between the small and large components of a Dirac spinor in relativistic self-consistent field calculations is not fully satisfied by the kinetic balance condition, even for a two-electron system. In order to obtain a basis set for a multiple-electron system, the numerical atomic spinors obtained by Dirac-Hartree-Fock calculations are fitted by a given number of Slater-type functions. This SNLLS problem is solved using VP. The calculations for a uranium ion demonstrate the difference between the exponents for the small and large components, giving numerical evidence that kinetic balancing is not an exact relationship between the small and large components of spinors.

In [11], Beece et al use a VP algorithm in an exponential fitting problem associated with the effect of viscosity on the kinetics of the photochemical cycle of bacteriorhodopsin. Marque and Eisenstein [73] extend this work to consider pressure effects on the photocycle of purple membrane. By considering several kinetic data sets taken at the same temperature and pressure but with different monitoring wavelengths, together with an exponential model, they are able to use VARP2 to separate the variables and efficiently solve a problem with multiple right-hand sides. The first to use this method in such problems was Richard Lozier [71], who motivated the development of the VARP2 extension and became a champion in this field for many years (we thank Randy LeVeque for this insight).

In [51], Holmström considers constrained SNLLS for chemical equilibrium analysis. He uses his own implementation of a VP algorithm, which can be found in the LAKE program system [54]. See also the description of the MATLAB-based TOMLAB system in [52]. A recent review paper with Pettersson [53] is also noteworthy. In an earlier publication with Andersson and Ruhe [2], the use of SNLLS in chemical applications is described in detail.

10. Mechanical systems

Prells and Friswell [95] consider the application of the VP method to the updating of finite-element models of mechanical systems when the forces applied to them are unknown, an inverse problem. Thus, the forces and the model parameters are estimated from observed values of the system response. The VP method leads to the elimination of the unknown forces and results in an extension of the output residual method, which is frequently used to solve such inverse problems. The method is exemplified in three applications that are discussed in detail:

• damage detection in a multi-storey building subject to wind excitation;
• estimation of a foundation model of a rotary machine;
• model estimation of a steel beam tested by hammer excitation.

Prells and Friswell observe that the inclusion of prior knowledge (constraints) helps stabilize some of these ill-conditioned problems, and that in general the use of VP is very successful. They plan to incorporate better regularization techniques and to consider industrial-scale problems in future work.

Westwick et al [135] consider the nonlinear identification of automobile dynamics when the car is attached to a vibration test-rig. This time-dependent problem is analysed, and after some manipulations it is recognized that in the nonlinear case a SNLLS problem has to be solved. They use the algorithms of [117] and report excellent results in a benchmark example.
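Several of the applications above, from photocycle measurements at multiple wavelengths to multi-record test-rig data, share one structure: many observation vectors governed by the same nonlinear parameters but different linear coefficients. The following is a minimal sketch of this multiple right-hand-side formulation, in the spirit of VARP2 though it is not that code, with hypothetical data values:

    import numpy as np
    from scipy.optimize import least_squares

    def vp_residual_mrhs(alpha, t, Y):
        # Stacked reduced residuals for all columns of Y at once; a single
        # lstsq call solves for one amplitude column per data record.
        Phi = np.exp(-np.outer(t, alpha))
        C, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
        return (Y - Phi @ C).ravel()

    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 4.0, 100)
    Phi_true = np.exp(-np.outer(t, [1.0, 3.0]))
    C_true = rng.uniform(0.5, 2.0, size=(2, 6))   # 6 records, e.g. wavelengths
    Y = Phi_true @ C_true + 1e-3 * rng.standard_normal((t.size, 6))

    fit = least_squares(vp_residual_mrhs, x0=[0.5, 2.0], args=(t, Y))
    print('shared rate constants:', fit.x)

Pooling the records in this way constrains the shared nonlinear parameters with all of the data at once, which is precisely why the multiple right-hand-side extension pays off in such applications.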