MetaDisc_StatisticalMethods[1]
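Meta-analysis methods
Medical research is advancing rapidly, and a single research question is often addressed by many studies around the world. These studies usually differ in study population, design, intervention, outcome variables, sample size, follow-up time and other respects, and their results are not fully consistent.
Besides traditional systematic literature reviews and editorials, some researchers want to combine the results of the reviewed studies quantitatively by statistical analysis: this is meta-analysis.
This article uses examples to introduce the basic concepts of meta-analysis and the commonly used meta-analysis methods.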
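Basic concepts in meta-analysis
Example 1. To study whether aspirin prevents death after myocardial infarction (MI), seven studies of aspirin for preventing death after MI were conducted in the United States between 1976 and 1988. In six of them the difference in post-MI mortality between the aspirin group and the placebo group was not statistically significant; only one study showed that aspirin was effective in preventing death after MI with a statistically significant difference.
The detailed results are shown in Table 1.
Table 1. Results of studies of aspirin for preventing death after myocardial infarction
Study  Aspirin group                     Placebo group                     P value  OR*
no.    N      Deaths  Mortality P_E (%)  N      Deaths  Mortality P_C (%)
1      615    49      7.97               624    67      10.74              0.094    0.720
2      758    44      5.80               771    64      8.30               0.057    0.681
3      832    102     12.26              850    126     14.82              0.125    0.803
4      317    32      10.09              309    38      12.30              0.382    0.801
5      810    85      10.49              406    52      12.81              0.229    0.798
6      2267   246     10.85              2257   219     9.70               0.204    1.133
7      8587   1570    18.28              8600   1720    20.00              0.004    0.895
The main concepts involved in Example 1 are as follows:
1. Study population: for each study, the investigators defined the study population at the design stage, before the intervention, as patients with myocardial infarction in a given region. The intervention divides this population into two study populations: MI patients in that region who take aspirin and MI patients in that region who take placebo.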
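The mortality and OR columns of Table 1 can be checked directly from the counts. The following is a minimal sketch, assuming the OR is the ordinary cross-product odds ratio of each 2x2 table; the names `TABLE1`, `odds_ratio` and `mortality_percent` are illustrative and do not come from the source.

```python
# Counts transcribed from Table 1:
# (n_aspirin, deaths_aspirin, n_placebo, deaths_placebo)
TABLE1 = [
    (615, 49, 624, 67),
    (758, 44, 771, 64),
    (832, 102, 850, 126),
    (317, 32, 309, 38),
    (810, 85, 406, 52),
    (2267, 246, 2257, 219),
    (8587, 1570, 8600, 1720),
]


def mortality_percent(n, deaths):
    """Mortality in percent for one treatment arm."""
    return 100.0 * deaths / n


def odds_ratio(n_e, d_e, n_c, d_c):
    """Cross-product odds ratio: odds of death on aspirin / odds on placebo."""
    a, b = d_e, n_e - d_e  # aspirin: deaths, survivors
    c, d = d_c, n_c - d_c  # placebo: deaths, survivors
    return (a * d) / (b * c)


if __name__ == "__main__":
    for i, (n_e, d_e, n_c, d_c) in enumerate(TABLE1, start=1):
        print(
            f"study {i}: P_E={mortality_percent(n_e, d_e):.2f}% "
            f"P_C={mortality_percent(n_c, d_c):.2f}% "
            f"OR={odds_ratio(n_e, d_e, n_c, d_c):.3f}"
        )
```

Rounded to three decimals, the computed odds ratios agree with the OR column of the table.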
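Meta-analysis
In medical research the vast majority of phenomena show some degree of randomness, so study results vary because of random sampling error.
As a result, several studies of the same question often give different results, and some even reach opposite conclusions.
How to derive a reasonably reliable conclusion from similar studies with inconsistent results is therefore a recurring problem in medical research.
Meta-analysis is a statistical method for combining the results of similar studies.
It treats the results of several studies of the same question as the results of one multicentre study and analyses them with the statistical methods used for multicentre studies.
Statistical methods for meta-analysis can be divided into fixed-effect (deterministic) model methods and random-effects model methods.
Commonly used fixed-effect methods are the Mantel-Haenszel method (applicable only when the effect measure is the OR) and the general variance-based method.
All fixed-effect methods, however, require that the population effect measure (for example, the difference between two group means) be equal across the studies in the meta-analysis, a property called homogeneity; the random-effects model does not require homogeneity of the effect measure.
Meta-analysis can therefore follow this strategy:
1) If the effect measures of the studies are homogeneous, use a fixed-effect method:
● If the effect measure is the OR, use the Mantel-Haenszel method.
● If the effect measure is approximately normally distributed (difference between two means, difference between two rates, regression coefficient, log RR and so on), use the general variance-based method.
2) If the effect measures of the studies do not satisfy homogeneity, or the research context cannot be explained by a fixed-effect model, use a random-effects model.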
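For branch 1) of this strategy with the OR as effect measure, the Mantel-Haenszel pooled estimate is the ratio of two weighted sums over the 2x2 tables, OR_MH = Σ(a·d/N) / Σ(b·c/N). The sketch below applies that textbook formula to the counts of Table 1; it is an illustration under that assumption, not the STATA procedure the text goes on to use, and the function name is hypothetical.

```python
# Counts transcribed from Table 1:
# (n_aspirin, deaths_aspirin, n_placebo, deaths_placebo)
TABLE1 = [
    (615, 49, 624, 67),
    (758, 44, 771, 64),
    (832, 102, 850, 126),
    (317, 32, 309, 38),
    (810, 85, 406, 52),
    (2267, 246, 2257, 219),
    (8587, 1570, 8600, 1720),
]


def mantel_haenszel_or(studies):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/N) / sum(b*c/N)."""
    numerator = 0.0
    denominator = 0.0
    for n_e, d_e, n_c, d_c in studies:
        a, b = d_e, n_e - d_e  # aspirin: deaths, survivors
        c, d = d_c, n_c - d_c  # placebo: deaths, survivors
        total = n_e + n_c      # N, all subjects in this study
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator


if __name__ == "__main__":
    print(f"Mantel-Haenszel pooled OR: {mantel_haenszel_or(TABLE1):.3f}")
```

Because the estimate is a weighted average of the study odds ratios (weights b·c/N), it always lies between the smallest and the largest single-study OR.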
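To help readers master meta-analysis more easily, the following sections use the meta-analysis commands of the STATA software and worked examples to introduce the analysis steps, the software operations and the interpretation of the resulting output, and then summarize the statistical formulas involved by category.
Fixed-effect model meta-analysis
Example 1: To study whether aspirin prevents death after myocardial infarction (MI), seven studies of aspirin for preventing death after MI were conducted in the United States between 1976 and 1988; the results are shown in Table 1. In six of them the difference in post-MI mortality between the aspirin and placebo groups was not statistically significant; only one study showed that aspirin was effective in preventing death after MI with a statistically significant difference.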
© 2009 Editorial Board of the Chinese Journal of Evidence-Based Medicine (CJEBM)
Discussing on the Research of Heterogeneity in Meta-analysis
WANG Dan1, ZHAI Jun-xia2, MOU Zhen-yun3,*, ZONG Hong-xia1, ZHAO Xiao-dong2, WANG Xue-yi4, GU Ping5
1. Library of Hebei Medical University, Shijiazhuang 050017, China; 2. Hebei Institute of Medical Information, Shijiazhuang 050021, China; 3. Department of Epidemiology and Health Statistics, School of Public Health, Hebei Medical University, Shijiazhuang 050017, China; 4. Mental Health Center, The First Hospital, Hebei Medical University, Shijiazhuang 050031, China; 5. Department of Neurology, The First Hospital, Hebei Medical University, Shijiazhuang 050031, China
Abstract: This paper discusses heterogeneity in meta-analysis. It covers the definition of heterogeneity; its classification into clinical, methodological and statistical heterogeneity; measures for reducing the inclusion of clinically and methodologically heterogeneous studies; five methods for examining statistical heterogeneity (Q statistic, I2 statistic, H statistic, Galbraith plot and L'Abbe plot), with worked examples and the conditions under which each applies; the classification of meta-analyses into exploratory and analytic according to whether heterogeneity is present; and the measures that can be taken when heterogeneity exists, with a flowchart.
Key words: Meta-analysis; Heterogeneity; Classification; Testing methods; Strategies
Funding: directive project of the Hebei Province 2007 key project plan for medical science research (07025).
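The idea and steps of meta-analysis
Meta-analysis traces back to Fisher's 1920 idea of "combining P values". Beecher first proposed a preliminary concept in 1955, and in 1976 the psychologist Glass developed the idea into "combining statistics" and named it meta-analysis. In 1979 the British clinical epidemiologist Archie Cochrane proposed the concept of the systematic review (SR) and published a systematic review of randomized controlled trials of corticosteroids given to women in preterm labour to reduce neonatal mortality, which played a pivotal role in the development of evidence-based medicine.
In Chinese, meta-analysis is rendered as 荟萃分析. It is defined as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings", that is, a class of statistical methods for combining the results of many studies of the same topic that meet specified conditions. Etymologically, "meta" appears in "metalogic: a branch of analytic philosophy that deals with the critical examination of the basic concepts of logic" and "metamathematics: the philosophy of mathematics, especially the logical syntax of mathematics". The most concise and apt is "metascience: a theory or science of science, a theory concerned with the investigation, analysis or description of theory itself".
Meta-analysis has a broad and a narrow sense. In the broad sense it is a scientific clinical research activity: the whole process of comprehensively collecting all relevant studies, critically appraising and analysing each one, and then processing the data statistically by quantitative synthesis to reach an overall conclusion. In the narrow sense it is only the statistical method of quantitative synthesis. The broad sense is now more common in the Chinese and international literature, and "systematic review" and "meta-analysis" are often used interchangeably; a systematic review that processes its data by quantitative synthesis is called a meta-analysis. A systematic review may therefore use meta-analysis (quantitative systematic review) or not (non-quantitative, or qualitative, systematic review).
Following the unified standard of the Cochrane Collaboration's Cochrane Reviewers' Handbook, the basic steps of a meta-analysis are:
1. State the question to be answered clearly and concisely.
2. Define the search strategy and collect randomized controlled trials comprehensively.
3. Define inclusion and exclusion criteria and discard studies that do not meet them.
4. Select and extract data.
5. Assess the quality of each trial and describe its characteristics.
6. Statistical processing:
a. test for heterogeneity (test of homogeneity);
b. pool the effect sizes statistically (weighted pooling), compute the effect measure with its 95% confidence interval and draw statistical inferences;
c. plot the results of the individual trials and the pooled result;
d. sensitivity analysis;
e. examine potential publication bias by computing the fail-safe number or drawing an inverted funnel plot.
7. Interpret the results, draw conclusions and evaluate them.
8. Maintain and update the review.
Clinicians only need to understand the basic idea of meta-analysis; the statistical methods themselves can be left to statisticians and statistical software. Review Manager (RevMan) is software designed by the Cochrane Collaboration for reviewers to prepare, maintain and update Cochrane systematic reviews, and can be said to be tailor-made for clinicians carrying out meta-analysis. It not only helps with the computations but also helps users understand the structure of a meta-analysis and learn the analytical methods of systematic reviews, and it finally packages the completed review in a standard, electronically convertible format for submission to The Cochrane Database of Systematic Reviews (CDSR), which eases electronic publication and later updates. Making full use of RevMan gives people new to systematic reviews valuable methodological guidance.
There are several types of systematic review, for example of aetiological studies, diagnostic tests, prognosis and epidemiological studies. Cochrane systematic reviews are currently largely limited to randomized controlled trials; the methodology for systematic reviews of non-randomized trials is still immature and needs more research. Meta-analysis of diagnostic tests differs from the usual meta-analysis of randomized controlled trials: sensitivity and specificity must be considered together, using summary receiver operating characteristic (SROC) curve analysis, but RevMan 4.2 does not provide it.
Complete steps of a meta-analysis
This summary is based on my own experience combined with that of fellow forum members. The essence of meta-analysis is the secondary processing and quantitative synthesis of the literature, so this summary is itself a kind of meta-analysis of the members' experience.
I. Choosing and framing the topic
(1) Formulate the clinical question. A systematic review can address: 1. aetiology and risk factors; 2. effectiveness of treatments; 3. evaluation of diagnostic methods; 4. prognosis; 5. patient costs and benefits. The question should be described precisely at the very start of the review, including the population (exact disease type and stage), the type of treatment or exposure and the expected outcomes, and the outcome measures should be chosen sensibly.
(2) The choice of outcome measures directly affects the precision and sensitivity of the literature search and shapes the search strategy.
(3) Define the inclusion and exclusion criteria.
II. Literature search
(1) Designing the search strategy. This is the key step and must aim at both recall and precision. Combining MeSH terms with free-text words is recommended.
(2) Searching and obtaining abstracts and full texts. Chinese databases include VIP, CNKI and Wanfang; foreign ones include MEDLINE, SD and OVID.
(3) Reference management. Reference managers such as EndNote, ProCite or NoteExpress are strongly recommended for searching and managing references.
Ways to obtain full texts (not counting articles whose electronic full text is already online), for later members' reference:
1. Look for free full text.
(1) Check PubMed Central for a free full text. Sometimes no "free full text" label is shown, but the full-text link still leads to a free copy; this happened to me several times.
(2) Search Google, using Google Scholar. Occasionally Google finds a full text that NCBI does not link to. I did not find any of the articles I needed this way, but I did find a very important review that cited all of them (not the data, of course). That told me that these were all the relevant articles, since the foreign reviewers had found no more, and I felt much more confident about how many and which articles to look for.
(3) Free medical full-text journal sites, which provide many full texts that have passed their paywall period.
2. Check library holdings. Your own university is the most convenient; PubMed LinkOut shows which databases hold an article, so you can tell whether your university has the full text. The full-text databases of other Chinese universities such as the medical schools of Fudan, Peking and Tsinghua are very complete and generally licensed. Shanghai has the East China union catalogue; you can also consult the union catalogues of medical school libraries nationwide. A few pointers:
(1) The site of the association of medical libraries of Chinese higher-education institutions: open "current journal union catalogue" on the left, which offers searches of current and back issues. The results cannot be fully trusted and contain many errors; for the two articles I found hardest to obtain, the information was wrong in both cases, as I later confirmed by telephone.
(2) Two libraries that supply articles on request by e-mail. This is a paid service, but they send the article first and accept payment afterwards, so we must of course honour our word. The first is the information department of the PLA Medical Library.
(3) The second is the Fudan University medical library (formerly Shanghai Medical University); contacts: Zhou Yueqin, Wang Weizhi, Zheng Rong. A document delivery request form must be downloaded. Other libraries either require an account-opening fee first (500 yuan at Peking Union Medical College, for example) or find it too much trouble even though they advertise a paid service, so I do not list them here.
3. Ask DXY members for help by posting in the library document exchange forum. Use the correct format and preferably give the full-text links from several LinkOut databases; being considerate to helpers helps yourself. Help others find articles too: it is mutual aid, and helping earns points and teaches you how to post and download full texts. I gained a lot from helping others, and the higher your points, the faster and more likely you are to be helped. Free web storage is now common and much simpler than e-mail, so after asking for help check your posts under "My forum" promptly instead of only watching your mailbox; a download link sometimes arrives very quickly.
4. If all else fails, e-mail the author. To find an author's address, look up all of the author's articles in NCBI (not only first-author ones), display them as abstracts with as many per page as possible (100, 200, 500), and then search the page for e-mail addresses. The letters before the "@" usually resemble the author's name; confirm from the affiliation that it is the same person.
5. Find the journal's website and write to the editor for the full text. I will not describe how to find it, since DXY has many posts on this; I obtained one full text this way.
6. Ask friends at universities abroad. Their libraries can usually obtain non-held articles through interlibrary loan with a very high success rate; I obtained three articles this way.
If you still cannot find an article, I have no further ideas; anyone with other methods is welcome to share them here. Not easy, is it, compared with doing experiments?
III. Quality assessment of the literature and data collection
(1) Quality assessment. Assessing the quality of a trial mainly means assessing whether its results are valid, what the results are, and whether they apply to the local population. The following questions support a systematic assessment:
① Is the trial design clear, including the study population, the treatment and the method of judging outcomes?
② Were subjects randomly allocated?
③ Was the follow-up rate satisfactory, and were the patients of every group included in the statistical analysis?
④ Were subjects, investigators and other participants blinded during the study?
⑤ Were the groups similar in age, sex, occupation and so on?
⑥ Apart from the treatment under study, were other treatments the same?
⑦ How large was the treatment effect?
⑧ Was the treatment effect assessed accurately?
⑨ Do the results apply to the local population, and could ethnic differences affect them?
⑩ Were all important treatment outcomes described?
⑪ Do the benefits of treatment outweigh its risks and costs?
Reviewers should judge each study against these criteria. Studies that do not meet them should be excluded or handled separately (with a different pooling method) to keep the systematic review valid.
(2) Data collection. Design a data collection form suited to the review. Spreadsheet software such as Excel or Access and database systems such as FoxPro can be used. The form should record group allocation, the sample size of each group and the measure of effect. Depending on the aim, the measure may be a rate difference, odds, or a relative risk measure (RR or OR). If the studies use different measures, convert them to a common one. A common choice is the effect size (ES): the difference in effect between the two groups divided by the standard deviation of the control group or of the pooled groups. Its advantage is that it has no unit.
(3) Data analysis. In a systematic review, the epidemiological method of quantitatively pooling these data is called meta-analysis. "Meta" means "more comprehensive". Meta-analysis can: 1. increase statistical power; 2. evaluate the consistency of results and resolve contradictions between individual studies; 3. improve the estimate of the effect; 4. answer new questions that individual studies left unresolved.
Statistical analysis
I. Heterogeneity test
1. Principle. The test takes as null hypothesis H0 that all studies come from the same population (homogeneity); the alternative H1 is that the studies come from different populations, that is, heterogeneity exists. If p > 0.1, H0 is not rejected and the studies are treated as coming from one population. Their statistics should then be close to the true population parameter, so the results of the individual studies are close to one another and satisfy homogeneity, and the effect sizes can be pooled with fixed-effect algorithms such as the inverse-variance, Mantel-Haenszel or Peto method.
2. Classification. Heterogeneity has three aspects: clinical, statistical and methodological. A meta-analysis must first ensure clinical homogeneity, for example the same study design, aim and intervention; otherwise use subgroup analysis or do not pool. This matters a great deal: do not pursue statistical homogeneity blindly, but consider subject-matter and clinical homogeneity first and examine statistical homogeneity only once clinical homogeneity holds. When clinical heterogeneity is large, meta-analysis is not appropriate, not even with a random-effects model; only a descriptive systematic review (SR) is possible, or the studies must be split into subgroups to remove the clinical heterogeneity. Statistical heterogeneity is considered after clinical heterogeneity has been resolved.
If there is no heterogeneity among the study results (p > 0.1), choose the fixed-effect model; in that case the random-effects model gives the same result. If homogeneity does not hold, that is, the heterogeneity test is significant (p < 0.1), pooling with a fixed-effect algorithm is biased and the pooled effect deviates from the true value. When heterogeneity is present a random-effects model is therefore required; it mainly corrects the pooling algorithm so that the result is closer to an unbiased estimate, that is, more accurate.
Note also that the choice of model and of pooling method both change the P value of the heterogeneity test. This follows from the algorithms, but the change is small, usually in the third decimal place. The Q statistic also differs between the inverse-variance and the Mantel-Haenszel method within the fixed-effect model.
The random-effects model does not assume that the studies come from one population and is by construction an approximately unbiased estimate of the population parameter, unlike the fixed-effect model, which requires homogeneity. Testing heterogeneity for a random-effects model is therefore "gilding the lily", a reluctant formality: the random-effects model can be used whether or not the test is significant, whereas the fixed-effect model requires the absence of heterogeneity. It can be shown, and is seen in practice, that without heterogeneity the random-effects model reduces to the fixed-effect model and the two pooled effects are equal.
When a meta-analysis shows heterogeneity, especially when the P value of the test is very small (I am not sure of the exact range; is it 0.05 to 0.1? Moderators, please add), opinion in China and abroad is divided. Many argue that a meta-analysis is then meaningless, since it pools statistics from different populations. Others argue that the heterogeneity may stem from publication date, grouping, characteristics of the subjects and similar factors, and that as long as subgroup analysis or meta-regression can control or explain it, a meta-analysis is still possible; at the least a random-effects model gives a relatively unbiased estimate. The point to stress is that when the P value is small, the sources of heterogeneity should be analysed and explained reasonably, and subgroup analysis (equivalent to a stratified analysis) should be used to remove bias from confounding.
3. Measuring heterogeneity. A useful quantitative measure is I2, I2 = (Q − df)/Q × 100%, where Q is the chi-square statistic of the heterogeneity test and df its degrees of freedom (Higgins 2003, Higgins 2002). I2 is the percentage of the total variation in the effect estimates that is due to heterogeneity rather than to sampling error (chance). An I2 above 50% can be taken to indicate substantial heterogeneity.
II. Sensitivity analysis
1. Meaning. Sensitivity analysis changes the inclusion criteria (especially for disputed studies), excludes low-quality studies, or analyses the same data with different statistical methods or models, and observes how the pooled measure (OR, RR) changes. If excluding a study clearly changes the pooled RR, the pooled RR is sensitive to that study; otherwise it is not. If the studies come from one population, that is, there is no heterogeneity, sensitivity is low; sensitivity is therefore an important indicator of study quality (evidence for including or excluding a study) and of heterogeneity.
Sensitivity analysis targets study characteristics or types, such as methodological quality, and examines the effect on the overall result of excluding low-quality or unblinded studies. Wang Jiyao (2nd edition, p. 76) writes: "exclude certain low-quality studies, re-evaluate, then compare before and after to examine the influence of the excluded trials and of that study characteristic or type on the overall effect". In Wang Jialiang (1st edition, eight-year programme, pp. 66 and 154), sensitivity analysis classifies by study quality, whereas subgroup analysis classifies by the characteristics of the patient groups in the studies. Sensitivity analysis is the meta-analysis after low-quality studies have been excluded, or after studies have been included or excluded; subgroup analysis stratifies appropriately by the characteristics of the included patients, and both too many and too few strata are bad.
For example, after excluding one low-quality study, re-estimate the pooled effect and compare it with the result before exclusion to assess that study's influence on the pooled effect and the robustness of the result. If the result hardly changes, sensitivity is low and the result is robust and credible. If the result differs markedly or is even reversed, sensitivity is high and robustness is low; results must then be interpreted and conclusions drawn with great caution, because this points to important potential sources of bias related to the effect of the intervention, and the source of the disagreement needs to be clarified.
2. Methods. In practice the usual approach is to choose different statistical models or carry out subgroup analyses, explore possible sources of bias and draw conclusions cautiously. Subgroup analysis usually addresses a characteristic of the subjects, such as sex, age or disease subtype, to examine whether and how much it affects the overall effect, whereas sensitivity analysis addresses study characteristics or types, such as methodological quality, by excluding low-quality or unblinded studies. See Evidence-based Medicine and Clinical Practice, edited by Wang Jiyao (Science Press).
Sensitivity analysis is needed only when possibly low-quality studies are included, so first ensure the quality of the included studies. For RCTs the Jadad score can be used; for aetiological studies I think sensitivity analysis is a fairly practical way of assessing the quality of studies that already meet the inclusion criteria. Sensitivity analysis is an indirect way of analysing heterogeneity. Some systematic reviews find no heterogeneity on testing; is a sensitivity analysis still needed? In my view yes, because heterogeneity can cancel out: sometimes the test shows none, yet the result changes after sensitivity analysis.
III. Assessing bias in the included studies
Publication bias is assessed by drawing a funnel plot and testing its symmetry; Egger's test can be run in Stata. People are flexible and software is rigid; clinical practice is relative and statistics is absolute.
IV. Summary
(1) Interpreting the results. Besides statistical significance, judge from subject-matter knowledge whether the result is clinically meaningful. If a result is statistically significant but the pooled effect is smaller than the minimal clinically important difference, it should not be adopted. If the pooled effect is clinically meaningful but not statistically significant, no conclusion can be drawn and more data are needed. Do not make recommendations that are not supported by meta-analytic evidence. When there is no definite conclusion, distinguish two cases: the evidence is insufficient to decide, or the evidence shows that the intervention really is ineffective.
(2) Generalizing the results. To judge the external validity of a meta-analysis before applying it, consider its inclusion and exclusion criteria and the representativeness of its samples, and in particular whether the characteristics of the subjects, biological or cultural variation, study setting, intervention, compliance and any adjuvant treatment match your own situation. An ideal meta-analysis includes all current relevant, high-quality, homogeneous studies, is free of publication bias, and uses an appropriate model and correct statistical methods.
(3) Refining and applying the systematic review. A completed systematic review must keep being refined in practice: ① it is tested in clinical practice and evaluated by clinicians; ② it undergoes cost-effectiveness evaluation; ③ new clinical studies are monitored and the review is re-evaluated promptly. Only clinicians who have mastered systematic review methods can provide evidence for the clinical questions of their specialty and allow evidence-based medicine to develop smoothly.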
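The heterogeneity statistics discussed in this post (Q and I2) and the relation between fixed-effect and random-effects pooling can be sketched in a few lines. The example below assumes inverse-variance weighting of log odds ratios and the DerSimonian-Laird estimator for the between-study variance; the post names neither choice explicitly, and the function names and the reuse of the aspirin counts from Table 1 are illustrative.

```python
import math

# Counts transcribed from Table 1 of this document:
# (n_aspirin, deaths_aspirin, n_placebo, deaths_placebo)
TABLE1 = [
    (615, 49, 624, 67), (758, 44, 771, 64), (832, 102, 850, 126),
    (317, 32, 309, 38), (810, 85, 406, 52), (2267, 246, 2257, 219),
    (8587, 1570, 8600, 1720),
]


def log_odds_ratio(n_e, d_e, n_c, d_c):
    """Log OR of one 2x2 table and its approximate (Woolf) variance."""
    a, b = d_e, n_e - d_e
    c, d = d_c, n_c - d_c
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool(effects, variances):
    """Inverse-variance fixed effect, Q, I2 and DerSimonian-Laird pooling."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2 = (Q - df) / Q x 100%, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {"fixed": fixed, "random": random, "q": q, "df": df,
            "i2": i2, "tau2": tau2}


if __name__ == "__main__":
    pairs = [log_odds_ratio(*row) for row in TABLE1]
    result = pool([y for y, _ in pairs], [v for _, v in pairs])
    print(f"Q = {result['q']:.2f} on {result['df']} df, I2 = {result['i2']:.1f}%")
    print(f"fixed-effect OR   = {math.exp(result['fixed']):.3f}")
    print(f"random-effects OR = {math.exp(result['random']):.3f}")
```

Q is referred to a chi-square distribution with df degrees of freedom to obtain the P value mentioned in the text. When Q does not exceed df, the estimated between-study variance is zero and the random-effects estimate coincides with the fixed-effect one, which is the "reduces to the fixed-effect model" behaviour described above.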
Meta-analysis data processing workflow and methods
English answer:
Meta-analysis is a statistical technique used to combine and analyze data from multiple studies on a specific research question or topic. It involves a systematic review of the literature, data extraction, and statistical analysis to provide a comprehensive summary of the available evidence.
The data processing workflow for conducting a meta-analysis typically includes the following steps:
1. Formulating the research question: This step involves clearly defining the research question or objective of the meta-analysis. It is important to specify the inclusion and exclusion criteria for selecting studies to ensure the relevance and quality of the data.
2. Literature search: A comprehensive search is conducted to identify relevant studies. This involves searching electronic databases, such as PubMed or Web of Science, as well as manual searching of reference lists and contacting experts in the field. The search strategy should be transparent and replicable.
3. Study selection: The identified studies are screened against the predefined inclusion and exclusion criteria. Each study is evaluated independently by two or more reviewers to ensure consistency and minimize bias. Any discrepancies are resolved through discussion or by involving a third reviewer.
4. Data extraction: Relevant information is extracted systematically from each included study. This typically includes study characteristics (e.g., study design, sample size), participant characteristics, intervention/exposure details, outcome measures, and effect sizes or relevant statistics. Accurate data extraction is important to minimize errors.
5. Statistical analysis: The extracted data are analyzed using appropriate statistical methods. This may involve calculating summary statistics, such as effect sizes or odds ratios, and conducting meta-regression or subgroup analyses to explore sources of heterogeneity. The choice of statistical methods depends on the nature of the data and the research question.
6. Assessment of heterogeneity: Heterogeneity refers to the variability in effect sizes across studies. It is assessed with statistics such as the Q statistic or the I^2 statistic. If significant heterogeneity is present, further exploration through sensitivity analysis or subgroup analysis may be necessary.
7. Publication bias assessment: Publication bias refers to the selective publication of studies based on their findings. It is assessed with funnel plots or with statistical tests such as Egger's regression test. If publication bias is detected, appropriate adjustments, such as trim-and-fill analysis, may be applied to account for it.
8. Interpretation of results: The final step involves interpreting the results of the meta-analysis and drawing conclusions based on the available evidence. This includes discussing the strengths and limitations of the included studies, the overall effect size, and any implications for practice or future research.
Chinese answer:
Meta-analysis is a statistical technique for combining and analysing data from multiple studies on a specific research question or topic.
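Interpreting SROC data from Meta-DiSc
Introduction
Meta-DiSc is software commonly used in systematic reviews to evaluate the accuracy and reliability of diagnostic tests.
The SROC (summary receiver operating characteristic) curve is a graphical tool for visualizing the results produced by Meta-DiSc.
This article looks at how to interpret the SROC output of Meta-DiSc and explains its meaning and use.
What is Meta-DiSc
Meta-DiSc implements statistical methods for jointly evaluating the results of several independent studies.
It combines the results of the different studies and computes a pooled effect estimate.
It is based on meta-analysis: each independent study result is assigned a weight, and a combined effect estimate is derived from the weighted results.
Meta-DiSc also reports other statistics, such as sensitivity, specificity, positive likelihood ratio and negative likelihood ratio.
Why use Meta-DiSc
Using Meta-DiSc has the following advantages:
1. It integrates multiple study results: Meta-DiSc combines the results of several independent studies into one pooled result, giving a more accurate and reliable estimate.
2. It evaluates diagnostic test accuracy: Meta-DiSc assesses the accuracy and reliability of a diagnostic test, helping clinicians and researchers judge its practical value.
3. It provides a range of statistics: besides the pooled effect estimate, Meta-DiSc reports other statistics such as sensitivity and specificity, helping users evaluate the study results more fully.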
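The statistics listed above all derive from the 2x2 table of one diagnostic accuracy study. The sketch below uses the standard textbook definitions; the function name and the counts are hypothetical, and it does not reproduce Meta-DiSc's own code or its pooling.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Standard accuracy measures from one study's 2x2 table.

    tp/fn: diseased subjects testing positive/negative
    fp/tn: non-diseased subjects testing positive/negative
    All four cells are assumed to be non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)          # 1 - specificity
    lr_positive = sensitivity / false_positive_rate
    lr_negative = (1.0 - sensitivity) / specificity
    diagnostic_or = (tp * tn) / (fp * fn)         # equals LR+ / LR-
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "diagnostic_or": diagnostic_or,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in accuracy_measures(tp=90, fp=20, fn=10, tn=80).items():
        print(f"{name}: {value:.3f}")
```

The diagnostic odds ratio is the ratio of the two likelihood ratios, which is why it serves as a single summary of the paired measures.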
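Interpreting the SROC curve
The SROC curve is one of the main plots produced by Meta-DiSc; it describes the relationship between sensitivity and 1 − specificity.
Its interpretation is as follows.
Shape of the SROC curve
The SROC curve plots 1 − specificity on the horizontal axis and sensitivity on the vertical axis.
The curve can take several shapes, for example L-shaped, non-L-shaped or U-shaped.
Different shapes reflect different characteristics of the diagnostic test.
• L-shaped curve: a curve that drops abruptly to 1 − specificity = 0 and then rises gradually from that point.
An L-shaped curve is usually taken to indicate a diagnostic test without a threshold, that is, specificity is 1 at any threshold.
• Non-L-shaped curve: any curve that is not L-shaped.
Non-L-shaped curves come in several shapes, each indicating different test characteristics.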
BioMed Central
BMC Medical Research Methodology
Software
Meta-DiSc: a software for meta-analysis of test accuracy data
Javier Zamora*1, Victor Abraira1, Alfonso Muriel1, Khalid Khan2 and Arri Coomarasamy2
Address: 1 Clinical Biostatistics Unit, Ramón y Cajal Hospital, Madrid, Ctra. Colmenar km 9.100 Madrid 28034, Spain and 2 University of Birmingham and Birmingham Women's Hospital, Edgbaston, Birmingham, UK
Email: Javier Zamora* - javier.zamora@hrc.es; Victor Abraira - Victor.abraira@hrc.es; Alfonso Muriel - Alfonso.muriel@hrc.es; Khalid Khan - k.s.khan@; Arri Coomarasamy - arricoomar@
* Corresponding author
Abstract
Background: Systematic reviews and meta-analyses of test accuracy studies are increasingly being recognised as central in guiding clinical practice. However, there is currently no dedicated and comprehensive software for meta-analysis of diagnostic data. In this article, we present Meta-DiSc, a Windows-based, user-friendly, freely available (for academic use) software that we have developed, piloted, and validated to perform diagnostic meta-analysis.
Results: Meta-DiSc a) allows exploration of heterogeneity, with a variety of statistics including chi-square, I-squared and Spearman correlation tests, b) implements meta-regression techniques to explore the relationships between study characteristics and accuracy estimates, c) performs statistical pooling of sensitivities, specificities, likelihood ratios and diagnostic odds ratios using fixed and random effects models, both overall and in subgroups and d) produces high quality figures, including forest plots and summary receiver operating characteristic curves that can be exported for use in manuscripts for publication. All computational algorithms have been validated through comparison with different statistical tools and published meta-analyses.
Meta-DiSc has a Graphical User Interface with roll-down menus, dialog boxes, and online help facilities.
Conclusion: Meta-DiSc is a comprehensive and dedicated test accuracy meta-analysis software. It has already been used and cited in several meta-analyses published in high-ranking journals. The software is publicly available at http://www.hrc.es/investigacion/metadisc_en.htm.
Published: 12 July 2006
BMC Medical Research Methodology 2006, 6:31 doi:10.1186/1471-2288-6-31
Received: 31 March 2006
Accepted: 12 July 2006
This article is available from: /1471-2288/6/31
© 2006 Zamora et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
Accurate diagnosis forms the basis of good clinical care, as without it one can neither prognosticate correctly nor choose the right treatment. Indeed, a wrong diagnosis can harm patients by exposing them to inappropriate or sub-optimal therapy [1]. Thus studies of diagnostic accuracy, and particularly their systematic reviews and meta-analyses, are being recognised as instrumental in underpinning evidence-based clinical practice. Initiatives such as STARD [2] and developments within the Cochrane Collaboration [3] to accept protocols and reviews of test accuracy studies highlight the emphasis being given to evidence-based diagnosis.
Currently, there is only one test accuracy meta-analysis package, Meta-Test [4], which addresses some of the unique statistical issues related to test accuracy, such as pooling of sensitivities and specificities and summary receiver operating characteristics (sROC) analysis. However, it is a DOS-based application with an interface that many find difficult to use and to integrate into Windows-based applications.
Moreover, it lacks crucial analytical tools such as pooling of likelihood ratios (LRs), tests for heterogeneity and meta-regression facilities.
We, therefore, developed, piloted and validated a comprehensive, Windows-based test accuracy meta-analysis software, Meta-DiSc, which is presented in this article, with a worked example.
Implementation
Meta-DiSc software was created in Microsoft Visual Basic 6, and some mathematical routines have been linked from the NAG C mathematical library [5]. The software is distributed as a single file, downloadable freely from URL: http://www.hrc.es/investigacion/metadisc_en.htm. Its installation is simple, guided by onscreen instructions. The programme has a user-friendly interface with roll-down menus, dialog boxes and online HTML compiled help files. These help files include a user manual and a description of the implemented statistical methods.
Meta-DiSc allows data entry into its datasheet in three different ways: a) directly by typing data into the datasheet using the keyboard, b) copying from another spreadsheet (e.g. Microsoft Excel) and pasting into the Meta-DiSc datasheet, or c) importing text files from other sources (for example, in the comma delimited format). Several variables can be defined in the datasheet, including study identifiers, accuracy data from each study (true positives, false positives, true negatives and false negatives) and study level co-variates, such as those defining population spectrum or methodological quality of the studies.
Once the data have been entered into the datasheet of Meta-DiSc, various statistical analyses can be implemented (Figure 1). The implementation of these statistical procedures needs to be carefully thought through and judicious, as it may be inappropriate (or indeed misleading) to use all the procedures (particularly statistical pooling) in all reviews. Meta-DiSc provides analysts with adequate tools to assess the appropriateness of pooling.
Readers interested in details of these methods are referred to the statistical methods section of the help files (also available as a PDF standalone document [6]) and to existing texts and guidelines on diagnostic meta-analysis [7-10].
Describing the results of individual studies
When describing accuracy results from several studies, it is important to get an indication of the magnitude and precision of the accuracy estimates derived from each study, as well as to assess the presence or absence of inconsistencies in accuracy estimates across studies (heterogeneity). As accuracy estimates are paired and often inter-related (sensitivity and specificity, or LR positive and LR negative), it is necessary to report these simultaneously [11]. One accuracy measure that combines these paired measures is the diagnostic odds ratio (dOR) [12], which has limited clinical use, although it is useful in procedures like meta-regression (see below).
Meta-DiSc computes accuracy estimates and confidence intervals from individual studies and shows results either as numerical tabulations or graphical plots in two formats: a) forest plots, for sensitivities, specificities, LRs or dOR, with respective confidence intervals; and b) plots of individual study results in ROC space, with or without an sROC curve.
Exploring heterogeneity (threshold effect)
Exploring heterogeneity is a critical issue to a) understand the possible factors that influence accuracy estimates, and b) evaluate the appropriateness of statistical pooling of accuracy estimates from various studies. One of the primary causes of heterogeneity in test accuracy studies is threshold effect, which arises when differences in sensitivities and specificities or LRs occur due to different cut-offs or thresholds used in different studies to define a positive (or negative) test result.
When threshold effect exists, there is a negative correlation between sensitivities and specificities (or a positive correlation between sensitivities and 1-specificities), which results in a typical pattern of "shoulder arm" plot in a sROC space [8]. It is worth noting that correlation between sensitivity and specificity could arise due to a number of reasons other than threshold (e.g. partial verification bias, different spectrum of patients or different settings).
Figure 1. Available tools in Meta-DiSc. Tools implemented in the software Meta-DiSc to perform different steps of meta-analysis of diagnostic tests accuracy.
Meta-DiSc allows assessment for threshold effect in three different ways: a) visual inspection of relationship between pairs of accuracy estimates in forest plots. If threshold effect is present, the forest plots will show increasing sensitivities with decreasing specificities, or vice versa. The same inverse relationship will be apparent with LR positive and LR negative; b) representation of accuracy estimates from each study in a sROC space – a typical "shoulder arm" pattern would suggest presence of threshold effect; and c) computation of Spearman correlation coefficient between the logit of sensitivity and logit of 1-specificity. A strong positive correlation would suggest threshold effect.
Exploring for heterogeneity (other than threshold effect)
Apart from variations due to threshold effect, there are several other factors that can result in variations in accuracy estimates amongst different test accuracy studies in a review. These reasons include chance as well as variations in study population (e.g. severity of disease and co-morbidities), index test (differences in technology, assays, operator etc.), reference standard, and the way a study was designed and conducted [13].
Since such heterogeneity is almost always present in accuracy systematic reviews, testing for the presence and the extent of heterogeneity of results between primary studies, prior to undertaking any meta-analysis, is a critical part of any diagnostic review, as is exploration of the possible causes of heterogeneity [14].
Meta-DiSc allows users to test for heterogeneity amongst various studies in two different ways: a) Visual inspection of forest plots of accuracy estimates. If the studies are reasonably homogeneous, the accuracy estimates from individual studies will lie along a line corresponding to the pooled accuracy estimate. Large deviations from this line will indicate possible heterogeneity; b) statistical tests, including Chi-square and Cochran-Q, which are automatically implemented during analysis to evaluate if the differences across the studies are greater than expected by chance alone. A low p-value will suggest presence of heterogeneity beyond what could be expected by chance alone. In addition to these heterogeneity statistics, Meta-DiSc computes the inconsistency index (I-squared) which has been proposed as a measure to quantify the amount of heterogeneity [15].
Meta-regression
If substantial heterogeneity is found to be present from the analyses detailed above, then reasons for such heterogeneity can be explored by relating study level co-variates (e.g., population, test, reference standard or methodological features) to an accuracy measure, using meta-regression techniques. The accuracy measure that is normally used is dOR, as it is a unitary measure of diagnostic performance that encompasses both sensitivity and specificity or both LR positive and LR negative. Using dOR as a global measure of accuracy is a suitable method to compare the overall diagnostic accuracy of different tests [13].
However, its use is limited because it cannot be used directly in clinical practice and, furthermore, possible opposing effects of a study characteristic on sensitivity or specificity may be masked by using dOR.
Meta-DiSc implements meta-regression using a generalization of the Littenberg and Moses linear model [8,13] weighted by inverse of the variance or study size or unweighted. Random effects between studies can be estimated by different methods and added to the weighting scheme [16]. Estimations of coefficients of the model are performed by least squares method as implemented in NAG mathematical routines. The outcome variable is ln(dOR), which is related via a linear model to any number of study level covariates, optionally including the variable representing threshold effect [13]. The outputs from meta-regression modelling in Meta-DiSc are the coefficients of the model, as well as the ratio of dOR (rdOR) with respective confidence intervals. If a particular study level co-variate is significantly associated with diagnostic accuracy, then its coefficient will have a low p-value, and the rdOR will give a measure of magnitude of the association.
More advanced meta-regression techniques such as the Hierarchical sROC model [17] and bivariate analysis of sensitivity and specificity [18] have been developed. These methods overcome some of the statistical shortcomings inherent to the Littenberg and Moses model [8,19].
Statistical pooling
Statistical pooling is not always appropriate or necessary in every systematic review of test accuracy studies. However, when used appropriately, pooling can provide useful summary information. The necessary precondition for simple pooling (weighted averaging) of each of sensitivities, specificities, LR positives and LR negatives, is that the studies and results are reasonably homogeneous (i.e. no substantial heterogeneity, including threshold effect, is present).
If heterogeneity due to threshold effect were present, the accuracy data can be pooled by fitting a sROC curve and summarising that curve by means of the Area Under the Curve (AUC) or using other statistics such as the Q* index [19] (i.e. the point of the curve in which sensitivity equals specificity). If there is heterogeneity due to sources other than threshold effect, then pooling should only be attempted within homogeneous subsets, which would normally have been defined a priori.
Meta-DiSc has comprehensive functionality for statistical pooling: a) It allows pooling of sensitivities, specificities, LR positive and LR negative each separately, using either fixed or random effect [10,20] models. The output from these analyses is presented numerically in tables, and graphically as forest plots. Pooled estimates are provided with their respective confidence intervals; b) It implements several ways to fit a sROC curve when threshold effect is present. The default option is to compute a symmetrical sROC curve after fitting the linear model proposed by Littenberg and Moses. However, users can choose different options to fit this curve, for example, combining individual dORs by the Mantel-Haenszel or the DerSimonian Laird methods [10,20] to estimate an overall dOR, and then fitting an sROC curve. When the dOR changes with diagnostic threshold, the sROC curve is asymmetrical. Meta-DiSc allows the user to check for asymmetry of the sROC curve, and fit an asymmetrical sROC curve if appropriate.
Finally, Meta-DiSc allows estimation of AUC and the Q* index, along with their standard errors, as a summary measure of global accuracy which also aids inter-test comparisons; c) Meta-DiSc allows pooling of various summary measures within subgroups defined by study level co-variates with the help of a filter utility.
Wherever possible, the results of the above statistical procedures were validated using different general purpose statistical software such as STATA (ver 8.2) and SAS (8.2) using actually published and simulated data sets (Table 1).
Results
We illustrate the various procedures that Meta-DiSc implements in a case-study of ultrasound test in the diagnosis of uterine pathology [21,22]. Ultrasound measurement of the lining of the uterus (endometrium) can predict pathology such as endometrial hyperplasia (a pre-cancerous condition) or cancer. The greater the thickness of endometrium, the more likely that the target condition is present. Various thresholds (such as 3, 4 or 5 mm etc) have been used to define a positive ultrasound result.
A systematic review of test accuracy studies identified 57 studies. Figure 2 shows a datasheet in Meta-DiSc which has been loaded with information from these 57 studies. The information includes study identifiers, accuracy data, thresholds, and some study level co-variates (such as hormone replacement therapy use).
Table 1: Validation of statistical procedures. Validation of different statistical procedures using a simulated data-set. Results of Meta-DiSc (version 1.4) are compared with those obtained with metan (version 1.86) and metareg (version 1.06) STATA commands. Prior to the analyses, all four cells of all studies were added with 1/2 to avoid division by zero when computing some indices or standard errors. Meta-DiSc and STATA data-set are provided as additional files [see Additional file 1] and [see Additional file 2].
Procedure                     Meta-DiSc (version 1.4)   STATA (ver 8.2)
Random Effect Model
  Pooled +ve LR               2.447                     2.447
  (95% CI)                    (2.085 – 2.871)           (2.085 – 2.871)
  Tau-square                  0.0932                    0.0932
  Cochran-Q                   139.71                    139.71
  Pooled -ve LR               0.157                     0.157
  (95% CI)                    (0.095 – 0.257)           (0.095 – 0.257)
  Tau-square                  0.4631                    0.46357
  Cochran-Q                   33.00                     33.07
Fixed Effect Model
  Pooled +ve LR               2.330                     2.330
  (95% CI)                    (2.208 – 2.459)           (2.208 – 2.459)
  Cochran-Q                   139.71                    139.71
  Pooled -ve LR               0.105                     0.104
  (95% CI)                    (0.073 – 0.149)           (0.073 – 0.148)
  Cochran-Q                   33.00                     33.07
Meta-Regression (1)
  Tau-Square                  0.1141                    0.1141
  Constant coefficient (SE)   2.520 (0.8370)            2.5197 (0.83699)
  S coefficient (SE)          0.330 (0.1912)            0.3304 (0.19123)
  Covariable coefficient (SE) -0.036 (0.0904)           -0.0355 (0.09041)
(1) Meta-regression was weighted by the inverse of the variance of dOR and between study variance was estimated by REML.
As the first step in the analysis, we have used Meta-DiSc to present accuracy measures from each individual study in forest plots for sensitivities (figure 3a), specificities (figure 3b), LRs (figures 4a and 4b) and dOR (figure 5). All these indices can also be represented in tabular form as shown in table 2. Although the forest plots and the tables contain a pooled summary at the bottom, at this early stage in the analysis, it is recommended that the plots are used to obtain a general overview of the accuracy estimates from each study, and the interpretation of the pooled summary is left to later stages of analysis.
The next step is the representation of sensitivity against 1-specificity from each study in a ROC space (figure 6), which can be used for exploration for threshold effect. The pattern of the points in this plot suggests a "shoulder-arm" shape, indicating the possibility of threshold effect.
We therefore performed a Spearman rank correlation as a further test for threshold effect, and found a further indication of a threshold effect (Table 3, Spearman correlation coefficient = 0.394; p = 0.006). Having found some evidence of a threshold effect, we now focus on a subgroup of 21 studies that used a single threshold of >5 mm to define test positivity. Although an explicit threshold of 5 mm was used in these studies, there can still be an implicit threshold effect due to, for example, variation in the interpretation of the test results. Therefore, within this subgroup with an explicit threshold of 5 mm, it is still recommended that the above explorations for threshold effect are undertaken. We performed such analyses for this subgroup in Meta-DiSc, and found no evidence of a further threshold effect (data not shown). There are a number of other more advanced methods, not implemented in Meta-DiSc, that explicitly incorporate information about test thresholds defined between or within studies [17].

Figure 2. Meta-DiSc datasheet. Meta-DiSc data set with details of test accuracy studies of ultrasound in the prediction of endometrial cancer.

As the next step, heterogeneity arising from factors other than threshold effect is explored. We performed a visual exploration of the forest plots of accuracy measures for these 21 studies as well as statistical tests for heterogeneity (Meta-DiSc output not shown). In addition, possible sources of heterogeneity across the studies were explored using meta-regression analysis with the following co-variates as predictor variables: use or non-use of hormone replacement therapy (HRT); technique of ultrasound measurement (single or double layer); and population enrolment (consecutive or other). Results are shown in Table 4 and suggest that the number of layers is strongly associated with accuracy.
The double layer technique is associated with two times higher accuracy compared to single layer measurement (rdOR = 2.04; 95% CI: 1.01–4.13; p = 0.048).

The final step in the analysis is pooling, if this is considered appropriate. We illustrate pooling of the LRs for negative test results in one homogeneous subgroup of studies of non-HRT users, with a test threshold of ≤ 5 mm, and using a single layer technique (Figure 7). Finally, we demonstrate sROC curve fitting in the presence of threshold effect for the whole data-set in Figure 8.

Figure 3. Forest plot. Forest plot of sensitivities (3a) and specificities (3b) from test accuracy studies of ultrasound in the prediction of endometrial cancer.

Discussion and conclusion

Meta-DiSc allows description of individual study results; exploration of heterogeneity with a variety of statistics including chi-square, I-squared and Spearman correlation tests; implements meta-regression techniques to explore the relationships between study characteristics and accuracy estimates; performs statistical pooling of sensitivities, specificities, likelihood ratios and diagnostic odds ratios, using fixed and random effects models, both overall and in subgroups; and produces high quality figures, including forest plots and summary receiver operating characteristic curves, that can be exported for use in manuscripts for publication.

Meta-DiSc is evolving software. As new diagnostic meta-analytic methods become established over time, they will be implemented into the program. For example, the bivariate method of pooling sensitivity and specificity [18] is currently being developed. We will carefully follow the progress in this field. Once accepted as an established meta-analytic method, it will be implemented in Meta-DiSc. Along similar lines, methods of data extraction from individual studies that only provide accuracy measures are currently being developed within our department.
Once these methods have been verified, we will implement this option to assist systematic reviewers in extracting 2-by-2 tables from such studies.

Meta-DiSc is a comprehensive and dedicated test accuracy meta-analysis software. All computational algorithms in it have been validated through comparison with different statistical tools and published meta-analyses. Its use and citation in several meta-analyses published in high-ranking journals is evidence of external validation of its high quality [23-28].

Figure 4. Forest plot. Forest plot of likelihood ratios for positive (4a) and negative (4b) test results from studies of ultrasound in the prediction of endometrial cancer.

Figure 5. Forest plot. Forest plot of diagnostic odds ratios (dOR) from test accuracy studies of ultrasound in the prediction of endometrial cancer.

Table 2: Tabulation of likelihood ratio for positive test result (LR+) with respective 95% confidence intervals from all test accuracy studies included in the systematic review of ultrasound for prediction of endometrial cancer.

| Study | LR+ | [95% Conf. Interval] | % Weight |
|---|---|---|---|
| Auslender | 1.994 | 1.623 - 2.449 | 2.54 |
| Zannoni | 2.092 | 1.919 - 2.280 | 2.77 |
| Bakour | 1.895 | 1.490 - 2.408 | 2.45 |
| Botsis | 7.360 | 4.437 - 12.208 | 1.69 |
| Fistonic | 1.200 | 1.045 - 1.378 | 2.69 |
| Garuti | 1.471 | 1.358 - 1.593 | 2.78 |
| Granberg | 2.066 | 1.935 - 2.206 | 2.79 |
| Guner | 1.834 | 1.569 - 2.144 | 2.65 |
| Haller | 1.321 | 1.118 - 1.561 | 2.63 |
| Tsuda | 2.517 | 1.964 - 3.225 | 2.43 |
| Varner | 1.795 | 0.842 - 3.826 | 1.13 |
| Abu Ghazzeh | 1.215 | 0.538 - 2.745 | 1.03 |
| Briley | 1.855 | 1.396 - 2.465 | 2.33 |
| Cacciatore | 1.239 | 0.877 - 1.752 | 2.15 |
| DeSilva | 1.306 | 0.245 - 6.957 | 0.34 |
| Granberg | 3.937 | 2.933 - 5.284 | 2.30 |
| Grigoriou | 2.946 | 2.430 - 3.572 | 2.57 |
| Gu | 1.307 | 0.956 - 1.787 | 2.25 |
| Gupta | 1.846 | 0.783 - 4.350 | 0.96 |
| Hänggi | 4.000 | 2.472 - 6.473 | 1.76 |
| Ivanov | 2.273 | 1.691 - 3.054 | 2.30 |
| Karlsson | 2.649 | 1.936 - 3.627 | 2.24 |
| Loverro | 5.957 | 3.648 - 9.729 | 1.73 |
| Malinova | 1.963 | 1.591 - 2.421 | 2.53 |
| Merz | 1.697 | 1.287 - 2.236 | 2.35 |
| Nasri | 2.740 | 1.833 - 4.096 | 1.98 |
| Nasri | 2.400 | 1.711 - 3.367 | 2.17 |
| Pertl | 1.293 | 1.115 - 1.499 | 2.67 |
| Suchocki | 1.120 | 1.027 - 1.222 | 2.77 |
| Taviani | 1.802 | 0.983 - 3.304 | 1.44 |
| Weber | 1.618 | 1.374 - 1.904 | 2.64 |
| Wolman | 2.481 | 1.556 - 3.956 | 1.80 |
| Moreles | 2.312 | 1.845 - 2.896 | 2.49 |
| Rudigoz | 2.981 | 1.638 - 5.426 | 1.46 |
| Todorova | 1.667 | 0.729 - 3.808 | 1.01 |
| Gruboeck | 7.036 | 3.689 - 13.422 | 1.35 |
| Chan | 2.543 | 1.779 - 3.635 | 2.12 |
| Degenhardt | 2.516 | 1.856 - 3.411 | 2.27 |
| Dijkhuizen | 1.859 | 1.389 - 2.489 | 2.31 |
| Brolmann | 2.017 | 1.487 - 2.736 | 2.27 |
| Ceccini | 3.267 | 2.655 - 4.021 | 2.54 |
| Masearetti | 2.059 | 1.096 - 3.866 | 1.38 |
| Mortakis | 2.213 | 1.602 - 3.058 | 2.22 |
| Schramm | 1.241 | 0.899 - 1.714 | 2.22 |
| Smith | 1.938 | 1.252 - 3.001 | 1.88 |
| Osmers | 1.964 | 1.699 - 2.271 | 2.68 |
| Seelbach-Göbel | 1.680 | 1.455 - 1.940 | 2.68 |
| Altuncu et al. | 29.167 | 4.089 - 208.02 | 0.25 |
| (REM) pooled LR+ | 2.087 | 1.881 - 2.315 | |

Heterogeneity chi-squared = 506.06 (d.f. = 47), p = 0.000
Inconsistency (I-square) = 90.7%
No. studies = 48
Filter OFF
Add 1/2 only zero cell studies

Table 3: Results of Spearman rank correlation of sensitivity against (1 – specificity) to assess the threshold effect in all test accuracy studies included in the systematic review of ultrasound for prediction of endometrial cancer.

| Var. | Coeff. | Std. Error | T | p-value |
|---|---|---|---|---|
| a | 2.412 | 0.292 | 8.266 | 0.0000 |
| b(1) | 0.187 | 0.101 | 1.857 | 0.0697 |

Spearman correlation coefficient: 0.394, p-value = 0.006 (Logit(TPR) vs Logit(FPR))
Moses' model (D = a + bS)
Unweighted regression
Tau-squared estimate = 0.3540 (convergence is achieved after 2 iterations)
Restricted Maximum Likelihood estimation (REML)
No. studies = 48
Filter OFF

Table 4: Results of meta-regression analysis for predicting the presence or absence of endometrial carcinoma with variables: use or non-use of hormone replacement therapy (HRT); technique of ultrasound measurement (single or double layer); and population enrolment (consecutive or other).

Meta-Regression (Inverse Variance weights) (1)

| Var. | Coeff. | p-value | RDOR | [95% CI] |
|---|---|---|---|---|
| Cte. | 0.857 | 0.1571 | - | - |
| S | 0.263 | 0.0208 | - | - |
| Layers | 0.709 | 0.0610 | 2.03 | (0.97; 4.27) |
| Consecutive | 0.206 | 0.7398 | 1.23 | (0.35; 4.26) |
| HRT | 0.324 | 0.4152 | 1.38 | (0.63; 3.06) |

Meta-Regression (Inverse Variance weights) (2)

| Var. | Coeff. | p-value | RDOR | [95% CI] |
|---|---|---|---|---|
| Cte. | 0.849 | 0.1565 | - | - |
| S | 0.253 | 0.0194 | - | - |
| Layers | 0.739 | 0.0424 | 2.09 | (1.03; 4.27) |
| HRT | 0.320 | 0.4152 | 1.38 | (0.63; 3.02) |

Meta-Regression (Inverse Variance weights) (3)

| Var. | Coeff. | p-value | RDOR | [95% CI] |
|---|---|---|---|---|
| Cte. | 0.959 | 0.0999 | - | - |
| S | 0.258 | 0.0166 | - | - |
| Layers | 0.712 | 0.0482 | 2.04 | (1.01; 4.13) |

Figure 6. ROC space. Representation of sensitivity against (1-specificity) in Receiver Operating Characteristics space for each study of ultrasound in the prediction of endometrial cancer.

Figure 7. Forest plot. Forest plots of likelihood ratios for positive (7a) and negative (7b) test results in one homogeneous subgroup of studies of non-HRT users, with a test threshold of ≤ 5 mm, and using a single layer technique.

Figure 8. sROC curve. Receiver operating characteristics curve for all studies included in the systematic review of ultrasound for prediction of endometrial cancer.

Availability and requirements

The software is publicly available at http://www.hrc.es/investigacion/metadisc_en.htm.

Operating system: The software runs on Windows based personal computers (Windows 95 or higher) with a Pentium-class processor or equivalent, with a minimum of 32 MB of RAM and a minimum of 20 MB of hard disk space.
SVGA color monitor; minimum 800 × 600 screen resolution and 256 colors.

Licence: Freeware for academic use.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

JZ conceived the idea. AM, VA and JZ developed the software. AC and KSK tested the software on a number of reviews and gave suggestions for improvements. All authors participated in preparing this manuscript.

Additional material

Additional File 1. Meta-DiSc data set. This file contains simulated data. It is provided to help users to validate the statistical procedures shown in table 1. [/content/supplementary/1471-2288-6-31-S1.dsc]

Additional File 2. STATA data set. This file contains simulated data. It is provided to help users to validate the statistical procedures shown in table 1. [/content/supplementary/1471-2288-6-31-S2.dta]

Acknowledgements

This work has been partly funded by Spanish Health Ministry Grants no PI02/0954, G03/090 and PI04/1055.

References

1. Thomson R, McElroy H, Sudlow M: Guidelines on anticoagulant treatment in atrial fibrillation in Great Britain: variation in content and implications for treatment. BMJ 1998, 316:509-513.
2. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC: Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Radiology 2003, 226:24-28.
3. Collaboration C: Methods Groups Newsletter. http://wwwcochrane org/newslett/MGNews-2004 pdf 2006 [http:// /newslett/MGNews-2004.pdf].
4. Lau J: Meta-Test. Boston: New England Medical Center; 1997.
5. The NAG C Library, Mark 6. Oxford: Numerical Algorithms Group; 2004.
6. Zamora J, Muriel A, Abraira V: Meta-DiSc Statistical Methods. 2006 [ftp://ftp.hrc.es/pub/programas/metadisc/MetaDisc_StatisticalMethods.pdf].
7. Irwig L, Tosteson ANA, Gatsonis C, Lau J, Colditz G, Chalmers TC, Mosteller F: Guidelines for Metaanalyses Evaluating Diagnostic-Tests. Annals of Internal Medicine 1994, 120:667-676.
8. Moses LE, Shapiro D, Littenberg B: Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993, 12:1293-1316.
9. Deville WL, Buntinx F, Bouter LM, Montori VM, de Vet HC, van der Windt DA, Bezemer PD: Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002, 2:9.
10. Deeks JJ: Systematic reviews of evaluations of diagnostic and screening tests studies. In Systematic reviews in health care: meta-analysis in context. 2nd edition. Edited by: Egger M, Davey SG and Altman DG. BMJ Books; 2001.
11. Honest H, Khan KS: Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Services Research 2002, 2:.
12. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM: The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003, 56:1129-1135.
13. Lijmer JG, Bossuyt PM, Heisterkamp SH: Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002, 21:1525-1537.
14. Dinnes J, Deeks J, Kirby J, Roderick P: A methodological review of how heterogeneity has been examined in systematic reviews of diagnostic test accuracy. Health Technol Assess 2005, 9:1-113.
15. Higgins JP, Thompson SG: Quantifying heterogeneity in a meta-analysis. Stat Med 2002, 21:1539-1558.
16. Thompson SG, Sharp SJ: Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 1999, 18:2693-2708.
17. Rutter CM, Gatsonis CA: A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 2001, 20:2865-2884.
18. Reitsma JB, Glas AS, Rutjes AWS, Scholten RJPM, Bossuyt PM, Zwinderman AH: Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005, 58:982-990.
19. Walter SD: Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med 2002, 21:1237-1256.
20. DerSimonian R, Laird N: Meta-analysis in clinical trials. Control Clin Trials 1986, 7:177-188.
21. Gupta JK, Chien PF, Voit D, Clark TJ, Khan KS: Ultrasonographic endometrial thickness for diagnosing endometrial pathology in women with postmenopausal bleeding: a meta-analysis. Acta Obstet Gynecol Scand 2002, 81:799-816.
22. Khan KS, Kunz R, Kleijnen J, Antes G: Case study 4: Reviewing evidence on test accuracy. In Systematic Review to Support Evidence-based Medicine. 2003 edition. London, The Royal Society of Medicine; 2003:109-119.
23. Morgan M, Kalantri S, Flores L, Pai M: A commercial line probe assay for the rapid detection of rifampicin resistance in Mycobacterium tuberculosis: a systematic review and meta-analysis. BMC Infectious Diseases 2005, 5:62.
24. Flores L, Pai M, Colford JM, Riley LW: In-house nucleic acid amplification tests for the detection of Mycobacterium tuberculosis in sputum specimens: meta-analysis and meta-regression. BMC Microbiol 2005, 5:55.
25. Gisbert J, Abraira V: Accuracy of Helicobacter pylori Diagnostic Tests in Patients with Bleeding Peptic Ulcer: A Systematic Review and Meta-analysis. The American Journal of Gastroenterology 2006, 101:848-863.
26. Shiga T, Wajima Z, Inoue T, Sakamoto A: Predicting difficult intubation in apparently normal patients: a meta-analysis of bedside screening test performance. Anesthesiology 2005, 103:429-437.
27. Zijlstra JM, van der Werf G, Hoekstra OS, Hooft L, Huijgens PC: F-fluoro-deoxyglucose positron emission tomography for post-treatment evaluation of malignant lymphoma: a systematic review. Haematologica 2006, 91:522-9.
28. Goodacre S, Sutton AJ, Sampson FC: Meta-Analysis: The Value of Clinical Assessment in the Diagnosis of Deep Venous Thrombosis. Annals of Internal Medicine 2005, 143:129-139.

Pre-publication history

The pre-publication history for this paper can be accessed here: /1471-2288/6/31/prepub
Statistical methods

General principles

Meta-analysis is a two-stage process [1]. In the first stage, a summary statistic is calculated for each study. Unlike in controlled trials, in the evaluation of diagnostic tests each study is summarized by a pair of statistics that measure the test's accuracy. The pair is usually either sensitivity and specificity or the positive and negative likelihood ratios. In the second stage, the overall test accuracy indexes are calculated as weighted averages of these summary statistics. Meta-analysis should only be performed when studies have recruited clinically similar patients and have used comparable experimental and reference tests. When there is considerable heterogeneity in study results, the reviewer should investigate the reasons for these differences rather than report a pooled estimate.

This document describes the different tools implemented by Meta-DiSc: i) summarizing data from each individual study, ii) investigating the homogeneity of studies graphically and statistically, iii) computing the pooled indexes and iv) exploring heterogeneity.

Summary statistics in individual studies

The results of each individual study should be presented in a 2 x 2 table (Table 1) showing the number of people who have been classified as positive and negative by the experimental test among the groups of participants with and without disease according to the reference test. In the formulas below, a and c are the numbers of test positives and test negatives among the D = a + c participants with disease, and b and d are the numbers of test positives and test negatives among the ND = b + d participants without disease.

Accuracy can be expressed by sensitivity (proportion of positives among people with disease) and specificity (proportion of negatives among people without disease)

$$Sen = \frac{a}{D} \qquad Spe = \frac{d}{ND}$$

or by the likelihood ratios

$$LR+ = \frac{Sen}{1 - Spe} = \frac{a/D}{b/ND} \qquad LR- = \frac{1 - Sen}{Spe} = \frac{c/D}{d/ND}$$

The likelihood ratios express how much more frequent the respective result is among subjects with disease than among subjects without disease.

Another measure of test accuracy, useful in meta-analysis, is the diagnostic odds ratio (DOR)

$$DOR = \frac{LR+}{LR-} = \frac{a \times d}{b \times c}$$

The DOR expresses how much greater the odds of having the disease are for the people with a positive test result than for the people with a negative test result. It is a single measure of diagnostic test performance that combines both likelihood ratios.

Standard errors and confidence intervals

The confidence intervals of sensitivity and specificity are calculated using the F distribution method [2] to compute the exact confidence limits for the binomial proportion (x/n):

$$LL = \left[1 + \frac{n - x + 1}{x\,F_{2x,\;2(n-x+1),\;1-\alpha/2}}\right]^{-1} \qquad UL = \left[1 + \frac{n - x}{(x+1)\,F_{2(x+1),\;2(n-x),\;\alpha/2}}\right]^{-1}$$

where $F_{\nu_1,\nu_2,p}$ is the value of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom that is exceeded with probability p.

The distributions of the logarithms of the likelihood ratios are approximately normal and their standard errors are [1]:

$$SE(\ln LR+) = \sqrt{\frac{1}{a} - \frac{1}{D} + \frac{1}{b} - \frac{1}{ND}} \qquad SE(\ln LR-) = \sqrt{\frac{1}{c} - \frac{1}{D} + \frac{1}{d} - \frac{1}{ND}}$$

Thus the confidence intervals of the LRs are

$$LR \times e^{\pm z_{\alpha/2}\,SE(\ln LR)}$$

The distribution of the logarithm of the diagnostic odds ratio is also approximately normal, with standard error given [1] by

$$SE(\ln DOR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Thus the confidence interval of the DOR is

$$DOR \times e^{\pm z_{\alpha/2}\,SE(\ln DOR)}$$

Assessing homogeneity

The degree of variability among study results should first be evaluated graphically by plotting the sensitivity and specificity from each study on a forest plot. Some divergence is to be expected by chance, but variation in other factors may increase the observed heterogeneity.

There is one important extra source of variation in meta-analysis of diagnostic accuracy: the studies included may have used, explicitly or implicitly, different thresholds to define positive and negative test results. To explore this source of variation, it is useful to plot sensitivity and specificity on an ROC plane. If such a threshold effect exists, the points will show a curvilinear pattern. One can also test for this threshold effect by calculating the Spearman correlation coefficient between sensitivity and specificity. If the threshold effect exists, an inverse correlation appears [3].
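The per-study indices and their log-scale confidence intervals described in this section can be sketched in a few lines of Python. This is a minimal illustration, not Meta-DiSc's own code: the function name `study_indices`, the dictionary keys and the example counts are hypothetical, the cell naming (a, b, c, d) follows the formulas for Sen, Spe and DOR above, and the standard errors are the usual large-sample ones on the log scale. It assumes no cell is zero.

```python
import math


def study_indices(a, b, c, d, z=1.96):
    """Accuracy indices for one 2 x 2 table (hypothetical helper).

    a, c: test positives / negatives among the diseased (D = a + c)
    b, d: test positives / negatives among the non-diseased (ND = b + d)
    Assumes every cell is > 0.
    """
    D, ND = a + c, b + d
    sen, spe = a / D, d / ND
    lr_pos = (a / D) / (b / ND)
    lr_neg = (c / D) / (d / ND)
    dor = (a * d) / (b * c)

    # Large-sample standard errors on the log scale
    se_ln_lr_pos = math.sqrt(1 / a - 1 / D + 1 / b - 1 / ND)
    se_ln_lr_neg = math.sqrt(1 / c - 1 / D + 1 / d - 1 / ND)
    se_ln_dor = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def log_ci(estimate, se):
        # Interval is symmetric on the log scale, then back-transformed
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return {
        "sen": sen,
        "spe": spe,
        "lr_pos": lr_pos,
        "lr_pos_ci": log_ci(lr_pos, se_ln_lr_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": log_ci(lr_neg, se_ln_lr_neg),
        "dor": dor,
        "dor_ci": log_ci(dor, se_ln_dor),
    }


# Hypothetical study: 40 TP, 10 FP, 10 FN, 90 TN
example = study_indices(40, 10, 10, 90)
print(example["sen"], example["spe"], example["dor"])
```

The exact F-based limits for sensitivity and specificity are left out on purpose, since they need an F (or beta) quantile function that the Python standard library does not provide.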
When a threshold effect is present, combining study results involves fitting an ROC curve rather than pooling sensitivities and specificities or likelihood ratios.

The homogeneity of the sensitivities and specificities can also be tested by applying the likelihood ratio test [4]. (From here on, we use the notation in Table 1, with subscript i to designate an individual study and T for an overall index.)

$$G^2_{Sen} = 2\sum_i \left[ a_i \ln\frac{a_i}{D_i\,a_T/D_T} + c_i \ln\frac{c_i}{D_i\,c_T/D_T} \right], \qquad a_T = \sum_i a_i,\quad c_T = \sum_i c_i,\quad D_T = \sum_i D_i$$

$$G^2_{Spe} = 2\sum_i \left[ d_i \ln\frac{d_i}{ND_i\,d_T/ND_T} + b_i \ln\frac{b_i}{ND_i\,b_T/ND_T} \right], \qquad d_T = \sum_i d_i,\quad b_T = \sum_i b_i,\quad ND_T = \sum_i ND_i$$

Under the homogeneity hypothesis, both statistics have an asymptotic chi-squared distribution with k - 1 degrees of freedom (k being the number of studies).

The homogeneity of likelihood ratios and diagnostic odds ratios can be tested using Cochran's Q test based upon inverse variance weights [1], which also has a chi-squared distribution with k - 1 degrees of freedom.

$$Q = \sum_i w_i \left( \ln\theta_i - \ln\theta_T \right)^2, \qquad w_i = \frac{1}{SE(\ln\theta_i)^2}$$

where θ is the positive or negative likelihood ratio or the diagnostic odds ratio.

As meta-analyses often include small numbers of studies, the power of both tests (G² and Q) is low, so they often fail to detect true heterogeneity among studies as significant. An alternative approach that quantifies the effect of heterogeneity is the I² index, which describes the percentage of total variation across studies that is due to heterogeneity rather than chance [5]. I² is calculated as

$$I^2 = \frac{\chi^2 - d.f.}{\chi^2} \times 100\%$$

where χ² is the G² or Q statistic and d.f. its degrees of freedom.

Pooled indexes

Sensitivities, specificities and likelihood ratios should only be pooled in the absence of variability of the diagnostic threshold.

Sensitivity and specificity are pooled by

$$Sen_T = \frac{\sum_i a_i}{\sum_i D_i} \qquad Spe_T = \frac{\sum_i d_i}{\sum_i ND_i}$$

These formulas correspond to weighted averages in which the weight of each study is its sample size.

Likelihood ratios and diagnostic odds ratios can be pooled by the Mantel-Haenszel method (fixed effects model) or by the DerSimonian-Laird method (random effects model), which incorporates variation among studies. Both methods compute a weighted average; they differ in the weights used and in the "effect size" that is averaged. With the Mantel-Haenszel method the DORs or LRs themselves are averaged, whereas with the DerSimonian-Laird method the logarithms of the DORs or LRs are averaged [1].

$$\theta_T^{MH} = \frac{\sum_i w_i^{MH}\,\theta_i}{\sum_i w_i^{MH}} \qquad \ln\theta_T^{DL} = \frac{\sum_i w_i^{DL}\,\ln\theta_i}{\sum_i w_i^{DL}}$$

The Mantel-Haenszel weights are:

$$DOR:\; w_i^{MH} = \frac{b_i\,c_i}{T_i} \qquad LR+:\; w_i^{MH} = \frac{b_i\,D_i}{T_i} \qquad LR-:\; w_i^{MH} = \frac{d_i\,D_i}{T_i}$$

where $T_i = D_i + ND_i$ is the total sample size of study i.

For all statistics, the DerSimonian-Laird weights are:

$$w_i^{DL} = \frac{1}{SE(\ln\theta_i)^2 + \tau^2}$$

where θ_i is the likelihood ratio or diagnostic odds ratio and τ² is an estimate of the between-study variance given by

$$\tau^2 = \begin{cases} \dfrac{Q - (k-1)}{\sum_i w_i - \dfrac{\sum_i w_i^2}{\sum_i w_i}} & \text{if } Q > k-1 \\[2ex] 0 & \text{if } Q \le k-1 \end{cases}$$

Q stands for the Cochran homogeneity statistic calculated using the Mantel-Haenszel overall estimate, and w_i for the inverse variance weights.

Standard errors and confidence intervals of pooled indexes

The confidence intervals of overall sensitivity and specificity are also calculated using the F distribution method [2] to compute the exact confidence limits for the binomial proportion. However, Meta-DiSc optionally computes them using an overdispersion correction.
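The pooling and heterogeneity formulas of this section can be illustrated for the DOR with a short Python sketch. It follows the description in the text (Mantel-Haenszel pooled DOR, Cochran's Q computed around the Mantel-Haenszel estimate with inverse variance weights, I², and the DerSimonian-Laird estimate), but it is only a sketch under stated assumptions: the function names `pool_dor` and `i_squared` and the example tables are hypothetical, negative I² values are reported as 0, the standard error of the random effects estimate is taken as the inverse square root of the summed DerSimonian-Laird weights, and no zero-cell correction is applied.

```python
import math


def i_squared(chi2, df):
    """Inconsistency index in percent; negative values are reported as 0."""
    if chi2 <= 0:
        return 0.0
    return max(0.0, (chi2 - df) / chi2 * 100.0)


def pool_dor(tables, z=1.96):
    """Pool diagnostic odds ratios from (a, b, c, d) tables without zero cells."""
    k = len(tables)
    ln_dor = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    # Inverse variance weights: 1 / SE(ln DOR)^2
    w_iv = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]

    # Mantel-Haenszel (fixed effects): weights b*c/T applied to the DORs
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    dor_mh = num / den

    # Cochran's Q around the Mantel-Haenszel estimate
    q = sum(w * (y - math.log(dor_mh)) ** 2 for w, y in zip(w_iv, ln_dor))

    # Moment estimate of the between-study variance
    sw = sum(w_iv)
    denom = sw - sum(w * w for w in w_iv) / sw
    tau2 = (q - (k - 1)) / denom if q > k - 1 else 0.0

    # DerSimonian-Laird (random effects): average of the log DORs
    w_dl = [1.0 / (1.0 / w + tau2) for w in w_iv]
    ln_dl = sum(w * y for w, y in zip(w_dl, ln_dor)) / sum(w_dl)
    se_dl = 1.0 / math.sqrt(sum(w_dl))

    return {
        "dor_mh": dor_mh,
        "q": q,
        "i2": i_squared(q, k - 1),
        "tau2": tau2,
        "dor_dl": math.exp(ln_dl),
        "dor_dl_ci": (math.exp(ln_dl - z * se_dl), math.exp(ln_dl + z * se_dl)),
    }


# Hypothetical data: three studies as (a, b, c, d)
result = pool_dor([(40, 10, 10, 90), (30, 20, 20, 80), (45, 5, 15, 85)])
print(result)

# I-squared from the heterogeneity statistic reported in Table 2
print(round(i_squared(506.06, 47), 1))
```

The last line reproduces the inconsistency value shown under Table 2 (90.7%) from its chi-squared statistic and degrees of freedom. The same structure applies to LR+ and LR- once the corresponding effect sizes, standard errors and Mantel-Haenszel weights are substituted.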
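With the overdispersion correction, the normal approximation to the binomial is used, i.e.

$$SE(Sen_T) = \sqrt{\frac{Sen_T\,(1 - Sen_T)}{D_T}} \qquad SE(Spe_T) = \sqrt{\frac{Spe_T\,(1 - Spe_T)}{ND_T}}$$

and the confidence intervals [6] corrected for overdispersion are:

$$Sen_T \pm z_{\alpha/2}\,\varphi_{Sen}\,SE(Sen_T) \qquad Spe_T \pm z_{\alpha/2}\,\varphi_{Spe}\,SE(Spe_T)$$

with the correction factors

$$\varphi_{Sen} = \sqrt{\frac{\chi^2_{Sen}}{k-1}} \quad\text{with}\quad \chi^2_{Sen} = \sum_i \left[ \frac{\left(a_i - \dfrac{a_T\,D_i}{D_T}\right)^2}{\dfrac{a_T\,D_i}{D_T}} + \frac{\left(c_i - \dfrac{c_T\,D_i}{D_T}\right)^2}{\dfrac{c_T\,D_i}{D_T}} \right]$$

$$\varphi_{Spe} = \sqrt{\frac{\chi^2_{Spe}}{k-1}} \quad\text{with}\quad \chi^2_{Spe} = \sum_i \left[ \frac{\left(d_i - \dfrac{d_T\,ND_i}{ND_T}\right)^2}{\dfrac{d_T\,ND_i}{ND_T}} + \frac{\left(b_i - \dfrac{b_T\,ND_i}{ND_T}\right)^2}{\dfrac{b_T\,ND_i}{ND_T}} \right]$$

The distributions of the logarithms of the Mantel-Haenszel overall likelihood ratios and overall DOR are approximately normal, with standard errors given [1] by:

$$SE(\ln LR+_T) = \sqrt{\frac{P}{U\,V}} \qquad SE(\ln LR-_T) = \sqrt{\frac{P'}{U'\,V'}} \qquad SE(\ln DOR_T) = \sqrt{\frac{E}{2R^2} + \frac{F + G}{2RS} + \frac{H}{2S^2}}$$

where

$$P = \sum_i \frac{D_i\,ND_i\,(a_i + b_i) - a_i\,b_i\,T_i}{T_i^2} \qquad U = \sum_i \frac{a_i\,ND_i}{T_i} \qquad V = \sum_i \frac{b_i\,D_i}{T_i}$$

$$P' = \sum_i \frac{D_i\,ND_i\,(c_i + d_i) - c_i\,d_i\,T_i}{T_i^2} \qquad U' = \sum_i \frac{c_i\,ND_i}{T_i} \qquad V' = \sum_i \frac{d_i\,D_i}{T_i}$$

$$R = \sum_i \frac{a_i\,d_i}{T_i} \qquad S = \sum_i \frac{b_i\,c_i}{T_i}$$

$$E = \sum_i \frac{(a_i + d_i)\,a_i\,d_i}{T_i^2} \qquad F = \sum_i \frac{(a_i + d_i)\,b_i\,c_i}{T_i^2} \qquad G = \sum_i \frac{(b_i + c_i)\,a_i\,d_i}{T_i^2} \qquad H = \sum_i \frac{(b_i + c_i)\,b_i\,c_i}{T_i^2}$$

Thus the confidence intervals are:

$$LR_T \times e^{\pm z_{\alpha/2}\,SE(\ln LR_T)} \qquad DOR_T \times e^{\pm z_{\alpha/2}\,SE(\ln DOR_T)}$$

The distributions of the logarithms of the DerSimonian-Laird overall likelihood ratios and overall DOR are also approximately normal, with standard error given [1] by:

$$SE(\ln\theta_T^{DL}) = \frac{1}{\sqrt{\sum_i w_i^{DL}}}$$

Thus the confidence intervals are:

$$\theta_T^{DL} \times e^{\pm z_{\alpha/2}\,SE(\ln\theta_T^{DL})}$$

ROC curves

If there is any evidence of diagnostic threshold variation among studies, the best summary of study results will be an ROC curve rather than a single point. The shape of the ROC curve depends on the underlying distribution of test results in patients with and without the disease [7]. There are two methods of fitting the ROC curve.

Diagnostic tests where the DOR is constant regardless of the diagnostic threshold have curves that are symmetrical around the "Sen = Spe" line. In these situations, it is possible to combine DORs by the Mantel-Haenszel or the DerSimonian-Laird methods to estimate the overall DOR and hence to determine the best-fitting ROC curve [8].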
The equation of the symmetrical curve is given by

$$Sen = \frac{1}{1 + \dfrac{1}{DOR_T \times \left(\dfrac{1 - Spe}{Spe}\right)}}$$

When the DOR changes with the diagnostic threshold, the ROC curve is asymmetrical. To study how the DOR varies with the threshold, and thereby fit symmetrical or asymmetrical curves, the Moses-Shapiro-Littenberg method [8] is used.

The method consists of studying this relationship by fitting the straight line

$$D = a + bS$$

where D is the log of the DOR and S is a measure of threshold given by

$$S = \ln\left(\frac{Sen}{1 - Sen} \times \frac{1 - Spe}{Spe}\right)$$

Estimates of the parameters a and b, and of their standard errors and covariance, are obtained by the ordinary or weighted least squares method using the NAG C library [9]. The weights can be simply the sample size or the inverse of the variance of the log of the DOR. Optionally, random effects between studies can be taken into account using one of three different iterative methods (restricted maximum likelihood, maximum likelihood and empirical Bayes) [10].

Testing the hypothesis of whether or not diagnostic performance (measured by the DOR) varies with threshold is equivalent to testing whether the parameter b = 0. If b = 0, there is no variation, the method yields a symmetrical ROC curve, and $e^a$ is an estimate of the overall DOR. However, if b ≠ 0, variation exists and the ROC curve is asymmetrical, given by the equation

$$Sen = \frac{1}{1 + \dfrac{1}{e^{a/(1-b)} \times \left(\dfrac{1 - Spe}{Spe}\right)^{(1+b)/(1-b)}}}$$

A useful statistic in pooling studies by means of the ROC curve is the area under the curve (AUC), which summarizes the diagnostic performance as a single number [11]: a perfect test will have an AUC close to 1 and poor tests have AUCs close to 0.5. The AUC is computed by numerical integration of the curve equation using the trapezoidal method.

Another useful statistic is the Q* index, defined by the point where sensitivity and specificity are equal, which for a symmetrical curve is the point closest to the ideal top-left corner of the ROC space.
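The curve-fitting steps above can be sketched in Python as follows. This is an illustrative, unweighted version only: the function names (`fit_moses`, `sroc_sen`, `auc_trapezoid`, `q_star`) and the synthetic points are hypothetical, the fit is plain ordinary least squares rather than the NAG routines or the weighted and random effects options mentioned in the text, and |b| < 1 is assumed. Q* is obtained from the definitions of D and S: where Sen = Spe, S = 0 and D = a, so logit(Q*) = a/2.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_moses(points):
    """Unweighted least-squares fit of D = a + b*S.

    points: list of (sensitivity, specificity), each strictly between 0 and 1.
    """
    D = [logit(sen) + logit(spe) for sen, spe in points]  # ln DOR
    S = [logit(sen) - logit(spe) for sen, spe in points]  # threshold measure
    n = len(points)
    s_mean, d_mean = sum(S) / n, sum(D) / n
    sxx = sum((s - s_mean) ** 2 for s in S)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(S, D))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sen(fpr, a, b):
    """Sensitivity on the fitted curve at false positive rate fpr = 1 - Spe."""
    if fpr <= 0.0:
        return 0.0
    if fpr >= 1.0:
        return 1.0
    odds = fpr / (1.0 - fpr)
    g = math.exp(a / (1.0 - b)) * odds ** ((1.0 + b) / (1.0 - b))
    return g / (1.0 + g)


def auc_trapezoid(a, b, steps=1000):
    """Area under the fitted curve over fpr in [0, 1] by the trapezoidal rule."""
    h = 1.0 / steps
    ys = [sroc_sen(i * h, a, b) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def q_star(a):
    """Point on the curve where Sen = Spe (there S = 0, so logit(Q*) = a / 2)."""
    return 1.0 / (1.0 + math.exp(-a / 2.0))


# Synthetic studies lying on a symmetrical curve with DOR = 36
true_ln_dor = math.log(36)
points = [(1.0 / (1.0 + math.exp(-(true_ln_dor - logit(spe)))), spe)
          for spe in (0.95, 0.90, 0.80, 0.70)]
a, b = fit_moses(points)
print(a, b, auc_trapezoid(a, b), q_star(a))
```

Because the synthetic points share one DOR, the fitted slope b is essentially zero and exp(a) recovers the DOR used to generate them; with real data b will generally differ from zero and the asymmetrical form of the curve is the one being integrated.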
The Q* index is calculated [12] by:

$$Q^* = \frac{\sqrt{DOR_T}}{1 + \sqrt{DOR_T}}$$

Standard errors of AUC and Q* and confidence intervals of ROC curve

The standard error of the area under the symmetrical ROC curve is given [12] by

$$SE(AUC_{sym}) = \frac{DOR_T}{(DOR_T - 1)^3}\left[ (DOR_T + 1)\ln DOR_T - 2\,(DOR_T - 1) \right] SE(\ln DOR_T)$$

but if the curve is asymmetrical its standard error is given by

$$SE(AUC_{asy}) = \sqrt{A^2\,SE(a)^2 + B^2\,SE(b)^2 + 2AB\,\mathrm{cov}(a, b)}$$

where A and B are:

$$A = \int_0^1 \frac{\left(\dfrac{x}{1-x}\right)^{p} \exp\left(\dfrac{a}{1-b}\right)}{(1-b)\left[1 + \left(\dfrac{x}{1-x}\right)^{p} \exp\left(\dfrac{a}{1-b}\right)\right]^2}\,dx$$

$$B = \int_0^1 \frac{\left[a + 2\ln\left(\dfrac{x}{1-x}\right)\right]\left(\dfrac{x}{1-x}\right)^{p} \exp\left(\dfrac{a}{1-b}\right)}{(1-b)^2\left[1 + \left(\dfrac{x}{1-x}\right)^{p} \exp\left(\dfrac{a}{1-b}\right)\right]^2}\,dx$$

with

$$p = \frac{1 + b}{1 - b}$$

When the range of the ROC curve is constrained to some limits (the upper-left quadrant or user-defined limits), the standard error of the AUC is computed using the formula for the asymmetrical curve, substituting the integration limits accordingly. Of note, the standard error of the AUC for constrained curves can only be calculated when DOR_T is computed from Moses' model.

The standard error of Q* is

$$SE(Q^*) = \frac{\sqrt{DOR_T}}{2\left(1 + \sqrt{DOR_T}\right)^2}\,SE(\ln DOR_T)$$

The confidence interval of a symmetrical ROC curve is calculated by introducing the upper and lower limits of the confidence interval of the overall DOR into the equation of the curve. For asymmetrical ROC curves obtained by the Moses-Shapiro-Littenberg model, Mitchell [13] suggests that a confidence interval for the curve is obtained by back-transforming the confidence band of the linear regression. The back-transformation is given by

$$Sen = \frac{1}{1 + e^{-(D+S)/2}} \qquad Spe = \frac{1}{1 + e^{-(D-S)/2}}$$

Meta-regression

To explore sources of heterogeneity in the studies, the Moses-Shapiro-Littenberg method can be extended by adding covariates [14] to the model. The antilogarithm of each resulting estimated parameter can be interpreted as the relative DOR (RDOR) of the corresponding covariable.
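As a small worked illustration of the antilogarithm step, the sketch below converts covariate coefficients into RDORs. The function name `rdor` is hypothetical, and the optional normal-based interval is an assumption about how such an interval could be formed from a coefficient's standard error; the coefficients used are those reported for the "Layers" covariate in the three meta-regression models of Table 4.

```python
import math


def rdor(coef, se=None, z=1.96):
    """Relative DOR for one covariate coefficient (hypothetical helper).

    Returns (estimate, interval); the interval is None when no standard
    error is supplied. The normal-based interval is an assumption.
    """
    estimate = math.exp(coef)
    if se is None:
        return estimate, None
    return estimate, (math.exp(coef - z * se), math.exp(coef + z * se))


# "Layers" coefficients from meta-regression models (1), (2) and (3) in Table 4
for coef in (0.709, 0.739, 0.712):
    print(round(rdor(coef)[0], 2))
```

The three values agree with the RDOR column of Table 4 (2.03, 2.09 and 2.04), which confirms that the tabulated RDORs are the exponentials of the reported coefficients.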
RDORs indicate the change in diagnostic performance of the test under study per unit increase in the covariate.

Note about correction of cells with zero

If any study has a table with a 0 value in any cell, some statistics implemented by Meta-DiSc cannot be calculated. A solution to this problem, suggested by Cox [15], is to add 0.5 to all cells in the table. Meta-DiSc allows users to select from the following options: i) excluding from the meta-analysis all studies with 0 in any cell, ii) adding 0.5 to all cells in the studies where any cell is 0, iii) adding 0.5 to all cells in all studies.

In any case, this correction does not apply to calculations of sensitivity and specificity, except for the SROC graph, where the points correspond to sensitivities and specificities calculated with the selected correction.

References

1. Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In Egger M, Smith GD, Altman DG (eds). Systematic Reviews in Health Care. Meta-analysis in context. London: BMJ Books; 2001:248-282.
2. Leemis LM, Trivedi KS. A Comparison of Approximate Interval Estimators for the Bernoulli Parameter. Am Stat 1996; 50:63-68.
3. Devillé WL, Buntinx F, Bouter LM, Montori VM, de Vet HC, van der Windt DA, Bezemer PD. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2:9.
4. Agresti A. Analysis of ordinal categorical data. New York: John Wiley & Sons; 1984.
5. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327:557-560.
6. McCullagh P, Nelder JA. Generalized Linear Models. Boca Raton: Chapman & Hall; 1989.
7. Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003; 56:1129-1135.
8. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993; 12:1293-1316.
9. The Numerical Algorithms Group. NAG C Library. Oxford: ; 2004.
10. Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 1999; 18:2693-2708.
11. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.
12. Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med 2002; 21:1237-1256.
13. Mitchell MD. Validation of the summary ROC for diagnostic test meta-analysis: A Monte Carlo simulation. Acad Radiol 2003; 10:25-31.
14. Lijmer JG, Bossuyt PM, Heisterkamp SH. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med 2002; 21:1525-1537.
15. Cox DR. The analysis of binary data. London: Methuen; 1970.