Pacific Symposium on Biocomputing 7211-222 (2002) The Accuracy of Fast Phylogenetic Methods
- 格式:pdf
- 大小:598.47 KB
- 文档页数:12
· 指南与共识·免疫细胞功能状态量化检测评估与临床应用专家共识中国医疗保健国际交流促进会肝脏移植学分会 中国医疗保健国际交流促进会肾脏移植学分会 中国医药生物技术协会生物诊断技术分会 【摘要】 免疫系统是维持机体器官功能健康和预防疾病的重要保障,免疫健康管理和疾病免疫治疗目标是恢复免疫系统的正常功能状态。
免疫学领域研究解决了如何抑制或提高免疫状态的技术性难题,随之带来亟需回答的问题是如何全面地检测和量化评估免疫状态,这是下一个挑战,目前国际上尚无成熟解决方案。
免疫状态量化检测与可视化评估对疾病防控、亚健康状态管理和疾病免疫治疗均具有重要意义。
本专家共识针对正常免疫状态定义和免疫细胞功能状态(免疫力)全面量化评估及可视化评分技术手段等问题进行了初步讨论,提出了正常免疫状态相关的基础概念和思考,探讨免疫细胞功能状态量化检测评估方向和原则,并以此为契机,推动免疫力解码以及免疫健康领域基础课题和临床试验的深入研究。
【关键词】 免疫力;免疫细胞;免疫评估;免疫治疗;健康管理【中图分类号】 R617, R392.4 【文献标志码】 A 【文章编号】 1674-7445(2024)04-0005-10Expert Consensus on quantify monitoring and assessment of immune cell function status and clinical application China International Exchange and Promotive Association for Medical and Health Care (CPAM), Society of Liver Transplantation ,Society of Kidney Transplantation; China Medicinal Biotech Association (CMBA ), Society of Biological Diagnostics.Corresponding authors: He Qiang, Beijing Chaoyang Hospital of Capital Medical University, Beijing 100020, China, Email:*******************Li Xianliang, Beijing Chaoyang Hospital of Capital Medical University, Beijing 100020, China, Email:***********************【Abstract 】 The immune system is the important guarantee for maintaining the health of organ function and preventing diseases. The goal of immune health management and immune treatment is to restore the normal function of the immune system. The technical problems of how to inhibit or enhance the immune status has been solved in the field of immunology, but how to comprehensively detect and quantitatively evaluate the immune status is still a challenge. There is no mature solution at present. The quantification detection and visualization evaluation of immune status are of great significance for disease prevention and control, sub-health status management, and immune treatment. This expert consensus has carried out preliminary discussions on the definition of normal immune status and the comprehensive quantitative evaluation and visual scoring techniques of immune cell function status (immune function), put forward the basic concepts and thinking related to normal immune status, discussed the direction and principles of quantitative detection and evaluation of immune cell function status, and taken this as an opportunity to promote the decoding of immunity and the study of basic and clinical trials in the field of immune health.【Key words 】 Immunity; Immune cell; Assessment of immunity; Immunotherapy; Health managementDOI: 10.3969/j.issn.1674-7445.2024078基金项目:国家自然科学基金(82370665);北京市自然科学基金(7232068、7232065)通信作者单位: 100020 北京,首都医科大学附属北京朝阳医院(贺强、李先亮)通信作者:贺强,Email : *******************;李先亮,Email : ***********************第 15 卷 第 4 期器官移植Vol. 15 No.4 2024 年 7 月Organ Transplantation Jul. 2024 免疫系统是人类健康的基石,大多数疾病和健康状况与免疫状态密切相关,因此免疫状态管理是健康核心问题。
百泰派克生物科技
单克隆抗体测序
单克隆抗体是单一种B细胞增殖分化的子代细胞(浆细胞)产生的只针对单一抗原决定簇的免疫球蛋白分子,其本质是一种抗体蛋白。
与多克隆抗体不同,一种单克隆抗体只特异性结合一种抗原决定簇。
单克隆抗体为新药研发、诊断试剂开发提供了重要思路,现已广泛用于畜禽传染病的诊断以及肿瘤的治疗等方面。
单克隆抗体测序指测定单克隆抗体的一级结构—氨基酸种类和排列顺序,即单克隆抗氨基酸序列分析。
单克隆抗体的一级结构是开发新抗体药物、抗体仿制药物的理论依据,目前主要利用RT-PCR(逆转录-PCR)和基于质谱的方法进行分析。
百泰派克生物科技基于Thermo Fisher的Orbitrap Fusion Lumos质谱平台提供单克隆抗体测序一站式技术服务,包括不同亚型抗体(例如IgG和IgM)和不同类型
抗体(荧光偶联抗体,固定化抗体,不同物种的抗体)的一级结构测定,欢迎垂询。
专利名称:用于治疗恶病质的环状肽
专利类型:发明专利
发明人:舒布赫·D·夏尔马,拉梅什·拉吉普罗希特,凯文·B·伯里斯,施亦群,安妮特·M·沙迪亚克
申请号:CN200580029661.6
申请日:20050706
公开号:CN101010095A
公开日:
20070801
专利内容由知识产权出版社提供
摘要:一种如下化学式示出的高选择性黑皮质素-4受体拮抗剂环状六肽,其中R、R、R、R、R、R、x、y和z如说明书中说明,及一种治疗体重异常,包括恶病质、肌肉减少和衰竭综合征或疾病的方法,及治疗炎症和免疫异常的方法。
申请人:帕拉丁科技公司
地址:美国新泽西
国籍:US
代理机构:永新专利商标代理有限公司
代理人:林晓红
更多信息请下载全文后查看。
第3期耿春贤,等.龟鹿补肾丸对过敏性哮喘大鼠气道炎症和Th1/Th2失衡的影响[4]郭洁,田振峰,景璇,等.自拟小青龙汤联合神阙穴拔罐对寒哮型支气管哮喘Th1/Th2型细胞因子失衡及气道重塑的影响[J].现代中西医结合杂志,2018,27(20):2167-2174.[5]宁一宁,徐丽,张会宗.天贝止喘汤对支气管小鼠Thl/Th2影响的研究[J].中南药学,2017,15(1):22-25.[6]王丽新,吴银根.运用温阳补肾法治疗支气管哮喘经验[J].上海中医药杂志,2014,48(12):1-2.[7]黄伟玲,王丽新.温阳补肾法防治支气管哮喘的现代机理研究[J].江西中医药,2019,50(3):64-68.[8]金华良,王利民,罗清莉,等.淫羊藿对哮喘大鼠气道高反应性的影响[J].中国实验方剂学杂志,2014,20(23):169-173. [9]刘建秋,张霞,王雪慧.温阳补肾类中药单体治疗支气管哮喘实验述评[J].中华中医药学刊,2017,35(9):2222-2224. [10]张亚,杨琳,姜荣燕,等.黄芪对哮喘小鼠支气管肺泡灌洗液中IL-2、IL-4及其行为学变化的影响[J].泰山医学院学报, 2017,40(12):914-916.[11]昝杰彧,孙剑玥,徐亚娜,等.平喘汤对哮喘小鼠模型NLRP3炎症小体活化的影响[J].世界中医药,2020,15(23): 3646-3652.[12]黄志豪.水飞蓟宾干预对哮喘小鼠气道炎症与气道Muc5ac的作用及其机制研究[D].贵阳:遵义医科大学,2021. [13]刘位位,孙起翔,张景鸿,等.雾化吸入灭活草分枝杆菌对哮喘模型小鼠T-bet/GATA-3表达的影响[J].中国药理学通报,2018,34(8):1099-104.[14]HAMMAD H,LAMBRECHT B N.Barrier epithelial cells andthe control of type2immunity[J].Immunity,2015,43(1):29-40.[15]CUI Haiyan,HUANG Jiawei,LU Manman,et al.Antagonisticeffect of vitamin E on nAl2O3-induced exacerbation of Th2 and Th17-mediated allergic asthma via oxidative stress[J].Environmental pollution,2019,252(Pt B):1519-1531.[16]FANTA C H.Anti-interleukin-5therapy and severe asthma[J].N Engl J Med,2009,360(24):2576-2578.[17]NAKAYAMA T,YAMASHITA M.Initiation and maintenanceof Th2cell identity[J].Curr Opin Immunol,2008,20(3):265-271.[18]TUMES D J,CONNOLLY A,DENT L A.Expression ofsurviving in lung eosinophils is associated with pathology ina mouse model of allergic asthma[J].Int Immunol,2009,21(6):633-644.[19]潘小晶,王锁英,许仕溪,等.T-bet基因传递对支气管哮喘模型小鼠T淋巴细胞亚群的免疫调节作用[J].中华实用儿科临床杂志,2013,28(4):281-284.[20]ANO S,MORISHIMA Y,ISHII Y,et al.Transcription factorsGATA-3and RORγare important for determining the phenotype of allergic airway inflammation in a murine model of asthma[J].J Immunol,2013,190(3):1056-1065.(责任编辑:幸建华)美国FDA批准第一种呼吸道合胞病毒疫苗美国FDA于2023年5月3日批准首款呼吸道合胞病毒(Respiratory syncytial virus,RSV)疫苗Arexvy,用于60岁及以上年龄人群预防呼吸道合胞病毒感染。
纳米抗体研究进展综述摘要:单域抗体因其独特的优势,如水溶性好、分子量小、稳定性好、免疫原性小等一系列特点,在生物研究和医学领域中的作用愈发广泛。
在疾病诊断、病原检测、癌症疾病治疗、药物残留检测分析,坏境检测,用作sdAbs分子探针、分子诊断和显影等等领域具有广阔的应用前景。
纳米抗体因其优势,可实现重组表达,从而使得生产周期和生产成本均可大幅下降,是目前国内外研发的热点。
作者重点介绍了纳米抗体的特点,然后简述了纳米抗体的制备流程,简述了纳米抗体在疾病诊断、疾病治疗、食品安全和环境监测等领域的应用,最后对纳米抗体的应用前景进行了分析和展望。
1 介绍自1890年,第一种抗体——抗毒素,这是在血清中发现的第一种抗体[1]。
这是一种可中和外毒素的物质,1975年,杂交瘤技术的诞生开始了抗体研究和应用快速发展的时代。
由于抗体可特异性识别和结合抗原的特性,使其在疾病诊断、疾病治疗、药物运载、病原、毒素和小分子化合物检测等领域具有广泛的应用[2]。
但通过单克隆抗体技术制备的传统单克隆抗体有其不可忽视的缺点:生产耗时长、成本高、在组织和肿瘤中穿透力差、长期使用会引起机体免疫排斥反应以及动物道德问题等。
相比于传统抗体,纳米抗体具备传统抗体不具备的分子质量小和穿透性强的优势而成为现在抗体研究的主要方向之一。
单链抗体(single chain antibody fragment,scFv)就是新型小分子抗体的一种,其穿透力更强、生产成本更低,但scFv抗体存在溶解度低、稳定性较差、表达量低、易聚合和亲和力低的缺点[3]。
1989年,比利时免疫学家Hamers-Casterman 在骆驼血清中的偶然发现一种天然缺失轻链的重链抗体(HcAbs)可以解决scFv所存在的问题,重链抗体只包含2个常规的CH2与CH3区和1个重链可变区(VHH),重链可变区具有与原重链抗体相当的结构稳定性以及与抗原的结合活性,是已知的可结合目标抗原的最小单位,其分子质量只有单克隆抗体的1/10,是迄今为止获得的结构稳定且具有抗原结合活性的最小抗体单位,因此也被称作纳米抗体(nanobody,Nb)[4]。
人类基因组测序方法人类基因组测序是获取个体基因组的DNA序列信息的过程。
随着技术的进步,出现了多种测序方法,其中一些主要的包括:1. Sanger测序(链终止法):这是早期使用的一种测序方法。
它基于DNA合成链的终止原理,通过使用包含具有特定荧光标记的二进制链终止核苷酸,来逐步合成DNA链并测序。
2. Illumina测序(高通量测序): Illumina是目前最常用的高通量测序技术。
它采用桥放大法,将DNA片段固定在表面上,通过逐步合成链来测序。
Illumina技术具有高通量、较低成本和相对较短的读长等优势。
3. 454测序(Pyrosequencing): Roche 454测序使用焦磷酸测序技术,通过测量每个核苷酸加入时释放的焦磷酸,来确定DNA序列。
然而,这个技术相对于Illumina而言,已经较少被使用。
4. Ion Torrent测序: Ion Torrent测序基于半导体芯片,通过检测氢离子的释放来确定DNA序列。
这是一种相对新的测序技术,具有快速测序的特点。
5. PacBio单分子实时测序: Pacific Biosciences (PacBio) 提供了一种单分子实时测序技术,它基于测量DNA聚合物链上的荧光来进行测序。
PacBio测序的优势之一是较长的读长,有助于解决一些基因组中的结构性变异。
6. Oxford Nanopore测序: Oxford Nanopore Technologies 提供的测序技术使用纳米孔来测定DNA序列。
这种方法具有长读长和实时测序的优势,适用于一些复杂的基因组结构。
这些测序方法有不同的优缺点,适用于不同的研究目的。
研究人员根据项目需求、预算和技术要求选择合适的测序平台。
在实践中,通常会使用多种测序方法的组合,以获取更全面的基因组信息。
西洋参协同蜂胶增强荷瘤小鼠免疫功能的实验研究【摘要】目的:观察西洋参增强蜂胶提高肝癌移植瘤(hepa)小鼠免疫功能的作用。
方法:实验设阴性对照组(蒸馏水)、蜂胶对照组(2.0g/kg体重)、蜂胶西洋参低、中、高三个剂量组(分别为0.5、1.0、2.0g/kg体重),相当于人体临床用量的7.5、15和30倍。
结果:与阴性对照组比较,蜂胶西洋参中、高二个剂量组可明显提高荷瘤小鼠血清溶血素水平、小鼠nk细胞活性和cona诱导的脾淋巴细胞转化,差异均有显著性(p<o.05);而蜂胶对照组仅cona诱导的脾淋巴细胞转化有明显的提高作用(p<o.05)。
结果还表明,蜂胶及蜂胶西洋参各组对hepa肿瘤的抑制率在10%~14%,但与阴性对照组比较无明显差异(p>o.05)。
结论:西洋参可增强蜂胶提高肝癌移植瘤(hepa)小鼠的免疫功能。
【关键词】蜂胶;西洋参;调节免疫;抗肿瘤蜂胶在我国市场出现只有40多年历史,这是由于中华蜂不产蜂胶,所以从意大利蜂引进我国才产蜂胶。
蜂胶在2005年作为中药材收录中国药典,目前国内产品多以单纯蜂胶制成的保健品为主,将蜂胶与西洋参配伍进行产品研发报导很少,本文将蜂胶、蜂胶西洋参组合物灌胃肝癌移植瘤小鼠,并采用血清溶血素测定、nk细胞活性测定、cona诱导的脾淋巴细胞转化功能测定和肿瘤抑制率测定,观察其对免疫功能的影响及抗肿瘤作用,期望为蜂胶配伍西洋参的产品研发提供实验依据,现将结果报告如下:1实验动物和检测条件实验动物用icr种小鼠由浙江省实验动物中心提供,清洁级,雌性,体重20±2g。
饲料由浙江省实验动物中心提供,执行标准gb14924-1994。
检测环境条件:温度91-93℃,相对湿度60-65%。
动物于试验前在动物房环境中适应3天2实验药物和剂量设计2.1 蜂胶:采自养蜂场的原蜂,由杭州天厨蜜源保健品有限公司提供。
将蜂胶经乙醇提纯,经冷冻干燥制成纯蜂胶。
专利名称:一类格尔德霉素衍生物及其制备方法和在制备抗肿瘤药物中的应用
专利类型:发明专利
发明人:王俊锋,王军舰,黄小龙,刘永宏
申请号:CN202011437221.7
申请日:20201207
公开号:CN112500348A
公开日:
20210316
专利内容由知识产权出版社提供
摘要:本发明公开了一类格尔德霉素衍生物及其制备方法和在制备抗肿瘤药物中的应用。
格尔德霉素类化合物,其结构如式(Ⅰ)所示的任一化合物。
本发明从海洋链霉菌Streptomyces
sp.HNM0561中分离得到1个结构新颖的含有环戊烯酮片段的格尔德霉素衍生物‑化合物1,以及化合物2‑5,化合物1‑5具有显著的抗前列腺癌和小细胞肺癌活性,可用于开发抗肿瘤药物,因此本发明为海洋链霉菌Streptomyces sp.HNM0561提供了新的应用,为开发新的抗肿瘤药物提供了备选化合物,对开发中国海洋药物资源研究具有重要的意义。
申请人:中国科学院南海海洋研究所
地址:511458 广东省广州市南沙区海滨路1119号
国籍:CN
代理机构:广州科粤专利商标代理有限公司
更多信息请下载全文后查看。
抗体药物寡糖谱分析
抗体药物是一类通过人工合成的抗体来治疗疾病的药物,通过与目标分子特异性结合从而达到治疗的目的。
常见的抗体药物类型包括单克隆抗体、人工合成的抗体片段、免疫毒素、抗体药物共轭物等。
抗体药物在治疗多种疾病方面表现出显著的疗效,如癌症、自身免疫性疾病、炎症性疾病、免疫调节及眼科疾病等。
寡糖是由少于十个单糖分子通过糖苷键连接形成的糖类结构,寡糖对于抗体的稳定性和功能具有重要的影响:例如,它们可以影响抗体药物的溶解性、稳定性以及与受体的结合能力,还可以改变抗体药物的半衰期和免疫效应。
寡糖谱分析则是抗体药物质量控制的关键一环。
通过寡糖谱分析,我们可以精确地分析和鉴定药物中的糖链结构,为药物的质量控制提供重要依据。
目前,分析寡糖谱的主要方法主要是基于质谱(MS)进行的,它可以提供关于寡糖结构和修饰的详细信息。
质谱分析通常需要将寡糖从抗体药物上释放出来,然后通过质谱仪进行精细的结构分析。
生物制品表征寡糖谱分析示意图。
百泰派克生物科技(BTP),采用ISO9001认证质量控制体系管理实验室,获国家CNAS实验室认可,为客户提供符合全球药政法规的药物质量研究服务。
我们基于
高分辨率质谱仪Obitrap Fusion Lumos,利用包含了几乎所有糖型的Byonic软件,为您提供抗体药物寡糖谱分析服务,用来确定重组蛋白样品的糖基化位点、N糖
/O糖类型、糖基化位点糖组成等。
百泰派克生物科技抗体药物表征内容。
The Accuracy of Fast Phylogenetic Methods for Large DatasetsLuay Nakhleh Bernard M.E.Moret Usman Roshan Dept.of Computer Sciences Dept.of Computer Science Dept.of Computer Sciences University of Texas University of New Mexico University of TexasAustin,TX78712Albuquerque,NM87131Austin,TX78712Katherine St.John Jerry Sun Tandy WarnowLehman College and Dept.of Computer Sciences Dept.of Computer Sciences The Graduate Center University of Texas University of TexasCity University of New York Austin,TX78712Austin,TX78712 New York,NY10468Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes.In particular,sequence-based reconstruction will play an important role,especially in resolving more recent events.But using sequences at the level of whole genomes means working with very large amounts of data—large numbers of sequences—as well as large phylogenetic distances,so that reconstruction methods must be both fast and robust as well as accurate.We study the accuracy,convergence rate,and speed of several fast reconstruction methods:neighbor-joining,Weighbor(a weighted version of neighbor-joining),greedy parsimony,and a new phylogenetic reconstruction method based on disk-covering and parsimony search(DCM-NJ+MP).Our study uses extensive simulations based on random birth-death trees,with controlled deviations from ultrametricity.Wefind that Weighbor, thanks to its sophisticated handling of probabilities,outperforms other methods for short sequences, while our new method is the best choice for sequence lengths above100.For very large sequence lengths,all four methods have similar accuracy,so that the speed of neighbor-joining and greedy parsimony makes them the two methods of choice.1IntroductionMost phylogenetic reconstruction methods are designed to be used on biomolecular (i.e.,DNA,RNA,or amino-acid)sequences.With the advent of gene maps for many organisms and complete sequences for smaller genomes,whole-genome approaches to phylogeny reconstruction are now being investigated.In order to produce accurate reconstructions for large collections of taxa,we will most likely need to combine both approaches—each has drawbacks not shared by the other.Because whole genomes will yield large numbers of sequences,the sequence-based algorithms will need to be very fast if they are to run within reasonable time bounds.They will also have to accommodate datasets that include very distant pairs of taxa.Many of the sequence-based reconstruction methods used by biologists(maximum likelihood,parsimony search,or quartet puzzling)are very slow and unlikely to scale up to the size of data generated in whole-genome studies.Faster methods exist(such as the popu-lar neighbor-joining method),but most suffer from accuracy problems,especially for datasets that include distant pairs.In this paper,we examine in detail the performance of four fast reconstruction methods,one of which we recently proposed(DCM-NJ+MP),and three others that have been used for at least a few years by biologists(neighbor-joining,Weighbor,and greedy parsimony).We ran extensive simulation studies using random birth-death trees(with deviations from ultrametricity),using about three months of computation on nearly300processors to conduct a thorough exploration of a rich parameter space. We used four principal parameters:model of evolution(Jukes-Cantor and Kimura 2-Parameter+Gamma),tree diameter(which indirectly captures rate of evolution), sequence length,and number of taxa.Wefind that Weighbor(for small sequence lengths)and our DCM-NJ+MP method(for longer sequences)are the methods of choice,although each is considerably slower than the other two methods in our study. Our data also enables us to report on the sequence-length requirements of the various methods—an important consideration,since biological sequences are offixed length. 2BackgroundMethods for inferring phylogenies are studied(both theoretically and empirically) with respect to the topological accuracy of the inferred trees.Such studies evaluate the effects of various model conditions(such as the sequence length,the rates of evolution on the tree,and the tree“shape”)on the performance of the methods.The sequence-length requirement of a method is the sequence length needed by the method in order to reconstruct the true tree topology with high probability.Ear-lier studies established analytical upper bounds on the sequence length requirements of various methods(including the popular neighbor-joining1method).These studies showed that standard methods,such as neighbor-joining,recover the true tree(with high probability)from sequences of lengths that are exponential in the evolutionary diameter of the true tree.Based upon these studies,we defined a parameterization of model trees in which the longest and shortest edge lengths arefixed2,3,so that the se-quence length requirement of a method can be expressed as a function of the number of taxa,n.This parameterization led us to define fast-converging methods,methods that recover the true tree(with high probability)from sequences of lengths bounded by a polynomial in n once f and g,the minimum and maximum edge lengths,are bounded.Several fast-converging methods were developed4,5,6,7.We and others analyzed the sequence length requirement of standard methods,such as neighbor-joining(NJ),under the assumptions that f and g arefixed.These studies8,3showed that neighbor-joining and many other methods can recover the true tree with high probability when given sequences of lengths bounded by a function that grows expo-nentially in n.We recently initiated studies on a different parameterization of the model tree space,where wefix the evolutionary diameter of the tree and let the number of taxa vary9.This parameterization,suggested to us by J.Huelsenbeck,allows us to examinethe differential performance of methods with respect to“taxon sampling”strategies 10.In this case,the shortest edges can be arbitrarily short,forcing the method to re-quire unboundedly long sequences in order to recover these shortest edges.Hence,the sequence-length requirements of methods cannot be bounded.However,for a natural class of model trees,which includes random birth-death trees,we can assume f=Θ(1/n).In this case even simple polynomial-time methods converge to the true tree from sequences whose lengths are bounded by a polynomial in n.Furthermore,the degrees of the polynomials bounding the convergence rates of neighbor-joining and the fast-converging methods are identical—they differ only with respect to the lead-ing constants.Therefore,with respect to this parameterization,there is no significant theoretical advantage between standard methods and the fast-converging methods.In a previous study9we evaluated NJ and DCM-NJ+MP with respect to their performance on simulated data,obtained on random birth-death trees with bounded deviation from ultrametricity.We found that DCM-NJ+MP dominated NJ throughout the parameter space we examined and that the difference increased as the deviation from ultrametricity or the number of taxa increased.In an unpublished study,Bruno et al.11compared Weighbor with NJ and BioNJ 12as a function of the length of the longest edge in the true tree,using random birth-death trees of50taxa,deviated from the molecular clock by multiplying each edge length by a random number drawn from an exponential distribution,and using the Jukes-Cantor(JC)model of evolution.They found that Weighbor outperformed the other methods for medium to large values of the longest edge,but was inferior to them for small values—afinding we can confirm only for larger numbers of taxa.At last year’s PSB,Bininda-Edmonds et al.13presented a study of Greedy Parsimony (which uses a single random sequence of addition and no branch swapping)in which they used very large random birth-death trees(up to10,000taxa),deviated from the molecular clock,and with sequences evolved under the Kimura2-parameter(K2P) model.Unsurprisingly,they found that scaling and accuracy are at odds:the lower the accuracy level,the better the sequence length scaling.3Basics3.1Model TreesThefirst step of every simulation study for phylogenetic reconstruction methods is to generate model trees.Sequences are then evolved down these trees,the leaf sequences are fed to the reconstruction methods under study,and the reconstructed trees com-pared to the original model tree.In this paper,we use random birth-death trees with n leaves as our underlying distribution.These trees have a natural length assigned to each edge—namely,the time t between the speciation event that began that edge and the event(which could be either speciation or extinction)that ended that edge—and thus are inherently ul-trametric.In all of our experiments we modified each edge length to deviate from this restriction,by multiplying each edge by a random number within a range[1/c,c], where we set c,the deviation factor,to be4.3.2Models of EvolutionWe use two models of sequence evolution:the Jukes-Cantor(JC)model14and the the Kimura2-Parameter+Gamma(K2P+Gamma)model15.In both models,a site evolves down the tree under the Markov assumption;in the JC model,all nucleotide substitutions(that are not the identity)are equally likely,so only one parameter is needed,whereas in the K2P model substitutions are partitioned into two classes(again other than identity):transitions,which substitute a purine(adenine or guanine)for a purine or a pyrimidine(cytosine or thymidine)for a pyrimidine;and transversions, which substitute a purine for a pyrimidine or vice versa.The K2P model has a pa-rameter which indicates the transition/transversion ratio.We set this ratio to2in our experiments.Under either model,each edge of the tree is assigned a valueλ(e),the expected number of times a random site on this edge will change its nucleotide.It is often assumed that the sites evolve identically and independently(i.i.d.) down the tree.However,we can also assume that the sites have different rates of evolution,drawn from a known distribution.One popular assumption is that the rates are drawn from a gamma distribution with shape parameterα,which is the inverse of the coefficient of variation of the substitution rate.We useα=1for our experiments under K2P+Gamma.3.3Phylogenetic Reconstruction MethodsNeighbor Joining.Neighbor-Joining(NJ)1is one of the most popular distance-based methods.NJ takes a distance matrix as input and outputs a tree.For every two taxa,it determines a score,based on the distance matrix.At each step,the algo-rithm joins the pair with the minimum score,making a subtree whose root replaces the two chosen taxa in the matrix.The distances are recalculated to this new node, and the“joining”is repeated until only three nodes remain.These are joined to form an unrooted binary tree.Weighted Neighbor Joining.Weighbor16,like NJ,joins two taxa in each iteration; the pairs of taxa are chosen based on a criterion that embodies a likelihood function on the distances,which are modeled as correlated Gaussian random variables with different means and variances,computed under a probabilistic model of sequence evolution.Then,the“joining”is repeated until only three nodes remain.These are joined to form an unrooted binary tree.DCM-NJ+MP.The DCM-NJ+MP method is a variant of a provably fast-converging method that has performed very well in previous studies17.In these simulation stud-ies,DCM-NJ+MP outperformed,in terms of topological accuracy,both the provably fast converging DCM∗-NJ(of which it is a variant)and NJ.We briefly describe thismethod now.Let d ij be the distance between taxa i and j .•Phase 1:For each q ∈{d ij },compute a binary tree T q ,by using the Disk-Covering Method 3,followed by a heuristic for re fining the resultant tree into a binary tree.Let T ={T q :q ∈{d ij }}.•Phase 2:Select the tree from T which optimizes the parsimony criterion.If we consider all n 2 thresholds in Phase 1,DCM-NJ+MP takes O (n 6)time,but,if we consider only a fixed number p of thresholds,it takes O (pn 4)time.In our experiments,we considered only 10thresholds,so that the running time of DCM-NJ+MP is O (n 4).Greedy Maximum Parsimony.The maximum parsimony method that we use in our study (and that was used by Bininda-Edmonds et al.13)is not,strictly speaking,a parsimony search:for the sake of speed,it uses no branch swapping at all and simply adds taxa to the tree one at a time following one random ordering of the taxa.3.4Measures of AccuracySince all the inferred trees are binary we use the Robinson-Foulds (RF)distance 18which is de fined as follows.Every edge e in a leaf-labeled tree T de fines a biparti-tion πe on the leaves (induced by the deletion of e ),and hence the tree T is uniquely encoded by the set C (T )={πe :e ∈E (T )},where E (T )is the set of all internal edges of T .If T is a model tree and T is the tree obtained by a phylogenetic recon-struction method,then the set of False Positives is C (T )−C (T )and the set of False Negatives is C (T )−C (T ).The RF distance is then the average of the number of false positives and the false negatives.We plot the RF rates in our figures,which are obtained by normalizing the RF distance by the number of internal edges in a fully resolved tree for the instance.Thus,the RF rate varies between 0and 1(or 0%and 100%).Rates below 5%are quite good,but rates above 20%are unacceptably large.4Our ExperimentsIn order to obtain statistically robust results,we followed the advice of 19,20and used a number of runs ,each composed of a number of trials (a trial is a single comparison),computed a mean outcome for each run,and studied the mean and standard deviation of these runs.We used 20runs in our experiments.The standard deviation of the mean outcomes in our studies varied,depending on the number of taxa:the standard deviation of the mean on 10-taxon trees is 0.2(which is 20%,since the possible values of the outcomes range from 0to 1),on 25-taxon trees is 0.1(which is 10%),whereas on 200and 400-taxon trees the standard deviation ranged from 0.01to 0.04(which is between 1%and 4%).We graph the average of the mean outcomes for the runs,but omit the standard deviation from the figures.We ran our studies on random birth-death trees generated using the r8s 21soft-ware package.These trees have diameter 2(height 1);in order to obtain trees withother diameters,we multiplied the edge lengths by factors of0.05,0.1,0.25and0.5, producing trees with diameters of0.1,0.2,0.5and1.0,respectively.To deviate these trees from ultrametricity,we set c,the deviation factor,to4(see Section3).The re-sulting trees have diameters at most4times the original diameters,and have expected diameters of0.2,0.4,1.0and2.0.We generated such random model trees with10, 25,50,100,200,and400leaves,20trees for each combination of diameter and num-ber of taxa.We then evolved sequences on these trees using two models of evolution, JC and K2P+Gamma(we choseα=1for the shape parameter and set the transi-tion/transversion ratio to2).We used afix factor22of1for distance correction.The sequence lengths that we studied are50,100,250,500,1000and2000.We used the program Seq-Gen23to generate a DNA sequence for the root and evolve it through the tree under the JC and the K2P+Gamma models of evolution. The software for DCM-NJ was written by Daniel Huson.We used PAUP*4.024for the greedy MP method,and the Weighbor1.2software package16.The experiments were run over a period of three months on about300different processors,all Pentiums running Linux,including the128-processor SCOUT cluster at UT-Austin.To generate the graphs that depict the scaling of accuracy,we linearly interpolated the sequence lengths required to achieve certain accuracy levels forfixed numbers of taxa,and then,using the interpolation,computed the sequence length,as a function of the number of taxa,that are required to achievefixed specific accuracy levels(ones that are of interest).5Results and Discussion5.1SpeedBecause we are studying methods that will scale to large datasets(large numbers of taxa and long sequences),speed is of prime importance.Table1shows the running time of our various methods on different instances.Note the very high speed and nearly perfect linear scaling of Greedy Parsimony.NJ is known to scale with the cube of the number of taxa;in our experiments,it scales slightly better than that.DCM-Table1:The running times of NJ,DCM-NJ+MP,Weighbor,and Greedy MP(in seconds)forfixed sequencelength(500)and diameter(0.4)Taxa NJ DCM-NJ+MP Weighbor Greedy MP100.01 1.820.030.01250.029.120.370.02500.0621.3 3.560.051000.3764.2544.930.10200 2.6470.31352.480.2540020.135432.464077.810.73NJ+MP scales exactly as NJ,but runs approximately200times more slowly.Finally, Weighbor scales somewhat more poorly—thefigures in the table indicate scaling that is supercubic.Thesefigures make it clear that most reasonable datasets(up to a few thousand taxa)can be processed by any of these methods—especially with the help of cluster computing,but also that very large datasets(10,000taxa or more)will prove too costly for Weighbor and perhaps also DCM-NJ+MP(at least in their current implementations).5.2Sequence-Length RequirementsWe can sort our experimental data in terms of accuracy and,for all datasets on which an accuracy threshold is met,count,for eachfixed number of taxa,the number of datasets with a given sequence length,thereby enabling us to plot the average se-quence length needed to guarantee a given maximal error rate.We show such plots for two accuracy values in Figure1:70%and85%.Larger values of accuracy cannot(a)70%accuracy(b)85%accuracyFigure1:Sequence length requirements under the K2P+Gamma model as a function of the number of taxa be plotted reliably,since they are rarely reached under our challenging experimental conditions.The striking feature in these plots is the difference between the two NJ-based methods(NJ and Weighbor)and the methods using parsimony(DCM-NJ+MP and Greedy Parsimony):as the number of taxa increases,the former require longer and longer sequences,growing linearly or worse,while the latter exhibit only modest growth.The divide-and-conquer strategy of DCM-NJ+MP pays off by letting its NJ component work only on significantly smaller subsets of taxa—effectively shifting the graph to the left—and completing the work with a method(parsimony)that is evidently much less demanding in terms of sequence lengths.Note that the curves are steeper for the higher accuracy requirement:as the accuracy keeps increasing,we expect to see supralinear,indeed possibly exponential,scaling.5.3AccuracyWe studied accuracy (in terms of the RF rate)as a function of the number of taxa,the sequence length,and the diameter of the model tree,varying one of these parameters at a time.Because accuracy varies drastically as a function of the sequence length and the number of taxa,the plots given in this section have different vertical scales.For fixed sequence lengths and fixed diameters,we find,unsurprisingly,that the error rate of all methods increases as the number of taxa increases,although the in-crease is very slow (see Figures 2and 3,but note the logarithmic scaling on the x -axis).Weighbor indeed outperforms NJ,but DCM-NJ+MP outperforms the other 25.050.0100.0200.0400.0Taxa 0.20.30.40.60.70.8A v g R F10.0NJDCM −NJ+MPWeighborMP 25.050.0100.0200.0400.0Taxa 0.20.30.40.60.70.8A v g R F10.0NJ DCM −NJ+MP Weighbor MP(a)sequence length 50(b)sequence length 100Figure 2:Accuracy as a function of the number of taxa under the K2P+Gamma model for expected diameter(0.4)and two sequence lengthsthree methods,especially for larger trees—unless the sequences are very short,in which case Weighbor dominates.If we vary sequence length for a fixed number of taxa and fixed tree diameter,we find that the error rate decreases exponentially with the sequence length (Figure 4).From this perspective as well,DCM-NJ+MP dominates the other methods,more ob-viously so for larger trees.Interestingly,NJ is the worst method across almost the entire parameter space.Finally,if we vary the diameter (which varies the rate of evolution)for a fixed number of taxa and a fixed sequence length,we find an initial increase in accu-racy (due to the disappearance of zero-length edges),followed by a de finite decrease (Figure 5).The decrease in accuracy is steeper with increasing diameter than what we observed with increasing number of taxa—and continually steepens.(At larger diameters—not shown,as we approach saturation,the error rate approaches unity.)25.050.0100.0200.0400.0Taxa 0.10.20.30.30.40.5A v g R F10.0NJDCM −NJ+MPWeighborMP 25.050.0100.0200.0400.0Taxa 0.10.20.30.30.40.5A v g R F10.0NJ DCM −NJ+MP Weighbor MP(a)sequence length 500(b)sequence length 1000Figure 3:Accuracy as a function of the number of taxa under the K2P+Gamma model for expected diameter(2.0)and two sequencelengths(a)200taxa (a)400taxaFigure 4:Accuracy as a function of the sequence length under the K2P+Gamma model for expecteddiameter (2.0)and two numbers of taxaThe dominance of DCM-NJ+MP is once again paring NJ and Weighbor,we can see that NJ is actually marginally better than Weighbor at low diameters,but Weighbor clearly dominates it at higher diameters—the two slopes are quite distinct.(a)100taxa(a)400taxaFigure5:Accuracy as a function of the diameter under the K2P+Gamma model forfixed sequence length(500)and two numbers of taxa5.4The Influence of the Model of Sequence EvolutionWe reported all results so far under the K2P+Gamma model only,due to space lim-itations.However,we explored performance under the JC(Jukes-Cantor)model as well.The relative performance of the methods we studied was the same under the JC model as under the K2P+Gamma model.However,throughout the experi-ments,the error rate of the methods was lower under the JC model(using the JC distance-correction formulas)than under the K2P+Gamma model of evolution(us-ing the K2P+Gamma distance-correction formulas).This might be expected for the Weighbor method,which is optimized for the JC model,but is not as easily explained for the other methods.Figure6shows the error rate of NJ on trees of diameter0.4 under the two models of evolution.NJ clearly does better under the JC model than under the K2P+Gamma model;other methods result in similar curves.Correlating the decrease in performance with specific features in the model is a challenge,but the results clearly indicate that experimentation with various models of evolution(beyond the simple JC model)is an important requirement in any study.6ConclusionIn earlier studies we presented the DCM-NJ+MP method and showed that it outper-formed the NJ method for random trees drawn from the uniform distribution on tree topologies and branch lengths as well as for trees drawn from a more biologically re-alistic distribution,in which the trees are birth-death trees with a moderate deviation from ultrametricity.Here we have extended our result to include the Weighbor and102550100200400Figure6:Accuracy of NJ as a function of the number of taxa under JC and K2P+Gamma Greedy Parsimony methods.Our results confirm that the accuracy of the NJ method may suffer significantly on large datasets.They also indicate that Greedy Parsimony, while very fast,has mediocre to poor accuracy,while Weighbor and DCM-NJ+MP consistently return good trees,with Weighbor doing better on shorter sequences and DCM-NJ+MP doing better on longer sequences.Among interesting questions that arise are:(i)is there a way to conduct a partial parsimony search that scales no worse than quadratically(and might outperform DCM-NJ+MP)?(ii)would a DCM-Weighbor+MP prove a worthwhile tradeoff?(iii)can we make quantitative statements about the accuracy achievable by any method(not just one of those under study)as a function of some of the model parameters?7AcknowledgmentsThis work was supported in part by the National Science Foundation with grants EIA 99-85991to T.Warnow and ACI00-81404to B.M.E.Moret and a POWRE award to K.St.John;by the Texas Institute for Computational and Applied Mathematics and the Center for Computational Biology at UT-Austin(K.St.John);and by the David and Lucile Packard Foundation(T.Warnow).References1.N.Saitou and M.Nei.The neighbor-joining method:A new method for reconstructing phylo-genetic trees.Mol.Biol.Evol.,4:406–425,1987.2.P.L.Erdos,M.Steel,L.Sz´e k´e ly,and T.Warnow.A few logs suffice to build almost all trees–I.Random Structures and Algorithms,14:153–184,1997.3.P.L.Erdos,M.Steel,L.Sz´e k´e ly,and T.Warnow.A few logs suffice to build almost all trees–p.Sci.,221:77–118,1999.4.M.Cs˝u r¨o s.Fast recovery of evolutionary trees with thousands of nodes.RECOMB01,2001.5.M.Cs˝u r¨o s and M.Y.Kao.Recovering evolutionary trees through harmonic greedy triplets.Proc.10th ACM-SIAM Symp.on Discrete Algorithms(SODA99),pages261–270,1999.6. D.Huson,tles,and T.Warnow.Disk-covering,a fast-converging method for phyloge-netic tree put.Biol.,6:369–386,1999.7.T.Warnow,B.Moret,and K.St.John.Absolute convergence:true trees from short sequences.Proc.12th ACM-SIAM Symp.on Discrete Algorithms(SODA01),pages186–195,2001.8.K.Atteson.The performance of the neighbor-joining methods of phylogenetic reconstruction.Algorithmica,25:251–278,1999.9.L.Nakhleh,U.Roshan,K.St.John,J.Sun,and T.Warnow.The performance of phylogeneticmethods on trees of bounded diameter.In Proc.1st Workshop on Algorithms in Bioinformatics (WABI01),pages214–226,Aarhus(2001).in LNCS2149.10.J.Huelsenbeck and D.Hillis.Success of phylogenetic methods in the four-taxon case.Syst.Biol.,42:247–264,1993.11./billb/weighbor/performance.html.12.O.Gascuel.BIONJ:an improved version of the NJ algorithm based on a simple model ofsequence data.Mol.Biol.Evol.,14:685–695,1997.13.O.Bininda-Emonds,S.Brady,J.Kim,and M.Sanderson.Scaling of accuracy in extremelylarge phylogenetic trees.In Proc.6th Pacific Symp.on Biocomputing(PSB01),pages547–557.World Scientific,2001.14.T.Jukes and C.Cantor.Evolution of protein molecules.In H.N.Munro,editor,MammalianProtein Metabolism,pages21–132.Academic Press,NY,1969.15.M.Kimura.A simple method for estimating evolutionary rates of base substitutions throughcomparative studies of nucleotide sequences.J.Mol.Evol.,16:111–120,1980.16.W.J.Bruno,N.Socci,and A.L.Halpern.Weighted neighbor joining:A likelihood-basedapproach to distance-based phylogeny reconstruction.Mol.Biol.Evol.,17(1):189–197,2000.17.L.Nakhleh,U.Roshan,K.St.John,J.Sun,and T.Warnow.Designing fast convergingphylogenetic methods.In Proc.9th Int’l Conf.on Intelligent Systems for Mol.Biol.(ISMB01),2001.In Bioinformatics17:S190–S198.18. D.F.Robinson and parison of phylogenetic trees.Mathematical Bio-sciences,53:131–147,1981.19. B.M.E.Moret and H.D.Shapiro.Algorithms and experiments:the new(and the old)put.Sci.,7(5):434–446,2001.20. C.McGeoch.Analyzing algorithms by simulation:variance reduction techniques and simula-tion speedups.ACM Comp.Surveys,24:195–212,1992.21.M.Sanderson.r8s software package.Available from /r8s/r8s.html.22. D.Huson,K.A.Smith,and T.Warnow.Correcting large distances for phylogenetic recon-struction.In Proc.3rd Workshop on Algorithms Engineering(WAE99),1999.London, England.23. A.Rambaut and N.C.Grassly.Seq-gen:An application for the Monte Carlo simulation of dnasequence evolution along phylogenetic p.Appl.Biosci.,13:235–238,1997.24. D.L.Swofford.PAUP*:Phylogenetic analysis using parsimony(and other methods),1996.Sinauer Associates,Underland,Massachusetts,Version4.0.。