deep sequencing
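Summary of high-throughput sequencing basics
First-generation sequencing: the traditional Sanger method. Starting from a primer annealed to the template to be sequenced, DNA synthesis terminates at random at one specific base, and the products are fluorescently labeled, yielding four sets of nucleotide chains of varying length that end in A, T, C, or G. Each sequencing run consists of a set of four separate reactions; each reaction contains all four deoxynucleoside triphosphates (dNTPs) mixed with a limiting amount of one of the four dideoxynucleoside triphosphates (ddNTPs).
Because ddNTPs lack the 3'-OH group required for extension, the growing oligonucleotide terminates selectively at G, A, T, or C, and each reaction yields a set of chain-terminated products several hundred to several thousand bases long.
These products share a common starting point but end at different nucleotides. Fragments of different sizes are separated by high-resolution denaturing gel electrophoresis, and the DNA base sequence is read out by detecting them.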
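The chain-termination logic described above can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions, not a model of real chemistry: it assumes exactly one terminated fragment of every possible length, and the names `sanger_fragments` and `read_gel` are hypothetical.

```python
def sanger_fragments(new_strand):
    """Group chain-terminated fragment lengths by their terminating base.

    Each of the four reactions (one per ddNTP) yields fragments that end
    wherever the corresponding base occurs in the newly synthesized strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for length in range(1, len(new_strand) + 1):
        terminating_base = new_strand[length - 1]
        lanes[terminating_base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


strand = "GATTACA"
lanes = sanger_fragments(strand)
recovered = read_gel(lanes)
```

Sorting fragments by length plays the role of gel electrophoresis here: the lane in which each successive length appears gives the next base.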
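Second-generation sequencing: next-generation sequencing (NGS), also called high-throughput sequencing. Compared with traditional sequencing, it can determine the sequences of hundreds of thousands to millions of nucleic acid molecules in a single run, which makes a detailed, comprehensive analysis of a species' transcriptome and genome possible. For this reason it is also called deep sequencing.
The main NGS platforms include Roche (454 & 454+), Illumina (HiSeq 2000/2500, GA IIx, MiSeq), and ABI SOLiD.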
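Gene: the material basis of heredity; a specific nucleotide sequence on a DNA or RNA molecule that carries genetic information.
Genes pass genetic information to the next generation through replication, so that offspring show traits similar to those of their parents.
DNA: deoxyribonucleic acid. A deoxynucleotide consists of three parts: a nitrogenous base, deoxyribose, and phosphate.
Deoxynucleotides are joined in a defined order by 3',5'-phosphodiester bonds into a long chain, the DNA strand. Specific nucleotide sequences on the strand encode an organism's genetic information, and DNA is the carrier of genetic information in the vast majority of organisms.
RNA: ribonucleic acid. A ribonucleotide consists of a base, ribose, and phosphate.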
Principles of deep sequencing
Deep sequencing is a high-throughput sequencing technology that is widely used in genomics, transcriptomics, epigenomics, and related fields.
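A review of high-throughput sequencing technology
Name: Shu Yunkang; Student ID: 200830010416; Class: Biotechnology Class 1, College of Agriculture; 2011-06-01
Abstract: High-throughput sequencing, also called "next-generation" sequencing technology, is characterized by the ability to sequence hundreds of thousands to millions of DNA molecules in parallel in a single run and by generally short read lengths.
It also makes a detailed, comprehensive analysis of a species' transcriptome and genome possible, and is therefore also called deep sequencing.
Keywords: high-throughput sequencing; deep sequencing; next-generation sequencing; applications
Main text:
1. Current state of sequencing technology
Classified by history, influence, sequencing principle, and technology, the main approaches are: massively parallel signature sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, ABI SOLiD sequencing, ion semiconductor sequencing, and DNA nanoball sequencing.
With the rapid development of second-generation sequencing, the scientific community increasingly uses it to answer biological questions.
At the genome level, for example, de novo sequencing of a species that has no reference sequence produces that reference and lays the foundation for follow-up research and molecular breeding. For a species that already has a reference, whole-genome resequencing scans the entire genome for mutation sites and reveals the molecular basis of differences between individuals.
At the transcriptome level, whole-transcriptome sequencing (called "whole transcriptome resequencing" in the source) supports studies of alternative splicing and coding-region single nucleotide polymorphisms (cSNPs). Small RNA sequencing, in which RNA molecules of a specific size are isolated and sequenced, can discover new microRNA molecules.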
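Principles of pooling in second-generation sequencing
In second-generation sequencing (next-generation sequencing, NGS), pooling means mixing the DNA of several samples and sequencing them together. The main goals are to raise sequencing efficiency, lower cost, and analyze multiple samples in the same run.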
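First, consider efficiency. In practice, sequencing every sample separately consumes large amounts of reagent and time. Mixing the DNA of several samples before sequencing generates the data for all of them at once, which raises efficiency. This high-throughput approach saves time and money and is especially well suited to large-scale genomics studies and clinical testing.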
Second, consider cost. Pooled sequencing uses less reagent and less instrument run time, so the sequencing cost per sample drops.
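Third, consider data analysis. The data produced by a pooled run must be demultiplexed, that is, assigned back to the samples they came from, before they can be analyzed and interpreted. This is typically done with sample-specific index (barcode) sequences added during library preparation, followed by bioinformatic separation and alignment of the reads so that each sample's data are analyzed correctly. Pooled sequencing therefore calls for particular care in the choice of analysis methods, to keep the results accurate and reliable.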
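The demultiplexing step described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than a description of any particular pipeline: it assumes each read begins with a sample-specific barcode, and the barcodes, sample names, reads, and the function name `demultiplex` are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, max_mismatches=0):
    """Assign each read to a sample using the barcode at the start of the read.

    Reads that match no barcode, or more than one, go to "undetermined".
    The barcode is trimmed from reads that are assigned.
    """
    by_sample = defaultdict(list)
    for read in reads:
        hits = []
        for barcode, sample in barcodes.items():
            prefix = read[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read is too short to carry this barcode
            mismatches = sum(a != b for a, b in zip(prefix, barcode))
            if mismatches <= max_mismatches:
                hits.append((barcode, sample))
        if len(hits) == 1:
            barcode, sample = hits[0]
            by_sample[sample].append(read[len(barcode):])
        else:
            by_sample["undetermined"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
reads = ["ACGTGGGTTT", "TGCAAACCCG", "ACGTCCCAAA", "GGGGTTTTAA"]
result = demultiplex(reads, barcodes)
```

Allowing a small `max_mismatches` tolerates sequencing errors in the barcode, at the price of ambiguous assignments if the barcodes are too similar to one another; such reads are left undetermined here.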
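In summary, pooling in second-generation sequencing mixes the DNA of several samples and sequences them together in order to raise efficiency, lower cost, and analyze multiple samples at once. It does, however, require particular care in data analysis to keep the results accurate and reliable.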
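Glossary
I. Biological terms
1. What is high-throughput sequencing?
High-throughput sequencing (HTS) is a revolutionary change from traditional Sanger sequencing (called first-generation sequencing): it sequences hundreds of thousands to millions of nucleic acid molecules in a single run. Some of the literature therefore calls it next-generation sequencing (NGS), which reflects how far-reaching the change was. Because it also makes a detailed, comprehensive analysis of a species' transcriptome and genome possible, it is also called deep sequencing.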
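2. What is Sanger sequencing (first-generation sequencing)?
Sanger sequencing uses a DNA polymerase to extend a primer annealed to the template to be sequenced, until a chain-terminating nucleotide is incorporated.
Each sequencing run consists of a set of four separate reactions; each reaction contains all four deoxynucleoside triphosphates (dNTPs) mixed with a limiting amount of one of the four dideoxynucleoside triphosphates (ddNTPs).
Because ddNTPs lack the 3'-OH group required for extension, the growing oligonucleotide terminates selectively at G, A, T, or C.
The termination point is determined by the dideoxynucleotide present in the reaction.
The relative concentrations of each dNTP and ddNTP can be adjusted so that the reaction yields a set of chain-terminated products several hundred to several thousand bases long.
These products share a common starting point but end at different nucleotides. Fragments of different sizes are separated by high-resolution denaturing gel electrophoresis, and after the gel is processed they are detected by autoradiography on X-ray film or by non-isotopic labeling.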
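3. What are SNPs and SNVs (single-nucleotide site variation)?
The terms are single nucleotide polymorphism (SNP) and single nucleotide variant (SNV).
They refer to polymorphism caused by variation of a single nucleotide (substitution, insertion, or deletion) at the same position in the genomic DNA sequence of different individuals.
In other words, a single nucleotide at the same position in the genomic DNA differs between species or between individuals.
Loci and DNA sequences that show such differences can serve as markers for genome mapping.
In the human genome a single nucleotide polymorphism may occur roughly once every 1,000 nucleotides on average. Some of these polymorphisms may be associated with disease, but most probably are not.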
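As a concrete illustration of what a single-nucleotide difference at the same position looks like, here is a minimal sketch. It assumes the two sequences are already aligned and of equal length, and it reports substitutions only; insertions and deletions would need a real alignment step. The sequences and the function name `find_snvs` are hypothetical.

```python
def find_snvs(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned and
    have the same length; only substitutions are detected.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


reference = "ATGCCGTA"
individual = "ATGTCGTA"
variants = find_snvs(reference, individual)
```

Whether such a site counts as a polymorphism (SNP) rather than just a variant (SNV) depends on how common it is across a population, which a pairwise comparison like this cannot tell.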
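Acta Phytopathologica Sinica 45(1): 88-92 (2015)
Received: 2014-03-01; Revised: 2014-10-09
Funding: Special Fund for Quality Inspection Research in the Public Interest (201310068); Open Fund of the Zhejiang Provincial Top Key Discipline of Forestry (KF201330); Research Development Fund of Zhejiang A&F University (2013FK019)
Corresponding author: ZHOU Xue-ping, professor, working mainly on plant virology; E-mail: zzhou@zju.edu.cn
doi: 10.13926/j.cnki.apps.2015.01.013
Article number: 0412-0914(2015)01-0088-05

Research note
Detection of viruses infecting Wisteria sinensis by deep sequencing and assembly of small RNA
SU Xiu1,2, XU Yi1, CHEN Sha1, FU Shuai1, QIAN Ya-juan1, ZHANG Li-qin2, ZHOU Xue-ping1*
(1 Institute of Biotechnology, Zhejiang University, Hangzhou 310058, China; 2 The Nurturing Station for the State Key Laboratory of Subtropical Silviculture, Zhejiang Agriculture and Forestry University, Lin'an 311300, China)

Abstract: Plants defend themselves against viruses through a small RNA (sRNA)-mediated RNA interference mechanism. Analysis of virus-derived sRNA profiles in plants can be applied to de novo assembly of virus genomes and to virus identification. In this study, suspected virus-infected Wisteria sinensis samples collected from the Zijingang Campus of Zhejiang University were used for sRNA library construction and deep sequencing. After assembly of total sRNAs, it was found that W. sinensis leaves were infected by Wisteria vein mosaic virus (WVMV). The library generated 18.9 million sRNA reads, of which 0.32 million were WVMV-derived sRNAs. Using de novo assembly, 23.3% of the full-length genome nucleotide sequence of a previously reported potyvirus, WVMV, was obtained. To confirm the existence of WVMV in the samples, the WVMV coat protein (CP) gene sequence was obtained by RT-PCR and confirmed by Sanger sequencing. Taken together, the data suggest that sRNA deep sequencing technology is an efficient and powerful genetic tool for virus identification in woody plants.
Keywords: small RNA; Wisteria vein mosaic virus; deep sequencing

RNA silencing is a widely conserved mechanism in eukaryotes that suppresses gene expression in a nucleic acid sequence-specific manner [1]. In 2009, Kreuze et al. [2] found that virus-specific small RNAs (sRNAs) overlap in sequence, and therefore proposed that the large numbers of sRNA sequences obtained by deep sequencing could be used to assemble viral genomes and to identify and discover new viruses. sRNA deep sequencing has since been used to identify a number of viruses in crops and insects [3, 4], but it has not yet been reported for woody plants.
Wisteria (Wisteria sinensis) is a major ornamental plant for urban landscaping and is widely grown in parks, campuses, and courtyards. Wisteria mosaic disease, caused by wisteria mosaic virus, occurs in the Czech Republic, Italy, the Netherlands, the United States, Poland, Germany, and elsewhere, and has become a worldwide disease. Its main symptoms are mosaic, mottling, yellowing, and vein clearing. Infected plants flower markedly less, which seriously reduces their ornamental and economic value. In 2006, Fan et al. [5] reported the complete sequence of the Beijing isolate of Wisteria vein mosaic virus (WVMV-BJ) and confirmed that WVMV-BJ is a new virus in the genus Potyvirus. In this paper, deep sequencing was used to identify the pathogen of wisteria mosaic disease collected in Zhejiang.

1 Materials and methods
1.1 Source of material
Wisteria leaves showing mosaic symptoms were collected on the Zijingang Campus of Zhejiang University. Infected leaves showed chlorosis, mosaic, mottling, yellowing, vein clearing, reduced leaf size, and curling, together with small star-shaped spots. The chlorotic areas grew more slowly, which distorted the leaves (Fig. 1).
Fig. 1 Wisteria sinensis leaves showing chlorotic spots, blotches and leaf distortion
1.2 Extraction of total plant RNA and purification of sRNA
Total RNA was extracted from the collected samples with the Tiangen miRcute miRNA isolation kit according to the manufacturer's instructions. sRNA was then separated by the lithium acetate and polyethylene glycol method (LiAC/PEG), and sRNAs of 18-28 nt were excised after separation on 15% PAGE.
1.3 Solexa deep sequencing of sRNA
The sRNA samples were sent to Shanghai Majorbio Bio-pharm Technology Co., Ltd. for sequencing. The workflow was: adaptors were ligated to the 3' and 5' ends of the isolated 18-28 nt sRNAs with the TaKaRa small RNA cloning kit (DRR065); first-strand cDNA was obtained by reverse transcription with random primers; PCR enrichment; product recovery (6% Novex TBE PAGE gel, 1.0 mm, 10 well); quantification on a TBS380 (Picogreen) and mixing in proportion to the required data volume before loading; bridge amplification on a cBot to generate clusters; and 1×50 bp sequencing on the Illumina Solexa HiSeq 2000 platform.
1.4 Preprocessing of Solexa sequencing data
The raw data were processed bioinformatically: adaptor sequences, contaminant sequences, low-quality bases, and reads lacking an inserted 3' or 5' adaptor were removed, giving adaptor-free sRNA sequences, from which the 18-28 nt sRNA sequences were selected.
1.5 Sequence assembly, BLAST analysis, and screening for virus-related sequences
The 18-28 nt sRNAs were assembled with Velvet. The resulting contigs were compared and annotated with BLAST (Basic Local Alignment Search Tool), and homology screening identified similarity to the Beijing isolate of Wisteria vein mosaic virus (WVMV-BJ, GenBank accession AY686816). With WVMV-BJ as the reference genome, virus-derived sRNAs were obtained by alignment, and the hotspot regions that produce sRNAs were tabulated.
1.6 RT-PCR amplification, cloning, and sequence analysis
Total plant RNA was extracted and reverse transcribed into cDNA with the TaKaRa RNA Reverse PCR Kit (AMV). Using the cDNA as template, PCR was carried out with specific primers designed from the assembled sequences and the Phusion high-fidelity PCR kit (NEB). The purified PCR product was ligated into the PZeroBack vector (Tiangen) and transformed into competent cells of E. coli strain DH5α, which were plated on LB plates containing ampicillin. After overnight culture, single colonies were picked and positive clones were identified by PCR with Taq polymerase. Two randomly chosen positive clones were sent to Invitrogen for sequencing, and the sequences obtained were compared using BLAST.

2 Results
2.1 High-throughput sequencing of sRNAs from infected wisteria leaves
After total RNA extraction, sRNA isolation and purification, and Solexa sequencing, the infected wisteria leaves yielded 18,902,046 reads. After processing with the Solexa pipeline, 4,673,447 reads of 18-28 nt remained, or 24.72% of all sRNA reads.
2.2 Analysis of WVMV-derived sRNAs (vsiRNAs)
With WVMV-BJ as the reference genome, the 4,673,447 filtered reads were analyzed by local BLAST. Counting unique sequences only, 9,868 reads matched the reference genome perfectly, 45,510 reads matched with one mismatch allowed, and 113,932 reads matched with two mismatches allowed. The analysis focused on the one-mismatch case. In total, 315,123 reads (including redundant sequences) matched WVMV-BJ. The numbers of 20, 21, 22, 23, and 24 nt vsiRNAs were 7,410, 195,697, 99,536, 2,950, and 1,208 respectively, and 8,402 vsiRNAs had other lengths. Of the matched reads, 179,468 came from the sense strand and 135,655 from the antisense strand.
The read percentages for vsiRNAs of different lengths (Fig. 2) show that 21 nt and 22 nt sRNAs make up the bulk. The percentages of the total were: 20 nt, 2.35%; 21 nt, 62.10%; 22 nt, 31.59%; 23 nt, 0.94%; 24 nt, 0.38%; other lengths, 2.67%. Because vsiRNAs are mainly 21 nt and 22 nt long, it can be inferred that DCL4 and DCL2 of wisteria play the main role in antiviral defense.
Fig. 2 Size distribution of WVMV-derived sRNAs
When the vsiRNAs were sorted by whether they came from the sense or the antisense strand of WVMV, those from the sense strand (57%) were slightly more numerous than those from the antisense strand (43%). WVMV is a positive-sense single-stranded RNA virus whose genomic sense strand is far more abundant than the complementary strand, yet the sRNAs from the two strands occur in similar proportions. This indicates that the double-stranded RNA intermediates produced during WVMV replication may be the main source of sRNAs.
Analysis of the 5'-terminal nucleotide of WVMV-derived sRNAs (Fig. 3) showed that, over all WVMV-derived sRNAs, the proportions beginning with U, G, C, and A were 28.96%, 20.29%, 19.03%, and 31.70% respectively. sRNAs beginning with A or U outnumbered those beginning with G or C, and the same pattern held for sRNAs of each length. This agrees with the 5'-terminal nucleotide distribution of 21 nt sRNAs derived from TMV-Cg in infected Arabidopsis plants. Studies have shown that different AGO proteins in Arabidopsis have different preferences for the 5'-terminal nucleotide when recruiting endogenous sRNAs. Whether AGO proteins follow the same rule when recruiting sRNAs during the interaction between wisteria and the virus requires further study.
Fig. 3 WVMV-derived sRNAs 5' terminal nucleotide preference
A Perl script was used to analyze the hotspot distribution of WVMV-derived sRNAs in the host (Fig. 4). The unique WVMV-derived sRNA sequences isolated from wisteria leaves covered almost the entire viral genome, but in certain regions, known as hotspots, sRNA sequences were concentrated. sRNAs were more frequent at positions 2600-2700, 2900-3050, 5900-6000, 7800-7900, and 9200-9500 of the WVMV genome. In the 9300-9390 region in particular, the total number of sRNAs reached 4,230, most of them from the viral sense strand, which suggests that a typical double-stranded RNA structure may exist at this site.
Fig. 4 Polarity distribution of WVMV-derived sRNAs
Fig. 5 Position and distribution of WVMV sRNA contigs. The different colors of contigs represent the sequence homology with the reference WVMV-BJ genome. Red means highest homology, followed by pink and green.
2.3 RT-PCR verification of WVMV
Assembly of the 18-28 nt sRNAs with Velvet produced 894 contigs, of which 23 matched the known WVMV-BJ sequence. Their total length was 2,254 nt, or 23.3% of the full-length WVMV-BJ genome (9,695 nt) (Fig. 5). Using specific primers designed from the assembled sequences, a 971 bp fragment was cloned by RT-PCR from the total RNA of the sequenced sample. Sequencing and homology comparison showed 87% sequence similarity to WVMV-BJ; this fragment encodes the CP gene of WVMV (GenBank accession KJ836282). Under the ICTV rules for naming viruses in the genus Potyvirus, this isolate and the known WVMV-BJ are the same virus.

3 Discussion
Conventional plant virus detection requires purifying and isolating the virus, places high demands on sample purity, and presupposes knowledge of the pathogen's biological, physicochemical, genomic, and serological properties. For an unknown pathogen these methods are therefore severely limited and require a long and laborious research process. Viruses in woody plants are often present at low titer and are difficult to inoculate artificially by mechanical (rub) inoculation, so conventional methods detect them poorly and few studies have been reported.
The previously reported Beijing isolate of wisteria mosaic virus was identified on the basis of enzyme-linked immunosorbent assay (ELISA), seven rounds of reverse transcription-polymerase chain reaction (RT-PCR), and 5' RACE, among other methods [5]. These experiments are labor-intensive and time-consuming and are hard to apply to the detection and identification of viruses in woody plants such as wisteria. In this paper, small RNA deep sequencing and assembly were used to identify the pathogen of wisteria mosaic disease isolated in Zhejiang as WVMV. The development of deep sequencing has opened a route to large-scale, rapid diagnosis of plant viruses, and sRNA deep sequencing has played an important part in identifying unknown pathogens in many plants. Using the viral sRNA sequences present in the plant greatly improves the efficiency of screening woody plants for viruses. As the cost of deep sequencing falls, sRNA deep sequencing will become an economical and effective method for identifying viruses in woody plants, and it deserves wider use.

References
[1] Waterhouse P M. Gene silencing as an adaptive defense against viruses [J]. Nature, 2001, 411: 834-842.
[2] Kreuze J F, Perez A, Untiveros M, et al. Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: A generic method for diagnosis, discovery and sequencing of viruses [J]. Virology, 2009, 388(1): 1-7.
[3] Wu Q, Luo Y, Lu R, et al. Virus discovery by deep sequencing and assembly of virus-derived small silencing RNAs [J]. PNAS, 2010, 107(4): 1606-1611.
[4] Xu Y, Huang L, Wang Z, et al. Identification of Himetobi P virus in the small brown planthopper by deep sequencing and assembly of virus-derived small interfering RNAs [J]. Virus Res., 2013, 14: pii: S0168-1702(13)00394-8.
[5] Liang W X, Song L M, Tian G Z, et al. The genomic sequence of Wisteria vein mosaic virus and its similarities with other potyviruses [J]. Arch. Virol., 2006, 151: 2311-2319.
Responsible editor: YU Jin-zhi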
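The proportions reported in sections 2.1 and 2.2 of the paper above can be recomputed from its read counts. The sketch below is a hedged check, not the authors' Perl analysis: the variable names are hypothetical, the reported total of 315,123 matched reads is assumed to be the denominator behind the paper's percentages, and the sense-strand count is derived as that total minus the reported antisense count.

```python
# Read counts as reported in the paper.
total_reads = 18_902_046          # all sRNA reads in the library
filtered_reads = 4_673_447        # reads of 18-28 nt after preprocessing
matched_reads = 315_123           # reads matching WVMV-BJ (one mismatch allowed)
antisense_reads = 135_655         # matched reads from the antisense strand

vsirna_counts = {
    "20 nt": 7_410,
    "21 nt": 195_697,
    "22 nt": 99_536,
    "23 nt": 2_950,
    "24 nt": 1_208,
    "other": 8_402,
}

# Share of the library that falls in the 18-28 nt size window.
filtered_pct = round(100 * filtered_reads / total_reads, 2)

# Size distribution, using the reported total as the denominator (assumption).
size_pct = {
    size: round(100 * count / matched_reads, 2)
    for size, count in vsirna_counts.items()
}

# Strand polarity: sense reads derived as total minus antisense (assumption).
sense_reads = matched_reads - antisense_reads
sense_pct = round(100 * sense_reads / matched_reads)
antisense_pct = round(100 * antisense_reads / matched_reads)
```

Under these assumptions the 21 nt and 22 nt classes together account for well over 90% of the matched reads, which is the basis for the paper's inference about DCL4 and DCL2.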
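Principles of deep unfolding
What is deep unfolding
Deep unfolding is a method for explaining how deep learning models work.
It unfolds the iterative process of a neural network model and converts it into a simpler, easier-to-understand form.
In this way, deep unfolding helps us better understand the complex computations and the parameter optimization involved in deep learning.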
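Basic principles of deep unfolding
1. Unfold the neural network model
Deep unfolding first unfolds the deep learning model into a series of network layers.
Each layer corresponds to one iteration, and the sequence of iterations simulates the model's computation.
2. Compute layer by layer
Next, deep unfolding carries out the iterative computation layer by layer.
Within each layer, the input passes through a series of operations (such as convolution and an activation function) to produce the output.
The order and parameters of these operations can be optimized over successive iterations.
3. Backpropagation and parameter optimization
Alongside the layer-by-layer computation, deep unfolding updates the parameters of every layer by backpropagation.
Computing the gradient of the model's loss function gives the direction in which to adjust the parameters, and this update is applied to every layer.
4. Convergence and evaluation
As the parameters improve and the loss decreases, deep unfolding gradually converges to a good solution.
Finally, the model's performance is assessed with evaluation metrics such as accuracy or regression error.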
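Steps 1 and 2 can be sketched with a toy example. This is an illustrative sketch and not something taken from the source: it assumes the iterative process being unfolded is gradient descent on a one-dimensional least-squares loss, treats each iteration as one "layer", and gives every layer its own step size as that layer's parameter. The names `loss`, `unfolded_forward`, and `step_sizes` are hypothetical, and learning the step sizes by backpropagation (step 3) is left out.

```python
def loss(x, a, b):
    """Least-squares loss 0.5 * (a*x - b)**2 for a scalar x."""
    return 0.5 * (a * x - b) ** 2


def unfolded_forward(x0, a, b, step_sizes):
    """Run one gradient step per layer and return the state after each layer.

    The number of layers equals len(step_sizes); each step size is the
    (here fixed, in practice trainable) parameter of its layer.
    """
    states = [x0]
    x = x0
    for step in step_sizes:           # each pass through the loop is one layer
        gradient = a * (a * x - b)    # derivative of the loss at the current x
        x = x - step * gradient
        states.append(x)
    return states


a, b = 2.0, 6.0                       # the loss is minimized at x = b / a = 3.0
states = unfolded_forward(0.0, a, b, step_sizes=[0.1, 0.1, 0.1, 0.1])
losses = [loss(x, a, b) for x in states]
```

Because the unfolded network has a fixed, small number of layers, the per-layer intermediate states and losses can be inspected directly, which is the kind of layer-by-layer view the text describes.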
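Why use deep unfolding
As a way of explaining deep learning models, deep unfolding has the following advantages:
• Strong interpretability: deep unfolding converts a complex neural network model into a simpler form, which makes the model's computation and parameter optimization easier to understand.
• Visualizable optimization: unfolding the model and computing layer by layer gives a way to visualize the computation in each layer, which helps reveal the details and characteristics of the model's operation.
• Visible effect of parameter optimization: deep unfolding shows not only the optimization process but also, through the change in the loss function, how well the parameter optimization is working.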
Target Region Selection Is a Critical Determinant of Community Fingerprints Generated by 16S Pyrosequencing
Purnima S. Kumar1*, Michael R. Brooker1, Scot E. Dowd2, Terry Camerlengo3
1 Division of Periodontology, College of Dentistry, The Ohio State University, Columbus, Ohio, United States of America, 2 Research Testing Laboratory, Lubbock, Texas, United States of America, 3 Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, United States of America
Abstract
Pyrosequencing of 16S rRNA genes allows for in-depth characterization of complex microbial communities. Although it is known that primer selection can influence the profile of a community generated by sequencing, the extent and severity of this bias on deep-sequencing methodologies is not well elucidated. We tested the hypothesis that the hypervariable region targeted for sequencing and primer degeneracy play important roles in influencing the composition of 16S pyrotag communities. Subgingival plaque from deep sites of current smokers with chronic periodontitis was analyzed using Sanger sequencing and pyrosequencing using 4 primer pairs. Greater numbers of species were detected by pyrosequencing than by Sanger sequencing. Rare taxa constituted nearly 6% of each pyrotag community and less than 1% of the Sanger sequencing community. However, the different target regions selected for pyrosequencing did not demonstrate a significant difference in the number of rare and abundant taxa detected. The genera Prevotella, Fusobacterium, Streptococcus, Granulicatella, Bacteroides, Porphyromonas and Treponema were abundant when the V1–V3 region was targeted, while Streptococcus, Treponema, Prevotella, Eubacterium, Porphyromonas, Campylobacter and Enterococcus predominated in the community generated by V4–V6 primers, and the most numerous genera in the V7–V9 community were Veillonella, Streptococcus, Eubacterium, Enterococcus, Treponema, Catonella and Selenomonas. Targeting the V4–V6 region failed to detect the genus Fusobacterium, while the
taxa Selenomonas, TM7 and Mycoplasma were not detected by the V7–V9 primer pairs. The communities generated by degenerate and non-degenerate primers did not demonstrate significant differences. Averaging the community fingerprints generated by V1–V3 and V7–V9 primers provided results similar to Sanger sequencing, while allowing a significantly greater depth of coverage than is possible with Sanger sequencing. It is therefore important to use primers targeted to these two regions of the 16S rRNA gene in all deep-sequencing efforts to obtain representational characterization of complex microbial communities.
Citation: Kumar PS, Brooker MR, Dowd SE, Camerlengo T (2011) Target Region Selection Is a Critical Determinant of Community Fingerprints Generated by 16S Pyrosequencing. PLoS ONE 6(6): e20956. doi:10.1371/journal.pone.0020956
Editor: Jonathan H. Badger, J. Craig Venter Institute, United States of America
Received February 3, 2011; Accepted May 14, 2011; Published June 29, 2011
Copyright: © 2011 Kumar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The research was supported by NIH grant 1R03DE018734-01A1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: kumar.83@…
Introduction
Molecular approaches have revealed the presence of large numbers of as-yet-uncultivated organisms in the subgingival microbiome, creating a paradigm shift in our understanding of periodontal health and disease [1,2,3,4]. In recent years, sequencing of 16S rRNA genes by the Sanger method (16S cloning and sequencing) has been widely used to examine subgingival microbial profiles in periodontal health and disease, as well as to characterize compositional
shifts in these communities [5,6,7,8,9]. However, recent studies suggest that next-generation sequencing methodologies provide an economical and significantly higher-throughput alternative to Sanger sequencing for comparative genomics [10,11].
Pyrosequencing of PCR-amplified 16S rDNA ('16S pyrotags') is a next-generation sequencing methodology that is capable of generating thousands of sequences from several samples simultaneously. The unprecedented sampling depth provided by this deep-sequencing approach allows the identification of several numerically minor or rare species within a community and has revealed a significantly greater level of microbial diversity than was previously apparent with Sanger sequencing [12,13]. Unlike Sanger sequencing, which is capable of sequencing the entire gene, pyrosequencing is currently limited to generating sequences that are usually 350–500 bp in length. In order to improve community coverage, various investigations have employed primers that target different regions of the gene [12,13,14].
It has previously been shown, using Sanger sequencing, that the region of the 16S gene that is targeted for sequencing as well as the degeneracy of the sequencing primers introduce a level of bias into the community profile [2,15]. Since pyrosequencing provides an enormously increased depth-of-coverage, it is important to understand the extent and severity of bias introduced by primer selection on the profile of any given community. Previous studies have examined this bias using simulated datasets obtained by truncating full-length sequences, in silico testing of primer sequences for community coverage rates or by analyzing artificial bacterial communities created by mixing bacterial isolates [15,16,17,18,19]. However, it is logical to expect that fragment length (~1.5 kb with Sanger sequencing and 150–500 bp with pyrosequencing) as well as sequencing chemistry will affect amplification efficiency; therefore, profiles derived from artificially generated sequences may not accurately represent the coverage obtained from naturally occurring microbial communities. In fact, a recent investigation comparing 454 and Illumina sequencing has found significant divergence between in silico predictions and experimental results, emphasizing the need for experimental validation of primer pairs [20]. Hence, it is important to investigate the extent of this bias using sequences derived from clinical samples. The purpose of this investigation, therefore, was to examine the bias introduced by target region selection as well as by primer degeneracy on coverage of subgingival microbial communities using pyrosequencing.
Methods
Subject selection
Approval for this study was obtained from the Office of Responsible Research Practices at The Ohio State University. Ten current smokers with generalized moderate to severe chronic periodontitis were identified following clinical and radiographic examination and written informed consent was obtained.
Exclusion criteria included diabetes, HIV infection, use of immunosuppressant medications, bisphosphonates or steroids, antibiotic therapy or oral prophylactic procedures within the last three months and less than 20 teeth in the dentition.
Sample collection and DNA isolation
Subgingival plaque samples were collected and pooled from four non-adjacent proximal sites demonstrating at least 6 mm of attachment loss and 5 mm of probe depths. Samples were collected by inserting 4 sterile endodontic paper points (Caulk-Dentsply) into each of the 4 sites for 10 seconds, following isolation and supragingival plaque removal. Samples were placed in 1.5 ml microcentrifuge tubes and frozen until further analysis. Bacteria were separated from the paper points by adding 200 µl of phosphate buffered saline (PBS) to the tubes and vortexing. The points were then removed, and DNA was isolated with a Qiagen DNA MiniAmp kit (Qiagen, Valencia, CA) using the tissue protocol according to the manufacturer's instructions.
Selection and optimization of primers
Four sets of primers were used to amplify each sample (A17 and 519R, 27F and 515R, 519F and 1114R, 1114F and 317). The primer sequences are listed in Table 1. Primer pairs were selected to generate 400–500 bp products from contiguous regions of the 16S rRNA gene. Previous sequencing-based investigations were examined and the primers most commonly used in these studies were selected [2,6,7,8,9,13,21,22]. The universality of the primer pairs was assessed by comparing them to our locally hosted, curated database of 1800 nearly full-length 16S sequences derived from GenBank. MacVector was used for alignment and determining melting temperatures and GC ratios of the resulting amplicons. Complementary sequences were generated from the published sequences of primers 519 and 1114. Degeneracies were added to primer 515R following comparison to the oral bacterial database to maximize matches of primer against bacterial sequences.
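The GC ratios listed for the primers in Table 1 can be reproduced with a short calculation. This is a minimal sketch and not the MacVector procedure used in the study: the function name `gc_percent` is hypothetical, and only the non-degenerate primers are included because degenerate codes such as M or S have no single GC value.

```python
def gc_percent(primer):
    """Return the percentage of G and C bases in a non-degenerate primer."""
    sequence = primer.replace(" ", "").upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"degenerate or invalid bases: {sorted(unexpected)}")
    gc_count = sum(base in "GC" for base in sequence)
    return 100.0 * gc_count / len(sequence)


# Primer sequences as listed in Table 1 (5' to 3').
primers = {
    "A17": "GTT TGA TCC TGG CTC AG",
    "519R": "GTA TTA CCG CGG CAG CTG GCA C",
    "1114F": "GCA ACG AGC GCA ACC C",
    "317": "AAG GAG GTG ATC CAG GC",
}

gc_by_primer = {name: gc_percent(seq) for name, seq in primers.items()}
```

For these four primers the computed values agree with the primer %GC column of Table 1 to one decimal place (52.9, 63.6, 68.8, and 58.8).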
Pyrosequencing
Multiplexed bacterial tag-encoded FLX amplicon pyrosequencing (bTEFAP) was performed using the Titanium platform (Roche Applied Science, Indianapolis, IN) as previously described [22] in a commercial facility (Research and Testing Laboratories, Lubbock, TX). Briefly, a single step PCR with broad-range universal primers and 22 cycles of amplification was used to amplify the 16S rRNA genes as well as to introduce adaptor sequences and sample-specific 10-mer oligonucleotide tags into the DNA. The same bar codes were utilized for each primer set. Three regions of the 16S gene were sequenced from each sample (V1–V3, V4–V6, V7–V9). Adaptor sequences were trimmed from raw data with 98% or more of bases demonstrating a quality control of 30 and sequences binned into individual sample collections based on bar-code sequence tags, which were then trimmed. The resulting files were denoised with Pyronoise [23] and depleted of chimeras using B2C2 (…/B2C2.html). Sequences less than 300 bp in length were deleted and the rest were clustered into species-level operational taxonomic units (s-OTUs) at 97% sequence similarity and assigned a taxonomic identity by alignment to a locally hosted version of the Greengenes database [24] using the Blastn algorithm. Phylogenetic trees were generated and visualized using FastTree [25]. All analyses were conducted within the virtual environment provided by the QIIME pipeline [26].
Statistical analysis
Species-level OTUs (s-OTUs) were used to compute the Shannon Diversity and Equitability indices for each sample. EstimateS (Version 7.5, R. K. Colwell, …/estimates) was used to compute the indices and statistical analyses were carried out with JMP (SAS Institute Inc., Cary, NC). The indices were compared between groups using ANOVA. A variance stabilizing transformation was used to create normal distribution of the data as previously described [27,28]. Two sample t-tests were used to compare the transformed values of species and
genus-level OTUs between groups. Fisher's exact test was used to test for presence or absence of genera.
Results
The pyrotag sequences were compared to previously published data obtained by Sanger sequencing using the primer pairs A17 and 317 on the same samples [27]. A subset of the pyrosequencing data was created using a random number generator to select 100 pyrotag sequences from each primer set. This subset was compared to an equivalent number of Sanger sequences. A total of 1054 nearly full-length sequences (1300–1460 bp) were identified by Sanger sequencing, and 167,210 sequences by pyrosequencing, representing a 167-fold increase in depth-of-coverage with pyrosequencing. Figure 1 shows the Shannon Diversity and Equitability indices for all primer sets. The Diversity Index was not different between groups; however, the community generated by Sanger sequencing demonstrated significantly greater equitability than all the pyrotag communities (p<0.01, ANOVA). Pyrotag communities generated by the 4 primer pairs demonstrated similar diversity. Figures 2A and 2B show the distribution of rare and abundant taxa by primer pair and sequencing methodology.
1.9% of sequences could not be classified into any taxon below the level of domain. Taxa with less than 20 overall sequences were designated as rare. Sanger sequences demonstrated significantly lower coverage of rare as well as abundant species than pyrosequencing (p<0.001, ANOVA). However, there were no differences in the number of rare and abundant taxa in any of the pyrotag communities. Figure 3 shows the distribution by genus of sequences generated by degenerate and non-degenerate primer pairs targeted to the V1–V3 region. There were no differences between the two groups (p>0.05, 2-sample t-test on transformed variable). Table 2 shows the relative abundance of genera in sequences obtained by pyrosequencing different target regions. Genera accounting for 0.1% of total pyrosequences are shown. Overall, greater numbers of differences were detected in the levels of genera between the V1–V3 and V7–V9 regions (p<0.05, 2-sample t-test on transformed variable). The regions targeted significantly influenced community profiles generated by pyrosequencing. The genera Prevotella, Fusobacterium, Streptococcus, Granulicatella, Bacteroides, Porphyromonas and Treponema formed 65% of the community when the V1–V3 region was targeted, while Streptococcus, Treponema, Prevotella, Eubacterium, Porphyromonas, Campylobacter and Enterococcus accounted for the same abundance in the community generated by V4–V6 primers, and 65% of the V7–V9 community was formed by Veillonella, Streptococcus, Eubacterium, Enterococcus, Treponema, Catonella and Selenomonas. Among the predominant genera, Fusobacteria were not detected in any of the samples by the V7–V9 primers, while the V4–V6 primers did not detect the Selenomonads, Mycoplasma, or TM7 phylum in any sample (p<0.05, Fisher's exact test). Table 3 shows the relative abundance of genera obtained by concatenating data from pairs of target regions or by combining all three regions to provide near-full-length coverage of the 16S gene. Relative abundances of the same genera in near-full-length Sanger sequences are also shown for comparison. To arrive at these results, the subset pyrotag dataset was compared to an equivalent number of Sanger sequences from each sample. Concatenating data from V1–V3 and V7–V9 regions demonstrated the greatest similarity to Sanger data as well as to the averages of all 3 regions.

Table 1. Sequences of primers used in study.
Target region   Primer name (reference)                                   Primer sequence                        %GC primer   %GC product
V1–V3           A17 (Kumar et al 2005)                                    5'-GTT TGA TCC TGG CTC AG-3'           52.9         53.4
                519R (Lane et al 1991)                                    5'-GTA TTA CCG CGG CAG CTG GCA C-3'    63.6
V1–V3           27F (Lane et al 1991)                                     5'-AGA GTT TGA TGM TGG CTC AG-3'       50           53.4
                515R (modified from Kroes et al 1999)                     5'-TTA CCG CGG CMG CSG GCA C-3'        78.9
V4–V6           519F (modified from Lane et al 1991)                      5'-GTG CCA GCT GCC GCG GTA ATA C-3'    63.6         54.6
                1114R (modified from Stackebrandt and Goodfellow 1991)    5'-GGG TTG CGC TCG TTG C-3'            68.8
V7–V9           1114F (Stackebrandt and Goodfellow 1991)                  5'-GCA ACG AGC GCA ACC C-3'            68.8         54.2
                317 (Kumar et al 2005)                                    5'-AAG GAG GTG ATC CAG GC-3'           58.8
Sanger          A17 (Kumar et al 2005)                                    5'-GTT TGA TCC TGG CTC AG-3'           52.9         53.8
                317 (Kumar et al 2005)                                    5'-AAG GAG GTG ATC CAG GC-3'           58.8
doi:10.1371/journal.pone.0020956.t001

Figure 1. Shannon diversity and equitability indices of pyrotag and Sanger communities. No differences were detected between any of the pyrotag communities; however, the Sanger community demonstrated significantly greater equitability than all the pyrotag communities (**p<0.01, ANOVA).
doi:10.1371/journal.pone.0020956.g001

Discussion
It has been shown that sequences of 500–700 bp are required for phylogenetic discrimination at the species level [9,29]. However, previous reports have been equivocal on the level of community coverage achieved using the different hypervariable regions. While several investigations support using the V1, V2 and V3 regions for deep sequencing [17], others suggest that these regions overestimate species richness and promote the V4–V6 region as the most appropriate [19]. Yet others have demonstrated that V7–V8 fragments
achieve representational characterization of a community [30]. Our previous investigations with Sanger sequencing have revealed that the subgingival microflora associated with periodontitis in smokers is extremely diverse, with several rare species/phylotypes [27]. Hence, plaque samples were collected and pooled from deep sites of current smokers with moderate to severe periodontitis to examine the extent to which primer design affects the community fingerprint of a highly complex and taxonomically heterogeneous microbial community. Using an adequately powered clinical study design to enable statistical analyses allowed an in-depth comparison of the community profiles generated by the different primer sets.
The Shannon Diversity index incorporates both the number of species (species richness) as well as the proportion of each species (species evenness) into a single value [31]. Thus, while a value of zero necessarily represents a mono-species community, a higher value may result either from the presence of several species at varying levels or from equitable distribution of a few species. Hence, the Equitability index is used to elucidate the relative contributions of species richness and evenness to the Diversity index. Pyrotag communities demonstrated similar diversity to the Sanger community but were significantly less equitable (Figure 1), suggesting that greater species richness contributed to the diversity. The increased species richness was apparent in both rare and abundant taxa (Figure 2). This is in contrast to previous investigations, which have suggested that pyrosequencing overestimates community diversity by overestimating the number of rare taxa [10,32]. A single-step PCR with low cycle numbers and a high fidelity, proofreading polymerase were utilized in this study, and it is possible that this minimized over-representation of rare taxa in the present investigation. No differences were apparent in the number of rare and abundant taxa between the different hypervariable regions, suggesting that targeting a specific region for pyrosequencing does not affect species richness. Taken together, it appears that selecting a specific region for pyrosequencing is not a source of bias in the diversity of the resulting community or in the number of taxa detected.

Figure 2. Distribution of sequences by taxa. Rare taxa are shown in Figure 2A and abundant taxa in Figure 2B. The Sanger community demonstrated significantly fewer species-level taxa than pyrosequencing (***p<0.001, ANOVA). There were no differences between the pyrotag sequences.
doi:10.1371/journal.pone.0020956.g002

Out of the four primer pairs selected, two pairs targeted the same region (V1–V3), one pair containing degenerate sequences and the other non-degenerate. Fragments encompassing the V1–V3 region have been the most common targets for both Sanger sequencing and pyrosequencing, and both non-degenerate and degenerate primers have been used to amplify this region [2,6,7,8,9,33]. It has previously been suggested that inclusion of degenerate sequences improves the "universality" of primers (reviewed by Baker et al [15]); however, our data does not support a role for primer degeneracy in improving community coverage. This is in concordance with previous investigations that have reported no effect of primer degeneracy on profiles of naturally occurring microbial communities [34]. Although degenerate primers, by virtue of their lowered specificity, may amplify larger number of taxa within a community, it has been shown that this effect is magnified when large PCR cycle numbers are used [35]. The present investigation used 22 cycles of amplification to ensure representational amplification of the community template, and it is possible that the low cycle numbers precluded a possible influence by degenerate primers.
Our data suggest that the hypervariable region targeted for sequencing plays a critical role in influencing the composition of
pyrotag communities. Previous investigations have reported that amplicon size and PCR kinetics may be a source of sequencing bias [36,37]. To overcome this in the present study, sequencing primers were carefully selected to generate similar amplicon sizes (~500 bp for V1–V3 amplicons, ~550 bp for V4–V6 amplicons and ~470 bp for V7–V9 amplicons). Identical PCR cycling conditions were also utilized for all primer sets, thereby reducing the possibility of bias from this source. Using a single pyrosequencing run to generate all sequences further reduced bias due to PCR and sequencing kinetics. Thus, the observed differences could not be attributed to these variables.

It is especially striking that even though these samples were derived from sites with severe disease, the V7–V9 communities were dominated by Veillonella and the V4–V6 communities by Streptococci (Table 2), genera that have been previously associated with periodontal health [33]. Similarly, Treponema, a disease-associated genus, was found in high numbers in the V4–V6 and V7–V9 communities, while other disease-associated genera, for example, Prevotella, Porphyromonas, and Bacteroides, were predominant in V1–V3 communities derived from the same run. Fusobacteria were undetected by the V7–V9 primers while forming nearly 19% of the V1–V3 community. Similarly, the Selenomonads were not detectable by the V4–V6 primers, while forming 6% of the V7–V9 community. Concatenated data from V1–V3 and V7–V9 regions resulted in community profiles that did not significantly differ from Sanger sequences or full-length pyrosequences for the predominant genera, while averages of the other two regions did not yield similar results (Table 3). It is also noteworthy that the greatest differences were observed in the community fingerprints generated by these two primer sets.

The mechanism causing this difference is not clear and warrants further investigation. It could be hypothesized that the presence and nature of secondary structures within the target
regions as well as the GC ratios of the resultant fragments may have contributed to the differences. It is known that the V1, V4 and V7 regions exhibit differences in the number of stems as well as in nucleotide variations within these stems [38,39], and while it is possible that differential amplification efficiencies contributed to the compositional differences, it is not within the scope of this study to test this hypothesis. It has been shown that higher GC ratios result in higher amplification efficiencies [35], thereby altering PCR kinetics, with over-amplification of rare members and under-representation of dominant species [40]. In the present investigation, however, the GC ratios of the different amplicons were very similar; therefore, the observed discrepancies could not be attributed to this variable.

Figure 3. Distribution of sequences generated by degenerate and non-degenerate primers by genus. Percent mean abundances and standard deviations are shown. Genera are arranged in a gradient such that those predominant in the degenerate community are arranged on the left. There were no differences between the two communities in the relative abundance of any genus (p > 0.05, 2-sample t-test on transformed variable). doi:10.1371/journal.pone.0020956.g003

Table 2. Relative abundances of genera in pyrotag sequences.
Columns: genus; percent of total pyrotags; percent abundance (mean ± standard deviation) in the V1–V3, V4–V6 and V7–V9 communities.

Genus                               Total    V1–V3         V4–V6         V7–V9
Streptococcus (A,B)                 15.0     8.3 ± 3.1     25.2 ± 4.3    11.5 ± 8.0
Prevotella (A,B,C)                  11.5     23.1 ± 5.9    8.2 ± 5.4     3.3 ± 1.9
Fusobacterium (A,C)                 7.3      18.3 ± 8.3    3.6 ± 2.0     0.0 ± 0.0
Treponema (A)                       7.3      1.8 ± 4.2     12.2 ± 3.3    7.8 ± 10.2
Eubacterium (C)                     6.6      1.9 ± 1.1     5.2 ± 4.2     12.6 ± 5.1
Enterococcus (C)                    5.3      0.3 ± 0.4     5.3 ± 2.4     10.3 ± 5.0
Veillonella (B,C)                   5.0      0.3 ± 0.2     1.5 ± 0.1     13.1 ± 6.6
Selenomonas (B)                     3.5      4.2 ± 2.1     0.0 ± 0.0     6.3 ± 2.2
Granulicatella (A,C)                3.5      6.9 ± 3.8     1.3 ± 1.8     2.2 ± 1.8
Dialister                           3.4      1.6 ± 1.8     3.1 ± 0.4     5.4 ± 7.1
Parvimonas (B)                      3.4      2.6 ± 1.2     1.2 ± 0.6     6.3 ± 2.6
Porphyromonas (B)                   3.2      3.5 ± 2.2     5.8 ± 1.2     0.2 ± 0.1
Campylobacter (B)                   3.1      2.1 ± 1.1     6.2 ± 1.2     1.0 ± 0.7
Catonella                           3.0      1.9 ± 2.4     1.2 ± 2.2     5.9 ± 6.4
Bacteroides (C)                     3.0      5.1 ± 4.4     3.5 ± 1.1     0.3 ± 0.3
Synergistes (B)                     2.2      2.1 ± 2.6     4.3 ± 0.9     0.3 ± 0.3
Neisseria (A)                       2.0      0.5 ± 0.6     3.4 ± 1.8     2.1 ± 1.3
Capnocytophaga                      1.7      2.6 ± 2.3     1.3 ± 1.8     1.1 ± 1.1
Unclassified Bacteroidales (A,B)    1.6      1.1 ± 0.3     3.4 ± 1.4     0.4 ± 0.7
Filifactor                          1.5      0.7 ± 0.9     1.9 ± 0.3     1.8 ± 2.3
Gemella                             1.4      1.2 ± 1.5     0.8 ± 0.6     2.2 ± 1.9
Unclassified Veillonellaceae (C)    1.2      0.5 ± 0.6     0.9 ± 0.3     2.1 ± 2.5
Megasphaera                         1.0      0.7 ± 0.1     0.5 ± 0.1     1.8 ± 0.6
Leptotrichia (C)                    1.0      2.2 ± 1.3     0.5 ± 0.1     0.3 ± 0.4
TM7 phylum (A,C)                    0.8      2.4 ± 1.4     0.0 ± 0.0     0.0 ± 0.0
Mycoplasma (A,C)                    0.7      1.9 ± 2.4     0.0 ± 0.0     0.1 ± 0.1
Hemophilus                          0.5      0.1 ± 0.1     0.02 ± 0.02   1.4 ± 2.6
Lautropia                           0.5      0.6 ± 0.7     0.5 ± 0.2     0.4 ± 0.4
Corynebacterium                     0.5      0.8 ± 0.4     0.01 ± 0.2    0.6 ± 0.2
Arthrobacter                        0.4      0.1 ± 0       0 ± 0         1.1 ± 0.3
Actinomyces                         0.3      0.5 ± 0.2     0.02 ± 0.2    0.5 ± 0.2
Oribacterium                        0.3      0.1 ± 0.1     0.2 ± 1.1     0.7 ± 1.4
Kingella                            0.3      0.3 ± 0.2     0.5 ± 0.1     0.0 ± 0.0
Unclassified Clostridiales          0.2      0.3 ± 0.4     0.1 ± 0.01    0.3 ± 0.2
Atopobium                           0.2      0.4 ± 0.3     0.1 ± 0.2     0.2 ± 0.2
Eikenella                           0.2      0.4 ± 0.1     0.2 ± 0.2     0.0 ± 0.0
Unclassified Lachnospiraceae        0.2      0.5 ± 0.3     0.01 ± 0.02   0.04 ± 0.01
Lactococcus                         0.2      0.1 ± 0.1     0.0 ± 0.0     0.4 ± 0.3
Desulfobulbus                       0.1      0.1 ± 0.1     0.0 ± 0.0     0.3 ± 0.3
Ralstonia                           0.1      0.1 ± 0.2     0.0 ± 0.0     0.2 ± 0.2
Solobacterium                       0.1      0.2 ± 0.2     0.0 ± 0.0     0.0 ± 0.0

Percent mean abundances (and standard deviations) of genera in the 3 pyrotag and Sanger sequence communities are shown, arranged in order of decreasing overall prevalence. Letters in parentheses indicate statistically significant differences between groups (p < 0.05, 2-sample t-test on transformed variable). A: significant difference between V1–V3 and V4–V6; B: significant difference between V1–V3 and V7–V9; C: significant difference between V4–V6 and V7–V9. doi:10.1371/journal.pone.0020956.t002

In summary, the hypervariable region targeted by the primer plays a critical role in determining the profile of a largely uncultivated, complex microbial community generated by pyrosequencing. This effect is significant, with the presence of certain dominant community members being masked and others being under-represented with different primer sets, thereby providing a critical source of error in microbial ecological
studies. However, averaging the community fingerprints generated by V1–V3 and V7–V9 primers provides results similar to Sanger sequencing, while allowing a significantly greater depth of coverage than is possible with Sanger sequencing. It is therefore important to use primers targeted to these two regions of the 16S rRNA gene in all deep-sequencing efforts to characterize heterogeneous microbial communities.

Table 3. Relative abundances of genera in Sanger and concatenated pyrotag datasets.
Values are average abundances (percentage).

Genus                          V1–V3 & V4–V6   V4–V6 & V7–V9   V1–V3 & V7–V9   Sanger   V1–V3, V4–V6 & V7–V9
Streptococcus                  11.7            14.3            16.2            17.8     14.1
Eubacterium                    2.5             8.4             7.3             6.2      6.1
Veillonella                    1.9             3.8             11.9            10.9     5.9
Treponema                      6.1             4.6             4.8             4.6      5.2
Selenomonas                    4.2             3.7             7.2             8.6      5
Catonella                      1.6             7.6             4.3             5.9      4.5
Bacteroides                    7.8             1.9             2.7             1.3      4.1
Fusobacterium                  5.4             4.1             1.3             0.8      3.6
Granulicatella                 4.1             1.8             4.6             2.2      3.5
Parvimonas                     1.9             3.4             4.4             7.1      3.4
Dialister                      2.4             4.3             3.5             3.2      3.4
Prevotella                     3.6             4.2             2.1             1.3      3.3
Porphyromonas                  4.7             3               1.9             2.1      3.2
Campylobacter                  4.2             3.6             1.6             11.8     3.1
Gemella                        4.7             1.5             2.2             3.6      2.8
Unclassified Bacteroidales     4.3             1.9             0.8             1.9      2.3
Synergistes                    3.2             2.3             1.2             1.0      2.2
Enterococcus                   0.6             3.2             2.7             0.0      2.2
Neisseria                      2               2.8             1.3             0.5      2
Megasphaera                    3.2             1.2             1.3             2.3      1.9
Capnocytophaga                 2               1.2             1.9             1.9      1.7
Filifactor                     1.3             1.9             1.3             2.1      1.5
Unclassified Veillonellaceae   0.3             2.3             0.6             0.2      1.1
Leptotrichia                   1.4             0.4             1.3             0.0      1
Desulfobulbus                  2.3             0.2             0.1             0.0      0.8
Lautropia                      0.6             0.5             0.2             0.0      0.4
Corynebacterium                0.4             0.3             0.2             0.0      0.3
Actinomyces                    0.3             0.3             0.3             0.2      0.3
Atopobium                      0.3             0.2             0.3             0.5      0.2
Unclassified Clostridiales     0.2             0.2             0.3             0.0      0.2
TM7 phylum                     0.2             0.0             0.4             0.0      0.2
Eikenella                      0.3             0.1             0.2             0.6      0.2
Oribacterium                   0.0             0.5             0               0.0      0.2
Arthrobacter                   0.4             0.0             0.0             0.0      0.1
Kingella                       0.0             0.3             0.0             0.0      0.1
Mycoplasma                     0.0             0.0             0.0             0.0      0.0
Lactococcus                    0.0             0.0             0.0             0.0      0.0
Ralstonia                      0.0             0.0             0.0             0.0      0.0
Solobacterium                  0.0             0.0             0.0             0.0      0.0
Hemophilus                     0.0             0.0             0.0             0.0      0.0
Unclassified Lachnospiraceae   0.0             0.0             0.0             0.0      0.0

doi:10.1371/journal.pone.0020956.t003

Author Contributions
Conceived and designed the experiments: PSK. Performed the experiments: MRB SED. Analyzed the data: TC PSK MRB. Wrote the paper: PSK MRB.

References
1. Diaz
PI, Chalmers NI, Rickard AH, Kong C, Milburn CL, et al. (2006) Molecular characterization of subject-specific oral microflora during initial colonization of enamel. Appl Environ Microbiol 72: 2837–2848.
2. de Lillo A, Ashley FP, Palmer RM, Munson MA, Kyriacou L, et al. (2006) Novel subgingival bacterial phylotypes detected using multiple universal polymerase chain reaction primer sets. Oral Microbiol Immunol 21: 61–68.
3. Delima SL, McBride RK, Preshaw PM, Heasman PA, Kumar PS (2010) Response of subgingival bacteria to smoking cessation. J Clin Microbiol 48: 2344–2349.
4. Gomes SC, Piccinin FB, Oppermann RV, Susin C, Nonnenmacher CI, et al. (2006) Periodontal status in smokers and never-smokers: clinical findings and real-time polymerase chain reaction quantification of putative periodontal pathogens. J Periodontol 77: 1483–1490.
5. Aas JA, Paster BJ, Stokes LN, Olsen I, Dewhirst FE (2005) Defining the normal bacterial flora of the oral cavity. J Clin Microbiol 43: 5721–5732.
6. Hutter G, Schlagenhauf U, Valenza G, Horn M, Burgemeister S, et al. (2003) Molecular analysis of bacteria in periodontitis: evaluation of clone libraries, novel phylotypes and putative pathogens. Microbiology 149: 67–75.
7. Kroes I, Lepp PW, Relman DA (1999) Bacterial diversity within the human subgingival crevice. Proc Natl Acad Sci U S A 96: 14547–14552.
8. Kumar PS, Griffen AL, Moeschberger ML, Leys EJ (2005) Identification of candidate periodontal pathogens and beneficial species by quantitative 16S clonal analysis. J Clin Microbiol 43: 3944–3955.
9. Paster BJ, Boches SK, Galvin JL, Ericson RE, Lau CN, et al. (2001) Bacterial diversity in human subgingival plaque. J Bacteriol 183: 3770–3783.
10. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12: 118–123.
11. Zaura E, Keijser BJ, Huse SM, Crielaard W (2009) Defining the healthy "core microbiome" of oral microbial communities. BMC Microbiol 9: 259.
12. Keijser BJ, Zaura E, Huse SM, van der Vossen JM, Schuren FH, et al. (2008) Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res 87: 1016–1020.
13. Li L, Hsiao WW, Nandakumar R, Barbuto SM, Mongodin EF, et al. Analyzing Endodontic Infections by Deep Coverage Pyrosequencing. J Dent Res.
14. Dominguez-Bello MG, Costello EK, Contreras M, Magris M, Hidalgo G, et al. Delivery mode shapes the acquisition and structure of the initial microbiota across multiple body habitats in newborns. Proc Natl Acad Sci U S A 107: 11971–11975.
15. Baker GC, Smith JJ, Cowan DA (2003) Review and re-analysis of domain-specific 16S primers. J Microbiol Methods 55: 541–555.
16. Wang Y, Qian PY (2009) Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS One 4: e7401.
17. Chakravorty S, Helb D, Burday M, Connell N, Alland D (2007) A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods 69: 330–339.
18. Nossa CW, Oberdorf WE, Yang L, Aas JA, Paster BJ, et al. rRNA gene primers for 454 pyrosequencing of the human foregut microbiome. World J Gastroenterol 16: 4135–4144.
19. Youssef N, Sheik CS, Krumholz LR, Najar FZ, Roe BA, et al. (2009) Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Appl Environ Microbiol 75: 5227–5236.
20. Claesson MJ, Wang Q, O'Sullivan O, Greene-Diniz R, Cole JR, et al. Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions.