Applications of next-generation sequencing to phylogeography and phylogenetics

格式：pdf
大小：763.32 KB
文档页数：13

下载文档原格式

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Review

Applicationsofnext-generationsequencingtophylogeographyandphylogenetics

JohnE.McCormacka,⇑,SarahM.Hirda,b,AmandaJ.Zellmerb,BryanC.Carstensb,RobbT.Brumﬁelda,b

aMuseumofNaturalScience,LouisianaStateUniversity,BatonRouge,LA70803,UnitedStatesbDepartmentofBiologicalSciences,LouisianaStateUniversity,BatonRouge,LA70803,UnitedStates

articleinfo

Articlehistory:Availableonline14December2011

Keywords:PopulationgenomicsCoalescenceReducedrepresentationlibraryTargetenrichmentHigh-throughputsequencingabstract

ThisisatimeofunprecedentedtransitioninDNAsequencingtechnologies.Next-generationsequencing(NGS)clearlyholdspromiseforfastandcost-effectivegenerationofmultilocussequencedataforphylo-geographyandphylogenetics.However,thefocusonnon-modelorganisms,inadditiontouncertaintyaboutwhichsamplepreparationmethodsandanalysesareappropriatefordifferentresearchquestionsandevolutionarytimescales,havecontributedtoalagintheapplicationofNGStotheseﬁelds.Here,weoutlinesomeofthemajorobstaclesspeciﬁctotheapplicationofNGStophylogeographyandphylogenet-ics,includingthefocusonnon-modelorganisms,thenecessityofobtainingorthologouslociinacost-effectivemanner,andthepredominateuseofgenetreesintheseﬁelds.Wedescribethemostpromisingmethodsofsamplepreparationthataddressthesechallenges.Methodsthatreducethegenomebyrestrictiondigestandmanualsizeselectionaremostappropriateforstudiesattheintraspeciﬁclevel,whereasmethodsthattargetspeciﬁcgenomicregions(i.e.,targetenrichmentorsequencecapture)havewiderapplicabilityfromthepopulationleveltodeep-levelphylogenomics.Additionally,wegiveanover-viewofhowtoanalyzeNGSdatatoarriveatdatasetsapplicabletothestandardtoolkitofphylogeogra-phyandphylogenetics,includinginitialdataprocessingtoalignmentandgenotypecalling(bothSNPsandlociinvolvingmanySNPs).Eventhoughwhole-genomesequencingislikelytobecomeaffordablerathersoon,becausephylogeographyandphylogeneticsrelyonanalysisofhundredsofindividualsinmanycases,methodsthatreducethegenometoasubsetoflocishouldremainmorecost-effectiveforsometimetocome.Ó2011ElsevierInc.Allrightsreserved.

Contents

1.Introduction.........................................................................................................5271.1.Multilocusstudiesandthepromiseofnext-generationsequencing.......................................................5271.2.SpecificchallengestoapplyingNGStophylogeographyandphylogenetics.................................................5271.2.1.TheneedforhomologousDNAregionsfrommanyindividuals...................................................5271.2.2.Cost-effectivemultiplexingandlibrarypreparation............................................................5271.2.3.Thelongreignofthegenetreeinphylogeographyandphylogenetics.............................................5301.3.Primarydatacollectionormarkerdevelopment?......................................................................5302.Reviewofwetlabmethodsforsamplepreparation.........................................................................5302.1.MultiplexPCRandampliconsequencing.............................................................................5302.2.Restrictiondigest-basedmethods..................................................................................5302.2.1.RADsequencing.........................................................................................5312.2.2.Otherrestrictiondigest-basedmethods......................................................................5312.3.Targetenrichment...............................................................................................5312.3.1.Probesetsdesignedfromultraconservedelementsforphylogenomics.............................................5322.3.2.Probesetsdesignedfromcloselyrelatedgenomesandtranscriptomelibraries......................................5322.4.Transcriptomesequencing........................................................................................5323.Dataanalysisandbioinformatics........................................................................................5323.1.ThedifferencebetweenSangerandNGSdataandtheimportanceofcoverage..............................................532

1055-7903/$-seefrontmatterÓ2011ElsevierInc.Allrightsreserved.doi:

10.1016/j.ympev.2011.12.007⇑Correspondingauthor.Address:MooreLaboratoryofZoology,OccidentalCollege,1600CampusRd.,LosAngeles,CA90041,UnitedStates.E-mailaddress:mccormack@oxy.edu(J.E.

McCormack).MolecularPhylogeneticsandEvolution66(2013)526–538

ContentslistsavailableatSciVerseScienceDirect

MolecularPhylogeneticsandEvolution

journalhomepage:www.elsevier.com/loca

/ympev3.2.Determiningorthologousvs.paralogousloci.........................................................................5333.3.FromrawNGSoutputtoformatsappropriateforphylogeographyandphylogenetics........................................5333.3.1.FilteringunprocessedNGSdataandqualitycontrol............................................................5333.3.2.Alignments.............................................................................................5343.3.3.Genotypecalling.........................................................................................5343.4.Dataanalysis...................................................................................................5354.Futuredirections.....................................................................................................535Acknowledgments....................................................................................................535References..........................................................................................................535

1.Introduction

1.1.Multilocusstudiesandthepromiseofnext-generationsequencing

Usingmultiplelocitoinferpopulationandspecieshistorieshas

becomethebaselineinphylogeographyandphylogenetics.

Althoughthesetwoﬁeldscametothemultilocusapproachfordif-

ferenthistoricalreasons(seeBritoandEdwards,2008),multilocus

studiesinbothﬁeldsbeneﬁtedfromdecreasingcostsofDNA

sequencinginthelastthreedecades.Morerecently,statisticalphy-

logeography(Knowles,2009)andtheemergingspecies-treepara-

digmofphylogenetics(Edwards,2009)providedconvincing

theoreticalargumentsforincorporatinginformationfrommultiple

lociintoestimatesofpopulationandspecieshistorytoaccountfor

randomvariationinpatternsofgeneinheritance(i.e.,coalescent

stochasticity).Thepracticaloutcomeofthesedevelopmentsisthat

mostpractitionersofphylogeographyandphylogeneticshave

spentasigniﬁcantportionoftheirtimeinthelastdecadedevelop-

ingandscreeningmolecularmarkerssuitabletotheirstudysystem

andappropriatetotheirevolutionarytimescaleofinterest.

Evenwiththeincreasingavailabilityofmolecularmarkersfor

non-modelorganisms(Edwards,2008;Thomsonetal.,2010),the

processofdatagenerationforamultilocusstudyislaborious.In

thefast-movingﬁeldofmolecularbiology,itseemsevermore

unjustiﬁabletoembarkonthelengthyprocessofscreeningloci

forvariability(assumingprimersalreadyexist),amplifyingand

sequencingDNAforeachsampleateachlocus,andphasingnucle-

ardataviacomputationorcloning.Researchersofphylogeography

andphylogeneticshaveunderstandablylookedtowardnext-

generationsequencing(NGS)withgreatinterestasapotential

meanstocondensethemanystepsofmultilocusdatageneration

fornon-modelorganismsintoamoretime-efﬁcientandcost-effec-

tiveprocess(EkblomandGalindo,2010;LernerandFleischer,

2010).

Despitethisobviouspotential,NGShasbeenslowtotakeroot

inphylogeographyandphylogeneticscomparedtootherﬁeldslike

metagenomicsanddiseasegenetics(Mardis,2008).Wesuggest

thatthislaghasbeencausedbyfourspeciﬁcaspectsofphylogeo-

graphicandphylogeneticresearch:thepredominantfocusonnon-

modelorganisms,theneedforsequencinglargenumbersofsam-

plesperspecies,thelackofconsensusregardinglibraryprepara-tionprotocols

forparticularresearchquestions,andthe

transitionalstateofthetechnology(whole-genomedataarestill

neithercost-effective,norevendesirableforphylogeographyand

phylogenetics,butareparadoxicallyeasiertocollect).Conse-

quentlythereareasyetrelativelyfewpublishedpaperswhere

NGShasbeenusedtogeneratethekindofphylogeographicorphy-

logeneticdatasetsthatappearinMolecularPhylogeneticsand

Evolution.

Thepurposeofthisreviewistoaddressthespeciﬁcchallenges

ofapplyingNGStophylogeographyandphylogeneticsofnon-

modelorganismsandtohighlightemergingmethodsofsample

preparationanddataanalysisthatMPE’sreadersmayﬁnduseful.WewillnotfocusonusesofNGSforecologicalgenomicsoradap-

tivedivergencebecausethesetopicshavebeenreviewedindetail

elsewhere(Stapleyetal.,2010;Riceetal.,2011).Rather,wefocus

onusesofNGStoreconstructevolutionaryanddemographichis-

toryfromthelevelofpopulationstospecies.Webrieﬂytouchon

thesteadyconvergenceofphylogeographywithecologicalgenom-

icsinthesectiononfuturedirections.

1.2.SpeciﬁcchallengestoapplyingNGStophylogeographyand

phylogenetics

Therearetwosubstantialchallengestoempiricalresearchers

wishingtocollectsequencedatausingNGS:samplepreparation

andbioinformatics.HereandinSection2,weaddresstheformer,

whilewedevoteSection3tothelatter.

1.2.1.TheneedforhomologousDNAregionsfrommanyindividuals

Phylogeographyandphylogeneticsrequirehomologousgeno-

micregionsfrommultipleindividualstoinfergenegenealogies

andphylogenetictrees.UsingtraditionalSangerDNAsequencing

methods,theprocessofgeneratingoverlappinggenomicregions

ishighlytargetedandstraightforward,involvingprimerdesignfol-

lowedbyPCRforeachindividualateachlocus.WithNGS,DNAis

notnecessarilyenrichedforasinglelocusviaPCR-basedampliﬁca-

tion,althoughthisisonepossibleapplication,butformanyloci

throughavarietyofmethodsinvolvingreductionofthesizeof

thegenome(Table1;Fig.1).UnlikewithSangersequencing,with

manyNGSmethods,thereislesscontroloverexactlywhatregions

ofthegenomearesequenced.Thetrade-offisthatthenumberof

targets(andthenumberofreadsassociatedwitheachtarget)isin-

creasedbyordersofmagnitude.GenomicDNA(gDNA)mustbe

preparedinadvancetocontainorthologousregionsamongindi-

viduals,preferablywithoutrequiringhundredstothousandsof

PCRs.Theneedtoreducethegenometooverlappingsubsetsmight

becomeunnecessaryasitbecomescost-effectivetosequence

wholegenomesforhundredsofindividualsandoncethetheory

andanalysisoffull-genomedataisbetterdeveloped(seeSec-

tion4);however,forthetimebeingtheissueofhowtoreduce

thegenomesofmanyindividualstoorthologousfragmentsre-

mainsasigniﬁcantobstacletoincorporatingNGSmethodsinto

phylogeographyandphylogenetics.InSection2andTable1,we

discussdifferentwetlabmethodsforgenomicreduction,including

thosethatselectexpressedgenes,randomfragmentsfrom

throughoutthegenome,andothertargetedregions.

1.2.2.Cost-effectivemultiplexingandlibrarypreparation

ThesecondchallengeisthatusingNGSforphylogeographyand

phylogeneticsisonlycost-effectiveifmanyindividualscanbe

combined(multiplexed)inthesamesequencingrunandthecosts

subdividedamongmanysamples(Glenn,2011).Multiplexingin-

volvestheapplicationofshortidentifyingDNAsequences(called

‘‘indexes’’,‘‘barcodes’’,orthetermwewillusehere–‘‘tags’’)that

areincorporatedintotheDNAfragmentseitherbyPCR(BinladenJ.E.McCormacketal./MolecularPhylogeneticsandEvolution66(2013)526–538527

Applications of next-generation sequencing to phylogeography and phylogenetics

合集下载

文档推荐

最新文档