Applications of next-generation sequencing to phylogeography and phylogenetics
- 格式:pdf
- 大小:763.32 KB
- 文档页数:13
Review
Applicationsofnext-generationsequencingtophylogeographyandphylogenetics
JohnE.McCormacka,⇑,SarahM.Hirda,b,AmandaJ.Zellmerb,BryanC.Carstensb,RobbT.Brumfielda,b
aMuseumofNaturalScience,LouisianaStateUniversity,BatonRouge,LA70803,UnitedStatesbDepartmentofBiologicalSciences,LouisianaStateUniversity,BatonRouge,LA70803,UnitedStates
articleinfo
Articlehistory:Availableonline14December2011
Keywords:PopulationgenomicsCoalescenceReducedrepresentationlibraryTargetenrichmentHigh-throughputsequencingabstract
ThisisatimeofunprecedentedtransitioninDNAsequencingtechnologies.Next-generationsequencing(NGS)clearlyholdspromiseforfastandcost-effectivegenerationofmultilocussequencedataforphylo-geographyandphylogenetics.However,thefocusonnon-modelorganisms,inadditiontouncertaintyaboutwhichsamplepreparationmethodsandanalysesareappropriatefordifferentresearchquestionsandevolutionarytimescales,havecontributedtoalagintheapplicationofNGStothesefields.Here,weoutlinesomeofthemajorobstaclesspecifictotheapplicationofNGStophylogeographyandphylogenet-ics,includingthefocusonnon-modelorganisms,thenecessityofobtainingorthologouslociinacost-effectivemanner,andthepredominateuseofgenetreesinthesefields.Wedescribethemostpromisingmethodsofsamplepreparationthataddressthesechallenges.Methodsthatreducethegenomebyrestrictiondigestandmanualsizeselectionaremostappropriateforstudiesattheintraspecificlevel,whereasmethodsthattargetspecificgenomicregions(i.e.,targetenrichmentorsequencecapture)havewiderapplicabilityfromthepopulationleveltodeep-levelphylogenomics.Additionally,wegiveanover-viewofhowtoanalyzeNGSdatatoarriveatdatasetsapplicabletothestandardtoolkitofphylogeogra-phyandphylogenetics,includinginitialdataprocessingtoalignmentandgenotypecalling(bothSNPsandlociinvolvingmanySNPs).Eventhoughwhole-genomesequencingislikelytobecomeaffordablerathersoon,becausephylogeographyandphylogeneticsrelyonanalysisofhundredsofindividualsinmanycases,methodsthatreducethegenometoasubsetoflocishouldremainmorecost-effectiveforsometimetocome.Ó2011ElsevierInc.Allrightsreserved.
Contents
1.Introduction.........................................................................................................5271.1.Multilocusstudiesandthepromiseofnext-generationsequencing.......................................................5271.2.SpecificchallengestoapplyingNGStophylogeographyandphylogenetics.................................................5271.2.1.TheneedforhomologousDNAregionsfrommanyindividuals...................................................5271.2.2.Cost-effectivemultiplexingandlibrarypreparation............................................................5271.2.3.Thelongreignofthegenetreeinphylogeographyandphylogenetics.............................................5301.3.Primarydatacollectionormarkerdevelopment?......................................................................5302.Reviewofwetlabmethodsforsamplepreparation.........................................................................5302.1.MultiplexPCRandampliconsequencing.............................................................................5302.2.Restrictiondigest-basedmethods..................................................................................5302.2.1.RADsequencing.........................................................................................5312.2.2.Otherrestrictiondigest-basedmethods......................................................................5312.3.Targetenrichment...............................................................................................5312.3.1.Probesetsdesignedfromultraconservedelementsforphylogenomics.............................................5322.3.2.Probesetsdesignedfromcloselyrelatedgenomesandtranscriptomelibraries......................................5322.4.Transcriptomesequencing........................................................................................5323.Dataanalysisandbioinformatics........................................................................................5323.1.ThedifferencebetweenSangerandNGSdataandtheimportanceofcoverage..............................................532
1055-7903/$-seefrontmatterÓ2011ElsevierInc.Allrightsreserved.doi:
10.1016/j.ympev.2011.12.007⇑Correspondingauthor.Address:MooreLaboratoryofZoology,OccidentalCollege,1600CampusRd.,LosAngeles,CA90041,UnitedStates.E-mailaddress:mccormack@oxy.edu(J.E.
McCormack).MolecularPhylogeneticsandEvolution66(2013)526–538
ContentslistsavailableatSciVerseScienceDirect
MolecularPhylogeneticsandEvolution
journalhomepage:www.elsevier.com/loca
t
e
/ympev3.2.Determiningorthologousvs.paralogousloci.........................................................................5333.3.FromrawNGSoutputtoformatsappropriateforphylogeographyandphylogenetics........................................5333.3.1.FilteringunprocessedNGSdataandqualitycontrol............................................................5333.3.2.Alignments.............................................................................................5343.3.3.Genotypecalling.........................................................................................5343.4.Dataanalysis...................................................................................................5354.Futuredirections.....................................................................................................535Acknowledgments....................................................................................................535References..........................................................................................................535
1.Introduction
1.1.Multilocusstudiesandthepromiseofnext-generationsequencing
Usingmultiplelocitoinferpopulationandspecieshistorieshas
becomethebaselineinphylogeographyandphylogenetics.
Althoughthesetwofieldscametothemultilocusapproachfordif-
ferenthistoricalreasons(seeBritoandEdwards,2008),multilocus
studiesinbothfieldsbenefitedfromdecreasingcostsofDNA
sequencinginthelastthreedecades.Morerecently,statisticalphy-
logeography(Knowles,2009)andtheemergingspecies-treepara-
digmofphylogenetics(Edwards,2009)providedconvincing
theoreticalargumentsforincorporatinginformationfrommultiple
lociintoestimatesofpopulationandspecieshistorytoaccountfor
randomvariationinpatternsofgeneinheritance(i.e.,coalescent
stochasticity).Thepracticaloutcomeofthesedevelopmentsisthat
mostpractitionersofphylogeographyandphylogeneticshave
spentasignificantportionoftheirtimeinthelastdecadedevelop-
ingandscreeningmolecularmarkerssuitabletotheirstudysystem
andappropriatetotheirevolutionarytimescaleofinterest.
Evenwiththeincreasingavailabilityofmolecularmarkersfor
non-modelorganisms(Edwards,2008;Thomsonetal.,2010),the
processofdatagenerationforamultilocusstudyislaborious.In
thefast-movingfieldofmolecularbiology,itseemsevermore
unjustifiabletoembarkonthelengthyprocessofscreeningloci
forvariability(assumingprimersalreadyexist),amplifyingand
sequencingDNAforeachsampleateachlocus,andphasingnucle-
ardataviacomputationorcloning.Researchersofphylogeography
andphylogeneticshaveunderstandablylookedtowardnext-
generationsequencing(NGS)withgreatinterestasapotential
meanstocondensethemanystepsofmultilocusdatageneration
fornon-modelorganismsintoamoretime-efficientandcost-effec-
tiveprocess(EkblomandGalindo,2010;LernerandFleischer,
2010).
Despitethisobviouspotential,NGShasbeenslowtotakeroot
inphylogeographyandphylogeneticscomparedtootherfieldslike
metagenomicsanddiseasegenetics(Mardis,2008).Wesuggest
thatthislaghasbeencausedbyfourspecificaspectsofphylogeo-
graphicandphylogeneticresearch:thepredominantfocusonnon-
modelorganisms,theneedforsequencinglargenumbersofsam-
plesperspecies,thelackofconsensusregardinglibraryprepara-tionprotocols
forparticularresearchquestions,andthe
transitionalstateofthetechnology(whole-genomedataarestill
neithercost-effective,norevendesirableforphylogeographyand
phylogenetics,butareparadoxicallyeasiertocollect).Conse-
quentlythereareasyetrelativelyfewpublishedpaperswhere
NGShasbeenusedtogeneratethekindofphylogeographicorphy-
logeneticdatasetsthatappearinMolecularPhylogeneticsand
Evolution.
Thepurposeofthisreviewistoaddressthespecificchallenges
ofapplyingNGStophylogeographyandphylogeneticsofnon-
modelorganismsandtohighlightemergingmethodsofsample
preparationanddataanalysisthatMPE’sreadersmayfinduseful.WewillnotfocusonusesofNGSforecologicalgenomicsoradap-
tivedivergencebecausethesetopicshavebeenreviewedindetail
elsewhere(Stapleyetal.,2010;Riceetal.,2011).Rather,wefocus
onusesofNGStoreconstructevolutionaryanddemographichis-
toryfromthelevelofpopulationstospecies.Webrieflytouchon
thesteadyconvergenceofphylogeographywithecologicalgenom-
icsinthesectiononfuturedirections.
1.2.SpecificchallengestoapplyingNGStophylogeographyand
phylogenetics
Therearetwosubstantialchallengestoempiricalresearchers
wishingtocollectsequencedatausingNGS:samplepreparation
andbioinformatics.HereandinSection2,weaddresstheformer,
whilewedevoteSection3tothelatter.
1.2.1.TheneedforhomologousDNAregionsfrommanyindividuals
Phylogeographyandphylogeneticsrequirehomologousgeno-
micregionsfrommultipleindividualstoinfergenegenealogies
andphylogenetictrees.UsingtraditionalSangerDNAsequencing
methods,theprocessofgeneratingoverlappinggenomicregions
ishighlytargetedandstraightforward,involvingprimerdesignfol-
lowedbyPCRforeachindividualateachlocus.WithNGS,DNAis
notnecessarilyenrichedforasinglelocusviaPCR-basedamplifica-
tion,althoughthisisonepossibleapplication,butformanyloci
throughavarietyofmethodsinvolvingreductionofthesizeof
thegenome(Table1;Fig.1).UnlikewithSangersequencing,with
manyNGSmethods,thereislesscontroloverexactlywhatregions
ofthegenomearesequenced.Thetrade-offisthatthenumberof
targets(andthenumberofreadsassociatedwitheachtarget)isin-
creasedbyordersofmagnitude.GenomicDNA(gDNA)mustbe
preparedinadvancetocontainorthologousregionsamongindi-
viduals,preferablywithoutrequiringhundredstothousandsof
PCRs.Theneedtoreducethegenometooverlappingsubsetsmight
becomeunnecessaryasitbecomescost-effectivetosequence
wholegenomesforhundredsofindividualsandoncethetheory
andanalysisoffull-genomedataisbetterdeveloped(seeSec-
tion4);however,forthetimebeingtheissueofhowtoreduce
thegenomesofmanyindividualstoorthologousfragmentsre-
mainsasignificantobstacletoincorporatingNGSmethodsinto
phylogeographyandphylogenetics.InSection2andTable1,we
discussdifferentwetlabmethodsforgenomicreduction,including
thosethatselectexpressedgenes,randomfragmentsfrom
throughoutthegenome,andothertargetedregions.
1.2.2.Cost-effectivemultiplexingandlibrarypreparation
ThesecondchallengeisthatusingNGSforphylogeographyand
phylogeneticsisonlycost-effectiveifmanyindividualscanbe
combined(multiplexed)inthesamesequencingrunandthecosts
subdividedamongmanysamples(Glenn,2011).Multiplexingin-
volvestheapplicationofshortidentifyingDNAsequences(called
‘‘indexes’’,‘‘barcodes’’,orthetermwewillusehere–‘‘tags’’)that
areincorporatedintotheDNAfragmentseitherbyPCR(BinladenJ.E.McCormacketal./MolecularPhylogeneticsandEvolution66(2013)526–538527