当前位置:文档之家› Applications of next-generation sequencing to phylogeography and phylogenetics

Applications of next-generation sequencing to phylogeography and phylogenetics

Applications of next-generation sequencing to phylogeography and phylogenetics
Applications of next-generation sequencing to phylogeography and phylogenetics

Review

Applications of next-generation sequencing to phylogeography and phylogenetics

John E.McCormack a ,?,Sarah M.Hird a ,b ,Amanda J.Zellmer b ,Bryan C.Carstens b ,Robb T.Brum?eld a ,b

a Museum of Natural Science,Louisiana State University,Baton Rouge,LA 70803,United States

b

Department of Biological Sciences,Louisiana State University,Baton Rouge,LA 70803,United States

a r t i c l e i n f o Article history:

Available online 14December 2011Keywords:

Population genomics Coalescence

Reduced representation library Target enrichment

High-throughput sequencing

a b s t r a c t

This is a time of unprecedented transition in DNA sequencing technologies.Next-generation sequencing (NGS)clearly holds promise for fast and cost-effective generation of multilocus sequence data for phylo-geography and phylogenetics.However,the focus on non-model organisms,in addition to uncertainty about which sample preparation methods and analyses are appropriate for different research questions and evolutionary timescales,have contributed to a lag in the application of NGS to these ?elds.Here,we outline some of the major obstacles speci?c to the application of NGS to phylogeography and phylogenet-ics,including the focus on non-model organisms,the necessity of obtaining orthologous loci in a cost-effective manner,and the predominate use of gene trees in these ?elds.We describe the most promising methods of sample preparation that address these challenges.Methods that reduce the genome by restriction digest and manual size selection are most appropriate for studies at the intraspeci?c level,whereas methods that target speci?c genomic regions (i.e.,target enrichment or sequence capture)have wider applicability from the population level to deep-level phylogenomics.Additionally,we give an over-view of how to analyze NGS data to arrive at data sets applicable to the standard toolkit of phylogeogra-phy and phylogenetics,including initial data processing to alignment and genotype calling (both SNPs and loci involving many SNPs).Even though whole-genome sequencing is likely to become affordable rather soon,because phylogeography and phylogenetics rely on analysis of hundreds of individuals in many cases,methods that reduce the genome to a subset of loci should remain more cost-effective for some time to come.

ó2011Elsevier Inc.All rights reserved.

Contents 1.

Introduction (527)

1.1.Multilocus studies and the promise of next-generation sequencing .......................................................5271.

2.Specific challenges to applying NGS to phylogeography and phylogenetics .................................................

5271.2.1.The need for homologous DNA regions from many individuals ...................................................5271.2.2.Cost-effective multiplexing and library preparation ............................................................5271.2.3.The long reign of the gene tree in phylogeography and phylogenetics .............................................

5301.3.Primary data collection or marker development?......................................................................5302.

Review of wet lab methods for sample preparation .........................................................................5302.1.Multiplex PCR and amplicon sequencing.............................................................................5302.2.Restriction digest-based methods ..................................................................................

5302.2.1.RAD sequencing .........................................................................................5312.2.2.Other restriction digest-based methods ......................................................................

5312.3.Target enrichment...............................................................................................

5312.3.1.Probe sets designed from ultraconserved elements for phylogenomics .............................................5322.3.2.Probe sets designed from closely related genomes and transcriptome libraries ......................................

5322.4.Transcriptome sequencing ........................................................................................5323.

Data analysis and bioinformatics ........................................................................................5323.1.The difference between Sanger and NGS data and the importance of coverage..............................................

532

1055-7903/$-see front matter ó2011Elsevier Inc.All rights reserved.doi:10.1016/j.ympev.2011.12.007

Corresponding author.Address:Moore Laboratory of Zoology,Occidental College,1600Campus Rd.,Los Angeles,CA 90041,United States.

E-mail address:mccormack@https://www.doczj.com/doc/f43653401.html, (J.E.McCormack).

3.2.Determining orthologous vs.paralogous loci (533)

3.3.From raw NGS output to formats appropriate for phylogeography and phylogenetics (533)

3.3.1.Filtering unprocessed NGS data and quality control (533)

3.3.2.Alignments (534)

3.3.3.Genotype calling (534)

3.4.Data analysis (535)

4.Future directions (535)

Acknowledgments (535)

References (535)

1.Introduction

1.1.Multilocus studies and the promise of next-generation sequencing

Using multiple loci to infer population and species histories has become the baseline in phylogeography and phylogenetics. Although these two?elds came to the multilocus approach for dif-ferent historical reasons(see Brito and Edwards,2008),multilocus studies in both?elds bene?ted from decreasing costs of DNA sequencing in the last three decades.More recently,statistical phy-logeography(Knowles,2009)and the emerging species-tree para-digm of phylogenetics(Edwards,2009)provided convincing theoretical arguments for incorporating information from multiple loci into estimates of population and species history to account for random variation in patterns of gene inheritance(i.e.,coalescent stochasticity).The practical outcome of these developments is that most practitioners of phylogeography and phylogenetics have spent a signi?cant portion of their time in the last decade develop-ing and screening molecular markers suitable to their study system and appropriate to their evolutionary timescale of interest.

Even with the increasing availability of molecular markers for non-model organisms(Edwards,2008;Thomson et al.,2010),the process of data generation for a multilocus study is laborious.In the fast-moving?eld of molecular biology,it seems ever more unjusti?able to embark on the lengthy process of screening loci for variability(assuming primers already exist),amplifying and sequencing DNA for each sample at each locus,and phasing nucle-ar data via computation or cloning.Researchers of phylogeography and phylogenetics have understandably looked toward next-generation sequencing(NGS)with great interest as a potential means to condense the many steps of multilocus data generation for non-model organisms into a more time-ef?cient and cost-effec-tive process(Ekblom and Galindo,2010;Lerner and Fleischer, 2010).

Despite this obvious potential,NGS has been slow to take root in phylogeography and phylogenetics compared to other?elds like metagenomics and disease genetics(Mardis,2008).We suggest that this lag has been caused by four speci?c aspects of phylogeo-graphic and phylogenetic research:the predominant focus on non-model organisms,the need for sequencing large numbers of sam-ples per species,the lack of consensus regarding library prepara-tion protocols for particular research questions,and the transitional state of the technology(whole-genome data are still neither cost-effective,nor even desirable for phylogeography and phylogenetics,but are paradoxically easier to collect).Conse-quently there are as yet relatively few published papers where NGS has been used to generate the kind of phylogeographic or phy-logenetic data sets that appear in Molecular Phylogenetics and Evolution.

The purpose of this review is to address the speci?c challenges of applying NGS to phylogeography and phylogenetics of non-model organisms and to highlight emerging methods of sample preparation and data analysis that MPE’s readers may?nd useful.We will not focus on uses of NGS for ecological genomics or adap-tive divergence because these topics have been reviewed in detail elsewhere(Stapley et al.,2010;Rice et al.,2011).Rather,we focus on uses of NGS to reconstruct evolutionary and demographic his-tory from the level of populations to species.We brie?y touch on the steady convergence of phylogeography with ecological genom-ics in the section on future directions.

1.2.Speci?c challenges to applying NGS to phylogeography and phylogenetics

There are two substantial challenges to empirical researchers wishing to collect sequence data using NGS:sample preparation and bioinformatics.Here and in Section2,we address the former, while we devote Section3to the latter.

1.2.1.The need for homologous DNA regions from many individuals

Phylogeography and phylogenetics require homologous geno-mic regions from multiple individuals to infer gene genealogies and phylogenetic https://www.doczj.com/doc/f43653401.html,ing traditional Sanger DNA sequencing methods,the process of generating overlapping genomic regions is highly targeted and straightforward,involving primer design fol-lowed by PCR for each individual at each locus.With NGS,DNA is not necessarily enriched for a single locus via PCR-based ampli?ca-tion,although this is one possible application,but for many loci through a variety of methods involving reduction of the size of the genome(Table1;Fig.1).Unlike with Sanger sequencing,with many NGS methods,there is less control over exactly what regions of the genome are sequenced.The trade-off is that the number of targets(and the number of reads associated with each target)is in-creased by orders of magnitude.Genomic DNA(gDNA)must be prepared in advance to contain orthologous regions among indi-viduals,preferably without requiring hundreds to thousands of PCRs.The need to reduce the genome to overlapping subsets might become unnecessary as it becomes cost-effective to sequence whole genomes for hundreds of individuals and once the theory and analysis of full-genome data is better developed(see Sec-tion4);however,for the time being the issue of how to reduce the genomes of many individuals to orthologous fragments re-mains a signi?cant obstacle to incorporating NGS methods into phylogeography and phylogenetics.In Section2and Table1,we discuss different wet lab methods for genomic reduction,including those that select expressed genes,random fragments from throughout the genome,and other targeted regions.

1.2.2.Cost-effective multiplexing and library preparation

The second challenge is that using NGS for phylogeography and phylogenetics is only cost-effective if many individuals can be combined(multiplexed)in the same sequencing run and the costs subdivided among many samples(Glenn,2011).Multiplexing in-volves the application of short identifying DNA sequences(called ‘‘indexes’’,‘‘barcodes’’,or the term we will use here–‘‘tags’’)that are incorporated into the DNA fragments either by PCR(Binladen

J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538527

et al.,2007)or ligation (Meyer et al.,2008).These tags identify an individual prior to pooling with other tagged samples (Craig et al.,2008).Sequences are later sorted with bioinformatics.The most desirable tag sequences are those that are as short as possible while still being multiple base pairs away from other tags in the pool so that reads with sequencing errors in the tag sequence can still be allocated to the proper sample (Hamady et al.,2008).Simple methods for hierarchical tagging (Neiman et al.,2011)ex-pand the possible number of pooled samples from hundreds to

thousands.

methods of sample preparation for NGS.(a)amplicon sequencing in which PCR products are tagged and pooled for NGS,resulting in a largely complete restriction-digest based methods,where genome reduction occurs by manual size selection.Note that mutations in the restriction site (as in individual null allele;(3)target enrichment,in which probes ‘‘catch’’gDNA,which is then pooled for NGS.The hybridization of probe to gDNA is robust to some variation,missing loci (show in individual 2)can result in missing data.

NGS also requires platform-speci?c,often proprietary adaptor sequences to be incorporated into DNA fragments(this step is called library preparation and often occurs in conjunction with tag-ging).These adaptor sequences provide priming sites or hybridiza-tion targets for sequencing,depending on the NGS platform(e.g., linkers A and B for Roche454pyrosequencing or adaptors P1and P2for Illumina sequencing-by-synthesis).Incorporating both tags and adaptors to template DNA can be expensive if proprietary kits are used(e.g.,Nextera?kits list price at$2200for20samples).A recent review calculated that library preparation was10times more expensive per sample than the sequencing itself(Glenn, 2011).In Section2,we mention options for more cost-effective li-brary preparation if they are available.

1.2.3.The long reign of the gene tree in phylogeography and phylogenetics

Another issue is the historical importance of utilizing gene trees in phylogeography and phylogenetics(Brito and Edwards,2008), and the preeminence of coalescent-based analytical methods that either require,or are currently best utilized,in concert with gene trees(Kuhner,2009;Liu et al.,2009a;Pinho and Hey,2010).At present,gene trees are most robustly inferred from loci with high information content,for example,a non-recombining locus con-taining a series of linked SNPs.Individual SNPs,on the other hand, have low information content on a per-locus basis and have been used predominately with classi?cation methods such as Structure (Pritchard et al.,2000)and principal components analysis(e.g., Novembre et al.,2008),although more versatile analyses are emerging(see below).While distance-based genealogies and phy-logenies can be built from unlinked SNPs(e.g.,Emerson et al., 2010),this ignores models of molecular substitution and probabi-listic tree-searching algorithms that have led to more robust phy-logenetic inference in the last several decades.

The practical problem is that most existing NGS technologies (e.g.,Illumina)produce short reads and therefore are best suited for generating SNPs,not whole loci featuring many linked SNPs. This has limited the application of NGS data with respect to the standard analytical tool kit of phylogenetics and phylogeography. Certainly,this problem will resolve itself as NGS platforms con-verge on longer reads,and with the advent of third generation sequencing platforms(e.g.,PacBio,Ion Torrent,Starlight).It is also possible that new analytical techniques will reduce our depen-dence on genes trees(Bryant et al.,2012;Naduvilezhath et al., 2011;Sirén et al.,2011).Until then,methods that can generate data amenable to gene-tree analysis will be preferred in phyloge-ography and phylogenetics.We point out which methods work well with gene trees versus those that generate unlinked SNPs.

1.3.Primary data collection or marker development?

A major consideration from the perspective of time investment is whether NGS data are amenable for use as primary data(e.g., Emerson et al.,2010)rather than being used to develop markers for later sequencing/genotyping(e.g.,Williams et al.,2010).Down-stream genotyping could still utilize high-throughput technologies, such as a SNP chip(Wang et al.,1998;Lipshutz et al.,1999;Buetow et al.,2001;Decker et al.,2009),or it could follow the more con-ventional route of primer design followed by individual PCR.The problem is that,unlike with Sanger sequencing,which is highly targeted to individual amplicons,NGS occurs on a pooled sample of amplicons.Although the researchers can make some decisions that in?uence which fragments of the initial pool are sequenced with suf?cient coverage(e.g.,the number of samples/individuals), pooled NGS sequencing is also subject to many stochastic factors, such as PCR bias and unequal sample pooling,which can result in patchy data matrices with large amounts of missing data.Meth-ods that allow more even distribution of sequencing,such as PCR-less library preparation(Kozarewa et al.,2009),will ameliorate this problem to some degree,but stochasticity in coverage will likely still play a larger role in NGS methods than in Sanger sequencing. In Section3,we discuss some factors relating to the use of NGS re-sults as primary data,such as coverage and analytical methods that permit missing data.

2.Review of wet lab methods for sample preparation

There are several existing reviews of different NGS sequencing platforms and chemistries(Shendure and Ji,2008;Glenn,2011), so we discuss platform-speci?c details sparingly.The major differ-ence for the purpose of this review is between platforms that pro-duce fewer longer reads($400–800bp,454Titanium)versus those that produce orders of magnitude more short reads($70–200bp, Illumina HiSeq).

2.1.Multiplex PCR and amplicon sequencing

Perhaps the most straightforward application of NGS to phylo-geography and phylogenetics is amplicon sequencing,or NGS sequencing of PCR products that have already been generated by Sanger sequencing(Fig.1a).When multiple loci for an individual are tagged and pooled with tagged loci of other individuals,this method is called parallel tagged sequencing(Meyer et al.,2008). Bene?ts over typical Sanger sequencing include faster sequencing time and one-step phasing of nuclear DNA(because NGS results are single-stranded).Parallel tagged sequencing,however,does not remove the process of PCR for each individual at each locus, which can be the most onerous part of a multilocus phylogeo-graphic study.An alternative to multiple PCRs is multiplex PCR methods with multiple primer pairs,but this approach has several limitations(described in Mamanova et al.,2009)including biased representation of some products as well as chimeric DNA se-quences.Some promising alternatives have emerged,such as microdroplet PCR(Tewhey et al.,2009b),where millions of PCR reactions occur in picoliter-sized droplets before being pooled to-gether,and the96.96Dynamic Array?by Fluidigm(Seeb et al., 2009;Smith et al.,2011),which allows96primer combinations to be used on96samples(9216total PCR reactions)using a plate-based nano?uid technology.These methods circumvent some of these challenges of multiplex PCR by ensuring that each primer combination is ampli?ed separately(albeit in parallel), but thus far they have been little applied to phylogeography,so their ease-of-use and cost-effectiveness,though promising,is dif?-cult to gauge.

Parallel tagged sequencing may be the most cost-effective method for small to medium-sized projects with few loci that am-plify well across individuals(e.g.,Grif?n et al.,2011).For this rea-son,it has been used effectively in phylogenetic studies involving whole mitochondrial sequencing(Chan et al.,2010;Morin et al., 2010;Gunnarsdóttir et al.,2011)and whole chloroplast sequenc-ing(Parks et al.,2009).It is also useful for uncovering rare se-quences and has therefore been effectively employed in environmental sampling and metagenomics where a single locus is ampli?ed from a sample containing DNA from many organisms (Fierer et al.,2008)as well as for characterizing the major histo-compatibility complex where there are many alleles,some extre-mely rare(Babik et al.,2009;Kloch et al.,2010).Here,it is more cost-effective to build multiplexing tags(Faircloth and Glenn, 2012)and NGS platform-speci?c adaptor sequences into the ampli?cation primers(Binladen et al.,2007)in order to circumvent costly kit-based library preparation.

530J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538

2.2.Restriction digest-based methods

Several methods of genome reduction involve digestion of tem-plate DNA with restriction enzymes and manual excision of some fragment size range from an agarose gel to produce a reduced rep-resentation library(RRL)(Fig.1b).RRLs pre-date the existence of NGS(Altshuler et al.,2000;Nicod and Largiadèr,2003),and library preparation protocols were then adapted to NGS platforms(Van Tassell et al.,2008).No genomic resources are required in advance, so restriction digest-based methods are in many ways ideal for non-model organisms without a close relative with a sequenced genome.

There are several potential drawbacks of restriction digest-based methods.Selecting a fragment size range by manual excision is subject to some human error,potentially leading to fewer orthol-ogous fragments across individuals.There are automated machines for size selection(PippinPrep?from Sage Science and the Lap-ChipòXT from Caliper LifeSciences),but they are somewhat expen-sive(currently$15,000–$20,000for equipment).However, methods that add barcodes and adaptors prior to the cutting step allow gel cutting of one pooled sample.

Another issue that has not been well addressed is null alleles, where mutations in the restriction site result in the loss of a frag-ment in some individuals.Null alleles are easy to detect when mutations in restriction sites are?xed among populations.How-ever,study populations that contain individuals heterozygous for null alleles present a more insidious problem,as it becomes more dif?cult to distinguish homozygotes from individuals with null al-leles.Failure to detect null alleles in highly heterozygous popula-tions could bias diversity statistics and phylogeographic inference.

Another potential drawback is that restriction-digest based methods are not suitable for deep-level phylogenetic studies,as mutations in the restriction sites quickly reduce the number of homologous fragments with increasing phylogenetic distance (McCormack et al.,2012;Althoff et al.,2007).Finally,most restric-tion-digest based methods are geared toward SNP generation (using short reads on the Illumina platform),which may not be ideal for researchers focusing on gene trees.However,most proto-cols could be adapted to platforms that produce longer reads(e.g., 454).We are also optimistic that this problem will likely be allevi-ated as all NGS platforms converge on longer reads.

2.2.1.RAD sequencing

Restriction-site Associated DNA(RAD)sequencing is the NGS method that has made the most impact on phylogeography and phylogenetics to date.As with other similar methods described be-low,DNA is digested with restriction enzymes and the resulting fragmented are size-selected from an agarose gel and sequenced via NGS.What sets RAD sequencing apart from other methods is that it combines tight control over the fragments resulting from the digest with ultra-deep sequencing across many individuals (Baird et al.,2008).For this reason,it is likely one of the most reproducible of the many restriction digest-based methods.Result-ing NGS reads are mined across individuals for SNPs that occur immediately adjacent to common digest sites.This method has proven effective at generating data for marker development(Miller et al.,2007),genome scans(Hohenlohe et al.,2010),and building distance-based phylogenies for recent demographic events(Emer-son et al.,2010).So far as we can tell,RAD sequencing has been carried out exclusively on the Illumina platform,which has limited resulting data to SNPs.However,paired-end Illumina sequencing should permit assembly of contigs up to500bp(Etter et al.,2011).

2.2.2.Other restriction digest-based methods

There are several alternatives to RAD sequencing that take advantage of restriction-digest based genome reduction(many are reviewed in Davey et al.,2011).The basic idea of these methods is akin to sequencing Ampli?ed Fragment Length Polymorphism (AFLP)fragments(Vos et al.,1995),except instead of variation in the restriction site,the variation of interest lies between common restriction sites.Variations on this method have been used to gen-erate thousands of SNPs for various organisms of agricultural importance(Van Orsouw et al.,2007;Van Tassell et al.,2008; Wiedmann et al.,2008;Amaral et al.,2009;Kerstens et al.,2009; Ramos et al.,2009;Sánchez et al.,2009;Hyten et al.,2010a,b), most involving many fewer individuals that would normally be as-sayed in a typical phylogenetic or phylogeographic study.Recent uses have involved wild populations(Bers et al.,2010)with appli-cation to phylogeography(Gompert et al.,2010;Williams et al., 2010).

In one example particularly relevant to phylogenetics,Decker et al.(2009)used a digest method to generate>40,000SNPs and resolved the recent phylogenetic history of extinct and extant ruminants.In this case,Van Tassell et al.(2008)discovered the SNPs using a restriction-digest based method and then Decker et al.(2009)later genotyped a subset of the SNPs en masse for61 species using a SNP chip(BeadChip,Illumina).Though producing a well-resolved tree based on a massive amount of data,this study also underscores the analytical limitations of SNP data,as the phy-logenetic analysis was restricted to a parsimony method that re-quired categorical coding of the data matrix(e.g.,homozygotes coded as‘‘0’’or‘‘1’’and heterozygotes coded as polymorphic).In fact,most restriction-digest studies have targeted SNPs with short reads from the Illumina platform.However,two recent studies have used this method in conjunction with454sequencing to gen-erate longer loci for use in coalescent-based phylogeographic anal-ysis,in addition to SNP-based assignment tests(McCormack et al., 2012;Zellmer et al.,2012).Paired-end sequencing should also al-low for the generation of whole loci for building gene trees with the Illumina platform.With large numbers of individuals,one eco-nomical approach is to add NGS adaptors and individual tags in a PCR step by incorporating them into the primer sequence(McCor-mack et al.,2012;Williams et al.,2010).Streamlined methods for adding adaptors and barcodes,while avoiding the use of proprie-tary kits,is a much needed area of methodological advancement.

2.3.Target enrichment

Target enrichment(also called‘‘sequence capture’’or‘‘targeted resequencing’’)involves the selective capture of genomic regions prior to NGS(Mamanova et al.,2009).In target enrichment (Fig.1c),fragmented gDNA is mixed with DNA or RNA probes, which hybridize to gDNA fragments either on an array(Albert et al.,2007;Hodges et al.,2007;Okou et al.,2007)or in solution (Gnirke et al.,2009;Maricic et al.,2010).Non-targeted DNA is then washed away,and targeted DNA is eluted and sequenced simulta-neously via NGS.

Compared to restriction-digest methods,in which random frag-ments are sequenced,target-enrichment is a non-random method of genome reduction.Consequently,it requires some prior genomic resources to design probes,although they can potentially be from distantly related species(see below).One bene?t to focusing on speci?c targets is that fragment size selection is easier and can make use of random genomic shearing(e.g.,mechanical,acoustic, or enzymatic)as opposed to the laborious manual size selection step of restriction-based methods.With target enrichment,indi-vidual samples can be tagged and multiplexed either post-enrich-ment or prior to enrichment(Kenny et al.,2011).The latter technique is especially promising from the perspective of cost per sample given that the probes themselves can be quite expensive.

J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538531

While the target enrichment technique is well described (Mamanova et al.,2009),most applications have been directed at human disease genes and exomes(Albert et al.,2007;Hodges et al.,2007;Ng et al.,2009;Tewhey et al.,2009a).Applications to phylogeography and phylogenetics will largely turn on the description of appropriate probe sets that are conserved enough that they bind genomic DNA across individuals(for phylogeogra-phy)and species(for phylogenetics)and yet show enough varia-tion to be informative at the time depth of interest.We discuss different options for probe design below.Another promising,re-lated technique is primer extension capture(PEC),which uses rel-atively short primer sequences as probes for sequence capture. Briggs et al.(2009)used PEC to capture and sequence the entire mtDNA genomes of?ve Neanderthals,allowing for their phyloge-netic analysis with modern humans.

A?nal bene?t of target enrichment is that probes can be tiled such that short reads from many tiled sections can later be assem-bled into larger contigs,obviating the problem of SNPs versus gene trees.This design maximizes the advantages of depth of coverage on a platform like Illumina,while minimizing the drawback of short read length.However,it should be noted that phasing loci generated from tiled probes can be problematic because the indi-vidual reads cannot be assigned to their respective chromosomal copies of origin within individuals.Thus,tiling might pose more of a problem for population-level studies,where coalescence among closely-related gene copies can bear strongly on the analy-sis,than it will for deep-level phylogenetics,where the coalescence times among species dwarf those between individual allele copies.

Once contigs are assembled from tiled reads,the thousands of independent loci that can be draw from target enrichment are ideal for use in species-tree analysis,an emerging systematic paradigm (Edwards,2009)that has been little applied to phylogenomics. Here,the major problem is likely computational,as many spe-cies-tree programs rely on simultaneous optimization of individual gene trees and the species tree(e.g.,BEST,Edwards et al.,2007;?BEAST,Heled and Drummond,2010).Methods that use summary statistics to arrive at a fast,analytical solutions to the species tree (Liu et al.,2009b)offer one potential workaround to this problem, and are currently the only solution for phylogenomic data sets fea-turing hundreds to thousands of loci.Alternatively,algorithmic methods such as STEM(Kubatko et al.,2009)take gene trees as in-put,thus allowing gene trees from individual loci to be estimated in parallel prior to species tree estimation.However,there is a pressing need for further methodological development so that ana-lytical robustness need not be sacri?ced for time ef?ciency.

2.3.1.Probe sets designed from ultraconserved elements for phylogenomics

Much like universal priming sites for mitochondrial DNA(Ko-cher et al.,1989),conserved probes allow for maximal applicability across taxonomically diverse organisms.A conserved probe works much like a pair of primers located in conserved coding regions, except that variability would be sought in the regions?anking the probe instead of in between primers.What are the options for probe sets that are conserved enough to work across species and higher taxonomic groups for phylogenomics?

One promising development toward universal target enrich-ment for deep-level phylogenomics is the discovery of ultracon-served elements(UCEs)in mammals(Bejerano et al.,2004).The exact de?nition of a UCE differs among studies.De?ned loosely, UCEs are genomic regions that show remarkable(in some cases 100%)conservation over a‘‘long’’stretch of DNA(generally50–200bps)among widely divergent organisms.The structure and function of UCEs is an active area of research.UCEs also possess properties that make them highly desirable as anchors for genetic markers.First,they are found in high numbers throughout the gen-ome.Second,they appear to have little overlap with known para-logous genes(Derti et al.,2006),whose occurrence is dif?cult to detect and can confound phylogenetic inference(Philippe et al., 2011).Third,because variability increases moving toward the ?anks,UCEs and?anking DNA might harbor phylogenetic signal useful for phylogenetic reconstruction at multiple evolutionary timescales(Faircloth et al.,2012).

A recent study showed that probes designed from UCEs con-served across amniotes(e.g.,mammals,reptiles,and birds)have suf?cient information content to resolve the primate tree of life and enrich over800loci in9bird species to resolve the phylogeny of three basal bird lineages(Faircloth et al.,2012).Another study of >2000UCEs shared among the chicken,zebra?nch,and Anolis liz-ard genomes found that nearly1000UCEs could also be located in the27mammals with sequenced genomes,elucidating their evolu-tionary history with as many as917loci(McCormack et al.,2012). Another study used UCE probes to capture a complete data matrix of over1000loci for six reptiles(Crawford et al.,2012).Combined with data obtained from existing genomes of birds and mammals, the resulting phylogeny revealed the evolutionary af?nities of tur-tles with archosaurs(bird and crocodilians)with perfect support. The description of UCEs in diverse animal groups from tetrapods (Stephen et al.,2008)and reptiles(Janes et al.,2011)to inverte-brates and yeast(Siepel et al.,2005)suggest that conserved probe sets for target enrichment may be applicable across a broad swath of the tree of life.

2.3.2.Probe sets designed from closely related genomes and transcriptome libraries

Given the amount of genomic resources now available,and the number of genome sequencing efforts currently being undertaken, it is also becoming feasible to design probe sets more targeted to individual species or groups using the genome of a closely related species.Due to their general conservation,yet high information content found at degenerate third codon positions,exons are a par-ticularly appealing target for sequence capture.As there is cur-rently more transcriptome data available than whole-genome data,probe sets could also be designed from transcriptome li-braries,either undertaken especially from the species of interest or mined from existing transcriptomes of closely-related species. The potential drawback is that transcriptome libraries are highly variable depending on tissue type and many other variables.One attempt to align transcriptomes of10taxonomically divergent bird species did not?nd a large number of overlapping loci(Kuenster et al.,2010).

2.4.Transcriptome sequencing

Sequencing the transcriptome itself(RNA-seq)can be viewed as another method of genomic reduction where the remaining subset of gDNA is not random,as with restriction-digest,but contains the set of expressed genes(Marioni et al.,2008;Morin et al.,2008; Wang et al.,2009).While transcriptome sequencing is more preva-lent in studies of adaptation and ecological genomics,a recent study by Hittinger et al.(2010)showed that transcriptome se-quences could be used to recover the known phylogeny of a group of mosquitoes,demonstrating their potential utility to phylogeog-raphy and phylogenetics.Nabholz et al.(2011)sequenced the brain transcriptomes of nine bird species and used the resulting data to build a phylogeny.Transcriptomes have also been mined for SNPs in a variety of species(Chepelev et al.,2009;Cánovas et al.,2010; Barbazuk and Schnable,2011;Geraldes et al.,2011).

532J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538

3.Data analysis and bioinformatics

3.1.The difference between Sanger and NGS data and the importance of coverage

The most obvious difference between Sanger sequence data and NGS data is what is initially most daunting about NGS data:quan-tity.Simply storing unprocessed NGS data requires signi?cant computer memory and often hardware upgrades or remote(i.e., online or‘‘cloud’’)storage.Whereas a reasonably large Sanger dataset may contain500sequences,typical454and Illumina runs generate1,000,000–2,000,000,000sequences,and these numbers are increasing rapidly as the sequencing platforms are re?ned.Data set sizes now are measured in terabytes and?le transfer is often conducted through normal postal mail due to the uploading/down-loading time and cost of sending?les through the internet or cloud. However,logistical dif?culties aside,these numbers are in some ways deceptively large,because another major difference between NGS data and Sanger sequence data is the quality of the reads.

Chromatograms are an intuitive way to assess the quality of a given Sanger sequence because the colored peaks are a re?ection of the strength of each nucleotide’s signal.Frequently,with Sanger data,a human being has evaluated all or most of the bases called by the sequencer.This is not possible with even a small NGS data-set.NGS quality scores are an integral part of the sequence data it-self,and come either as a series of integers or letters corresponding to every base called by the NGS platform.These are very different paradigms:with Sanger data you get,in essence,a more or less true snapshot of a pool of amplicons at every position.With NGS data,you get a slightly imperfect representation of some of the in Sanger data is relatively straightforward when using targeted primer pairs.A signal of more than two alleles(or more than one allele when mtDNA is studied)in the chromatogram is a clear sign that more than one gene copy has been sequenced.In addition, methods for detecting paralogs,such as SSCP(Sunnucks et al., 2000),are available for con?rmation.

In contrast,NGS occurs on a single strand,so paralogy cannot be detected until after data are aligned.Evidence for paralogy from NGS alignments includes any biological signal that too many al-leles have been grouped into a single,putatively homologous locus (e.g.,three alleles for a diploid).These signals include(but are not limited to)more than two bases at a particular position(especially with greater than1X coverage)or elevated values for observed het-erozygosity at a given position in an alignment.Additionally,when viewing alignments of one’s data,it is often obvious that too much variation exists for a single locus(Fig.2);for this reason,it is a good idea to become familiar with some raw data,even if there is too much to check them all manually.One complication is that low coverage and uneven read distribution generated by stochasticity in PCR and sequencing can mask this evidence.To avoid paralogous loci,some researchers have eliminated the alignments(or contigs) with the highest coverage(Emerson et al.,2010)because copy number variants are often correlated with high coverage loci in NGS data(Alkan et al.,2009).Designating a maximum coverage cutoff is less than ideal,however,because one may end up throw-ing out good data.A method that incorporates error rate and eval-uates all reads in an alignment to assign a measure of con?dence in the number of supported alleles would be highly desirable and awaits development.

combined effect of PCR bias,PCR error,and sequencing error on calling alleles for paralogous loci.(a)Duplications(dotted lines)produce paralogs.

different mutations to result in an underlying genetic signal for the eight paralogous loci.(c)Stochastic PCR bias during the ampli?cation in some of the loci being ampli?ed more than others,and one locus not being ampli?ed at all.(d)PCR error during the ampli?cation step

noise.(e)Error during NGS adds an additional layer of noise.(f)The two alleles that would be called from the reads in(e)if they were processed PRGmatic(Hird et al.,2011).Scanning the alignment by eye or by applying heterozygosity tests indicates that the alignment actually

J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538533

include discarding sequences shorter than some value,which is appropriate when there is a good idea for the target size,as with the restriction digest-based methods described above.One may also choose to discard sequence length outliers,as these are often associated with sequencing error(Oliver et al.,2010)or any reads containing an unidenti?ed base(‘‘N’’).

The second step of?ltering is to sort the data by tag and remove primer and barcode sequence(if necessary)from the?anks of the reads.This is often performed by custom Perl or Python scripts,but there are some free online services such as the Ribosomal Database Project’s Pyrosequencing Pipeline(Cole et al.,2009),the Galaxy website,or Mothur(Table2).If one has selected a set of error-cor-recting tags(see above),one must make the choice whether to re-tain primer sequence and/or tags that contain errors(and how many errors are permitted).Although it may be tempting to relax quality standards to increase the number of reads retained,low-quality data will usually require more coverage,for instance for high-con?dence heterozygote calls.

3.3.2.Alignments

Calling genotypes(SNPs or haplotypes)requires alignments,i.e. sets of homologous reads.Whereas processing and?ltering NGS data is,at its most basic,a matter of text manipulation and editing, alignments require computational resources and ef?cient algo-rithms.There are two types of alignment methods,those that use a reference and those that do not(de novo).A reference does not necessarily imply a reference genome,such as that from a model organism,but rather some information on the output reads,includ-ing,for example,the probe sequences used for target enrichment.

A good,de novo assembler(e.g.,Velvet,see Table2)is the gold standard for analysis,but requires considerable computation time and resources.On the other hand,using a reference allows very quick alignment since alignment of high-quality reads can be re-stricted to the reference(instead of requiring pairwise comparison to each other).There are many NGS analytical tools that align a set of reads to a reference(see Table2).If no reference is available,for example in most of the restriction-digest based methods,there are clustering programs(like CAP3),which,like de novo assemblers, collect reads into groups within a given percent similarity(in addi-tion to other parameters)and generate alignments from these clus-ters.Several pipelines offer streamlined solutions for taking raw data(without a reference genome)to called genotypes(Table2). Two examples of open-source software are PRGmatic(Hird et al., 2011),which builds high-con?dence clusters into a provisional-reference genome,to which all reads are then aligned;and Stacks (Catchen et al.,2011),in which‘‘stacks’’of reads corresponding to loci are created,permitting genotype calling.Many commercially available multi-functional programs will do pre-processing and alignments(Table2).

3.3.3.Genotype calling

The?nal step in taking raw NGS data to a format that is useable for phylogeography and phylogenetics is genotype calling,or call-ing two(diploid)alleles from all the reads for a given SNP or locus

Table2

Programs for quality control,assembling,and analyzing NGS data for phylogeography and phylogenetics.

Program QC Alignment Allele

Calling SNP

Calling

Visualization Open

Source?

Computer Platforms Other Functions References

CLOTU X C Y Internet Automated

BLAST

Kumar et al.(2011)

Galaxy X R X Y Internet Goecks et al.(2010) DNAstar SeqMan

Ngen

X R,D X X X N Windows,MacOSX,Linux https://www.doczj.com/doc/f43653401.html, CLC Genomics

Workbench

X R,D X X N MacOSX,Linux,Windows https://www.doczj.com/doc/f43653401.html,/

Geneious X R,D X X N WindowsVista,MacOSX https://www.doczj.com/doc/f43653401.html,/ GATK X X X X Y MacOSX,Linux DePristo et al.(2011)

RDP

Pyrosequencing

Pipeline X Y Internet Microbial DNA

Analyses

Cole et al.(2009)

Mothur X Y Windows,MacOSX,Linux https://www.doczj.com/doc/f43653401.html,/ STACKS C X X Y Unix Genetic

mapping

Catchen et al.(2011)

CAP3C Y Windows,MacOSX,Linux,

Solaris,Internet

Huang and Madan(1999)

PRGmatic C,D X X Y MacOSX Hird et al.(2011) ABySS D Y Any Simpson et al.(2009) SAMtools R X X Y Unix Li et al.(2009) BWA R Y Any(C++source)Li and Durbin(2009) Bowtie R Y Windows,MacOSX,Linux Langmead et al.(2010) Exonerate R Y Unix Slater and Birney(2005) Novocraft R Y MacOSX,Linux Hercus,C.2009.http://

https://www.doczj.com/doc/f43653401.html, Stampy R Y MacOSX,Linux Lunter and Goodson,2011 SOAP R,D X Y Any(C++source)Li et al.(2008) MIRA R,D X Y MacOSX,Linux Automatic error

removal

Chevreux et al.(1999) Velvet R*,D Y MacOSX,Linux,cygwin Zerbino and Birney(2008) Bambino X X Y Windows,MacOSX,Linux Edmonson et al.(2011) VarScan X Y Any(JAVA source)Koboldt et al.(2009) Casava X N Linux Illumina propietary Tablet X Y Windows,MacOSX,Linux,

Solaris

Milne et al.(2010)

QC=quality control.

R=reference.

D=de novo.

C=cluster generation.

*Velvet can use reference reads but it treats them as‘‘just another’’read,not a reference.

534J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538

(for a thorough review,see Nielsen et al.,2011).In its most straightforward application,genotype calling can be conducted based on threshold values.For instance,a base position would be declared a valid SNP if polymorphism were detected in a certain threshold percentage of total reads for an individual(say20%).In some cases,thresholds can lead to bias toward rare alleles(Johnson and Slatkin,2008).Thus,more statistically savvy genotype calling algorithms are based on probability theory,which permit incorpo-ration of sequencing error(Johnson and Slatkin,2006;Hellmann et al.,2008;Lynch,2009;Hohenlohe et al.,2010;Andolfatto et al.,2011;Gompert and Buerkle,2011).

Pooling individuals is another way of generating population-le-vel allele frequency data(Cutler and Jensen,2010).It saves money by limiting the number of barcode adaptors or library preparations needed and can be useful for marker discovery and describing some divergence statistics such as F ST(Gompert et al.,2010;Ko?er et al.,2011).However,it prevents simultaneous genotype calling at the level of the individual as well as paralog detection on the basis of observed heterozygosity.For the purposes of using NGS for phy-logeography and phylogenetics,there seems to be much to gain by adding tags to individuals instead of population pools.

3.4.Data analysis

Because most existing NGS studies in phylogeography and phy-logenetics are based on SNPs(Emerson et al.,2010;Gompert et al., 2010;Williams et al.,2010),most analytical approaches used to date are those amenable to SNP data,such as PCA and Structure (for phylogeography)and distance-based methods for inferring phylogenies.PCA has the limitation that it requires complete data matrices(or statistical imputation of missing data),and some methods(especially the restriction-digest methods)are prone to missing data.While a full treatment of NGS data analysis demands its own review,our observation is that NGS data as it is currently being produced is too computationally demanding for the popular suite of probabilistic coalescence-based methods that form the core of phylogenetic and phylogeographic analysis(e.g.,BEAST, IMa,species-tree analysis),although using subsets of the data is al-ways an option.We note two trends:(1)longer NGS reads are making analysis with gene trees more feasible and(2)SNP data are increasingly useful for testing demographic hypotheses,for example those involving gene?ow(Durand et al.,2011).We ad-dress future prospects for data analysis in the next section.

4.Future directions

This is a time of incredible transition in sequencing technology. It is dif?cult to predict what the future holds,but it seems clear that at some point the important technological advances for phy-logeographers and phylogeneticists will plateau with the emer-gence of affordable whole-genome sequencing.Although the technology currently exists for reasonably inexpensive genome sequencing on the Illumina Hi-Seq platform,the cost for the num-ber of individuals typically employed in a phylogeographic or phy-logenetic project is still beyond the reach of most labs.Perhaps more important than the technology is the pace of change to ana-lytical resources.Phylogeography and phylogenetics is built on a ?rm foundation of resources for analyzing discrete loci.Mean-while,whole genome analysis is still in its infancy(e.g.,Sims et al.,2009;Vishnoi et al.,2010).We argue that,practically speak-ing,we are less limited by technology than we are by the ability of research labs–i.e.,humans–to adapt to new technologies and effectively harness their information content.Thus we echo the sentiments of Davey et al.(2011)that sequencing methods based on reduced representations of the genome(however they are tar-geted)will remain useful for many years to come,until a strong foundation for analyzing whole genomes emerges.One bene?cial contribution of whole-genome analysis will be to incorporate recombination into phylogenetic inference instead of ignoring it or mitigating its effects,as is the current mode when analyzing dis-crete loci.

An advance needed immediately is that current software for analyzing phylogenetic and population genetic data needs to be scaled-up to handle hundreds of loci in a reasonable timeframe. For example,although some analytical solutions to de?ning a spe-cies tree from many gene trees are nearly as accurate as probabilis-tic methods and return an answer almost instantaneously(Liu et al.,2009b),probabilistic methods are still more robust in most cases and preferred for dif?cult questions.The trade-off is whether we are willing to sacri?ce some degree of accuracy or certitude in order to have an answer at all,or at least on a timeline suited to today’s fast-paced speed of publishing.Alternatively,more thor-ough probabilistic methods will need to be streamlined and made more ef?cient.

Finally,as phylogeography becomes more genomic,it is only natural that it will increasingly merge with the now largely sepa-rate?eld of ecological genomics(or adaptation genomics).After all,both investigate the speciation process,but differ principally in their focus on adaptive versus neutral processes and their con-comitant use of different subsets of markers.Ecological genomics has utilized candidate genes and,increasingly,the full subset of ex-pressed genes interrogated with transcriptome sequencing.Phylo-geography has traditionally used‘‘neutrally evolving’’markers; however,there are few phylogeographers that would not also be interested in the actual genes underlying speciation in their sys-tem.This joint interest is already?ourishing in research utilizing genome scans to detect outlier loci potentially under selection from a large pool of other loci experiencing background(i.e.,neu-tral)divergence.These studies have mostly employed AFLPs,thus it is no surprise that similar analytical techniques geared toward restriction-digest based NGS data(Section 2.2)are also now appearing(Hohenlohe et al.,2010;Gompert and Buerkle,2011). We assume the integration of neutral and adaptive speciation pro-cesses will only accelerate with continuing analytical and techno-logical advances at the level of the genome.

Acknowledgments

Funding was provided by the National Science Foundation (DEB-0956069and DEB-0841729).We thank members of the Brum?eld and Carstens labs,J.Good,T.Glenn,B.Faircloth,J.-M. Roulliard,and S.Herke for conversations regarding next-genera-tion sequencing and comments on the manuscript.

References

Albert,T.J.,Molla,M.N.,Muzny,D.M.,Nazareth,L.,Wheeler,D.,Song,X.,Richmond, T.A.,Middle,C.M.,Rodesch,M.J.,Packard,C.J.,2007.Direct selection of human genomic loci by microarray hybridization.Nat.Methods4,903–905.

Alkan,C.,Kidd,J.M.,Marques-Bonet,T.,Aksay,G.,Antonacci,F.,Hormozdiari,F., Kitzman,J.O.,Baker,C.,Malig,M.,Mutlu,O.,2009.Personalized copy number and segmental duplication maps using next-generation sequencing.Nat.Genet.

41,1061–1067.

Althoff,D.M.,Gitzendanner,M.A.,Segraves,K.A.,2007.The utility of ampli?ed fragment length polymorphisms in phylogenetics:a comparison of homology within and between genomes.Syst.Biol.56,477–484.

Altshuler,D.,Pollara,V.J.,Cowles,C.R.,Van Etten,W.J.,Baldwin,J.,Linton,L.,Lander,

E.S.,2000.A SNP map of the human genome generated by reduced

representation shotgun sequencing.Nature407,513–516.

Amaral,A.J.,Megens,H.J.,Kerstens,H.H.D.,Heuven,H.,Dibbits,B.,Crooijmans, R.P.M.A.,Den Dunnen,J.T.,Groenen,M.A.M.,2009.Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome.

BMC Genomics10,374.

J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538535

Andolfatto,P.,Davison,D.,Erezyilmaz,D.,Hu,T.T.,Mast,J.,Sunayama-Morita,T., Stern,D.L.,2011.Multiplexed shotgun genotyping for rapid and ef?cient genetic mapping.Genome Res.21,610–617.

Babik,W.,Taberlet,P.,Ejsmond,M.J.A.N.,Radwan,J.,2009.New generation sequencers as a tool for genotyping of highly polymorphic multilocus MHC system.Mol.Ecol.Resour.9,713–719.

Baird,N.A.,Etter,P.D.,Atwood,T.S.,Currey,M.C.,Shiver,A.L.,Lewis,Z.A.,Selker,E.U., Cresko,W.A.,Johnson,E.A.,2008.Rapid SNP discovery and genetic mapping using sequenced RAD markers.PLoS One3,e3376.

Barbazuk,W.B.,Schnable,P.S.,2011.SNP discovery by transcriptome pyrosequencing.Methods Mol.Biol.729,225–246.

Bejerano,G.,Pheasant,M.,Makunin,I.,Stephen,S.,Kent,W.J.,Mattick,J.S.,Haussler,

D.,2004.Ultraconserved elements in the human genome.Science304,1321. Bers,N.

E.M.V.,Oers,K.V.,Kerstens,H.H.D.,Dibbits, B.W.,Crooijmans,R.P.M.A., Visser,M.E.,Groenen,M.A.M.,2010.Genome wide SNP detection in the great tit Parus major using high throughput sequencing.Mol.Ecol.19,89–99. Binladen,J.,Gilbert,M.T.,Bollback,J.P.,Panitz,

F.,Bendixen, C.,Nielsen,R., Willerslev,E.,2007.The use of coded PCR primers enables high-throughput sequencing of multiple homolog ampli?cation products by454parallel sequencing.PLoS One2,e197.

Briggs,A.W.,Good,J.M.,Green,R.E.,Krause,J.,Maricic,T.,Stenzel,U.,Lalueza-Fox,C., Rudan,P.,Brajkovi,D.,Ku an,Zˇ.,2009.Targeted retrieval and analysis of?ve Neandertal mtDNA genomes.Science325,318–321.

Brito,P.H.,Edwards,S.V.,2008.Multilocus phylogeography and phylogenetics using sequence-based markers.Genetica135,439–455.

Bryant,D.,Bouckaert,R.,Felsenstein,J.,Rosenberg,N.,RoyChoudhury,A.,2012.

Inferring species trees directly from biallelic genetic markers:bypassing gene trees in a full coalescent analysis.Mol.Biol.Evol.29,1917–1932.

Buetow,K.H.,Edmonson,M.,MacDonald,R.,Clifford,R.,Yip,P.,Kelley,J.,Little,D.P., Strausberg,R.,Koester,H.,Cantor,C.R.,2001.High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-?ight mass spectrometry.Proc.Natl Acad.Sci.

USA98,581–584.

Cánovas,A.,Rincon,G.,Islas-Trejo,A.,Wickramasinghe,S.,Medrano,J.F.,2010.SNP discovery in the bovine milk transcriptome using RNA-Seq technology.Mamm.

Genome21,592–598.

Catchen,J.,Amores,A.,Hohenlohe,P.,Cresko,W.,Postlethwait,J.,2011.Stacks: building and genotyping loci de novo from short-read sequences.G3:Genes, Genomes,Genetics1,171–182.

Chan,Y.C.,Roos,C.,Inoue-Murayama,M.,Inoue,E.,Shih,C.C.,Pei,K.J.C.,Vigilant,L., 2010.Mitochondrial genome sequences effectively reveal the phylogeny of Hylobates gibbons.PLoS One5,e14419.

Chepelev,I.,Wei,G.,Tang,Q.,Zhao,K.,2009.Detection of single nucleotide variations in expressed exons of the human genome using RNA-Seq.Nucleic Acids Res.37,e106.

Chevreux,B.,Wetter,T.,Suhai,S.,1999.Genome sequence assembly using trace signals and additional sequence information.In:Computer Science and Biology: Proceedings of the German Conference on Bioinformatics(GCB),vol.99,pp.45–

56.

Cole,J.R.,Wang,Q.,Cardenas,E.,Fish,J.,Chai,B.,Farris,R.J.,Kulam-Syed-Mohideen,

A.S.,McGarrell,D.M.,Marsh,T.,Garrity,G.M.,Tiedje,J.M.,2009.The ribosomal

database project:improved alignments and new tools for rRNA analysis.

Nucleic Acids Res.37,D141–D145.

Craig, D.W.,Pearson,J.V.,Szelinger,S.,Sekar, A.,Redman,M.,Corneveaux,J.J., Pawlowski,T.L.,Laub,T.,Nunn,G.,Stephan, D.A.,2008.Identi?cation of genetic variants using bar-coded multiplexed sequencing.Nat.Methods5, 887–893.

Crawford,N.G.,Faircloth,B.C.,McCormack,J.E.,Brum?eld,R.T.,Winker,K.,Glenn, T.C.,2012.More than1000ultraconserved elements provide evidence that turtles are the sister group to archosaurs.Biol.Lett.8,783–786.

Cutler,D.J.,Jensen,J.D.,2010.To pool,or not to pool?Genetics186,41–43. Davey,J.W.,Hohenlohe,P.A.,Etter,P.D.,Boone,J.Q.,Catchen,J.M.,Blaxter,M.L., 2011.Genome-wide genetic marker discovery and genotyping using next-generation sequencing.Nat.Rev.Genet.12,499–510.

Decker,J.E.,Pires,J.C.,Conant,G.C.,McKay,S.D.,Heaton,M.P.,Chen,K.,Cooper,A., Vilkki,J.,Seabury,C.M.,Caetano,A.R.,2009.Resolving the evolution of extant and extinct ruminants with high-throughput phylogenomics.Proc.Natl Acad.

https://www.doczj.com/doc/f43653401.html,A106,18644–18649.

DePristo,M.A.,Banks,E.,Poplin,R.,et al.,2011.A framework for variation discovery and genotyping using next-generation DNA sequencing data.Nat.Genet.43, 491–498.

Derti, A.,Roth, F.P.,Church,G.M.,Wu, C.-t.,2006.Mammalian ultraconserved elements are strongly depleted among segmental duplications and copy number variants.Nat.Genet.38,1216–1220.

Durand,E.Y.,Patterson,N.,Reich,D.,Slatkin,M.,2011.Testing for ancient admixture between closely related populations.Mol.Biol.Evol.28,2239–2252. Edmonson,M.N.,Zhang,J.,Yan,C.,et al.,2011.Bambino:a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format.

Bioinformatics27,865–866.

Edwards,S.V.,2008.PERSPECTIVE:a sm?rg?sbord of markers for avian ecology and evolution.Mol.Ecol.17,945–946.

Edwards,S.V.,2009.Is a new and general theory of molecular systematics emerging?Evolution63,1–19.

Edwards,S.V.,Liu,L.,Pearl, D.K.,2007.High-resolution species trees without concatenation.Proc.Natl https://www.doczj.com/doc/f43653401.html,A104,5841–5936.Ekblom,R.,Galindo,J.,2010.Applications of next generation sequencing in molecular ecology of non-model organisms.Heredity107,1–15.

Emerson,K.J.,Merz,C.R.,Catchen,J.M.,Hohenlohe,P.A.,Cresko,W.A.,Bradshaw, W.E.,Holzapfel,C.M.,2010.Resolving postglacial phylogeography using high-throughput sequencing.Proc.Natl https://www.doczj.com/doc/f43653401.html,A107,16196–16200.

Etter,P.D.,Preston,J.L.,Bassham,S.,Cresko,W.A.,Johnson,E.A.,2011.Local de novo assembly of RAD paired-end contigs using short sequencing reads.PLoS One6, e18561.

Faircloth,B.C.,Glenn,T.C.,2012.Not all sequence tags are created equal:designing and validating sequence identi?cation tags robust to indels.PLoS One7, e42543.

Faircloth,B.C.,McCormack,J.E.,Crawford,N.G.,Harvey,M.G.,Brum?eld,R.T.,Glenn, T.C.,2012.Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales.Syst.Biol.61,717–726.

Fierer,N.,Hamady,M.,Lauber, C.L.,Knight,R.,2008.The in?uence of sex, handedness,and washing on the diversity of hand surface bacteria.Proc.Natl https://www.doczj.com/doc/f43653401.html,A105,17994–17999.

Geraldes,A.,Pang,J.,Thiessen,N.,Cezard,T.,Moore,R.,Zhao,Y.,Tam,A.,Wang,S., Friedmann,M.,Birol,I.,2011.SNP discovery in black cottonwood(Populus trichocarpa)by population transcriptome resequencing.Mol.Ecol.Resour.11 (Suppl.1),81–92.

Glenn,T.C.,2011.Field guide to next-generation DNA sequencers.Mol.Ecol.Res.11, 759–769.

Gnirke,A.,Melnikov,A.,Maguire,J.,Rogov,P.,LeProust,E.M.,Brockman,W.,Fennell, T.,Giannoukos,G.,Fisher,S.,Russ,C.,2009.Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing.Nat.

Biotechnol.27,182–189.

Goecks,J.,Nekrutenko,A.,Taylor,J.,Team,T.G.,2010.Galaxy:a comprehensive approach for supporting accessible,reproducible,and transparent computational research in the life sciences.Genome Biol.11,R86.

Gompert,Z.,Buerkle,C.A.,2011.A hierarchical Bayesian model for next-generation population genomics.Genetics187,903–917.

Gompert,Z.,Forister,M.L.,Fordyce,J.A.,Nice,C.C.,Williamson,R.J.,Buerkle,C.A., 2010.Bayesian analysis of molecular variance in pyrosequences quanti?es population genetic structure across the genome of Lycaeides butter?ies.Mol.

Ecol.19,1473–2455.

Grif?n,P.C.,Robin,C.,Hoffmann,A.A.,2011.A next-generation sequencing method for overcoming the multiple gene copy problem in polyploid phylogenetics, applied to Poa grasses.BMC Biol.9,19.

Gunnarsdóttir,E.D.,Li,M.,Bauchet,M.,Finstermeier,K.,Stoneking,M.,2011.High-throughput sequencing of complete human mtDNA genomes from the Philippines.Genome Res.21,1–11.

Hamady,M.,Walker,J.J.,Harris,J.K.,Gold,N.J.,Knight,R.,2008.Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex.Nat.

Methods5,235–237.

Heled,J.,Drummond,A.J.,2010.Bayesian inference of species trees from multilocus data.Mol.Biol.Evol.27,570–580.

Hellmann,I.,Mang,Y.,Gu,Z.,Li,P.,De La Vega,F.M.,Clark,A.G.,Nielsen,R.,2008.

Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals.Genome Res.18,1020–1029.

Hird,S.M.,Brum?eld,R.T.,Carstens,B.C.,2011.PRGmatic:an ef?cient pipeline for collating genome enriched second generation sequencing data using a ‘provisional reference genome’.Mol.Ecol.Res.11,743–748.

Hittinger, C.T.,Johnston,M.,Tossberg,J.T.,Rokas, A.,2010.Leveraging skewed transcript abundance by RNA-Seq to increase the genomic depth of the tree of life.Proc.Natl https://www.doczj.com/doc/f43653401.html,A107,1476–1481.

Hodges,E.,Xuan,Z.,Balija,V.,Kramer,M.,Molla,M.N.,Smith,S.W.,Middle,C.M., Rodesch,M.J.,Albert,T.J.,Hannon,G.J.,2007.Genome-wide in situ exon capture for selective resequencing.Nat.Genet.39,1522–1527.

Hohenlohe,P.A.,Amish,S.,Catchen,J.M.,Allendorf,F.W.,Luikart,G.,2011.RAD sequencing identi?es thousands of SNPs for assessing hybridization between rainbow trout and westslope cutthroat trout.Mol.Ecol.Res.11(Supp.1),117–122.

Hohenlohe,P.A.,Bassham,S.,Etter,P.D.,Stif?er,N.,Johnson,E.A.,Cresko,W.A.,2010.

Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags.PLoS Genet.6,e1000862.

Huang,X.,Madan,A.,1999.CAP3:A DNA sequence assembly program.Genome Res.

9,868–877.

Hyten,D.L.,Cannon,S.B.,Song,Q.,Weeks,N.,Fickus,E.W.,Shoemaker,R.C.,Specht, J.E.,Farmer,A.D.,May,G.D.,Cregan,P.B.,2010a.High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence.BMC Genomics11,38. Hyten,D.L.,Song,Q.,Fickus,E.W.,Quigley,C.V.,Lim,J.S.,Choi,I.Y.,Hwang,E.Y., Pastor-Corrales,M.,Cregan,P.B.,2010b.High-throughput SNP discovery and assay development in common bean.BMC Genomics11,475.

Janes,D.E.,Chapus,C.,Gondo,Y.,Clayton,D.F.,Sinha,S.,Blatti,C.A.,Organ,C.L., Fujita,M.K.,Balakrishnan,C.N.,Edwards,S.V.,2011.Reptiles and mammals have differentially retained long conserved noncoding sequences from the Amniote ancestor.Genome Biol.Evol.3,102–113.

Johnson,P.L.F.,Slatkin,M.,2006.Inference of population genetic parameters in metagenomics:a clean look at messy data.Genome Res.16,1320–1327. Johnson,P.L.F.,Slatkin,M.,2008.Accounting for bias from sequencing error in population genetic estimates.Mol.Biol.Evol.25,199–206.

Kenny,E.M.,Cormican,P.,Gilks,W.P.,Gates,A.S.,O’Dushlaine,C.T.,Pinto,C.,Corvin,

A.P.,Gill,M.,Morris, D.W.,2011.Multiplex target enrichment using DNA

indexing for ultra-high throughput SNP detection.DNA Res.18,31–38.

536J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538

Kerstens,H.H.D.,Crooijmans,R.P.M.A.,Veenendaal,A.,Dibbits,B.W.,Chin-A-Woeng, T.F.C.,den Dunnen,J.T.,Groenen,M.A.M.,https://www.doczj.com/doc/f43653401.html,rge scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology:applied to turkey.BMC Genomics10, 479.

Kloch,A.,Babik,W.,Bajer,A.,Si Ski,E.,Radwan,J.,2010.Effects of an MHC DRB genotype and allele number on the load of gut parasites in the bank vole Myodes glareolus.Mol.Ecol.19,255–265.

Knowles,L.L.,2009.Statistical phylogeography.Annu.Rev.Ecol.Evol.Syst.40,593–612.

Koboldt, D.C.,Chen,K.,Wylie,T.,et al.,2009.VarScan:variant detection in massively parallel sequencing of individual and pooled samples.Bioinformatics 25,3.

Kocher,T.D.,Thomas,W.K.,Meyer,A.,Edwards,S.V.,P??bo,S.,Villablanca,F.X., Wilson, A.C.,1989.Dynamics of mitochondrial DNA evolution in animals: ampli?cation and sequencing with conserved primers.Proc.Natl https://www.doczj.com/doc/f43653401.html,A 86,6196–6200.

Ko?er,R.,Orozco-terWengel,P.,De Maio,N.,Pandey,R.V.,Nolte,V.,Futschik,A., Kosiol,C.,Schl?tterer,C.,2011.PoPoolation:a toolbox for population genetic analysis of next generation sequencing data from pooled individuals.PLoS One 6,e15925.

Kozarewa,I.,Ning,Z.,Quail,M.A.,Sanders,M.J.,Berriman,M.,Turner,D.J.,2009.

Ampli?cation-free Illumina sequencing-library preparation facilitates improved mapping and assembly of(G+C)-biased genomes.Nat.Methods6, 291–295.

Kubatko,L.S.,Carstens,B.C.,Knowles,L.L.,2009.STEM:species tree estimation using maximum likelihood for gene trees under coalescence.Bioinformatics25,971–973.

Kuenster,A.,Wolf,J.B.W.,Backstrom,N.,Whitney,O.,Balakrishnan,C.N.,Day,L., Edwards,S.V.,Janes, D.E.,Schlinger, B.A.,Wilson,R.K.,https://www.doczj.com/doc/f43653401.html,parative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across10bird species.Mol.Ecol.19,266–276. Kuhner,M.K.,2009.Coalescent genealogy samplers:windows into population history.Trends Ecol.Evol.24,86–93.

Kumar,S.,Carlsen,T.,Mevik, B.H.,et al.,2011.CLOTU:an online pipeline for processing and clustering of454amplicon reads into OTUs followed by taxonomic annotation.BMC Bioinformatics12,182.

Langmead,B.,Trapnell,C.,Pop,M.,Salzburg,S.L.,2010.Ultrafast and memory-ef?cient alignment of short DNA sequences to the human genome.Genome Biol.10,R25.

Lerner,H.R.L.,Fleischer,R.C.,2010.Prospects for the use of next-generation sequencing methods in ornithology.Auk127,4–15.

Li,H.,Durbin,R.,2009.Fast and accurate short read alignment with Burrows–Wheeler transform.Bioinformatics25,7.

Li,R.,Li,Y.,Kristiansen,K.,Wang,J.,2008.SOAP:short oligonucleotide alignment program.Bioinformatics24,713–714.

Li,H.,Handsaker,B.,Wysoker,A.,et al.,2009.The sequence alignment/map format and SAMtools.Bioinformatics25,2078–2079.

Lipshutz,R.J.,Fodor,S.P.A.,Gingeras,T.R.,Lockhart, D.J.,1999.High density synthetic oligonucleotide arrays.Nat.Genet.21(Suppl.1),20–24.

Liu,L.,Yu,L.,Kubatko,L.,Pearl,D.K.,Edwards,S.V.,2009a.Coalescent methods for estimating phylogenetic trees.Mol.Phyl.Evol.53,320–328.

Liu,L.,Yu,L.,Pearl,D.K.,Edwards,S.V.,2009b.Estimating species phylogenies using coalescence times among sequences.Syst.Biol.58,468–477.

Lunter,G.,Goodson,M.,2011.Stampy:a statistical algorithm for sensitive and fast mapping of Illumina sequence reads.Genome Res.21,936–939.

Lynch,M.,2009.Estimation of allele frequencies from high-coverage genome-sequencing projects.Genetics182,295–301.

Mamanova,L.,Coffey,A.J.,Scott,C.E.,Kozarewa,I.,Turner,E.H.,Kumar,A.,Howard,

E.,Shendure,J.,Turner, D.J.,2009.Target-enrichment strategies for next-

generation sequencing.Nat.Methods7,111–118.

Mardis, E.R.,2008.The impact of next-generation sequencing technology on genetics.Trends Genet.24,133–141.

Maricic,T.,Whitten,M.,P??bo,S.,2010.Multiplexed DNA sequence capture of mitochondrial genomes using PCR products.PLoS One5,e14004.

Marioni,J.C.,Mason,C.E.,Mane,S.M.,Stephens,M.,Gilad,Y.,2008.RNA-seq:an assessment of technical reproducibility and comparison with gene expression arrays.Genome Res.18,1509–1517.

McCormack,J.E.,Faircloth,B.C.,Crawford,N.G.,Gowaty,P.A.,Brum?eld,R.T.,Glenn, T.C.,2012.Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis.Genome Res.22,746–754.

McCormack,J.E.,Maley,J.M.,Hird,S.M.,Derryberry,E.P.,Graves,G.R.,Brum?eld, R.T.,2012.Next-generation sequencing reveals population genetic structure and a species tree for recent bird divergences.Mol.Phyl.Evol.62,397–406. Medinger,R.,Nolte,V.,Pandey,R.V.,Jost,S.,Ottenw?lder,B.,Schl?tterer,C.,Boenigk, J.,2011.Diversity in a hidden world:potential and limitation of next generation sequencing for surveys of molecular diversity of eukaryotic microorganisms.

Mol.Ecol.19(Suppl.S1),32–40.

Meyer,M.,Stenzel,U.,Hofreiter,M.,2008.Parallel tagged sequencing on the454 platform.Nat.Protoc.3,267–278.

Milne,I.,Bayer,M.,Cardle,L.,et al.,2010.Tablet–next generation sequence assembly visualization.Bioinformatics26,401–402.

Miller,M.R.,Dunham,J.P.,Amores,A.,Cresko,W.A.,Johnson,E.A.,2007.Rapid and cost-effective polymorphism identi?cation and genotyping using restriction site associated DNA(RAD)markers.Genome.Res.17,240–248.Morin,R.D.,Bainbridge,M.,Fejes,A.,Hirst,M.,Krzywinski,M.,Pugh,T.J.,McDonald,

H.,Varhol,R.,Jones,S.J.M.,Marra,M.A.,2008.Pro?ling the HeLa S3

transcriptome using randomly primed cDNA and massively parallel short-read sequencing.Biotechniques45,81–94.

Morin,P.A.,Archer,F.I.,Foote,A.D.,Vilstrup,J.,Allen,E.E.,Wade,P.,Durban,J., Parsons,K.,Pitman,R.,Li,L.,https://www.doczj.com/doc/f43653401.html,plete mitochondrial genome phylogeographic analysis of killer whales(Orcinus orca)indicates multiple species.Genome Res.20,908–916.

Nabholz,B.,Künstner,A.,Wang,R.,Jarvis,E.,Ellegren,H.,2011.Dynamic evolution of base composition:causes and consequences in avian phylogenomics.Mol.

Biol.Evol.28,2197–2210.

Naduvilezhath,L.,Rose,L.E.,Metzler,D.,2011.Jaatha:a fast composite likelihood approach to estimate demographic parameters.Mol.Ecol.20,2709–2723. Neiman,M.,Lundin,S.,Savolainen,P.,Ahmadian,A.,Andersen,M.,2011.Decoding a substantial set of samples in parallel by massive sequencing.PLoS One6, e17785.

Ng,S.B.,Turner,E.H.,Robertson,P.D.,Flygare,S.D.,Bigham,A.W.,Lee,C.,Shaffer,T., Wong,M.,Bhattacharjee,A.,Eichler,E.E.,2009.Targeted capture and massively parallel sequencing of12human exomes.Nature461,272–276.

Nicod,J.C.,Largiadèr,C.R.,2003.SNPs by AFLP(SBA):a rapid SNP isolation strategy for non model organisms.Nucleic Acids Res.31,e19.

Nielsen,R.,Paul,J.S.,Albrechtsen,A.,Song,Y.S.,2011.Genotype and SNP calling from next-generation sequencing data.Nat.Rev.Genet.12,443–451. Novembre,J.,Johnson,T.,Bryc,K.,Kutalik,Z.,Boyko,A.R.,Auton,A.,Indap,A.,King, K.S.,Bergmann,S.,Nelson,M.R.,2008.Genes mirror geography within Europe.

Nature456,98–101.

Okou,D.T.,Steinberg,K.M.,Middle,C.,Cutler,D.J.,Albert,T.J.,Zwick,M.E.,2007.

Microarray-based genomic selection for high-throughput resequencing.Nat.

Methods4,907–909.

Oliver,T.A.,Gar?eld,D.A.,Manier,M.K.,Haygood,R.,Wray,G.A.,Palumbi,S.R.,2010.

Whole-genome positive selection and habitat-driven evolution in a shallow and

a deep-sea urchin.Genome Biol.Evol.2,800–814.

Parks,M.,Cronn,R.,Liston,A.,2009.Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes.

BMC Biol.7,84.

Philippe,H.,Brinkmann,H.,Lavrov,D.V.,Littlewood,D.T.J.,Manuel,M.,W?rheide,

G.,Baurain,D.,2011.Resolving dif?cult phylogenetic questions:why more

sequences are not enough.PLoS Biol.9,e1000602.

Pinho,C.,Hey,J.,2010.Divergence with gene?ow:models and data.Annu.Rev.

Ecol.Evol.Syst.41,215–230.

Pritchard,J.K.,Stephens,M.,Donnelly,P.,2000.Inference of population structure using multilocus genotype data.Genetics155,945–959.

Ramos,A.M.,Crooijmans,R.,Affara,N.A.,Amaral,A.J.,Archibald,A.L.,Beever,J.E., Bendixen,C.,Churcher,C.,Clark,R.,Dehais,P.,2009.Design of a high density SNP genotyping assay in the pig using SNPs identi?ed and characterized by next generation sequencing technology.PLoS One4,e6524.

Rice,A.M.,Rudh,A.,Ellegren,H.,Qvarnstr?m,A.,2011.A guide to the genomics of ecological speciation in natural animal populations.Ecol.Lett.14,9–18.

Sánchez,C.C.,Smith,T.P.L.,Wiedmann,R.T.,Vallejo,R.L.,Salem,M.,Yao,J.,Rexroad,

C.E.,2009.Single nucleotide polymorphism discovery in rainbow trout by deep

sequencing of a reduced representation library.BMC Genomics10,559. Seeb,J.E.,Pascal,C.E.,Ramakrishnan,R.,Seeb,L.W.,2009.SNP genotyping by the5-nuclease reaction:advances in high throughput genotyping with non-model organisms.In:Komar,A.A.(Ed.),Single Nucleotide Polymorphisms,Methods in Molecular Biology.Humana Press,Totowa,New Jersey,pp.277–292. Shendure,J.,Ji,H.,2008.Next-generation DNA sequencing.Nat.Biotechnol.26, 1135–1145.

Siepel, A.,Bejerano,G.,Pedersen,J.S.,Hinrichs, A.S.,Hou,M.,Rosenbloom,K., Clawson,H.,Spieth,J.,Hillier,L.D.W.,Richards,S.,2005.Evolutionarily conserved elements in vertebrate,insect,worm,and yeast genomes.Genome Res.15,1034–1050.

Simpson,J.T.,Wong,K.,Jackman,S.D.,et al.,2009.ABySS:a parallel assembler for short read sequence data.Genome Res.19,1117–1123.

Sims,G.E.,Jun,S.R.,Wu,G.A.,Kim,S.H.,2009.Alignment-free genome comparison with feature frequency pro?les(FFP)and optimal resolutions.Proc.Natl Acad.

https://www.doczj.com/doc/f43653401.html,A106,2677–2682.

Sirén,J.,Marttinen,P.,Corander,J.,2011.Reconstructing population histories from single nucleotide polymorphism data.Mol.Biol.Evol.28,673–683.

Slater,G.,Birney, E.,2005.Automated generation of heuristics for biological sequence comparison.BMC Bioinformatics6,31.

Smith,M.J.,Pascal, C.E.,Grauvogel,Z.,Habicht, C.,Seeb,J.E.,Seeb,L.W.,2011.

Multiplex preampli?cation PCR and microsatellite validation enables accurate single nucleotide polymorphism genotyping of historical?sh scales.Mol.Ecol.

Res.11,268–277.

Stapley,J.,Reger,J.,Feulner,P.G.D.,Smadja,C.,Galindo,J.,Ekblom,R.,Bennison,C., Ball, A.D.,Beckerman, A.P.,Slate,J.,2010.Adaptation genomics:the next generation.Trends Ecol.Evol.25,705–712.

Stephen,S.,Pheasant,M.,Makunin,I.V.,Mattick,J.S.,https://www.doczj.com/doc/f43653401.html,rge-scale appearance of ultraconserved elements in tetrapod genomes and slowdown of the molecular clock.Mol.Biol.Evol.25,402–408.

Sunnucks,P.,Wilson,A.C.C.,Beheregaray,L.B.,Zenger,K.,French,J.,Taylor,A.C., 2000.SSCP is not so dif?cult:the application and utility of single-stranded conformation polymorphism in evolutionary biology and molecular ecology.

Mol.Ecol.9,1699–1710.

Tewhey,R.,Nakano,M.,Wang,X.,Pabón-Pe?a,C.,Novak,B.,Giuffre,A.,Lin,E., Happe,S.,Roberts, D.N.,LeProust, E.M.,2009a.Enrichment of sequencing

J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538537

targets from the human genome by solution hybridization.Genome Biol.10, R116.

Tewhey,R.,Warner,J.B.,Nakano,M.,Libby, B.,Medkova,M.,David,P.H., Kotsopoulos,S.K.,Samuels,M.L.,Hutchison,J.B.,Larson,J.W.,2009b.

Microdroplet-based PCR enrichment for large-scale targeted sequencing.Nat.

Biotechnol.27,1025–1031.

Thomson,R.C.,Wang,I.A.N.J.,Johnson,J.R.,2010.Genome enabled development of DNA markers for ecology,evolution and conservation.Mol.Ecol.19,2184–2195.

Van Orsouw,N.J.,Hogers,R.C.J.,Janssen,A.,Yalcin,F.,Snoeijers,S.,Verstege,E., Schneiders,H.,Van Der Poel,H.,Van Oeveren,J.,Verstegen,H.,https://www.doczj.com/doc/f43653401.html,plexity reduction of polymorphic sequences(CRoPS?):a novel approach for large-scale polymorphism discovery in complex genomes.PLoS One2,e1172.

Van Tassell,C.P.,Smith,T.P.L.,Matukumalli,L.K.,Taylor,J.F.,Schnabel,R.D.,Lawley,

C.T.,Haudenschild,C.

D.,Moore,S.S.,Warren,W.C.,Sonstegard,T.S.,2008.SNP

discovery and allele frequency estimation by deep sequencing of reduced representation libraries.Nat.Methods5,247–252.

Vishnoi,A.,Roy,R.,Prasad,H.K.,Bhattacharya,A.,Desalle,R.,2010.Anchor-based whole genome phylogeny(ABWGP):a tool for inferring evolutionary relationships among closely related microorganims.PLoS One5,e14159.Vos,P.,Hogers,R.,Bleeker,M.,Reijans,M.,van de Lee,T.,Hornes,M.,Frijters,A.,Pot, J.,Peleman,J.,Kuiper,M.,1995.AFLP:a new technique for DNA?ngerprinting.

Nucleic Acids Res.11,4407–4414.

Wang,D.G.,Fan,J.B.,Siao,C.J.,Berno,A.,Young,P.,Sapolsky,R.,Ghandour,G., Perkins,N.,Winchester, E.,Spencer,J.,https://www.doczj.com/doc/f43653401.html,rge-scale identi?cation, mapping,and genotyping of single-nucleotide polymorphisms in the human genome.Science280,1077.

Wang,Z.,Gerstein,M.,Snyder,M.,2009.RNA-Seq:a revolutionary tool for transcriptomics.Nat.Rev.Genet.10,57–63.

Wiedmann,R.T.,Smith,T.P.L.,Nonneman,D.J.,2008.SNP discovery in swine by reduced representation and high throughput pyrosequencing.BMC Genet.9,

81.

Williams,L.M.,Ma,X.,Boyko,A.R.,Bustamante,C.D.,Oleksiak,M.F.,2010.SNP identi?cation,veri?cation,and utility for population genetics in a non-model genus.BMC Genet.11,32.

Zellmer,A.J.,Hanes,M.M.,Hird,S.M.,Carstens,B.C.,2012.Deep phylogeographic structure and environmental differentiation in the carnivorous plant Sarracenia alata.Syst.Biol.61,763–777.

Zerbino,D.R.,Birney,E.,2008.Velvet:algorithms for de novo short read assembly using de Bruijn graphs.Genome Res.18,821–829.

538J.E.McCormack et al./Molecular Phylogenetics and Evolution66(2013)526–538

精神分裂症的常见表现及基本概念

精神分裂症概述 精神分裂症是一种常见的病因未完全阐明的精神疾病。临床表现为知觉、思维、情感、行为等多方面障碍及精神活动的不协调。患者一般意识清楚,智能基本正常,但部分病人在疾病过程中可出现认知功能损害。本病多在青壮年起病,病程多迁延,缓慢进展,如不积极治疗可逐渐加重或恶化,有发展为衰退的可能。部分病人可保持痊愈或基本痊愈状态。具体表现为:①活泼好动的青少年,逐渐变得孤僻离群、生活懒散,对外部事物不感兴趣,注意力涣散,不专心听课、学习成绩下降,或是常发呆发愣,或蒙头睡觉,衣衫不整,污秽不堪,或对镜发笑,自言自语。②精神萎靡,自诉头痛头昏,失眠心烦。谈话时往往前言不搭后语,有头无尾,支离破碎,或欲言不止,或百问不答,或喜欢使用自己创造的新词、新字、使人费解。③整日叫喊不停,独自对空说话,甚至语不成句,情绪与言语内容常不相协调,如说有人要伤害他,但面部表现却很高兴;或为了一点点小事而勃然大怒。④想入非非,遐想终日。妄想内容多离奇古怪、荒诞无稽,如无中生有地认为饭菜内有人放了毒药,或认为别人咳嗽、吐痰、搔头等都是要对他采取行动的某种“特别信号”;有的认为自己被别人爱上了,常与对方纠缠不休;有的感到自己的思想、行为、身体受到电波、仪器的控制等等。 精神病已是二十一世纪的社会公害之一,专家指出,认识精神分裂症症状可以为早期治疗赢得最佳时机。最新的研究发现,精神分裂症的预后关键在于是否早期发现,早期治疗。首次发作的6个月以内是最佳的治疗时期,这时精神分裂症的神经生物学损害最少。首次发病者,早期得到及时治疗在病后的6个月到1-2年可望完全恢复。延误首次治疗的时间将导致治疗的效果下降。但是如果不坚持服药治疗,85%的患者在5年内复发。 那么,怎么才能在早期识别精神分裂症呢?一般来说,以下几方面有助于辨别。①性格改变

水稻基因组学的的研究进展

基因组学课程论文 所在学院生命科学技术学院 专业14级生物技术(植物方向) 姓名金祥栋 学号2014193012

水稻基因组学的研究进展 摘要:随着模式植物——拟南芥和水稻基因组测序的完成,近年来关于植物基因组学的研究越来越多。水稻是世界上重要的粮食作物之一,养活着全世界近一半的人口。同时南于水稻基冈组较小、易于转化及与其他禾本科植物基因组的同线性和共线性等特点,一直被作为禾本科植物基因组研究的模式作物。水稻基因组测序的完成及种质资源的基因组重测序,为水稻功能基因组研究奠定了基础。现综述我国水稻基因组测序和功能基因组研究历史,重点介绍了近年来在水稻基因组序列分析中获得的几项最新的研究结果。 关键词:水稻;基因组测序;功能基因组;研究历史;基因组学;研究进展 The recent progress in rice genomics research Abstract: With the completion of genome sequencing ofthe model plant-- Arabidopsis and rice,more and more researches on plant genomics emerge in recent years. Rice i s one of the most important crops in the world, raised nearly half of the world popul ation. At the same time in south rice Keegan group is smaller, with linear and linear features such as easy transformation and other gramineous plant genome, has been use d as a model crop for plant genome research of Gramineae. Genome sequencing and germplasm resources the rice genome sequencing completed laid the foundation for ric e functional genomics research. This article reviews the history and function of our ge nome sequencing of rice genome research, introduces several latest research results in recent years in the analysis of rice genome sequences. 前言 基因组是1924年提出用于描述生物的全部基因和染色体组成的概念,是研究生物基因结构与功能的学科,是在遗传学的基础上发展起来的一门现代生物技术前沿科学,也是现代分子生物学和遗传工程技术所必要学科,是当今生物学研究领域最热门、最有生命力、发展最快的前沿科学之一。基因组学的主要任务是研究探索生物基因结构与功能,生物遗传和物理图谱构建,建立和发展生物信息技术,为生物遗传改良及遗传病的防治提供相关技术依据。 进入21 世纪,随着全球化、市场化农业产业发展和全球贸易一体化格局的逐步形成,我国种业正面临前所未有的严峻挑战,主要表现在:依靠传统育种技术难以大幅度提高粮食单产;土地资源短缺,农业环境污染日益突出;种质资源发掘、基因组育种技术亟需创新等。水稻不仅是重要的粮食作物,由于其基因组较小且与其他禾本科作物基因组存在共线性,以及具有成熟高效的遗传转化体系,已成为作物功能基因组研究的模式植物。因此,水稻基因组研究对发展现代农作物育种技术、提升种业国际竞争力和保障粮食有效供给具有重大战略意义。 基因组研究主要包括三个层次:①结构基因组学,以全序列测序为目标,构建高分辨率的以染色体重组交换为基础的遗传图谱和以DNA 的核苷酸序列为基础的物理图谱。②功能

纳米孔测序是极具前景的下一代测序技术

纳米孔测序是极具前景的下一代测序技术 Nanopore Sequencing 2019 - Patent Landscape Analysis 随着各种技术的新产品推出,哪些公司将在知识产权方面引领纳米孔测序? 纳米孔测序是极具前景的下一代测序技术 据麦姆斯咨询介绍,纳米孔测序是新一代测序(NGS)技术之一,被认为能够彻底革新DNA分析。随着时间地推移,目前已经开发出了不同形式的纳米孔测序技术,包括蛋白质纳米孔、固态纳米孔和复合纳米孔。该技术可以高速生成超长读数,减少样品制备时间以及将读数重组成原始序列所需要的数据处理时间。 这项新技术可以开发一个需要遗传指纹来快速识别癌症类型和病原体的全新客户群。根据DataBridge的数据,全球下一代测序市场将快速增长,市场规模预计将从2017年的48.3亿美元增长到2024年的163.5亿美元,2018~2024年期间的复合年增长率(CAGR)预计为19.2%。 目前,Oxford Nanopore Technologies是唯一一家将基于纳米孔的测序仪推向市场的公司。不过,还有其它几家公司正在开发自己的相关技术,Oxford Nanopore Technologies公司可能很快将不再是纳米孔测序仪的唯一供应商。例如,Two Pore Guys公司宣布将在2019年春季发布其产品套件。 随着新产品在未来的相继推出,了解纳米孔测序市场相关参与者的知识产权(IP)状况和策略,同时发现专利新申请人及其所带来的威胁至关重要。为此,著名市场研究机构Yole 子公司Knowmade深入调研了基于纳米孔的测序技术(蛋白质、固态和复合)及其应用(肿瘤学、植物遗传学等)中涉及的知识产权主要参与者。本报告可以帮助读者发现业务风险和机遇,预测新兴应用,支持战略决策以加强市场地位。 纳米孔测序全球专利申请趋势 对专利申请趋势的分析表明,从2008年到2013年,纳米孔测序相关的专利申请获得了重要增长。这一增长源自于学术研究团队(哈佛大学和加州大学)对纳米孔测序概念的验证。

新一代测序技术的发展及应用前景

2010年第10期杨晓玲等:新一代测序技术的发展及应用前景 等交叉学科的迅猛发展。 1.1第二代测序——高通量低成本齐头并进以高通量低成本为主要特征的第二代测序,不再需要大肠杆菌进行体内扩增,而是直接通过聚合酶或者连接酶进行体外合成测序¨】。根据其原理又可分为两类:聚合酶合成测序和连接酶合成测序。1.1.1聚合酶合成测序法Roche公司推出的454技术开辟了高通量测序的先河。该技术通量可达Sangcr测序的几百倍,而成本却只有几十分之一,因此一经推出,便受到了国际上基因组学专家的广泛关注。454采用焦磷酸合成测序法HJ,避免了传统测序进行荧光标记以及跑胶等繁琐步骤,同时利用乳胶系统对DNA分子进行扩增,实现了大规模并行测序。截止到2010年4月,已有700多篇文献是采用了454测序技术(http://454.com/publications.and—resources/publications.asp),对该技术是一个极大的肯定。 Illumina公司推出的Solexa遗传分析仪是合成技术的进一步发展与延伸。该技术借助高密度的DNA单分子阵列,使得测序成本和效率均有了较大改善。同时Solexa公司提出的可逆终止子”1也是该技术获得认可的原因之一。与454相比。Solexa拥有更高的通量,更低的成本。虽然片段长度较短仍是主要的技术瓶颈,但是对于已有基因组的物种来说,Solexa理所当然成为第二代测序技术的首选。2008年以来,利用该技术开展的研究大幅度上升,报道文献达400多篇(http://www.illumina.com/systems/genome—analyzer_iix.ilmn)o 1.1.2连接酶合成测序法2007年ABI公司在Church小组拍1研究成果的基础上推出了SOLID测序仪。该技术的创新之处在于双碱基编码…的应用,即每个碱基被阅读两次,因此大大减少了测序带来的错误率,同时可以方便的区分SNP和测序错误。在测序过程中,仪器自动加入4种荧光标记的寡核苷酸探针,探针与引物发生连接反应,通过激发末端的荧光标记识别结合上的碱基类型。目前SOLID3.0测序通量可达20G,而测序片段仅有35—50bp,这使得该技术与Solexa相比,应用范围还不够广泛。ABI公司正加快研发进度,争取在片段长度方面做出重大突破。 DanaherMotion公司推出Polonator¨1测序仪同样也是基于Church小组的研究成果,但是该设备的成本要低很多,同时用户在使用时可以根据自己的研究目的设置不同的测序条件。而CompleteGe—nomics公司推出的DNA纳米阵列与组合探针锚定连接测序法"1则具有更高的容错能力,试剂的消耗也进一步减少,目前已顺利完成3个个体基因组的测序工作。 1.2第三代测序——单分子长片段有望实现第二代测序技术虽然在各方面都有了较大的突破,但是仍然建立在PCR扩增的基础上。为了避免PCR扩增带来的偏差,科学家目前正在研制对DNA单个分子直接测序的第三代测序仪。最具代表性的包括Heliscope单分子测序仪,单分子实时合成测序法,纳米孔测序技术等。 Helicos技术仍然是基于合成测序原理¨…,它采用了一种新的荧光类似物和灵敏的监测系统,能够直接记录到单个碱基的荧光,从而克服了其他方法须同时测数千个相同基因片段以增加信号亮度的缺陷。PacificBioscienees公司研发的单分子实时合成测序法充分利用了DNA聚合酶的特性,可以形象的描述为通过显微镜实时观测DNA聚合酶,并记录DNA合成的整个过程。纳米孔测序技术[11’121则是利用不同碱基在通过纳米小孔时引起的静电感应稍有不同,或者不同碱基通过小孔的能力各有差异,来加以区分不同的碱基信号。 2应用与实践 Kahvejian在2008年的一篇综述中提到¨“:“如果你可以随心所欲地测序,你会开展哪些研究?”。人类基因组计划的完成和近年来高通量测序的兴起,使越来越多的科研工作者认识到,我们对于生物界的认识才刚刚起步。基因图谱的绘制并不意味着所有遗传密码的破解,癌症基因组的开展也没有解决所有的医学难题。DNA变异的模式和进化机制,基因调控网络的结构和相互作用方式,复杂性状及疾病的分子遗传基础等,仍是困扰生物学家和医学家的难题,而高通量测序的广泛应用,也许可以让我们知道的更多。 2.1DNA水平的应用 2.1.1全基因组测序新一代测序技术极大地推

精神分裂症分类及诊断规范标准

精神分裂症分类及诊断标准 F20精神分裂症 F20.0偏执型精神分裂症 F20.1青春型精神分裂症 F20.2紧张型精神分裂症 F20.3未分化型精神分裂症 F20.4精神分裂症后抑郁 F20.5残留型精神分裂症 F20.6单纯型精神分裂症 F20.8其它精神分裂症 F20.9精神分裂症,未特定 可采用第五位编码指明症状 F20.x0持续性 xF20.1发作性,伴有进行性损害 F20.x2发作性,伴有稳定性损害 F20.x3弛张发作性 F20.x4不完全性缓解 F20.x5完全性缓解 F20.x8其它 F20.x9观察期尚不足一年。 精神分裂症 虽然无法分辨出严格地标示病理性质的症状,但出于实践的目的,有必要将上述症状分成一些对诊断有特殊意义的、并常常同时出现的症状群,例: (a)思维鸣响,思维插入或思维被撤走以及思维广播;

(b)明确涉及躯体或四肢运动,或特殊思维、行动或感觉的被影响、被控制或被动妄想;妄想性知觉; (c)对病人的行为进行跟踪性评论,或彼此对病人加以讨论的幻听,或来源于身体某一部分的其它类型的听幻觉; (d)与文化不相称且根本不可能的其它类型的持续性妄想,如具有某种宗教或政治身份,或超人的力量和能力(例如能控制天气,或与另一世界的外来者进行交流); (e)伴有转瞬即逝的或未充分形成的无明显情感内容的妄想、或伴有持久的超价观念,或连续数周或数月每日均出现的任何感官 的幻觉; (f)思维断裂或无关的插入语,导致言语不连贯,或不中肯或词语新作; (g)紧张性行为,如兴奋、摆姿势,或蜡样屈曲、违拗、缄默及木僵; (h)“阴性”症状,如显著的情感淡漠、言语贫乏、情感反应迟钝或不协调,常导致社会退缩及社会功能的下降,但必须澄清这 些症状并非由抑郁症或神经阻滞剂治疗所致; (i)个人行为的某些方面发生显著而持久的总体性质的改变,表现为丧失兴趣、缺乏目的、懒散、自我专注及社会退缩。 诊断要点: 诊断精神分裂症通常要求在一个月或以上时期的大部分时间确实存在属于上述(a)到(d)中至少一个(如不甚明确需两个或多个症状)或(e)到(h)中来自至少两组症状群中的十分明确的症状。符合此症状要求但病程不足一个月的状况(无论是否经过治疗)应首先诊断为急性精神分裂症样精神病性障碍(F23.2),如症状持续更长的时间再重新归类为精神分裂症。 回顾疾病过程可发现在精神病性症状出现之前数周或数月,有一

一代测序与二代测序的区别与联系

一代测序与二代测序的区别与联系 Sanger 法测序(一代测序) Sanger 法测序利用一种DNA 聚合酶来延伸结合在待定序列模板上的引物。直到掺入一种链终止核苷酸为止。每一次序列测定由一套四个单独的反应构成,每个反应含有所有四种脱氧核苷酸三磷酸(dNTP),并混入限量的一种不同的双脱氧核苷三磷酸(ddNTP)。由于ddNTP 缺乏延伸所需要的3-OH 基团,使延长的寡聚核苷酸选择性地在G、A、T 或 C 处终止。终止点由反应中相应的双脱氧而定。每一种dNTPs 和ddNTPs 的相对浓度可以调整,使反应得到一组长几百至几千碱基的链终止产物。它们具有共同的起始点,但终止在不同的的核苷酸上,可通过高分辨率变性凝胶电泳分离大小不同的片段,凝胶处理后可用X-光胶片放射自显影或非同位素标记进行检测。 高通量测序(二代测序) 高通量测序技术(High-throughput sequencing,HTS)是对传统Sanger 测序(称为一代测序技术)革命性的改变, 一次对几十万到几百万条核酸分子进行序列测定, 因此在有些文献中称其为下一代测序技术(next generation sequencing,NGS )足见其划时代的改变, 同时高通量测序使得对一个物种的转录组和基因组进行细致全貌的分析成为可能, 所以又被称为深度测序(Deep sequencing)。 第一代和第二代测序技术 第 X 代公司 平台 名称 测序方 法 检测 方法 大约 读长 (碱基 数) 优点相对局限性 第一代ABI/ 生命 技术 公司 3130 xL-37 30xL 桑格- 毛细管 电泳测 序法 荧光/ 光学 600-1 000 高读长,准确度一 次性达标率高,能 很好处理重复序 列和多聚序列 通量低;样品制备 成本高,使之难以 做大量的平行测 序 第 一代贝克 曼 GeXP 遗传 分析 系统 桑格- 毛细管 电泳测 序法 荧光/ 光学 600-1 000 高读长,准确度一 次性达标率高,能 很好处理重复序 列和多聚序列;易 通量低;单个样品 的制备成本相对 较高

新一代测序法简介

新一代测序法简介 新一代测序方法是一种直接测序法,它既可以分析基因和DNA的组成(定性分析),也可以测定同一类型基因在表达过程中产生的数量(定量分析),以及不同类型基因或DNA 之间的差别所在(交叉对比分析)。自2004年,454测序技术发展以来,已经出现的测序产品超过六种之多。这些产品的技术特点见下表: 产家名称产品技术特点优缺点 化学反应测序方法误读率样品准备高通量程度 Roche (454 Life Science) 焦磷酸标记的链 反应 焦磷酸基 标记 <1% 较复杂,需PCR 中等 Illumina(Solexa)四色可逆终止码合成法1%—3% 较复杂,需PCR 中—高ABI(SOLID) 双色可逆终止码合成法1%—5% 较复杂,需PCR 中—高Helicos Bioscience 单色可逆终止码合成法2%—8% 简单,无需PCR 高—超高Intelligent Biosystm 四色可逆终止码合成法1%—5% 较复杂,需PCR 中—高 Pacific Bioscience 四色焦磷酸基标 记焦磷酸基 标记 3%—8% 简单,无需PCR 高 VisiGen 焦磷酸基标记 FRET 焦磷酸基 标记 3%—8% 简单,无需PCR 高 在这些技术中,从所分析的样本在测序前是否需要扩增,大致可以分为两类,即克隆扩增型和单分子测序型。两种类型在测序技术上区别并不大,但对结果的影响却有不小的差别。主要体现在两个方面:(1)单分子测序更能反应细胞或组织内分子的真实情况,尤其是在需要定量分析的情况下。而克隆扩增型中的PCR反应使得样品中DNA分子的扩增机会并不完全均等,这会对基因表达的定量分析造成影响;(2)单分子测序具有通量更高的优势。克隆扩增使得同一类型的分子数目急剧上升,在提高同类型分在在固相表面出现的几率同时,也降低了不同类型分子出现的机会。 面重点介绍Pacific Biosciences公司推出的Single Molecule Real Time (SMRT?) DNA Sequencing(单分子实时DNA测序)。 首先,在这一测序技术中有主要有两个关键的技术: 一、荧光标记的脱氧核苷酸避免了碱基的空间位阻效应。显微镜现在也无法实现实时看到“单分子”,但是它可以实时记录荧光的强度变化。当荧光标记的脱氧核苷酸被掺入DNA 链的时候,它的荧光就同时能在DNA链上探测到。当它与DNA链形成化学键的时候,它的荧光基团就被DNA聚合酶切除,荧光消失。这种荧光标记的脱氧核苷酸不会影响DNA聚合酶的活性,并且在荧光被切除之后,合成的DNA链和天然的DNA链完全一样; 二、纳米微孔(Zero-mode waveguide (ZMW))。因为在显微镜实时记录DNA链上的荧光的时候,DNA链周围的众多的荧光标记的脱氧核苷酸形成了非常强大的荧光背景,这种强大的荧光背景使单分子的荧光探测成为不可能。Pacific Biosciences公司发明了一种直径只有10nm的纳米孔,单分子的DNA聚合酶被固定在这个孔内。在这么小的孔内,DNA链周围的荧光标记的脱氧核苷酸有限,而且由于A,T,C,G这四种荧光标记的脱氧核苷酸非常快速地从外面进入到孔内又出去,它们形成了非常稳定的背景荧光信号。而当某一种荧光标记的脱氧核苷酸被掺入到DNA链时,这种特定颜色的荧光会持续一小段时间,直到新的化学

精神分裂症症状

精神分裂症在近些年来越来多多见了,这是一种持续、通常慢性的重大精神疾病,是精神病里最严重的一种,是以基本个性,思维、情感、行为的分裂,精神活动与环境的不协调为主要特征的一类最常见的精神病,多青壮年发病,进而影响行为及情感。给患者造成很多严重的危害,只是在早期极易被大家所忽视,给大家的心理造成很多的负担,为了能帮助大家早日认识和发现精神分裂症的产生,专家特为大家详细介绍精神分裂症的症状表现。相信能帮助大家: 1、情绪改变:小“刺激”也会引起大“反应”。如有一些患者出现情感倒错,如听到不幸的消息反而哈哈大笑,得知高兴消息后却唉声叹气等。 2、动作行为异常:这是异常精神活动的外在表现,较易识别。如具有被害妄想的患者往往对妄想对象突然发生攻击行为。情绪高涨的狂躁症患者可有过分装饰,精神病的症状或“慷慨相助”或管闲事。情绪低落的抑郁症患者则呆立或默不作声。具有幻听的患者常侧耳倾听,或作出相应的行为反应,如对空叫骂等。 3、多疑:这类人对治疗精神病周围人的一言一行都特别敏感。如听到有人讲话,就怀疑在议论自己,甚至有人咳嗽也疑为是针对自己的。这种多疑和正常人的多疑不同的是,虽经事实证实,疑点被否定,但患者仍坚信不移,无法说服。这种病态思维被称之为妄想。 4、幻觉:幻想自己是超级巨人;表现为极其自大,目空一切,横行霸道,与自身的现实情况完全不符。 5、行为障碍:常有激烈的伤人毁物与破坏一切特别是美好事物的冲动,具有爆发性与严重性。 6、妄想:妄想制造出一个超级巨人;表现为对研制机器人、对人类的精神控制技术等极度热衷。 7、思维障碍:逻辑缺陷,常有前言不搭后语、所答非所问等常人难以理解的表述极度的以自我为中心,其他人都被认为是利用的对象;偏执、顽固。 8、情感障碍:残忍、冷漠、虚伪、恶毒、嫉妒等等,对人对事与常人的反应完全不同,但会伪装。没有爱的能力,只有占有欲。 由此可见一旦换上了精神分裂是非常严重的中国人民解放军总政治部医院精神科董泉淼教授建议大家如果患上以上相关的症状那么建议您及时到正规专业的医院来进行就诊。 生物离子穴位渗透系统作用机理 首先:是利用生物智能全数字神经定位仪查出致病灶,并对病灶部位神经等各系统进行精确计算以及锁定,运用生物离子穴位渗透系统技术进行生物植入病灶穴位顶部,当生物离子导入病灶穴位后,能马上作用于受损的丘脑、大脑神经,迅速调节丘脑、大脑异常功能紊乱,激活衰竭细胞活性,保障大脑供氧以及新陈代谢频率,全面补充大脑细胞缺失的的胶原蛋白营养,更新脑部血液循环,消除精神压力,使得不正常以及受损的脑部神经系统的功能从根部得到修正治疗。 其次:需同时进行的血液磁化植入技术,它是依据生物物理学、生物磁学、生物光学、生物化学的原理,将磁、光、氧有机结合形成磁共振作用,以血液为媒介调节机体代谢实现对机

DNA第一代-第二代-第三代测序的介绍

双脱氧链终止法又称为Sanger法 原理是:核酸模板在DNA聚合酶、引物、4 种单脱氧核苷三磷酸 ( d NTP,其中的一种用放射性P32标记 )存在条件下复制时,在四管反应系统中分别按比例引入4种双脱氧核苷三磷酸 ( dd NTP ),因为双脱氧核苷没有3’-O H,所以只要双脱氧核苷掺入链的末端,该链就停止延长,若链端掺入单脱氧核苷,链就可以继续延长。如此每管反应体系中便合成以各自 的双脱氧碱基为3’端的一系列长度不等的核酸片段。反应终止后,分4个泳道进行凝胶电泳,分离长短不一的核酸片段,长度相邻的片段相差一个碱基。经过放射自显影后,根据片段3’端的双脱氧核苷,便可依次阅读合成片段的碱基排列顺序。Sanger法因操作简便,得到广泛的应用。后来在此基础上发展出多种DNA 测序技术,其中最重要的是荧光自动测序技术。 荧光自动测序技术荧光自动测序技术基于Sanger 原理,用荧光标记代替同位素标记,并用成像系统自动检测,从而大大提高了D NA测序的速度和准确性。20世纪80 年代初Jorgenson 和 Lukacs提出了毛细管电泳技术( c a p il l ar y el ect r ophor es i s )。1992 年美国的Mathies实验室首先提出阵列毛细管电泳 ( c a p il l ar y ar r a y el ectr ophor es i s ) 新方法,并采用激光聚焦荧光扫描检测装置,25只毛细管并列电泳,每只毛细管在1.5h内可读出350 bp,DNA 序列,分析效率可达6 000 bp/h。1995年Woolley研究组用该技术进行测序研究,使用四色荧光标记法,每个毛细管长3.5cM,在9min内可读取150个碱基,准确率约 97 % 。目前, 应用最广泛的应用生物系统公司 ( ABI ) 37 30 系列自动测序仪即是基于毛细管电泳和荧光标记技术的D NA测序仪。如ABI3730XL 测序仪拥有 96 道毛细管, 4 种双脱氧核苷酸的碱基分别用不同的荧光标记, 在通过毛细管时 不同长度的 DNA 片段上的 4 种荧光基团被激光激发, 发出不同颜色的荧光, 被 CCD 检测系统识别, 并直接翻译成 DNA 序列。 杂交测序技术杂交法测序是20世纪80年代末出现的一种测序方法, 该方法不同于化学降解法和Sanger 法, 而是利用 DNA杂交原理, 将一系列已知序列的单链寡核苷酸片段固定在基片上, 把待测的 DN A 样品片段变性后与其杂交, 根据杂交情况排列出样品的序列

水稻基因组进化的研究进展

水稻基因组进化的研究进展 水稻是世界上重要的粮食作物之一,养活着全世界近一半的人口。同时南于水稻基冈组较小、易于转化及与其他禾本科植物基因组的同线性和共线性等特点,一直被作为禾本科植物基因组研究的模式作物。水稻是第一个被全基因组测序的作物,目前栽培稻2个亚种全基因组测序工作已经完成:粳稻品种日本晴(Nipponbare)通过全基因组鸟枪法和逐步克隆法被测序,籼稻品种扬稻6号(9311)通过全基因组鸟枪法被测序。除核基因组外,水稻叶绿体和线粒体基因组也于1989年和2002年分别被测序。水稻2个亚种的全基因组测序完成,一方面开启了植物比较基因组学的大门,另一方面为人们在基冈组水平上鉴定出所有水稻基因并分析其功能奠定了基础,同时也使得人们对植物进化的认识,尤其是对禾本科植物进化的了解,逐步从系统分类和分子标记水平进入到了基因组序列水平。许多研究者通过对水稻基因组序列的分析,利用生物信息学工具,对水稻在基因组水平上的进化进行了大量研究。 1 水稻及其他禾本科植物基因组的古多倍体化过程 水稻是典型的二倍体植物,其核基因组中共有12条染色体。在水稻基因组被完整测序之前,人们就已经采用分子标记、DNA重复元件等方法探究水稻基因组的古多倍体化(polyploidization)过程,并发现了一些重复的染色体片段。随着水稻基因组测序计划的完成,越来越多的证据表明水稻基因组曾发生过全基因组复制(whole genome duplication),即古多倍体化过程。 Golf等利用鸟枪法完成了粳稻品种日本晴全基因组的测序工作,并利用同义替换率分布方法(Ks- based age distribution)提出水稻基因组可能发生过一次全基因组复制过程。此后多家研究机构和一些研究者对水稻基因组中的重复片段进行了研究,虽然得出的结论不尽相同,但均发现水稻基因组中存在大量的重复片段。根据所采用方法和参数的不同,这些重复片段占整个水稻基因组的15%~62%。Yu 等在水稻基因组中发现了18对大的重复片段,大约占整个基因组的65.7%。其中17对重复片段形成的时间很相近,发生在禾本科物种分化之前;最近的一次片段复制事件发生在水稻11和12号染色体之间,在禾本科物种分化之后。 水稻基因组被测序之后,许多科研机构对基因组数据进行了详尽的注释。其中应用比较广泛的是美国基因组研究院(the institute for genome research,TIGR)和日本农业生物科学研究所(national in- stitute of agrobiological sciences,NIAS)的水稻基因组注释信息。TIGR根据其注释的结果和基因相似性矩阵(gene homology matrix,GHM)方法,检测到大量染色体间的重复片段,这些重复片段几乎覆盖了整个水稻基因组。TIGR水稻基因组注释数据库从第4版开始便增加了对片段重复的注释,该分析是利用DAGChainer程序进行的,重复片段采用100 kb和500 kb 2种参数模型进行了染色体片段的基因共线性分析(图1),这是全基因组复制的有力证据。根据复制片段上同源基因的分子进化分析,估计全基因组复制发生在大约7 000万年前,在禾本科物种分化之前。此外,Zhang等利用TIGR更新的数据进行分析,采用同义替换率分布方法检测到另一次更古老的(单、双子叶植物分化前)基因组复制事件,说明水稻基因组至少经历了2次全基因组复制过程。 全基因组复制或多倍体化是植物尤其是禾本科作物物种形成和进化过程中非常重要的事件,大部分开花植物在进化过程中均经历了多倍体化过程。基因组加倍后,再经历所谓的二倍体化过程(diploidization),进化成当代的二倍体物种,并造成大量重复片段中基因的重排和丢失。Salse等研究发现基因组复制事件对禾本科植物的物种形成和演变具有重要作用。他们认为禾本科植物的祖先物种是一个基因组内包含5条染色体的物种,在进化过程中,首先在距今5 000~7 000万年前经基因组复制产生了10条染色体;此后,在基因组内发生了2次染色体置换和融合而形成了12条中间态染色体。以这12条中间态染色体为基础,逐渐分化出水稻、小麦、玉米和高粱的基因组,其中水稻基因组保留了原有的12条中间态染色体,而小麦、玉米和高粱均又发生了染色体丢失和融合才形成了现有的基因组。水稻全基因组复制片段是至今为止在动、植物基因组中发现的最为清晰、完整的基因组复制的遗迹。水稻之所以保存这么完整,一方面是水稻基因组保持了12条中间态染色体的基本形态,另一方面可能与水稻基因组相对较稳定有关。 2水稻籼粳2个亚种的分化 水稻是世界上最重要的粮食作物之一,在其11 500多年的栽培历史中,因适应不同的农业生态环境而产生了丰富的遗传多样性和明显的遗传分化。长期以来,基于形态性状、同工酶以及对一些化合物不同反应的研究,把亚洲栽培稻(Oryza sativa L.)分为籼稻(indica)和粳稻(japonica)2个亚种。其中籼亚种耐湿耐热,主要适应于热带和亚热带等低纬度地区,而粳亚种则耐寒耐弱光,适应于高纬度和高海拔地区种植。这2个亚种间不仅产生了生殖隔离的基因库,还在形态特征、农艺性状和生理生化反应等方面存在明显的差异。近期群体

下一代测序技术

下一代测序技术 摘要:DNA测序技术对生物学的发展有着最根本的意义。Sanger法测序经过了30年的应用和发展,而在过去三年中,以454, solexa, SOLiD为代表的高通量测序平台已经大幅度降低了测序成本,提高了测序速度,成为基因组测序市场的主流。在此基础上,各种下一代测序技术正在快速研发,将使基因组测序和重测序的通量和成本更加平民化,为基因组学、遗传学、生物医学和健康科学等领域的发展创造更加广阔的前景。本文将对所有新的测序技术的原理、优势和应用进行总结和展望。 1977年Maxim、Gilbert发明的化学降解法测序技术和Sanger发明的双脱氧末端终止法测序技术不仅为他们赢得了诺贝尔奖,也使得从DNA序列层面研究分子遗传学成为可能。特别是后者,从最开始的凝胶电泳到越来越高通量的毛细管电泳,从开始的手工操作到越来越多自动测序仪的出现,各种改进的Sanger 测序技术统治了DNA测序领域三十年,至今仍在长片段测序,大片段文库测序方面有广泛的应用。人类基因组计划(HGP)的完成就是靠Sanger测序法。 在耗费了庞大成本的人类基因组计划宣布完成之后,越来越多的物种基因组测序工作对测序成本和通量提出了更高的要求,新一代测序技术(也被称为第二代测序技术)开始登上历史舞台。2005年454 life science公司率先推出了焦磷酸测序技术,使测序成本较Sanger法降低了100倍,速度快了(提高)100倍,人类基因组测序逐步进入了100,000美元时代。如今,454 FLX测序仪(Roche Applied Science)、基于“边合成边测序”的Solexa测序仪(Illumina Inc.)和使用“边连接边测序”的SOLiD测序仪(Applied Biosystems)已经成为基因组测序市场的主流机型。除此之外,2008年一年内又有HeliScope单分子测序仪(Helicos)和Polonator(Dover/Harvard)两种测序机型商品化。 在NHGRI(美国人类基因组研究中心)的支持和推动下,未来几年内测序成本将在目前基础上再下降100倍,最终使个人基因组测序成本降至1000美元,人类将革命性的进入个人基因组时代。高通量和低成本的测序技术将进入到普通实验室,基因组测序的简单化将使分子生物学飞跃发展,个人基因组测序产业化也将对健康医学等领域产生革命性的影响。本文将首先对目前已经商品化的新一代测序技术(454、Solexa、SOLiD、HeliScope)做一介绍和比较,再对正在研发中的各种下一代测序方法(第三代测序技术)的原理和应用做一详细的介绍和展望。 1. Roche 454测序技术 2005年454生命科学公司在《自然》杂志发表论文,介绍了一种区别于传统Sanger法的全新高通量测序方法,将测序成本降低了100倍以上,开创了第二代测序技术的先河,454测序仪也成为最先商品化的第二代测序仪。正是在此基础上,其它如Solexa、SOLiD等第二代测序仪才相继问世。454测序技术的原理在于首先使用乳液PCR(emulsion PCR)技术(图一a)扩增已经连接上接头的基因组文库片段,扩增子结合在28 μm的磁珠表面,将乳液破坏后用变性剂处理磁珠,再将含有扩增子的磁珠富集到芯片表面,用测序引物进行测序。在测序过程中,454使用了一种“焦磷酸测序技术”(Pyrosequencing),即在合成DNA 互补链的过程中,每加入一种单核苷酸(dNTP),如与模板链配对结合,就会释放出一个焦磷酸,与底物腺苷-5’-磷酸硫酸(APS)在A TP硫酸化酶作用下合成A TP,与荧光素(Luciferin)一起在荧光素酶(Luciferase)的作用下,会发出一个光信号,由芯片背后连接的电荷耦合装置(CCD,Charge Coupled Device)捕捉。454测序技术合成DNA链使用的是普通单核苷酸,没有任何标记,合成中也没有切割基团等生化反应,因此读长可以达到300-400bp。但没有阻断(block)和去阻断(de-block)过程也意味着对连续重复单核苷酸的阅读只能根据信号强度来判断,容易对其中插入和缺失碱基阅读错误。454测序技术相比较其他第二代测序技术如Solexa和SOLiD, 在读长上有着巨大的优势,但是目前成本要略高。总体而言,高读长使得454技术比较利于De Novo拼接和测序。

ICD-10精神分裂症诊断标准

ICD-10中精神分裂症诊断标准 1、症状标准:具备下述(1)-(4)中的任何一组(如不甚明确常需两个或多个症状)或(5)-(9)至少两组症状群中的十分明确的症状。 1)思维鸣响、思维插入、思维被撤走及思维广播。 2)明确涉及躯体或四肢运动,或特殊思维、行动或感觉的被影响、被控制或被动妄想、妄 想性知觉。 3)对患者的行为进行跟踪性评论,或彼此对患者加以讨论的幻听,或来源于身体某一部分 的其他类型的幻听。 4)与文化不相称且根本不可能的其他类型的持续性妄想,如具有某种宗教或政治身份,或 超人的力量和能力。 5)伴转瞬即逝或未充分形成的无明显情感内容的妄想,或伴有持久的超价观念,或连续数 周或数月每日均出现的任何感官的幻觉。 6)思潮断裂或无关的插入语,导致言语不连贯,或不中肯或语词新作。 7)紧张性行为,如兴奋、摆姿势,或蜡样屈曲、违拗、缄默及木僵。 8)阴性症状,如显著情感淡漠、言语贫乏、情感迟钝或不协调,常导致社会退缩及社会功 能下降,但需澄清这些症状并非由抑郁症或神经阻滞剂治疗所致。 9)个人行为的某些方面发生显著而持久的总体性质的改变,表现为丧失兴趣、缺乏目的、 懒散、自我专注及社会退缩。 2、病程标准:特征性症状在至少1个月或以上时期的大部分时间内肯定存在以上1~4症状至少1个,或5~10至少2组症状群中的十分明确的症状。 3、排除标准:有三条。 (1)存在广泛情感症状(抑郁、躁狂)时,就不应作出精神分裂症的诊断,除非明确分裂症的症状早于情感症状出现。(排除心境障碍) (2)分裂症的症状和情感症状一起出现,程度均衡,应诊断分裂情感性障碍。 (3)严重脑病、癫痫、药物中毒或药物戒断状态应排除。 4、鉴别诊断:(举例) 1)神经衰弱:病人的自知力是完全存在的,病人自己完全了解自己病情的变化和处境,甚 至还对自己的病情做出过重的评价,情感反应强烈,积极要求治疗。 2)强迫性神经症: 患者能认识到强迫症状源于自身,严重干扰了自己的日常生活、学习和 工作。为此感到十分苦恼,并企图加以排除和对抗,迫切要求治疗 3)抑郁症: 病人的情绪是发自内心的,并非受幻觉和妄想的影响,常伴有自卑、自责等, 内心体验深刻,思维常是迟钝的,整个精神活动是协调的,可以伴有幻觉和妄想,但经过治疗,很快可以消失的。 4)躁狂症: 情感反应活跃、甚至高涨、生动、有感染力,情感表现无论悲喜哀乐均与思维 内容相一致与周围环境的接触、与周围人接触主动、洞察反应敏捷,动作增加,思维奔逸。 5)偏执型精神分裂症 : 妄想常较荒谬、离奇、泛化,常伴有幻觉,此外,情感反应不协 调,起病年龄较早。无精神衰退是与精神分裂症最大的不同。 6)反应性精神病: 在持久的精神刺激下,也可以出现以妄想为主要表现的偏执状态。主动 讲述自己的不幸遭遇,以求得到周围人的支持和同情,病态体验在逻辑推理上接近正常人,并且情感反应鲜明强烈。此外,病人接受心理治疗的态度是主动的,精神症状随着精神刺激的解除可逐渐减轻、消失。 注:其它诊断标准参考ICD-10精神与行为障碍分类(临床描述与诊断要点)

一代、二代、三代测序技术

一代、二代、三代测序技术 (2014-01-22 10:42:13) 转载 第一代测序技术-Sanger链终止法 一代测序技术是20世纪70年代中期由Fred Sanger及其同事首先发明。其基本原理是,聚丙烯酰胺凝胶电泳能够把长度只差一个核苷酸的单链DNA分子区分开来。一代测序实验的起始材料是均一的单链DNA分子。第一步是短寡聚核苷酸在每个分子的相同位置上退火,然后该寡聚核苷酸就充当引物来合成与模板互补的新的DNA链。用双脱氧核苷酸作为链终止试剂(双脱氧核苷酸在脱氧核糖上没有聚合酶延伸链所需要的3-OH基团,所以可被用作链终止试剂)通过聚合酶的引物延伸产生一系列大小不同的分子后再进行分离的方法。测序引物与单链DNA模板分子结合后,DNA聚合酶用dNTP延伸引物。延伸反应分四组进行,每一组分别用四种ddNTP(双脱氧核苷酸)中的一种来进行终止,再用PAGE分析四组样品。从得到的PAGE胶上可以读出我们需要的序列。 第二代测序技术-大规模平行测序 大规模平行测序平台(massively parallel DNA sequencing platform)的出现不仅令DNA测序费用降到了以前的百分之一,还让基因组测序这项以前专属于大型测序中心的“特权”能够被众多研究人员分享。新一代DNA测序技术有助于人们以更低廉的价格,更全面、更深入地分析基因组、转录组及蛋白质之间交互作用组的各项数据。市面上出现了很多新一代测序仪产品,例如美国Roche Applied Science公司的454基因组测序仪、美国Illumina公司和英国Solexa technology公司合作开发的Illumina测序仪、美国Applied Biosystems公司的SOLiD测序仪。Illumina/Solexa Genome Analyzer测序的基本原理是边合成边测序。在Sanger等测序方法的基础上,通过技术创新,用不同颜色的荧光标记四种不同的dNTP,当DNA聚合酶合成互补链时,每添加一种dNTP就会释放出不同的荧光,根据捕捉的荧光信号并经过特定的计算机软件处理,从而获得待测DNA的序列信息。以Illumina测序仪说明二代测序的一般流程,(1)文库制备,将DNA用雾化或超声波随机片段化成几百碱基或更短的小片段。用聚合酶和外切核酸酶把DNA片段切成平末端,紧接着磷酸化并增加一个核苷酸黏性末端。然后将Illumina测序接头与片段连接。(2)簇的创建,将模板分子加入芯片用于产生克隆簇和测序循环。芯片有8个纵向泳道的硅基片。每个泳道内芯片表面有无数的被固定的单链接头。上述步骤得到的带接头的DNA 片段变性成单链后与测序通道上的接头引物结合形成桥状结构,以供后续的预扩增使用。通过不断循环获得上百万条成簇分布的双链待测片段。(3)测序,分三步:DNA聚合酶结合荧光可逆终止子,荧光标记簇成像,在下一个循环开

水稻基因组

第四节基因组分析列举:水稻基因组分析 本节将结合我们近年来的一些研究结果,重点对第一个被基因组测序的作物——水稻的基因组研究和分析结果进行介绍。 水稻是第一个被全基因组测序的作物。亚洲栽培稻(Oryza sativa)共有2个亚种(籼稻和粳稻),其中一个粳稻品种“日本晴”分别通过全基因组鸟枪法(Goff et al, 2002)和逐步克隆方法(Sasaki et al, 2002; Feng et al, 2002; The Rice Chromosome 10 Sequencing Consortium, 2003; The Rice Genome Sequencing Project, 2005)测序,另一个籼稻品种“9311”通过全基因组鸟枪法测序(Yu et al, 2002; Yu et al, 2005)。除了核基因组外,水稻的叶绿体基因组序列早在15年前就已测序完成(Hiratsuka et al, 1989),同时,其线粒体基因组最近也被测序完成(Notsu et al. 2002)。 在获得基因组序列后,一项艰巨的研究任务是如何从巨量的水稻基因组序列中挖掘出潜藏的遗传事件、进化机制等重要生物信息。为此本文结合我们自身的一些研究工作,重点介绍了近年来在水稻基因组序列分析中获得的几项最新的研究结果。 1 现代的二倍体,古老的多倍体 2004年水稻基因组研究的一个重要进展,是获得清晰的证据表明水稻基因组曾发生过全基因组倍增。Paterson等( 2004)、Guyot 等(2004)和我们(Fan et al, 2004;Zhang et al, 2005a)的研究结果也一致表明,在禾本科作物分化前发生过一次全基因组倍增(whole-genome duplication)。早在2002年,根据最初的

相关主题
文本预览
相关文档 最新文档