Yeast DNA Purification Comparisons
- Format: PDF
- Size: 162.62 KB
- Pages: 7
Chapter 1: Why Gene Cloning and DNA Analysis Are Important
Genetic engineering: the direct manipulation of an organism's genome using modern DNA technology. It involves the introduction of foreign DNA or synthetic genes into the organism of interest.
1. How is a genetically modified plant created?
Ans: ① Create recombinant bacteria carrying the desired gene. ② Allow the bacteria to "infect" the plant cells. ③ The desired gene is inserted into the plant chromosomes.
2. Why are gene cloning and PCR so important?
Ans: Because both techniques can provide a pure sample of an individual gene, separated from all the other genes in the cell.
Chapter 2: Vectors for Gene Cloning: Plasmids and Bacteriophages
1. List the essential parts of a cloning vector.
(1) Self-replication: an ori site.
(2) Plasmid size: as small as possible.
(3) Insert site: MCS (multiple cloning site).
(4) Selectable marker.
(5) Does not affect the host cell.
2. Classification of plasmids
1) Based on the main characteristic encoded by the plasmid genes: ① fertility or F plasmids; ② resistance or R plasmids; ③ Col plasmids; ④ degradative plasmids; ⑤ virulence plasmids.
2) Based on whether they carry the tra genes: ① conjugative plasmids; ② non-conjugative plasmids.
3) Based on copy number: relaxed plasmids (many copies per cell); stringent plasmids (one or a few copies per cell).
Words:
1. Plasmid: A usually circular piece of DNA, primarily independent of the host chromosome, often found in bacteria and some other types of cells.
2. cccDNA (covalently closed-circular DNA): A completely double-stranded circular DNA molecule, with no nicks or discontinuities, usually with a supercoiled conformation.
3. ori (origin of replication): The specific position on a DNA molecule where DNA replication begins.
4. Episome: A plasmid capable of integration into the host cell's chromosome.
5. Copy number: The number of molecules of a plasmid contained in a single cell.
6. Bacteriophage: A virus whose host is a bacterium.
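Advances in Protein–Protein Interaction Techniques
Proteins are essential components of life processes, so the study of protein–protein interactions is crucial to the development of the life sciences.
Protein interaction techniques can be used to analyze the mechanisms by which proteins interact with one another. They yield important information on cellular signal transduction, the discovery of new drug targets, and related areas.
This article first outlines the concept and importance of protein–protein interactions. It then describes the current mainstream techniques and their respective advantages and disadvantages.
The Concept and Importance of Protein–Protein Interactions
Many biochemical reactions and signaling events in living organisms are carried out through interactions between specific proteins.
Understanding how proteins interact is therefore important for research on cell signaling, metabolic regulation, the transfer of genetic information, immune regulation, disease mechanisms, and other areas of the life sciences.
The number of known protein–protein interactions has already reached the millions. However, many interactions remain undiscovered, including those involving proteins of unknown function, so protein interaction research remains a field of broad research value.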
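1. Yeast Two-Hybrid
The yeast two-hybrid (Y2H) technique is currently one of the most widely used methods for screening protein–protein interactions.
It uses nutritional markers in yeast cells, such as His and Leu, to select for cells in which an interaction between the proteins being tested occurs.
To search for interacting proteins, the two proteins of interest are fused separately: one to a transcriptional activation domain (AD) and the other to a DNA-binding domain (BD).
The approach works because the yeast transcriptional activator used (for example, GAL4) is modular. The BD alone binds the upstream activating sequence of the reporter gene but cannot activate transcription, and the AD alone cannot bind DNA.
Suppose the AD fusion protein and the BD fusion protein are expressed in the same cell and the two proteins interact. The AD and BD are then brought together into a functional transcriptional activator, which switches on the reporter genes in the yeast cell. Cells carrying interacting pairs are selected in this way.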
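2. Tandem Affinity Purification (TAP)
TAP identifies interacting proteins by fusing a tag to the protein of interest and purifying its complexes through successive affinity steps.
Its main difference from conventional single-step affinity chromatography is that the TAP tag combines two different affinity tags, so protein complexes can be isolated with higher purity. The original TAP tag consists of Protein A (which binds IgG) and a calmodulin-binding peptide (CBP), separated by a TEV protease cleavage site. Later variants replace Protein A with other tags, such as a streptavidin-binding peptide (SBP).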
Guangzhou Olympic Middle School, Guangzhou, Guangdong Province: Grade 11 English Midterm Examination, First Semester, 2023–2024 School Year
School: ___________ Name: ___________ Class: ___________ Exam No.: ___________
I. Reading Comprehension
· Do an anonymous act of kindness for someone.
· Leave a smile card behind to encourage them to pay-it-forward.
· Share your story here to spread the inspiration.
· Change the world, one kind act at a time.
ORDER SMILE CARDS
To request Smile Cards, please fill out the form below. A volunteer will mail you an order of ten cards within two weeks. Smile Cards are offered to anyone who requests them on a pay-it-forward basis. That means there is no charge for a set of cards. Someone before you has paid for your cards, and you are invited to keep the chain going and pay-forward whatever you wish for the next person! For special events or circumstances, you can also place a large-quantity request.
Note: When using a Smile Card, remember not to just hand it out by itself. The idea is to do something kind for someone and then leave the Smile Card behind, so that they know someone reached out to them, and that they are invited to pay-forward the kindness and keep the ripple going!
Country: * - select -
Name: * ________
There is a daily maximum order for each country. Please select a country first to make sure we have not exceeded the quota for the day.
Address: * ________
City: * ________
State: * - select -
Email address: * ________
Inspiration: * ________
Please tell us what inspired you to order Smile Cards, and give us an example of a kind
1. When can you use a smile card?
A. Your friend's birthday is approaching.
B. Your classmate has won the first prize.
C. You've ordered lunch for a poor friend.
D. You find your classmate in low spirits.
2. To order smile cards, you have to ________.
A. pay for them in advance
B. place a big quantity of request
C. mail some necessary information
D. tell what favor you will do for others
3. The purpose of using smile cards is to encourage people to ________.
A. spread kindness
B. become volunteers
C. advertise for Kind Spring
D. pay others' kindness back
After lab-grown meat, are you getting ready for animal-free cow's milk? A San Francisco startup believes it has found a solution.
Through a combination of yeast, cow DNA and plant nutrients, Perfect Day claims to have created a product identical in taste and nutritional value to cow's milk, but without any cows involved. It will satisfy consumers who love eating dairy ice-cream, cheese and yoghurt, but loathe factory-style farming and its environmental footprint.
Sales of milk alternatives such as soy, coconut and more recently pea milk are expected to be on the rise. But until now they have not cut traditional milk and dairy production. "The alternatives for yoghurt, cheese and ice-cream are so bad that people don't even want to try them," says Perfect Day co-founder Ryan Pandya.
The missing ingredient in plant-based alternatives is cow's milk proteins. To make the animal-free cow's milk, Perfect Day puts cow DNA, which is readily available due to decades of research by the dairy industry, into yeast and adds sugar to create cow's milk proteins through fermentation. These milk proteins are then combined with sugar, fats and nutrients to create the final product.
"We're taking plant nutrients and transforming them into animal proteins the same way that cows do, using the same milk proteins as found in cow's milk, but much more efficiently, because we're using a yeast cell not an animal," said Pandya.
Although comparisons have been made with lab-grown meat, Pandya said they are not using novel technology. Many people initially go 'oh is this like lab or test-tube milk', but that's wrong. There are no test tubes in our fermentation process. The meat folks are trying to invent technology that doesn't exist today, but our milk is made through techniques that have been in use for more than three decades.
4. What does the underlined word "loathe" mean in Paragraph 2?
A. Ignore.
B. Doubt.
C. Tolerate.
D. Hate.
5. Which of the following is a part of Perfect Day's milk-making process?
A. Mixing cow DNA with yeast and sugar.
B. Adding sugar and fats to plant milk.
C. Mixing plant milk with cow milk.
D. Adding cow DNA to plant milk.
6. What does Pandya think of their product?
A. It tastes like test-tube milk.
B. It needs to be tested further.
C. It is produced with existing technology.
D. It is well-received by green food lovers.
There is something to be said for being a generalist, even if you are a specialist. Knowing a little about a lot of things that interest you can add to the richness of a whole, well-lived life.
Society pushes us to specialize, to become experts. This requires commitment to a particular occupation, branch of study or research. The drawback to being specialists is we often come to know more and more about less and less. There is a great deal of pressure to master one's field. You may pursue training, degrees, or increasing levels of responsibility at work. Then you discover the pressure of having to keep up.
Some people seem willing to work around the clock in their narrow specialty. But such commitment can also weaken a sense of freedom. These specialists could work at the office until ten each night, then look back and realize they would have loved to have gone home and enjoyed the sweetness of their family and friends, or traveled to exciting places, meeting interesting people. Mastering one thing to the exclusion of others can hold back your true spirit.
Generalists, on the other hand, know a lot about a wide range of subjects and view the whole with all its connections. They are people of ability, talent, and enthusiasm who can bring their broad perspective into specific fields of expertise. The doctor who is also a poet and philosopher is a superior doctor, one who can give so much more to his patients than just good medical skills.
Things are connected. Let your expertise in one field fuel your passions in all related areas. Some of your interests may not appear to be connected but, once you explore their depths, you discover that they are. My editor Toni, who is also a writer, has edited several history books. She has decided to study Chinese history. Fascinated by the structural beauty of the Forbidden City as a painter, she is equally interested to learn more about Chinese philosophy. "I don't know where it will lead, but I'm excited I'm on this pursuit."
These expansions into new worlds help us by giving us new perspectives. We begin to see the interconnectedness of one thing to another in all aspects of our life, of ourselves and the universe. Develop broad, general knowledge and experience. The universe is all yours to explore and enjoy.
7. What is good about being a generalist?
A. You can enjoy your life to the fullest.
B. You know more about your occupation.
C. You don't need to be pushed by society.
D. You will need to know a little about many things.
A. choices B. regrets C. perspectives D. expectations
A. should love poetry and philosophy
B. is fully aware of his talent and ability
C. is a committed specialist in medicine
D. brings knowledge of other fields to work
10. What does the author intend to show with the example of Toni?
A. Seemingly unrelated interests are in a way connected.
B. In-depth exploration will make our discoveries possible.
C. Everyone has a chance to succeed as long as they pursue.
D. Passion alone does not actually ensure a person's success.
II. Seven-Option Gap Fill (choose five of the seven options)
Desertification, the process by which fertile land becomes desert, has severe impacts on food production and is worsened by climate change.
__11__
Africa's Great Green Wall is a project to build an 8,000-kilometre-long forest across 11 of the continent's countries. The project is meant to contain the growing Sahara Desert and fight climate change. __12__ They include limited political support, lack of money, weak organizational structures, and not enough consideration for the environment. Just 4 million hectares of land have been turned into forest since work on the Green Wall began 15 years ago. __13__
First proposed in 2005, the project aims to plant a forest from Senegal on the Atlantic Ocean in western Africa to Eritrea, Ethiopia and Djibouti in the east. __14__ It could also reduce levels of climate-related migration in the area and capture hundreds of millions of tons of carbon dioxide from the air. Several countries have struggled to keep up with the demands of the project.
__15__ Eritrea, Ethiopia, and Sudan have all expanded their efforts. Ethiopia is producing 5.5 billion seedlings leading to thousands of hectares of restored land. Efforts in Eritrea and Sudan have also resulted in nearly 140,000 hectares of newly planted forest. The U.N. desertification agency says the project will need to plant an average of 8.2 million hectares yearly to reach its goal of 100 million hectares by 2030.
A. But the project faces many problems.
B. That is only 4 percent of the programme's goal.
C. However, it is difficult to work on the Great Green Wall.
D. A quarter of Africa is under threat of food shortage.
E. Some progress has been made in recent years in the east of the continent.
F. Supporters hope that the project will create millions of green jobs in rural Africa.
G. The U.N. says up to 45 percent of Africa's land is impacted by desertification, worse than any other continent.
III. Cloze
I had been trying for months to find a summer job working for a newspaper, to prove
27. A. good-looking B. sunburned C. strong D. warm
28. A. accept B. recommend C. save D. quit
29. A. applications B. goods C. plans D. advertisements
30. A. patiently B. only C. still D. proudly
31. A. task B. interview C. award D. promotion
32. A. reporter B. photographer C. typist D. secretary
33. A. understood B. explained C. wondered D. noticed
34. A. comment B. experience C. dream D. suggestion
35. A. sure B. able C. ready D. afraid
IV. Grammar Fill-in
Read the passage below and fill in each blank with one appropriate word or the correct form of the word given in brackets. Write your answers on the answer sheet.
Comparison of Different Cell Wall Disruption Methods for Yeast Total RNA Extraction
YI Yi¹, RONG Yuan-ping¹, CHENG Qian-wei¹, LI Ya¹, WANG Xiao-lin²
(1. Department of Biological and Chemical Engineering, Guangxi University of Technology, Liuzhou 545006, China; 2. State Key Laboratory of Pathogen and Biosecurity, Institute of Microbiology and Epidemiology, Academy of Military Medical Sciences of the PLA, Beijing 100071, China)
Abstract: Five methods were used to break the yeast cell wall before extracting total RNA by the Trizol method: liquid nitrogen grinding, repeated freeze-thawing, ultrasonic treatment, vortex shaking with glass beads, and snailase hydrolysis. The methods were compared for their effects on total RNA extraction from yeast, based on the concentration and purity of the total RNA obtained. The results showed that, for Trizol extraction of yeast total RNA, repeated freeze-thawing and liquid nitrogen grinding were the most effective and convenient cell wall disruption methods.
Key words: Trizol reagent; RNA extraction; yeast; cell wall disruption
CLC number: Q936  Document code: A  Article ID: 1002-6630(2011)11-0161-04
Received: 2010-09-03
Foundation items: Youth Science Foundation of Guangxi Zhuang Autonomous Region (Gui Ke Qing 0991010); Scientific Research Project of the Education Department of Guangxi Zhuang Autonomous Region (200707MS069)
Biography: YI Yi (1979–), male, associate professor, Ph.D., research interest: bioengineering.
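The abstract compares methods by the concentration and purity of the extracted total RNA, but it does not say how these were measured. One common approach is UV spectrophotometry. It assumes that an A260 of 1.0 (1 cm path) corresponds to roughly 40 µg/mL of RNA, and it reads purity from the A260/A280 ratio. The sketch below assumes that convention. The function names and example readings are hypothetical, not data from the paper.

```python
# Minimal sketch: RNA concentration, yield and purity from UV absorbance.
# Assumption: 1 A260 unit (1 cm path) ~ 40 ug/mL RNA; the paper's own
# measurement procedure is not stated in its abstract.

RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted sample in ug/mL."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity indicator; values near 2.0 are usually read as clean RNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def rna_yield_ug(a260: float, dilution_factor: float, volume_ul: float) -> float:
    """Total RNA mass (ug) in a sample of the given volume (uL)."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * volume_ul / 1000.0


if __name__ == "__main__":
    # Hypothetical readings: a 1:50 dilution of a 50 uL RNA preparation.
    a260, a280 = 0.25, 0.125
    print(rna_concentration_ug_per_ml(a260, 50))  # 500.0
    print(a260_a280_ratio(a260, a280))            # 2.0
    print(rna_yield_ug(a260, 50, 50))             # 25.0
```

A fair comparison of disruption methods this way presumes that every extract is diluted and read under the same conditions. The A260/A230 ratio is also often checked for Trizol carry-over, but the abstract does not say whether it was measured.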
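Cloning and Expression of the Green Fluorescent Protein (GFP) Gene
Abstract: Green fluorescent protein (GFP) is a fluorescent protein found in coelenterates, including jellyfish, hydroids, and corals.
The target gene was amplified by PCR from plasmid pEGFP-N1, which was provided by the laboratory.
The PCR product and plasmid pET-28b were each double-digested with BamHI and NdeI. The digests were checked by agarose gel electrophoresis and recovered from the gel. The target gene was then ligated into the vector with T4 DNA ligase to give the recombinant plasmid.
The recombinant plasmid was transformed into the cloning host E. coli DH5α for propagation, and plasmids extracted from positive colonies were used to identify recombinants. The confirmed plasmid was then introduced into competent cells of the expression host E. coli BL21. Expression of the target gene was induced with IPTG to produce green fluorescent protein.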
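Keywords: green fluorescent protein; PCR; gene cloning; expression
1. Introduction
1.1 Green fluorescent protein (GFP)
Green fluorescent protein is a fluorescent protein found in coelenterates, including jellyfish, hydroids, and corals.
When excited by ultraviolet or blue light, GFP emits green fluorescence [1].
1.2 Structure of GFP
At the centre of GFP is a cylindrical, barrel-like structure (Figure 2).
The barrel is about 4.2 nm long and 2.4 nm in diameter. It consists of 11 antiparallel β-strands surrounding a central α-helix, and the fluorophore is formed from this helix. The top of the barrel is capped by three short perpendicular segments and the bottom by one. The chromophore, which is essential for fluorescence, sits in the large central cavity.
The chromophore is formed by autocatalytic cyclization and oxidation of the Ser-Tyr-Gly tripeptide at positions 65–67 of the protein.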
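1.3 Applications of GFP
GFP can be used to label cells and proteins and has broad application prospects.
GFP and its mutants have been widely used to study the regulation of gene expression, the subcellular localization of proteins, interactions between biomolecules, and transgenic animals [2].
Optical molecular imaging based on new functional fluorescent proteins offers more options for studying gene expression and protein function in living cells and even in living animals.
GFP is also used to observe microorganisms, to study developmental mechanisms, for cell screening, and in immunology.
This experiment used plasmid pEGFP-N1, provided by the laboratory, whose map is shown in Figure 3.
The plasmid carries recognition sites for the restriction enzymes used.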
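The cloning strategy above depends on NdeI (CA^TATG) and BamHI (G^GATCC) sites that flank the GFP insert and that also sit in the vector's cloning site. The sketch below offers a rough check of a designed construct. It locates these sites in a sequence and returns the fragment lengths of a linear double digest. The enzyme table, function names and test sequence are illustrative assumptions; a real map should come from the vector's published sequence.

```python
# Minimal sketch: find NdeI/BamHI sites and simulate a linear double digest.
# Both recognition sequences are palindromic, so scanning the top strand
# finds every site. Fragment lengths are top-strand lengths and ignore the
# 5' overhangs (2 nt for NdeI, 4 nt for BamHI).

ENZYMES = {
    "NdeI": ("CATATG", 2),   # cuts CA^TATG
    "BamHI": ("GGATCC", 1),  # cuts G^GATCC
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq: str, enzyme_names: list[str]) -> list[int]:
    """Fragment lengths, 5'->3', after cutting a linear sequence."""
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(pos + offset for pos in find_sites(seq, site))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Hypothetical insert: NdeI site, a short spacer, then a BamHI site.
    insert = "AAACATATGGCTAGCGGATCCAAA"
    print(find_sites(insert, "CATATG"))              # [3]
    print(digest_linear(insert, ["NdeI", "BamHI"]))  # [5, 11, 8]
```

For a circular plasmid, the first and last fragments would join into one piece. The sketch handles only the linear case, such as a PCR product.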
Key English Bioinformatics Terms and Definitions
Abstract Syntax Notation (ASN.1) (the internal format used by many NCBI programs, such as Cn3D for displaying 3D protein structures): A language that is used to describe structured data types formally. Within bioinformatics, it has been used by the National Center for Biotechnology Information to encode sequences, maps, taxonomic information, molecular structures, and bibliographic information so that they can be easily accessed and exchanged by computer software.
Accession number (记录号): A unique identifier that is assigned to a single database entry for a DNA or protein sequence.
Affine gap penalty (a strategy for setting gap penalties): A gap penalty score that is a linear function of gap length. It consists of a gap opening penalty plus a gap extension penalty multiplied by the length of the gap. Using this penalty scheme greatly enhances the performance of dynamic programming methods for sequence alignment. See also Gap penalty.
Algorithm (算法): A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.
Alignment (联配/比对): The procedure of comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences. Of the two types of alignment, local and global, a local alignment is generally the more useful. See also Local and Global alignments.
Alignment score (联配值/比对值): An algorithmically computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. Scores for matches and substitutions are derived from a scoring matrix, such as the BLOSUM and PAM matrices for proteins, and affine gap penalties suitable for the matrix are chosen. Alignment scores are in log odds units, often bit units (log to the base 2). Higher scores denote better alignments. See also Similarity score, Distance in sequence analysis.
Alphabet (字母表): The total number of symbols in a sequence: 4 for DNA sequences and 20 for protein sequences.
Annotation (注释): The prediction of genes in a genome. This includes the location of protein-encoding genes, the sequence of the encoded proteins, any significant matches to other proteins of known function, and the location of RNA-encoding genes. Predictions are based on gene models, e.g., hidden Markov models of introns and exons in protein-encoding genes, and models of secondary structure in RNA.
Anonymous FTP (匿名FTP): When an FTP service allows anyone to log in, it is said to provide anonymous FTP service. A user can log in to an anonymous FTP server by typing anonymous as the user name and their e-mail address as the password. Most Web browsers now negotiate anonymous FTP logon without asking the user for a user name and password. See also FTP.
ASCII: The American Standard Code for Information Interchange (ASCII) encodes unaccented letters a–z and A–Z, the numbers 0–9, most punctuation marks, space, and a set of control characters such as carriage return and tab. ASCII specifies 128 characters that are mapped to the values 0–127. ASCII files are commonly called plain text, meaning that they encode only text without extra markup.
BAC clone (细菌人工染色体克隆): A bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100–200 kb. Most of the large-insert clones sequenced in the project were BAC clones.
Back-propagation (反向传输): When training feed-forward neural networks, a back-propagation algorithm can be used to modify the network weights. After each training input pattern is fed through the network, the network's output is compared with the desired output and the amount of error is calculated. This error is back-propagated through the network by using an error function to correct the network weights. See also Feed-forward neural network.
Baum-Welch algorithm (Baum-Welch算法): An expectation maximization algorithm that is used to train hidden Markov models.
Bayes' rule (贝叶斯法则): Forms the basis of conditional probability by calculating the likelihood of an event occurring based on the history of the event and relevant background information. In terms of two parameters A and B, the theorem is stated as P(A|B) = P(A) × P(B|A) / P(B). In words, the conditional probability of A given B equals the probability of A times the conditional probability of B given A, divided by the probability of B. P(A) is the historical or prior distribution value of A, P(B|A) is a new prediction for B for a particular value of A, and P(B) is the sum of the newly predicted values for B. P(A|B) is a posterior probability. It represents a new prediction for A given the prior knowledge of A and the newly discovered relationships between A and B.
Bayesian analysis (贝叶斯分析): A statistical procedure used to estimate parameters of an underlying distribution based on an observed distribution. See also Bayes' rule.
Biochips (生物芯片): Miniaturized arrays of large numbers of molecular substrates, often oligonucleotides, in a defined pattern. They are also called DNA microarrays and microchips.
Bioinformatics (生物信息学): (1) The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology. (2) The discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution.
Bit score (二进制值/Bit值): The value S', derived from the raw alignment score S by taking into account the statistical properties of the scoring system used. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
Bit units: From information theory, a bit denotes the amount of information required to distinguish between two equally likely possibilities. The number of bits of information, N, required to convey a message that has M possibilities is log2 M = N bits.
BLAST (基本局部联配搜索工具, one of the major database search programs): Basic Local Alignment Search Tool. A set of programs used to perform fast similarity searches. Nucleotide sequences can be compared with nucleotide sequences in a database using BLASTN, for example. Complex statistics are applied to judge the significance of each match. Reported sequences may be homologous to, or related to, the query sequence. The BLASTP program is used to search a protein database for a match against a query protein sequence. There are several other flavours of BLAST. BLAST2 is a newer release of BLAST that allows for insertions or deletions in the sequences being aligned; gapped alignments may be more biologically significant.
Block (a conserved region within a protein family): Conserved ungapped patterns approximately 3–60 amino acids in length in a set of related proteins.
BLOSUM matrices (模块替换矩阵, one of the major substitution matrices): An alternative to PAM tables. BLOSUM tables were derived using local multiple alignments of more distantly related sequences than were used for the PAM matrix. They are used to assess the similarity of sequences when performing alignments.
Boltzmann distribution (Boltzmann分布): Describes the number of molecules that have energies above a certain level, based on the Boltzmann constant and the absolute temperature.
Boltzmann probability function (Boltzmann概率函数): See Boltzmann distribution.
Bootstrap analysis: A method for testing how well a particular data set fits a model. For example, the validity of the branch arrangement in a predicted phylogenetic tree can be tested by resampling columns in a multiple sequence alignment to create many new alignments. The appearance of a particular branch in trees generated from these resampled sequences can then be measured. Alternatively, a sequence may be left out of an analysis to determine how much the sequence influences the results of an analysis.
Branch length (分支长度): In sequence analysis, the number of sequence changes along a particular branch of a phylogenetic tree.
CDS or cds (编码序列): Coding sequence.
Chebyshev inequality: For any k > 0, the probability that a random variable differs from its mean by at least k standard deviations is less than or equal to 1/k².
Clone (克隆): A population of identical cells or molecules (e.g., DNA) derived from a single ancestor.
Cloning vector (克隆载体): A molecule that carries a foreign gene into a host and allows or facilitates the multiplication of that gene in the host. When sequencing a gene that has been cloned using a cloning vector (rather than by PCR), take care not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are examples of cloning vectors.
Cluster analysis (聚类分析): A method for grouping together a set of objects that are most similar from a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score or a statistical evaluation of those scores is used.
Cobbler: A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search. This can reach sequences that are more divergent than those found by searching with the individual sequences in the alignment.
Coding system (neural networks): For neural networks, a coding system needs to be designed for representing input and output. The level of success found when training the model will depend partly on the quality of the coding system chosen.
Codon usage: Analysis of the codons used in a particular gene or organism.
COG (直系同源簇): Clusters of Orthologous Groups, a set of groups of related sequences from microorganisms and yeast (S. cerevisiae). These groups are found by whole-proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.
Comparative genomics (比较基因组学): A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms. One objective is to identify groups of genes that play a unique biological role in a particular organism.
Complexity (of an algorithm) (算法的复杂性): Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data, for example, the length of the sequences to be aligned.
Conditional probability (条件概率): The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).
Conservation (保守): Changes at a specific position of an amino acid or (less commonly) DNA sequence that preserve the physico-chemical properties of the original residue.
Consensus (一致序列): A single sequence that represents, at each position, the variation found within the corresponding column of a multiple sequence alignment.
Context-free grammars: A recursive set of production rules for generating patterns of strings. They consist of four parts: a set of terminal characters used to create strings; a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters; a set of rules for replacing nonterminal symbols with terminal characters; and a start symbol.
Contig (序列重叠群/拼接序列): A set of clones that can be assembled into a linear order. A DNA sequence that overlaps with another contig. The full set of overlapping sequences (contigs) can be put together to obtain the sequence for a long region of DNA that cannot be sequenced in one run in a sequencing assay. Important in genetic mapping at the molecular level.
CORBA (a common standard, set by the Object Management Group, that unifies object-oriented program objects and network interfaces across computers, operating systems, programming languages, and networks): The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.
Correlation coefficient (相关系数): A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value from zero indicates the strength of the relationship. A value near zero indicates no linear relationship between the variables.
Covariation (in sequences) (共变): Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.
Coverage (or depth) (覆盖率/厚度): The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a "high-quality base" is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).
Database (数据库): A computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data. See also Object-oriented database, Relational database.
Dendrogram: A form of tree that lists the compared objects (e.g., sequences or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.
Depth (厚度): See Coverage.
Dirichlet mixtures: Defined as the conjugate prior of a multinomial distribution. One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).
Distance in sequence analysis (序列距离): The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.
DNA sequencing (DNA测序): The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (A, C, G or T) with either a radioactive or fluorescent marker which identifies it. There are several methods of applying this technology, each with its advantages and disadvantages. For more information, refer to a current textbook. High-throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes, the sequences may be generated more quickly than they can be characterised.
Domain (功能域): A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.
Dot matrix (点标矩阵图): Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. Dots are placed within the graph at the intersection of the same letter appearing in both sequences. A series of diagonal lines in the graph indicates regions of alignment. The matrix may be filtered to reveal the most-alike regions by scoring a minimal threshold number of matches within a sequence window.
Draft genome sequence (基因组序列草图): The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.
DUST (a program for filtering low-complexity regions): A program for filtering low-complexity regions from nucleic acid sequences.
Dynamic programming (动态规划法): A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.
EMBL (欧洲分子生物学实验室; the EMBL database is one of the major public nucleic acid sequence databases): European Molecular Biology Laboratory. Maintains the EMBL database, one of the major public sequence databases.
EMBnet (欧洲分子生物学网络): The European Molecular Biology Network was established in 1988 and provides services, including local molecular databases and software, for molecular biologists in Europe. There are several large outposts of EMBnet, including ExPASy.
Entropy (熵): From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.
Erdős and Rényi law: For tosses of a "fair" coin, the longest run of heads that can be expected is the logarithm of the number of tosses to the base 2. The law may be generalized to more than two possible outcomes by changing the base of the logarithm to the number of outcomes. This law was used to analyze the number of matches and mismatches that can be expected between random sequences, as a basis for scoring the statistical significance of a sequence alignment.
EST (表达序列标签): See Expressed Sequence Tag.
Expect value (E) (E值): E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, it is the probability of finding an alignment score as good as the one between a query sequence and a database sequence in the same number of comparisons between random sequences. In other types of sequence analysis, E has a similar meaning.
Expectation maximization (sequence analysis): An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment. This pattern is matched to each sequence, and the scoring matrix values are then updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.
Exon (外显子)
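The Affine gap penalty and Alignment score entries in the glossary above can be made concrete by scoring an existing pairwise alignment. The sketch follows the glossary's form of the penalty, opening cost plus extension cost times gap length. The match, mismatch and gap values are hypothetical; real protein searches take substitution scores from matrices such as BLOSUM or PAM.

```python
# Minimal sketch: score a given pairwise alignment with an affine gap penalty
# (open + extend * length, as in the glossary). Scores are illustrative only.


def affine_gap_penalty(length: int, gap_open: int = 10, gap_extend: int = 1) -> int:
    """Penalty for one gap of the given length."""
    return gap_open + gap_extend * length


def alignment_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                    gap_open: int = 10, gap_extend: int = 1) -> int:
    """Score two aligned sequences of equal length; '-' marks a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if any(x == "-" and y == "-" for x, y in zip(a, b)):
        raise ValueError("column with gaps in both sequences")
    score, i = 0, 0
    while i < len(a):
        if a[i] == "-" or b[i] == "-":
            # Treat a run of consecutive gaps in one sequence as a single gap.
            gapped = a if a[i] == "-" else b
            j = i
            while j < len(gapped) and gapped[j] == "-":
                j += 1
            score -= affine_gap_penalty(j - i, gap_open, gap_extend)
            i = j
        else:
            score += match if a[i] == b[i] else mismatch
            i += 1
    return score


if __name__ == "__main__":
    print(alignment_score("ACGTAGA", "ACTTAGA"))    # 6 matches, 1 mismatch
    print(alignment_score("ACGT--GA", "ACGTTTGA"))  # one gap of length 2
```

Charging a single opening cost per gap run favours one long gap over several short ones, which the glossary notes suits dynamic programming alignment.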
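The Bayes' rule entry in the glossary above states P(A|B) = P(A) × P(B|A) / P(B), with P(B) as the total probability of B. The numbers in this sketch are purely hypothetical. It assumes a 1% prior that a sequence belongs to some protein family, and a motif found in 90% of family members but only 5% of other sequences.

```python
# Minimal sketch of Bayes' rule with P(B) from the law of total probability.
# All probabilities in the example are hypothetical.


def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return prior_a * p_b_given_a / p_b


if __name__ == "__main__":
    # P(family) = 0.01, P(motif | family) = 0.9, P(motif | not family) = 0.05
    print(round(posterior(0.01, 0.9, 0.05), 3))  # 0.154
```

Even a fairly specific motif leaves the posterior well below 50% here, because the prior is small.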
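The Bit score and Expect value entries in the glossary above do not give formulas. A widely used convention (Karlin–Altschul statistics, as in BLAST) converts a raw score S to S' = (λS − ln K) / ln 2 and then computes E = m · n · 2^(−S'), where m and n are the query and database lengths. The sketch assumes that convention. The λ and K values used are illustrative assumptions, because real values depend on the scoring matrix and gap costs.

```python
import math

# Minimal sketch of bit scores and E-values under Karlin-Altschul statistics.
# lam and k are scoring-system parameters; the values below are assumptions.


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    LAM, K = 0.267, 0.041  # assumed illustrative parameters
    s_bits = bit_score(60, LAM, K)
    print(round(s_bits, 1), expect_value(s_bits, 300, 10_000_000))
```

Because S' already absorbs λ and K, two bit scores from different scoring systems can be compared directly, as the glossary notes.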
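The Bootstrap analysis entry in the glossary above describes resampling alignment columns to build new alignments. This sketch covers only that resampling step: each replicate draws columns with replacement until it has as many columns as the original. Building a tree from each replicate and counting branch support is outside its scope. The function names and example alignment are hypothetical.

```python
import random

# Minimal sketch: bootstrap pseudo-alignments by resampling columns with
# replacement. Tree building on each replicate is not shown.


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One pseudo-alignment with the same number of columns as the input."""
    ncols = len(alignment[0])
    if any(len(seq) != ncols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    picks = [rng.randrange(ncols) for _ in range(ncols)]
    return ["".join(seq[c] for c in picks) for seq in alignment]


def bootstrap_replicates(alignment: list[str], n: int, seed: int = 0) -> list[list[str]]:
    """n reproducible replicates from a seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n)]


if __name__ == "__main__":
    aln = ["ACGTA", "ACGTT", "AGGTA"]
    for rep in bootstrap_replicates(aln, 3):
        print(rep)
```

Sampling whole columns keeps the residues of each site together across sequences, which is what lets branch support reflect how consistently the sites back a grouping.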
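The Codon usage entry in the glossary above is terse. In its simplest form, codon usage analysis counts each codon in an in-frame coding sequence, as this sketch does. It assumes the CDS starts in frame 0 and converts RNA input (U) to DNA (T). The helper names are hypothetical.

```python
from collections import Counter

# Minimal sketch: codon counts and relative frequencies for one CDS.


def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def codon_frequencies(cds: str) -> dict[str, float]:
    """Fraction of all codons in the CDS accounted for by each codon."""
    counts = codon_usage(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    print(codon_usage("ATGGCTGCTTAA"))
```

Organism-level codon usage tables are usually built by summing these counts over many genes.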
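The Correlation coefficient entry in the glossary above describes a value between -1 and 1 that measures linear relationship. The usual instance is Pearson's r, sketched below. The example vectors stand in for, say, two hypothetical expression profiles.

```python
import math

# Minimal sketch: Pearson correlation coefficient of two equal-length samples.


def pearson_r(x: list[float], y: list[float]) -> float:
    """Covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))  # perfectly direct
    print(pearson_r([1, 2, 3], [3, 2, 1]))  # perfectly inverse
```

A value near zero rules out only a linear relationship. Two variables can still be strongly, but non-linearly, related.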
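The Dot matrix entry in the glossary above describes placing one sequence across the top and the other down the side, with optional filtering by a window and match threshold. This is a minimal text-mode sketch of that idea. A cell gets a dot when at least `threshold` of the `window` positions starting there match. Windows are truncated at the sequence ends, which is a simplification assumed here.

```python
# Minimal sketch of a windowed dot matrix. seq1 runs across the top (columns),
# seq2 down the left side (rows). "*" marks a cell passing the threshold.


def dot_matrix(seq1: str, seq2: str, window: int = 1, threshold: int = 1) -> list[str]:
    rows = []
    for i in range(len(seq2)):
        row = []
        for j in range(len(seq1)):
            # Count matches along the diagonal starting at (i, j).
            matches = sum(
                1
                for k in range(window)
                if i + k < len(seq2) and j + k < len(seq1) and seq2[i + k] == seq1[j + k]
            )
            row.append("*" if matches >= threshold else ".")
        rows.append("".join(row))
    return rows


if __name__ == "__main__":
    for line in dot_matrix("ACGTACGT", "ACGTTCGT", window=3, threshold=2):
        print(line)
```

Raising the window and threshold suppresses isolated chance matches and leaves the diagonal runs that mark aligned regions.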
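The Dynamic programming entry in the glossary above describes filling a table of sub-problem scores that are computed once and reused. A classic instance is Needleman–Wunsch global alignment, sketched here with a simple linear gap penalty rather than the affine scheme. The scoring values are illustrative assumptions.

```python
# Minimal sketch: Needleman-Wunsch global alignment with a linear gap penalty.


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,   # gap in b
                          F[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

Each cell depends only on three neighbours already in the table, so the run time grows with the product of the two sequence lengths.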
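The Entropy entry in the glossary above describes entropy as a measure of variation in a set of elements. For a column of a multiple sequence alignment, this is commonly Shannon entropy in bits, H = −Σ p·log2(p). The sketch treats every character, including '-', as a symbol, which is a simplifying assumption.

```python
import math
from collections import Counter

# Minimal sketch: Shannon entropy (bits) per column of a multiple alignment.


def column_entropy(column: str) -> float:
    """Entropy of the symbol distribution in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropies(alignment: list[str]) -> list[float]:
    """Entropy of each column; the aligned sequences must be equal length."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    print(alignment_entropies(["AC", "AG", "AT", "AA"]))
```

A fully conserved column scores 0 bits, and a DNA column with all four bases equally common scores the maximum of 2 bits.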
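The Erdős and Rényi law entry in the glossary above says the longest run of heads expected in n fair-coin tosses is about log2 n. The small simulation below checks this. The trial count and seed are arbitrary choices made for this sketch.

```python
import math
import random

# Minimal sketch: simulate fair-coin tosses and compare the average longest
# run of heads with log2(n).


def longest_run(tosses, symbol: str = "H") -> int:
    """Length of the longest run of consecutive `symbol` values."""
    best = current = 0
    for t in tosses:
        current = current + 1 if t == symbol else 0
        best = max(best, current)
    return best


def mean_longest_run(n_tosses: int, trials: int = 200, seed: int = 1) -> float:
    """Average longest run of heads over repeated simulated experiments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses = [rng.choice("HT") for _ in range(n_tosses)]
        total += longest_run(tosses)
    return total / trials


if __name__ == "__main__":
    for n in (64, 1024, 4096):
        print(n, round(mean_longest_run(n), 2), math.log2(n))
```

The simulated averages typically land somewhat below log2 n, so the law is best read as an approximation that captures how run length grows with n.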
Abstract Syntax Notation (ASN.1) (NCBI发展的许多程序,如显示蛋白质三维结构的Cn3D 等所使用的内部格式): A language used to describe structured data types formally. Within bioinformatics, it has been used by the National Center for Biotechnology Information to encode sequences, maps, taxonomic information, molecular structures, and bibliographic information in such a way that they can be easily accessed and exchanged by computer software.
Accession number (记录号): A unique identifier that is assigned to a single database entry for a DNA or protein sequence.
Affine gap penalty (一种设置空位罚分策略): A gap penalty score that is a linear function of gap length, consisting of a gap opening penalty plus a gap extension penalty multiplied by the length of the gap. Using this penalty scheme greatly enhances the performance of dynamic programming methods for sequence alignment. See also Gap penalty.
Algorithm (算法): A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.
Alignment (联配/比对): The procedure of comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences. Of the two types of alignment, local and global, a local alignment is generally the more useful. See also Local and Global alignments.
Alignment score (联配值/比对值): An algorithmically computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. Scores for matches and substitutions are derived from a scoring matrix such as the BLOSUM and PAM matrices for proteins, and affine gap penalties suitable for the matrix are chosen. Alignment scores are in log odds units, often bit units (log to the base 2). Higher scores denote better alignments.
See also Similarity score, Distance in sequence analysis.
Alphabet (字母表): The set of symbols used in a sequence: 4 for DNA sequences and 20 for protein sequences.
Annotation (注释): The prediction of genes in a genome, including the location of protein-encoding genes, the sequence of the encoded proteins, any significant matches to other proteins of known function, and the location of RNA-encoding genes. Predictions are based on gene models; e.g., hidden Markov models of introns and exons in protein-encoding genes, and models of secondary structure in RNA.
Anonymous FTP (匿名FTP): When an FTP service allows anyone to log in, it is said to provide anonymous FTP service. A user can log in to an anonymous FTP server by typing anonymous as the user name and their e-mail address as the password. Most Web browsers now negotiate anonymous FTP logon without asking the user for a user name and password. See also FTP.
ASCII: The American Standard Code for Information Interchange (ASCII) encodes the unaccented letters a-z and A-Z, the numbers 0-9, most punctuation marks, space, and a set of control characters such as carriage return and tab. ASCII specifies 128 characters that are mapped to the values 0-127. ASCII files are commonly called plain text, meaning that they encode only text, without extra markup.
BAC clone (细菌人工染色体克隆): Bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100–200 kb. Most of the large-insert clones sequenced in the project were BAC clones.
Back-propagation (反向传输): When training feed-forward neural networks, a back-propagation algorithm can be used to modify the network weights. After each training input pattern is fed through the network, the network's output is compared with the desired output and the amount of error is calculated. This error is back-propagated through the network by using an error function to correct the network weights.
See also Feed-forward neural network.
Baum-Welch algorithm (Baum-Welch算法): An expectation maximization algorithm that is used to train hidden Markov models.
Bayes' rule (贝叶斯法则): Forms the basis of conditional probability by calculating the likelihood of an event occurring based on the history of the event and relevant background information. In terms of two parameters A and B, the theorem states that the conditional probability of A given B, P(A|B), is equal to the probability of A, P(A), times the conditional probability of B given A, P(B|A), divided by the probability of B, P(B); that is, P(A|B) = P(A)P(B|A)/P(B). P(A) is the historical or prior distribution value of A, P(B|A) is a new prediction for B for a particular value of A, and P(B) is the sum of the newly predicted values for B. P(A|B) is a posterior probability, representing a new prediction for A given the prior knowledge of A and the newly discovered relationships between A and B.
Bayesian analysis (贝叶斯分析): A statistical procedure used to estimate parameters of an underlying distribution based on an observed distribution. See also Bayes' rule.
Biochips (生物芯片): Miniaturized arrays of large numbers of molecular substrates, often oligonucleotides, in a defined pattern. They are also called DNA microarrays and microchips.
Bioinformatics (生物信息学): The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology. Also, the discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution.
Bit score (二进制值/Bit值): The value S', derived from the raw alignment score S, in which the statistical properties of the scoring system used have been taken into account.
Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.
Bit units: From information theory, a bit denotes the amount of information required to distinguish between two equally likely possibilities. The number of bits of information, N, required to convey a message that has M possibilities is N = log2 M.
BLAST (基本局部联配搜索工具,一种主要数据库搜索程序): Basic Local Alignment Search Tool. A set of programs used to perform fast similarity searches. Nucleotide sequences can be compared with nucleotide sequences in a database using BLASTN, for example. Complex statistics are applied to judge the significance of each match. Reported sequences may be homologous to, or otherwise related to, the query sequence. The BLASTP program is used to search a protein database for a match against a query protein sequence. There are several other flavours of BLAST. BLAST2 is a newer release of BLAST that allows for insertions or deletions in the sequences being aligned; such gapped alignments may be more biologically significant.
Block (蛋白质家族中保守区域的组块): Conserved ungapped patterns approximately 3-60 amino acids in length in a set of related proteins.
BLOSUM matrices (模块替换矩阵,一种主要替换矩阵): An alternative to PAM tables. BLOSUM tables were derived using local multiple alignments of more distantly related sequences than were used for the PAM matrices. They are used to assess the similarity of sequences when performing alignments.
Boltzmann distribution (Boltzmann 分布): Describes the number of molecules that have energies above a certain level, based on the Boltzmann constant and the absolute temperature.
Boltzmann probability function (Boltzmann 概率函数): See Boltzmann distribution.
Bootstrap analysis: A method for testing how well a particular data set fits a model. For example, the validity of the branch arrangement in a predicted phylogenetic tree can be tested by resampling columns in a multiple sequence alignment to create many new alignments.
The appearance of a particular branch in trees generated from these resampled sequences can then be measured. Alternatively, a sequence may be left out of an analysis to determine how much that sequence influences the results.
Branch length (分支长度): In sequence analysis, the number of sequence changes along a particular branch of a phylogenetic tree.
CDS or cds (编码序列): Coding sequence.
Chebyshev inequality: The probability that a random variable deviates from its mean by at least k standard deviations is less than or equal to 1/k^2, the square of 1 over the number of standard deviations.
Clone (克隆): A population of identical cells or molecules (e.g. DNA) derived from a single ancestor.
Cloning Vector (克隆载体): A molecule that carries a foreign gene into a host and allows or facilitates the multiplication of that gene in the host. When sequencing a gene that has been cloned using a cloning vector (rather than by PCR), care should be taken not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are examples of cloning vectors.
Cluster analysis (聚类分析): A method for grouping together a set of objects that are most similar from a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score, or a statistical evaluation of those scores, is used.
Cobbler: A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search as a way to reach sequences that are more divergent than would be found by searching with the individual sequences in the alignment.
Coding system (neural networks): Regarding neural networks, a coding system needs to be designed for representing input and output.
The level of success found when training the model will be partially dependent on the quality of the coding system chosen.
Codon usage: Analysis of the codons used in a particular gene or organism.
COG (直系同源簇): Clusters of Orthologous Groups; a set of groups of related sequences in microorganisms and yeast (S. cerevisiae). These groups are found by whole-proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.
Comparative genomics (比较基因组学): A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.
Complexity (of an algorithm) (算法的复杂性): Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data; for example, the length of sequences to be aligned.
Conditional probability (条件概率): The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).
Conservation (保守): Changes at a specific position of an amino acid or (less commonly) DNA sequence that preserve the physico-chemical properties of the original residue.
Consensus (一致序列): A single sequence that represents, at each position, the variation found within the corresponding column of a multiple sequence alignment.
Context-free grammars: A recursive set of production rules for generating patterns of strings. These consist of a set of terminal characters that are used to create strings, a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters, a set of rules for replacing nonterminal symbols with terminal characters, and a start symbol.
Contig (序列重叠群/拼接序列): A set of clones that can be assembled into a linear order. Also, a DNA sequence that overlaps with another contig.
The full set of overlapping sequences (contigs) can be put together to obtain the sequence for a long region of DNA that cannot be sequenced in one run in a sequencing assay. Contigs are important in genetic mapping at the molecular level.
CORBA (国际对象管理协作组制定的使OOP对象与网络接口统一起来的一套跨计算机、操作系统、程序语言和网络的共同标准): The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.
Correlation coefficient (相关系数): A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value from zero indicates the strength of the relationship. A value near zero indicates no linear relationship between the variables.
Covariation (in sequences) (共变): Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.
Coverage (or depth) (覆盖率/厚度): The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).
Database (数据库): A computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data. See also Object-oriented database, Relational database.
Dendrogram: A form of tree that lists the compared objects (e.g., sequences or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.
Depth (厚度): See Coverage.
Dirichlet mixtures: Defined as the conjugate prior of a multinomial distribution.
One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).
Distance in sequence analysis (序列距离): The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.
DNA Sequencing (DNA测序): The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (A, C, G or T) with either a radioactive or fluorescent marker that identifies it. There are several methods of applying this technology, each with its own advantages and disadvantages; for more information, refer to a current textbook. High-throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes the sequences may be generated more quickly than they can be characterised.
Domain (功能域): A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.
Dot matrix (点标矩阵图): Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. Dots are placed within the graph at the intersection of the same letter appearing in both sequences. A series of diagonal lines in the graph indicates regions of alignment.
The matrix may be filtered to reveal the most-alike regions by scoring a minimal threshold number of matches within a sequence window.
Draft genome sequence (基因组序列草图): The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.
DUST (一种低复杂性区段过滤程序): A program for filtering low-complexity regions from nucleic acid sequences.
Dynamic programming (动态规划法): A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.
EMBL (欧洲分子生物学实验室,EMBL数据库是主要公共核酸序列数据库之一): European Molecular Biology Laboratory. It maintains the EMBL database, one of the major public sequence databases.
EMBnet (欧洲分子生物学网络): European Molecular Biology Network. Established in 1988, it provides services including local molecular databases and software for molecular biologists in Europe. There are several large outposts of EMBnet, including ExPASy.
Entropy (熵): From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.
Erdős and Rényi law: In a toss of a "fair" coin, the number of heads in a row that can be expected is the logarithm of the number of tosses to the base 2. The law may be generalized for more than two possible outcomes by changing the base of the logarithm to the number of outcomes.
This law was used to analyze the number of matches and mismatches that can be expected between random sequences as a basis for scoring the statistical significance of a sequence alignment.
EST (表达序列标签的缩写): See Expressed Sequence Tag.
Expect value (E) (E值): E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, the probability that an alignment score as good as the one found between a query sequence and a database sequence would be found in as many comparisons between random sequences as was done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.
Expectation maximization (sequence analysis): An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment; this pattern is matched to each sequence, and the scoring matrix values are then updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.
Exon (外显子): Coding region of DNA. See CDS.
Expressed Sequence Tag (EST) (表达序列标签): A randomly selected, partial cDNA sequence that represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.
FASTA (一种主要数据库搜索程序): The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt".
The sensitivity and speed of the search are inversely related and are controlled by the "k-tup" variable, which specifies the size of a "word" (Pearson and Lipman).
Extreme value distribution (极值分布): Some measurements are found to follow a distribution that has a long tail which decays at high values much more slowly than that found in a normal distribution. This slow-falling type is called the extreme value distribution. The alignment scores between unrelated or random sequences are an example. These scores can reach very high values, particularly when a large number of comparisons are made, as in a database similarity search. The probability of a particular score may be accurately predicted by the extreme value distribution, which follows a double negative exponential function after Gumbel.
False negative (假阴性): A negative data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as positive.
False positive (假阳性): A positive data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as negative.
Feed-forward neural network (前馈神经网络): Organizes nodes into sequential layers in which the nodes in each layer are fully connected with the nodes in the next layer, except for the final output layer. Input is fed from the input layer through the layers in sequence in a "feed-forward" direction, resulting in output at the final layer. See also Neural network.
Filtering (window size): During pair-wise sequence alignment using the dot matrix method, random matches can be filtered out by using a sliding window to compare the two sequences. Rather than comparing a single sequence position at a time, a window of adjacent positions in the two sequences is compared, and a dot, indicating a match, is generated only if a certain minimal number of matches occur.
Filtering (过滤): Also known as Masking.
The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. See SEG and DUST.
Finished sequence (完成序列): Complete sequence of a clone or genome, with an accuracy of at least 99.99% and no gaps.
Fourier analysis: Studies the approximation and decomposition of functions using trigonometric polynomials.
Format (file) (格式): Different programs require that information be specified to them in a formal manner, using particular keywords and ordering. This specification is a file format.
Forward-backward algorithm: Used to train a hidden Markov model by aligning the model with training sequences. The algorithm then refines the model to reduce the error when fitted to the given data using a gradient descent approach.
FTP (File Transfer Protocol) (文件传输协议): Allows a person to transfer files from one computer to another across a network using an FTP-capable client program. The FTP client program can only communicate with machines that run an FTP server. The server, in turn, will make a specific portion of its file system available for FTP access, provided that the client is able to supply a recognized user name and password to the server.
Full shotgun clone (鸟枪法克隆): A large-insert clone for which full shotgun sequence has been produced.
Functional genomics (功能基因组学): Assessment of the function of genes identified by between-genome comparisons. The function of a newly identified gene is tested by introducing mutations into the gene and then examining the resultant mutant organism for an altered phenotype.
Gap (空位/间隙/缺口): A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score.
Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment.
Gap penalty (空位罚分): A numeric score used in sequence alignment programs to penalize the presence of gaps within an alignment. The value of a gap penalty affects how often gaps appear in alignments produced by the algorithm. Most alignment programs suggest gap penalties that are appropriate for particular scoring matrices.
Genetic algorithm (遗传算法): A kind of search algorithm inspired by the principles of evolution. A population of initial solutions is encoded, and the algorithm searches through these by applying a predefined fitness measurement to each solution and selecting those with the highest fitness for reproduction. New solutions can be generated during this phase by crossover and mutation operations applied to the encoded solutions.
Genetic map (遗传图谱): A genome map in which polymorphic loci are positioned relative to one another on the basis of the frequency with which they recombine during meiosis. The unit of distance is the centimorgan (cM), denoting a 1% chance of recombination.
Genome (基因组): The genetic material of an organism, contained in one haploid set of chromosomes.
Gibbs sampling method: An algorithm for finding conserved patterns within a set of related sequences. A guessed alignment of all but one sequence is made and used to generate a scoring matrix that represents the alignment. The matrix is then matched to the left-out sequence, and a probable location of the corresponding pattern is found. This prediction is then input into a new alignment, and another scoring matrix is produced and tested on a new left-out sequence.
The process is repeated until there is no further improvement in the matrix.
Global alignment (整体联配): Attempts to match as many characters as possible, from end to end, in a set of two or more sequences.
Gopher (一个文档发布系统,允许检索和显示文本文件): A document distribution system that allows text files to be retrieved and displayed.
Graph theory (图论): A branch of mathematics that deals with problems involving a graph or network structure. A graph is defined by a set of nodes (or points) and a set of arcs (lines or edges) joining the nodes. In sequence and genome analysis, graph theory is used for sequence alignments and for clustering alike genes.
GSS (基因综述序列): Genome survey sequence.
GUI (图形用户界面): Graphical user interface.
H (相对熵值): H is the relative entropy of the target and background residue frequencies (Karlin and Altschul, 1990). H can be thought of as a measure of the average information (in bits) available per position that distinguishes an alignment from chance. At high values of H, short alignments can be distinguished from chance, whereas at lower H values a longer alignment may be necessary (Altschul, 1991).
Half-bits: Some scoring matrices are in half-bit units. These units are logarithms to the base 2 of odds scores, multiplied by 2.
Heuristic (启发式方法): A procedure that progresses along empirical lines by using rules of thumb to reach a solution. The solution is not guaranteed to be optimal.
Hexadecimal system (16进制系统): The base-16 counting system that uses the digits 0-9 followed by the letters A-F.
HGMP (人类基因组图谱计划): Human Genome Mapping Project.
Hidden Markov Model (HMM) (隐马尔可夫模型): In sequence analysis, an HMM is usually a probabilistic model of a multiple sequence alignment, but it can also be a model of periodic patterns in a single sequence, representing, for example, patterns found in the exons of a gene. In a model of multiple sequence alignments, each column of symbols in the alignment is represented by a frequency distribution of the symbols called a state, and insertions and deletions are represented by other states.
One then moves through the model along a particular path from state to state, trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that particular state from a previous one (the transition probability). State and transition probabilities are then multiplied to obtain a probability of the given sequence. Generally speaking, an HMM is a statistical model for an ordered sequence of symbols, acting as a stochastic state machine that generates a symbol each time a transition is made from one state to the next. Transitions between states are specified by transition probabilities.
Hidden layer (隐藏层): An inner layer within a neural network that receives its input from, and sends its output to, other layers within the network. One function of the hidden layer is to detect covariation within the input data, such as patterns of amino acid covariation that are associated with a particular type of secondary structure in proteins.
Hierarchical clustering (分级聚类): The clustering or grouping of objects based on some single criterion of similarity or difference. An example is the clustering of genes in a microarray experiment based on the correlation between their expression patterns. The distance method used in phylogenetic analysis is another example.
Hill climbing: A nonoptimal search algorithm that selects the single best possible solution at a given state or step. The result may be a locally best solution that is not a globally best solution.
Homology (同源性): A similar component in two organisms (e.g., genes with strongly similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.
Horizontal transfer (水平转移): The transfer of genetic material between two distinct species that do not ordinarily exchange genetic material.
The transferred DNA becomes established in the recipient genome and can be detected by a novel phylogenetic history and codon content compared to the rest of the genome.
HSP (高比值片段对): High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.
HTGS/HTG (高通量基因组序列): High-throughput genome sequences.
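The Dynamic programming, Gap penalty and Global alignment entries above can be illustrated with a minimal Needleman-Wunsch sketch. It is a simplification, not a production aligner. It assumes a plain match/mismatch scheme in place of a BLOSUM or PAM matrix, and a linear gap penalty rather than an affine one. The scores +1/-1/-2 are arbitrary illustrative values.

```python
from typing import Tuple


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the optimal score and one optimal alignment as two gapped strings.
    """
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        # Assumed match/mismatch scoring, standing in for a substitution matrix
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j];
    # each sub-problem is computed once and stored in the table.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # prefix of a aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # prefix of b aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # match/substitution
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Traceback from the bottom-right cell recovers one optimal path.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_alignment("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Because the alignment is global, every character of both sequences appears in the output ("end to end"). A local (Smith-Waterman) variant would instead clamp cell scores at zero and trace back from the highest-scoring cell.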
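The Bit score, Bit units and Expect value entries above are linked by the Karlin-Altschul statistics used by BLAST. A raw score S is normalized to a bit score S' = (λS - ln K)/ln 2, and the expected number of chance hits is E = m·n·2^(-S') for query length m and database length n. The sketch below assumes λ = 0.267 and K = 0.041, values of the kind reported for BLOSUM62 with gap costs 11/1. The lengths are made up, and real BLAST additionally corrects m and n to effective lengths.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: float, db_len: float) -> float:
    """E = m * n * 2**(-S'): expected chance alignments scoring at least S'."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical parameters: lambda and K are assumed, m and n are illustrative.
LAMBDA, K = 0.267, 0.041
s_prime = bit_score(100, LAMBDA, K)
e = expect_value(s_prime, query_len=250, db_len=50_000_000)
print(f"S' = {s_prime:.1f} bits, E = {e:.2g}")
```

Converting through bits gives the same E as the raw-score form K·m·n·e^(-λS). The bit form is reported because, once normalized, scores from searches that used different scoring systems can be compared directly.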
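The Hidden Markov Model entry above multiplies state (emission) and transition probabilities along one path through the model. Summing that product over every possible path gives the total probability of the sequence, which the forward algorithm computes by dynamic programming instead of enumerating paths. The two-state "AT-rich"/"GC-rich" model and all of its probabilities below are hypothetical toy values, not taken from the source.

```python
from itertools import product

# Hypothetical two-state model; every probability here is an illustrative assumption.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def path_probability(seq, path):
    """Probability of seq along one state path: start, transition and
    emission probabilities multiplied together."""
    p = START[path[0]] * EMIT[path[0]][seq[0]]
    for k in range(1, len(seq)):
        p *= TRANS[path[k - 1]][path[k]] * EMIT[path[k]][seq[k]]
    return p


def forward(seq):
    """Total probability of seq summed over all state paths (forward algorithm)."""
    # f[s] = probability of emitting the symbols seen so far and ending in state s
    f = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        f = {s: EMIT[s][symbol] * sum(f[r] * TRANS[r][s] for r in STATES)
             for s in STATES}
    return sum(f.values())


def brute_force(seq):
    """Same total by enumerating every path explicitly (exponential; for checking)."""
    return sum(path_probability(seq, path)
               for path in product(STATES, repeat=len(seq)))


print(forward("GGCA"), brute_force("GGCA"))
```

The forward recursion costs time proportional to sequence length times the square of the number of states, whereas explicit enumeration grows exponentially with sequence length. This is why training procedures such as the forward-backward and Baum-Welch algorithms rely on it.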
JOURNAL OF CLINICAL MICROBIOLOGY, Oct. 2005, p. 5122–5128 Vol. 43, No. 10
0095-1137/05/$08.00+0 doi:10.1128/JCM.43.10.5122–5128.2005
Copyright © 2005, American Society for Microbiology. All Rights Reserved.
Comparison of Six DNA Extraction Methods for Recovery of Fungal DNA as Assessed by Quantitative PCR
David N. Fredricks,1,3* Caitlin Smith,1 and Amalia Meier2
Program in Infectious Diseases1 and Program in Biostatistics,2 Fred Hutchinson Cancer Research Center, Seattle, Washington, and Department of Medicine, Division of Allergy and Infectious Diseases, University of Washington, Seattle, Washington3
Received 12 May 2005/Returned for modification 16 June 2005/Accepted 27 July 2005
The detection of fungal pathogens in clinical samples by PCR requires the use of extraction methods that efficiently lyse fungal cells and recover DNA suitable for amplification. We used quantitative PCR assays to measure the recovery of DNA from two important fungal pathogens subjected to six DNA extraction methods. Aspergillus fumigatus conidia or Candida albicans yeast cells were added to bronchoalveolar lavage fluid and subjected to DNA extraction in order to assess the recovery of DNA from a defined number of fungal propagules. In order to simulate hyphal growth in tissue, Aspergillus fumigatus conidia were allowed to form mycelia in tissue culture media and then harvested for DNA extraction. Differences among the DNA yields from the six extraction methods were highly significant (P < 0.0001) in each of the three experimental systems. An extraction method based on enzymatic lysis of fungal cell walls (yeast cell lysis plus the use of GNOME kits) produced high levels of fungal DNA with Candida albicans but low levels of fungal DNA with Aspergillus fumigatus conidia or hyphae. Extraction methods employing mechanical agitation with beads produced the highest yields with Aspergillus hyphae. The MasterPure yeast method produced high levels of DNA from C. albicans but only moderate yields from A. fumigatus. A reagent from one extraction method was contaminated with fungal DNA, including DNA from Aspergillus and Candida species. In conclusion, the six extraction methods produce markedly differing yields of fungal DNA and thus can significantly affect the results of fungal PCR assays. No single extraction method was optimal for all organisms.
Noncultivation methods are increasingly being used to overcome the poor diagnostic sensitivities and long turnaround times associated with the detection and identification of fungal pathogens in clinical samples by cultivation. One of these cultivation-independent methods is real-time PCR, which can rapidly detect and quantify fungal nucleic acid sequences in human tissue samples. Real-time quantitative PCR (qPCR) assays have detection thresholds that approach single-molecule sensitivity, and thus, little additional assay sensitivity can be achieved in the PCR itself by techniques such as targeting genes with multiple copies per fungal genome or increasing the total amount of DNA tested. The ultimate sensitivity of any PCR assay for the detection of fungal pathogens depends on the efficient lysis of fungal cells in the tissue sample and the purification of DNA that is free of PCR inhibitors. Fungi have cell walls that impede lysis and the recovery of nucleic acids.
Few studies have focused on the critical DNA extraction stage of sample processing, in contrast to the multitude of studies on fungal PCR assay methods. Furthermore, highly sensitive and specific nucleic acid-based methods for the detection of fungi necessitate the use of DNA extraction reagents that are free of contaminating fungal nucleic acids. We sought to compare DNA extraction methods by using qPCR to measure the amount of fungal DNA liberated from two important fungal pathogens that infect humans, Aspergillus fumigatus and Candida albicans. Candida albicans is a model yeast pathogen, and Aspergillus fumigatus is a model filamentous fungal pathogen. We elected to test DNA extraction in spiked bronchoalveolar lavage (BAL) fluid because BAL fluid is the specimen most commonly subjected to fungal PCR at our institution and has been shown to contain PCR inhibitors. Although Aspergillus fumigatus is a common cause of pneumonia in immunocompromised patients, pneumonia due to Candida albicans is much less common (4, 11, 12). The inoculation of BAL fluid with known numbers of Aspergillus fumigatus conidia or Candida albicans yeast cells allows one to test DNA extraction methods using well-defined numbers of fungal propagules. However, Aspergillus fumigatus normally exists as hyphal structures in tissue. Thus, while useful for the accurate quantitation of fungal propagules subjected to DNA extraction, conidia do not represent the clinically relevant structure.
One can mimic hyphal growth in tissue by germinating Aspergillus fumigatus conidia in tissue culture media and allowing them to form mycelial mats. Hyphae can then be harvested for extraction. We also sought to assess the contamination of commercial DNA extraction kits with Aspergillus and Candida DNA.

*Corresponding author. Mailing address: Program in Infectious Diseases, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, D3-100, Box 19024, Seattle, WA 98109-1024. Phone: (206) 667-1935. Fax: (206) 667-4411. E-mail: dfredric@.

MATERIALS AND METHODS

Preparation of BAL fluid. BAL fluid samples from patients undergoing evaluation for pneumonia at the Fred Hutchinson Cancer Research Center were pooled. Aliquots from this pool were subjected to extraction (MasterPure yeast method [MPY]) and fungal DNA quantitation by qPCR to ensure that the pool was free of Aspergillus and Candida DNA. Samples of this pooled BAL fluid were then spiked with fungal propagules and subjected to the various DNA extraction methods described below.

Cultivation of fungi and quantitation of fungal propagules. Aspergillus fumigatus (ATCC strain B5233) was grown on 5 ml of Sabouraud dextrose agar in a 50-ml tissue culture flask at 37°C for 2 days and then left at 25°C to sporulate until mature. The agar was overlaid with 5 ml of sterile 0.1% Tween 20 filtered to 0.2 µm, and the flask was placed on a rotary shaker for 10 min. The solution of A. fumigatus conidia and hyphal fragments was harvested with a syringe and passed through a 5.0-µm polycarbonate filter (Millipore Corporation) to remove hyphae. Conidia were washed two times by centrifuging the filtrate at 3,000 × g for 20 min, removing all of the supernatant, and resuspending the pellet in fresh 0.1% Tween 20. After the second centrifugation, the pellet was resuspended in 15 ml of 0.1% Tween 20 and the solution was placed on ice. Microscopic examination of the preparation was done to confirm the absence of hyphal elements. The concentration of conidia was determined by manual cell counting with a
hemocytometer. Aliquots (0.1 ml) of BAL fluid were each inoculated with 28,000 conidia and subjected to the DNA extraction protocols. A clinical isolate of Candida albicans recovered from blood culture was grown in Sabouraud dextrose broth at 37°C in a shaking incubator for 1 day. The yeast cells were washed twice in 10 mM Tris-1 mM EDTA buffer by centrifugation at 14,000 × g for 5 min, then resuspended in buffer and placed on ice. The cells were manually counted with a hemocytometer. Yeast cells (42,000) were inoculated into each 0.1-ml aliquot of BAL fluid for DNA extraction.

Cultivation of Aspergillus in tissue culture media to form mycelia. To create mycelia for DNA extraction, 2,800 washed A. fumigatus conidia were inoculated into wells of a 96-well tissue culture plate (Costar, Corning, NY). Each well contained 0.1 ml of Dulbecco's modified Eagle medium (Invitrogen, Carlsbad, CA). The plates were incubated at 35°C and 5% CO2 in a humidified tissue culture incubator for 24 h. Visual inspection of the plates using an inverted phase-contrast microscope was used to confirm the presence of extensive mycelial mats. Although the number of A. fumigatus genomes could not be independently quantified by a technique such as that used for counting conidia, we sought to produce roughly equivalent hyphal masses by inoculating tissue culture media with a known number of Aspergillus conidia and then allowing the conidia to germinate into hyphae for 24 h in culture. The hyphae were harvested from wells by mixing with a pipette and transferring 0.1 ml of resuspended hyphae to a microcentrifuge tube for DNA extraction.

Preparation of fungal genomic DNA for use in qPCR standards. Aspergillus fumigatus (ATCC strain B5233) was grown on 5 ml of Sabouraud dextrose agar in a 50-ml tissue culture flask at 37°C as described above. The agar was overlaid with 10 ml of sterile 0.1% Tween 20 filtered to 0.2 µm, and a stir bar was used to break hyphae by spinning on a stir plate. Hyphal fragments were pelleted by centrifugation at 3,200 × g for 15 min and resuspended
in sterile water. A clinical isolate of Candida albicans was grown in Sabouraud dextrose broth at 37°C overnight. The yeasts were pelleted by centrifugation at 3,200 × g for 15 min and resuspended in sterile water. Resuspended fungi were processed through the MasterPure yeast DNA extraction method as described below. The purified nucleic acid was treated with 5 units of RiboShredder RNase mixture (Epicenter, Madison, WI) at 37°C for 50 min to degrade RNA. The DNA was precipitated with isopropanol and sodium acetate, washed with 70% ethanol, and resuspended in Tris-EDTA buffer. The optical density of the genomic DNA was measured with a spectrophotometer at 260 nm in order to quantify the stock concentration for subsequent dilution in qPCR standards.

DNA extraction methods. Manufacturers' instructions were followed for all methods except where noted.

Method MPY (MasterPure yeast DNA purification kit [Epicenter, Madison, WI]) employs a nonenzymatic method for the lysis of fungi followed by a salting-out procedure to precipitate proteins and an alcohol precipitation step to purify DNA.

Method UCS (UltraClean soil DNA isolation kit [MoBio, Inc., Solana Beach, CA]) uses a bead matrix and lysis buffer to pulverize cells by horizontal shaking on a vortex mixer, followed by adsorption of DNA to a spin filter, a wash step, and the elution of DNA in buffer. The protocol was followed per the manufacturer's instructions, using the alternative protocol for maximum yields. Microcentrifuge tubes with sample and bead matrices were attached to a horizontal platform on a vortex mixer and agitated vigorously for 10 min. Each sample was split into two portions, and 650 µl of solution S3 was added to each tube of supernatant prior to addition to the two spin columns. The final DNA eluates were combined.

Method FDNA (FastDNA kit [Qbiogene, Irvine, CA]) uses a bead matrix and lysis buffer to pulverize cells by agitation in a FastPrep agitator for high-speed cell disruption, followed by adsorption of DNA to glass milk, a wash step, and elution of DNA in
buffer. We used lysing matrix A, cell lysis solution-yeast lysis buffer, and the spin column protocol. Samples were agitated for two 30-second runs at a speed of 5 m/s.

Method MPPL (MasterPure plant leaf DNA purification kit [Epicenter, Madison, WI]) uses a nonenzymatic lysis procedure, alcohol precipitation of DNA, and a cleanup procedure to bind PCR inhibitors, followed by a reprecipitation of DNA. The 0.1 ml of input sample was combined with 0.3 ml of the plant DNA extraction solution and ground with a disposable plastic micropestle prior to DNA precipitation.

Method YL-GNOME (yeast cell lysis preparation kit plus GNOME kit [Qbiogene, Irvine, CA]) uses two kits to lyse fungal cells and purify DNA. The yeast cell lysis kit uses enzymes for the digestion of fungal cell walls. The resulting spheroplasts are then subjected to further lysis in the GNOME kit by using a lysis buffer and protease mix. Protein is precipitated with a salting-out procedure, and then DNA is precipitated with alcohol. The first step of the yeast lysis procedure was centrifugation of the BAL or tissue culture sample with the addition of yeast enzyme enhancer to the pellet. The manufacturer's instructions were followed, except that the steps were appropriately scaled down in size, with the addition of 0.1 ml of yeast enzyme enhancer, 2 µl of yeast enzyme salts, and 20 µl of spheroplasting enzyme mix per sample. The resulting digest was then added to 0.4 ml of cell suspension solution in the GNOME kit, 0.1 ml of cell lysis solution was added, and the lysis and purification proceeded with volumes scaled to the input sample volume.

Method SM (SoilMaster DNA extraction kit [Epicenter, Madison, WI]) uses a hot detergent lysis procedure to break open cells, a salting-out procedure to precipitate protein, a column chromatography step to remove PCR inhibitors, and an alcohol precipitation step to purify DNA. The optional step of vortex mixing at 37°C for 10 min was not performed.

Method QIAMP-S (QIAamp DNA stool mini kit [QIAGEN, Valencia, CA]) uses
lysis buffer, proteinase K, and heat at 70°C to break open cells. Inhibitors in the lysate are bound to an insoluble matrix (InhibitEX tablet) and pelleted by centrifugation. DNA in the supernatant is bound to a spin column, washed, and eluted in buffer. Note that this method was not used to compare DNA extraction yields because contaminating fungal DNA was detected (see “Identification of contaminating fungal DNA in extraction reagents” below).

All DNA extractions were performed in quadruplicate except for the analysis of Aspergillus conidia by methods MPPL, YL-GNOME, and SM, which were performed in duplicate. Input and output volumes for each method are listed in Table 1.

TABLE 1. Comparison of DNA extraction methods based on costs, times, sample volumes, and additional reagents required

Method | Cost/test | Processing time, minimum (h:min) (a) | Processing time, maximum (h:min) (a) | Sample vol | Vol recovered | Additional reagents (not supplied) | Additional equipment (not supplied)
MPY | $1.81 | 0:40 | 1:20 | 100 µl | 50 µl | Isopropanol, ethanol, microcentrifuge tubes | Heat block, microcentrifuge
MPPL | $3.23 | 0:55 | 1:40 | 100 µl | 50 µl | Isopropanol, ethanol, microcentrifuge tubes | Heat block, microcentrifuge
SM | $4.54 | 0:50 | 1:55 | 100 µl | 300 µl | Microcentrifuge tubes | Heat block, microcentrifuge
UCS | $3.00 | 0:35 | 2:40 | 100 µl | 50 µl | None | Vortex adapter, microcentrifuge
FDNA | $2.56 | 0:20 | 1:15 | 100 µl | 100 µl | None | Microcentrifuge, FastPrep machine
YL-GNOME | $5.93 | 3:30 | 4:00 | 100 µl | 100 µl | Isopropanol, microcentrifuge tubes | Heat block, microcentrifuge

(a) The minimum processing times reflect extractions of single samples, whereas the maximum times reflect extractions of 12 samples each.

VOL. 43, 2005 SIX DNA EXTRACTION METHODS FOR RECOVERY OF FUNGAL DNA 5123

Quantitative PCR methods. Two TaqMan-based PCR assays were used to measure fungal DNA using a GeneAmp 7900 sequence detection system (Applied Biosystems, Foster City, CA) with primers that target highly conserved regions of the fungal 18S rRNA gene and 5′ nuclease probes complementary to Aspergillus species or Candida species 18S rRNA genes. For the Aspergillus fumigatus assay, we used primers Fun-18S-995F (5′-CGATYAGATACCGTYGTAGTC-3′), Fun-18S-1217R (5′-TGTCTGGACCTGGTGAGTTT-3′), and a 6-carboxyfluorescein (FAM)-labeled probe with a Black Hole quencher (BHQ1) (5′-FAM-TTTCTATGATGACCCGCTCGGCA-BHQ1-3′). For the Candida albicans qPCR assay, we used primers Fun-18S-1313F (5′-SCGATAACGAACGAGACCT-3′) and Fun-18S-1467R (5′-TAGCGCGCTGCGGCCCAGA-3′) with a VIC (Applied Biosystems)-labeled probe and a 6-carboxytetramethylrhodamine (TAMRA) quencher (5′-VIC-CTAAATAGTGSTGCTAGCWTTTGC-TAMRA-3′). The concentration of each primer was 200 nM, and the concentration of each probe was 100 nM. We used Universal master mix (Applied Biosystems) for all qPCR reactions and ran each sample in a 50-µl volume consisting of 5 µl of target DNA and 45 µl of master mix with primers and probe. PCR conditions included a 2-minute incubation at 50°C to inactivate previous amplicons with uracil-DNA glycosylase, followed by a 10-minute incubation at 95°C to activate the Taq Gold polymerase. Forty-five cycles of PCR, consisting of 15 seconds at 95°C, 30 seconds at 55°C, and 30 seconds at 65°C, were performed. All qPCR assays contained 4 no-template control samples (negative controls) and 12 samples consisting of Aspergillus fumigatus or Candida albicans genomic DNA (as appropriate) added to reactions in duplicate to produce standards of 1,000 pg, 100 pg, 10 pg, 1 pg, 100 fg, and 20 fg of fungal genomic DNA. The threshold cycle values from the genomic DNA standards were used to create a standard curve to assess the amount of fungal DNA in samples subjected to the various DNA extraction methods. All samples from extraction replicates were run in duplicate. Amplification controls were performed on DNA extracted from each method; 5 µl of extracted DNA was combined with 1 µl of 1,000-pg fungal genomic DNA standard, and qPCR was performed on this mixture. If PCR inhibitors are present in the extracted DNA, the threshold cycle for that sample shifts to a higher cycle number compared to the 1,000-pg standard without exogenous sample DNA. Digest controls consisted of sterile UV-irradiated
water processed through each of the DNA extraction methods and then analyzed by qPCR.

Identification of contaminating fungal DNA in extraction reagents. Because Aspergillus DNA was detected in digest controls from the QIAMP-S extraction method (QIAamp DNA stool mini kit; QIAGEN) and was linked to a tablet used to bind PCR inhibitors (InhibitEX tablet; QIAGEN), we used broad-range 18S rRNA gene PCR with analysis of cloned products to identify the contaminating species (2).

Statistical analysis. Each sample of extracted DNA was subjected to qPCR in duplicate, and extractions were performed in duplicate or quadruplicate. Mean quantities of fungal DNA detected with each DNA extraction method for each experimental system were plotted along with the standard deviations for the replicates. Analysis of variance with additional Bonferroni-protected contrasts was performed to compare extraction methods within each organism and experimental system. Such comparisons were considered statistically significant when P values were less than 0.05/(number of contrasts performed). A Bonferroni correction was preferable to Scheffé's method, as the number of contrasts was relatively small in each case. Some contrasts were performed to confirm the absence of a significant difference in performance; in other cases, a significant difference was expected. Levene's test was initially performed to ascertain whether the assumption of homogeneous variances was valid. Contrasts of interest were selected on the basis of graphical analysis. SAS 9.2 and Excel 2000 for Windows were used for statistical and graphical analyses, respectively.

RESULTS

Levels of fungal DNA recovered with the six extraction methods are displayed in Fig. 1 to 3. Overall analysis of variance tests for differences by extraction method were significant at P < 0.0001 for all three organisms/experimental systems. Table 2 displays the log differences and P values for various comparisons of mean DNA levels between extraction methods in each of the
three experimental systems. Log differences translate to ratios on the original scale. For example, a log difference of 0.580 (row 2) means that the average for the first method or group of methods is 10^0.580, or 3.8, times larger than that for the second, and a log difference of 3.574 (row 6) indicates that the average for the first method or group of methods is 3,750 times larger than that for the second.

Figure 1 shows the quantity of Candida DNA detected in BAL fluid samples experimentally inoculated with C. albicans yeast forms and subjected to the six different DNA extraction methods. Five contrasts were performed to examine differences in extraction methods for C. albicans yeast cells, thus requiring a P value of <0.01 for significance with the Bonferroni correction (Table 2). The MPY and YL-GNOME methods both produced high levels of Candida DNA, with more than 11,000 pg DNA/ml detected in BAL fluid, and these methods were not significantly different from each other. The UCS method (1,618 pg/ml) and the FDNA method (1,083 pg/ml) yielded levels of DNA that were not significantly different from each other, but the recovery of fungal DNA with these methods was significantly less than the recoveries obtained with the MPY and YL-GNOME methods (P < 0.0001). The MPPL and SM methods produced dramatically lower levels of Candida DNA compared to the other four methods, and these differences were highly statistically significant (P < 0.0001).

FIG. 1. Mean levels of Candida DNA detected in BAL fluid spiked with C. albicans yeast cells and subjected to six DNA extraction methods. Fungal DNA levels were measured using quantitative PCR. Error bars indicate standard deviations for replicate extractions. The MPY and YL-GNOME extraction methods produced the highest levels of Candida DNA.

5124 FREDRICKS ET AL. J. CLIN. MICROBIOL.

FIG. 2. Mean levels of Aspergillus DNA detected in BAL fluid spiked with A. fumigatus conidia and subjected to six DNA extraction methods. Fungal DNA levels were measured using quantitative PCR. Error bars indicate standard
deviations for replicate extractions. The UCS and FDNA methods produced the highest levels of Aspergillus DNA from conidia.

FIG. 3. Mean levels of Aspergillus DNA in tissue culture media inoculated with A. fumigatus conidia, allowed to form mycelia, and subjected to six DNA extraction methods. Fungal DNA levels were measured using quantitative PCR. Error bars indicate standard deviations for replicate extractions. The FDNA method produced the highest levels of Aspergillus DNA from hyphae.

Figure 2 displays the quantity of Aspergillus DNA detected in BAL fluid samples experimentally inoculated with A. fumigatus conidia and subjected to the six different DNA extraction methods. Three contrasts were performed to examine differences in extraction methods for A. fumigatus conidia; therefore, P values of <0.017 were considered statistically significant after Bonferroni correction. The UCS and FDNA methods both employ bead beating for the physical disruption of cells; these methods produced the highest yields of Aspergillus DNA (>2,000 pg/ml) and were not significantly different from each other. The UCS and FDNA methods produced higher levels of DNA than the MPY method (610 pg/ml), but the difference was not statistically significant after Bonferroni correction (P = 0.0191). The MPY method was significantly better than the MPPL, YL-GNOME, and SM methods (P < 0.0001). The latter three methods all had DNA yields of less than 200 pg/ml. Although the YL-GNOME method performed well when extracting DNA from C. albicans, this method performed poorly when extracting DNA from Aspergillus conidia.
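The two calculations used throughout these Results, converting a log10 difference between mean DNA levels back into a fold ratio and applying the Bonferroni cutoff of 0.05 divided by the number of contrasts, can be written as a short Python sketch. The function names are illustrative; the inputs are values reported in the text and in Table 2.

```python
def fold_ratio(log10_difference: float) -> float:
    """Ratio on the original scale implied by a difference of log10 means."""
    return 10 ** log10_difference


def bonferroni_threshold(n_contrasts: int, alpha: float = 0.05) -> float:
    """Per-contrast P-value cutoff under a Bonferroni correction."""
    return alpha / n_contrasts


def is_significant(p_value: float, n_contrasts: int, alpha: float = 0.05) -> bool:
    """True if p_value is below alpha divided by the number of contrasts."""
    return p_value < bonferroni_threshold(n_contrasts, alpha)


if __name__ == "__main__":
    print(round(fold_ratio(0.580), 1))  # 3.8  (Table 2, row 2)
    print(round(fold_ratio(3.574)))     # 3750 (Table 2, row 6)

    # C. albicans: five contrasts, so the cutoff is 0.05 / 5.
    print(bonferroni_threshold(5))
    # A. fumigatus conidia: three contrasts, cutoff of about 0.017.
    print(is_significant(0.0191, 3))    # False (FDNA and UCS vs MPY)
    # A. fumigatus hyphae: three contrasts.
    print(is_significant(0.0017, 3))    # True (FDNA vs UCS)
```

Because the cutoff shrinks as contrasts are added, the same P value of 0.0191 would pass an uncorrected 0.05 test but fails the three-contrast Bonferroni cutoff, which is why the FDNA and UCS advantage over MPY for conidia is reported as not significant.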
Modest PCR inhibition was detected when Aspergillus conidia were extracted with the YL-GNOME method, resulting in a 1-log drop in assay sensitivity, detected as a shift in threshold cycle when the samples were spiked with 1,000 pg of Aspergillus genomic DNA. A combination of poor DNA extraction and modest PCR inhibition likely accounts for the absence of Aspergillus DNA detected from conidia with the YL-GNOME method. PCR inhibitors were not detected with any other extraction methods.

It is possible to estimate the efficiency of extraction by comparing the amount of fungal DNA recovered with the amount of fungal DNA initially inoculated as intact organisms into BAL fluid. Unfortunately, the multicellular nature of Aspergillus hyphae and budding Candida yeast cells makes estimates of initial cell counts highly inaccurate. In contrast, Aspergillus conidia are easily counted as separate cells with a hemocytometer, and each conidium contains a single genome. On the basis of an estimated Aspergillus genome mass of 31.6 fg, we would expect to recover 8,848 pg of Aspergillus DNA per ml of BAL fluid after inoculation with 280,000 conidia per ml (28,000 per 0.1-ml aliquot) at 100% extraction efficiency. The conidium extraction efficiencies for the methods studied were 30.1% for UCS, 26.7% for FDNA, 6.9% for MPY, 1.6% for MPPL, 0.3% for SM, and 0% for YL-GNOME. When 10-fold and 100-fold fewer conidia were added to BAL fluid and extracted with the UCS method, extraction efficiencies were 8.7% and 9.9%, respectively. BAL fluid spiked with 280 conidia per 0.1-ml aliquot still yielded DNA levels of 8.6 pg/ml, or 860 fg per 0.1 ml of sample.

Figure 3 displays the quantities of Aspergillus DNA detected in tissue culture media experimentally inoculated with A. fumigatus conidia, cultured to form mycelial mats, and then harvested for DNA extraction by using the six methods. Three contrasts were performed to examine differences in extraction methods for A. fumigatus hyphae; therefore, P values of <0.017 were considered significant after Bonferroni correction. The FDNA method produced the highest DNA yield
at 3,934,258 pg/ml of culture medium and was significantly better than the UCS method (654,372 pg/ml; P = 0.0017). The UCS method was significantly better than the MPY method (124,276 pg/ml; P = 0.0018). The MPY method was significantly better (P = 0.0001) than the remaining three methods: MPPL (5 pg/ml), YL-GNOME (225 pg/ml), and SM (46 pg/ml).

Attempts to use the QIAMP-S DNA extraction method on BAL fluid were unsuccessful because Aspergillus and other fungal DNA was detected in digest controls, indicating contamination of the reagents. We traced the contamination to the InhibitEX tablets used to bind PCR inhibitors. No further comparisons were possible with this DNA extraction method.

Several factors besides the recovery of DNA must be considered when selecting a DNA extraction method. Table 1 displays the cost per sample, processing time, sample volume, additional reagents, and equipment for each DNA extraction method. The MPY method was the least expensive method, had few manipulations, and could be completed in about an hour with a few samples. The YL-GNOME method was the most expensive approach because it required two separate kits, and it required the most processing time to produce DNA. The UCS and FDNA methods had similar reagent costs, but the FDNA method requires the purchase of a separate agitator, whereas the UCS method can use a vortex mixer for agitation.
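The conidium-based efficiency estimate reported earlier in these Results is a ratio of recovered DNA to the DNA expected if every genome were recovered. The sketch below reproduces the expected 8,848 pg/ml from the figures in the text (one genome of about 31.6 fg per conidium, 28,000 conidia per 0.1-ml aliquot) and recomputes the MPY efficiency from its reported yield of 610 pg/ml. The function and constant names are illustrative.

```python
GENOME_MASS_FG = 31.6  # estimated A. fumigatus genome mass used in the text
FG_PER_PG = 1000.0


def expected_dna_pg_per_ml(conidia_per_ml: float,
                           genome_mass_fg: float = GENOME_MASS_FG) -> float:
    """DNA expected at 100% efficiency, assuming one genome per conidium."""
    return conidia_per_ml * genome_mass_fg / FG_PER_PG


def extraction_efficiency(recovered_pg_per_ml: float, conidia_per_ml: float,
                          genome_mass_fg: float = GENOME_MASS_FG) -> float:
    """Fraction of the inoculated genomic DNA that was recovered."""
    return recovered_pg_per_ml / expected_dna_pg_per_ml(conidia_per_ml, genome_mass_fg)


if __name__ == "__main__":
    conidia_per_ml = 28_000 * 10  # 28,000 conidia per 0.1-ml aliquot
    expected = expected_dna_pg_per_ml(conidia_per_ml)
    print(f"expected at 100% efficiency: {expected:,.0f} pg/ml")  # 8,848 pg/ml
    # MPY recovered 610 pg/ml from conidia (Results).
    print(f"MPY efficiency: {extraction_efficiency(610, conidia_per_ml):.1%}")  # 6.9%
```

The same ratio applies to any method with a reported yield; it depends on the assumed genome mass, so an error in that estimate scales every efficiency by the same factor without changing the ranking of methods.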
More manipulations were required in the UCS method than in the FDNA method, leading to a longer processing time. In general, the DNA extraction methods generating high yields of fungal DNA (MPY, UCS, and FDNA) gave reasonably reproducible results in replicate samples, reflected in the standard deviations for the means in Fig. 1 to 3.

TABLE 2. Comparison of mean fungal DNA levels recovered by extraction method

Isolate | Contrast | Log difference | P value (a)
A. fumigatus conidia | FDNA vs UCS | -0.027 | 0.8860
A. fumigatus conidia | FDNA and UCS vs MPY | 0.580 | 0.0191
A. fumigatus conidia | MPY vs MPPL and SM | 1.757 | <0.0001*
A. fumigatus hyphae | FDNA vs UCS | 0.784 | 0.0017*
A. fumigatus hyphae | UCS vs MPY | 0.779 | 0.0018*
A. fumigatus hyphae | MPY vs MPPL and SM | 3.574 | <0.0001*
C. albicans | YL-GNOME vs MPY | 0.036 | 0.8639
C. albicans | FDNA vs UCS | -0.161 | 0.4524
C. albicans | MPPL vs SM | 0.890 | 0.0005*
C. albicans | YL-GNOME and MPY vs FDNA and UCS | 0.966 | <0.0001*
C. albicans | FDNA and UCS vs MPPL and SM | 1.918 | <0.0001*

(a) P values marked with an asterisk are significant at 0.05 by Bonferroni correction for contrasts on that organism.

DISCUSSION

Fungi have cell walls that impede cell lysis and the recovery of DNA using conventional extraction methods (10). Simple lysis procedures, such as the use of sequential freeze-thaw cycles or incubation with hot detergent and proteases, have not produced high yields of DNA from many fungal species. Alternative approaches for the lysis of fungal cells include the agitation of tissue samples with microspheres or particulates within a sealed tube for physical disruption (13) and the enzymatic digestion of cell wall polysaccharides to form spheroplasts followed by conventional membrane lysis procedures (3). Some DNA extraction methods for fungi, such as grinding cells frozen with liquid nitrogen using a mortar and pestle and disrupting cell walls with a probe sonicator, work well for the large-scale preparation of fungal DNA from cultures (6, 14).
However, these methods are not practical for use in a clinical microbiology laboratory, where many samples must be processed and where cross contamination of samples must be scrupulously avoided. We compared the yields of fungal DNA produced from several commercial DNA extraction methods employing different lysis strategies that are suitable for use on multiple samples in a clinical microbiology laboratory setting. The use of commercial DNA extraction methods has been advocated for nucleic acid-based fungal diagnostics in order to provide standardized methods and reagents so that results can be compared between laboratories (1).

The large differences in the amounts of fungal DNA recovered with the different DNA extraction methods and detected by qPCR in this study highlight the importance of the extraction step in nucleic acid-based fungal diagnostics. For instance, there was almost a millionfold difference in DNA recovery levels between the MPPL and FDNA methods applied to Aspergillus hyphae. The SM and MPPL methods performed poorly in all tests; these methods were designed to extract DNA from bacterial, fungal, or plant sources in soil (SM) or from plant leaf material with complex cell walls (MPPL). Any fungal PCR assay that used these two extraction methods to detect Candida or Aspergillus species in tissue samples would likely suffer from unacceptably low sensitivity. The YL-GNOME kit is designed for DNA extraction from yeasts and performed very well with Candida albicans in BAL fluid but performed poorly with Aspergillus fumigatus conidia and hyphae. Ideally, DNA extraction methods should be capable of recovering DNA from both yeast and hyphal forms of several different fungal pathogens in tissue samples submitted for fungal PCR testing. The failure of the YL-GNOME method to extract DNA from the filamentous fungal pathogen A. fumigatus makes this a poor general-purpose method for use in the clinical microbiology laboratory, where the identity of the pathogen is initially
unknown. The UCS and FDNA methods both employ agitation of the clinical sample with particulates within a microcentrifuge tube for the disruption of fungal cells, and these methods worked well with A. fumigatus hyphae and conidia. The UCS and FDNA methods performed less well than the MPY and YL-GNOME methods for the extraction of DNA from C. albicans yeast cells, demonstrating that mechanical disruption of the fungal cell wall is not always the optimal extraction approach. We found the FDNA method faster, less prone to cross contamination, and more amenable to high-throughput sample processing than the UCS method. The FDNA method has been studied previously using large inocula of propagules (10^7 to 10^8 CFU) from the organisms Candida albicans, Cryptococcus neoformans, Trichosporon beigelii, Aspergillus fumigatus, and Fusarium solani (13). Semiquantitative PCR was employed in the previous study to measure DNA recovery, and the investigators found that the high-speed cell disruption extraction method (FDNA) produced significantly greater yields from the filamentous fungi than from the yeasts, a conclusion that is supported by our study.

The MPY method is designed to extract DNA from yeasts with nonenzymatic lysis and produced good DNA recovery from C. albicans but, in our study, yielded less DNA from Aspergillus conidia (not significant after Bonferroni correction) and significantly less DNA from Aspergillus hyphae than methods employing mechanical disruption, such as FDNA and UCS. The MPY method was fast and relatively cheap. Cross contamination is possible when opening sample tubes with lysis solution but can be minimized by changing gloves between samples.
The MPY method can practically be performed on 20 samples or fewer in a morning for subsequent PCR the same day. The MPY method has been used by other investigators to extract DNA from filamentous fungi, though with some modifications to the protocol (7). The utility of other DNA extraction methods for the detection of fungi has been studied in environmental samples (5) and blood (9).

Aspergillus and Candida DNA were not detected in extraction controls consisting of sterile water processed through each of the six DNA extraction methods compared in this study. However, the consistent detection of contaminating fungal DNA when using a seventh DNA extraction method (QIAMP-S) highlights the importance of testing reagents for fungal contamination in order to avoid false-positive PCR results. Reagents may be sterile and still contain amplifiable microbial DNA. We selected the QIAMP-S method for testing because it employs a matrix that binds PCR inhibitors in stool (InhibitEX tablet), and we suspected that it might bind PCR inhibitors found in mucus, sputum, and BAL fluid. Our testing of multiple tablets from multiple lots showed a high level of fungal DNA contamination of this matrix. Two fungal genus-specific 18S rRNA gene PCR assays were used in this study. Broad-range fungal PCR presents an additional challenge, since contamination may arise from many fungal sources (8).

The extraction of fungal DNA from clinical samples is a critical step in the process of detecting and identifying fungal pathogens by PCR. Our results demonstrate that different DNA extraction methods may produce dramatically different yields of fungal DNA. We have identified several methods that are well suited for the recovery of DNA from the human pathogens Candida albicans and Aspergillus fumigatus. Although such evaluations are somewhat subjective, we found the MPY and FDNA methods easiest to use.

ACKNOWLEDGMENT

This work was supported by the National Institute of Allergy and Infectious Diseases
grant R01AI054703 to D.N.F.

REFERENCES

1. Chen, S. C., C. L. Halliday, and W. Meyer. 2002. A review of nucleic acid-based diagnostic tests for systemic mycoses with an emphasis on polymerase chain reaction-based assays. Med. Mycol. 40:333–357.
2. Fredricks, D. N., J. A. Jolley, P. W. Lepp, J. C. Kosek, and D. A. Relman. 2000. Rhinosporidium seeberi: a human pathogen from a novel group of aquatic protistan parasites. Emerg. Infect. Dis. 6:273–282.
3. Glee, P. M., P. J. Russell, J. A. Welsch, J. C. Pratt, and J. E. Cutler. 1987. Methods for DNA extraction from Candida albicans. Anal. Biochem. 164:207–213.
4. Haron, E., S. Vartivarian, E. Anaissie, R. Dekmezian, and G. P. Bodey. 1993. Primary Candida pneumonia. Experience at a large cancer center and review of the literature. Medicine (Baltimore) 72:137–142.
5. Haugland, R. A., N. Brinkman, and S. J. Vesper. 2002. Evaluation of rapid DNA extraction methods for the quantitative detection of fungi using real-time PCR analysis. J. Microbiol. Methods 50:319–323.
6. Haugland, R. A., J. L. Heckman, and L. J. Wymer. 1999. Evaluation of different methods for the extraction of DNA from fungal conidia by quantitative competitive PCR analysis. J. Microbiol. Methods 37:165–176.
7. Jin, J., Y. K. Lee, and B. L. Wickes. 2004. Simple chemical extraction method for DNA isolation from Aspergillus fumigatus and other Aspergillus species. J. Clin. Microbiol. 42:4293–4296.