Genomic and genic sequence variation in synthetic hexaploid wheat (AABBDD) as compared to the
- 格式:pdf
- 大小:441.53 KB
- 文档页数:6
BIOINFORMATICS ORIGINAL PAPER Vol.21no.92005,pages1859–1875doi:10.1093/bioinformatics/bti310 Sequence analysisGMAP:a genomic mapping and alignment program for mRNAand EST sequencesThomas D.Wu1,∗and Colin K.Watanabe21Department of Bioinformatics and2Department of Corporate Information Technology,Genentech,Inc.,South San Francisco,CA94080,USAReceived on February24,2004;revised on January27,2005;accepted on February4,2005Advance Access publication February22,2005ABSTRACTMotivation:We introduce gmap,a standalone program for mapping and aligning cDNA sequences to a genome.The program maps and aligns a single sequence with minimal startup time and memory requirements,and provides fast batch processing of large sequence sets.The program generates accurate gene structures,even in the presence of substantial polymorphisms and sequence errors,without using probabilistic splice site models.Methodology underlying the program includes a minimal sampling strategy for genomic mapping, oligomer chaining for approximate alignment,sandwich DP for splice site detection,and microexon identification with statistical significance testing.Results:On a set of human messenger RNAs with random mutations at a1and3%rate,gmap identified all splice sites accurately in over 99.3%of the sequences,which was one-tenth the error rate of existing programs.On a large set of human expressed sequence tags,gmap provided higher-quality alignments more often than blat did.On a set of Arabidopsis cDNAs,gmap performed comparably with GeneSeqer. In these experiments,gmap demonstrated a several-fold increase in speed over existing programs.Availability:Source code for gmap and associated programs is available at /share/gmapContact:twu@Supplementary information:/share/gmapINTRODUCTIONMapping and alignment of cDNA sequences—both messenger RNAs (mRNAs)and expressed sequence tags(ESTs)—onto the genome has become a central procedure in genome research.The resulting cDNA–genomic alignments not only reveal the intron–exon struc-ture of genes,but also facilitate the study of splicing mechanics and such transcript-based phenomena as alternative splicing,single nuc-leotide polymorphisms,and cDNA insertions and deletions(Jiang and Jacob,1998;Irizarry et al.,2000;Kan et al.,2001,2002;Zavolan et al.,2002;Modrek and Lee,2002;Clamp et al.,2003;Wheeler et al.,2003;Drabenstot et al.,2003;Kim et al.,2004;Florea et al., 2005).To address these needs,programs such as ssaha(Ning et al., 2001)have been introduced to map cDNA sequences to a gen-ome.Other programs have been developed to align a cDNA to a given genomic segment,including est_genome(Mott,1997), To whom correspondence should be addressed.dds/gap2(Huang,1996),sim4(Florea et al.,1998),Spidey(Wheelan et al.,2001),GeneSeqer(Usuka et al.,2000;Schlueter et al.,2003) and MGAlign(Lee et al.,2003;Ranganathan et al.,2003).Finally, some recent integrated programs,such as blat(Kent,2002)and squall(Ogasawara and Morishita,2002),perform both genomic mapping and alignment.Despite the availability of these programs,achieving perfection in cDNA–genomic alignment has been surprisingly elusive.Studies of existing programs have revealed various types of errors in identify-ing gene structures and splice sites(Haas et al.,2002).In compiling a database of EST-based splice sites,researchers have reportedly had to resort to manual curation of alignments to obtain the cor-rect results(Burset et al.,2001).Difficulties generally arise when a cDNA sequence differs from its corresponding genomic exons,due to polymorphisms,mutations or sequencing errors.Sequencing errors are especially prevalent in ESTs,where error rates are estimated to be1.5%for high-quality sequences(Zhuo et al.,2003)and3–4% overall(Richterich,1998).Such sequencing errors,especially near exon–exon junctions,can complicate the detection of splice sites. One approach to this situation has been to combine information across various alignments(Birney et al.,2004;Haas et al.,2003; Brendel et al.,2004)or even multiple sources of evidence(Allen et al.,2004)to arrive at a consensus answer.However,since such programs depend ultimately upon the original solutions generated by cDNA–genomic alignment programs,advances in the underlying alignment methodology are still important.In this paper,we introduce an integrated genomic mapping and alignment program called gmap(Genomic Mapping and Alignment Program).In contrast to programs designed primarily to run in client/ server mode,such as blat and squall,our program operates as a traditional standalone program.Gmap provides not only improved performance over existing programs in terms of speed and accur-acy,but also enhanced functionality.The functionality provided by gmap allows a user to:(1)map and align a single cDNA interact-ively against a large genome in about a second,without the startup time of several minutes typically needed by existing mapping pro-grams;(2)switch arbitrarily among different genomes,without the need for a pre-loaded server dedicated to each genome;(3)run the program on computers with as little as128MB of RAM(random access memory);(4)perform high-throughput batch processing of cDNAs by using memory mapping and multithreading when appro-priate memory and hardware are available;(5)generate accurate gene models,even in the presence of substantial polymorphisms and sequence errors;(6)locate splice sites accurately without the use©The Author2005.Published by Oxford University Press.All rights reserved.For Permissions,please email:journals.permissions@1859 at Institute of Biophysics,CAS on July 7, 2014 /Downloaded fromT.D.Wu and C.K.Watanabeof probabilistic splice site models,allowing generalized use of the program across species;(7)detect statistically significant microexons and incorporate them into the alignment;and(8)handle mapping and alignment tasks on genomes having alternate assemblies,linkage groups or strains.In the remainder of the paper,we review existing work on cDNA–genomic mapping and alignment,and describe the methods underlying gmap.Next we provide examples of how these methods in gmap lead to improved splice site and gene structure prediction.Then we compare the performance of gmap with existing programs in three large-scale experiments.In experiment1,we test for robustness to sequence error by using test sets of human mRNAs with compu-tationally simulated sequence errors.In experiment2,we examine mapping and alignment quality for human ESTs with naturally occur-ring sequence errors.In experiment3,we evaluate the performance of gmap on another species,namely,the plant Arabidopsis thaliana. Finally,we describe the implementation of gmap and additional features provided by the program.RELATED WORKOne approach to cDNA–genomic alignment has been to use general sequence alignment programs,such as blast(Altschul et al.,1990), and then to assemble the resulting hits into gene structures(Gelfand et al.,1996;Wiehe et al.,2001;Milanesi and Rogozin,2003;Zhang, 2003;Yeo et al.,2004).However,the cDNA–genomic alignment problem is important enough to warrant programs specialized for the task.The particular problem that arises in cDNA–genomic align-ment is the presence of introns,which appear as large genomic gaps of up to hundreds of thousands of nucleotides in length.Introns have characteristic patterns at their splice sites,which cDNA–genomic alignment programs must take into account.About99%of introns are bounded on their ends by the canonical dinucleotide pair GT–AG; the remainder have a semi-canonical dinucleotide pair GC–AG or AT–AC,or another,non-canonical dinucleotide pair(Burset et al., 2000).Probabilistic patterns of conservation are also seen at pos-itions further away from the intron–exon boundary(Mount,1981; Senapathy et al.,1990;Solovyev,2002).Existing programs for cDNA–genomic mapping and alignment, cited in the Introduction,provide a foundation for further advances. In particular,gmap draws upon three fundamental concepts intro-duced by earlier programs.First,gmap uses an oligomer index table for genomic mapping.Second,gmap takes a hierarchical approach to genomic alignment,byfirst computing an approximate alignment and thenfilling in the details.Finally,like almost all existing align-ment programs,gmap applies specific methods tailored for detecting splice sites and for incorporating them into the alignment. Although essentially all cDNA–genomic mapping and alignment programs share these fundamental building blocks,they differ in their particular methods for implementing them;it is these methodological choices that largely account for differences in their performance. In the Algorithm section,we provide a detailed description of the specific methods underlying gmap;in the rest of this section,we summarize the basic similarities and differences of our methods relative to existing ones.Genomic mappingGenomic mapping can be accomplished rapidly because of the near-identity between a cDNA sequence and its corresponding genomic exons,which manifests as regions of exact matches.Existing pro-grams exploit this fact either byfinding clusters of relatively short oligomers,such as11-mers(blat)or14-mers(ssaha and squall), or by using fewer long oligomers.The long oligomer approach is exemplified by MGAlign(Ranganathan et al.,2003):although it does not perform mapping on a genomic scale,it initially aligns a cDNA to a given genomic segment by scanning20-mers from the ends of the cDNA.Similarly,rapid mapping is provided by MUM-mer(Delcher et al.,1999,2002),which uses suffix trees(Manber and Myers,1993)tofind long unique matches between genomes, and MegaBlast(Zhang et al.,2000),which uses28-mers to identify sequence matches.Existing cDNA–genomic mapping programs that use an oligomer index on a genomic scale begin by pre-loading the index into memory, which means that these programs not only have a long startup time, but also require computers with large amounts of dedicated RAM.For example,squall requires12GB of RAM,and the standalone version of blat requires8GB of RAM in order to map a cDNA sequence onto the entire human genome.The startup time for the standalone version of blat is several minutes,which makes it inconvenient for a researcher who wishes to map a single cDNA sequence to a genome,or who wishes to switch quickly among different genomes or versions of a genome.Therefore,blat typically runs in a client–server mode,in which a dedicated server for a particular genome keeps its genomic oligomerfiles resident in RAM.A blat server, which also requires several minutes of startup time,needs1.2GB of RAM to process the human genome,and must be kept running continuously to answer queries from a client computer.In contrast,gmap is a standalone program that has been designed to handle individual queries rapidly,with essentially no startup time. Instead of pre-loading the entire oligomer indexfile into memory, gmap looks up oligomers as needed directly from thefile.Because access tofiles is much slower than to memory,ourfile-based strategy is enabled by a minimal sampling strategy that attempts to perform as few oligomer lookups as possible,while still mapping reliably to an entire genome.Our sampling strategy involves more than scanning long oligomers from the ends of a cDNA tofind a matching pair. Because our mapping universe is an entire genome,we must safe-guard against false mapping results from the initial matching pair, which can arise due to paralogs,pseudogenes and segmental duplica-tions in the genome(Wheelan et al.,2001;Bailey et al.,2002;Zhang and Gerstein,2004).Therefore,reliable matching on a genomic scale requires additional steps,such as accumulating additional oligomer evidence beyond thefirst matching pair;monitoring when the num-ber of candidate locations has been limited adequately;and sampling adaptively to extract information from different parts of the cDNA sequence,including the middle when necessary.Approximate alignmentAn approximate alignment step is necessitated by the large size of genomic segments,which makes a nucleotide-level align-ment prohibitively time-consuming,and is therefore used in some form by virtually every cDNA–genomic alignment program.In est_genome,approximate alignments are computed by using local Smith and Waterman(1981)alignments and the resulting seg-ments are then recomputed with a global Needleman and Wunsch (1970)alignment.Spidey computes an alignment with increasing detail by performing successive blast runs at decreasing stringency levels.1860 at Institute of Biophysics,CAS on July 7, 2014 /Downloaded fromGMAP for mRNA and EST sequencesIn other programs,the predominant strategy has been a‘seed-and-extend’strategy,in which the programfirstfinds significant oligomer matches between the cDNA and genomic segment,then extends these seeds to form longer matching fragments,andfinally assembles a selection of these fragments into a collinear chain. The seed-and-extend strategy is found in a variety of programs, including those for genome–genome alignment(Chain et al.,2003; Morgenstern,1999;Batzoglou et al.,2000;Kent and Zahler,2000; Schwartz et al.,2000;Ma et al.,2002;Brudno et al.,2003a,b;Bray et al.,2003;Kalafus et al.,2004),and constitutes the approach in several cDNA–genomic alignment programs.Sim4finds match-ing seeds of12-mers in the genomic segment,extends these seeds by nucleotide-level scoring of matches and mismatches,and then assembles the resulting‘exon cores’through dynamic programming (DP).MGAlign also applies DP,both to extend its fragments and to combine local alignments into longer ones.Blat breaks the cDNA into500-bp chunks,uses these chunks to create alignment fragments through a recursive seed-and-extend method,and then uses DP to stitch together these subalignments.In contrast,gmap uses an oligomer chaining method that involves neither seeds nor extensions.Rather,this methodfinds all matching 8-mers between the cDNA and genomic sequence,and then uses DP tofind an optimal global chain of8-mers.In this process,exons are not created explicitly,but instead emerge implicitly from the globally optimal distribution of8-mer matches between the cDNA and genomic segment.Although exon–exon boundaries are defined only approximately by this method,their location is determined by both distant alignment information and local information.Oligomer chaining may extend an exon alignment that otherwise looks locally unfavorable,or terminate an exon alignment that otherwise looks locally favorable,when such decisions contribute toward a better global alignment.We have found that the use of global information is particularly important in the presence of sequence polymorph-isms or errors,which can adversely affect local decision-making for extending fragments.Splice site identificationApproximate alignment using8-mers or other fragments generally does not have the resolution needed at the nucleotide level to detect splice sites accurately.To recognize splice sites correctly in the pres-ence of sequence errors,a program must often introduce substitutions or gaps,shift nucleotides from one end of the intron to the other,or explore alternate locations for the splice sites.Existing approaches to splice site identification are based upon two ideas.Thefirst idea is to apply various heuristics tofix or adjust the approximate alignment to incorporate a splice site.For example, Spidey and MGAlign search for splice sites in the overlap between adjacent exons,and then trim the exons at the highest-scoring splice site,whereas sim4has an intron shifting procedure that adjusts the exon–exon junction tofind the best pair of splice sites.The other idea is to use splice site models,such as scoring matrices(Salzberg,1997; Brendel and Kleffe,1998),which model the observed frequency of nucleotides near the5 and3 splice sites(Nakata et al.,1985; Gelfand,1989)and thereby provide clues about the presence and location of splice sites.In contrast,gmap handles this problem by using a formal DP procedure that we call sandwich DP.Sandwich DP involves two DP matrices,one for each end of an intron,and attempts tofind the best alignment path across the diagonals of both matrices.Rather than attempting tofix an existing approximate alignment,the method computes the whole subalignment in the region surrounding anintron.This approach guarantees that all possible combinationsof substitutions,gaps and intron shifts are considered,and per-mits the use of various DP techniques.These techniques include specialized gap penalties that favor insertions or deletions of tri-nucleotides(Gotoh,1999)and band-limited alignment(Sankoff and Kruskal,1999),which enables efficient consideration of substitutionsor gaps at a large distance away from the splice site.MicroexonsIn addition to the above features,gmap has an explicit procedurefor detecting microexons and incorporating them into the alignment. Microexons as short as1nucleotide in length have found apparent experimental support(McAllister et al.,1992;Sterner and Berget,1993;Simpson et al.,2000;Carlo et al.,2000),and a computationalstudy suggests that between0.5and1.6%of mRNA sequences invarious species contain microexons(V olfovsky et al.,2003).Suchshort exons pose an acknowledged problem for cDNA–genomic alignment programs(Florea et al.,1998).A procedure for identi-fying microexons has been developed by V olfovsky et al.(2003),and applied in a large-scale study.We further this work by integrat-ing the detection procedure into the framework of a cDNA–genomic alignment program,and by adding a probabilistic extension that ensures that incorporated microexons are statistically significant. ALGORITHMIn this section,we discuss the methods used by gmap in the con-text of each of the major components needed for cDNA–genomic mapping and alignment.Specifically,we describe:(1)a minimal sampling strategy for genomic mapping,(2)oligomer chaining for generating approximate gene structures,(3)sandwich DP for identi-fying splice sites,and(4)microexon identification with statisticalsignificance testing.Minimal sampling strategyFor genomic mapping,gmap uses a sampling strategy designed to minimize the number of oligomer lookups needed to map a cDNAreliably to the genome.Our minimal sampling strategy is basedupon the use of long oligomers to achieve high specificity,combinedwith an adaptive sampling scheme to utilize mapping evidence from different parts of the cDNA sequence.As discussed previously,the rationale for using long oligomers istheir exponentially greater specificity in the genome,which meansthat mapping can be performed with few oligomer matches.Ourchoice of24as an oligomer length is guided by our own study of oligomer uniqueness in the human genome,as shown in Figure1.This graph,based on the unmasked portion of the NCBI human gen-ome(build29),shows the percentage of the observed oligomersof various lengths that are unique in the genome.For example,among all11-mers in the genome,only0.1%of them have aunique position in the genome.Likewise,among all14-mers,only22.5%specify a unique position in the genome.On the other hand,when the oligomer length is20or more,the percentage of oli-gomers with a unique genomic location reaches an asymptotic levelof96–97%.Our implementation of24-mer lookups on a genomic scale requires some adaptation of the index table scheme of ssaha(Ning1861at Institute of Biophysics,CAS on July 7, 2014/Downloaded fromT.D.Wu andC.K.WatanabeFig.1.Distribution of oligomers of various lengths in the masked regionof the human genome (NCBI build 29).The horizontal axis represents vari-ous oligomer sizes from 11to 25.The total space of possible oligomers increases exponentially,as shown by the exponentially increasing line.For each oligomer size,counts of all overlapping oligomers in the masked part of the human genome are shown by the top line,and the counts of distinct oligomers are shown by the topmost sigmoid line.Distinct oligomers can be divided into unique oligomers,which occur once (shown by the sigmoid line with percentages),and repeated oligomers,which occur more than once (shown by the bottom line).et al .,2001).In that scheme,a position file contains the observed pos-itions of oligomers in the genome,and an offset file contains pointers to the position file to indicate where a block of positions begins and ends for a given oligomer.Because this offset file contains an entry for each possible oligomer,its size grows exponentially with the oli-gomer length.In fact,14-mers represent the current practical limit for the ssaha data structure,because the corresponding index file occupies 1.1GB.Extending this indexing scheme to 24-mers would yield a sparse offset file of 424=281trillion 32-bit entries,which would be prohibitively large to store.Therefore,in our initial implementation of gmap ,we tried a hash-ing scheme instead,where the space of 24-mers is mapped onto a space of 12-mers using a hash function.If a given 24-mer has a match somewhere in the genome,an entry for the 24-mer can be found in the expected hash bin.This entry then provides the appropriate offset into the position file.Although this hashing scheme worked reasonably well,we sub-sequently found a more efficient solution by using a double lookup scheme,which breaks up the problem of finding a 24-mer into the problem of finding two 12-mers.In other words,we implement the ssaha data structure for 12-mers,with the requirement that entries in the position table be pre-sorted in ascending numeric order within each oligomer.To find the positions for a given 24-mer,we look up two lists of genomic positions,one for the initial 12-mer and one for the terminal 12-mer.The desired set of 24-mer genomic loca-tions is obtained by finding pairs of entries in these two lists that areseparated by 12nucleotides.The reason for pre-sorting the genomic positions within each oligomer is to make this procedure run in linear time with the number of genomic positions,rather than quadratic.The size of the position file is determined by the genome size and by how often oligomers are sampled in the genome.Although minimal coverage of the genome can be achieved by sampling all non-overlapping 12-mers in the genome,an overlapping samplinginterval provides increased resolution,but at the cost of a largerposition file.An additional advantage of overlapping sampling inter-vals in our scheme is that it permits lookups of oligomers otherthan 12-mers and 24-mers.For example,if we store 12-mers at anoverlapping interval of 6(which is our default),we can determ-ine the genomic location of oligomers of length 12,18,24,and soon.These intermediate-length oligomers can be useful in genomicmapping.The use of 18-mers can give additional sensitivity for diver-gent sequences,such as in the cross-species genomic mapping ofmouse cDNAs onto the human genome,and vice versa.In addition,short cDNA sequences often have too few 24-mers for reliable gen-omic mapping.In these cases,the program uses smaller oligomers:18-mers if the cDNA is between 40and 80nt,and 12-mers if it isless than 40nt.In addition to using highly specific 24-mers,gmap employs anadaptive sampling scheme designed to utilize mapping information from different parts of the cDNA sequence.The sampling process begins by scanning both ends of the cDNA sequence,and monitoring the results until a pair of 24-mers match to approximately the same location in the genome.The definition of ‘same location’depends upon the length of the query cDNA,with an allowed genomic expan-sion of 1000times the query length,subject to a default upper limit of 1million nucleotides.Therefore,the program will not attempt to predict a long intron for a very short EST.To avoid false localizations from a fortuitous pair of matches tothe genome,the program continues to sample beyond the first pairof successful hits,in order to accumulate evidence of other pos-sible localizations in the genome.This amount of further sampling is determined both by a minimum distance (default 48nt)and by a min-imum number of additional successful matches (default 3)required.If this process yields a limited number of genomic locations,the mapping process terminates.On the other hand,if there are a large number of candidate genomic locations,then gmap begins a sampling process that uses informa-tion from the middle of the cDNA sequence.This sampling process is performed iteratively,with the sampling interval halved in each round.At each sampling interval,the program looks for clusters on the genome with a high concentration of matches,with the provi-sion that genomic positions be collinear with the cDNA positions.Sampling terminates when the correct genome location is resolved to a limited number of good candidates.This determination is made by setting a threshold at 70%of the number of matches in the best cluster,and requiring that only a limited number of clusters (currently defined as 10or fewer)are above this threshold.For each candidate cluster of 24-mers,the program extracts the corresponding segment from the genome,with the correct strand of the genome determined by the orientation of the matching 24-mers.To extend the genomic segment to regions that may be relevant for further alignment at the oligomer and nucleotide level,the program looks up the genomic positions of the nearest 12-mers that match to the ends of the cDNA sequence.Oligomer chainingFor approximate alignment,oligomer chaining attempts to find a path of 8-mers that match between the cDNA sequence and each genomic segment found in the mapping step.The procedure is illustrated in the top part of Figure 2.Instead of the standard DP paradigm,which uses a matrix to align two sequences,oligomer chaining uses an 1862 at Institute of Biophysics,CAS on July 7, 2014/Downloaded fromGMAP for mRNA and ESTsequencesFig.2.Oligomer chaining and nucleotide-level alignment.The top part of the figure shows oligomer chaining.The horizontal axis represents positions on the cDNA sequence.Each cDNA position may have one or more matches of 8-mers to the genomic segment,represented by a vertical stack of cells.For each cell,the DP procedure looks for an optimal previous cell,as represented by thin diagonal lines between cells.The highest-scoring chain of 8-mer matches,represented by a thick line,describes the optimal approximate alignment.This alignment may contain jumps in cDNA or genomic coordinates,due to introns,cDNA insertions or sequence differences.These jumps are resolved by various nucleotide-level alignment procedures,represented in the bottom of the figure by various DP matrices.Sandwich alignments bridge large coordinate jumps across introns (horizontal dashed line)or long cDNA insertions (vertical dashed line).The existence of short exons is resolved by an exon testing procedure that compares alignments with and without the short exon.equivalent but more efficient representation in the form of an array of linked lists.Each position in the array corresponds to an overlapping 8-mer in the cDNA sequence,and each 8-mer has a linked list of positions in the genomic segment where that 8-mer is found.These linked lists are represented in the figure as a vertical stack of cells at each cDNA position.Each cell also contains placeholders for the optimal subscore to that point and for a pointer to the best previous cell that produced the optimal subscore.The array of linked lists is generated by first pre-scanning the cDNA for overlapping 8-mers and noting which 8-mers are present,and hence relevant.This pre-scan prevents unnecessary work later,because most of the 8-mers in the longer genomic sequence are irrel-evant.Then the algorithm scans the genomic segment for relevant 8-mers and adds their genomic positions to a list maintained for each relevant 8-mer.Finally,the algorithm scans the cDNA again,making a copy of the appropriate position list for each element of the array.After building this data structure,oligomer chaining proceeds with a DP procedure that assigns a subscore and pointer to each cell,starting from the beginning of the cDNA sequence.For each cell,the algorithm looks backward to cells at previous cDNA positions to identify the cell that both is consistent and generates a maximal score to the given cell.A previous cell is consistent if its genomic position is lower than that of the given cell,which enforces collin-earity of the cDNA and genomic sequences.The score for the cell is the score of the previous cell plus 1to indicate the length of the chain.Because introns will cause 8-mers in the cDNA not to match,the algorithm compensates for such cases by adding enough points to ensure that local extension does not gain an unwarranted advantage over an intron.One cost of our approach is greater computational complexity than one based on larger fragments.As described so far,oligomer chain-ing is O(m 2g 2),where m is the length of the cDNA and g is the average number of cells per linked list,which is generally propor-tional to the length of the genomic segment.(A total of mg cells must be processed,and at each cell the algorithm must look back at the previous set of cells processed.)In order to reduce the complex-ity to O(mg 2),we impose a sufficiency limit on the look backward.Note that this limit applies only to the cDNA sequence coordinates;there is no limitation on the look backward in genomic sequence coordinates.The sufficiency limit has a default value of 60,which expresses our calculated expectation that we should find at least one matching 8-mer between the cDNA and genome within that distance,even accounting for extremely low sequence quality.By using prob-ability calculations based on finite-state automata (Atteson,1998),we estimate that if the sequence error rate is 5%,then the chance of failing to have an error-freestretch of 8nucleotides out of 60total nucleotides is 3.8×10−6.The pointer and optimal subscore for a given cell are based on the best solution found within this sufficiency limit.However,a cDNA sequence may have a local concentration of mismatches or gaps that precludes 8-mers from being identified in a particular stretch.Therefore,if no matching 8-mer is found within the suf-ficiency limit,the algorithm will continue looking backward as far as needed to find a match.This provision allows the algorithm to1863 at Institute of Biophysics,CAS on July 7, 2014/Downloaded from。
MAPK信号途径介导的17β-雌二醇快速激活人精子的研究【摘要】目的观察17β-雌二醇(E2) 对人精子快速的非基因组激活效应,并探讨其相关的作用机制。
方法雌激素受体拮抗剂Tamoxifen (0.05μmol/L)预处理20min后,分别用E2 (1μmol/L)、雌二醇-牛血清白蛋白结合物(E2-BSA,5μmol/L)作用获能人精子15min,以三色法染色检测获能人精子顶体反应率,Western blotting检测丝裂原活化蛋白激酶(MAPK)蛋白的表达变化。
结果E2、E2-BSA均能提高人精子的顶体反应率和促进p42/p44磷酸化MAPK 蛋白的表达,且两者的作用均可被Tamoxifen所阻断。
结论人精子膜上可能存在膜雌激素受体(mER),雌激素与mER结合后可通过MAPK信号途径快速激活人精子。
【关键词】 17β-雌二醇;人精子;顶体反应;膜雌激素受体;丝裂原活化蛋白激酶AbstractObjective To study the rapid non-genic effect and mechanism of17β-estradiol on human spermatozoon activation.Methods Human spermatozoa were cultivated with E2(1μmol/L) or E2-BSA(5μmol/L) for 15 minutes after treated with tamoxifen(0.05μmol/L)for 20 minutes.Acrosome reaction rates were scored by trichrome staining and MAPK activity was determined by Western blotting.ResultsE2 or E2-BSA significantly increased the acrosome reaction rates and MAPK activity (p42/p44).The effects of E2 or E2-BSA were able to be inhibited bytamoxifen.ConclusionsThere may be some membrane estrogen receptors on the human spermatozoa,which mediate the rapid activational effects of estrogen on human spermatozoa through MAPK pathways.KEYWORDShuman spermatozoonacrosome reactionmembrane estrogen receptormitogen-activated protein kinase越来越多的研究资料证明,雌激素在精子的成熟及精子功能的获得中都起着非常重要的调节作用,体外环境下,雌激素对精子有一定的激活效应,在精子运动力的获得、精子蛋白的酪氨酸磷酸化、获能、顶体反应和受精等过程中有重要的促进作用[1]。
Chapter 1: Genomes, Transcriptomes andProteomes1. 概述基因组(Genome):指生物的整套染色体所含有的全部DNA或RNA 序列。
基因组是地球上每一物种具有的生物学信息的存储库。
基因组学(Genomics):指研究生物的整个基因组,涉及基因组作图、测序和功能分析的一门学科。
基因组所包含的生物信息的利用需要酶及其他参与基因组表达过程中一系列复杂生化反应的蛋白质的协同活性。
基因组表达的最初产物是转录组,即那些含有细胞在特定时间所需生物信息、编码蛋白质的基因衍生而来的RNA分子的集合。
转录组由转录过程来维持。
基因组表达的第二个产物是蛋白质组,即细胞中那些决定细胞能够进行生化反应的所有蛋白质组分。
这是通过翻译过程来完成的。
2.1 Genes are made of DNA奥地利神父孟德尔1865年根据7个碗豆性状的实验提出了遗传因子假说,认为每个性状由遗传因子控制,并提出了遗传因子的分离与自由组合两大遗传规律。
证明基因由核酸 (DNA或RNA) 组成的3个著名实验:①肺炎双球菌的转化试验;DNA是遗传物质②噬菌体感染实验;只有DNA是联系亲代和子代的物质③烟草花叶病毒的感染实验。
RNA也是遗传物质2.2 The structure of DNAA. Nucleotides and polynucleotidesB. The model of double helixDNA 晶体X射线衍射图谱 为揭示DNA分子的二级结构提供了重要实验证据a. Watson and Crick (1953) 提出的DNA双螺旋结构模型:" DNA分子通常以右手双螺旋形式存在,两条核苷酸链反向平行,且互为互补链。
" 戊糖-磷酸骨架在分子的外铡,在分子表面形成大沟和小沟,碱基堆积于螺旋内部。
" 碱基间通过氢键相互连接,A 和T 以2个氢键配对, G和C 以3个氢键配对。
•论著•泛酸激酶相关神经变性病新变异致病性鉴定及表型-基因型相关性研究杜鹏"2李宁2蒋玮莹I」中山大学中山医学院遗传学教研室,广州510080;$广州市妇女儿童医疗中心临床基因检测诊断中心510623通信作者:蒋玮莹,Email:jiangwy@【摘要】目的对1个泛酸激酶相关神经变性病(pantothenate kinase-associated neurodegeneration,PKAN)家系进行致病性变异鉴定并对已发表的文献进行综述,为以后临床诊断、产前诊断和遗传咨询提供理论依据。
方法抽取家系成员的外周血提取基因组DNA,用PCR扩增P4NK2基因并进行Sanger测序。
根据ACMG/AMP指南对新变异进行致病性分析。
同时通过数据库查询已发表的病例并对其进行综述。
结果在先证者中发现2个可能的变异,其中c.1355A>G(p.D452G)来自先证者的母亲,为已知的致病性的变异,c.833G>A(p.R278H)来自先证者的父亲,为新变异,根据ACMG/AMP国际遗传变异分类标准与指南将其定义为可能致病性的变异。
通过査询数据库回顾了80篇文献中的263个患者,详细总结了临床表型和变异位点,对基因型和表型的相关性进行分析,发现肌张力障碍是PKAN最常见的表型,142个致病性的变异位点中最常见的变异为c.1583C>T(p.T528M)(43/526)o值得注意的是目前报道的变异位点随机分布在PANK2基因的c.390C之后,但在c.390C之前未报道任何致病性的变异。
结论c.833G>A(p.R278H)是引发PKAN的新的可能致病性变异。
已发表病例的总结能够为今后相关患者的临床诊断、遗传学诊断及生育指导提供重要依据。
【关键词】泛酸激酶相关性神经变性病;PANK2;新变异;致病性鉴定基金项目:广州科技计划项目(201604020020)D0I:10.3760/231536-20200607-00043Pathogenicity identification and genotype-phenotype correlation of a new variant of pantothenatekinase-associated neurodegenerative diseasesDu Peng"'2,Li Ning2,Jiang Weiying11Department of Medical Genetics,ZhongShan School of Medicine,Sun Yat-sen University,Guangzhou510080,China;2Department of Clinical Genetic Testing and Diagnosis Center,Guangzhou Women andChildrens Medical Center,Guangzhou510623,ChinaCorresponding author:Jiang Weiying,Email:jiangwy@[Abstract]Objective To identify PANK variants of a family with pantothenate kinase・associatedneurodegeneration(PKAN)and review the published literatures to provide theoretical basis for future clinical diagnosis,prenatal diagnosis and genetic counseling.Methods Genomic DNA was extracted from theperipheral blood of family members.PCR was generated to amplify the PANK2gene and Sanger sequencingwas performed.Pathogenicity analysis of new variant was obtained according to ACMG/AMP guidelines.We also reviewed the published cases through the database.Results Two variants of PANK2were foundin the proband.One was a maternal variant c.1355A>G(p.D452G),which was known as pathogenic.The other was a paternal variant c.833G>A(p.R278H),and we defined it as potentially pathogenic according to the criteria and guidelines of ACMG/AMP genetic variation classification.We reviewed 263patients in80articles and summarized the clinical phenotype and variant sites in detail.Dystonia is the most common phenotype of PKAN and the most com m on variant is c.1583C>T(p.T528M) (43/526)in142variants.It is intriguing that the variants reported currently are distributed after c.390 C of PANK2gene.Conclusion c.833G>A(p.R278H)is a novel variant likely causing PKAN. Our summary of published cases can provide important basis for PKAN patients in clinical diagnosis,prenatal diagnosis and fertility guidanee in future.[Key words]Pantothenate kinase-associated neurodegenerative disease;PANK2;Novel variant;Pathogenicity identificationFund program:Science and Technology Program Project of Guangzhou(201604020020)DOI:10.3760/231536-20200607-00043泛酸激酶相关神经变性病(PKAN,OMIM# 234200)是一种由PANK2基因变异引起的常染色体隐性遗传病,其特征是铁沉积过多,由Hallervorden和Spatz于1922年首次报道,因此又叫Hallervorden-Spatz综合征(Hallervorden-Spatz syndrome,HSS)11o该病的患病率极低,约为1/1000000~3/1000000⑵。
Plant Cell, Tissue and Organ Culture 64: 145–157, 2001. © 2001 Kluwer Academic Publishers. Printed in the Netherlands.145Oxidative stress and physiological, epigenetic and genetic variability in plant tissue culture: implications for micropropagators and genetic engineersAlan C. Cassells & Rosario F. CurryDepartment of Plant Science, National University of Ireland, Cork, Ireland (∗ requests for offprints; Fax: +353-21274420; E-mail: a.cassells@ucc.ie)Received 18 April 2000; accepted in revised form 1 December 2000Key words: DNA repair, free radicals, genetic engineering, hyperhydricity, in vitro culture, juvenility, micropropagation, mutation, reactive oxygen species, somaclonal variation, tissue cultureAbstract A number of well defined problems in physiological, epigenetic and genetic quality are associated with the culture of plant cell, tissue and organs in vitro, namely, absence or loss of organogenic potential (recalcitrance), hyperhydricity (‘vitrification’) and somaclonal variation. These broad terms are used to describe complex phenomena that are known to be genotype and environment dependent. These phenomena affect the practical application of plant tissue culture in plant propagation and in plant genetic manipulation. Here it is hypothesised much of the variability expressed in microplants may be the consequence of, or related to, oxidative stress damage caused to the plant tissues during explant preparation, and in culture, due to media and environmental factors. The characteristics of these phenomena are described and causes discussed in terms of the known effects of oxidative stress on eukaryote genomes. Parameters to characterise the phenomena are described and methods to remediate the causes proposed. Abbreviations: AFLP – amplified fragment length polymorphism; FISH – fluorescent in situ hybridization; HSP – heat shock proteins; PR-proteins – pathogenesis-related proteins; RFLP – restriction fragment length polymorphism; ROS – reactive oxygen species Introduction Those using tissue culture for multiplication or transformation are concerned to produce microplants that are ‘fit for the purpose’, that is, free of specified diseases, vigorous, developmentally normal and genetically true-to-type (Cassells, 2000a, b; Cassells et al., 2000). The exceptions are that the market may exploit altered developmental characteristics, e.g. juvenility in herbaceous or woody plants where this results in greater productivity of the microplants when used for cutting production (George, 1993, 1996) or where tissue protocols give earlier flowering (Cassells, 2000a). In general, genotype-dependent, multiplication via buds tends to be the preferred strategy to maintain genetic stability (Figure 1; George, 1993). Proliferation of side shoots from axillary buds, termed ‘nodal culture’ is preferred over proliferation of precocious axillary buds in shoot tip explants. The basis for this is that apical explants may give rise to basal explant callus from which adventitious buds may arise. The latter has been associated in strawberry with genetic variability in the progeny (Jemmali et al., 1997). It is important to mention here that epigenetic and genetic instability in the tissues used for Agrobacterium transformation (somaclonal variation: see Jain et al., 1998), that is expressed in adventitious shoots, may result in chimeral transformants (Cassells et al., 1987), and in somaclonal variation in the background of the transgenic lines (Sala et al., 2000) which may contribute to the silencing of trangenes (Matzke and Matzke, 1998). In human health the importance of oxidative stress has been long recognised in cancer and ageing studies146Figure 1. The methods of micropropagation least likely to produce plants with genetic variation (reproduced with permission; from George, 1993).(Harman, 1956). It is also recognised how complex the underlying mechanisms and processes are (Halliwell and Aruoma, 1993). The cellular mechanisms to manage stress, namely, constitutive and induced production of radical scavengers, free radical and oxidised-protein enzymatic degradation pathways and DNA repair mechanisms are highly conserved in all eukaryotes (Halliwell and Aruoma, 1993; McKersie and Leshem, 1994). Environmental and pathogeninduced stress have been investigated in detail in plants in vivo (Bolwell et al., 1995; Baker and Orlandi, 1995; Doke et al., 1996; McKersie and Leshem, 1994). Stress-like phenomena expressed in vitro and in microplants have been extensively described but less is known about the underlying causal mechanisms (Ziv, 1991; Jain et al., 1998). Both in initiating cultures and in sub-culturing, explant preparation involves wounding of the tissues which is known to cause oxidative stress (Yahraus et al., 1995). Elicitors of oxidative stress e.g. hypochlorite (Wiseman and Halliwell, 1996) and mercuric salts (Patra et al., 1997), are used to surface sterilize theprimary explants. Many factors associated with aberrations in plant tissue culture such as habituation, hyperhydricity (Gaspar, 1998) are caused by oxidative stresses (Keevers et al., 1995), such as high salt (McKersie and Leshem, 1994), water stress (NavariIzzo et al., 1996), mineral deficiency (Elstner, 1991), excess metal ions (Caro and Puntarulo, 1996) and possible over exposure to auxin (Droog, 1997). Oxidative stress (Gille and Sigler, 1995; Bartosz, 1997) is defined as an imbalance in the pro- versus anti-oxidant ratio in cells and results in elevated levels of pro-oxidants (ROS: reactive oxygen species; including superoxide, hydrogen peroxide hydroxyl, peroxyl and alkoxyl radicals) (Wiseman and Halliwell, 1996) which can cause cell damage (Sies, 1991). ROS (Figure 2) can react with a spectrum of metabolites, proteins including enzymes, and nucleic acid molecules (Gille and Siegler, 1995). Oxidised enzymes which may be inactivated, are degraded by cytosolic proteinases (Laval, 1996). The influence of ROS, through altered cell redox potential, on the cell cycle and oxidative damage to both nuclear and organellar147Figure 2. Reactive oxygen species (ROS) produced constitutively in the cell. The upper section shows the natural antioxidants and enzymes used to minimize the toxic effects of ROS. The lower section gives selected examples of the harmful effects of ROS when the pro- and anti-oxidant balance is perturbed in oxidative stress. (MDE, malondialdehyde; HNE, 4-hydroxynonenal).DNA, may result in mutations (Figure 3; Bohr and Dianov, 1999). Oxidative damage in eukaryote cells is expressed in altered hyper- and hypomethylation of DNA (Kaeppler and Phillips, 1993; Tilghman, 1993; Wiseman and Halliwell, 1996; Cerda and Weitzman, 1997; Wacksman, 1997); changes in chromosome number from polyploidy to aneuploidy, chromosome strand breakage, chromosome rearrangements, and DNA base deletions and substitutions (Gille et al., 1994; Czene and Harms-Ringdahl, 1995; Hagege,1995). Such changes could explain, at least in part, the range of variability found in plant cells, tissues and organs in culture and in microplants, namely, recalcitrance including loss of cell competence (Hagege, 1995; Lambe et al., 1997), hyperhydricity (Olmos et al., 1997) and somaclonal variation including epigenetic and genetic variation (Jain et al., 1998; Joyce et al., 1999; Kowalski and Cassells, 1999). The objective of this review is to discuss tissue culture variability, its causes, detection and remediation148 with emphasis on the possible role of oxidative stress in this phenomenon.Aberrations and variation expressed in vitro At the outset it should be recognised that explants, other than buds, from dicot plants have different characteristics to those from monocots, specifically those of dicots may have a cambium; that there are differences in organogenetic potential between families, genera, species and genotypes; and that different genotypes of a species may show widely different responses (George, 1993, 1996). Further there are differences in the responses of explants from different parts of a plant, which change ontogenetically (George, loc. cit.). Some trends are evident, e.g. increased recalcitrance with advancing age of the cultures (Hagege, 1995) and increased somaclonal variability in microplants with increasing sub-culture number (Brar and Jain, 1998). With a given genotype, wounding of the tissues on cutting (excision), and tissue damage and exposure to sterilants during sterilisation, and suboptimal in vitro factors (Ziv, 1991) are important in relation to genomic damage. So called ‘pre-existing’ genomic diversity at the cell level (D’Amato, 1964; Figure 4) and wound or oxidative damage due to wounding may explain some of the variability subsequently seen in vitro and in the resulting microplants. Possible stress due to unbalanced media, bad culture vessel design and environmental stress may also, or further, contribute to the genetic, epigenetic (developmental) and physiological variability recorded (Ziv, 1991). Wounding or excision per se may be considered both as a trigger for cell division (Sangwan et al., 1992) and as a damaging oxidative burst (Schaaf et al., 1995; Yahraus et al., 1995). As a consequence of the above factors, explants may senesce, fail to respond, undergo cell division and/or produce adventitious organs or somatic embryos. In responding genotypes, the response is generally regulated in a predictable way by manipulation of the auxin to cytokinin ratio and absolute growth regulator concentrations (Skoog and Miller, 1957). In some cases, recalcitrance may be overcome by pulsing in sequence with auxin followed by cytokinin (Christianson and Warnick, 1985). Whether genome variability is ‘pre-existing’, caused by oxidative stress on wounding an/or caused by stress in culture (Figure 4), selection may begin in vitro with the appearance of sectoring in the callus (D’Amato et al., 1980). Cell lineFigure 3. Changes in DNA caused by oxidative stress which can lead to recalcitrance, loss of competence, hyperhydricity and somaclonal variation.selection for in vitro conditions may result in loss of competence; e.g. selection based on fitness of grossly altered genotype(s) may result in the irreversible loss of competence (Hagege, 1995). The main morphological aberration seen in shoots in vitro cultures, both in nodal/bud derived shoots and in adventitious shoots, is hyperhydricity (‘vitrification’) (Debergh et al., 1992). This term is used to describe aberrant morphology, typically hyperhydrated, translucent tissues and physiological dysfunction in plant tissues in vitro (Ziv, 1991). It is also associated with leaf-tip and bud necrosis. The latter often leads to loss of apical dominance in the shoots and is associated with callusing of the stem base. An important characteristic of this condition is impaired stomatal function which causes problems in establishing microplants (Preece and Sutter, 1991). Morphological variability in plants from in vitro culture may be seen in intrapopulation variability (within a population of adventitiously regenerated plants) (Kowalski and Cassells, 1998) and interpopulation variability (between populations of in vitro plants) (Joyce et al., 1999). The latter may arise when plants are propagated on different media or in culture vessels with different characteristics (Joyce et al., 1999). Intrapopulation variability can be a result of the loss of specific viruses, including cryptic viruses, from some of the regenerated plants (Matthews, 1991); chimeral breakdown, rearrangement and/or synthesis of unstable chimeral plants (Tilney-Bassett, 1986). In generally, heritable somaclonal variation (Larkin and149Figure 4. Sources of genetic variation in plants obtained through organogenesis in callus cultures (reproduced with permission; from George, 1993).Scowcroft, 1981) has the characteristics of mutation (Anonymous, 1995; Jain et al., 1998), albeit occurring at higher frequency than occurs spontaneously in seed or vegetative propagules (Preil, 1986). It is genotype-dependent and dependent on the pathway of regeneration (Karp, 1991). Epigenetic changes can occur in vitro culture resulting in ‘apparent rejuvenation’ (Pierik, 1990) affecting woody and herbaceous plants (Huxley and Cartwright, 1994; James and Mantell, 1994; Jemmali et al., 1994; Cassells et al., 1999 a, b). Interpopulation variation is usually cryptic, as control populations are not available for comparison; it is recognised in quality differences in plants produced by different protocols or by different micropropagators (GrunewaldtStoker, 1997). Examples of interpopulation variability are populations differing in degree of hyperhydricity or juvenility (Swartz, 1991). While woody plant propagators are familiar with phase change (Howell, 1998), micropropagators of herbaceous plants appear less conscious of this phenomenon but it has implications for disease susceptibility in that polygenic resistance develops as the plant soma matures (Agrios, 1997) and for time to flowering (Howell, 1998). Plants showing prolonged juvenility (epigen-etic/ontogenetic variability) may be more susceptible to damping-off diseases (Agrios, 1997). This is not always the case, as juvenile tissues are reported to have enhanced resistance to fusaric acid (Barna et al., 1995) and Cassells et al. (1991) have shown that potato crops derived from microplants, showing juvenility compared to a tuber-derived crop, were more resistant to potato blight. In vitro plants may have a longer time to flowering compared to those from vegetative propagules (Cassells et al., 1999a). While morphological intrapopulation variability and ontogenetic and physiological variation, expressed in interpopulation variability, are well recognised phenomena in micropropagation, cryptic intrapopulation variation in juvenility has also been detected in adventitiously regenerated plant populations showing genetic variation (Kowalski and Cassells, 1998) suggesting that genetic and epigenetic variability are not necessarily discrete but can occur in the same population. Somaclonal variation is strongly expressed after the microplant population establishment stage as interplant variation in morphological characters. Some of the plants may show characteristics of chimeral breakdown (Tilney-Bassett, 1986). Somaclonal variation has been extensively reviewed in Jain et al. (1998).150Figure 5. Showing the consequences of oxidative stress from induction of host antioxidant defences (repair, heat shock protein induction, pathogenesis related protein induction) to mutation, programmed cell death and uncontrolled cell death. Figure 6. The relationship between stress inducers e.g. medium salt stress, gene activation and the generation of biomarkers for stress remediation and stress damage repair. Examples of damage exposure are ethylene and ethane; of damage are oxidised bases e.g. 8-oxoguanine; of remediation are glutathione and glutathione reductase and of repair, Poly(ADP-ribose) polymerase (see text for further markers).As discussed above, shifts in characters in populations, e.g. physiological or developmental changes, are not readily recognised unless control populations are available (Cassells et al., 1997). These can, however, be visually expressed in loss of apical dominance, leaf number and leaf size and, more importantly in the time to flowering, and yield quality e.g. tuber number and size distribution in potato seed production (Cassells et al., 1999a).Characterisation of epigenetic and genetic changes in microplants pre- and post- establishment Cytometric analysis of callus has shown variability in chromosome number and ploidy in tissue culturederived plants (Geier, 1991; Gupta, 1998). Investigations indicate more chromosome variability in the callus phase than in adventitious shoots (D’Amato et al., 1980), indicating a loss of competence in the more seriously disturbed genomes (Valente et al., 1998). Cell line selection for secondary product formation also shows differences at the metabolite level (Berglund and Ohlsson, 1995). While occasional albino shoots are observed, the expression of morphological variation is difficult to assess in vitro due to variability between shoots due to temporal differences in shoot initiation and because of the limited leaf expansion in in vitro cultures. Variability in both qualitative and quantitative traits has also been reported (Karp, 1991). The latter expressed in increased standard deviations of the character mean (DeKlerk, 1990) and can be quantified using computerised image analysis (Cassells et al., 1999a). Analysis of DNA-base methylation and various genetic fingerprinting techniques have also been used to confirm and characterise variability in tissue culturederived plants, confirming both morphological and cryptic genetic and epigenetic variability between and within populations (Karp et al., 1998; Cassells et al., 1999b).Current views on the molecular basis of somaclonal variation In recent years plant cell, tissue and organ culture has been developed for applications in plant genetic manipulation (Cassells and Jones, 1995). In this field, somaclonal variation has attracted considerable interest as a means of improving crop plants (Jain et al., 1998). Reviews discussed a number of mechanisms to explain somaclonal variation, these included changes in chromosome number, chromosome breakage and rearrangement, DNA amplification, point mutations, changes in DNA methylation, changes in organellar DNA, activation of transposons (Frahm et al., 1998; Gupta, 1998; Henry, 1998; Jain et al.,151 1998; Kaeppler et al., 1998). The mechanisms appear to be equally applicable to explaining the basis of variation at the cell and callus level and are similar to the variability resulting from oxidative genome damage and induced mutation. Nagl (1990) has discussed the relationship between stress-induced and ontogenetic changes in plant genomes arguing that plant genomes are inherently fluid. In some genomes, e.g. flax (Schneeberger and Cullis, 1991) and banana (Cullis and Kunert, 2000) there are well characterised genomic instabilities associated with somaclonal variation. (Figure 5). In addition to the above ROS, chemical and physical agents can stimulate lipid peroxidation that can become autocalatytic resulting in the production of organic hydroperoxides (Figure 2). ROS in the presence of iron and copper ions may generate highly mutagenic compounds, e.g. peroxyl radicals and alkoxyl radicals (Koh et al., 1997). The primary response to elevated ROS production (Figure 2) is stimulation of production of antioxidant molecules (radical scavengers) such as ascorbic acid and glutathione in the aqueous phase and alpatocopherol and carotenoids in the lipid phase (Gille and Sigler, 1995) and the activation of antioxidant enzyme systems including superoxide dismutase, catalase and glutathione and ascorbic acid peroxidases (Tsang et al., 1991; Larson, 1995; Smirnoff, 1996). Additional responses involve the activation of heat shock proteins to protect enzymes systems against ROS damage (Burdon, 1993) and of proteases to degrade damaged proteins (Stadtman, 1992). More significant is the potential of ROS to cause DNA damage (‘genotoxicity’; Wiseman and Halliwell, 1996). The cell cycle slows or shuts down to minimise the transmission of mutations to daughter cells through mitosis and to facilitate DNA repair (Logemann et al., 1995; Amor et al., 1998; Reichheld et al., 1999) and DNA repair mechanisms, the SOS response, are activated (Laval, 1996; Yamamoto et al., 1997; Vonarx et al., 1998). The outcomes of oxidative stress depend on the balance between pro and anti-oxidants responses. Imbalance may lead to controlled responses (Figure 5) such as induced resistance to pathogens (Ernst et al., 1992), excessive imbalance to cell damage and mutation (Wiseman and Halliwell, 1996), possible programmed cell death (apoptosis) (Polyak et al., 1997) and, in the extreme, to (unprogrammed) cell death (Hippeli and Elstner, 1996). Oxidative stress has been linked to recalcitrance in protoplast culture (Benson and Roubelakis-Angelakis, 1994). Increase in ROS is associated with a range of biotic and abiotic stresses (Gile and Siegler, 1995; Bartosz, 1997). These include salt, drought, heat, and UV-induced stresses. They are also induced by chemical and physical mutagens (Anon, 1977). Their protective and constitutive roles include direct protection against pathogen attack, and involves the role of H2 O2 as a messenger in the induction of host resistance (pathogenesis-related (PR) protein induction) (Lamb and Dixon, 1997). H2 O2 may have a role in xylem formation and other cell-death processes in plants (Howell, 1998). ROS have a role in creatingThe relationship between somaclonal variation and spontaneous mutation The paper of Shepard et al. (1980) on somaclonal variation in potato stimulated interest in the application of this variability in crop improvement but was soon followed by concern about the quality of somaclonal variation and whether it differed qualitatively from spontaneous mutation (Sanford et al., 1984). This issue has been discussed by Karp (1991, 1995) but while it is still controversial, somaclonal variation is used in plant improvement, with induced mutagenesis whose efficiency has been improved by exploiting in vitro plant systems (Cassells, 1998). More importantly here, induced mutation and somaclonal variation result in a qualitatively similar, if not quantitatively identical, spectrum of DNA changes (see Figure 3). The issue is whether somaclonal variation and other tissue culture variability are mechanistically like physically-induced mutation and are caused by reactive oxygen species (Anonymous, 1977; 1995; Micke and Donini, 1993).Oxidative stress and mutation Oxidative stress is caused by the unremediated hyperactivity of reactive oxygen species (Figure 2). ROS such as superoxide, hydrogen peroxide and the hydroxyl radical are metabolic intermediates in respiration and photosynthesis and other metabolic activities in plants (see review by Gille and Sigler, 1995; Bartosz, 1997). Their natural cytoplasmic toxicity and genotoxicity is controlled by antioxidants and enzymic pathways in the cell (Hippeli and Elstner, 1996). Various environmental signals (Bartosz, 1997), which lead to an increase in ROS are associated with induced mechanisms to minimise their harmful effects152 variability in the plant genome by activating transposons (Mhiri et al., 1997), inducing polyploidy, chromosome breakage/rearrangements and base mutations (Figure 2). DNA based changes induced by ROS may inhibit methylating enzymes leading to hypomethylation (Cerda and Weitzman, 1997; Wacksman, 1997), e.g. formation of 8-oxoguanine occurs at high frequency which may lead to mismatch at DNA repair (base mutation, e.g. AT–GC changes), similarly for other oxidative base changes. While much of the ROS-induced effects due to wounding may be localised, some ROS can migrate across membranes, e.g. H2 O2 and cause effects directly, or via membrane lipid peroxidation, at the level of DNA in the stressed cells or in neighbouring cells (Figure 2). A gradient of stress damage from the wound area back into the explant may be hypothesised. ROS (and physical mutagens) have been confirmed in animal cells (Halliwell, 1999) as causing the range of epigenetic and genetic changes in DNA that are problematic in plant tissue culture (Figure 3). number and DNA content (Quicke, 1993; Curry and Cassells, 1998). Techniques including fluorescent in situ hybridisation (FISH) (Maluszynska and HeslopHarrison, 1991) and Giemsa banding (Quicke, 1993) are used to look for somatic recombination, including chromosome breakage and rearrangement and may also be used to detect DNA amplification and reduction. Changes in DNA base methylation can be investigated using methylation-sensitive restrictions in RFLP and AFLP analysis (Karp et al., 1998) or by PCR of bisulphite modified DNA (Joyce et al., 1999).Plant hormones and oxidative stress Plant hormones implicated in hyperhydricity (vitrification) include cytokinins, auxins, and the auxin/cytokinin ratio; gibberellic acid and ethylene (Ziv, 1991; George, 1996). Oxidative stress has been associated with auxin and cytokinin metabolism in Agrobacterium induced tumours (Jia et al., 1996). Ethylene is also strongly linked to oxidative stress (McKersie and Leshem, 1994; Pell et al., 1997). Injury has been shown to activate the oxidation of IAA, while kinetin is reported to be a secondary product of oxidative stress (Barciszewski et al., 1997). In vitro, various media factors have been shown to induce stress, including hormones, and mineral nutrients (Ziv, 1991; George, 1993, 1996) there is evidence that metal toxicities and deficiencies may generate ethylene through oxidative stress (Lynch and Brown, 1997). Oxidative stress also affects cytosolic calcium (Price et al., 1994). Oxidative stress has been suggested as a cause of guard cell malfunction (McAnish et al., 1996). Calcium has been implicated in increased stress tolerance (Gong et al., 1997).Parameters used to characterise oxidative stress A number of methods have been used to monitor/characterise oxidative stress (Figure 6). These include measurement of the redox potential (Reichheld et al., 1999), measurement of stress related metabolites e.g. ascorbic acid (Smirnoff, 1996), glutathione (de Vos et al., 1994), hydrogen peroxide (Schreck et al., 1996). Ethylene has been monitored in hyperhydricity (‘vitrification’) studies (Keevers and Gaspar, 1985) and along with ethane, a marker for lipid peroxidation, has been used to monitor stress in vitro (Cassells et al., 1980; Cassells and Tamma, 1985). Thiobarbituric acid reactive substances are also used to assess lipid peroxidation (Laszczyca et al., 1995). 8oxo-2 -deoxyguanosine (Kasai, 1997; Bialkowski and Olinski, 1999) is considered to be a reliable indicator of genotoxicity as are other bases modified by ROS (Yamamoto et al., 1997). Enzymes of oxidative metabolism (Mehlhorn, 1990), enzymes associated with the cell cycle (Chiatante et al., 1997), enzymes of the SOS response (Laval, 1996; Wiseman and Halliwell, 1996) e.g. poly(ADP-ribose)-polymerase (Amor et al., 1998), screening for heat-shock proteins (HSP) (Burden, 1993) and PR proteins (Glandorf et al., 1997) have also been used as oxidative stress monitors. Karyotyping, flow cytometry and microdensitometry can be used to measure changes in chromosomeRemediation of oxidative and other tissue culture associated stresses Remediation of oxidative stress can be based on several strategies. Genotypes can be screened for their sensitivity to stress and sensitive genotypes avoided; or they can be bred, mutated or engineered for increased stress tolerance (Gupta et al., 1993). The breeding/genetic manipulation options are both relatively long-term and costly and can only be applied to individual genotypes (Jones and Cassells, 1995). An important consideration where plant tissue culture is used for cloning or transformation is the choice153 of the explant. While mature plant tissue may be polysomatic (D’Amato, 1964), this may not be so in the case of the tissues of young plants in vitro (Curry and Cassells, 1998; Curry and Cassells, unpublished). Selection of explants from the latter may avoid the problem of ‘pre-existing’ variation (Figure 4). The main option, however, is the use of stable pathways of multiplication (Figure 1: George, 1993, 1996); albeit, even with these genetic drift may occur (Jemmali et al., 1997). There is evidence e.g. that protoplasts give rise to greater variability than tissue explants as expressed in the greater variability in plants regenerated from the former (Sree Ramulu et al., 1984). This may reflect expression of somatic cell variability (Figure 4) but it also appears to reflect the stress experienced in their isolation and regeneration where they are exposed to osmotic stress and the toxic effects of cell wall degrading enzyme preparations (Cassells and Tamma, 1985) and this has implication for their use in transformation via electroporation or biolistics at the protoplast level (Christou, 1995) as opposed to bombardment of apices (Sautter et al., 1995). The immediacy of the oxidative stress caused by excision and manipulation (Yahraus et al., 1995) would suggest that treatment of the explant after excision would be relatively ineffectual. As an alternative, it is suggested that stress be applied to the donor plant (or in vitro microplant) before excision. Under in vivo conditions, it was been shown that stress exposure induces cross-stress tolerance, e.g. UV treatment not only increases tolerance to further UV exposure but also to pathogen induced stress (Ernst et al., 1992), similarly paclobutrazol which is used in vitro and in weaning treatments (George, 1993, 1996) reduces oxidative stress to high light and high temperature (Mahoney et al., 1998). It has been suggested that induction of heat shock proteins be used to protect against oxidative stress (Banzet et al., 1998; Sebehat et al., 1998). Prevention or reduction of oxidative stress damage in vitro may be possible by manipulating hormone and mineral nutrients using the above oxidative stress parameters to monitor the protocols as has been done in the case of protoplasts (Cassells et al., 1980; Cassells and Tamma, 1985). In vitro calli and tissues are more amenable than intact tissues to permeation techniques. Cells, calli and explants can be infiltrated and bathed with radical scavengers such as ascorbic acid, mannitol and dimethyl sulphoxide. The stress messenger hydrogen peroxide can be broken down by extracellular catalase. Manipulation of media composition, particularly using simple media formulations in autotrophic or microhydroponic culture (Cassells, 2000a) and modification of culture vessel design to facilitate controlled gaseous exchange (Cassells and Walsh, 1998), can greatly influence microplant resistance to hyperhydricity and improve ontogenic development (Cassells, 2000b).Conclusions It is speculated here that problems underlying the application of plant tissue culture systems in plant cloning (micropropagation) and genetic transformation, namely, recalcitrance, hyperhydricity, poor physiological quality, genetic and epigenetic variation, may have a common basis, at least in part, in oxidative stress-induced damage. This damage is caused by an overwhelming of the antioxidative defenses by primary ROS and secondarily, by the production of ROS by lipid autocatalytic peroxidation (Figure 2). While damaged proteins may be broken down by proteinases, damage is fixed in the DNA. Specific levels of DNA damage may result in cell death or programmed cell death, other levels in loss of cell competence, altered methylation patterns may result in epigenetic changes, or in mutations (Figure 3). Somaclonal variation shows a similar spectrum of genetic variation to induced mutation; and oxidative stress and irradiation are known to involve ROS. Oxidative stress damage which it is emphasised is genotype dependent, may also operate at the physiological level since ROS have been shown to influence plant hormones namely, cytokinin, auxin, ethylene metabolism. Oxidative stress also influences calcium metabolism which in turn is involved in auxin transport, guard cell function and as a secondary messenger. Remediation strategies have been proposed and some makers of oxidative stress listed. It is suggested that induction of heat shock proteins may confer cross tolerance to the stress phenomena encountered as problems in establishing plant tissue cultures. Media and environmental manipulations should be carried out aimed at reducing in vitro stress based on adjusting the media composition and the physical culture environment, paying special attention to hormone stress, mineral composition and water and light stress. Reactive nitrogen species (NOS) not discussed here, can。