Summary of Genetic Polymorphism Knowledge
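Genetic Polymorphism Knowledge
I. SNP, LD, Haplotype and Tag SNP
1. Genetic/gene polymorphism
In a randomly mating population, a genetic polymorphism exists when two or more variants (alleles or genotypes) occur at the same locus on a chromosome and each allele has a population frequency above 1%.
It is an important determinant of individual susceptibility to disease, of the diversity of clinical presentation, and of differences in response to drug treatment.
Base variants whose population frequency is 1% or lower are instead called mutations.
Each base type found at the same DNA position on a chromosome is called an allele.
For example, some people carry an A at a given chromosomal position while others carry a G at the same position. Apart from the sex chromosomes, every person has two copies of each chromosome, so the pair of alleles a person carries is called the genotype, e.g. GA, GG or AA; determining a person's genotype is called genotyping.
The observable physical or physiological traits of an organism (a human) that result from the joint action of genotype and environment are called the phenotype.
Restriction fragment length polymorphisms (RFLPs) are the first generation of genetic markers; variable number of tandem repeats (VNTRs) are the second generation. Among the latter, repeats with a unit of 2-6 nucleotides are called microsatellites or short tandem repeats, and those with a unit of 6-12 nucleotides are called minisatellites.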
Polymorphisms are defined as frequent (occurring in greater than 1% of the population) variations in the human DNA sequence. Most involve a single base pair substitution, known as single nucleotide polymorphisms (1), although more complex variations are also recognised. SNPs are single base pair positions in genomic DNA at which different sequence alternatives (alleles) exist in normal individuals in some population(s), wherein the least frequent allele has an abundance of 1% or greater. In principle, SNPs could be bi-, tri-, or tetra-allelic polymorphisms. However, in humans, tri-allelic and tetra-allelic SNPs are rare almost to the point of non-existence, and so SNPs are sometimes simply referred to as bi-allelic markers.
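Single nucleotide polymorphism (SNP): the term was first proposed in 1996 by Lander of the human genome research center at the Massachusetts Institute of Technology. A SNP is a difference in a single base at a specific nucleotide position in the genomic DNA sequence of different individuals, and SNPs are the third generation of genetic markers. Any SNP should occur in the population at a frequency of at least 1%. In principle a SNP can be bi-, tri- or tetra-allelic, but in humans tri- and tetra-allelic SNPs are rare or almost absent, so "SNP" usually just means a biallelic marker. Biallelic substitutions comprise one transition, C/T (G/A), and three transversions, C/A (G/T), C/G (G/C) and T/A (A/T). Because deamination of 5-methylcytosine is relatively frequent, the four types of SNP occur at different frequencies in the genome: about 2/3 are C/T (G/A) transitions, and most lie in non-transcribed sequence.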
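The transition/transversion distinction can be made concrete with a short sketch. The Python below is illustrative only; the function name `substitution_class` is an assumption introduced here and does not come from the notes. It treats a substitution within purines (A, G) or within pyrimidines (C, T) as a transition and anything else as a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(allele1: str, allele2: str) -> str:
    """Classify a biallelic SNP as a transition or a transversion."""
    a, b = allele1.upper(), allele2.upper()
    if a == b or not ({a, b} <= (PURINES | PYRIMIDINES)):
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) = transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in [("C", "T"), ("G", "A"), ("C", "A"), ("C", "G"), ("T", "A")]:
    print(pair, substitution_class(*pair))
```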
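By current estimates, the 3×10^9 bases of the human genome contain at least 10 million SNP sites, commonly quoted as an average of about 1 SNP per 1,000 bp (10 million sites across 3×10^9 bases would be nearer one per 300 bp).
Unlike other genetic markers (such as restriction fragment length polymorphisms and short tandem repeats), SNPs are no longer detected through differences in "length"; the sequence variation itself serves as the marker. This gives them distinctive advantages: high abundance, high stability and suitability for automated analysis.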
English description: SNP markers are preferred over microsatellite markers for association studies, because of their high abundance along the human genome (SNPs with minor allele frequency >0.1 occur once every 600 bp) (Wang et al. 1998), their low mutation rate, and the accessibility of high-throughput genotyping. The power of association studies based on SNPs depends not only on the sample size and density of the marker map but also on many other factors, such as the age and frequency of the disease mutations and SNPs and the extent of linkage disequilibrium (LD) in the region. (2)
SNP sites can be grouped into several broad classes according to their position in the gene sequence.
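Most SNPs have no effect on gene function and are called anonymous SNPs; SNPs located within genes are called gene-based SNPs and include SNPs in introns, exons and promoters.
Among these, SNPs located in protein-coding sequence are called cSNPs or coding SNPs.
Within cSNPs, a SNP that does not change the encoded amino acid sequence is called a synonymous SNP; a SNP that changes the amino acid sequence is called a non-synonymous SNP.
A SNP in the protein-coding region of a gene may cause an amino acid substitution and alter protein function. Most SNPs lie in non-coding regions: a SNP in the promoter region may affect transcription factor binding and change the rate or level of gene transcription; SNPs in the 5' upstream or 3' downstream region may alter the stability of the transcribed mRNA or enhancer activity; the functional effects of intronic SNPs await further study (3).
Many methods are available for detecting SNPs, including direct sequencing, PCR-RFLP, single strand conformation polymorphism analysis (SSCP), heteroduplex analysis (HA), denaturing gradient gel electrophoresis (DGGE), the solid phase chemical cleavage method (spCCM), allele-specific PCR, DNA chip detection and real-time fluorescent quantitative PCR. All have fairly high specificity and sensitivity, and each laboratory can choose a suitable method according to its research aims and budget.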
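2. Haplotype
A combination of SNPs that lie in a particular chromosomal region, are associated with one another and tend to be passed to offspring as a unit is called a haplotype; haplotypes have been likened to "molecular fossils" of human evolutionary history.
If a DNA segment contains n biallelic SNP sites, then in theory up to 2^n haplotypes may exist in the population, but each individual carries only 2 haplotypes.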
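As a small illustration of the 2^n count, the sketch below enumerates every theoretically possible haplotype for a set of biallelic sites. The function name `possible_haplotypes` and the example alleles are hypothetical and chosen only for this example.

```python
from itertools import product


def possible_haplotypes(snp_alleles):
    """Enumerate all allele combinations across sites.

    snp_alleles: one (allele1, allele2) tuple per biallelic SNP site.
    """
    return ["".join(combo) for combo in product(*snp_alleles)]


# Three hypothetical biallelic sites give 2**3 = 8 possible haplotypes.
sites = [("A", "G"), ("C", "T"), ("G", "T")]
haps = possible_haplotypes(sites)
print(len(haps), haps)
```

In real populations far fewer than 2^n haplotypes are usually observed within a region of strong LD, which is what makes the block and tag SNP ideas in the following sections useful.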
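Haplotype construction methods: experimental methods currently include single-molecule dilution, allele-specific PCR (AS-PCR), long-insert cloning and diploid-to-haploid conversion; statistical algorithms include Clark's algorithm, maximum-likelihood algorithms and Bayesian algorithms.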
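3. Haplotype block
Based on linkage disequilibrium among SNPs over large genomic regions, the haplotype structure of the human genome can be described by a relatively simple model: chromosomes contain continuous, stable haplotype regions that have hardly been broken up by recombination, called haplotype blocks (or haploblocks).
Several neighboring, tightly linked SNPs are inherited together and form a haplotype block. A haplotype block may be the smallest unit of inheritance; in extreme cases it can be a single SNP or an entire chromosome, and regions of frequent recombination separate adjacent blocks.
3.1 Definitions of a haplotype block: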
① A haplotype block is a contiguous set of markers in which the average D' (the standardized coefficient of LD (4)) is greater than some predetermined threshold.
② Gabriel et al. (5) described how the human genome can be parsed objectively into haplotype blocks: sizable regions over which there is little evidence for historical recombination and within which only a few common haplotypes are observed. This definition is based on linkage disequilibrium (LD), that is, on large pairwise |D'| values between the SNP pairs within one haploblock.
③ Patil et al. (6) defined a haplotype block as a region with a large proportion (>80%) of inferred common haplotypes. This definition is based on the concept of "chromosome coverage": a haplotype block contains a minimum number of SNPs that account for a majority of common haplotypes, or shows a reduced level of haplotype diversity.
④ Wang et al. (7) further proposed explicit "no historical recombination" as a definition for haplotype blocks, which can be tested using a four-gamete test.
⑤ Ding K et al. (8) chose to define haplotype blocks based on LD when haplotype-block-based tSNP selection methods were employed. The LD-based haplotype-block definition requires that SNP pairs with strong D' (|D'| ≥ 0.70) account for at least 95% of all SNP pairs.
3.2 Algorithms and partitioning criteria for haplotype blocks:
3.2.1 Based on linkage disequilibrium:
① Gabriel criteria (5) for haplotype block partitioning:
❖ Exclude SNPs with MAF below 0.05.
❖ A pair is in "strong LD" if the one-sided upper 95% confidence bound on D' is > 0.98 (that is, consistent with no historical recombination) and the lower bound is above 0.7.
❖ A pair shows "strong evidence for historical recombination" if the upper confidence bound on D' is less than 0.9.
We defined a haplotype block as a region over which a very small proportion (<5%) of comparisons among informative SNP pairs show strong evidence of historical recombination; that is, among the informative pairs (those falling into one of the two categories above), at least 95% must be in strong LD. [We allow for 5% because many forces other than recombination (both biological and artifactual) can disrupt haplotype patterns, such as recurrent mutation, gene conversion, or errors of genome assembly or genotyping.]
② The block definition method based on the D' measure of LD, employed by Gabriel et al., was applied to the SNP genotype data through the HaploView software package (MJ Daly and JC Barrett, Whitehead Institute, MA, USA). Briefly, a block was defined as a region in which less than 5% of SNP pairs had a D' upper confidence bound less than 0.9. In addition, blocks consisting of 2 SNPs could span up to 20 kb and blocks of 3 or 4 SNPs could span up to 30 kb. Blocks were not allowed to overlap (9).
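③ Wang et al. (7) further proposed explicit "no historical recombination" as a definition for haplotype blocks, and gave a block-partitioning algorithm based on the four-gamete test (FGT). First, the four-gamete test is applied to each pair of SNPs (observing all 4 gametes indicates that recombination has occurred), and the four-gamete status of each pair of sites is recorded in a matrix: 1 if all 4 gametes occur, otherwise 0. A haplotype block is defined as an ordered set of SNP markers among which no recombination has occurred. That is, according to the FGT results, SNPs keep being added to a block as long as no more than 3 gametes are seen, until 4 gametes appear at site k; site k then serves as the starting point of a new haplotype block.
One advantage of the FGT algorithm is that no threshold needs to be set in advance; with large samples its results are similar to those of the greedy algorithm.
Haploview four-gamete rule: for each marker pair, the population frequencies of the 4 possible two-marker haplotypes are computed. If all 4 are observed with a frequency of at least 0.01, a recombination is deemed to have taken place. Blocks are formed by consecutive markers where only 3 gametes are observed.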
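The four-gamete procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of Wang et al. or of Haploview: it assumes phased haplotypes given as equal-length strings over biallelic sites, it starts a new block as soon as the incoming site shows four gametes with any site already in the current block, and the function names `four_gametes_observed` and `fgt_blocks` are made up for this example. The 0.01 frequency cut-off follows the Haploview rule quoted above.

```python
def four_gametes_observed(haplotypes, i, j, min_freq=0.01):
    """True if all four two-site haplotypes at sites i and j reach min_freq."""
    n = len(haplotypes)
    counts = {}
    for hap in haplotypes:
        pair = (hap[i], hap[j])
        counts[pair] = counts.get(pair, 0) + 1
    common = [pair for pair, c in counts.items() if c / n >= min_freq]
    return len(common) >= 4


def fgt_blocks(haplotypes, min_freq=0.01):
    """Left-to-right partition into blocks with no four-gamete pair inside."""
    n_sites = len(haplotypes[0])
    blocks, start = [], 0
    for k in range(1, n_sites):
        # Site k opens a new block if it shows 4 gametes with any site in the block.
        if any(four_gametes_observed(haplotypes, i, k, min_freq)
               for i in range(start, k)):
            blocks.append((start, k - 1))
            start = k
    blocks.append((start, n_sites - 1))
    return blocks


# Hypothetical phased haplotypes: sites 0 and 1 never recombine, site 2 does.
haps = ["AAC", "AAT", "GGC", "GGT"]
print(fgt_blocks(haps))  # [(0, 1), (2, 2)]
```

Blocks are returned as inclusive (first site, last site) index pairs.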
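3.2.2 Based on haplotype diversity:
① Patil et al. (6) defined haplotype blocks as a region with a large proportion (>80%) of inferred common haplotypes, and proposed a greedy algorithm to obtain an approximate block partition. All possible blocks formed by consecutive SNPs are considered first; then the block is chosen that maximizes the ratio of the number of SNPs in the block to the minimum number of tag SNPs needed (to distinguish the haplotypes that occur more than once), in other words the block in which the fewest tag SNPs distinguish the most SNPs. Every SNP is assigned to a block. The size of each block is independent of its order on the chromosome, and the blocks have no absolute boundaries.
Two criteria: (1) in each block, at least 80% of the observed haplotypes are represented more than once; and (2) the total number of tag SNPs for distinguishing at least 80% of haplotypes is as small as possible.
② Zhang et al. (10-11) proposed a dynamic programming algorithm for block partitioning. Its principle is to minimize the number of tag SNPs in each block that can represent most of the properties of the block; their algorithm has been implemented as the program HAPBLOCK (http:// /msms/HapBlock/).
Although each of the above methods has its merits, Wall et al. (12) indicated a preference for the first class of methods, for three reasons: first, using D' to detect historical recombination directly seems closer to the definition of a haplotype block; second, pairwise methods are easier to apply to diploid genetic data; and finally, pairwise LD coefficients are easier to visualize.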
3.2.3 Other partitioning criteria
❖ Haplotype block boundaries were inferred from the phased genotype data (probability threshold for correct phase call at each site: 0.95) by D' confidence limits (upper confidence limit > 0.97, lower confidence limit > 0.70, fraction of informative pairs in strong LD: 0.95) using Haploview (/personal/jcbarret/haploview/).
❖ The minimum D' over all SNP pairs is > 0.9 (13-14).
❖ Both r² and D' equal 1 for all SNP pairs (15).
❖ The minimum r² over all SNP pairs is > 0.8 (16).
❖ D' > 0.7 for 95% of SNP pairs (8).
Several neighboring, tightly linked SNPs are inherited together and form a haplotype block, which as a haploblock has a higher discrimination power than the individual SNPs within the block. Candidate haplotype blocks were selected from three major populations (Caucasian, East Asian, and African) using the following parameters: maximum match probability reduction = 0.85, linkage disequilibrium (LD) r² ≥ 0.7, maximum Fst = 0.06 (17), minimum number of SNPs = 3, minimum heterozygosity = 0.2, and minimum number of haplotypes = 3. (18)
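4. Tag SNP (tagger SNP)
A linkage group may contain many SNP sites, but only a few SNPs are needed to identify its haplotype patterns specifically. Such SNPs are called tag single nucleotide polymorphisms (tSNPs); they are representative and characteristic SNPs in the genome, and form the set of genetic markers needed for haplotype construction or association analysis.
Likewise, most haplotypes in a haplotype block can be identified using only a few SNPs or other genetic markers; these markers are called haplotype tag SNPs (htSNPs) (19).
4.1 Difference between tSNPs and htSNPs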
The two terms, htSNPs and tSNPs, refer to two different strategies (8) for choosing the optimal minimum subset of SNPs from the entire set of SNPs. htSNPs are selected based on the haplotype-block model of the LD pattern in a region of interest and represent the common haplotypes inferred from the original set of SNPs. tSNPs, on the other hand, are selected based on measures of association, such that a tSNP predicts partially or completely the state of other SNPs.
4.2 Classification of methods for selecting tSNPs or htSNPs
Eight methods can be classified either as haplotype-block-based methods (All common haplotypes, Haplotype diversity, R2h (coefficient of determination), and Entropy) or as haplotype-block-free methods (TagIT (haplotype r²), LD r² (based on pairwise LD), PCA (principal component analysis), and BEST (based on set theory)).
LD level is based on the following criteria (8): strong LD (D' > 0.8), moderate LD (0.4 < D' ≤ 0.8), and weak LD (D' ≤ 0.4). D' was calculated using LDA software, and the level of LD was assessed by use of sliding-window plots of average D' in each gene.
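① Haploview Tagger criterion: if the LD parameter r² ≥ 0.8 for a pair of SNPs, one SNP can be considered a substitute for the other, so either can be chosen as the tag SNP; the aim is to represent all SNPs with MAF ≥ 0.05 across the whole gene with the smallest number of tag SNPs.
Haplotype tag SNPs (htSNPs) were selected by Haploview on a block-by-block basis. Haplotypes defined by the tagging SNPs within each block of the CCR3 gene for all of the studied subjects were inferred by PedPhase V2.0 (/jili/haplotyping.html) (15). A total of 16 SNPs in and around the CCR3 gene (20) were selected on the basis of the following criteria: (1) validation status, especially in Caucasians, (2) an average density of 1 SNP per 3 kb, (3) degree of heterozygosity, i.e., minor allele frequencies (MAF) > 0.05, (4) functional relevance and importance, (5) reported to dbSNP by various sources.
② In the dynamic programming method of Zhang et al. (10) for building haplotype blocks, htSNPs are selected by enumeration.
Methods of this kind first partition the chromosome into consecutive haplotype blocks and then, by visual inspection or by computation, select from each block the tag SNPs that represent the diversity in the block, requiring the number of tag SNPs to be minimal.
③ The algorithm of Johnson et al. (21) is based on linkage disequilibrium. It first computes the degree of LD between each pair of SNPs; if two SNPs are in high LD, one can be used to predict the other, so only one of them is chosen as an htSNP.
Using the LD parameters, the algorithm removes redundant SNPs, lists all possible htSNP subsets, and then selects the best subset as the htSNPs according to how much of the haplotype variation in the sample each subset explains (proportion of diversity explained, PDE).
④ The principle of Clayton's algorithm (22) is that the genotyped tag SNPs should predict the remaining ungenotyped SNPs well. An algorithm built on this principle selects candidate htSNP subsets, and the htSNPs are again chosen according to PDE. Unlike Johnson's algorithm, this method lets the user discard, before the PDE analysis, htSNP subsets whose coverage falls short of requirements or that contain missing data.
⑤ The algorithm of Stram et al. (23) selects tag SNPs so that they predict the distribution of all SNPs well. It considers the correlation between the true number of copies of each haplotype carried by an individual and the number predicted from the tag SNPs, and uses the squared correlation coefficient R² (> 0.7) as the parameter for selecting htSNPs.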
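The pairwise r² ≥ 0.8 idea in item ① can be illustrated with a greedy sketch. This is not Haploview Tagger's actual implementation; it is a minimal pairwise-tagging heuristic written for this note, and the function name `greedy_tag_snps`, the nested-dictionary input format and the SNP names are all assumptions. At each step it picks the untagged SNP that captures the most other untagged SNPs at the threshold, then removes everything it captures.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy pairwise tagging.

    r2: dict mapping SNP name -> {other SNP name: pairwise r^2} (symmetric).
    Returns a dict mapping each chosen tag SNP to the sorted SNPs it captures.
    """
    untagged = set(r2)
    tags = {}
    while untagged:
        def n_captured(snp):
            return sum(1 for other in untagged
                       if other != snp and r2[snp].get(other, 0.0) >= threshold)

        # Sorting first makes ties resolve to the alphabetically first SNP.
        best = max(sorted(untagged), key=n_captured)
        captured = {best} | {o for o in untagged
                             if r2[best].get(o, 0.0) >= threshold}
        tags[best] = sorted(captured)
        untagged -= captured
    return tags


# Hypothetical pairwise r^2 values for four SNPs.
r2 = {
    "snpA": {"snpB": 0.90, "snpC": 0.85, "snpD": 0.10},
    "snpB": {"snpA": 0.90, "snpC": 0.60, "snpD": 0.20},
    "snpC": {"snpA": 0.85, "snpB": 0.60, "snpD": 0.05},
    "snpD": {"snpA": 0.10, "snpB": 0.20, "snpC": 0.05},
}
print(greedy_tag_snps(r2))
```

With these made-up values, snpA tags snpB and snpC, and snpD has to be genotyped on its own, so two tag SNPs stand in for four.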
In any case, a considerable loss of information about potentially causative variants was associated with all SNP tagging methods. Simply put, more tagSNPs provide more information, and all genotyped SNPs provide the maximum information. While r² tagging with a cut-off of 0.8 is often considered sufficient to capture most information, there is still a loss of 15% compared to the use of the complete marker set. The portion of captured variants obtained with using all markers is comparable to earlier findings for the ENCODE regions with both lower and higher SNP marker densities. (24)
Uses of haplotype blocks, tag SNPs and haplotypes, and study steps (2): Haplotype blocks, together with the corresponding tag SNPs and common haplotypes determined by haplotype block-partitioning algorithms, can be used in genomewide association studies, as well as in the fine-scale mapping of complex disease genes. First, a small number of samples (e.g., 10 or 20 individuals) are chosen to be genotyped at a very dense SNP map in a region, and the haplotypes of these individuals are identified simultaneously. Second, an algorithm for haplotype block partitioning is employed, to identify haplotype block structure and a set of well-spaced tag SNPs. Third, a larger number of samples are genotyped only at these tag SNP marker loci. Fourth, association studies are conducted using all the genotyped samples, with knowledge of the haplotype block structure.
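5. Linkage disequilibrium (LD)
The concept was proposed by Jennings in the early 20th century. It means that SNPs on the same chromosome are not independent: alleles at different loci tend to occur together, more often than expected if the two loci were randomly distributed in the population. It is also called allelic association (25).
LD is the "nonrandom association of alleles at different loci". LD is widely used in estimating population-genetic parameters, fine mapping of genes and association analysis; in essence, association analysis detects LD between genetic markers and traits.
In general, LD can arise from mutation, random drift, bottlenecks and population admixture, and it decays as time passes and as the genetic distance between loci increases.
LD is distinct from linkage (26): linkage refers to the correlated inheritance of loci through the physical connection on a chromosome, whereas LD refers to the correlation between alleles in a population.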
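5.1 Measures of linkage disequilibrium
There are many measures of LD, such as the correlation coefficient r², Lewontin's D', the population attributable risk δ, Yule's Q, and the proportional difference d of Kaplan and Weir (27). Most apply to pairwise tests of biallelic loci, and the most widely used are D' and r².
In a simplified model, suppose two loci have alleles A, a and B, b with allele frequencies $\pi_A$, $\pi_a$, $\pi_B$, $\pi_b$; four haplotypes can form, with frequencies $\pi_{AB}$, $\pi_{aB}$, $\pi_{ab}$, $\pi_{Ab}$.
The basic component of all LD statistics is the difference between the observed and expected haplotype frequencies: $D_{ab} = \pi_{AB} - \pi_A \pi_B$.
Because the range of D depends on the allele frequencies, this formula is usually not used directly as a measure; D is normalized first.
The two most commonly used normalized LD measures are r² and Lewontin's coefficient of linkage disequilibrium D'. $r^2$ (also described as $\Delta^2$) is $r^2 = D_{ab}^2 / (\pi_A \pi_a \pi_B \pi_b)$ and is interpreted as the squared correlation coefficient between the alleles at the two loci. $D' = |D_{ab}| / D_{\max}$, where $D_{\max} = \min(\pi_A \pi_b, \pi_a \pi_B)$ if $D_{ab} > 0$ and $D_{\max} = \min(\pi_A \pi_B, \pi_a \pi_b)$ if $D_{ab} < 0$ (26).
After normalization, both measures range from 0 (no LD, i.e. linkage equilibrium) to 1 (complete linkage disequilibrium).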
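The definitions of D, D' and r² can be checked numerically with a short sketch. The function name `ld_stats` and its argument names are assumptions made for this illustration; the inputs are the AB haplotype frequency and the frequencies of alleles A and B, and both loci are assumed to be polymorphic.

```python
def ld_stats(p_AB, p_A, p_B):
    """Return (D, D', r^2) for two biallelic loci.

    p_AB: frequency of the AB haplotype; p_A, p_B: frequencies of alleles A and B.
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B
    # The largest possible |D| depends on the sign of D and on the allele frequencies.
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, d_prime, r2


# Equal allele frequencies, only AB and ab haplotypes present.
print(ld_stats(0.5, 0.5, 0.5))
# Allele B (frequency 0.2) occurs only on A chromosomes, so three haplotypes exist.
print(ld_stats(0.2, 0.5, 0.2))
# Haplotype frequency equals the product of the allele frequencies.
print(ld_stats(0.25, 0.5, 0.5))
```

The first case gives D' = 1 and r² = 1; the second gives D' = 1 but r² of about 0.25, because the allele frequencies at the two loci differ; the third gives D = 0 and both measures equal to 0.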
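The measurement of LD is a large and complex topic and will not be reviewed in detail here; but see the work of Devlin and Risch (1995), Jorde (2000) and Hudson (2001). Most of the measures of LD that are in wide use quantify the degree of association between pairs of markers. In part, they differ according to the way in which they depend on the marginal allele frequencies. In the present article, we use one popular measure of LD between pairs of biallelic markers, commonly denoted by r² (elsewhere, r² is also denoted by Δ²) (28).
D' can be regarded as a measure that is independent of allele frequency. It takes its maximum value of 1, complete LD, when no recombination event is observed between the loci tested; in this case at most 3 of the 4 possible haplotypes of the two loci appear in the sample.
If D' < 1, recombination has occurred between the two loci (a new mutation can also cause D' < 1, but for SNPs the probability of mutation is much smaller than that of recombination). In this case all 4 haplotypes can appear, but the meaning of the relative magnitude of D' becomes unclear (for example, the difference between D' = 0.3 and D' = 0.7 is hard to interpret). So a D' close to 1 suggests that historical recombination between the two loci is unlikely, but intermediate values of D' should not be used to compare the degree of LD between pairs of loci.
Moreover, D' is markedly inflated in small samples, especially for SNPs with a rare allele (minor allele frequency < 5%), so a high D' may be observed even when the two markers are in linkage equilibrium.
r², in contrast, is a frequency-dependent measure that represents the statistical correlation between the two loci. r² equals D² divided by the product of the four allele frequencies at the two loci, and can in fact be derived from the formula for the test of independence between the two loci.
r² = 1 is called perfect LD: the two loci have not been separated by recombination and have the same allele frequencies; in this case only 2 of the 4 possible haplotypes (AB and ab) appear in the sample. In addition, r² is not markedly inflated in small samples.
Although both r² and D' have shortcomings as measures of LD with small samples or low allele frequencies, each has its own advantages. r² summarizes both recombination and mutation; D' reflects only recombination and is a more precise measure of whether recombination has occurred, but it is strongly affected by small samples and low allele frequencies, so it needs to be complemented by the r² statistic.
D' is good at identifying the "block" structure of LD and r² is better for defining the "associated interval" and identifying potential causal variants. (29)
Description of the LD measures D' and r²: linkage disequilibrium (LD) describes the relationship between genotypes at a pair of polymorphic sites. Several popular statistics exist for describing LD; the two most frequently used are D' and r² (sometimes referred to as "Δ²") (Devlin and Risch 1995). D' = 1 if neither site has experienced recurrent mutation or gene conversion and if there has been no recombination between the sites. "D' = 1" can be described as "complete LD", because the allelic association is as strong as possible, given the allele frequencies at the two sites. However, genotypes can be perfectly correlated between sites only if their MAFs are the same. Only when genotypes are perfectly correlated does r² = 1, which can be described as "perfect LD." (30)
The magnitude of r² is directly related to the power of association analysis.
When LD between a disease susceptibility locus and a genetic marker is measured by r², the sample size must be increased by a factor of 1/r² to reach a significance level similar to that of testing the susceptibility locus directly (31).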
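The 1/r² rule translates directly into a sample-size calculation. The sketch below is only a worked illustration of that scaling; the function name `required_sample_size` and the example numbers are hypothetical.

```python
import math


def required_sample_size(n_direct, r2):
    """Sample size needed at a marker in LD (r^2) with the causal site.

    n_direct: sample size that would suffice if the causal site were typed directly.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


# If 1,000 samples suffice at the causal site, a marker with r^2 = 0.5 needs twice as many.
print(required_sample_size(1000, 0.5))
```

Under this rule a marker with r² = 0.5 doubles the required sample, which is one reason tagging thresholds such as r² ≥ 0.8 are chosen to keep the inflation modest.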
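Factors affecting LD include allele frequency, recombination (weakens LD), mutation, population size (genetic drift in small populations raises the level of LD), mating pattern (outcrossing weakens LD, self-fertilization strengthens it) and admixture (admixture is gene flow between individuals of genetically distinct populations followed by intermating), among others.
5.2 Association analysis uses LD between alleles to identify the genomic DNA segments responsible for particular traits, and studies the relationship between genotype and phenotype.
Open problems:
Linkage analysis and association analysis are well-known methods in genetic epidemiology.
Linkage analysis is a family-based method and is not suitable for genetic studies of sepsis, so association analysis is used for such studies instead.
The basic principle of a genetic association study is to test for association between variation in the human genome and clinical presentation.
At present, studies of sepsis and genetic polymorphisms face the following main problems. First, a lack of replication studies. Second, insufficient sample sizes, which lead to type II errors. Third, multiple testing (for example of many SNPs, subgroups and outcomes), population structure and spurious associations, which lead to type I errors. Fourth, a poor understanding of the functional importance of many SNPs: it is unclear by what route candidate genes should be selected, and the SNP studied may not be the main causal factor or may not be in strong LD with the causal SNP. Fifth, how to define the control group.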
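II. Summary of SNP Genotyping Methods
By genotyping principle:
Based on base extension: direct sequencing, various chips, SNPscan, SNaPshot, MassArray, LDR, TaqMan
Based on restriction endonucleases: RFLP, mF-RFLP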
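By throughput:
1. Genome-wide level (GWAS): Illumina (Infinium HD BeadChips); Affymetrix chips
2. Candidate SNP genotyping
1) High multiplex: GoldenGate (Illumina), 96×N sites (N = 1-16)
SNPscanTM (Genesky), 48×N sites (N = 1-10, usually 1-4)
2) Medium multiplex: Multiplex SNaPshot, 6-30 sites
MassArray (Sequenom), 6-30 sites
iMLDR (Genesky), 12-30 sites
3) Low multiplex: fluorescent multiplex LDR, 4-8 sites
mF-RFLP (Genesky), 1-3 sites for 3-6 samples
4) Single-site analysis: TaqMan (ABI)
RFLP
5) Other: direct sequencing (Sanger dideoxy chain-termination sequencing)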
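Comparison of SNP genotyping methods, with advantages and disadvantages
PCR-RFLP (restriction digestion):
Advantages: low cost and flexible; not constrained by the number of samples or sites
Disadvantages: 1. Not every SNP has a restriction site
2. Digestion is not very accurate, and reading the results involves considerable subjective judgement
3. 5-10% of the samples need to be verified by sequencing
4. Labor-intensive, with a low degree of automation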
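TaqMan probes
Advantages: 1. High international acceptance
2. Low cost for large sample sizes
3. High data accuracy, >98%
Disadvantages: 1. Only one site is genotyped per assay; with many sites the experiment takes longer and consumes more sample
2. Reagents take 2-3 months to order
3. For about 1/10 of sites, probe synthesis fails or large-batch genotyping results cannot be read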
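SNaPshot: a genotyping technique based on fluorescently labeled single-base extension. The reaction contains a sequencing enzyme, four fluorescently labeled ddNTPs, extension primers of different lengths that end immediately 5' of the polymorphic sites, and the PCR product as template. Each primer is extended by one base and then terminates. After detection on an ABI sequencer, the position of each peak identifies the SNP site to which the extension product corresponds, and the color of the peak identifies the incorporated base, which together give the genotype of the sample.
It is usually used to analyze 10-30 SNP sites.
Advantages: 1. Accurate genotyping: its accuracy is second only to direct sequencing, with success and accuracy rates >98%
2. Multiple sites detected simultaneously: 12-20 sites can be detected at once
3. Not restricted by the type of polymorphism: whether the site is G/C, A/T, G/A or C/T, or even some insertion/deletion polymorphisms, and whichever chromosome it lies on, all sites can be detected in one reaction
4. Not restricted by sample size: projects of 100 to 5,000 samples can all be handled; the cost is determined mainly by the number of sites, and the number of samples can vary
5. Contaminated samples can be detected: if the genotyping peak pattern of a sample deviates from the normal distribution, this suggests that the sample may be contaminated or too dilute, which other genotyping methods cannot show
6. Very high international acceptance and a large literature
Disadvantages: 1. Genotyping cost is not low: depending on the number of samples and sites, the service price is 6-15 yuan
2. Genotyping cost does not fall quickly as the sample size increases, so the method has an advantage in medium-sized projects but not in large-sample projects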
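MassArray (mass spectrometry): matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS).
The target sequence is amplified by PCR, an extension primer specific to the SNP sequence is added, and the primer is extended by 1 base at the SNP site. The prepared analyte is co-crystallized with the chip matrix, and the crystal is placed in the vacuum tube of the mass spectrometer and excited by an intense laser, so that the matrix crystal sublimes and metastable ions are produced. Most of the ions produced are singly charged. They gain the same kinetic energy in the accelerating electric field, are then separated according to their mass-to-charge ratio in a field-free drift region, and fly through the vacuum tube to the detector. Ions produced by MALDI are usually detected with a time-of-flight (TOF) detector: the smaller the ion mass, the sooner it arrives.
The genotype is determined from the ion mass.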
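The statement that ions with equal kinetic energy separate by mass-to-charge ratio can be written out as a short derivation. This is the standard idealized time-of-flight relation, given here as a sketch; the symbols (accelerating voltage $U$, drift length $L$, ion mass $m$, charge number $z$, elementary charge $e$, velocity $v$, flight time $t$) are notation introduced for this illustration, and initial velocity spread and instrument-specific corrections are ignored.

$$
z e U = \tfrac{1}{2} m v^{2}
\quad\Longrightarrow\quad
v = \sqrt{\frac{2 z e U}{m}},
\qquad
t = \frac{L}{v} = L \sqrt{\frac{m}{2 z e U}}
$$

For singly charged ions ($z = 1$) the flight time grows with $\sqrt{m}$, so lighter extension products reach the detector first, and the mass difference between the incorporated bases is what distinguishes the alleles.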
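Advantages: 1. Multiple sites detected simultaneously (12-24)
2. Low cost (service price 5-10 yuan/SNP)
3. High throughput and fast turnaround
4. High international acceptance and a fairly large literature
Disadvantages: 1. The site success rate is only about 93%; SNPs in high-GC regions or regions with complex secondary structure are not suitable
2. Data accuracy is 95-96%, mainly because it is an indirect detection method (the SNP is inferred from the ion mass-to-charge ratio)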
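LDR (ligase detection reaction): a thermostable ligase is used to recognize polymorphic sites.
If there is a point-mutation-type base mismatch at the junction between the DNA and the two complementary oligonucleotide probes, the thermostable ligase cannot carry out the ligation reaction.
Advantages: 1. Low cost
2. Not limited by the number of samples or sites
3. High accuracy: ligase-based, >98%
Disadvantages: 1. Few sites can be detected simultaneously (4-10)
2. Moderate international acceptance; some validation experiments are needed
3. Long workflow (2-3 days) and complex operation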
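SNPscanTM: a SNP genotyping method based on multiplex probe ligation
Advantages: 1. 48 or more SNPs can be genotyped simultaneously
2. Lower cost: for projects with more than 1,000 samples and more than 48 sites, the cost is the lowest of all current service technologies
3. Accuracy >98%; with samples that meet the requirements, the success rate is >98%
Disadvantages: 1. May be unsuitable for sites in high-GC regions, complex sequence regions or CNV regions
2. High sample requirements: a total of 300 ng of DNA, preferably at a concentration of at least 20 ng/µl; a Nanodrop OD 260/280 ratio of 1.8-2.0; and no DNA degradation on electrophoresis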
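GoldenGate chip
Advantages: 1. High throughput
2. Low cost per site
Disadvantages: 1. Data accuracy is about 93% and the success rate is 93%; chip data accuracy is moderate
2. Custom chip production takes a long time
3. Not suitable for validating genome-wide SNP chip data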
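The principle of Sanger sequencing is to add 4 dideoxynucleoside triphosphates (ddNTPs), each with a different fluorescent label, to an ordinary primer extension reaction. Once a ddNTP is incorporated into the growing strand, the polymerase cannot extend that strand any further, so the reaction yields fluorescently labeled extension products of every length; separating these products by electrophoresis gives the sequence of the target DNA.
It is the gold standard for genotyping, but the cost is high.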
Pyrosequencing (Pyrosequencing AB, Uppsala, Sweden): Pyrosequencing is a DNA sequencing technique that is based on the detection of released pyrophosphate during DNA synthesis. In a cascade of enzymatic reactions, visible light is generated that is proportional to the number of incorporated nucleotides. The light generated provides a visual display called a pyrogram. For analysis of SNPs, the 3' end of a primer is designed to hybridize to one or a few bases upstream of the polymorphic position. The pyrogram display demonstrates a clear distinction between the various genotypes; each allele combination (homozygous or heterozygous) provides a distinct pattern when compared with the two other possible variants. Pattern recognition software is used to compare the generated pyrogram with patterns predicted for each of the three possible genotypes. (Alderborn A, Kristofferson A, Hammerling U. Determination of single nucleotide polymorphisms by real-time pyrophosphate DNA sequencing. Genome Res. 2000;10:1249-1258.)