Mutation Nomenclature
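Molecular Pathology Laboratory Report (分子病理检测报告)
Laboratory: 广州金域医学检验中心
Patient: female, 72 years old. Specimen condition: no abnormality visible to the naked eye.
The other header fields of the form (referring hospital, name, specimen barcode, clinical diagnosis, department, ward/bed number, requesting physician, outpatient/inpatient number, specimen submitted, sampling time, receipt time, laboratory number, contact telephone) are blank.
Recommendations and interpretation:
1. Hotspot mutation testing of 22 genes was performed on the patient's plasma cell-free DNA.
2. A mutation was detected in exon 20 of the EGFR gene in this sample. The mutation is named NM_005228.3:c.2369C>T (p.T790M).
3. This mutation may make the patient resistant to targeted therapy with small-molecule TKIs (such as Iressa (gefitinib) and Tarceva (erlotinib)).
4. A polymorphic site, NM_005228.3:c.2175G>A (p.Thr725=) (SNP: rs55959834), was detected in exon 18 of this sample. Current data indicate that this site has no clinical predictive value.
5. A pathologist assessed the morphology of the H&E slides for the sample used in this test and determined the tumor cell content to be 20%.
6. Please use this result together with other clinical information and other test results.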
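The following information comes from the literature and is for reference only:
1. ctDNA (circulating tumor DNA) is a fraction of cfDNA (cell-free DNA), accounting for about 0.01%-10% of cfDNA. Most studies hold that tumor cells release ctDNA into the circulation during necrosis, apoptosis, and secretion.
2. ctDNA has a short half-life and therefore reflects the current state of the tumor. False positives are rare in ctDNA testing: studies have found that ctDNA carries the same molecular genetic alterations as genomic DNA from the primary tumor tissue, such as point mutations, copy number changes, methylation status, and gene rearrangements.
3. Liquid biopsy based on ctDNA is clinically important:
(1) it reflects the overall heterogeneity of the tumor and provides more genetic information;
(2) it correlates well with the clinical course and can predict acquired resistance in advance;
(3) it can detect tumor progression earlier than imaging and provides a more specific and sensitive indicator for monitoring metastasis, recurrence, and treatment response;
(4) ctDNA is more sensitive than CTCs (circulating tumor cells) and is better suited as a cancer biomarker.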
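Variant names such as the ones in the report above combine a transcript accession, a coding-DNA change, and a protein change. A minimal sketch of splitting such a string into its parts follows; the function name `parse_report_variant` and the regular expression are illustrative assumptions, and the pattern only covers the simple "transcript:c.substitution(p.change)" layout used in this report.

```python
import re

# Hypothetical pattern for strings like "NM_005228.3:c.2369C>T(p.T790M)".
# Only single-nucleotide substitutions at coding-DNA level are covered.
REPORT_VARIANT = re.compile(
    r"^(?P<transcript>NM_\d+\.\d+):"
    r"c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"\s*\(p\.(?P<protein>[^)]+)\)$"
)

def parse_report_variant(text):
    """Return the parts of a report-style variant name, or None if it does not match."""
    match = REPORT_VARIANT.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["pos"] = int(fields["pos"])  # position on the coding DNA reference
    return fields

if __name__ == "__main__":
    print(parse_report_variant("NM_005228.3:c.2369C>T(p.T790M)"))
    print(parse_report_variant("NM_005228.3:c.2175G>A(p.Thr725=)"))
```

Keeping the transcript version (".3") in the parsed result matters, because coding positions are only meaningful relative to a specific reference sequence.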
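The Mutated Gene in Marfan Syndrome
Marfan syndrome is a heritable connective tissue disorder characterized by involvement of multiple organ systems.
It was first described in 1896 by the French pediatrician Antoine Bernard-Jean Marfan and is named after him.
Patients are often tall, with a long, narrow skull and long arms and legs.
The eyes, heart, skeleton, and lungs can also be affected.
Marfan syndrome is caused by mutations in FBN1 (fibrillin-1), a gene located on chromosome 15.
FBN1 encodes a structural protein called fibrillin-1.
Fibrillin-1 is a major component of the microfibrils of the extracellular matrix, which in turn provide the scaffold for elastic fibers in connective tissue.
The connective tissue abnormalities of Marfan syndrome are therefore mainly attributable to defective fibrillin-1.
FBN1 mutations are the principal cause of Marfan syndrome. In most cases the mutation is inherited from a parent; in the remainder it arises de novo.
The great majority of patients with Marfan syndrome carry an FBN1 mutation.
These mutations may be point mutations, deletions, or insertions, and they alter the structure or function of fibrillin-1.
Because fibrillin-1 takes part in forming and maintaining the extracellular matrix, and the extracellular matrix serves as the scaffold that maintains tissue structure and physiological function, abnormal fibrillin-1 affects the development and function of many organs.
The clinical presentation is highly variable, because different FBN1 mutations impair fibrillin-1 to different degrees and in different ways.
The most common signs are tall stature, long arms and legs, a narrow chest, a long, narrow face, and eye abnormalities, most characteristically lens dislocation (ectopia lentis).
The cardiovascular system is also affected: heart valve abnormalities (especially of the mitral and aortic valves), aortic aneurysm, and cardiomyopathy are common.
In the musculoskeletal system, scoliosis, pectus excavatum, and hip dislocation are relatively common.
Different types of FBN1 mutation can involve different organs to different degrees, which explains why the presentation varies so widely.
Studies have found that patients with some mutation types have only mild symptoms, while patients with others develop severe organ disease.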
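EGFR Mutation Types
EGFR (epidermal growth factor receptor) is an important gene that is mutated in many tumors and affects cell proliferation and growth.
EGFR mutations are common driver mutations in lung cancer and some other tumors.
Many different types of EGFR mutation have been identified.
The most common are L858R and exon 19 deletions.
L858R means that amino acid 858 of EGFR changes from leucine (L) to arginine (R); an exon 19 deletion means that part of exon 19 of the EGFR gene is deleted.
Besides L858R and exon 19 deletions, there are rarer EGFR mutation types, such as T790M, G719X, and L861Q.
These mutation types may occur in particular tumor types and have different clinical significance.
Knowing the specific type of EGFR mutation in a tumor is important for designing an individualized treatment plan.
By testing tumor tissue for EGFR mutations, a suitable targeted therapy can be chosen, such as a tyrosine kinase inhibitor (TKI) that blocks the abnormal signaling caused by the EGFR mutation and thereby inhibits tumor growth and metastasis.
Note that different regions and laboratories may use different naming conventions to describe EGFR mutation types.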
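Because laboratories write the same EGFR change in different ways, it can help to normalize the shorthand form (L858R) to a three-letter protein-level description (p.Leu858Arg). The sketch below is illustrative only: the function name `to_three_letter` is an assumption, and the trailing "X" in names such as G719X is treated here as "any amino acid" and rendered as Xaa.

```python
import re

# Standard one-letter to three-letter amino acid codes.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

SHORTHAND = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")

def to_three_letter(name):
    """Convert shorthand such as 'L858R' to 'p.Leu858Arg' (hypothetical helper)."""
    match = SHORTHAND.match(name.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {name!r}")
    ref, pos, alt = match.groups()
    if ref not in AA3:
        raise ValueError(f"unknown reference amino acid: {ref!r}")
    # Assumption: a trailing X (as in G719X) means "any substitution".
    alt3 = "Xaa" if alt == "X" else AA3.get(alt)
    if alt3 is None:
        raise ValueError(f"unknown amino acid: {alt!r}")
    return f"p.{AA3[ref]}{pos}{alt3}"

if __name__ == "__main__":
    for shorthand in ("L858R", "T790M", "G719X", "L861Q"):
        print(shorthand, "->", to_three_letter(shorthand))
```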
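How to Write a Mutation at a Given Position in a Gene
A frequently asked question: when describing a SNP, should the two alleles be called "wild type" and "mutant"?
The usual answer is that "mutation" should not be used; many people suggest "change" or "variation" instead. The general consensus is:
Sequence variation: covers both mutation and polymorphism.
Mutation: often used for a disease-causing change.
Polymorphism: often used for a variation with a frequency above 1%, usually regarded as a non-disease-causing change.
Nomenclature:
1: Different prefixes for different sequence types. Variations are described on genomic, cDNA, mitochondrial, RNA, and protein sequences. To avoid confusion, a prefix distinguishes them:
"g." for a genomic sequence (e.g., g.76A>T)
"c." for a cDNA sequence (e.g., c.76A>T)
"m." for a mitochondrial sequence (e.g., m.76A>T)
"r." for an RNA sequence (e.g., r.76a>u)
"p." for a protein sequence (e.g., p.K76A)
2: Naming SNPs (SNPs are usually described at the DNA level, so the rules given here are for single-nucleotide changes in DNA; for other cases, consult the literature when needed).
1) Position
Coding region: the A of the ATG (translation initiation codon) is +1, and the following exonic nucleotides are +2, +3, and so on. The first nucleotide upstream (5') of the A is -1, and so on; there is no 0.
Untranslated regions of exons (UTR): in the 5' UTR, nucleotides are numbered -1, -2, -3... going 5' from the A of the ATG. In the 3' UTR, nucleotides are numbered *1, *2, *3... going 3' from the translation termination codon.
Introns: for a position in the upstream part of an intron (close to the preceding exon), the last (3') nucleotide of the preceding exon is the reference point. If that nucleotide is 77, the first intronic nucleotide is 77+1, the second is 77+2, and so on.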
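The "no position 0" rule is easy to get wrong when converting between array offsets and c. coordinates. The sketch below illustrates it under stated assumptions: the helper names `offset_to_c`, `c_distance`, and `intron_start_label` are hypothetical, and only the 5' UTR and coding region are handled (not *n positions in the 3' UTR).

```python
def offset_to_c(offset):
    """Map a signed offset from the A of ATG (0 = the A itself) to a c. position."""
    # Offsets 0, 1, 2... become 1, 2, 3...; negative offsets are kept,
    # so the sequence of positions runs ..., -2, -1, 1, 2, ... with no 0.
    return offset + 1 if offset >= 0 else offset

def c_distance(a, b):
    """Number of nucleotide steps from c.a to c.b, skipping the absent position 0."""
    if a == 0 or b == 0:
        raise ValueError("there is no position 0 in c. numbering")
    distance = b - a
    if a < 0 < b:
        distance -= 1
    elif b < 0 < a:
        distance += 1
    return distance

def intron_start_label(last_exon_nt, k):
    """Label the k-th nucleotide at the start of an intron, e.g. 77+1."""
    if k < 1:
        raise ValueError("intronic offset starts at 1")
    return f"{last_exon_nt}+{k}"

if __name__ == "__main__":
    print([offset_to_c(i) for i in range(-3, 3)])  # [-3, -2, -1, 1, 2, 3]
    print(c_distance(-1, 1))                       # adjacent nucleotides
    print(intron_start_label(77, 2))
```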
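The naming of SNP sites is in fact not uniform; the literature mostly uses customary or conventional names.
These take the following forms:
一、The position is placed between the two alleles
There are three main variants: the cDNA position is placed between the two nucleotides, as in C188T; the DNA position is placed between the two nucleotides, as in A2546G; or the amino acid position is placed between the two amino acids, as in Glu145Lys.
二、Conventional names assigned in order of discovery or frequency
These also come in several forms, such as CYP2D6*10 and CYP2C9*3.
Some names carry an "m" for mutation, as in cyp2c19m2, and others also appear in the literature, such as the c1>c2 mutation of CYP2E1.
In short, the forms are varied and can be confusing.
The following website, which maintains allele nomenclature for CYP-family SNPs, may be helpful: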
http://www.cypalleles.ki.se/
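三、NCBI rs numbers
After NCBI has classified and verified the submitted SNPs, it assigns each an rs number, also called a reference SNP, and provides detailed information about it, including flanking sequence, position, and allele frequencies. The rs number is the most reliable way to identify a SNP unambiguously.
四、Points to note
First, as gene annotations are continually revised and extended, many of the original SNP positions have changed. Treat a label such as C188T as a name only, and do not go looking for the SNP at position 188. If you find a discrepancy in position, there is no need for alarm; it is most likely the result of updated gene information.
Second, the same SNP may have two rs numbers at NCBI. This is not a problem, as long as you identify the right SNP.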
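The naming styles described above can be told apart mechanically. The following sketch classifies a name by its shape; the function name `classify_snp_name` and the patterns are illustrative assumptions that cover only the example forms quoted in the text, not every convention found in the literature.

```python
import re

# Patterns are checked in order; the first match wins.
PATTERNS = [
    ("rs_id", re.compile(r"^rs\d+$")),                            # rs55959834
    ("star_allele", re.compile(r"^[A-Z][A-Z0-9]*\*\d+[A-Z]?$")),  # CYP2D6*10
    ("amino_acid", re.compile(r"^[A-Z][a-z]{2}\d+[A-Z][a-z]{2}$")),  # Glu145Lys
    ("nucleotide", re.compile(r"^[ACGT]\d+[ACGT]$")),             # C188T
]

def classify_snp_name(name):
    """Return a label for the naming style of `name`, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.match(name):
            return label
    return "unknown"

if __name__ == "__main__":
    for example in ("C188T", "A2546G", "Glu145Lys", "CYP2D6*10",
                    "rs55959834", "cyp2c19m2"):
        print(example, "->", classify_snp_name(example))
```

Only the rs_id class identifies a variant unambiguously; as noted above, a nucleotide-style label such as C188T should be treated as a name rather than a coordinate.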
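Naming Rules for Common Terms in Genetic Engineering
肖业臣 (Xiao Yechen), 魏剑波 (Wei Jianbo) (Department of Sericulture Science, South China Agricultural University, Guangzhou, Guangdong 510642)
1 Naming of genes
Gene symbols were originally written as a single capital letter, the first letter of the English name.
For example, T stands for threonine, T+ for the wild-type threonine synthase gene, and T- for the defective threonine synthase gene.
As the number of genes grew and many mutant types appeared, this set of symbols became insufficient.
In 1966, M. Demerec et al. proposed a new set of gene naming rules: 1) each locus is designated by three lowercase italic letters, taken from the first three letters of the word describing the gene's characteristic; 2) different genes whose mutants have the same phenotype are distinguished by a capital letter added after the three letters; 3) different mutation sites in the same gene are designated by Arabic numerals added after the gene symbol, and if the gene to which a mutation site belongs is not yet determined, the capital letter is replaced by a hyphen.
Under these rules, the histidine synthesis genes are written his, and the individual genes hisA, hisB; the tryptophan synthesis genes are written trp, and the individual genes trpA, trpB, and so on; the mutant alleles of trpA are written trpA23, trpA46.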
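The three rules above map directly onto the structure of a symbol such as trpA23: locus (trp), gene letter (A), and mutation-site number (23). A minimal parsing sketch follows; the function name `parse_bacterial_symbol` is an assumption, italics are ignored, and superscript "+"/"-" markers are not handled.

```python
import re

# locus: three lowercase letters; gene: one capital letter, or a hyphen when
# the gene is undetermined; allele: optional Arabic numeral for the mutation site.
SYMBOL = re.compile(r"^([a-z]{3})([A-Z]|-)?(\d+)?$")

def parse_bacterial_symbol(symbol):
    """Split a bacterial gene symbol into locus, gene letter, and allele number."""
    match = SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not a recognised symbol: {symbol!r}")
    locus, gene, allele = match.groups()
    return {
        "locus": locus,
        "gene": None if gene in (None, "-") else gene,
        "allele": int(allele) if allele else None,
    }

if __name__ == "__main__":
    for example in ("trp", "hisB", "trpA23", "trp-46"):
        print(example, "->", parse_bacterial_symbol(example))
```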
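If a gene's characteristic cannot be expressed in one word, the initial letters of two or three words are used.
For example, rpl indicates a gene related to the large ribosomal subunit (from the initials of the three words ribosomal protein large).
rps indicates a gene related to the small ribosomal subunit (from the initials of the three words ribosomal protein small); rim relates to ribosome assembly and maturation (from the two words ribosomal modification).
A mutant or wild-type allele is indicated by a superscript "+" or "-" to the upper right of the locus symbol.
For example, hisA- denotes the histidine-deficient gene and hisA+ the corresponding wild-type gene. The locus associated with streptomycin resistance is written strR (or str-r), and the sensitive wild-type gene strS (or str-s).
Some other common gene symbols have specific meanings: inc stands for incompatibility; rep for replication; tra for transfer; fin for fertility inhibition; ori for origin of replication; dam for DNA adenine methylase; and so on.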
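TCGA Naming Rules
TCGA (The Cancer Genome Atlas) is a large collaborative program, led by the US National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), that aims to characterize the mutations of cancer genomes comprehensively and so advance individualized cancer treatment.
To make the data easier to manage and process, TCGA defined naming rules that identify cancer types and samples.
The rules cover two things: an abbreviation for each cancer type, and a barcode for each sample.
They are explained below.
Each cancer type has a specific abbreviation that identifies it.
The abbreviation is derived from the English name of the cancer type, for example BRCA for breast cancer (breast invasive carcinoma) and LUAD for lung adenocarcinoma.
These abbreviations make the cancer types easy to recognize and distinguish. They name the study and are not part of the sample barcode.
Each sample has a unique identifier, the TCGA barcode.
At sample level the barcode has 16 characters and consists of the following hyphen-separated parts: the project (TCGA), a two-character tissue source site (TSS) code, a four-character participant code, and a two-digit sample type followed by a vial letter.
The participant code distinguishes different patients from the same tissue source site.
The sample type is a two-digit code: 01-09 are tumor types (01 is Primary Solid Tumor), 10-19 are normal types (11 is Solid Tissue Normal), and 20-29 are control samples.
The vial letter distinguishes different vials of the same sample type from the same patient, for example A for the first vial and B for the second.
A full aliquot barcode adds three more parts, for example: TCGA-02-0001-01C-01D-0182-01.
Here TCGA is the project, 02 is the tissue source site, 0001 is the participant, 01C is a primary solid tumor sample (01) in vial C, 01D is the first portion of the sample (01) with analyte type D (DNA), 0182 is the plate, and the final 01 is the center that received the aliquot for analysis.
Data file names are described in the same way.
A typical data file name is given as: TCGA-XX-XXXX-01A-01D-0010-XX.gene_expression.txt.
Here TCGA indicates a TCGA project data file, XX is the tissue source site, XXXX is the participant code, 01A is the sample type and vial, 01D is the portion and analyte, 0010 is the plate, the second XX is the center, and the suffix indicates the type of data in the file (for example, gene expression).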
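A TCGA barcode can be split into its hyphen-separated fields with a single regular expression. The sketch below assumes the commonly documented layout (project, tissue source site, participant, sample type plus vial, portion plus analyte, plate, center) and accepts barcodes truncated at any level from participant onward; the function name `parse_tcga_barcode` and the `sample_class` field are illustrative, not part of any official tool.

```python
import re

BARCODE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"(?:-(?P<sample>\d{2})(?P<vial>[A-Z])?"
    r"(?:-(?P<portion>\d{2})(?P<analyte>[A-Z])?"
    r"(?:-(?P<plate>[A-Z0-9]{4})"
    r"(?:-(?P<center>\d{2}))?)?)?)?$"
)

def parse_tcga_barcode(barcode):
    """Split a TCGA barcode into named fields (missing levels are None)."""
    match = BARCODE.match(barcode)
    if match is None:
        raise ValueError(f"not a TCGA barcode: {barcode!r}")
    fields = match.groupdict()
    sample = fields["sample"]
    if sample is None:
        fields["sample_class"] = None
    else:
        code = int(sample)
        # Assumed ranges: 01-09 tumor, 10-19 normal, 20-29 control.
        if 1 <= code <= 9:
            fields["sample_class"] = "tumor"
        elif 10 <= code <= 19:
            fields["sample_class"] = "normal"
        elif 20 <= code <= 29:
            fields["sample_class"] = "control"
        else:
            fields["sample_class"] = "unknown"
    return fields

if __name__ == "__main__":
    print(parse_tcga_barcode("TCGA-02-0001-01C-01D-0182-01"))
```

A common use is pairing tumor and normal samples from the same patient by grouping on the first twelve characters (project, site, participant) and then splitting on `sample_class`.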
A Case of Bethlem Myopathy Caused by a Chimeric (Mosaic) COL6A1 Mutation, with Literature Review
Authors: 吴若豪 (Wu Ruohao), 邱坤银 (Qiu Kunyin), 唐文婷 (Tang Wenting), 何展文 (He Zhanwen). Source: 《新医学》, 2021, No. 12.
English title: Identification of a novel chimeric mutation in the COL6A1 gene of a child with Bethlem myopathy: a case report and literature review. Department of Pediatrics, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China. Corresponding author: He Zhanwen, E-mail: *****************
【Abstract】
Objective: To report one case of Bethlem myopathy caused by a chimeric COL6A1 mutation and to analyze the pathogenicity of the mutation.
Methods: Trio whole exome sequencing (trio-WES) was performed on a boy diagnosed with Bethlem myopathy and his parents. The impact of the detected mutations was predicted by bioinformatics. Relevant cases were searched in PubMed, CNKI, and the MedBook full-text database of Chinese medical journals using "Bethlem myopathy" and "COL6A1" (in Chinese and English) as keywords. The child's mutation was searched in the 1000 Genomes, ExAC, and ClinVar databases.
Results: Trio-WES identified a missense mutation, c.868G>A (p.G290R), in exon 8 of the COL6A1 gene in the child (mutation frequency 48.13%). The same mutation was detected in the father's peripheral blood, but at a frequency of only 7.89%; it was therefore considered a chimeric mutation (frequency <10%) and treated as de novo (PS2). The resulting protein change affects a glycine residue and lies in a mutation hotspot region of COL6A1 (PM1). The mutation was absent from the population allele frequency databases (PM2). Multiple prediction tools for deleterious mutations all indicated that c.868G>A (p.G290R) is deleterious (PP2+PP3). According to the American College of Medical Genetics and Genomics (ACMG) variant classification guidelines, this missense mutation was classified as pathogenic (PS2+PM1+PM2+PP2+PP3). The database search found six probands with Bethlem myopathy caused by COL6A1 mutations, none of them with a chimeric mutation.
Conclusion: The COL6A1 mutation c.868G>A (p.G290R) is the cause of Bethlem myopathy in this child. It has not been reported before, and this case expands the variation spectrum of the COL6A1 gene.
【Key words】COL6A1 gene; Bethlem myopathy; Chimeric mutation; Missense mutation
Bethlem myopathy (BM) is an autosomal dominant myopathy caused by variants in the COL6A1 gene.
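The abstract combines evidence codes (PS2+PM1+PM2+PP2+PP3) into a final classification. The sketch below shows how such a combination can be evaluated, assuming the combining rules for pathogenic evidence in the 2015 ACMG/AMP guideline as commonly summarized; benign criteria are ignored entirely, the function name `classify_acmg` is hypothetical, and the code is an illustration rather than a clinical tool.

```python
from collections import Counter

def classify_acmg(criteria):
    """Combine pathogenic evidence codes such as ['PS2', 'PM1'] into a class."""
    counts = Counter()
    for code in criteria:
        if code.startswith("PVS"):
            counts["very_strong"] += 1
        elif code.startswith("PS"):
            counts["strong"] += 1
        elif code.startswith("PM"):
            counts["moderate"] += 1
        elif code.startswith("PP"):
            counts["supporting"] += 1
        else:
            raise ValueError(f"unsupported criterion: {code!r}")
    vs, s = counts["very_strong"], counts["strong"]
    m, p = counts["moderate"], counts["supporting"]

    # Assumed rules for "Pathogenic".
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"

    # Assumed rules for "Likely pathogenic".
    likely = (
        (vs >= 1 and m >= 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    return "Likely pathogenic" if likely else "Uncertain significance"

if __name__ == "__main__":
    print(classify_acmg(["PS2", "PM1", "PM2", "PP2", "PP3"]))
```

Under these assumed rules, one strong, two moderate, and two supporting criteria reach "Pathogenic", which is consistent with the classification reported in the abstract.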
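Summary of SARS-CoV-2 Variant Naming
As SARS-CoV-2 continues to spread worldwide, the emergence of different variants has attracted wide attention.
To describe these variants and make research and communication easier, WHO and other bodies have established naming schemes for them.
This article summarizes and explains those schemes.
Naming rules
First, the name of a variant should not contain geographic or ethnic information, so as to avoid discrimination and panic.
WHO therefore assigns each variant of interest or variant of concern a simple label, a letter of the Greek alphabet. Scientific lineage names built from letters and numbers, such as the Pango lineage B.1.1.7, remain in use for research.
The main points are:
1. Each variant should have a unique name. A WHO label is a single Greek letter; a Pango lineage name is an alphabetical prefix followed by numbers separated by dots, which record the lineage's descent.
2. WHO labels are assigned after expert review. They do not encode a particular mutation, function, affected site, or protein.
3. Sequences of each new variant should be submitted to global virus databases as quickly as possible, so that others can learn of the variant and keep up with the research.
4. Names must not contain words or symbols referring to a particular country or ethnic group, or any symbol or letter that would reflect negatively on a region or population.
Practical application
Under this scheme, variants have been labeled Alpha, Beta, Gamma, Delta, and so on.
Alpha, for example, is the WHO label for Pango lineage B.1.1.7, whose mutations include a substitution at position 501 of the SARS-CoV-2 spike (S) protein; the lineage name reflects ancestry, not the position of the mutation.
Other labeled variants include Epsilon, Zeta, and Eta, each with its own set of mutations.
Using these labels avoids stigmatizing particular countries or peoples and makes communication among scientists and physicians easier.
Outlook
As SARS-CoV-2 continues to spread globally, new variants will keep emerging.
The WHO labeling scheme will therefore continue to be applied to further variants.
A shared naming scheme also gives the medical and scientific communities a common vocabulary for exchanging research results and responding to the virus together.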
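Because WHO labels and lineage names coexist, a lookup between them is often needed. The sketch below maps a few Pango lineage names to WHO labels; the table lists only the labels mentioned above, the function name `who_label` is an assumption, and sublineages are resolved only by trimming dotted suffixes (Pango aliases such as AY.* are not expanded).

```python
# WHO label -> Pango lineage(s); limited to the labels named in the text.
WHO_TO_PANGO = {
    "Alpha": ["B.1.1.7"],
    "Beta": ["B.1.351"],
    "Gamma": ["P.1"],
    "Delta": ["B.1.617.2"],
    "Epsilon": ["B.1.427", "B.1.429"],
    "Zeta": ["P.2"],
    "Eta": ["B.1.525"],
}

PANGO_TO_WHO = {
    lineage: label
    for label, lineages in WHO_TO_PANGO.items()
    for lineage in lineages
}

def who_label(lineage):
    """Return the WHO label for a Pango lineage or one of its dotted sublineages."""
    parts = lineage.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in PANGO_TO_WHO:
            return PANGO_TO_WHO[candidate]
        parts.pop()  # walk up to the parent lineage
    return None

if __name__ == "__main__":
    for name in ("B.1.1.7", "B.1.617.2.1", "P.2", "B.1.1"):
        print(name, "->", who_label(name))
```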
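HGVS Nomenclature Rules
The Human Genome Variation Society (HGVS) nomenclature is a concise, consistent format for describing sequence variants.
It applies to variants in DNA, RNA, and protein sequences. A description is defined by the position and the nature of the change, and the format is hierarchical, so the same scheme can describe a variant at the gene, transcript, exon, and protein levels.
An HGVS description consists of a prefix for the type of reference sequence, the position, the type of change, and the sequence affected. A point mutation at the coding DNA level takes the prefix "c.", followed by the nucleotide position on the coding DNA reference sequence, the original nucleotide, ">", and the new nucleotide, with DNA nucleotides written in upper case (e.g. c.76A>T); lower case is reserved for RNA (e.g. r.76a>u). Insertions, deletions, and duplications at the DNA level use ins (insertion), del (deletion), and dup (duplication) after the positions of the affected or flanking nucleotides, optionally followed by the nucleotides involved (e.g. c.76_77insT, c.76_78del, c.77_79dup). An amino acid change is described at the protein level with the prefix "p.", giving the original amino acid, its position, and the new amino acid; the three-letter amino acid code is commonly used (e.g. p.Lys76Ala).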
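The structure just described (prefix, position, change type, sequence) can be illustrated with a small parser for DNA-level descriptions. This is a sketch under stated assumptions: the function name `parse_variant` is hypothetical, only the "c." and "g." prefixes and the sub/del/dup/ins/inv/delins types are recognized, and positions are kept as strings so that intronic (88+1), 5' (-3), and 3' (*1) forms survive unchanged.

```python
import re

# A position: optional "-" or "*", digits, optional intronic offset (+n or -n).
POS = r"(?:[-*]?\d+(?:[+-]\d+)?)"

VARIANT = re.compile(
    rf"^(?P<prefix>[cg])\.(?P<start>{POS})(?:_(?P<end>{POS}))?"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"|(?P<kind>delins|del|dup|ins|inv)(?P<seq>[ACGT]*))$"
)

def parse_variant(text):
    """Parse a simple DNA-level variant description into a dict of fields."""
    match = VARIANT.match(text)
    if match is None:
        raise ValueError(f"unsupported description: {text!r}")
    fields = match.groupdict()
    if fields["ref"] is not None:
        fields["kind"] = "sub"  # substitutions are marked by ">" rather than a keyword
    return fields

if __name__ == "__main__":
    for example in ("c.76A>T", "c.88+1G>T", "c.76_78del",
                    "c.83_84insTG", "c.112_117delinsTG"):
        print(example, "->", parse_variant(example))
```

Lower-case nucleotides are rejected on purpose, since at DNA level the bases are written in capitals.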
Nomenclature for the description of sequence variations
J.T. den Dunnen, S.E. Antonarakis: Hum Genet 109(1): 121-124, 2001. Reproduced with kind permission from Prof. S. E. Antonarakis (last modified March 7, 2001).
Questions and comments regarding nomenclature should be directed to Professor Stylianos Antonarakis (stylianos.antonarakis@medecine.unige.ch) or Dr. Johan T. den Dunnen (ddunnen@lumc.nl). This page can also be found at the HGVS site.

Contents
∙ Introduction
∙ Recommendations
  o General
  o DNA-level
  o RNA-level
  o protein-level
∙ Codons and encoded amino acids
  o genetic code
  o amino acid descriptions (one / three letter code)

Introduction
Recently, a nomenclature system has been suggested for the description of changes (mutations and polymorphisms) in DNA and protein sequences [Antonarakis, S.E. and the Nomenclature Working Group (1998) Recommendations for a nomenclature system for human gene mutations. Hum.Mut. 11: 1-3]. These nomenclature recommendations have now been largely accepted and stimulated the uniform and unequivocal description of sequence changes. However, current rules do not yet cover all types of mutations, nor do they cover more complex mutations. This document lists the existing recommendations and summarizes suggestions for the description of additional, more complex changes (shown in italics), based on a manuscript published in Human Mutation [den Dunnen, JT and Antonarakis, SE (2000). Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum.Mut. 15: 7-12].
Discussions regarding the advantages and disadvantages of the suggestions are necessary in order to continuously improve the designation of sequence changes. The consensus of the discussions will be posted here and we invite investigators to communicate with us regarding these suggestions. Furthermore, we invite investigators to send us complicated cases not covered yet, with a suggestion of how to describe these (mail to ddunnen@lumc.nl and Stylianos.Antonarakis@medecine.unige.ch). We hope these pages will be used as a guide to describe any sequence change, ultimately evolving into a uniformly accepted reference for mutation nomenclature description.

General recommendations
(suggestions extending the current recommendations are in italics)
The term "sequence variation" is used to prevent confusion with the terms "mutation" and "polymorphism", mutation meaning "change" in some disciplines and "disease-causing change" in others, and polymorphism meaning "non disease-causing change" or "change found at a frequency of 1% or higher in the population".
The basic recommendation is to use systematic names to describe each sequence variation. For this, variations are described at the most basic level, i.e. the DNA level, using either a genomic or a cDNA reference sequence. A genomic reference sequence is preferred because it overcomes difficult cases, including multiple transcription initiation sites (promoters), alternative splicing, the use of different poly-A addition signals, multiple translation initiation sites (ATG-codons) and the occurrence of length variations. When, like in most cases, the entire genomic sequence is not known, a cDNA reference sequence should be used instead.
∙ sequence variations are described in relation to a reference sequence for which the accession number from a primary sequence database (Genbank, EMBL, DDBJ, SWISS-PROT) should be mentioned in the publication/database submission (e.g. M18533)
∙ tabular listings of the sequence variations described should contain columns for DNA, RNA and protein and clearly indicate whether the changes were experimentally determined or only theoretically deduced
∙ to avoid confusion in the description of a sequence change, precede the description with a letter indicating the type of reference sequence used;
  o "g." for a genomic sequence (e.g. g.76A>T)
  o "c." for a cDNA sequence (e.g. c.76A>T)
  o "m." for a mitochondrial sequence (e.g. m.76A>T) (from David Fung, Camperdown, Australia)
  o "r." for an RNA sequence (e.g. r.76a>u)
  o "p." for a protein sequence (e.g. p.K76A)
∙ to discriminate between the different levels (DNA, RNA or protein), descriptions are unique;
  o at DNA-level, in capitals, starting with a number referring to the first nucleotide affected (e.g. c.76A>T)
  o at RNA-level, in lower-case, starting with a number referring to the first nucleotide affected (e.g. r.76a>u)
  o at protein level, in capitals, starting with a letter referring to the first amino acid (one-letter code) affected (e.g. p.T26P)
∙ a range of affected residues is indicated by a "_"-character (underscore) separating the first and last residue affected (e.g. 76_78delACT)
  NOTE: current recommendations use the "-"-character (i.e. 76-78delACT)
∙ for deletions, duplications or insertions in short tandem repeats, the most 3' nucleotide is arbitrarily assigned as the nucleotide changed
∙ two sequence variations in one allele are listed between brackets, separated by a "+"-character (e.g. [76A>C + 83G>C])
  NOTE: current recommendations use the ";"-character as a separator (i.e. [76A>C; 83G>C])
∙ sequence changes in different alleles (e.g. for recessive diseases) are listed between brackets, separated by a "+"-character (e.g. [76A>C] + [87delG])
  NOTE: the current recommendation is [76A>C + 87delG]
∙ a unique identifier should be assigned to each mutation. The unique OMIM-identifier can be used, otherwise database curators should assign unique identifiers

DNA level
∙ nucleotides are designated by the bases (in upper case); A (adenine), C (cytosine), G (guanine) and T (thymine)
∙ nucleotide numbering;
  o nucleotide +1 is the A of the ATG-translation initiation codon, the nucleotide 5' to +1 is numbered -1; there is no base 0
  o non-coding regions;
    ▪ the nucleotide 5' of the ATG-translation initiation codon is -1
    ▪ the nucleotide 3' of the translation termination codon is *1
  o intronic nucleotides;
    ▪ beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, e.g. 77+1G, 77+2T (when the exon number is known, the notation can also be described as IVS1+1G, IVS1+2T)
    ▪ end of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, e.g. 78-2A, 78-1G (when the exon number is known, the notation can also be described as IVS1-2A, IVS1-1G)
  o for deletions, duplications or insertions in single nucleotide (or amino acid) stretches or tandem repeats, the most 3' copy is arbitrarily assigned to have been changed (e.g. ACTTTGTGCC to ACTTTGCC is described as 7_8delTG)

Description of nucleotide changes
∙ substitutions are designated by a ">"-character
  o 76A>C denotes that at nucleotide 76 an A is changed to a C
  o 88+1G>T (alternatively IVS2+1G>T) denotes the G to T substitution at nucleotide +1 of intron 2, relative to the cDNA positioned between nucleotides 88 and 89
  o 89-2A>C (alternatively IVS2-2A>C) denotes the A to C substitution at nucleotide -2 of intron 2, relative to the cDNA positioned between nucleotides 88 and 89
  NOTE: polymorphic variants are sometimes described as 76A/G, but this is not recommended!
∙ deletions are designated by "del" after the nucleotide(s) flanking the deletion site
  o 76_78del (alternatively 76_78delACT) denotes an ACT deletion from nucleotides 76 to 78
  o 82_83del (alternatively 82_83delTG) denotes a TG deletion in the sequence ACTTTGTGCC (A is nucleotide 76) to ACTTTGCC
  o IVS2_IVS5del (alternatives 88+?_923+? or EX3_5del) denotes an exonic deletion starting at an unknown position in intron 2 (after nucleotide 88) and ending at an unknown position in intron 5 (after nucleotide 923)
∙ insertions are designated by "ins" after the nucleotides flanking the insertion site, followed by the nucleotides inserted
  NOTE: as separator the "^"-character is sometimes used but this is not recommended (e.g. 83^84insTG)
  o 76_77insT denotes that a T was inserted between nucleotides 76 and 77
  o 83_84insTG denotes a TG insertion in the TG-tandem repeat sequence of ACTTTGTGCC (A is nucleotide 76) to ACTTTGTGTGCC. Note that this sequence variation (a duplicating insertion) can also be described as a duplication, i.e. 82_83dupTG (see "duplications")
∙ variability of short sequence repeats, e.g. in ACTGTGTGCC (A is nt 1991), are designated as 1993(TG)3-6 with nucleotide 1993 containing the first TG-dinucleotide which is found repeated 3 to 6 times in the population.
∙ insertion/deletions (indels) are described as a deletion followed by an insertion after the nucleotides affected
  o 112_117delinsTG (alternatively 112_117delAGGTCAinsTG or 112_117>TG) denotes the replacement of nucleotides 112 to 117 (AGGTCA) by TG
∙ duplications are designated by "dup" after the nucleotides flanking the duplication site
  o 77_79dupCTG denotes that the nucleotides 77 to 79 were duplicated
  o duplicating insertions in short tandem repeats (or single nucleotide stretches) can also be described as a duplication, e.g. a TG insertion in the TG-tandem repeat sequence of ACTTTGTGCC (A is nt 76) to ACTTTGTGTGCC can be described as 82_83dupTG (now 83_84insTG)
∙ inversions are designated by "inv" after the nucleotides flanking the inversion site
  o 203_506inv (or 203_506inv304) denotes that the 304 nucleotides from position 203 to 506 have been inverted
∙ translocations (no suggestions yet)
∙ changes in different alleles (e.g. in recessive diseases) are described as "[change allele 1] + [change allele 2]"
  o [76A>C] + [76A>C] denotes a homozygous A to C change at nucleotide 76
  o [76A>C] + [?] denotes an A to C change at nucleotide 76 in one allele and an unknown change in the other allele
∙ two variations in one allele are described as "[first change + second change]"
  o [76A>C + 83G>C] denotes an A to C change at nucleotide 76 and a G to C change at nucleotide 83 in the same allele
  NOTE: current recommendations use the ";"-character as a separator (i.e. [76A>C; 83G>C])

RNA level
Sequence changes at RNA level are basically described as those at the DNA level with the following modifications/additions;
∙ an "r." is used to indicate that a change is described at RNA-level
∙ nucleotides are designated by the bases (in lower case); a (adenine), c (cytosine), g (guanine) and u (uracil)
  o 78u>a denotes that at nucleotide 78 a U is changed to an A
∙ when one change affects RNA-processing, yielding two or more transcripts, these are described between square brackets, separated by a ";"-character
  o [r.76a>c; r.76a>c + r.73_88del] denotes the nucleotide change c.76A>C causing the appearance of two RNA molecules, one carrying this variation only and one containing in addition a deletion of nucleotides 73 to 88 (shift of the splice donor site to within the exon)
  o [r.=; r.88_89ins88+1_88+10 + r.88+2t>c] denotes the intronic mutation g.88+2T>C causing the appearance of two RNA molecules, one normal (r.=) and one containing an insertion of the intronic nucleotides 88+1 to 88+10 with the nucleotide change 88+2t>c
  o [r.88g>a + r.88_89ins88+1_88+10] denotes the nucleotide change c.88G>A causing an insertion of the intronic nucleotides 88+1 to 88+10 (shift of the splice donor site to an intronic position)

Protein level
Sequence changes at protein level are basically described as those at the DNA level with the following modifications/additions;
∙ the one letter amino acid code is used, with "X" designating a translation termination codon
∙ amino acid numbering;
  o the translation initiator Methionine is numbered as +1

Description of amino acid changes
∙ substitutions;
  o missense changes: W26C denotes that amino acid 26 (Tryptophan, W) is changed to a Cysteine (C)
  o nonsense changes: W26X denotes that amino acid 26 (Tryptophan, W) is changed to a stop codon (X)
  o initiating methionine (M1): currently, mutations in the translation initiating Methionine (M1) are mostly described as a substitution, e.g. M1V. This is not correct. Either no protein is produced or the translation initiation site moves up- or downstream. Unless experimental proof is available, it is probably best to report the effect on protein level as "unknown". When experimental data show that no protein is made, the description "p.0" might be most appropriate
  NOTE: polymorphic variants are sometimes described as 36L/I, but this is not recommended!
∙ deletions are designated by "del" after the amino acid(s) flanking the deletion site
  o K29del in the sequence CKMGHQQQCC (C is amino acid 28) denotes a deletion of amino acid Lysine 29 (K) to CMGHQQQCC
  o C28_M30del denotes a deletion of three amino acids, from Cysteine 28 to Methionine 30
  o Q35del in the sequence CKMGHQQQCC (C is amino acid 28) denotes a Glutamine 35 (Q) deletion to CKMGHQQCC
  o if a deletion creates a new amino acid at the deletion junction the change is described as an insertion/deletion, e.g. C28_M30delinsW (see below)
∙ insertions are designated by "ins" after the amino acids flanking the insertion site, followed by the amino acids inserted
  NOTE: as separator the "^"-character is sometimes used but this is not recommended (e.g. Q83^C84insQ)
  o K29_M30insQSK denotes that the sequence QSK was inserted between amino acids Lysine 29 (K) and Methionine 30 (M), changing CKMGHQQQCC (C is amino acid 28) to CKQSKMGHQQQCC
  o Q35_C36insQ in the sequence CKMGHQQQCC (C is amino acid 28) denotes a Glutamine (Q) insertion to CKMGHQQQQCC. Note that this sequence variation (a duplicating insertion) can also be described as a duplication, i.e. Q35dup (see "duplications")
  o if an insertion creates a new amino acid at the insertion junction the change is described as an insertion/deletion, e.g. C28delinsWV (see below)
∙ variability of short sequence repeats, e.g. in CKMGHQQQCC (C is amino acid 28), are designated as 33(Q)3-6 with amino acid Glutamine 33 (Q, the first repeated amino acid) found repeated 3 to 6 times in the population.
∙ insertion/deletions (indels) are described as a deletion followed by an insertion after the amino acids affected
  o C28_K29delinsW denotes a 3 bp deletion affecting the codons for Cysteine 28 and Lysine 29, substituting them for a codon for Tryptophan
  o C28delinsWV denotes a 3 bp insertion in the codon for Cysteine 28, generating codons for Tryptophan (W) and Valine (V)
∙ duplications are designated by "dup" after the amino acids flanking the duplication site
  o G31_Q33dup in the sequence CKMGHQQQCC (C is amino acid 28) denotes a duplication of amino acids Glycine 31 (G) to Glutamine 33 (Q), giving CKMGHQGHQQQCC
  o duplicating insertions in short tandem repeats (or single amino acid stretches) can also be described as a duplication, e.g. an HQ insertion in the HQ-tandem repeat sequence of CKMGHQHQCC (C is amino acid 28) to CKMGHQHQHQCC can be described as H34_Q35dup (now Q35_C36insHQ)
∙ frame shifting mutations; recommendations to describe these sequence changes have not yet been made. Although it is probably not useful to add much detail in this description, it might be sensible, e.g. in the case of C-terminal mutations, to include the length of the new, shifted reading frame.
  o R97fsX121 (alternative R97fs) denotes a frame shifting change with Arginine 97 as the first affected amino acid and the new reading frame being open for 23 amino acids.
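The rule that the most 3' copy is assigned as the changed one in a repeat can be applied mechanically: a deletion may be moved one position to the 3' side whenever the nucleotide leaving the deleted window equals the nucleotide entering it. The sketch below illustrates this for deletions only; the function name `describe_deletion` and the `first` parameter (the number of the first nucleotide of the reference string) are assumptions made for the example.

```python
def describe_deletion(ref, start, end, first=1):
    """Describe the deletion of ref[start..end] (inclusive) at its most 3' position.

    `first` is the number assigned to the first nucleotide of `ref`.
    """
    s = start - first        # 0-based index of the first deleted nucleotide
    e = end - first + 1      # 0-based index just past the last deleted nucleotide
    if not (0 <= s < e <= len(ref)):
        raise ValueError("deletion lies outside the reference sequence")
    # Shifting the window by one gives the same product iff ref[s] == ref[e].
    while e < len(ref) and ref[s] == ref[e]:
        s += 1
        e += 1
    deleted = ref[s:e]
    a, b = s + first, e - 1 + first
    position = str(a) if a == b else f"{a}_{b}"
    return f"{position}del{deleted}"

if __name__ == "__main__":
    # Deleting either TG copy from ACTTTGTGCC gives ACTTTGCC.
    print(describe_deletion("ACTTTGTGCC", 5, 6))
    print(describe_deletion("ACTTTGTGCC", 80, 81, first=76))
```

With A numbered 1, the deletion is reported as 7_8delTG; with A numbered 76, it becomes 82_83delTG, matching the two examples in the DNA-level recommendations.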