Rapid ab initio Prediction of RNA Pseudoknots via Graph Tree Decomposition

格式：pdf
大小：213.09 KB
文档页数：11

下载文档原格式

/ 11

生物信息学中RNA序列分析与可变剪接预测方法研究

生物信息学中RNA序列分析与可变剪接预测方法研究RNA序列是生物信息学研究中的重要内容之一。

它是由DNA转录而来的一种核酸分子，在细胞中具有多种功能，如编码蛋白质、调控基因表达等。

随着高通量测序技术的发展，越来越多的RNA序列数据被产生出来，这使得RNA序列分析与可变剪接预测方法的研究变得尤为重要。

RNA序列分析是对RNA序列进行全面的研究和挖掘，以了解其功能和作用机制。

在RNA序列分析中，首先需要对RNA序列进行质量控制和清洗，以确保后续分析的准确性和可靠性。

接下来，常见的RNA序列分析方法包括差异表达分析、富集分析、转录因子结合位点预测等。

其中，差异表达分析可以帮助我们发现在不同条件下表达水平发生显著变化的基因，从而了解这些基因在生物过程中的功能和调控机制。

富集分析则可以帮助我们发现在特定的功能模块或通路中富集的基因，从而揭示这些功能模块或通路在生物过程中的作用。

转录因子结合位点预测可以帮助我们发现哪些转录因子与RNA序列相互作用，从而推断其可能的调控关系。

RNA的可变剪接是由于同一基因在转录过程中剪接方式的不同而产生的多种剪接异构体。

可变剪接对于生物体的功能多样性和复杂性起到重要作用。

因此，预测可变剪接的方式也成为RNA序列分析的一个重要研究方向。

常见的可变剪接预测方法包括基于比对的方法和基于组装的方法。

基于比对的方法首先对RNA序列进行比对，然后通过分析比对结果来预测可变剪接位点。

基于组装的方法则将RNA序列比对到基因组参考序列上，然后通过组装比对结果来预测可变剪接位点。

这些预测方法的准确性和可靠性对于研究人员能否准确地预测和理解可变剪接的功能至关重要。

除了RNA序列分析和可变剪接预测方法外，还有一些与RNA序列相关的研究方法和工具。

例如，RNA二级结构预测可以帮助我们了解RNA序列中的空间结构和相互作用方式，有助于理解其功能和调控机制。

RNA序列比对和聚类分析可以帮助我们了解不同物种或不同基因之间的相似性和差异性。

酶

在癌细胞中，液泡ATP酶是一种多亚基酶，表达于细胞质和细胞膜上，严重影响转移行为。V-ATPase a2同工型的可溶性紧密结合的N-末端结构域与体外诱导的巨噬细胞特性相关联。

以小鼠4T-1乳腺癌为模型，做对照实验，发现，与敲除a2V的4T-1细胞共同培养的巨噬细胞在体外产生较少量的致瘤因子，其抑制T细胞火花和增殖的能力降低。
施一公研究组发文报道分辨率为4.3埃的人体γ分泌酶电镜结构。首次准确定位出该复合体的 4个膜蛋白亚基各跨膜螺旋的位置和组装方式。这些研究结果为理解阿尔兹海默病的发病及开发可能的治疗药物提供了分子基础。
突变酶Duang~的一下将A型和B型血转化为o型血，可以输给任意血型的病人
加拿大不列颠哥伦比亚大学利用基因工程制造了一种来源于肺炎链球菌的98糖苷水解酶经过5代进化的突变酶，可以除去A抗原和B抗原的终端N-乙酰半乳糖胺得到万能的O型血。
感想

1、刷新三观，增进兴趣 2、扩大专业知识、常识面（酵素、亚裔人喝酒脸红、非处方酶、超氧化物歧化酶sod） 3、用辩证观点来看我们这个学科，它可能带来的益处有着巨大的诱惑力，但是需要面对的风险和困难（方法学薄弱、样本量小、异质性大、难以归纳）也更大。 4、要想深入了解专业的发展动向，不能仅仅局限于SCI、NATURE等数一数二的杂志。
这表明，若肿瘤细胞中不存在a2V，肿瘤微环境中的驻留型巨噬细胞群发生改变，从而影响肿瘤生长。借助宿主免疫系统，可通过对肿瘤细胞a2V的靶向抑制来控制肿瘤的生长。
阿尔兹海默症-------最严峻的老年神经退行性疾病之一

与γ-分泌酶有关
很多药物研究直接以该酶为靶点，因此，获取它的三维结构至关重要。
艾滋病、糖尿病、肿瘤、

受损细胞释放的RNA可促进凝血

受损细胞释放的RNA可促进凝血
佚名
【期刊名称】《基础医学与临床》
【年(卷),期】2007(27)5
【摘要】德国Justus-Liebig大学的KlausT·Preissner博士及其同事于4月10日的《美国科学院院刊》（ProcNatlAcadsciusA，2007；104：6388-6393．）上报告一项新研究的结果显示，从受损细胞释放的RNA是血液凝固的一种关键性辅助因子。

【总页数】1页(P543-543)
【关键词】细胞释放;RNA;凝血;辅助因子;血液凝固;科学院;美国;博士
【正文语种】中文
【中图分类】R962
【相关文献】
1.水蛭提取液对凝血酶诱导血管内皮细胞释放凝血因子的影响 [J], 黄莺;张晓青;李龙;易小民;田雪飞
2.凝血酶受体激活肽和凝血酶对人成纤维细胞释放TGF-β和VEGF的影响 [J], 黄跃生;杨宗城
3.凝血酶受体激活肽和凝血酶对人成纤维细胞释放TGF-β和VEGF的影响 [J], 黄跃生;杨宗城
4.环状RNA circHIPK3调控微RNA124表达促进口腔鳞状细胞癌细胞增殖的研究[J], 王军;赵思语;欧阳少波;黄自坤;罗清;廖岚
5.促进细胞代谢的ATP能促进病毒粒子释放RNA [J], 周昀陵
因版权原因，仅展示原文概要，查看原文内容请购买。

垂体特异性转录因子研究进展

垂体特异性转录因子研究进展
高岩
【期刊名称】《国外医学：内分泌学分册》
【年(卷),期】1995(015)003
【摘要】垂体特异性转录因子于垂体前叶表达，是垂体生长激素细胞，催乳素细胞以及促甲状腺素细胞基因转录的调节者，能特异地识别基因上的ＤＮＡ序列，并与之结合，进而影响该基因转录。

本文阐述已发现的两种垂体特异性转录因子（Ｐｉｔ－１和Ｐｉｔ－２）对垂体激素分泌功能及垂体疾病发病机理的重要意义。

【总页数】5页(P116-120)
【作者】高岩
【作者单位】无
【正文语种】中文
【中图分类】R584.02
【相关文献】
1.小干扰垂体特异性转录因子-1对大鼠垂体生长激素肿瘤细胞侵袭行为的影响 [J], 范月超;张慧;李锦晓;李中林;纵振坤;陈晨;冯力;苗发安;谢满意
2.垂体特异性转录因子与家畜经济性状关系的研究进展 [J], 李佳蓉;贾超;姜怀志
3.牛垂体特异性转录因子POU1F1研究进展 [J], 赵静雯;孙少华;吴慧光
4.猪垂体特异性转录因子的研究进展 [J], 王政坤;王钦德
5.垂体特异性转录因子1基因遗传多态性与中国西门塔尔牛生长及育肥性状的遗传效应分析 [J], 陈翠;许尚忠;张立敏;陈晓杰;刘喜冬;李姣;张路培;高雪;高会江;李俊雅
因版权原因，仅展示原文概要，查看原文内容请购买。

蛇毒素蛋白毕赤酵母表达进展

另外, 还有其它的类凝血酶也在毕赤酵母中得到了成功表达 [ 10 12] ( 表 1 ) 。最近, Y ang 等 [ 13] 用 pPIC9K
88
中国生物工程杂志 China B io techno logy
Vo.l 30 N o. 10 2010
1 丝氨酸蛋白酶的酵母表达
蛇毒丝氨酸蛋白酶 ( sn ake venom serine p roteases, SV SP) 在蝰科蛇毒中含量较高, 此类蛋白酶的活性中心有 H is( A rg) A sp S er三联体结构, 其活性可被丝氨酸修饰剂苯甲基磺酰氟 ( PM SF ) 或二异丙基磷酸 ( DFP )
类凝血酶
B atroxobin
H arob in A nc rod C a lobin G loshedobin
独特活性 SVSP
型
蛇毒金属蛋白酶
( SVM P)
型
L 氨基酸氧化酶
P roteinC activa tor
A lbo fibrase A lfim eprase F ibrinogenase IV E chistatin Rhodostom in
H a lydin J erdon itin A lbola tin Saxat ilin heterodim e r)
神经毒素 ( NT ) 血管收缩素
p resyn ap tic NT
PLA 2 B ung arotox in
p os ts ynap tic NT
迄今为止, 超过 30 种的蛇毒类凝血酶已被发现, 由于在血栓治疗方面的重要用途, 因而这类 SV SP 是蛇毒蛋白酵母表达研究最多的。 Yang等 [ 6 7] 用 pP IC9K 载体在毕赤酵母 GS115 中分别表达了长白山白眉蝮蛇 ( G loyd ius ussuriensis)类凝血酶 gu ssurobin 基因和蛇岛蝮 ( G loyd ius shedaoensis)的 g loshedobin 基因, 重组类凝血酶蛋白 Gu ssu rob in含有 260个氨基酸和 12个半胱氨酸残基, 产量为 3. 5 m g /L, 分子量为 28kD a( 理论值为

环状RNA(circRNA)研究现状及2016年各月份研究总结

环状RNA研究现状环状RNA（circRNA）是一类特殊的非编码RNA分子，也是RNA领域最新的研究热点。

与传统的线性RNA（linear RNA，含5’和3’末端）不同，circRNA分子呈封闭环状结构，不受RNA外切酶影响，表达更稳定，不易降解。

在功能上，近年的研究表明，circRNA分子富含microRNA（miRNA）结合位点，在细胞中起到miRNA海绵（miRNA sponge）的作用，进而解除miRNA对其靶基因的抑制作用，升高靶基因的表达水平，进而在转录水平上对靶基因进行调控；这一作用机制被称为竞争性内源RNA（ceRNA）机制。

通过与疾病关联的miRNA相互作用，circRNA在疾病中发挥着重要的调控作用。

一、环状RNA相关研究环状RNA（circular RNA, circRNA）是一类新近发现的内源性非编码RNA（non-coding RNA, ncRNA），是最近RNA领域的最新的研究热点。

Sanger等在20世纪70年代发现某些高等植物中存在可致病的单链环状类病毒，这是人类首次发现cirRNA。

随后人们陆续在酵母线粒体、丁型肝炎病毒中发现cirRNA，也发现在小鼠的睾丸内含有性别决定基因（sex-determining region Y, Sry）转录而来的cirRNA，还证实了cirRNA也存在于人体细胞。

随着高通量测序及生物信息学的快速发展，越来越多的cirRNA在生物体中被发现，它们广泛的，多样化的，保守的，稳定的存在于真核生物中，丰富了RNA家族的内容。

不同物种中鉴定的cirRNA数目是不一样的，总结目前已经发表的研究中cirRNA的鉴定情况如下：刘骏武研究植物水稻的cirRNA，利用总RNA-seq数据，分别利用find_circ、STAR、CIRI三种软件同时进行预测，只选取具有canonical位点的环状RNA，分别找出20、155、69 个，取三个软件结果的并集，总共有213 个环状RNA。

果蝇也能产生一种系统性RNAi反应

果蝇也能产生一种系统性RNAi反应
佚名
【期刊名称】《昆虫知识》
【年(卷),期】2009()3
【摘要】果蝇和其他昆虫已知能够发起涉及RNA干涉（RNAi）的局部抗病毒防卫。

以前人们认为，果蝇不能系统性地展开RNAi反应，依据是所表达的内生RNA发夹不会从一个细胞向另一个细胞展开。

【总页数】1页(P331-331)
【关键词】RNAi;果蝇;反应;系统;RNA干涉;抗病毒;细胞;昆虫
【正文语种】中文
【中图分类】Q522;Q969.462.2
【相关文献】
1.一种不产生核废物的加速器反应堆 [J], 张果
2.系统性红斑狼疮患者外周血淋巴细胞增殖反应与其产生甲硫氨酸脑啡肽的关系[J], 徐星铭;戴宏;魏伟
3.一种连续产生次氯酸钠的管状电化学反应杀菌器 [J], 曾新安;秦贯丰;李忠彦
4.学懂有关药物基本知识,引导安全用药(三)成的一种适应状态,用药者渴求定期使用某药物,以得到欣快感。

中断用药后会产生严重的戒断反应, [J], 陈孝治;
5.一种黄明龙还原反应副产物产生的机理分析 [J], 刘海涛;蔡亮;郭强;尹登山;王小伟
因版权原因，仅展示原文概要，查看原文内容请购买。

RNAi基因组免疫学

十多年前，Jorgensen和他的同事们试图加深矮牵牛花的紫色，就将一个强启动子控制下的色素基因转入，但结果非常惊奇，许多花并不是紫色而是杂色，甚至是白色。
Jorgensen 称这种现象为共抑制（cosuppression），即转入的外源基因和其本身的同源基因都被抑制-出现基因沉默（silencing），后来发现许多植物都有这种现象。
Double-stranded RNA
sense
antisense
inject
Neg. control
Uninjected
C. elegans
Antisense RNA dsRNA
Mex-3 mRNA detection in embryos by in situ hybridization
Examples of RNAi
2002年6月份《自然医学》杂志，刊登了美国的科学家用短干预RNA（siRNA）的方法，使细胞表面HIV病毒结合受体-CD4的基因沉默，结果降低了细胞内病毒的拷贝数，这的确给我们最后征服艾滋病带来希望。
除此之外，也许我们根据HIV的mRNA的序列，制备出双链 RNA （ dsRNA ）作为 HIV 的疫苗，当它们进入细胞后可以降解HIV的 mRNA，从而阻止HIV在细胞内的复制。另外，癌症的治疗同样可以用RNAi的方法，
RNAi与免疫作用的比较
脊椎动物的免疫系统对于病毒等病原微生物的入侵采用两步战略：通过基因重排，使有限的基因片段组成很多抗体编码基因，各种抗体编码基因分布在大量细胞之中。感染以后，通过克隆选择和相应细胞的扩增，产生针对免疫原的特异性免疫应答。
脊椎动物的免疫细胞-T、B淋巴细胞，在早期发育阶段通过阳性和阴性选择，将能识别自身抗原的克隆排除，从而解决了特异性的问题。RNAi系统的作用也具有同样的特异性，当病毒进入，只出现病毒基因的沉默现象，而机体的其它基因表达正常。

干扰素基因刺激因子激动剂的研究进展

收稿日期：２０２３－１２－１５，修回日期：２０２４－０１－１７基金项目：国家自然科学基金青年项目（Ｎｏ８１８０３５９２）；中国医学科
学院医学与健康科技创新工程项目（Ｎｏ２０２１－１Ｉ２Ｍ０３０）作者简介：王娅（１９９８－），女，博士生，研究方向：抗病毒药理，Ｅｍａｉｌ：ｗａｎｇｙａｉｍｂ＠１２６．ｃｏｍ；李玉环（１９７０－），女，博士，研究员，博士生导师，研究方向：抗病毒药理，通信作者，Ｅｍａｉｌ：ｙｕｈｕａｎｌｉｂｊ＠１２６．ｃｏｍ
ＥＲＩＳ）为靶点的药物开发已成为近年创新药领域的研发热点。１ＳＴＩＮＧ蛋白的结构干扰素基因刺激因子（ｓｔｉｍｕｌａｔｏｒｏｆｉｎｔｅｒｆｅｒｏｎｇｅｎｅｓ，ＳＴＩＮＧ，又称为ＴＭＥＭ１７３／ＭＰＹＳ／ＭＩＴＡ／ＥＲＩＳ）作为一种先天性免疫调节信号分子于２００８年首次报道，主要位于细胞内质网（ｅｎｄｏｐｌａｓｍｉｃｒｅｔｉｃｕｌｕｍ，ＥＲ），由ＴＭＥＭ１７３基因编码，人源ＳＴＩＮＧ（ｈＳＴＩＮＧ）由３７９个氨基酸组成，鼠源ＳＴＩＮＧ（ｍＳＴＩＮＧ）由３７８个氨基酸组成，两者相似度约８１％。ｈＳＴＩＮＧ的Ｎ末端结构域（Ｎｔｅｒｍｉｎａｌｄｏｍａｉｎ，ＮＴＤ）主要分为４个跨膜结构域（ＴＭ１～４），与其细胞内位置密切相关。Ｃ末端结构域（Ｃｔｅｒｍｉｎａｌｄｏｍａｉｎ，ＣＴＤ）包括配体结合结构域（ｌｉｇａｎｄｂｉｎｄｉｎｇｄｏｍｉａｎ，ＬＢＤ）和Ｃ尾端结构域（Ｃｔｅｒｍｉｎａｌｔａｉｌ，ＣＴＴ）两部分，ＬＢＤ负责与２′３′ｃＧＡＭＰ等配体相结合诱导ＳＴＩＮＧ蛋白构象转变，ＣＴＴ负责与ＴＡＮＫ结合激酶１（ＴＡＮＫｂｉｎｄｉｎｇｋｉｎａｓｅ１，ＴＢＫ１）和干扰素调节因子３（ｉｎｔｅｒｆｅｒｏｎｒｅｇｕｌａｔｏｒｙｆａｃｔｏｒ３，ＩＲＦ３）相结合，促进信号的级联传导。２ｃＧＡＳＳＴＩＮＧ通路激活及其应用环状ＧＭＰＡＭＰ合成酶（ｃｙｃｌｉｃＧＭＰＡＭＰｓｙｎｔｈａｓｅ，ｃＧＡＳ）包含一个核苷酸转移酶结构域和两个ＤＮＡ结合结构域，当胞质中无双链ＤＮＡ（ｄｏｕｂｌｅｓｔｒａｎｄｅｄＤＮＡ，ｄｓＤＮＡ）时，ｃＧＡＳ会以自我抑制的状态存在，当存在ｄｓＤＮＡ时，ｃＧＡＳ会与ｄｓＤＮＡ以序列非特异性的方式相结合，催化ＡＴＰ和ＧＴＰ环化，形成２′３′ｃＧＡＭＰ。２′３′ｃＧＡＭＰ是内源性环二核苷酸（ｃｙｃｌｉｃｄｉｎｕｃｌｅｏｔｉｄｅ，ＣＤＮ）类化合物，除此之外，在细菌感染的过程中也会产生其他种类的ＣＤＮｓ，如２′２′ｃＧＡＭＰ、３′３′ ｃＧＡＭＰ、ｃｄｉＡＭＰ、ｃｄｉＧＭＰ等。作为ＣＤＮｓ的下游受体蛋白，静息状态下ＳＴＩＮＧ以同源二聚体形式存在，与ＣＤＮｓ结合后形成四聚体或多聚体复合物，多聚化ＳＴＩＮＧ脱离内质网，转位到高尔基体，募集ＴＢＫ１和ＩＲＦ３。ＴＢＫ１二聚体自身磷酸化激活其激酶活性，然后将ＳＴＩＮＧＣＴＤ磷酸化并将ＩＲＦ３磷酸化激活，活化的ＩＲＦ３二聚化并转位到细胞核，促进Ⅰ型干扰素（ｉｎｔｅｒｆｅｒｏｎ，ＩＦＮ）基因的转录表达。除此之外，ＳＴＩＮＧ激活后还会激活激酶ＩＫＫ，催化ＩκＢ蛋白的磷酸化，磷酸化的ＩκＢ蛋白通过泛素－蛋白酶体途径降解，从而将ＮＦκＢ释放

石胆酸通过上调miR-21表达降低核受体PPARα_mRNA的稳定性

除酒精、病毒、药物等脂肪肝致病因素外，大泡性肝细胞脂肪变性超过5%诊断为非酒精性脂肪性肝病（NAFLD ），其疾病谱包括单纯性脂肪肝、非酒精性脂肪性肝炎（NASH ）、肝纤维化、肝硬化和肝癌［1,2］。

2018年NAFLD 全球患病率高达25%［3］。

我国NAFLD 患病率已赶超欧美发达国家，严重危害国民健康和社会发展［1］。

脂肪肝是NAFLD 早期阶段，肝脏脂肪分解利用与新生脂肪生成的失调是重要的致病风险因素，因此，促进肝内脂肪酸β-氧化分解，降低肝脏脂肪堆积可有效延缓或阻止NAFLD 发展。

Lithocholic acid decreases mRNA stability of nuclear receptor PPAR αby upregulating miR-21expression in hepatoma HepG2cellsP ANG Biying,HUANG Nana,HUANG Xiaoxia,LI Xin,XIONG Wenting,KONG Bo,YAO Yan School of Life Science,Guangzhou University,Guangzhou 510000,China摘要：目的探讨石胆酸在转录后水平对PPAR αmRNA 稳定性的调控。

方法构建PPAR α3'UTR 荧光素酶报告基因载体并转染HepG2细胞，比较两种浓度石胆酸处理后荧光素酶活性的变化。

结合生物信息学预测，确定石胆酸诱导下差异表达的miRNAs 及其在3'UTR 上的潜在结合位点，对结合位点进行突变（Mut1、Mut2和Mut1+Mut2），比较石胆酸处理后Mut1、Mut2和Mut1+Mut2荧光素酶活性的变化。

利用Western blot 和RT-qPCR 检测两种浓度石胆酸处理下信号通路的激活及其下游转录因子基因表达水平，比较不转染对照组和瞬时转染过表达转录因子的PPAR α蛋白表达，确定参与石胆酸调控PPAR αmRNA 稳定性的信号通路和转录因子。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Rapid ab initio Prediction of RNA Pseudoknots via GraphTree Decomposition∗Jizhen Zhao†,Russell L.Malmberg‡and Liming Cai†AbstractThe prediction of RNA secondary structure including pseudoknots remains a challenge due to the in-tractable computation of the sequence conformation from nucleotide interactions under free energy models.Optimal algorithms often assume a restricted class for the predicted RNA structures and yet still require ahigh-degree polynomial time complexity,which is too expensive to use.Heuristic methods may yield time-eﬃcient algorithms but they do not guarantee optimality of the predicted structure.This paper introducesa new and eﬃcient algorithm for the prediction of RNA structure with pseudoknots for which the structureis not restricted.Novel prediction techniques are developed based on graph tree decomposition.In partic-ular,based on a simpliﬁed energy model,stem overlapping relationships are deﬁned with a graph,in whicha specialized maximum independent set corresponds to the desired optimal structure.Such a graph is treedecomposable;dynamic programming over a tree decomposition of the graph leads to an eﬃcient optimalalgorithm.Theﬁnal structure predictions are then based on re-ranking a list of suboptimal structures under amore comprehensive free energy model.The new algorithm is evaluated on a large number of RNA sequencesets taken from diverse resources.It demonstrates overall sensitivity and speciﬁcity that outperforms or iscomparable with those of previous optimal and heuristic algorithms yet it requires signiﬁcantly less time thanthe compared optimal algorithms.Keywords:RNA secondary structure prediction,pseudoknot,thermodynamic energy,tree decomposition,tree width,maximum independent set1IntroductionThe secondary structure of an RNA molecule is formed due to short or long distance pairings between nucleotides in the sequence.Base pair regions either single,nested or parallel are called stem-loops;base pair regions crossing each other are called pseudoknots[30].Pseudoknots are important structures in RNA molecules and often play important functional roles such as catalysis,RNA splicing,transcription regulation[15,2,25].Knowing the secondary structures of RNA molecules is critical for determining their three dimensional structures and understanding their functions.Automated prediction of RNA secondary structure is thus in demand since it is expensive and time consuming to experimentally determine the structure.It is computationally challenging to predict RNA secondary structure including pseudoknots.In particular, the problem of predicting RNA pseudoknots with the minimum free energy is provably NP-hard for the nearest neighbor model[17].Practical approaches to cope with this computational challenge are either to restrict the class of pseudoknots under consideration or to employ heuristics in the algorithms.Optimal algorithms for restricted pseudoknot classes are usually thermodynamics-based extended from Zuker’s algorithm for the prediction of pseudoknot-free structures[33].In such algorithms,the predicted optimal structure of a single RNA sequence is the one with the global minimum free energy based on a set of thermodynamical parameters.For example,the recently developed PKNOTS[21]can handle the widest classes of pseudoknots.However,its time complexity O(n6)makes it infeasible to fold RNA sequences of a moderate length.The computational eﬃciency may be improved at the cost of further restricting the structure of pseudoknots[3,17,31],but still with the time ∗The preliminary version of this paper appeared in the proceedings of the6th Workshop on Algorithms for Bioinformatics(WABI 2006).†Department of Computer Science,University of Georgia,Athens,GA30602,USA.Email:{jizhen,cai}@,Fax: (706)542-2966.To whom correspondence should be addressed.‡Department of Plant Biology,University of Georgia,Athens,GA30602,USA.Email:russell@complexity O(n5)or O(n4).Most such algorithms produce only the optimal solution,while suboptimal ones that may reveal the true structure are often ignored.However,the partition function approach,based on the calculation of equilibrium base-pairing probabilities,captures the contributions of all suboptimal structures[9].On the other hand,computationally eﬃcient heuristic methods have also been explored to allow unrestricted pseudoknot structures.Iterated loop matching(ILM)[23]is one such method.Itﬁnds the most stable stem, adds it to the candidate secondary structure and then masks oﬀthe bases forming the stem and iterates on the left sequence segments until no other stable stem can be found.One structure is reported at the end.Another algorithm,HotKnots[20],does the prediction in a slightly diﬀerent way.It keeps multiple candidate structures rather than only one and builds each of them in a similar but more elaborate way.These methods can usually be fast,yet they often do not provide an optimality guarantee for the predicted structure or a quality measure on the predicted structure with respect to the optimal structure.Other heuristic methods based on genetic algorithms usually do not address the optimality issue either[8,1].In this paper,we introduce a novel approach for the optimal prediction of RNA pseudoknots for which the structure is not restricted.Our method is based on a simpliﬁed free energy model without accounting for loop energies[19,23].In this method,stable stems are selected from an RNA sequence as vertices of a graph;vertices are connected with edges if corresponding stems conﬂict(i.e.,overlap)in their positions in the sequence.The optimal structure of an RNA sequence corresponds to a collection of non-conﬂicting stable stems,which can be found by seeking the maximum weighted independent set(WIS)from the graph.We observe that stable stems can be so selected that the resulting graph is of a moderately small tree width t.Based on a tree decomposition of the graph,a dynamic programming algorithm for WIS of the worst-case time complexity O(2t n)is obtained, where n is the number of vertices in the graph,at most quadratic in the length of the RNA sequence.This is an eﬃcient prediction algorithm parameterized on the tree width t,which is usually small.In addition,the algorithm re-ranks a list of suboptimal structures based on the more comprehensive energy functions used in PKNOTS[21].We implemented our algorithm TdFOLD and evaluated its performance on various RNA sequence sets from diﬀerent sources.The test results showed high eﬃciency and high accuracy for our algorithm.TdFOLD was tested against PKNOTS,ILM and HotKnots on a set of50tRNA’s,a set of50small RNA sequences containing pseudoknots with length ranging from23to113,and a set of11large RNA’s with length range from210to 412.The results showed that overall,in terms of the sensitivity and speciﬁcity of the prediction,TdFOLD outperforms the optimal algorithm PKNOTS and the heuristic algorithms ILM and HotKnots.In time eﬃciency, it outperforms PKNOTS and HotKnots,and is comparable with ILM.Graph theoretic methods have previously been explored for RNA structure prediction[30].Our method is diﬀerent from the previous ones in two respects.Our graphs constructed from the RNA sequence contain vertices describing stems instead of nucleotides;making stem to be the smallest structural unit can greatly simplify the complexity of the problem.More importantly,our graph algorithm takes advantage of the tree decomposition technique on the formulated graphs.In fact,it has been demonstrated that the RNA secondary structure can be proﬁled with a conformational graph of small tree width[27].The underlying graph constructed for the ab initio structure prediction is essentially an augmentation of the conformational graph in which additional vertices and edges are added only for the overlapping stems,thus inheriting the tree decomposability which makes the algorithm eﬃcient.We note that our algorithm,like other ab initio ones,is suitable for predicting the structure of single RNA sequence.When related structurally homologous sequences are available,the accuracy of RNA structure predic-tion can usually be improved through the use of comparative analysis.This uses the information of the covarying residues in a set of multiple sequences or additional phylogenetic relationship of these sequences and may produce the most reliable prediction for the consensus structure[11,24,16,14,29].Because such methods inevitably involve multiple sources of data or computational tools,they usually rely on human intervention.Nevertheless,a fully automated comparative analysis process was proposed[11,10]for RNA consensus structure prediction.Due to its computational complexity,the process has only been implemented for pseudoknot-free RNAs[11].With algorithm TdFOLD,and a tree decomposition based structure-sequence alignment algorithm we developed earlier [27],the eﬃcient implementation of this fully automated process for pseudoknots becomes possible.The paper concludes with a presentation of how TdFOLD could be used in automatid comparative RNA analysis.Figure1:(a)Ten stable stems in Ec P k4,the fourth pseudoknot in E.coli tmRNA molecule,including their left and right regions,and thermodynamic energies;(b)stem graph for Ec P k4;and(c)a tree decomposition of the stem graph with tree width4.2Methods and AlgorithmGiven an RNA sequence,our algorithmﬁrst builds a pool of stable stems.A secondary structure is indeed a set of compatible stems(i.e.the stems do not share one or more bases between each other).Based on the observation that a set of compatible stems with minimum or near minimum total stem energies tend to be in the true structure,our algorithmﬁrstﬁnds a number of secondary structures with minimum or near minimum total stem energies by a tree decomposition based procedure for a graph formed by the stable stems.These secondary structures are then reordered by counting the stem stabilizing and loop destabilizing energies together, and reported as the predicted structures.2.1Problem formulationA(canonical)base pair is either a Watson-Crick pair(A-U or C-G)or wobble pair G-U.A stem is a set of stacked nucleotide base pairs on an RNA sequence s.In general a stem S can be associated with four positions (i l,j l,i r,j r),where i l<j l<i r<j r,on the sequence s such that(a)(s[i l],s[j r])and(s[j l],s[i r])are two canonical base pairs;and(b)for any two base pairs(s[x],s[y]),(s[z],s[w])in the stem S,either i l≤x<z≤j l and i r≤w<y≤j r,or i l≤z<x≤j l and i r≤y<w≤j r.Region s[i l..j l]is the left region of the stem and s[i r..j r]is the right region of the stem.Stem S is stable if the formation of its base pairs allows the thermodynamic energy∆(S)of the stem to be below a predeﬁned threshold parameter E<0.Figure1(a)shows all the stable stems in Ec P k4with E=−5kcal/mol,the fourth pseudoknot in E.coli tmRNA[32],and their corresponding free energy values.A stem graph G s=(V,E)can be deﬁned for the RNA sequence s,where each vertex in V uniquely represents a stable stem on s,and E contains an edge between two vertices if and only if the corresponding two stems (a,b,c,d)and(x,y,z,w)conﬂict in their positions,i.e.,one or both of the regions s[a..b]and s[c..d]overlap with at least one of the regions s[x..y]and s[z..w].Figure1(b)shows the stem graph for Ec P k4constructed according to the stable stems given in Figure1(a).The stem graph is a weighted graph,with a weight on every vertex. Usually,the weight of a vertex can simply be absolute value of the thermodynamic energy∆(S)of the stem S corresponding to the vertex.The weight may also be adjusted by scaling it(non-)linearly according to the length of the corresponding stem or the distance between the left and right regions of the stem.The problem of predicting the optimal structure of the RNA then corresponds toﬁnding a collection of non-conﬂicting stems from its stem graph which achieves the maximum total weight.This is exactly the same as the graph theoretic problem:ﬁnding the maximum weighted independent set(WIS)in the stem graph.Note the optimality of the secondary structures depends on the total stem energies only(this was previously adopted by both the primitive method[19]and the more elaborate one[23]too).In order to rectify the possible bias caused by not counting the loop energies,we output optimal as well as a number of sub-optimal struc-tures.These structures are then re-ordered according to the whole energies including stem stabilizing and loop destabilizing energies,and reported as the predicted structures.2.2Identifying stable stemsFor our purpose,stable stems are deﬁned according to a set of parameters (in addition to the energy threshold parameter E ).In particular,a stem contains at least P base pairs;the loop length in between the left and right region of the stem is at least L ;free energy using only Turner’s base stacking energy parameters is at most E .Bulges within a stem are allowed,for which the stem essentially becomes a set of substems separated by the bulges.In addition,parameter T limits the minimum substem length,and parameter B limits the maximum bulge length.The thermodynamic energy ∆(S )of stem S is calculated by taking into account both the stacking energies and the destabilizing energies caused by bulges.The default values for parameters P,L,E,T,and B are set to 3,3,−5,3,0according to our previous experiments,but they may be adjusted according to diﬀerent class of RNA’s.We set B to 0since it works well for a lot of RNAs.Besides 0,1and 2are good choices for B for some other RNAs.We leave the choice to the users.A procedure similar to the one used in [14]is employed to identify all the stable stems.These stable stems are called the initial stable stem pool.If two stems only share a few bases,we may resolve the conﬂict by considering some maximal substems of these two stems.For example,one stem A formed by s [a..a +9]and s [b −9..b ]conﬂicts with another stem B formed by s [b −1..b +8]and s [c..c +9],we could add two more stems formed by s [a +2..a +9]and s [b −9..b −2],s [b +11..b +8]and s [c..c +7],which are shortened from A and B respectively.This is controlled by a switch parameter S .If it is on,an extended stable stem pool will be built:all the stems in the initial stem pool will be imported into it;for each pair of conﬂict stems,get the maximal sub-stems that can resolve the conﬂict and meet the requirements deﬁned by the parameters P,L,E,T,B and add them to the extended pool.If it is oﬀ,we just use the initial pool as all of the stable stems.According to our experiments,the use of the extended pool can improve the accuracy for some but not all of the RNAs.We also noticed that the longer a sequence is,the more the spurious stems will be produced.We incorporated some biological knowledge to reduce the spurious stems for long sequences,e.g.parameter adjusting according to the properties of some RNAs,not allowing G-U pairs at the ending of a stem.2.3Tree decomposition based algorithmDeﬁnition [22]A tree decomposition of graph G =(V,E )is a pair (T,X )if it satisﬁes:1.T =(I,F )is a tree with node set I and edge set F ,2.X ={X i :i ∈I,X i ⊆V },i X i =V and ∀u ∈V ,∃i ∈I such that u ∈X i ,3.∀(u,v )∈E ,∃i ∈I such that u,v ∈X i ,4.∀i,j,k ∈I ,if k is on the path that connects i and j in tree T ,then X i ∩X j ⊆X kThe width of a tree decomposition (T,X )is deﬁned as max i ∈I |X i |−1.The tree width of the graph G is the minimum tree width over all possible tree decomposition of G .If T is restricted to be a path,we refer to (T,X )as a path decomposition and the best width over all of the path decompositions as the path width of G .The tree decomposition is rooted in the deep graph minor theorems by Robertson and Seymour [22].It provides a topological view on a graph and the tree width measures how much the graph is “tree-like”.Figure 1(c)shows a tree decomposition for the stem graph given in Figure 1(b).Many computationally intractable graph problems can be easily solved on graphs of small tree width.In particular,a large number of such graph problems,while intractable on general graphs,can be solved in linear time,given a tree decomposition of tree width ≤t ,for a ﬁxed t .Maximum weighted independent set (WIS)is one such problem [5];it has the time complexity O (2t n ).The factor 2t is due to the dynamic programming enumeration,in each node of the tree decomposition,of all partial independent sets formed by the t vertices.2.3.1Algorithm detailsNow we describe the tree decomposition based dynamic programming algorithm that ﬁnds the maximum weighted independent set from the stem graph G =(V,E ).It assumes a binary tree decomposition (T,X ),where X =∪X m i =1,for the stem graph,where m =O (|V |),|X i |=t ,for i =1,...,m .We only discuss the process for achieving the optimal solution.The technical details for getting suboptimal solutions are similar.The algorithm constructs one dynamic programming table m i for every tree node X i ={v 1,...,v t }.Table m i records all possible partial independent sets in the subgraph induced by the set of all the vertices in theFigure2:Dynamic programming table construction over tree decomposition.Table m i is computed also based on the computed tables m k and m j.Row f=(x0,...,x h,...,x m,...,x r,...,x t)in table m i is computed from row g in table m k and row h of table m j,Row g is the optimal for columns X k−X i given the value(x0,...,x m) for columns X k∩X i.Similarly,row h is the optimal for columns X j−X i given the value(x h,...,x r)for columns X j∩X i.subtree rooted at i of the tree decomposition.There are t columns in the table m i,one for each vertex in the corresponding tree node X i.Rows are the combinations of these vertices;a vertex is selected if and only if the corresponding column takes value1.There are additionally three columns V,S,Opt in the table.Column V records whether each row represents a valid independent set,column S is the weight of the valid independent set represented by each row.Column Opt,more sophisticated,is explained in the following.These tables are constructed in a bottom-up fashion,from leaves to the apex of the tree decomposition(see Figure2).Every table contains rows,with each being some combination of the vertices in the corresponding node.Column V is set to be1if the row represents a valid independent set.Column Opt is set1if and only if:•The row represents a valid independent set.•S in this row is optimal among all the rows with diﬀerent choices in the columns corresponding to the vertices in X i−X p,given the chosen values same as this row in the columns corresponding to the vertices in X i∩X p,where node p is the parent of node i.Column S is set diﬀerently based on whether the current node is a leaf or an internal node in the tree decomposition.For a leaf node,S is0if the row is not a valid independent set;otherwise S is the corresponding weight of the set.For an internal node i that has two children j and k whose tables m j,m k have been computed, for each row in table m i,column S is computed as S=w1+w2+w3−w4,where•w1is the weight of the row in table m j with the same combination in the columns corresponding to the vertices in X j∩X i that has column Opt=1;•w2is the weight of the row in table m k with the same combination in the columns corresponding to the vertices in X k∩X i that has column Opt=1;•w3is the weight of the independent set formed by the choices in columns corresponding to the vertices in X i−X j−X k;and•w4is the weight of the independent set formed by the same combination in the columns corresponding to the vertices in X i∩X j∩X k.In implementation,the computation of table T i of node i does not enumerate all combinations of vertices in X i.Instead,in general,a greedy algorithm is used to partition set X i into a collection of cliques.Consider the sequence as a straight line and the left(right)region of a stem as an interval.Let all the left regions of the stable stems included in the tree node form an interval graph.Choose an interval(left region)with the right end at the leftmost position among all of the intervals,record all the intervals that overlap with this interval as a clique and remove them,recursively call on the interval graph left until it is empty.A linear time in t is enough forthis procedure.Once the collection of cliques is obtained for X i,combinations of vertices are only considered by taking at most one vertex from every clique.A similar technique can also be used to further improve the eﬃciency of the algorithm for some long sequences: we build the stem graph based on the left regions of the stems,then we use a similar procedure to the above one to build a path decomposition of the graph.Each node of the path decomposition is indeed a clique consisting of overlapping half stems.We only need to choose zero or one half stem from each node.During the dynamic programming,each combination is associated with C(a user deﬁned number,default4000)suboptimal partial solutions rather than the optimal one only,and the conﬂicts caused by the right regions are resolved during the dynamic programming.2.3.2Tree decomposition of stem graphFinding the optimal tree decomposition is NP-hard[4],we use a simple,fast heuristic algorithm to produce a tree decomposition for the given stem graph.This algorithm is based on a heuristic method for greedyﬁll-in[13].This method will produce a tree decomposition with small tree width but not necessary the optimal one,i.e.the tree width of the tree decomposition might be greater than the tree width of the stem graph.Note this will not aﬀect the optimal property of the tree decomposition based method described above since along any tree decomposition the optimal solution could be found.2.3.3Reordering suboptimal structuresThe list of candidate structures,including the optimal and the suboptimal ones,are reordered based on a more sophisticated energy model.In particular,we recalculate the free energy for each of the candidate structures obtained from the previous step using a procedure implemented in[20]according to the energy model in[26,18] together with the one in[9],which take the stem stabilizing energies,loop destabilizing,and pseudoknot energies into account.3Evaluation Results3.1Data setsIn order to test the eﬀectiveness and the eﬃciency of our algorithm,we evaluated it on small and large,pseudo-knotted and pseudoknot-free RNAs.We used three sets of RNA sequences.Theﬁrst set consists of50tRNAs, shown in Table1,with lengths ranging from71to79(with the average75).The second set consists of50small RNA sequences or sequence segments with pseudoknot structures(see Table2)of lengths ranging from23to113 (with the average53).The third set consists of11large pseudoknot or pseudoknot free RNA sequences(see Table 3)of lengths ranging from210to412(with the average344).GA0001GA1262GA2492GA3755GA4966GC2866GD1723GD5199GE2095GE4739GF1407GF4687GG0841GG2136GG3917GH0128GH4536GI1748GI4502GK1078GK4537GM0313GM2284GM4471GM5945GN2837GP1341GP3879GP5312GQ2684GR0044GR0793GR1516GR2309GR3541GR4508GR4705GR4740GR5278GT0109GT1418GT4178GT5273GV0579GV1734GV4391GV5554GW1796GW5332GY4135Table1:Test set one:sequence IDs of50tRNAs,taken from[28].3.2Experiment detailsWe compared the performance of our algorithm TdFOLD and that of algorithms PKNOTS[21],ILM[23],and HotKnots[20].We chose the optimal algorithm PKNOTS since it can deal with the most comprehensive type of pseudoknot,while some newer algorithms can only deal with very restricted kinds of pseudoknots.We ran all these algorithms on the tRNAs and the set of small pseudoknot RNAs,and ran all but PKNOTS on the set of large RNAs.We evaluated both accuracy and eﬃciency of these algorithms.The accuracy is measured in both sensitivity and speciﬁcity.Let RP be the number of base pairs in the real structure,T P(true positive)be theSequence type Sequence IDsmRNA Bt-PrP,Ec alpha,Ec S15,Hs-PrP,T4gene32[32]tmRNA Lp PK1,Ec PK1,Ec PK4[32]ribozymes satRPV,Tt-LSU-P3P7,Bp PK2[32]viral tRNA like OYMV,APLV,CGMMV,SBWMV1,BSMVbeta,CGMMV PKbulge,ORSV-S1,AMV3[32]viral3’UTR BSBV3,TMV-L UPD-PK3,STMV UPD1-PK3,BVQ3UPD-PKb,BSBV1,UPD-PKc,PSLVbeta UPD-PK1,PSLVbeta UPD-PK3,SBWMV1UPD-PKb[32] viral ribosomal minimal IBV,MMTV,MMTV-vpk,pKA-A,BWYV,SRV-1,T2gene32[12];RNA shifting signals EIAV,PLRV-S[32]ribozymes HDV-It ag[32]telomerase RNA T.the telo[32]aptamers NGF-L6[32]rRNA Sc18S-PKE21-7[32]antizyme ribosomalframe shifting site Rr ODCanti[32]viral RNA PSIV IRES[32];TYMV,TMV.L,TMV.R[20]HIV-1-RT ligand RNA HIVRT32,HIVRT322,HIVRT33[20]hepatitis virus ribozyme HDV,HDV anti[20]Table2:Test set two:sequence IDs of50small pseudoknotted RNAs(with their reference citations).Sequence type Sequence IDsRNaseP RNA A.ferrooxidans,idlawii(pseudoknot free),A.tum,B.anthracis,B.halodurans,CPB147,D.desulfuricans,EM14b-9,E.thermomarinus,T.roseum[6]telomerase RNA telo.human[8]Table3:Test set three:large RNAs containing pseudoknots(with their reference citations).number of correctly predicted base pairs and F P(false positive)be the number of predicted base pairs that do not exist as real structures.We deﬁne SE(sensitivity)as T P/RP,and SP(speciﬁcity,also called as positive predictive value)as T P/(T P+F P).The perfect prediction should yield1for both sensitivity and speciﬁcity values.For tRNA,the pseudoknot option for PKNOTS was turned oﬀsince it aﬀects the predictions little for tRNA based on some previous tests.For TdFOLD,parameters were set to default values and the number of output solutions was set to40for tRNAs and small pseudoknotted RNAs.For HotKnots and TdFOLD,we only collect the top prediction result from the program output.We also determine the number of best predictions for each program on all data sets.Here,we say that a program has the best prediction for the secondary structure of an RNA if the sensitivity or speciﬁcity of the prediction is not worse than any of the predictions from other programs.The experiments were run on a PC with2.8GHz Intel(R)Pentium4processor and1-GB RAM,running RedHat Enterprise Linux version4AS.Running times were measured using the“time”function.The testing results are summarized in Tables4,5,and6.Due to space limitations,we omit the data for each tRNA and each small RNA with pseudoknots.3.3Prediction accuracyTables4to6summarize the testing results for diﬀerent programs on the three RNA data sets.In Table4and5, we also record the minimum(maximum)value among the predictions for all of the sequences as min(max)since we did not show the data for each sequence.For example,among the50predictions for tRNAs from TdFOLD, in term of sensitivity,the worst one(min)is0.33and the best(max)one is1.Table4shows that TdFOLD has sensitivity0.81and speciﬁcity0.75on average for the tRNA prediction,which are slightly better than PKNOTS and signiﬁcantly better than ILM and HotKnots.Table5shows that,for the small pseudoknotted RNAs,TdFOLD has average sensitivity0.76,which is less than PKNOTS but greater than ILM and HotKnots.On the other hand,TdFOLD has average speciﬁcity0.79,which outperforms all the others. TdFOLD is slightly better in overall accuracy than PKNOTS,which reports the optimal structure according its sophisticated energy model.Table6shows the prediction accuracy on the large RNA’s.TdFOLD maintains thesame sensitivity(0.54)as HotKnots,which is slightly better than ILM.TdFOLD has the highest speciﬁcity.As we mentioned,a program has the best prediction for an RNA sequence if the sensitivity or speciﬁcity of the prediction is not worse than any of the predictions by other programs.For example,for tRNA,in sensitivity, TdFOLD,PKNOTS,HotKnots,ILM have30,29,15,20best predictions respectively.In speciﬁcity,they have 23,27,19,4respectively.For small pseudoknotted RNAs,they have29,31,27,24for sensitivity and29,26,22, 20for speciﬁcity respectively.For large RNAs,TdFOLD,HotKnots and ILM have6,3,4for sensitivity and7, 1,4for speciﬁcity.Thus,TdFOLD performs better than,or as well as,the other programs tested.We also noticed that diﬀerent initial stem pools could aﬀect the prediction results.The predictions based on the initial pools according to the parameter values mentioned in this paper may not necessarily be the best ones. It could be straightforward to improve the accuracy of our algorithm:generate multiple initial stem pools for an RNA sequence according to diﬀerent parameters values;run our current version of the algorithm to produce multiple sets of predictions;pick up a number of the best ones from the multiple sets of the predictions according to the full energy model.Given the eﬃciency of our algorithm,such an extension is reasonable.Since it could be prejudicial to choose any one particular set of values of the parameters,this extension could also rectify the bias caused by choosing only one parameter value set at some degree.TdFOLD HotKnots ILM PKNOTSSE SP T SE SP T SE SP T SE SP Tmin0.330.290.260.330.250.570.330.250.01000.11max 1.00 1.00 1.37 1.00 1.008.32 1.00 1.000.15 1.00 1.000.24average0.810.750.540.720.66 3.330.750.610.030.780.730.41Table4:Summary of testing results on50tRNAs,where SE:sensitivity,SP:speciﬁcity,T:time(in seconds).TdFOLD HotKnots ILM PKNOTSSE SP T SE SP T SE SP T SE SP Tmin000.04000.0500.250.001000.27max 1.00 1.000.57 1.00 1.0057.0 1.00 1.000.05 1.00 1.00>1hraverage0.760.790.360.690.72 5.840.730.690.030.780.731066Table5:Summary of testing results on small pseudoknotted RNAs,where SE:sensitivity,SP:speciﬁcity,T:time in seconds(if not otherwise noted).TdFOLD HotKnots ILML SE SP T SE SP T SE SP TA.ferrooxidans3440.420.3814.50.430.381690.380.360.71idlawii3160.450.35 3.130.680.526550.570.45 1.15A.tum.RNaseP4000.580.70.460.610.6314320.770.82 1.24B.anthracis4080.430.570.470.350.322220.430.42 1.49B.halodurans4120.560.60.480.530.44134830.560.530.99CPB1472980.180.177.620.590.521700.300.250.85D.desulfuricans3600.620.61 5.590.580.521590.480.450.63EM14b-93550.720.73 4.580.660.51297100.670.610.86E.thermomarinus3310.460.41 1.860.650.561480.540.470.83telo.human2100.860.69 3.170.280.181570.700.550.81T.roseum3500.620.63 1.830.240.1927140.510.45 1.10average3440.540.53 3.970.540.4944560.510.440.97Table6:Summary of testing results on large RNAs(pseudoknots),where L:length,SE:sensitivity,SP:speciﬁcity, and T:time in seconds.3.4Time EﬃciencyEﬃciency comparisons are also given in Tables4,5and6on each data set,respectively.For tRNA’s,the average running time of0.54seconds for TdFOLD is slower than the average0.03of ILM and the average0.41of PKNOTS。