Phylogenetic networks form partial trees
- 格式:pdf
- 大小:250.15 KB
- 文档页数:26
系统发育树的构建方法,使用的保守蛋白集
生物系统发育树(Phylogenetic tree)是分子生物学研究中最为常用的技术之一。
它可以预测到一组基因的演化过程,以便了解其衍生的生物类别的相对关系。
在构建生物系统发育树的过程中,常常使用保守蛋白集(conserved protein set)。
保守蛋白集是指在不同物种之间具有稳定序列并能够执行特定生物功能的蛋白质。
选择保守蛋白集作为建立生物系统发育树的分子标志物,这是因为它在沿着一个演化过程中保持稳定性,可以为树的构建提供有效的信息和数据。
此外,由于保守蛋白集通常都可以完全鉴定出来,而且序列之间的相似性要大于其它蛋白质,因此可以更加准确地定量表征这些物种的相似性。
在构建生物系统发育树时,首先要收集尽可能多物种的保守蛋白质序列,其次要对所有序列进行比较,然后用这些比较结果来构建一棵生物系统发育树。
其中,比较过程可以基于结构、功能、序列或者综合多种方法来完成,以便更准确地评估物种之间的相关性。
建立完成以后,可以提取从树中获得的信息来进一步研究这些物种的关系。
在生物系统发育树的构建过程中,使用保守蛋白集是一种有效的方法,它可以更准确地反映物种之间的关系,同时也有助于我们理解进化的模式和进程。
trimAl: a tool for automated alignment trimming in large-scale phylogenetics analyses Salvador Capella-Gutiérrez, Jose M. Silla-Martínez and Toni GabaldónTutorialVersion 1.2trimAl tutorialtrimAl is a tool for the automated trimming of Multiple Sequence Alignments. A format inter-conversion tool, called readAl, is included in the package. You can use the program either in the command line or webserver versions. The command line version is faster and has more possibilities,so it is recommended if you are going to use trimAl extensively.The trimAl webserver included in Phylemon 2.0 provides a friendly user interface and the opportunity to perform many different downstream phylogenetic analyses on your trimmed alignment. This document is a short tutorial that will guide you through the different possibilities of the program.Additional information can be obtained from where a more comprehensive documentation is available.If you use trimAl or readAl please cite our paper:trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.Salvador Capella-Gutierrez;Jose M.Silla-Martinez;Toni Gabaldon.Bioinformatics 2009 25: 1972-1973.If you use the online webserver phylemon or phylemon2, please cite also this reference:Phylemon:a suite of web tools for molecular evolution,phylogenetics and phylogenomics.Tárraga J, Medina I, Arbiza L, Huerta-Cepas J, Gabaldón T, Dopazo J, Dopazo H. Nucleic Acids Res. 2007 Jul;35 (Web Server issue):W38-42.1. Program Installation.If you have chosen the trimAl command line version you can download the source code from the Download Section in trimAl's wikipage.For Windows OS users, we have prepared a pre-compiled trimAl version to use in this OS. Once the user has uncompressed the package, the user can find a directory,called trimAl/bin, where trimAl and readAl pre-compiled version can be found.Meanwhile for the OS based on Unix platform, e.g. GNU/Linux or MAC OS X, the user should compile the source code before to use these programs. To compile the source code, you have to change your current directory to trimAl/source and just execute "make".Once you have the trimAl and readAl binaries program, you should check if trimAl is running in appropriate way executing trimal program before starting this tutorial.2. trimAl. Multiple Sequence Alignment dataset.In order to follow this tutorial, we have prepared some examples. These examples have been taken from and you can use the codes from these files to get more information about it in this database.You can find three different directories called Api0000038, Api0000040 and Api0000080 with different files. The directory contains these files:A file .seqs with all the unaligned sequences.A file .tce with the Multiple Sequence Alignment produced by T-Coffee1.A file .msl with the Multiple Sequence Alignment produced by Muscle2.A file .mft with the Multiple Sequence Alignment produced by Mafft3.A file .clw with the Multiple Sequence Alignment produced by Clustalw4.A file .cmp with the different names of the MSAs in the directory. This file would be used by trimAl to get the most consistent MSA among the different alignments.You can use any directory to follow the present tutorial.3. Useful trimAl's features.Among the different trimAl parameters, there are some features that can be useful to interpret your alignment results:-htmlout filename. Use this parameter to have the trimAl output in an html file. In this way you can see the columns/sequences that trimAl maintains in the new alignment in grey color while the columns/sequences that have been deleted from the original alignment are in white color.-colnumbering. This parameter will provide you the relationship between the column numbers in the trimmed and the original alignment.-complementary. This parameter lets the user get the complementary alignment, in other words,when the user uses this parameter trimAl will render the columns/sequences that would be deleted from the original alignment.-w number. The user can change the windows size, by default 1, to take into account the surrounding columns in the trimAl's manual methods. When this parameter is fixed, trimAl take into account number columns to the right and to the left from the current position to compute any value, e.g. gap score, similarity score, etc. If the user wants to change a specific windows size value should use the correspond parameter-gw to change window size applied only a gap score assessments, -sc to change window size applied only to similiraty score calculations or -cw to change window size applied only to consistency part.4. Useful trimAl's/readAl's features.Both programs, trimAl and readAl, share common features related to the MSA conversion. It is possible to change the output format for a given alignment, by default the output format is the same than the input one, you can produce an output in different format with these options: -clustal. Output in CLUSTAL format.-fasta. Output in FASTA format.-nbrf. Output in PIR/NBRF format.-nexus. Output in NEXUS format.-mega. Output in MEGA format.-phylip3.2. Output in Phylip NonInterleaved format.-phylip. Output in Phylip Interleaved format.5. Getting Information from Multiple Sequence Alignment.trimAl computes different scores, such as gap score or similarity score distribution, from a given MSA. In order to obtain this information, we can use different parameters through the command line version.To do this part,we are going to use the MSA called Api0000038.msl.This file is in the Api0000038 directory.$ cd Api0000038$ trimal -in Api0000038.msl -sgt$ trimal -in Api0000038.msl -sgc$ trimal -in Api0000038.msl -sct$ trimal -in Api0000038.msl -scc$ trimal -in Api0000038.msl -sidentYou can redirect the trimAl output to a file. This file can be used in subsequent steps as input of other programs, e.g.gnuplot,,microsoft excel,etc,to do plots of this information.$ trimal -in Api0000038.msl -scc > SimilarityColumnsFor instance, in the lines below you can see how to plot the information generated by trimAl using the GNUPLOT program.$ gnuplotplot 'SimilarityColumns' u 1:2 w lp notitleset yrange [-0.05:1.05]set xrange [-1:1210]set xlabel 'Columns'set ylabel 'Residue Similarity Score'plot 'SimilarityColumns' u 1:2 w lp notitleexitIn this other example you can see the gaps distribution from the alignment. This plot also was generated using GNUPLOT$ trimal -in Api0000038.msl -sgt > gapsDistribution$ gnuplotset xlabel '% Alignment'set ylabel 'Gaps Score'plot 'gapsDistribution' u 7:4 w lp notitleexit6. Using user-defined thresholds.If you do not want to use any of the automated procedures included in trimAl (see sections 7 and 8) you can set your own thresholds to trim your alignment. We will use the parameter -htmlout filename for each example so differences can be visualized. In this example, we will use the Api0000038.msl file from the Api0000038 directory.Firstly, we are going to trim the alignment only using the -gt value which is defined in the [0 - 1] range. In this specific example, those columns that do not achieve a gap score, at least, equal to 0.190, meaning that the fraction of gaps on these columns are smaller than this value, will be deleted from the input alignment.$ trimal -in Api0000038.msl -gt 0.190 -htmlout ex01.htmlYou can see different parts of the alignment in the image below.This figure has been generated from the trimAl's HTML file for the previous example.In this other example, we can see the effect to be more strict with our threshold. An usual consequence of higher stringency is that the trimmed MSA has fewer columns. Be careful so you do not remove too much signal$ trimal -in Api0000038.msl -gt 0.8 -htmlout ex02.htmlTo be on the safe side, you can set a minimal fraction of your alignment to be conserved. In this example,we have reproduced the previous example with the difference that here we required to the program that, at least, conserve the 80% of the columns from the original alignment. This will remove the most gappy 20% of the columns or stop at the gap threshold set.$ trimal -in Api0000038.msl -gt 0.8 -cons 80 -htmlout ex03.htmlSecondly,we are going to introduce other manual threshold-st value.In this case,this threshold,also defined in the[0-1]range,is related to the similarity score.This score measures the similarity value for each column from the alignment using the Mean Distance method, by default we use Blosum62 similarity matrix but you can introduce any other matrix (see the manual). In the example below, we have used a smaller threshold to know its effect over the example.$ trimal -in Api0000038.msl -st 0.003 -htmlout ex04.htmlIn this example, similar to the previous example, we have required to conserve a minimum percentage of the original alignment in a independent way to fixed by the similarity threshold.A given threshold maintains a larger number of columns than the cons threshold, trimAl selects this first one.$ trimal -in Api0000038.msl -st 0.003 -cons 30 -htmlout ex05.htmlThirdly, we are going to see the effect of combining two different thresholds. In this case, trimAl only maintains those columns that achieve or pass both thresholds.$ trimal -in Api0000038.msl -st 0.003 -gt 0.19 -htmlout ex06.htmlFinally, we are going to see the effect of combining two different thresholds with the cons parameter. In this case, if the number of columns that achieve or pass both thresholds is equal or greater than the percentage fixed by cons parameter, trimAl chose these columns. However, if the number of columns that achieve or pass both thresholds is less than the number of columns fixed by cons parameter, trimAl relaxes both to thresholds in order to retrieve those columns that lets to achieve this minimum percentage.$ trimal -in Api0000038.msl -st 0.003 -gt 0.19 -cons 60 -htmlout ex07.html7. Selection of the most consistent alignment.trimAl can select the most consistent alignment when more than one alignment is provided for the same sequences (and in the same order) using the -compareset filename parameter. To do this part, we are going to move to Api0000040 directory, we can find there a file calledApi0000040.cmp listing the alignment paths. Using this file, we execute the instruction below to select the most consistent alignment among the alignment provided$ trimal -compareset Api0000040.cmpAs in previous section, once trimAl has selected the most consistent alignment, we can get information about the alignment selected using the appropriate parameters. For example, we can use the follow instructions to know the consistency value for each column in the alignment or its consistency values distribution$ trimal -compareset Api0000040.cmp -sct$ trimal -compareset Api0000040.cmp -sccAlso, we can trim the selected alignment using a specific threshold related to the consistency value. To do that, we should use the -ct value where the value is a number defined in the [0 - 1] range. This number refers to the average conservation of residue pars in that column with respect to the other alignments.$ trimal -compareset Api0000040.cmp -ct 0.6 -htmlout ex08.htmlOn the same way than the previous section, we can define a minimum percentage of columns that should be conserve in the new alignment. For this purpose, we have to use the cons parameter as we explained before.$ trimal -compareset Api0000040.cmp -ct 0.6 -cons 50 -htmlout ex09.htmlFinally, we can combine different thresholds, in fact, we can use all of them as well as we can define a minimum percentage of columns that should be conserve in the output alignment. In the line below, you can see an example of this situation.$ trimal -compareset Api0000040.cmp -ct 0.6 -cons 50 -gt 0.8 -st 0.01-htmlout ex10.html8. Applying automated methods.One of the most powerful aspects of trimAl is that it provides you with several automated options.This option will automatically select the most appropriate thresholds for your alignment after examining the distribution of various parameters along your alignment. Among the alignment features that trimAl takes into account to compute these optimal cut-off are the gap distribution, the similarity distribution, the identity score, etc.You can find a complete explanation about all of these methods in the trimAl's Publications Section.Here,we provide some examples on how to use these methods.The automated methods, gappyout, strict and strictpus, can be used independently if you are working with one or more than one alignment, in the last case, for the same sequences.In the lines below, you can see how to use the gappyout method in both ways. This method will eliminate the most gappy fraction of the columns from your alignment. For this, we are going to continue using the same directory than the previous section.$ trimal -compareset Api0000040.cmp -gappyout -htmlout ex11.html$ trimal -in Api0000040.mft -gappyout -htmlout ex12.htmlIn this case, we are going to use the same files than in the example before but we have changed the method to trim the alignmnet. Now, we are using strict and strictplus methods. These two methods combine the information on the fraction of gaps in a column and their similarity scores, being strictplus for more stringent than strict method.$ trimal -compareset Api0000040.cmp -strict -htmlout ex13.html$ trimal -in Api0000040.clw -strictplus -htmlout ex14.htmling an heuristic method to decide which is the best automated method for a given MSA.Finally, we implemented an heuristic method to decide which is the best automated method to trim a given alignment. The heuristic method takes into account alignment features such as the number of sequences in the alignment as well as some measures about the identity score among the sequences in the alignment or among the best pairwise sequences in that MSA. According to these characteristics trimAl will decide upon one of the two automated methods (gappyout or strictplus).To illustrate how to use this method, we provide a couple of example using the same directory than the section before. First, we used trimAl to selecte the most consistent alignment and then we trimmed that alignmnet using our heuristic method.$ trimal -compareset Api0000040.cmp -automated1 -htmlout ex15.htmlThen, we trim a single MSA using the previously mentioned method.$ trimal -in Api0000040.msl -automated1 -htmlout ex16.html10. Getting more information.We hope that this short introduction to trimAl's features has been useful to you.We advise you to visit periodically the trimAl's wikipage()where you could get the latest news about the program as well as more information, examples, etc, about trimAl's package. You can also subscribe to the mailing list if you want to be updated in new trimAl developing.11. References.1.T-Coffee: A novel method for fast and accurate multiple sequence alignment.Notredame C, Higgins DG, Heringa J. J Mol Biol. 2000 Sep 8;302(1):205-17.2.MUSCLE:multiple sequence alignment with high accuracy and highthroughput. Edgar RC.Nucleic Acids Res. 2004 Mar 19;32(5):1792-7.3.MAFFT: a novel method for rapid multiple sequence alignment based on fastFourier transform. Katoh K, Misawa K, Kuma K, Miyata T. Nucleic Acids Res. 2002 Jul 15;30(14):3059-66.4.CLUSTAL W:improving the sensitivity of progressive multiple sequencealignment through sequence weighting,position-specific gap penalties and weight matrix choice. Thompson JD, Higgins DG, Gibson TJ. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80.。
efficientnet解读-回复什么是EfficientNet?EfficientNet是一种高效的卷积神经网络(Convolutional Neural Network, CNN)架构,由Google研究团队在2019年提出。
它通过优化网络的深度、宽度和分辨率,达到了在图像分类任务上,比目前其他SOTA(State-of-the-Art)的模型具有更高的精度和更好的效能。
EfficientNet的创新点在于使用了一种称为Compound Scaling的方法,该方法为网络的不同维度(深度、宽度和分辨率)选择了合适的比例,从而使网络在三个方面都能够提供良好的性能。
这意味着EfficientNet不仅提升了模型的准确性,同时还减少了可训练参数的数量,使得模型更加高效。
EfficientNet的核心思想是通过在网络的不同层次上相对比例地缩放深度、宽度和分辨率,从而平衡网络的规模和性能。
具体地说,EfficientNet利用了一个复合缩放系数phi(ϕ),它控制了网络的总体缩放比例。
该phi 参数是通过在一定范围内进行搜索和验证,确定出最佳值的。
在EfficientNet的工作中,研究人员通过在Imagenet上进行大规模的实验评估,找到了一个最佳的复合缩放系数phi=1.2。
在EfficientNet的网络结构中,首先进行的是深度方向的缩放。
通过复制某个输入模型(例如ResNet)的某个重复模块,即可扩展EfficientNet的深度。
而深度可以同时扩展子层的数量和整体的网络深度。
然后,进行的是宽度方向的缩放,即扩展通道/特征维度的数量。
为了平衡不同层级的性能,EfficientNet限制了扩展的范围。
这样,即使在更高分辨率的层级中,EfficientNet也能保证较高的计算效率。
最后,进行的是分辨率方向的缩放,即调整图像输入的分辨率。
通过在训练过程中逐渐增加分辨率,EfficientNet能够提高网络对更高分辨率图像的适应能力。
Estimating phylogenetic trees and networks usingSplitsTree4,Daniel H.HusonCenter for Bioinformatics,T¨u bingen University,Sand14,72076T¨u bingen,Germany.E-mail:huson@informatik.uni-tuebingen.deDave BryantCenter for Bioinformatics,McGill University,Montreal,CanadaE-mail:bryant@mcb.mcgill.caSplitsTree4[7,8]is a new Java program for estimating phylogenetic trees and networks.Its main focus is on computing phylogenetic networks[4].It provides methods for con-structing splits networks,such as the consensus network[6]or super network of a set of phylogenetic trees or partial trees[9],distance methods such as Neighbor-Net[3]and split decomposition[1]and direct methods such as median networks[2]and spectral analysis[5].SplitsTree4also provides a large number of distance transformations and implementations of all standard distance-methods for building phylogenetic trees.Moreover,it also provides interfaces to external programs such ClustalW or Phylip.SplitsTree is currently the only available program for computing hybridization networks from gene trees[11]and computing recombination networks from binary sequences[10].The program can be run both in a GUI mode or in a non-interactive command-line mode. The software is freely available for UNix/Linux,Windows and MacOS from:References[1]H.-J.Bandelt and A.W.M.Dress.A canonical decomposition theory for metrics on afinite set.Advances in Mathematics,92:47–105,1992.[2]H.-J.Bandelt,P.Forster,B.C.Sykes,and M.B.Richards.Mitochondrial portraits of human population using median networks.Genetics,141:743–753,1995.[3] D.Bryant and V.Moulton.NeighborNet:An agglomerative method for the construction of planar phylogenetic networks.In R.Guig´o andD.Gusfield,editors,Algorithms in Bioinformatics,WABI2002,volume LNCS2452,pages375–391,2002.[4] A.W.M.Dress and D.H.Huson.Constructing splits graphs.IEEE/ACM Transactions in Computational Biology and Bioinformatics,1(3):109–115,2004.[5]M.D.Hendy and D.Penny.Spectral analysis of phylogentic data.Journal of Classification,10:5–24,1993.[6] B.Holland and V.Moulton.Consensus networks:A method for visualizing incompatibilities in collections of trees.In G.Benson and R.Page,editors,Proceedings of“Workshop on Algorithms in Bioinformatics”,volume2812of LNBI,pages165–176.Springer,2003.[7] D.H.Huson.SplitsTree:A program for analyzing and visualizing evolutionary data.Bioinformatics,14(10):68–73,1998.[8] D.H.Huson and D.Bryant.Estimating phylogenetic trees and networks using SplitsTree4.Manuscript in preparation,software availablefrom ,2005.[9] D.H.Huson,T.Dezulian,T.Kloepper,and M. A.Steel.Phylogenetic super-networks from partial trees.IEEE/ACM Transactions inComputational Biology and Bioinformatics,1(4):151–158,2004.[10] D.H.Huson and puting recombination networks from binary sequences.Submitted,2005.[11] D.H.Huson,T.Kloepper,P.J.Lockhart,and M.A.Steel.Reconstruction of reticulate networks from gene trees.In Proceedings of the NinthInternational Conference on Research in Computational Molecular Biology(RECOMB),2005.。
一种分层图像编码算法
王成儒;司菁菁
【期刊名称】《计算机工程与应用》
【年(卷),期】2005(041)026
【摘要】针对纹理较丰富的图像提出了一种分层编码算法.该算法将图像划分成具有不同特征的两层:平滑层和纹理层,分别基于小波变换和自适应局部余弦变换(ALCT)进行编码.编码平滑层时,仅对原图像一级小波变换后的低频子带进行小波SPIHT编码,在减少计算量的同时,提高了整个算法的性能.该文改进了实现ALCT的折叠算子,并且提出了一种ALCT系数的重组方案,从而利用零树编码获得了嵌入式码流.实验结果表明,本算法在未进行算术编码的情况下,压缩性能优于SPIHT,并且在重建图像中更好地保留了原图像的纹理特征.
【总页数】3页(P78-80)
【作者】王成儒;司菁菁
【作者单位】燕山大学信息科学与工程学院,河北,秦皇岛,066004;燕山大学信息科学与工程学院,河北,秦皇岛,066004
【正文语种】中文
【中图分类】TP391
【相关文献】
1.采用小波与ALCT的分层图像编码算法 [J], 王成儒;司菁菁
2.一种采用分层多阈值的分辨率渐进图像编码算法 [J], 陈仁喜;赵忠明
3.一种基于离散余弦变换的嵌入式图像编码算法 [J], 陈鑫;游敏娟;崔艺君;张勇;王世刚
4.基于改进的正交FRIT的分层图像编码算法 [J], 司菁菁;王成儒;程银波
5.基于小波和匹配跟踪的分层图像编码算法 [J], 刘利雄;廖斌;贾云得;王元全
因版权原因,仅展示原文概要,查看原文内容请购买。
⽣物信息学复习题⼀、名词解释1.bioinformatics:⽣物信息学,指从事对基因组研究相关的⽣物信息的获取、加⼯、储存、分配、分析和解释的⼀门科学,是⼀门⽣物学,数学和计算机相互交叉融合⽽产⽣的新兴学科。
2.molecular bioinformatics:指综合应⽤信息科学、数学的理论、⽅法和技术,管理、分析和利⽤⽣物分⼦数据的科学。
3.GenBank:是美国全国卫⽣研究所维护的基因序列数据库,汇集并注释了所有公开的核酸序列,与⽇本的DNA数据库DDBJ以及欧洲分⼦实验室核酸序列数据库EMBL⼀起,都是国际核苷酸序列数据库合作的成员。
4.EMBL:EMBL实验室—欧洲分⼦⽣物学实验室,EMBL数据库—是⾮盈利性学术组织EMBL建⽴的综合性数据库,EMBL核酸数据库是欧洲最重要的核酸序列数据库,它定期地与美国的GenBank、⽇本的DDBJ数据库中的数据进⾏交换,并同步更新。
5.DDBJ:⽇本DNA数据库,主要向研究者收集DNA序列信息并赋予其数据存取号,信息来源主要是⽇本的研究机构,也接受其他国家呈递的序列。
6.BLAST:基本局部⽐对搜索⼯具的缩写,是⼀种序列类似性检索⼯具。
BLAST采⽤统计学⼏分系统,同时采⽤局部⽐对算法, BLAST程序能迅速与公开数据库进⾏相似性序列⽐较。
BLAST结果中的得分是对⼀种对相似性的统计说明。
7.BLASTn:是核酸序列到核酸库中的⼀种查询。
库中存在的每条已知序列都将同所查序列作⼀对⼀地核酸序列⽐对。
8.BLASTp:是蛋⽩序列到蛋⽩库中的⼀种查询。
库中存在的每条已知序列将逐⼀地同每条所查序列作⼀对⼀的序列⽐对。
9.Clustsl X:是CLUSTAL多重序列⽐对程序的Windows版本,是⽤来对核酸与蛋⽩序列进⾏多序列⽐较的程序,也可以对来⾃不同物种的功能或结构相似的序列进⾏⽐对和聚类,通过重建系统发⽣树判断亲缘关系,并对序列在⽣物进化过程中的保守性进⾏估计。
1 SplitsTree4.0-Computation of phylogenetic treesand networksDaniel H.Huson1,Tobias Kloepper2,David Bryant3 Keywords:phylogeny,evolution,trees,networks,graphs.1IntroductionThe goal of phylogenetic analysis is to determine the order and approximate timing of spe-ciation events in the evolution of a given set of species.In the classic theory of phylogenetic analysis the species are assumed to evolve along a(bifucating)X-tree,experiencing point mutations along the way.Recent results from genome wide comparision of gene trees seem to indicate that these models mayfit well for single genes but fail to represent the complex evo-lution of a genome.A natural generalisation of the phylogenetic tree,that seems tofit well with the complex evolution of a genome,is the phylogenetic network.There are two main approches to generate such phylogenetic networks.Thefirst one is the combined analysis approach,and the second one is the individual analysis approach.In thefirst approach the phylogenetic network is generated directly from the given combined information.In the sec-ond approach individual trees are generated for the given information sets and all individual trees are then combined into one network.Bandelt and Dress[BD92]suggested a number of combined analysis methods,such as the split-decomposition method and the parsimony splits.More recently,Bryant and Moulton[BM02]described a new method Neighbor-Net, that brings together both split decomposition and the well-known Neighbor-Joining method [SN87].Holland and Moulton[HM03]presented a method to join individual gene trees for a common set of species.In2004Huson et al.[HDKS04]presented the Z-Super-Network method,which merges individual partial gene trees.2The SplitsTree programThere exist a number of packages for performing phylogenetic analysis,e.g.[Swo00,SvH96, MM02].However,they all use trees as the fundamental data structure.In contrast,the SplitsTree program[HB05]is based on so-called splits and phylogenetic networks.It is aimed at providing a general framework for both tree-and network-oriented phylogenetic analysis.Fundamental data types supported by the program includ unaligned-and aligned sequences,distances,splits,trees,networks and quartets.The package provides commonly used distance-based algorithms.We have also implemented the methods mentioned in the introduction.The package provides numerous visualisation methods for phylogentic trees and networks,examples are split graphs[DH04]and the equal-daylight method[Fel04].We [HKLS05]have recently implemented methods for the reconstruction of reticulation networks and the transformation from split networks to reticulation networks.The main features of the program are:2•it runs on any machine with minimal installation requirements based on Java or Java Web Start.•GUI version for interactive use,command-line version for scripting pipelines.•Flexible frame-work for doing phylogenetic analysis.•De-centralized plug-in concept for adding new methodology.•Fundamental data types include splits and quartets.•Uses Nexusfile format,with one-to-one correspondence between internal data classes and external nexus blocks,and supports most other formats.•Transformations of molecular sequences to distances.•Combined and individual analysis methods.•Visualisation of phylogenetic trees and networks.•Interactive exploration of the visualisation.•Bootstrapping3Availability.The program is freely available from rmatik.uni-tuebingen.de/software References[BD92]H.-J.Bandelt,and A.W.M.Dress,Split Decomposition:A new and useful approach to phylogenetic analysis of distance data.Molecular Phylogenetics and Evolution1(3):242-252, 1992.[BM02] D.Bryant and V.Moulton.NeighborNet:An agglomerative method for the construction of planar phylogenetic networks.In R.Guig´o and D.Gusfield,editors,Algorithms in Bioinfor-matics,WABI2002,volume LNCS2452,pages375,391,2002.[DH04] A.W.M.Dress and D.H.Huson.Constructing splits graphs,IEEE/ACM Transactions in Computational Biology and Bioinformatics,volume1(3),pages109-115,2004.[Fel04]J.Felsenstein.Inferring Phylogenies Sinauer Associates,Inc.,pages582ff,2004.[HM03] B.Holland and V.Moulton.Consensus networks:A method for visualizing incompatibilities in collections of trees.Algorithms in Bioinformatics,WABI2003,volume LNBI2812,pages 165-176,2003[HB05] D.H.Huson and D.Bryant.Estimating phylogenetic trees and networks using SplitsTree4,note=Manuscript in preparation,software available from rmatik.uni-tuebingen.de/software[HDKS04] D.H.Huson,T.Dezulian,T.Kloepper and M.A.Steel.Phylogenetic Super-Networks from Partial Trees.Algorithms in Bioinformatics,WABI2004,in press,2004.[HKLS05] D.H.Huson,T.Kloepper,P.J.Lockhart and M.A.Steel.Reconstruction of Reticulate Networks from Gene Trees accepted for RECOMB2005.[MM02]W.Maddison and D.Maddison.Mesquite-a modular system for evolutionary analysis./mesquite/mesquite.html,2002.[SN87]N.Saitou and M.Nei.The Neighbor-Joining method:a new method for reconstructing phylogenetic trees.Molecular Biology and Evolution,4:406–425,1987.[SvH96]K.Strimmer and A.von Haeseler.Quartet puzzling:a quartet maximum likelihood method for reconstructing tree topologies.Molecular Biology and Evolution,13:964–969,1996. [Swo00] D.L.Swofford.PAUP∗:Phylogenetic analysis using parsimony(∗:and other methods), version4.2,2000.。
作系统进化树的方法系统进化树(Phylogenetic tree)是一种表示生物物种之间进化关系的图形结构。
它基于生物的遗传物质或形态特征等数据,通过一定的算法和模型来构建,以揭示物种之间的亲缘关系和进化历程。
以下是构建系统进化树的一般步骤:1. 数据收集:首先需要收集用于构建进化树的基因或形态特征数据。
这通常涉及从各种来源获取DNA、蛋白质或其他分子序列数据,或者从博物馆和标本馆获取生物形态特征数据。
2. 序列比对:对于DNA或蛋白质序列数据,需要将这些序列进行比对,以确保它们可以一起进行比较和分析。
3. 选择适当的距离度量:在构建系统进化树时,需要计算物种之间的“距离”。
这些距离是基于序列或形态特征的差异来计算的。
有多种方法可以计算这些距离,例如基于遗传物质的p距离(代表两个序列之间的差异比例)或形态特征的欧几里得距离。
4. 选择合适的建树算法:系统进化树可以通过多种算法来构建,包括但不限于UPGMA(Unweighted Pair Group Method with Arithmetic Mean)、WPGMA(Weighted Pair Group Method with Arithmetic Mean)、WPGMC(Weighted Pair Group Method with Centroid Linkage)、Neighbor Joining、Fitch-Margoliash、Maximum Parsimony、Maximum Likelihood等。
选择哪种算法取决于你的具体需求和所处理数据的性质。
5. 构建系统进化树:使用选择的算法和距离度量,将物种按照它们的亲缘关系分组。
这一步通常涉及到一个迭代过程,其中算法会尝试不同的分组方案,直到找到一个最优解。
6. 评估和验证树:一旦构建了系统进化树,就需要对其进行评估和验证,以确保其合理性和可靠性。
这通常涉及使用多种统计测试和可视化工具,例如Bootstrapping、P-distance、Tree-bisection-reconnection (TBR) 操作等。
生物大数据分析中的进化遗传树构建方法与技巧进化遗传树(Phylogenetic Tree)是生物学研究中用于分析物种关系和演化历程的重要工具。
通过构建进化树,我们可以了解不同物种之间的进化关系,揭示物种的演化历史以及预测它们之间的共同祖先。
在生物大数据分析中,构建进化遗传树有着重要的意义,因为它可以帮助我们理解生物的遗传多样性、物种起源以及群体分化等重要生物学问题。
在构建进化遗传树的过程中,我们需要根据生物学数据来推断物种间的关系。
这些生物学数据可以是DNA或RNA序列、蛋白质序列、形态特征等。
为了准确地构建进化遗传树,我们需要选择合适的方法和技巧。
下面将介绍一些常用的进化遗传树构建方法和技巧。
1. 距离法(Distance-based methods):距离法是通过计算物种间的相似度或差异度来构建进化遗传树的方法。
常用的距离法包括最邻近法(Neighbor Joining)、最小进化法(Minimum Evolution)和最大简约法(Maximum Parsimony)等。
这些方法根据不同的算法和模型,通过计算物种间的距离矩阵来构建进化关系。
2. 贝叶斯方法(Bayesian methods):贝叶斯方法是一种基于统计模型和概率推断的进化遗传树构建方法。
它通过采用贝叶斯推断和蒙特卡洛马尔科夫链蒙特卡洛算法(MCMC)来估计进化树的拓扑结构和参数。
贝叶斯方法具有高度灵活性和更准确的模型,适用于复杂的进化树推断问题。
3. 最大似然方法(Maximum likelihood methods):最大似然方法是一种常用的基于概率统计的进化遗传树构建方法。
它通过最大化观测到的数据出现的概率,推断出可能的进化树。
最大似然方法考虑了模型中的参数估计问题,并用参数化的模型来描述进化过程,从而提高了推断结果的准确性。
在进行进化遗传树构建时,还有一些技巧需要注意,以保证结果的准确性和可靠性:1. 数据质量的控制:数据质量是构建进化遗传树的关键因素之一。
a rXiv:079.283v1[math.CO ]3Se p27Phylogenetic Networks from Partial Trees S.Gr¨u newald Department of Combinatorics and Geometry (DCG),CAS-MPG Partner Institute for Computational Biology (PICB),Shanghai Institutes for Biological Sciences (SIBS),Chinese Academy of Sciences (CAS),China,and K.T.Huber and Q.Wu School of Computing Sciences,University of East Anglia,Norwich,NR57TJ,United Kingdom.February 1,20081K.T.HuberSchool of Computing Sciences,University of East Anglia,Norwich,NR57TJ,United Kingdom.email:katharina.huber@FAX:+44(0)1603593345We would prefer to submit thefinal manuscript in LATEX.2AbstractA contemporary and fundamental problem faced by many evolu-tionary biologists is how to puzzle together a collection P of partialtrees(leaf-labelled trees whose leaves are bijectively labelled by speciesor,more generally,taxa,each supported by e.g.a gene)into an overallparental structure that displays all trees in P.This already difficultproblem is complicated by the fact that the trees in P regularly sup-port conflicting phylogenetic relationships and are not on the samebut only overlapping taxa sets.A desirable requirement on the soughtafter parental structure therefore is that it can accommodate the ob-served conflicts.Phylogenetic networks are a popular tool capableof doing precisely this.However,not much is known about how toconstruct such networks from partial trees,a notable exception beingthe Z-closure super-network approach and the recently introduced Q-imputation approach.Here,we propose the usage of closure rules toobtain such a network.In particular,we introduce the novel Y-closurerule and show that this rule on its own or in combination with oneof Meacham’s closure rules(which we call the M-rule)has some verydesirable theoretical properties.In addition,we use the M-and Y-ruleto explore the dependency of Rivera et al.’s“ring of life”on the factthat the underpinning phylogenetic trees are all on the same data set.Our analysis culminates in the presentation of a collection of inducedsubtrees from which this ring can be reconstructed.1IntroductionPhylogenetic trees have proved an important tool for representing evolution-ary relationships.For a set X of species(or,more generally,taxa)these are formally defined as leaf-labelled trees whose leaves are bijectively labelled by the elements of X.Advances in DNA sequencing have resulted in ever more data on which such trees may be putational limitations however combined with the need to understand species evolution have left biologists with the following fundamental problem which we will refer to as amalgamation problem:given a collection P of phylogenetic trees,how can these trees be amalgamated into an overall parental structure that preserves the phylogenetic relationships supported by the trees in P?The hope is that such a structure might help shed light on the evolution of the underlying genomes(and thus the species).In the ideal case that all trees in P support the same phylogenetic rela-tionships(as is the case for trees T1and T2depicted in Fig.1)this structure is known to be a phylogenetic tree and a supertree method[2]may be used to reconstruct it.For the above example the outcome T∗of such a method is3T2T1Figure1:3phylogenetic trees which appeared in weighted form in[17]on subsets of the7plant species: A.thaliana(A.th),A.suecia(A.su),Turri-tis(Tu)A.arenosa(A.ar),A.cebennensis(A.ce),Crucihimalaya(Cru)and A.halleri(A.ha).T1with species Cru(see Fig.1for full species names)attached via a pendant edge to the vertex labelled v.It should be noted that T∗supports the same phylogenetic relationships as T1and T2in the following sense:For afinite set X,call a bipartition S={A, A}of some subset X′⊆X a partial split on X,or a partial(X)-split for short,and denote it by A| A or,equivalently,by A|A where A:=X′−A.In particular,call S a(full)split of X if X′=X. Furthermore,say that a partial X-split S=A| A extends a partial X-split S′=B| B if either B⊆A and B⊆ A or B⊆ A and A⊆ B.Finally,say that a phylogenetic tree T displays a split S=A| A if S is a partial split on the leaf set L(T)of T induced by deleting an edge of T.Then“supports the same phylogenetic relationships”means that for every split S displayed by T1or T2there exists a split on L(T∗)that extends S and is displayed by T∗.Due to complex evolutionary mechanisms such as incomplete lineage sorting,recombination(in the case of viruses),or lateral gene transfer(in case of bacteria)the trees in P may however support not the same but conflicting phylogenetic relationships.A phylogenetic network in the form of a split network(see[10,19]for overviews)rather than a phylogenetic tree is therefore the structure of choice if one wishes to simultaneously represent all phylogenetic relationships supported by the trees in P.An example in point is the split network pictured in Fig.2which appeared as a weighted network in[17].With replacing“edge”in the definition of displaying by “band of parallel edges”and“L(T)”by“set of network vertices of degree 1”to obtain a definition for when a split network displays a split,it is straight forward to check that the network in Fig.2displays all splits displayed by the3trees pictured in Fig.1.It should be noted that phylogenetic networks such as the one depicted in4Fig.2(see e.g.[7,11,14]for recently introduced other types of phylogeneticnetworks)provide a means to visualize the complexity of a data set andshould not be thought of as an explicit model of evolution.Awareness ofthis complexity does not only allow the exploration of a data set but,as is the case of e.g.hybridization networks[14],can also serve as starting pointfor obtaining an explicit model of evolution(see[12]for more on this).Figure2:A circular phylogenetic network that represents all phylogenetic relationships supported by the trees depicted in Fig.1(see thatfigure forfull species names).Apart from displaying all splits induced by the3trees depicted in Fig.1,the network depicted in Fig.2has a further interesting feature.It is cir-cular.In other words,if X denotes the set of the7plant species underconsideration,then the elements of X can be arranged around a circle C sothat every split S=A| A of X displayed by the network can be obtained by intersecting C with a straight line so that the label set of one of the resulting2connected components is A and the label set of the other is A.Although seemingly a very special type of phylogenetic network,circularphylogenetic networks are a frequently used structure in phylogenetics(seee.g.[3,4,5,6,8])as they do not only naturally generalize the concept of aphylogenetic tree but are also guaranteed to be representable in the plane;a fact that greatly facilitates drawing and thus analyzing them.However,al-though recentlyfirst steps have been made with regards tofinding a solution to the amalgamation problem in terms of a phylogenetic network leading to the attractive Z-closure[13]and Q-imputation[9]approaches,very little is known about a solution of this problem in terms of a circular phylogenetic network.Intrigued by this and motivated by the fact that,from a combinatorialpoint of view,phylogenetic trees and networks are split systems(i.e.collections5of full splits)and that therefore the amalgamation problem boils down to the problem of how to extend partial splits on some set X to splits on X,we wondered whether closure rules for partial splits could not be of help.Es-sentially mechanisms for splits’enlargement,such rules have proved useful for supertree construction and also underpin the above mentioned Z-closure super-network approach.As it turns out,this is indeed the case.As an immediate consequence of our main result(Corollary5.5),we obtain that for a collection of partial splits that can be“displayed”by a circular phy-logenetic network N,the collection of(full)splits generated by the closure rules in the centre of this paper is guaranteed to be displayable by N and also independent of the order in which the rules are applied.In a study aimed at shedding light into the origin of eukaryotes,Rivera et al.[20]put forward the idea of a“ring of life”with the eukaryotic genome being the result of a fusion of two diverse procaryotic genomes(see also [16,20,23]).A natural and interesting question in this context is how dependent Rivera et al.’s ring of life is on the fact that all underpinning trees are on the same taxa set.In the last section of this paper,we provide a partial answer by presenting an example of a collection of induced partial trees from which the ring of life can be reconstructed using the M-and Y-rule.The paper is organized as follows.In Section2,wefirst introduce some more terminology and then restate one of Meacham’s closure rules(our M-rule)and introduce the novel Y-rule.In Section3,we study the relationship between the M-and Y-rule and the closure rule that underpins the afore-mentioned Z-closure super-network approach.In Section4,we introduce the concept of a circular collection of partial splits and show that both the Y-and M-rule preserve circularity(Proposition4.4).In Section5,we in-troduce the concept of a split closure and show that for certain collections of partial splits this closure is independent of the order in which the Y-rule and/or M-rule are/is applied(Theorem5.3).This result lies at the heart of Corollary5.5.In Section6,we explore the dependency of Rivera et al.’s ring of life on the fact that the underpinning trees are all on the same data setThroughout the paper,X denotes afinite set and the terminology and notation largely follows[21].62Closure rulesWe start this section by introducing some additional terminology and nota-tion.Subsequent to this,wefirst restate Meacham’s rule(which we call the M-rule)and then introduce a novel closure rule which we call the Y-rule.LetΣ(X)denote the collection of all partial splits of X and suppose Σ⊆Σ(X).Then a partial split S∈Σthat can be extended by a partial split S′∈Σ−S is called redundant.The set obtained by removing redundant elements fromΣis denoted byΣ−.IfΣ=Σ−thenΣis called irreducible and the set of all irreducible subsets inΣ(X)is denoted by P(X).Note that the relation“ ”defined for any two(partial)split collectionsΣ,Σ′∈P(X) by puttingΣ Σ′if every partial split inΣis extended by a partial split in Σ′is a partial order on P(X)[21].Suppose for the following thatθis a closure rule,that is,a replacement rule that replaces a collection A⊆Σ(X)of partial splits that satisfy some condition Cθby a collectionθ(A)⊆Σ(X)whose elements are generated in some systematic way from the partial splits in A(see e.g.the M-and the Y-closure rules presented below for two such systematic ways).Suppose Σ,Σ′∈P(X)are two irreducible collections of partial splits and Cθ(Σ)is the set of all subsets ofΣthat satisfy Cθ.If there exists some subset A∈Cθ(Σ) such thatΣ′=(Σ∪θ(A))−then we say thatΣ′is obtained fromΣvia a single application ofθ.Finally,if for every subset A∈Cθ(Σ)we have θ(A)− Σthen we call an application ofθtoΣtrivial and say thatΣis closed with respect toθ.We are now in the position to present the2closure rules we are mostly concerned with in this paper:the M-rule which is originally due to Meacham [18]and the novel Y-rule.We start with Meacham’s rule.2.1The M-ruleSuppose S1,S2∈Σ(X)are two distinct partial splits of X.Then the M-rule θM is as follows:(θM)If there exists some A i∈S i,i=1,2such thatA1∩A2=∅and A1∩ A2=∅(1)then replace A={S1,S2}by the setθ{A1,A2}M (A)which comprises ofA and,in addition,also the partial splitsS′1=(A1∩A2)|( A1∪ A2)and S′2=( A1∩ A2)|(A1∪A2).7In case the partial splits S1and S2are such that there is no ambiguity with regards to the identity of the sets A1and A2in the statement of the M-rule orthey are irrelevant to the discussion,we will simplifyθ{A1,A2}M (A)toθM(A).Clearly,such ambiguity cannot arise if S1and S2are compatible,that is, there exist subsets D i∈S i,i=1,2such that D1∩D2=∅.However if S1 and S2are incompatible,that is,not compatible then caution is required.Note that if A1and A2as in the statement of the M-rule are such that A2⊆A1and A1⊆ A2,then it is easy to verify thatθM applies trivially to A.Also note that for anyΣ∈P(X)and any two distinct partial splitsS1,S2∈Σ,we haveΣ (Σ∪θM({S1,S2}))−.Finally,note that any phylogenetic tree on X that displays the partial splitsin some setΣ∈P(X)also displays the partial splits in(Σ∪θM({S1,S2}))−,S1,S2∈Σ.2.2The Y-ruleSuppose S i∈Σ(X),i=1,2,3,are three distinct partial splits of X.Then the Y-ruleθY is as follows:(θY)If there exists some A i∈S i,i=1,2,3such that∅∈{A1∩A2∩A3, A1∩ A2∩A3, A1∩A2∩ A3}andA1∩ A2∩ A3=∅.(2) (see Fig.3(a)for a graphical interpretation),then replace A={S1,S2,S3}by the setθ{A1,A2,A3}Y(A)which comprises of the partial splitsS′1= A1∪( A2∩ A3)|A1,S′2=A2∪(A1∩ A3)| A2,andS′3=A3∪(A1∩ A2)| A3.Although the condition in(2)might look quite strange atfirst sight,the class of triplets of partial splits that satisfy it is very rich.For example,sup-pose that S i=A i| A i,i=1,2,3are splits of X that can be arranged in the plane as indicated in Fig.3(b)where each bold,straight line represents oneof S i,i=1,2,3and the dots represent non-empty triplewise intersectionsof the parts of S i,i=1,2,3,in which they lie.For example,the dot in the bottom wedge represents the intersection A1∩A2∩A3.The shaded regions correspond to the3non-empty intersections mentioned in the statement of the Y-rule.The partial splits S′i=A′i| A′i i=1,2,3obtained by restricting81111111101111A12∩A3=∅(a)A1∩A2∩ A3=∅A1∩ A2∩ A1∩A2∩ A3=∅A1(b)A2A3A3A1A2Figure3:(a)A graphical representation of Condition(2)in the form of a Y.(b)An example of three splits,depicted in bold lines,that satisfy Condition(2)–see text for details.S1,S2,and S3to different subsets of X so that the shaded regions remain non-empty form a triplet of partial splits that satisfy(2).As the example of set A comprising the three partial splits S1=145|2367, S2=1357|246,and S3=127|356shows different choices of the sets A i,i=1,2,3lead to different setsθ{A1,A2,A3}Y(A).For example,if A1:= {1,4,5},A2:={1,3,5,7},and A3:={1,2,7}then(2)is satisfied andθ{A1,A2,A3}Y(A)={S1,S2,1247|356}.If however A1and A2are as beforeand A3:={3,5,6},then(2)is also satisfied andθ{A1,A2,A3}Y(A)is the set{S1,S2,127|3456}.Following our practise for the M-rule,for A={S1,S2,S3}we simplifyθ{A1,A2,A3}Y(A)toθY(A)if the partial splits S i, i=1,2,3are such that there is no ambiguity with regards to the iden-tity of the sets A i,i=1,2,3,in the statement of the Y-rule or they are irrelevant to the discussion.Note that if A i,i=1,2,3as in the statement of the Y-rule are suchthat,in addition,∅=A1∩ A2⊆A3,∅=A1∩ A3⊆A2,∅= A3∩ A2⊆ A1, and A1∩A2∩A3=∅it is easy to see thatθY applies trivially to A.Also note that for anyΣ∈P(X)and any3partial splits S1,S2,S3∈Σof X,we haveΣ (Σ∪θY({S1,S2,S3})−.3First closure rule relationshipsIn this section wefirst restate the Z-(closure)rule which was used in[13]inthe context of a supernetwork construction approach and then investigatethe relationship between the Y-,M-,and Z-rule.Also originally due to Meacham[18],the Z-ruleθZ can be restated as9follows:Suppose S1,S2∈Σ(X)are two distinct partial splits of X.(θZ)If there exists some A i∈S i,i=1,2such that∅∈{A1∩A2,A2∩ A1, A1∩ A2}and A1∩ A2=∅(3) then replace A={S1,S2}by the setθY(A)which comprises of the partial splits( A1∪ A2)|A1and A2|(A1∪A2).Note that any two compatible partial splits of X satisfy the condition in(3).With this third closure rule at hand we are now in the position to present afirst easy to verify result.Suppose S1,S2,and S3are3distinct partial splits of X such that there exist parts A i∈S i,i=1,2,3as in the statement of the Y-rule.If,in addition, A1∩ A2∩ A3=∅and A1⊆A2∪A3,then the partial split A1| A1∪( A2∩ A3)generated byθY is also generated byfirst applyingθM to S2and S3(with regards to A2∩ A3=∅)and then applying θM to the resulting partial split A2∪A3| A2∩ A3and S1.In addition,we have the following result whose straight forward proof we leave to the reader.Proposition3.1Suppose S1is a full split of X.Then the following state-ments hold.(i)If S2is a partial X-split andθZ applies toΣ={S1,S2},thenθM(Σ)−=θZ(Σ).(ii)If S2and S3are partial X-splits so thatθY applies toΣ={S1,S2,S3} andθZ applies to{S1,S2}and{S1,S3}.ThenθZ(S1,S i))−.(θY(Σ)∪ j=2,3θM(S1,S j))−=(i∈{2,3}4Closure rules and weakly compatible collections of partial splitsIn this section we introduce the notion of a weakly compatible collection of partial splits and study properties of the Y-and M-rules regarding such collections.A particular focus lies on the study of circular collections of partial splits which we also introduce.As we will see,they form a very rich subclass of such collections of partial splits.104.1Weakly compatible collections of partial splitsWe start this section with a definition that generalizes the concept of weak compatibility for(full)splits of X[1]to partial splits of X.Suppose S i= A i| A i∈Σ(X),i=1,2,3,are three partial X-splits.Then we call S1,S2,S3 weakly compatible if at least one of the four intersectionsA1∩A2∩A3, A1∩ A2∩A3, A1∩A2∩ A3,A1∩ A2∩ A3(4) is empty1.Since the roles of A i and A i in S i,i=1,2,3,can be interchanged without changing S1,S2,S3we have that S1,S2,S3are weakly compatible if and only if at least one of the four intersectionsA1∩ A2∩ A3,A1∩A2∩ A3,A1∩ A2∩A3, A1∩A2∩A3is empty.More generally,we call a collectionΣ⊆Σ(X)of partial X-splits weakly compatible if every three partial splits inΣare weakly compatible.To give an example,the partial splits S1=123|4567,S2=124|3567,and S3= 235|146are weakly compatible whereas the partial splits S3,S4=24|135, and S5=21|346are not.Thus,{S1,...,S5}is not weakly compatible.Note that,like in the case of(full)splits,it is easy to see that any collection of pairwise compatible partial splits is also weakly compatible. Clearly any three partial splits S i=A i| A i∈Σ(X),i=1,2,3,for which precisely one of the four intersections in(4)is empty also satisfies Condition (2).ThusθY may be applied to S1,S2,S3.However,as the example of the set{127|3456,1234|567,235|146}shows,application ofθY to a weakly compatible collection of partial splits does not,in general,yield a weakly compatible collection of partial splits.Also it should be noted thatθM applied to a weakly compatible collection of partial splits does not always yield a weakly compatible collection of partial splits.However,the next result whose proof is straight forward holds.Lemma4.1SupposeΣ,Σ′⊆Σ(X).IfΣ′is weakly compatible andΣ Σ′, thenΣmust also be weakly compatible.4.2Circular collections of partial splitsWe now turn our attention to the study of a special class of weakly compati-ble collections of partial splits called circular collections of partial splits.Tobe able to state their definition,we require some more terminology whichwe introduce next.A cycle C is a connected graph with|V(C)|≥3and every vertex hasdegree2.We call C an X-cycle if the vertex set of C is X.For x i∈X (1≤i≤n:=|X|)and C an X-cycle,we call x1,x2,...,x n,x n+1=x1avertex ordering(of C)if the edge set of C coincides with the set of all2-sets{x i,x i+1}of X,i=1,...,n.For a graph G=(V,E)and some subset E′of E,we denote by G−E′thegraph obtained from G by deleting the edges in E′.We say that a partial X-split A| A is displayed by an X-cycle C if there exist two distinct edges e1and e2in C such that the vertex set of one of the two components of C−{e1,e2}contains A and the other one contains A.More generally,we say that a setΣ⊆Σ(X)of partial splits is displayed by an X-cycle C if every partial split inΣis displayed by C.Finally,we say that a collectionΣ⊆Σ(X) is circular if there exists some X-cycle C such that every partial split inΣis displayed by C.Note that every split collection inΣ(X)displayed by a circular phylogenetic network is circular.As is well-known,every circular split system is in particular weakly com-patible.The next result shows that an analogous result holds for collectionsof partial splits.Lemma4.2SupposeΣ⊆Σ(X).IfΣis circular thenΣis also weakly compatible.Proof:Suppose C is an X-cycle that displaysΣbut there exist three partialsplits S1,S2,S3∈Σsuch that with A i∈S i,i=1,2,3,playing the role of their namesakes in(4)none of the four intersections in(4)is empty.Then S1and S2are incompatible and,since S1and S2are displayed by C,there must exist edges e1,e′1,e2,e′2∈E(C)such that,for all i,j∈{1,2}distinct, the vertex set of one component of C−{e i,e′i}contains A i∪e j and the other contains A i∪e′j.Since S3is displayed by C and neither A1∩A2∩A3nor A1∩ A2∩A3,nor A1∩A2∩ A3is empty,it follows that A1∩ A2∩ A3=∅, which is impossible.Lemma4.3SupposeΣ,Σ′⊆Σ(X).IfΣ′is displayed by an X-cycle C and Σ Σ′,thenΣis also displayed by C.4.3Circularity and the M-and Y-ruleAs was noted earlier,neither the Y-rule nor the M-rule preserve weak com-patibility in general.As the next result shows,the situation is different for the special case of circular collections of partial splits.Proposition4.4SupposeΣ,Σ′∈P(X)and C is an X-cycle.IfΣ′is obtained fromΣby a single application of eitherθY orθM thenΣis displayed by C if and only ifΣ′is displayed by C.Proof:SupposeΣ,Σ′∈P(X)and C is an X-cycle.We start the proof with noting that,regardless of whetherΣ′is obtained from a single application of eitherθY orθM toΣ,Σis displayed by C wheneverΣ′is displayed by C in view of Lemma4.3.Conversely,suppose thatΣis displayed by C.Assumefirst thatΣ′is obtained fromΣby a single application ofθY.Let{S1,S2,S3}⊆Σbe the set to whichθY is applied.With A i∈S i,i=1,2,3,playing the role of their namesakes in the statement of(2),we may assume without loss of generality that none of the three intersections D1=A1∩A2∩A3,D2= A1∩ A2∩A3, and D3= A1∩A2∩ A3is empty but that A1∩ A2∩ A3=∅.It suffices to show that the partial split S=A3∪(A1∩ A2)| A3is displayed by C. Clearly,if A1∩ A2=∅then S=S3and,therefore,S is displayed by C.So assume A1∩ A2=∅.Then since by assumption D i=∅,i=1,2,3, and S1and S2are displayed by C,there must exist four distinct edges e1,e′1,e2,e′2∈E(C)such that,for all i,j∈{1,2}distinct,one component of C−{e i,e′i}contains A i in its vertex set and e j⊆A i and the other contains A i in its vertex set and e′j⊆ A i.Without loss of generality,we may assume that X={x1,...,x n},n≥3,that x1,x2,...,x n is a vertex ordering of C, and that e1={x n,x1}.Furthermore,we may also assume without loss of generality that the component of C−{e1,e′1}that contains x1in its vertex set also contains A1.Since D1=∅=D2,and S3is displayed by C there must exist distinct paths P and P′in C such that either A3⊆V(P)or A3⊆V(P′)(see Figure4).If A3⊆V(P)then∅= A1∩A2∩ A3=D3which is impossible.Thus A3⊆V(P′)must hold.Suppose y,z∈V(C)are such that when starting at x1and traversing C clockwise y is contained in13A 3and the next vertex y ′on C with y ′∈A 3∪ A 3is contained in A 3whereasz ∈ A 3and the next vertex z ′on C with z ′∈A 3∪ A 3is contained in A 3.1∩A 2∩ A 22e ′e A 1∩ A 1∩Figure 4:A schematic representation of the two alternative locations for A 3(cf proof of Proposition 4.4).The closed curve is the X -cycle C ,the four curves with the short dashes represent the four non-empty intersections A 1∩A 2,A 1∩ A 2, A 1∩ A 2and A 1∩A 2(note that each of them can consist of more than one part),the rectangles mark the intersections D 1and D 2,and the dotted and dashed curves represent the two paths P and P ′on C on which A 3can lie.Let P ′′denote the path from z ′to y (taken clockwise).Then e 2ande ′1are edges on P′′and so A 1∩ A 2⊆V (P ′′).The choice of y and z ′implies V (P ′′)∩ A 3=∅and A 3∪(A 1∩ A 2)⊆V (P ′′).Hence,the splitV (P ′′)|X −V (P ′′)which is displayed by C extends the partial split S .Thus C displays S .This concludes the proof in case the applied closure rule applied is θY .To conclude the proof of the proposition suppose Σ′is obtained from Σby a single application of θM .Let {S 1,S 2}⊆Σbe the set to which θM is applied.With A i ∈S i ,i =1,2,we may assume without loss of generalitythat A 1∩A 2=∅and A 1∩ A 2=∅.If θM applies trivially to Σthen Σ=Σ′and so Σ′must be displayed by C .If θM does not apply trivially to Σit suffices to show that C displays (A 1∩A 2)|( A 1∪ A 2).Since S 1and S 2are displayed by C there must exist edges e i ,e ′i ∈E (C )such that the vertex set of one of the two components P i ,P ′i of C −{e i ,e ′i }contains A i and the other contains A i ,i =1,2.Put k :=|{e 1,e ′1}∩{e 2,e ′2}|and note that 0≤k ≤2.Without loss of generality,we may assumeA i ⊆V (P i )and A i ⊆V (P ′i ),i =1,2.Then,∅=A 1∩A 2⊆V (P 1)∩V (P 2).Since V (P 1)∩V (P 2)is the vertex set of one of the 4−k components of C14with the edges e i,e′i,i=1,2removed,it follows that there must exist two distinct edges e3,e4among the edges e1,e′1,e2,e′2so that the vertex set of one of the two components of C−{e3,e4}is V(P1)∩V(P2).SinceX−(V(P1)∩V(P2))=(X−V(P1))∪(X−V(P2))=V(P′1)∪V(P′2) is the vertex set of the other component of C−{e3,e4}and A i⊆V(P′i), i=1,2,it follows that C displays(A1∩A2)|( A1∪ A2).This concludes the proof in caseΣ′is obtained fromΣby a single application ofθM and thus the proof of the proposition.and call n the length ofσ.In addition,we call the last element ofσa split closure ofΣ.Note that in caseΣn=ω,θapplies only trivially toΣn.The following combinations of(P)andθare of interest to us:(a)(P)is the property thatΣis weakly compatible andθis the Y-rule.(b)(P)is unspecified andθis the M-rule.(c)(P)is the property thatΣis weakly compatible andθis the M/Y-combination closure ruleθM/Y which appliesθM orθY toΣ.To elucidate the notion of a split closure sequence and a split closure associated to a set in P(X)we next present an example for the assignments of(P)andθspecified in(a).Consider the set X={1,2,3,4,5}together with the collectionΣcomprising of the partial X-splits S1=12|34,S2= 23|14,S3=15|24,and S4=45|13.Clearly,Σis displayed by an X-cycle C with vertex ordering1,2,3,4,5.ThusΣis circular and so,by Lemma4.2,Σis weakly compatible.NowθY applied to{S1,S2,S3}generates the split S′3=15|234,θY applied to{S1,S2,S4}generates the split S′4=45|123and θY applied to{S2,S′3,S′4}generates the split S′2=145|23.Since every subset ofΣ′={S1,S′2,S′3,S′4}of size three contains two pairwise compatible full splits,θY can only be applied trivially toΣ′.Hence,the sequence S0=Σ,Σ1={S1,S2,S′3,S4},Σ2={S1,S2,S′3,S′4},Σ′is a split closure sequence forΣof length3andΣ′is a split closure forΣ.Regarding(c),it should be noted that even if for someΣ∈P(X)two distinct split closure sequences have the same length and terminate in the same elementΣ′=ωone of them might utilise fewer applications ofθY (and thus more applications ofθM!)than the other.For the previous example,one way to construct two such sequences is to exploit the following relationship between the Y-rule and the M-rule for{S2,S′3,S′4}. Proposition5.1SupposeΣ={S i=A i| A i:i=1,2,3}∈P(X)is such that A1⊆A2and A2− A1⊆ A3⊆ A1∪ A2.If the Y-rule applies toΣthen θY(Σ)−={ A1∪ A2|A1,S2,S3}={S3}∪θM(S1,S2)−.Proof:Assume thatΣand S i and A i,i=1,2,3,are such that the assump-tions of the proposition are satisfied.Then A1∩ A2=∅.Combined with the assumption thatθY applies toΣ,it follows that either(2)is satisfied withA i,i=1,2,3playing the roles of their namesakes in the statement of(2)or(2)is satisfied with A3playing the role of A3and A i playing the role of A i,16。