AI Methods for Edge Computing
AI methods for edge computing are techniques that apply artificial intelligence to edge computing.
Edge computing deploys compute, storage and network resources closer to user devices, where they are used to process data and provide services.
By running AI algorithms on edge nodes, these methods let the nodes process data and make decisions locally, enabling faster, more real-time responses.
They cover two main directions: applying AI algorithms on edge nodes, for example using machine-learning algorithms to classify data or make predictions; and applying edge computing to AI, for example using edge nodes to accelerate the training and inference of deep-learning models.
Several techniques are commonly used in this setting, such as federated learning, model compression and quantization.
Federated learning trains a shared model across the data held by multiple edge nodes, aggregating locally computed updates rather than pooling the raw data, and thereby copes with data being scattered across devices.
Model compression reduces a model's size so that it can run faster on edge nodes.
Quantization compresses model parameters to low bit widths, improving the model's efficiency on edge devices.
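To make the quantization idea concrete, here is a minimal NumPy sketch of symmetric post-training int8 quantization of a weight tensor. The function names, the per-tensor scale rule and the layer shape are illustrative assumptions, not the API of any particular deployment framework.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization (illustrative sketch)."""
    scale = max(np.max(np.abs(weights)), 1e-12) / 127.0   # map max |w| to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for computation or inspection."""
    return q.astype(np.float32) * scale

# Example: a small dense layer's weights shrink from 32-bit to 8-bit storage.
w = np.random.randn(256, 128).astype(np.float32)
q, s = quantize_int8(w)
print(w.nbytes, "->", q.nbytes, "bytes; max abs error",
      np.max(np.abs(w - dequantize(q, s))))
```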
Overall, AI methods for edge computing improve the responsiveness and real-time performance of AI applications, reduce the burden of data transfer and storage, and better protect user privacy.
Beyond the Turing Test: Introspective Question Answering as the Way to Detect True Artificial Intelligence
In 1950, the British mathematician Alan Turing proposed the Turing Test as a way to determine whether a machine can think like a human.
Its basic form is a short conversation between a machine program and a human.
If the program can fool the judge into believing that its answers were given by a human, it passes the test.
Turing's intention was to provide a testing standard for human-like artificial intelligence, and he predicted that a machine would pass the test by the year 2000.
In fact, the first machine to pass did not appear until 2014.
A chatbot developed by a Russian team passed the test.
In the test, the chatbot impersonated a 13-year-old boy from Ukraine and convinced 33% of the judges that this was so.
The approach the team chose is surely not one Turing would have liked: they gave the bot the persona of a 13-year-old Ukrainian boy who is not a native English speaker, so that it could plausibly dodge most questions.
This reflects a fatal flaw in the Turing Test as originally defined: if the machine under test lacks human-like thinking with which to handle the judge's questions, it can adopt various strategies to sidestep or evade them. The excuse for evasion can be its background, as with the non-native-English-speaking child in this example; or the evasion need not be framed as a background setting at all: "I simply don't feel like answering properly."
Such tricks are already widely used in existing chat systems.
Existing language-response technology can pass the test by such opportunistic tricks, but users invariably grow bored within an hour of chatting once they notice the canned patterns in its replies, because it does not produce answers through anything like normal human thought; in fact it has no idea what it is saying.
This is like an exam meant to measure understanding or reasoning that can be passed by rote memorization: the validity of the exam is questionable.
So let us return to Turing's starting point and consider a prior question: what do we need a test of artificial intelligence for? We use a test to assess the level of intelligence of an artificial-intelligence entity.
Just as Turing framed artificial intelligence as human-like intelligence, the test he designed rests on the same assumption: if a machine can think like a human, it can chat like a human, indistinguishably from the real thing.
Here I need to make two points. First, defining artificial intelligence as human-like intelligence is a natural, intuitive idea; we call engineering of this kind thought engineering based on logical biomimicry.
The Princeton Algorithm
The "Princeton algorithm" is a classic algorithm for solving the shortest-path problem, better known as Dijkstra's algorithm.
It is a greedy algorithm that incrementally builds a shortest-path tree: starting from the source node, it repeatedly selects the closest not-yet-finalized node and relaxes the distances from that node to its neighbours.
By repeatedly choosing the next node on a shortest path, it eventually obtains the shortest paths from the source to every node.
The basic steps of the algorithm are as follows:
1. Create a distance list distances holding the shortest known distance from the source to every node, initialized to infinity (meaning no path is known yet).
2. Create a predecessor list predecessors holding each node's predecessor on its path, initialized to None.
3. Set the distance of the source node to 0, i.e. distances[start_node] = 0.
4. Select the unvisited node with the smallest distance as the current node.
5. Update the distances from the current node to its neighbours; if a new distance is smaller than the recorded one, update the distance and the predecessor.
6. Mark the current node as visited.
7. Repeat steps 4-6 until every node has been visited.
8. Build the shortest paths from distances and predecessors.
The algorithm's time complexity is O(V^2), where V is the number of nodes.
It works well for graphs with a moderate number of nodes, but performance can degrade when the number of nodes is very large.
To improve efficiency, there is an optimized variant, the heap-optimized Dijkstra's algorithm, which uses a priority queue to select the node with the smallest distance, reducing the time complexity to O((V+E) log V), where E is the number of edges.
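A compact Python sketch of this heap-based variant is given below. The graph format (a dict mapping each node to a list of (neighbor, weight) pairs) and the function names are assumptions for illustration, and edge weights are assumed non-negative, as Dijkstra's algorithm requires.

```python
import heapq

def dijkstra(graph, start_node):
    """graph: dict {node: [(neighbor, weight), ...]} with non-negative weights."""
    distances = {node: float("inf") for node in graph}
    predecessors = {node: None for node in graph}
    distances[start_node] = 0
    heap = [(0, start_node)]                  # priority queue of (distance, node)
    visited = set()

    while heap:
        dist, node = heapq.heappop(heap)      # closest unvisited node (step 4)
        if node in visited:
            continue                          # stale queue entry, skip it
        visited.add(node)                     # step 6
        for neighbor, weight in graph[node]:  # relax outgoing edges (step 5)
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                predecessors[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    return distances, predecessors

def path_to(predecessors, target):
    """Reconstruct the path to a target node from the predecessor list (step 8)."""
    path = []
    while target is not None:
        path.append(target)
        target = predecessors[target]
    return path[::-1]
```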
Control Technology of Tactical Missile, Vol. 26, No. 3, Sep. 2009
Research and Development Trend of Cooperative Path Planning for Multiple UAVs
HU Zhong-hua, ZHAO Min, SA Peng-fei (College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016)
Abstract: Cooperative path planning is one of the critical technologies of multi-UAV cooperative operation. This paper builds a structural framework for UAV cooperative path planning and reviews its development, analyzes the constraints imposed by the UAV platform itself and by the threat field, and surveys domestic and international work on geometric modelling of UAV paths and on cooperative planning algorithms, with particular attention to genetic algorithms, neural networks and ant colony optimization. Finally, the key open problems and development trends of UAV cooperative path planning are described.
Keywords: UAV; cooperative path planning; ant colony optimization (ACO); genetic algorithm (GA); neural networks
CLC number: O22; Document code: A; Article ID: (2009)03-050-6
Unmanned air vehicles (UAVs) have attracted wide attention in both civilian and military domains because they are lightweight, small, highly maneuverable, hard to detect, highly adaptable, and do not put a pilot's life at risk.
Progressive Simplicial Complexes Jovan Popovi´c Hugues HoppeCarnegie Mellon University Microsoft ResearchABSTRACTIn this paper,we introduce the progressive simplicial complex(PSC) representation,a new format for storing and transmitting triangu-lated geometric models.Like the earlier progressive mesh(PM) representation,it captures a given model as a coarse base model together with a sequence of refinement transformations that pro-gressively recover detail.The PSC representation makes use of a more general refinement transformation,allowing the given model to be an arbitrary triangulation(e.g.any dimension,non-orientable, non-manifold,non-regular),and the base model to always consist of a single vertex.Indeed,the sequence of refinement transforma-tions encodes both the geometry and the topology of the model in a unified multiresolution framework.The PSC representation retains the advantages of PM’s.It defines a continuous sequence of approx-imating models for runtime level-of-detail control,allows smooth transitions between any pair of models in the sequence,supports progressive transmission,and offers a space-efficient representa-tion.Moreover,by allowing changes to topology,the PSC sequence of approximations achieves betterfidelity than the corresponding PM sequence.We develop an optimization algorithm for constructing PSC representations for graphics surface models,and demonstrate the framework on models that are both geometrically and topologically complex.CR Categories:I.3.5[Computer Graphics]:Computational Geometry and Object Modeling-surfaces and object representations.Additional Keywords:model simplification,level-of-detail representa-tions,multiresolution,progressive transmission,geometry compression.1INTRODUCTIONModeling and3D scanning systems commonly give rise to triangle meshes of high complexity.Such meshes are notoriously difficult to render,store,and transmit.One approach to speed up rendering is to replace a complex mesh by a set of level-of-detail(LOD) approximations;a detailed mesh is used when the object is close to the viewer,and coarser approximations are substituted as the object recedes[6,8].These LOD approximations can be precomputed Work performed while at Microsoft Research.Email:jovan@,hhoppe@Web:/jovan/Web:/hoppe/automatically using mesh simplification methods(e.g.[2,10,14,20,21,22,24,27]).For efficient storage and transmission,meshcompression schemes[7,26]have also been developed.The recently introduced progressive mesh(PM)representa-tion[13]provides a unified solution to these problems.In PM form,an arbitrary mesh M is stored as a coarse base mesh M0together witha sequence of n detail records that indicate how to incrementally re-fine M0into M n=M(see Figure7).Each detail record encodes theinformation associated with a vertex split,an elementary transfor-mation that adds one vertex to the mesh.In addition to defininga continuous sequence of approximations M0M n,the PM rep-resentation supports smooth visual transitions(geomorphs),allowsprogressive transmission,and makes an effective mesh compressionscheme.The PM representation has two restrictions,however.First,it canonly represent meshes:triangulations that correspond to orientable12-dimensional manifolds.Triangulated2models that cannot be rep-resented include1-d manifolds(open and closed curves),higherdimensional polyhedra(e.g.triangulated volumes),non-orientablesurfaces(e.g.M¨o bius strips),non-manifolds(e.g.two cubes joinedalong an edge),and non-regular models(i.e.models of mixed di-mensionality).Second,the 
expressiveness of the PM vertex splittransformations constrains all meshes M0M n to have the same topological type.Therefore,when M is topologically complex,the simplified base mesh M0may still have numerous triangles(Fig-ure7).In contrast,a number of existing simplification methods allowtopological changes as the model is simplified(Section6).Ourwork is inspired by vertex unification schemes[21,22],whichmerge vertices of the model based on geometric proximity,therebyallowing genus modification and component merging.In this paper,we introduce the progressive simplicial complex(PSC)representation,a generalization of the PM representation thatpermits topological changes.The key element of our approach isthe introduction of a more general refinement transformation,thegeneralized vertex split,that encodes changes to both the geometryand topology of the model.The PSC representation expresses anarbitrary triangulated model M(e.g.any dimension,non-orientable,non-manifold,non-regular)as the result of successive refinementsapplied to a base model M1that always consists of a single vertex (Figure8).Thus both geometric and topological complexity are recovered progressively.Moreover,the PSC representation retains the advantages of PM’s,including continuous LOD,geomorphs, progressive transmission,and model compression.In addition,we develop an optimization algorithm for construct-ing a PSC representation from a given model,as described in Sec-tion4.1The particular parametrization of vertex splits in[13]assumes that mesh triangles are consistently oriented.2Throughout this paper,we use the words“triangulated”and“triangula-tion”in the general dimension-independent sense.Figure 1:Illustration of a simplicial complex K and some of its subsets.2BACKGROUND2.1Concepts from algebraic topologyTo precisely define both triangulated models and their PSC repre-sentations,we find it useful to introduce some elegant abstractions from algebraic topology (e.g.[15,25]).The geometry of a triangulated model is denoted as a tuple (K V )where the abstract simplicial complex K is a combinatorial structure specifying the adjacency of vertices,edges,triangles,etc.,and V is a set of vertex positions specifying the shape of the model in 3.More precisely,an abstract simplicial complex K consists of a set of vertices 1m together with a set of non-empty subsets of the vertices,called the simplices of K ,such that any set consisting of exactly one vertex is a simplex in K ,and every non-empty subset of a simplex in K is also a simplex in K .A simplex containing exactly d +1vertices has dimension d and is called a d -simplex.As illustrated pictorially in Figure 1,the faces of a simplex s ,denoted s ,is the set of non-empty subsets of s .The star of s ,denoted star(s ),is the set of simplices of which s is a face.The children of a d -simplex s are the (d 1)-simplices of s ,and its parents are the (d +1)-simplices of star(s ).A simplex with exactly one parent is said to be a boundary simplex ,and one with no parents a principal simplex .The dimension of K is the maximum dimension of its simplices;K is said to be regular if all its principal simplices have the same dimension.To form a triangulation from K ,identify its vertices 1m with the standard basis vectors 1m ofm.For each simplex s ,let the open simplex smdenote the interior of the convex hull of its vertices:s =m:jmj =1j=1jjsThe topological realization K is defined as K =K =s K s .The geometric realization of K is the image V (K )where V :m 3is the linear map that sends the j -th 
standard basis vector jm to j 3.Only a restricted set of vertex positions V =1m lead to an embedding of V (K )3,that is,prevent self-intersections.The geometric realization V (K )is often called a simplicial complex or polyhedron ;it is formed by an arbitrary union of points,segments,triangles,tetrahedra,etc.Note that there generally exist many triangulations (K V )for a given polyhedron.(Some of the vertices V may lie in the polyhedron’s interior.)Two sets are said to be homeomorphic (denoted =)if there ex-ists a continuous one-to-one mapping between them.Equivalently,they are said to have the same topological type .The topological realization K is a d-dimensional manifold without boundary if for each vertex j ,star(j )=d .It is a d-dimensional manifold if each star(v )is homeomorphic to either d or d +,where d +=d:10.Two simplices s 1and s 2are d-adjacent if they have a common d -dimensional face.Two d -adjacent (d +1)-simplices s 1and s 2are manifold-adjacent if star(s 1s 2)=d +1.Figure 2:Illustration of the edge collapse transformation and its inverse,the vertex split.Transitive closure of 0-adjacency partitions K into connected com-ponents .Similarly,transitive closure of manifold-adjacency parti-tions K into manifold components .2.2Review of progressive meshesIn the PM representation [13],a mesh with appearance attributes is represented as a tuple M =(K V D S ),where the abstract simpli-cial complex K is restricted to define an orientable 2-dimensional manifold,the vertex positions V =1m determine its ge-ometric realization V (K )in3,D is the set of discrete material attributes d f associated with 2-simplices f K ,and S is the set of scalar attributes s (v f )(e.g.normals,texture coordinates)associated with corners (vertex-face tuples)of K .An initial mesh M =M n is simplified into a coarser base mesh M 0by applying a sequence of n successive edge collapse transforma-tions:(M =M n )ecol n 1ecol 1M 1ecol 0M 0As shown in Figure 2,each ecol unifies the two vertices of an edgea b ,thereby removing one or two triangles.The position of the resulting unified vertex can be arbitrary.Because the edge collapse transformation has an inverse,called the vertex split transformation (Figure 2),the process can be reversed,so that an arbitrary mesh M may be represented as a simple mesh M 0together with a sequence of n vsplit records:M 0vsplit 0M 1vsplit 1vsplit n 1(M n =M )The tuple (M 0vsplit 0vsplit n 1)forms a progressive mesh (PM)representation of M .The PM representation thus captures a continuous sequence of approximations M 0M n that can be quickly traversed for interac-tive level-of-detail control.Moreover,there exists a correspondence between the vertices of any two meshes M c and M f (0c f n )within this sequence,allowing for the construction of smooth vi-sual transitions (geomorphs)between them.A sequence of such geomorphs can be precomputed for smooth runtime LOD.In addi-tion,PM’s support progressive transmission,since the base mesh M 0can be quickly transmitted first,followed the vsplit sequence.Finally,the vsplit records can be encoded concisely,making the PM representation an effective scheme for mesh compression.Topological constraints Because the definitions of ecol and vsplit are such that they preserve the topological type of the mesh (i.e.all K i are homeomorphic),there is a constraint on the min-imum complexity that K 0may achieve.For instance,it is known that the minimal number of vertices for a closed genus g mesh (ori-entable 2-manifold)is (7+(48g +1)12)2if g =2(10if g 
=2)[16].Also,the presence of boundary components may further constrain the complexity of K 0.Most importantly,K may consist of a number of components,and each is required to appear in the base mesh.For example,the meshes in Figure 7each have 117components.As evident from the figure,the geometry of PM meshes may deteriorate severely as they approach topological lower bound.M 1;100;(1)M 10;511;(7)M 50;4656;(12)M 200;1552277;(28)M 500;3968690;(58)M 2000;14253219;(108)M 5000;029010;(176)M n =34794;0068776;(207)Figure 3:Example of a PSC representation.The image captions indicate the number of principal 012-simplices respectively and the number of connected components (in parenthesis).3PSC REPRESENTATION 3.1Triangulated modelsThe first step towards generalizing PM’s is to let the PSC repre-sentation encode more general triangulated models,instead of just meshes.We denote a triangulated model as a tuple M =(K V D A ).The abstract simplicial complex K is not restricted to 2-manifolds,but may in fact be arbitrary.To represent K in memory,we encode the incidence graph of the simplices using the following linked structures (in C++notation):struct Simplex int dim;//0=vertex,1=edge,2=triangle,...int id;Simplex*children[MAXDIM+1];//[0..dim]List<Simplex*>parents;;To render the model,we draw only the principal simplices ofK ,denoted (K )(i.e.vertices not adjacent to edges,edges not adjacent to triangles,etc.).The discrete attributes D associate amaterial identifier d s with each simplex s(K ).For the sake of simplicity,we avoid explicitly storing surface normals at “corners”(using a set S )as done in [13].Instead we let the material identifier d s contain a smoothing group field [28],and let a normal discontinuity (crease )form between any pair of adjacent triangles with different smoothing groups.Previous vertex unification schemes [21,22]render principal simplices of dimension 0and 1(denoted 01(K ))as points and lines respectively with fixed,device-dependent screen widths.To better approximate the model,we instead define a set A that associates an area a s A with each simplex s 01(K ).We think of a 0-simplex s 00(K )as approximating a sphere with area a s 0,and a 1-simplex s 1=j k 1(K )as approximating a cylinder (with axis (j k ))of area a s 1.To render a simplex s 01(K ),we determine the radius r model of the corresponding sphere or cylinder in modeling space,and project the length r model to obtain the radius r screen in screen pixels.Depending on r screen ,we render the simplex as a polygonal sphere or cylinder with radius r model ,a 2D point or line with thickness 2r screen ,or do not render it at all.This choice based on r screen can be adjusted to mitigate the overhead of introducing polygonal representations of spheres and cylinders.As an example,Figure 3shows an initial model M of 68,776triangles.One of its approximations M 500is a triangulated model with 3968690principal 012-simplices respectively.3.2Level-of-detail sequenceAs in progressive meshes,from a given triangulated model M =M n ,we define a sequence of approximations M i :M 1op 1M 2op 2M n1op n 1M nHere each model M i has exactly i vertices.The simplification op-erator M ivunify iM i +1is the vertex unification transformation,whichmerges two vertices (Section 3.3),and its inverse M igvspl iM i +1is the generalized vertex split transformation (Section 3.4).Thetuple (M 1gvspl 1gvspl n 1)forms a progressive simplicial complex (PSC)representation of M .To construct a PSC representation,we first determine a sequence of vunify 
transformations simplifying M down to a single vertex,as described in Section 4.After reversing these transformations,we renumber the simplices in the order that they are created,so thateach gvspl i (a i)splits the vertex a i K i into two vertices a i i +1K i +1.As vertices may have different positions in the different models,we denote the position of j in M i as i j .To better approximate a surface model M at lower complexity levels,we initially associate with each (principal)2-simplex s an area a s equal to its triangle area in M .Then,as the model is simplified,wekeep constant the sum of areas a s associated with principal simplices within each manifold component.When2-simplices are eventually reduced to principal1-simplices and0-simplices,their associated areas will provide good estimates of the original component areas.3.3Vertex unification transformationThe transformation vunify(a i b i midp i):M i M i+1takes an arbitrary pair of vertices a i b i K i+1(simplex a i b i need not be present in K i+1)and merges them into a single vertex a i K i. Model M i is created from M i+1by updating each member of the tuple(K V D A)as follows:K:References to b i in all simplices of K are replaced by refer-ences to a i.More precisely,each simplex s in star(b i)K i+1is replaced by simplex(s b i)a i,which we call the ancestor simplex of s.If this ancestor simplex already exists,s is deleted.V:Vertex b is deleted.For simplicity,the position of the re-maining(unified)vertex is set to either the midpoint or is left unchanged.That is,i a=(i+1a+i+1b)2if the boolean parameter midp i is true,or i a=i+1a otherwise.D:Materials are carried through as expected.So,if after the vertex unification an ancestor simplex(s b i)a i K i is a new principal simplex,it receives its material from s K i+1if s is a principal simplex,or else from the single parent s a i K i+1 of s.A:To maintain the initial areas of manifold components,the areasa s of deleted principal simplices are redistributed to manifold-adjacent neighbors.More concretely,the area of each princi-pal d-simplex s deleted during the K update is distributed toa manifold-adjacent d-simplex not in star(a ib i).If no suchneighbor exists and the ancestor of s is a principal simplex,the area a s is distributed to that ancestor simplex.Otherwise,the manifold component(star(a i b i))of s is being squashed be-tween two other manifold components,and a s is discarded. 3.4Generalized vertex split transformation Constructing the PSC representation involves recording the infor-mation necessary to perform the inverse of each vunify i.This inverse is the generalized vertex split gvspl i,which splits a0-simplex a i to introduce an additional0-simplex b i.(As mentioned previously, renumbering of simplices implies b i i+1,so index b i need not be stored explicitly.)Each gvspl i record has the formgvspl i(a i C K i midp i()i C D i C A i)and constructs model M i+1from M i by updating the tuple (K V D A)as follows:K:As illustrated in Figure4,any simplex adjacent to a i in K i can be the vunify result of one of four configurations in K i+1.To construct K i+1,we therefore replace each ancestor simplex s star(a i)in K i by either(1)s,(2)(s a i)i+1,(3)s and(s a i)i+1,or(4)s,(s a i)i+1and s i+1.The choice is determined by a split code associated with s.Thesesplit codes are stored as a code string C Ki ,in which the simplicesstar(a i)are sortedfirst in order of increasing dimension,and then in order of increasing simplex id,as shown in Figure5. 
V:The new vertex is assigned position i+1i+1=i ai+()i.Theother vertex is given position i+1ai =i ai()i if the boolean pa-rameter midp i is true;otherwise its position remains unchanged.D:The string C Di is used to assign materials d s for each newprincipal simplex.Simplices in C Di ,as well as in C Aibelow,are sorted by simplex dimension and simplex id as in C Ki. A:During reconstruction,we are only interested in the areas a s fors01(K).The string C Ai tracks changes in these areas.Figure4:Effects of split codes on simplices of various dimensions.code string:41422312{}Figure5:Example of split code encoding.3.5PropertiesLevels of detail A graphics application can efficiently transitionbetween models M1M n at runtime by performing a sequence ofvunify or gvspl transformations.Our current research prototype wasnot designed for efficiency;it attains simplification rates of about6000vunify/sec and refinement rates of about5000gvspl/sec.Weexpect that a careful redesign using more efficient data structureswould significantly improve these rates.Geomorphs As in the PM representation,there exists a corre-spondence between the vertices of the models M1M n.Given acoarser model M c and afiner model M f,1c f n,each vertexj K f corresponds to a unique ancestor vertex f c(j)K cfound by recursively traversing the ancestor simplex relations:f c(j)=j j cf c(a j1)j cThis correspondence allows the creation of a smooth visual transi-tion(geomorph)M G()such that M G(1)equals M f and M G(0)looksidentical to M c.The geomorph is defined as the modelM G()=(K f V G()D f A G())in which each vertex position is interpolated between its originalposition in V f and the position of its ancestor in V c:Gj()=()fj+(1)c f c(j)However,we must account for the special rendering of principalsimplices of dimension0and1(Section3.1).For each simplexs01(K f),we interpolate its area usinga G s()=()a f s+(1)a c swhere a c s=0if s01(K c).In addition,we render each simplexs01(K c)01(K f)using area a G s()=(1)a c s.The resultinggeomorph is visually smooth even as principal simplices are intro-duced,removed,or change dimension.The accompanying video demonstrates a sequence of such geomorphs.Progressive transmission As with PM’s,the PSC representa-tion can be progressively transmitted by first sending M 1,followed by the gvspl records.Unlike the base mesh of the PM,M 1always consists of a single vertex,and can therefore be sent in a fixed-size record.The rendering of lower-dimensional simplices as spheres and cylinders helps to quickly convey the overall shape of the model in the early stages of transmission.Model compression Although PSC gvspl are more general than PM vsplit transformations,they offer a surprisingly concise representation of M .Table 1lists the average number of bits re-quired to encode each field of the gvspl records.Using arithmetic coding [30],the vertex id field a i requires log 2i bits,and the boolean parameter midp i requires 0.6–0.9bits for our models.The ()i delta vector is quantized to 16bitsper coordinate (48bits per),and stored as a variable-length field [7,13],requiring about 31bits on average.At first glance,each split code in the code string C K i seems to have 4possible outcomes (except for the split code for 0-simplex a i which has only 2possible outcomes).However,there exist constraints between these split codes.For example,in Figure 5,the code 1for 1-simplex id 1implies that 2-simplex id 1also has code 1.This in turn implies that 1-simplex id 2cannot have code 2.Similarly,code 2for 1-simplex id 3implies a 
code 2for 2-simplex id 2,which in turn implies that 1-simplex id 4cannot have code 1.These constraints,illustrated in the “scoreboard”of Figure 6,can be summarized using the following two rules:(1)If a simplex has split code c12,all of its parents havesplit code c .(2)If a simplex has split code 3,none of its parents have splitcode 4.As we encode split codes in C K i left to right,we apply these two rules (and their contrapositives)transitively to constrain the possible outcomes for split codes yet to be ing arithmetic coding with uniform outcome probabilities,these constraints reduce the code string length in Figure 6from 15bits to 102bits.In our models,the constraints reduce the code string from 30bits to 14bits on average.The code string is further reduced using a non-uniform probability model.We create an array T [0dim ][015]of encoding tables,indexed by simplex dimension (0..dim)and by the set of possible (constrained)split codes (a 4-bit mask).For each simplex s ,we encode its split code c using the probability distribution found in T [s dim ][s codes mask ].For 2-dimensional models,only 10of the 48tables are non-trivial,and each table contains at most 4probabilities,so the total size of the probability model is small.These encoding tables reduce the code strings to approximately 8bits as shown in Table 1.By comparison,the PM representation requires approximately 5bits for the same information,but of course it disallows topological changes.To provide more intuition for the efficiency of the PSC repre-sentation,we note that capturing the connectivity of an average 2-manifold simplicial complex (n vertices,3n edges,and 2n trian-gles)requires ni =1(log 2i +8)n (log 2n +7)bits with PSC encoding,versus n (12log 2n +95)bits with a traditional one-way incidence graph representation.For improved compression,it would be best to use a hybrid PM +PSC representation,in which the more concise PM vertex split encoding is used when the local neighborhood is an orientableFigure 6:Constraints on the split codes for the simplices in the example of Figure 5.Table 1:Compression results and construction times.Object#verts Space required (bits/n )Trad.Con.n K V D Arepr.time a i C K i midp i (v )i C D i C Ai bits/n hrs.drumset 34,79412.28.20.928.1 4.10.453.9146.1 4.3destroyer 83,79913.38.30.723.1 2.10.347.8154.114.1chandelier 36,62712.47.60.828.6 3.40.853.6143.6 3.6schooner 119,73413.48.60.727.2 2.5 1.353.7148.722.2sandal 4,6289.28.00.733.4 1.50.052.8123.20.4castle 15,08211.0 1.20.630.70.0-43.5-0.5cessna 6,7959.67.60.632.2 2.50.152.6132.10.5harley 28,84711.97.90.930.5 1.40.453.0135.7 3.52-dimensional manifold (this occurs on average 93%of the time in our examples).To compress C D i ,we predict the material for each new principalsimplex sstar(a i )star(b i )K i +1by constructing an ordered set D s of materials found in star(a i )K i .To improve the coding model,the first materials in D s are those of principal simplices in star(s )K i where s is the ancestor of s ;the remainingmaterials in star(a i )K i are appended to D s .The entry in C D i associated with s is the index of its material in D s ,encoded arithmetically.If the material of s is not present in D s ,it is specified explicitly as a global index in D .We encode C A i by specifying the area a s for each new principalsimplex s 01(star(a i )star(b i ))K i +1.To account for this redistribution of area,we identify the principal simplex from which s receives its area by specifying its index in 01(star(a i ))K i .The column labeled in Table 1sums the 
bits of each field of the gvspl records.Multiplying by the number n of vertices in M gives the total number of bits for the PSC representation of the model (e.g.500KB for the destroyer).By way of compari-son,the next column shows the number of bits per vertex required in a traditional “IndexedFaceSet”representation,with quantization of 16bits per coordinate and arithmetic coding of face materials (3n 16+2n 3log 2n +materials).4PSC CONSTRUCTIONIn this section,we describe a scheme for iteratively choosing pairs of vertices to unify,in order to construct a PSC representation.Our algorithm,a generalization of [13],is time-intensive,seeking high quality approximations.It should be emphasized that many quality metrics are possible.For instance,the quadric error metric recently introduced by Garland and Heckbert [9]provides a different trade-off of execution speed and visual quality.As in [13,20],we first compute a cost E for each candidate vunify transformation,and enter the candidates into a priority queueordered by ascending cost.Then,in each iteration i =n 11,we perform the vunify at the front of the queue and update the costs of affected candidates.4.1Forming set of candidate vertex pairs In principle,we could enter all possible pairs of vertices from M into the priority queue,but this would be prohibitively expensive since simplification would then require at least O(n2log n)time.Instead, we would like to consider only a smaller set of candidate vertex pairs.Naturally,should include the1-simplices of K.Additional pairs should also be included in to allow distinct connected com-ponents of M to merge and to facilitate topological changes.We considered several schemes for forming these additional pairs,in-cluding binning,octrees,and k-closest neighbor graphs,but opted for the Delaunay triangulation because of its adaptability on models containing components at different scales.We compute the Delaunay triangulation of the vertices of M, represented as a3-dimensional simplicial complex K DT.We define the initial set to contain both the1-simplices of K and the subset of1-simplices of K DT that connect vertices in different connected components of K.During the simplification process,we apply each vertex unification performed on M to as well in order to keep consistent the set of candidate pairs.For models in3,star(a i)has constant size in the average case,and the overall simplification algorithm requires O(n log n) time.(In the worst case,it could require O(n2log n)time.)4.2Selecting vertex unifications fromFor each candidate vertex pair(a b),the associated vunify(a b):M i M i+1is assigned the costE=E dist+E disc+E area+E foldAs in[13],thefirst term is E dist=E dist(M i)E dist(M i+1),where E dist(M)measures the geometric accuracy of the approximate model M.Conceptually,E dist(M)approximates the continuous integralMd2(M)where d(M)is the Euclidean distance of the point to the closest point on M.We discretize this integral by defining E dist(M)as the sum of squared distances to M from a dense set of points X sampled from the original model M.We sample X from the set of principal simplices in K—a strategy that generalizes to arbitrary triangulated models.In[13],E disc(M)measures the geometric accuracy of disconti-nuity curves formed by a set of sharp edges in the mesh.For the PSC representation,we generalize the concept of sharp edges to that of sharp simplices in K—a simplex is sharp either if it is a boundary simplex or if two of its parents are principal simplices with different material 
identifiers.The energy E disc is defined as the sum of squared distances from a set X disc of points sampled from sharp simplices to the discontinuity components from which they were sampled.Minimization of E disc therefore preserves the geom-etry of material boundaries,normal discontinuities(creases),and triangulation boundaries(including boundary curves of a surface and endpoints of a curve).We have found it useful to introduce a term E area that penalizes surface stretching(a more sophisticated version of the regularizing E spring term of[13]).Let A i+1N be the sum of triangle areas in the neighborhood star(a i)star(b i)K i+1,and A i N the sum of triangle areas in star(a i)K i.The mean squared displacement over the neighborhood N due to the change in area can be approx-imated as disp2=12(A i+1NA iN)2.We let E area=X N disp2,where X N is the number of points X projecting in the neighborhood. To prevent model self-intersections,the last term E fold penalizes surface folding.We compute the rotation of each oriented triangle in the neighborhood due to the vertex unification(as in[10,20]).If any rotation exceeds a threshold angle value,we set E fold to a large constant.Unlike[13],we do not optimize over the vertex position i a, but simply evaluate E for i a i+1a i+1b(i+1a+i+1b)2and choose the best one.This speeds up the optimization,improves model compression,and allows us to introduce non-quadratic energy terms like E area.5RESULTSTable1gives quantitative results for the examples in thefigures and in the video.Simplification times for our prototype are measured on an SGI Indigo2Extreme(150MHz R4400).Although these times may appear prohibitive,PSC construction is an off-line task that only needs to be performed once per model.Figure9highlights some of the benefits of the PSC representa-tion.The pearls in the chandelier model are initially disconnected tetrahedra;these tetrahedra merge and collapse into1-d curves in lower-complexity approximations.Similarly,the numerous polyg-onal ropes in the schooner model are simplified into curves which can be rendered as line segments.The straps of the sandal model initially have some thickness;the top and bottom sides of these straps merge in the simplification.Also note the disappearance of the holes on the sandal straps.The castle example demonstrates that the original model need not be a mesh;here M is a1-dimensional non-manifold obtained by extracting edges from an image.6RELATED WORKThere are numerous schemes for representing and simplifying tri-angulations in computer graphics.A common special case is that of subdivided2-manifolds(meshes).Garland and Heckbert[12] provide a recent survey of mesh simplification techniques.Several methods simplify a given model through a sequence of edge col-lapse transformations[10,13,14,20].With the exception of[20], these methods constrain edge collapses to preserve the topological type of the model(e.g.disallow the collapse of a tetrahedron into a triangle).Our work is closely related to several schemes that generalize the notion of edge collapse to that of vertex unification,whereby separate connected components of the model are allowed to merge and triangles may be collapsed into lower dimensional simplices. Rossignac and Borrel[21]overlay a uniform cubical lattice on the object,and merge together vertices that lie in the same cubes. 
Schaufler and St¨u rzlinger[22]develop a similar scheme in which vertices are merged using a hierarchical clustering algorithm.Lue-bke[18]introduces a scheme for locally adapting the complexity of a scene at runtime using a clustering octree.In these schemes, the approximating models correspond to simplicial complexes that would result from a set of vunify transformations(Section3.3).Our approach differs in that we order the vunify in a carefully optimized sequence.More importantly,we define not only a simplification process,but also a new representation for the model using an en-coding of gvspl=vunify1transformations.Recent,independent work by Schmalstieg and Schaufler[23]de-velops a similar strategy of encoding a model using a sequence of vertex split transformations.Their scheme differs in that it tracks only triangles,and therefore requires regular,2-dimensional trian-gulations.Hence,it does not allow lower-dimensional simplices in the model approximations,and does not generalize to higher dimensions.Some simplification schemes make use of an intermediate vol-umetric representation to allow topological changes to the model. He et al.[11]convert a mesh into a binary inside/outside function discretized on a three-dimensional grid,low-passfilter this function,。
Navigating the Transition to Remote Learning The COVID-19 pandemic has forced an unprecedented shift in the education system, with schools and universities around the world transitioning to remote learning. The suddenness of the shift has created numerous challenges for both students and educators, and navigating this new terrain has been a daunting task for many. In this response, I will explore the challenges of transitioning to remote learning from multiple perspectives, including those of students, educators, and administrators.From the perspective of students, the transition to remote learning has been particularly challenging. Many students have struggled to adjust to the lack of face-to-face interaction with their teachers and classmates, and the absence of a physical classroom environment. This has made it difficult for them to stay motivated and engaged in their studies, and has led to a decline in academic performance for some. Additionally, many students have had to deal with the added stress of balancing their schoolwork with other responsibilities, such as caring for family members or working part-time jobs.From the perspective of educators, the transition to remote learning has presented a whole new set of challenges. Many teachers have had to quickly learn new technologies and teaching strategies in order to effectively deliver their lessons online. They have also had to find ways to engage their students and maintain a sense of community in a virtual classroom setting. This has required a significant amount of time and effort, and has often resulted in increased workload and stress for educators.Finally, from the perspective of administrators, the transition to remote learning has brought about a range of logistical and financial challenges. Schools and universities have had to invest in new technologies and infrastructure to support remote learning, and many have had to make difficult decisions about how to allocate limited resources. Additionally, administrators have had to find ways to ensure that students have access to the resources and support they need to succeed in a remote learning environment.Despite these challenges, there have also been some positive outcomes of the transition to remote learning. For example, many educators have found that the use oftechnology has enabled them to deliver their lessons in new and innovative ways, and has allowed them to reach a wider audience. Additionally, some students have found that remote learning has provided them with greater flexibility and autonomy in their studies, allowing them to better balance their academic and personal responsibilities.In conclusion, the transition to remote learning has been a challenging process for students, educators, and administrators alike. However, it has also presented opportunities for growth and innovation in the education system. As we continue to navigate this new terrain, it is important that we work together to address the challenges and capitalize on the opportunities presented by remote learning. By doing so, we can ensure that students receive the education they need to succeed in a rapidly changing world.。
A sparse operator (Sparse Operator) is an operator that acts only on a subset of elements, for example multiplication by a sparse matrix.
During compilation, handling sparse operators usually comes down to how to store and compute on sparse matrices efficiently, and how to optimize the performance of the sparse operator's computation.
Some common ways compilers handle sparse operators are listed below:
1. Compressed storage: sparse matrices can be stored in compressed formats to reduce memory use, for example the coordinate (triplet) format or row-major compressed formats such as CSR (a sketch is given at the end of this section).
2. Operator-level optimization: optimizing the sparse operator itself can significantly improve performance; for suitably structured operators, transform-based algorithms such as the fast Fourier transform (FFT) can accelerate the computation.
3. Code-generation optimization: the compiler can generate code specialized to the characteristics of the sparse operator, for example using vectorized instructions or parallel execution to speed up the computation.
4. Memory optimization: for large sparse matrices, memory usage is itself a major concern; techniques such as cache-aware layouts and memory alignment can improve memory efficiency.
5. Parallel computation: large sparse-matrix operations can be accelerated with parallel computing, for example by partitioning the sparse matrix into sub-matrices and processing them with multiple threads or distributed computation.
In short, handling sparse operators in a compiler requires jointly considering storage, computation and memory, and applying a range of optimization techniques to improve performance and memory efficiency.
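As a concrete illustration of compressed storage (point 1) and of computing directly on the compressed form, here is a minimal sketch of the CSR (compressed sparse row) layout and a sparse matrix-vector product. The layout follows the standard CSR convention rather than any particular compiler's internal representation, and the helper names are invented for this example.

```python
import numpy as np

def dense_to_csr(a: np.ndarray):
    """Convert a dense matrix to CSR arrays (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x, computed only over the stored non-zeros."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

a = np.array([[0., 2., 0.], [1., 0., 0.], [0., 0., 3.]])
vals, cols, ptrs = dense_to_csr(a)
print(csr_matvec(vals, cols, ptrs, np.array([1., 1., 1.])))   # -> [2. 1. 3.]
```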
Based on the topic you have provided, we will write an article of both depth and breadth on many-objective evolutionary algorithms based on curvature estimation of the Pareto front.
In the article we will move from the simple to the complex, covering the Pareto front, curvature estimation, and many-objective evolutionary algorithms, to help you understand the topic fully.
Let us first look at the concept of the Pareto front.
Pareto-optimal solutions are a central concept in multi-objective optimization: they are the set of solutions that achieve the best attainable trade-offs across several objectives.
Among Pareto-optimal solutions, no solution can improve all objectives at once; trade-offs are usually required.
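To make the Pareto-optimality notion concrete, the following sketch filters a set of candidate solutions down to its non-dominated front, assuming every objective is to be minimized and objective vectors are stored as NumPy rows. It is a generic helper for illustration, not part of any specific algorithm discussed here.

```python
import numpy as np

def dominates(a, b):
    """True if a dominates b: no worse in every objective, strictly better in one."""
    return np.all(a <= b) and np.any(a < b)

def pareto_front(objectives):
    """Indices of non-dominated rows in an (n_solutions, n_objectives) array."""
    front = []
    for i, fi in enumerate(objectives):
        if not any(dominates(fj, fi) for j, fj in enumerate(objectives) if j != i):
            front.append(i)
    return front

pts = np.array([[1.0, 4.0], [2.0, 3.0], [3.0, 3.5], [4.0, 1.0]])
print(pareto_front(pts))   # -> [0, 1, 3]; [3.0, 3.5] is dominated by [2.0, 3.0]
```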
Next, we discuss the role of curvature estimation in multi-objective optimization.
Curvature estimation is a method for estimating the curvature of the Pareto front; it helps an algorithm better understand the shape of the front and thus search for optimal solutions more effectively.
We will then analyze the principles and applications of many-objective evolutionary algorithms in detail.
A many-objective evolutionary algorithm is an evolutionary algorithm designed for optimization problems with many objectives; by estimating the curvature of the Pareto front, it can locate the solution set of such a problem more accurately.
We will discuss the strengths and limitations of these algorithms in depth, to give you a complete picture of their characteristics.
In the closing part of the article we will summarize and review many-objective evolutionary algorithms based on Pareto-front curvature estimation, so that you can understand the topic comprehensively, deeply and flexibly.
We will also share personal views and interpretations, examining the topic from several angles.
In this way we will write a knowledge-style article with both depth and breadth, helping you better understand many-objective evolutionary algorithms based on Pareto-front curvature estimation.
If needed, we can further discuss the article's specific content and structure, to make sure the final piece meets your requirements.
We look forward to exploring this topic with you and producing a worthwhile article.
Many-objective evolutionary algorithms based on Pareto-front curvature estimation have broad application prospects in practice.
By estimating the curvature of the Pareto front, multi-objective optimization problems can be optimized more effectively and more comprehensive solution sets found.
In this article we will examine the concept of the Pareto front, curvature-estimation methods, and the principles and applications of many-objective evolutionary algorithms, to help readers better understand this important topic.
Let us look a little further at the concept of the Pareto front.
Pareto-optimal solutions are the core concept of multi-objective optimization: they represent the set of solutions that are optimal trade-offs among multiple objectives.
Package‘reservoirnet’April4,2023Type PackageTitle Reservoir Computing and Echo State NetworksVersion0.2.0Date2023-03-13SystemRequirements Python(>=3.7)Description A simple user-friendly library based on the'python'module'reservoirpy'.It provides aflexible interface to implement efficient ReservoirComputing(RC)architectures with a particular focus on Echo State Networks(ESN).Some of its features are:offline and online training,parallel implementation,sparse matrix computation,fast spectral initialization,advanced learningrules(e.g.Intrinsic Plasticity)etc.It also makes possible to easily createcomplex architectures with multiple reservoirs(e.g.deep reservoirs),readouts,and complex feedback loops.Moreover,graphical tools are included to easilyexplore hyperparameters.Finally,it includes several tutorials exploringtime series forecasting,classification and hyperparameter tuning.For more informationabout'reservoirpy',please see Trouvain et al.(2020)<doi:10.1007/978-3-030-61616-8_40>.This package was developed in the framework of the University of Bordeaux’s IdEx``Investments for the Future''program/RRI PHDS.Config/reticulate list(packages=list(list(package=``reservoirpy'',pip=TRUE)))License GPL(>=3)Repository CRANURL https:///reservoirpyDepends R(>=3.6)RoxygenNote7.2.3Encoding UTF-8Imports reticulate,testthat(>=3.0.0),rlang,ggplot2,ggpubr,janitor,dplyr,magrittr,methodsSuggests rmarkdown,knitr,covr,kableExtra,slider,tibble,tidyrConfig/testthat/edition3VignetteBuilder knitr1Language en-USNeedsCompilation noAuthor Thomas Ferte[aut,cre,trl],Kalidou Ba[aut,trl],Nathan Trouvain[aut],Rodolphe Thiebaut[aut],Xavier Hinaut[aut],Boris Hejblum[aut,trl]Maintainer Thomas Ferte<**************************>Date/Publication2023-04-0411:40:02UTCR topics documented:createNode (2)dfCovid (4)generate_data (5)install_reservoirpy (6)link (7)plot.reservoir_predict_seq (8)plot_2x2_perf (9)plot_marginal_perf (10)plot_perf_22 (10)predict_seq (11)print.summary.reservoirR_fit (12)random_search_hyperparam (13)reservoirR_fit (14)rloguniform (15)summary.reservoirR_fit (15)summary.reservoir_predict_seq (16)%>>% (17)Index18 createNode Function to create some nodeDescriptionFunction to create some nodeUsagecreateNode(nodeType=c("Ridge"),units=NULL,lr=1,sr=NULL,otputDim=NULL,inputDim=NULL,name=NULL,ridge=0,inputBias=TRUE,input_scaling=TRUE,input_connectivity=0.1,rc_connectivity=0.1,activation="tanh",dtype="float64",seed=NULL,...)ArgumentsnodeType Type of node.Default is"Ridge".units(int)optional Number of reservoir units.If None,the number of units will be infered from the W matrix shape.lr(float)default to1.0Neurons leak rate.Must be in:math:[0,1].sr(float)optional Spectral radius of recurrent weight matrix.otputDim Output dimension of the Node.Dimension of its state.inputDim Input dimension of the Node.name Name of the Node.It must be a unique identifier.ridgefloat,default to0.0.L2regularization parameter.inputBias bool,default to TRUE.If TRUE,then a bias parameter will be learned along with output weights.input_scalingfloat or array-like of shapes(features),default to1.0.Input gain.An array of the same dimension as the inputs can be used to set up different input scaling foreach feature.input_connectivityfloat,default to0.1.Connectivity of input neurons,i.e.ratio of input neuronsconnected to reservoir neurons.Must be between0and1.rc_connectivityfloat,default to0.1.Connectivity of recurrent weight matrix,i.e.ratio of reser-voir neurons connected to other reservoir neurons,including themselves.Mustbe between0and1.activation 
str’tanh’.Reservoir units activation function.Should be a activationsfunc func-tion name(’tanh’,’identity’,’sigmoid’,’relu’,’softmax’,’softplus’).4dfCoviddtype Numerical type for node parametersseed set random seed...Others paramsValueA node generated by reservoirpy python module.Examplesif(interactive()){readout<-reservoirnet::createNode("Ridge")}dfCovid Datagouv covid-19datasetDescriptionA dataset containing the data from datagouv.fr concerning covid-19infections in Aquitaine.Datarelated to hospitalizations can be found at Santépublique France-Data downloaded at https://www.data.gouv.fr/fr/datasets/r/0 6780-452d-9b8c-ae244ad529b3,update from26/01/2023.Data related to RT-PCR can be found atSantépublique France-Data downloaded at https://www.data.gouv.fr/fr/datasets/r/10639654-3864-48ac-b024-d772c218c4c1,update from26/01/2023.Usagedata(dfCovid)FormatA data frame with962rows and4variablesDetails•date.The date•hosp.Number of person hospitalized with SARS-CoV-2in Aquitaine.•Positive.Number of person with a positive RT-PCR in Aquitaine.•Tested.Number of person with a RT-PCR in Aquitaine.generate_data5 generate_data Load data from the Japanese vowels or the Mackey-GlassDescriptionMackey-Glass time series[8]_[9]_,computed from the Mackey-Glass delayed differential equa-tion:Usagegenerate_data(dataset=c("japanese_vowels","mackey_glass","both"),one_hot_encode=TRUE,repeat_targets=FALSE,reload=FALSE,n_timesteps,tau=17,a=0.2,b=0.1,n=10,x0=1.2,h=1)Argumentsdataset(String)take value in array[japanese_vowels,mackey_glass]one_hot_encode(bool),default to True.If True,returns class label as a one-hot encoded vector.repeat_targets(bool),default to False.If True,repeat the target label or vector along the time axis of the corresponding sample.reload(bool),default to False If True,re-download data from remote repository.Else, if a cached version of the dataset exists,use the cached dataset.n_timesteps(int)Number of time steps to compute.tau(int),default to17Time delay:math:‘\tau‘of Mackey-Glass equation.By de-faults,equals to17.Other values can change the choatic behaviour of the time-series.a(float)default to0.2:math:‘a‘parameter of the equation.b(float)default to0.1:math:‘b‘parameter of the equation.n(int)default to10:math:‘n‘parameter of the equation.x0(float),optional,default to1.2Initial condition of the timeseries.h(float),default to1.0Time delta between two discrete timesteps.Valuearray of shape(n_timesteps,1)Mackey-Glass timeseries.6install_reservoirpy Examplesif(interactive()){japanese_vowels<-generate_data(dataset="japanese_vowels")timeSerie<-generate_data(dataset="mackey_glass",n_timesteps=2500)res=generate_data(dataset<-"both",n_timesteps=2500)}install_reservoirpy Install reservoirpyDescriptionInstall reservoirpyUsageinstall_reservoirpy(envname="r-reticulate",method="auto")Argumentsenvname str name of environment.Default is R-reticulatemethod str type of environment type(virtualenv,conda).Default is auto(virtualenv is not available on Windows)ValueA NULL object after installing reservoirpy python module.Examples##Not run:reservoirnet::install_reservoirpy()##End(Not run)link7link Link two:py:class:~.Node instances to form a:py:class:~.Model in-stance.node1output will be used as input for node2in the createdmodel.This is similar to a function composition operation:DescriptionLink two:py:class:~.Node instances to form a:py:class:~.Model instance.node1output will be used as input for node2in the created model.This is similar to a function composition 
operation:Usagelink(node1,node2,name=NULL)Argumentsnode1(Node)or(list_of_Node)Nodes or lists of nodes to link.node2(Node)or(list_of_Node)Nodes or lists of nodes to link.name(str)optional Name for the chaining Model.DetailsCan update the state of the node several timesValueA reservoir model linking node1and node2.Examplesif(reticulate::py_module_available("reservoirpy")){reservoir<-reservoirnet::createNode(nodeType="Reservoir",seed=1,units=100,lr=0.7,sr=1,input_scaling=1)readout<-reservoirnet::createNode(nodeType="Ridge",ridge=0.1)model<-reservoirnet::link(reservoir,readout)}8plot.reservoir_predict_seq plot.reservoir_predict_seqplot.reservoir_predict_seqDescriptionplot.reservoir_predict_seqUsage##S3method for class reservoir_predict_seqplot(x,...,vec_nodes=c(1:20),vec_time=NULL)Argumentsx A reservoir_predict_seq object...deprecatedvec_nodes Number of nodes to plotvec_time Time to plotValueA ggplotExamplesif(reticulate::py_module_available("reservoirpy")){reservoir<-reservoirnet::createNode(nodeType="Reservoir",seed=1,units=100,lr=0.7,sr=1,input_scaling=1)X<-matrix(data=rnorm(100),ncol=4)reservoir_state_stand<-reservoirnet::predict_seq(node=reservoir,X=X)plot(reservoir_state_stand)summary(reservoir_state_stand)}plot_2x2_perf9 plot_2x2_perf plot_2x2_perfDescriptionPlot2x2combinations of the hyperparameters.Usageplot_2x2_perf(dfPerf,perf_lab="Median relative error",legend_position="bottom",trans="log10")ArgumentsdfPerf The performance dataframe which should have the columns:perf,ridge,in-put_scaling,leaking_rate,spectral_radius.Where perf is the performance met-ricperf_lab The label of the performance metric.legend_positionPosition of legend passed to ggarrangetrans The transformation(default is"log10")ValueA mutliple2x2plots.ExamplesdfPerf<-data.frame(perf=runif(n=10),ridge=runif(n=10),input_scaling=runif(n=10),leaking_rate=runif(n=10))reservoirnet::plot_2x2_perf(dfPerf=dfPerf)10plot_perf_22 plot_marginal_perf plot_marginal_perfDescriptionget marginal performance from dfPerfUsageplot_marginal_perf(dfPerf,color_cut=10,perf_lab="Median relative error")ArgumentsdfPerf The performance dataframe which should have the columns:perf,ridge,in-put_scaling,leaking_rate,spectral_radius.Where perf is the performance met-riccolor_cut The cutting point to highlight best values(default=10)perf_lab The label of the performance metric.ValueA plot with4facetsExamplesdfPerf<-data.frame(perf=runif(n=10),ridge=runif(n=10),input_scaling=runif(n=10),leaking_rate=runif(n=10))reservoirnet::plot_marginal_perf(dfPerf=dfPerf,color_cut=2)plot_perf_22plot_perf_22DescriptionUnit plot for2x2functionUsageplot_perf_22(x,y,dfPerf,perf_lab,trans="log10")predict_seq11Argumentsx The x featurey The y featuredfPerf The performance dataframe which should have the columns:perf,ridge,in-put_scaling,leaking_rate,spectral_radius.Where perf is the performance met-ricperf_lab The label of the performance metric.trans The transformation(default is"log10")ValueA2x2plotExamplesdfPerf<-data.frame(perf=runif(n=10),ridge=runif(n=10),input_scaling=runif(n=10),leaking_rate=runif(n=10))reservoirnet::plot_perf_22(dfPerf=dfPerf,x="ridge",y="input_scaling",perf_lab="MSE")predict_seq Run the node-forward function on a sequence of dataDescriptionRun the node-forward function on a sequence of dataUsagepredict_seq(node,X,formState=NULL,stateful=TRUE,reset=FALSE) Argumentsnode nodeX array-like of shape([n_inputs],timesteps,input_dim)A sequence of data of shape(timesteps,features).12print.summary.reservoirR_fit formState array of 
shape(1,output_dim),optional Node state value to use at begining of computation.stateful bool,default to TRUE If True,Node state will be updated by this operation.reset bool,default to FALSE If True,Node state will be reset to zero before this oper-ation.DetailsCan update the state of the node several timesValueAn object of class reservoir_predict_seq.This object is a numeric vector containing the matrix of the prediction of the reservoir.It is either the forecast of the ridge layer or the node state of the reservoir if no ridge layer is given.Examplesif(reticulate::py_module_available("reservoirpy")){reservoir<-reservoirnet::createNode(nodeType="Reservoir",seed=1,units=100,lr=0.7,sr=1,input_scaling=1)X<-matrix(data=rnorm(100),ncol=4)reservoir_state_stand<-reservoirnet::predict_seq(node=reservoir,X=X)plot(reservoir_state_stand)summary(reservoir_state_stand)}print.summary.reservoirR_fitreservoirR_fit print summaryDescriptionprint S3method for summary.reservoirR_fit objectUsage##S3method for class summary.reservoirR_fitprint(x,...)random_search_hyperparam13 Argumentsx an object of class summary.reservoirR_fit to print....further arguments.ValueA NULL object which shows the model setting to perform the reservoirfit.Examplesif(reticulate::py_module_available("reservoirpy")){}random_search_hyperparamrandom_search_hyperparamDescriptionGenerate a hyperparameter simulation table using functions as input.Usagerandom_search_hyperparam(n=100,ls_fct=list(ridge=function(n)1e-05,input_scaling=function(n)1,spectral_radius =function(n)rloguniform(n=n,min=0.01,max=10),leaking_rate=function(n)rloguniform(n=n,min=0.001,max=1)))Argumentsn Number of searchls_fct A list of functionsValueA dataframe of size n x4.Each row is a different set of hyperparameters.14reservoirR_fitExamplesrandom_search_hyperparam(n=100,ls_fct=list(ridge=function(n)1e-5,input_scaling=function(n)1,spectral_radius=function(n)rloguniform(n=n,min=1e-2,max=10),leaking_rate=function(n)rloguniform(n=n,min=1e-3,max=1)))reservoirR_fit Offlinefitting method of a NodeDescriptionOfflinefitting method of a NodeUsagereservoirR_fit(node,X,Y,warmup=0,stateful=FALSE,reset=FALSE) Argumentsnode nodeX array-like of shape[n_inputs],[series],timesteps,input_dim),op-tional Input sequences dataset.If None,the method will try tofit the param-eters of the Node using the precomputed values returned by previous call of:py:meth:partial_fit.Y array-like of shape([series],timesteps,output_dim),optional Teacher signals dataset.If None,the method will try tofit the parameters of the Node us-ing the precomputed values returned by previous call of:py:meth:partial_fit,or tofit the Node in an unsupervised way,if possible.warmup:int,default to0Number of timesteps to consider as warmup and discard at the begining of each timeseries before training.stateful is boolenreset is boolean.Should the node status be reset beforefitting.ValueAfitted reservoir of class reservoiR_fit containing thefitted model.rloguniform15Examplesif(reticulate::py_module_available("reservoirpy")){}rloguniform rloguniformDescriptionSimulate a log-uniform distributionUsagerloguniform(n,min=10^-1,max=10^2)Argumentsn number of samplemin minimum of the distributionmax maximum of the distributionValueA vector of simulated valuesExamplesrloguniform(n=1)summary.reservoirR_fitreservoirR_fit summaryDescriptionsummary S3method for reservoirR_fit objectUsage##S3method for class reservoirR_fitsummary(object,...)16summary.reservoir_predict_seqArgumentsobject an object of class reservoirR_fit to summarized....further 
arguments.Valuea list objectExamplesif(reticulate::py_module_available("reservoirpy")){}summary.reservoir_predict_seqsummary.reservoir_predict_seqDescriptionsummary.reservoir_predict_seqUsage##S3method for class reservoir_predict_seqsummary(object,...)Argumentsobject A reservoir_predict_seq object...Additional argument(unused)ValueA dataframe with node activationExamplesif(reticulate::py_module_available("reservoirpy")){reservoir<-reservoirnet::createNode(nodeType="Reservoir",seed=1,units=100,lr=0.7,sr=1,input_scaling=1)X<-matrix(data=rnorm(100),ncol=4)reservoir_state_stand<-reservoirnet::predict_seq(node=reservoir,X=X)plot(reservoir_state_stand)%>>%17 summary(reservoir_state_stand)}%>>%Takes two nodes and applies python operator>>DescriptionA port of the>>"chevron"operator from reservoirpy.Usagenode1%>>%node2Argumentsnode1a Node or a list of Nodesnode2a Node or a list of NodesValueA node or a list of nodes.Examplesif(interactive()){source<-reservoirnet::createNode("Input")reservoir<-reservoirnet::createNode("Reservoir",units=100,lr=0.1,sr=0.9)source%>>%reservoirreadout<-reservoirnet::createNode("Ridge")list(source%>>%reservoir,source)%>>%readout}Index∗datasetsdfCovid,4%>>%,17chevron(%>>%),17createNode,2dfCovid,4generate_data,5install_reservoirpy,6link,7plot.reservoir_predict_seq,8plot_2x2_perf,9plot_marginal_perf,10plot_perf_22,10predict_seq,11print.summary.reservoirR_fit,12random_search_hyperparam,13reservoirR_fit,14rloguniform,15summary.reservoir_predict_seq,16 summary.reservoirR_fit,1518。
Applications of Quantum Algorithms in Natural Language Processing
In today's digital era, natural language processing (NLP) has become a crucial technology.
It enables computers to understand and process human language, powering applications such as machine translation, text classification and question answering.
With the rapid development of quantum computing, quantum algorithms bring both new opportunities and new challenges to NLP.
Quantum computing is based on the principles of quantum mechanics and exploits properties of quantum bits (qubits) such as superposition and entanglement, giving it parallel processing capabilities far beyond classical computing.
Applied to NLP, quantum algorithms may help with complex problems that are hard to handle with traditional methods.
A key task in NLP is text classification.
Traditional machine-learning algorithms can hit bottlenecks in efficiency and accuracy when processing large-scale text data.
The quantum support vector machine (QSVM) offers a new approach to text classification.
A QSVM uses superpositions of qubits to process many features simultaneously, which can make it faster to find the optimal separating hyperplane.
Compared with a classical support vector machine, a QSVM has potential advantages in high-dimensional feature spaces: it may handle massive sets of text features more effectively and improve classification accuracy.
Another important application is machine translation.
Traditional machine translation learns and models from large corpora to build mappings between languages.
Quantum annealing (QA) can be used to optimize the parameters of a translation model.
Quantum annealing can quickly find global optima in complex energy landscapes, which may improve the performance of translation models.
By using quantum annealing, the semantic and syntactic relationships between languages can be captured more accurately, improving translation quality.
Quantum algorithms can also play a role in information retrieval and question answering.
Quantum walk algorithms can be used to rapidly search for and match patterns in text.
Compared with classical random walks, quantum walks converge faster and search more effectively.
This means that, for large text databases, information relevant to a user's question can be found more quickly, improving the responsiveness and accuracy of question-answering systems.
然而,要将量子算法成功应用于自然语言处理并非一帆风顺。
首先,量子计算目前仍处于发展的早期阶段,硬件技术还不够成熟,存在着量子比特的稳定性、噪声等问题。
SPR Distance Computation for Unrooted Trees

Glenn Hickey (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6; corresponding author; Tel: +1 (613) 520-2600 ext. 4588; Email: ghickey@scs.carleton.ca)
Frank Dehne (School of Computer Science, Carleton University, Ottawa, Canada)
Andrew Rau-Chaplin (Faculty of Computer Science, Dalhousie University, Halifax, Canada, http://users.cs.dal.ca/~arc)
Christian Blouin (Faculty of Computer Science, Dalhousie University, Halifax, Canada, cblouin@cs.dal.ca)

Abstract: The subtree prune and regraft distance (d_SPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of d_SPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees, and the heuristic element is only with respect to the algorithm's computation time. Our method is a heuristic version of a fixed parameter tractability (FPT) approach and our experiments indicate that the running time behaves similarly to FPT algorithms. For real data sets, our algorithm was able to quickly compute d_SPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.

Keywords: Unrooted trees, SPR distance, lateral gene transfer, phylogenetic tree metrics

1 Introduction

Phylogenetic trees are used to describe evolutionary relationships. DNA or protein sequences are associated with the leaves of the tree, and the internal nodes correspond to speciation or gene duplication events. In order to model ancestor-descendant relationships on the tree, a direction must be associated with its edges by assigning a root. Often, insufficient information exists to determine the root and the tree is left unrooted. Unrooted trees still provide a notion of evolutionary relationship between organisms even if the direction of descent remains unknown.

The phylogenetic tree representation has recently come under scrutiny, with critics claiming that it is too simple to properly model microbial evolution, particularly in the presence of lateral gene transfer (LGT) events (Doolittle 1999). An LGT is the transfer of genetic material between species by means other than inheritance and thus cannot be represented in a tree, as it would create a cycle. The prevalence of LGT events in microbial evolution can, however, still be studied using phylogenetic trees. Given a pair of trees describing the same sets of species, each constructed using different sets of genes, an LGT event corresponds to a displacement of a common subtree, referred to as an SPR operation. The SPR distance is the minimum number of SPR operations, denoted by d_SPR, that explain the topological differences between a pair of trees. It is equivalent to the number of transfers in the most
parsimonious LGT scenario (Beiko & Hamilton 2006). In general, d_SPR can be used as a measure of the topological difference between two trees, e.g. for comparing the outputs of different tree construction algorithms.

Tree bisection and reconnection (TBR) is a generalization of SPR that allows the pruned subtree to be rerooted before being regrafted. Computation of the TBR distance (d_TBR) was shown to be NP-hard by (Allen & Steel 2001), who also provided two rules that reduce two input trees to a size that is a linear function of d_TBR without altering their distance. These rules, which reduce common chains and subtrees, also form the basis of algorithms that compute the SPR distance between rooted trees (d_rSPR) (Bordewich & Semple 2004) as well as the hybridization number (h) (Bordewich et al. 2007); see Section 3.3. Such algorithms proceed as follows. First the distance problem is shown to be equivalent to counting the components of a maximum agreement forest, and then it is shown that the application of the rules does not alter the number of components in the forest. These steps have been successfully applied to d_TBR, d_rSPR and h, but not to d_SPR, for which no equivalent agreement forest problem is known. As a consequence, the computational complexity of d_SPR has remained an open problem. We provide a proof of NP-Hardness in Section 2.

In Section 3, we present an efficient algorithm that relies only on the subtree reduction rule to compute the SPR distance of unrooted trees. An implementation of this algorithm was tested on a variety of data, and the results are analyzed in Section 4. In particular, we show that the conjecture that chain decomposition is d_SPR-preserving for unrooted trees (Allen & Steel 2001) is strongly supported by our data.

The contributions of this paper can be summarized as follows: (1) We show that SPR distance computation is NP-hard for unrooted trees. (2) We present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees, and the heuristic element is only with respect to the algorithm's computation time. Our method is a heuristic version of a fixed parameter tractability (FPT) approach (Downey & Fellows 1998) and our experiments indicate that the running time behaves similarly to FPT algorithms. For real data sets, our algorithm was able to quickly compute d_SPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. (3) Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures. (4) In (Bordewich et al. 2007), a decomposition by common clusters was used with significant practical success. We show that such a decomposition by common clusters cannot be used to compute the exact SPR distance for unrooted trees (Figure 4), which is somewhat counterintuitive.

2 SPR Distance Computation is NP-Hard for Unrooted Trees

In (Hein et al. 1996), it was shown that computing the size of the Maximum Agreement Forest (MAF) of two trees is NP-Hard by reducing it from Exact Cover by 3-Sets (X3C).
Later, (Allen & Steel 2001) proved that this result is insufficient to show the hardness of unrooted SPR distance because there is no direct relationship between MAF size and d_SPR, as was previously claimed. Similar techniques have since been used in (Bordewich & Semple 2004) to show that rooted SPR distance is NP-Hard via reduction from X3C to a rooted version of MAF. We show that although d_SPR cannot be used to compute |MAF| in general, it can for the trees used in the polynomial-time reduction from X3C, and this is sufficient to show that d_SPR is NP-Hard as well. We begin with two preliminary definitions.

Definition 2.1. An agreement forest for two trees is any common forest that can be obtained from both trees by cutting the same number of edges from each tree, applying forced contractions after each cut. A maximum agreement forest (MAF) for two trees is an agreement forest with a minimum number of components. (Hein et al. 1996)

Definition 2.2. The exact cover by 3-sets (X3C) problem is defined as follows (Garey & Johnson 1979): given a set X with |X| = n = 3q and a collection C of m 3-element subsets of X, does C contain an exact cover for X, i.e., a sub-collection C' ⊆ C such that every element of X occurs in exactly one member of C'? NOTE: This problem remains NP-Complete if no element occurs in more than three subsets. Also note that this problem remains NP-Complete if each element occurs in exactly three subsets. This second property is implied by (Hein et al. 1996) though never explicitly stated. A supplemental proof is provided in Appendix A.

We now review the polynomial-time reduction from X3C to MAF provided by (Hein et al. 1996), clarifying their notation to reflect that each element of X belongs to exactly three subsets in C, i.e., |X| = |C| = 3q = m = n, a fact implied but not clearly stated in their paper. An instance of X3C is transformed into the two rooted phylogenetic trees shown in Figure 1. Each element of X is represented by a triplet of the form {a, u, v}, and each triplet appears 3 times in each tree, once for each occurrence in a subset in C. Tree T1 is illustrated in Figure 1(a). Each subtree A_i of T1, shown in Figure 1(b), corresponds to a subset c_i of C. Each subtree of A_i induced by the triple {a_{i,j}, u_{i,j}, v_{i,j}}, where j ∈ {1, 2, 3}, corresponds to a single element of X. Tree T2, shown in Figure 1(c), has the same leaf set as T1 but a different topology. Each D_i subtree of T2, as seen in Figure 1(e), corresponds to a subset in C except that only the a-part of each triplet is present. Each B_i subtree of T2, as seen in Figure 1(d), corresponds to an element in X. Each such element x = {a, u, v} in the set X appears in three different subsets of C: c_j, c_k, and c_l. Without loss of generality, assume it consists of the first element of c_j, the second element of c_k, and the third element of c_l. The corresponding B tree would have leaves {u_{j,j'}, u_{k,k'}, u_{l,l'}, v_{j,j'}, v_{k,k'}, v_{l,l'}} where j' = 1, k' = 2, l' = 3.

(Hein et al. 1996) show that |MAF(T1, T2)| = 20q + 1 if and only if C contains an exact cover of X. Note that we have added the z leaf to these trees, unrooting them. This does not have any effect on |MAF|, as z can remain attached to x1 in the agreement forest without the addition of any new components. Proving that d_SPR(T1, T2) = |MAF(T1, T2)| − 1 is sufficient to transform any instance of X3C where |X| = |C| = 3q to an instance of d_SPR. In fact, it is sufficient to show that the inequality d_SPR(T1, T2) ≤ |MAF(T1, T2)| − 1 is true, as d_SPR(T1, T2) ≥ |MAF(T1, T2)| − 1 follows from Lemma 2.7(b) and Theorem 2.13 of (Allen & Steel 2001). We proceed much in the same way as the original proof, noting that each SPR operation used to transform
T1 to T2 corresponds to a cut required to form their MAF. MAF(T1, T2) is formed by cutting edges from the A_i subtrees (and the corresponding subtrees in T2) in either of two possible ways (Hein et al. 1996):

1. Cut leaves u_{i,1}, v_{i,1}, u_{i,2}, v_{i,2}, u_{i,3}, v_{i,3} and then prune the remaining subtree formed by leaves {a_{i,1}, a_{i,2}, a_{i,3}}. Such a procedure contributes 7 components to the MAF.
2. Cut the leaves a_{i,1}, a_{i,2}, a_{i,3}, then cut each of the remaining two-leaf subtrees: {u_{i,1}, v_{i,1}}, {u_{i,2}, v_{i,2}}, and {u_{i,3}, v_{i,3}}. These operations contribute 6 components to the MAF.

Figure 1: Reduction of an instance of X3C to |MAF(T1, T2)| from an {a, u, v} triplet. The instance of X3C has a solution if and only if |MAF(T1, T2)| = 20q + 1 (where n = 3q). Panels: (a) tree T1; (b) subtree A_i; (c) tree T2; (d) subtree B_i; (e) subtree D_i.

Figure 2: (a) Original tree. (b) Edge uv is removed, pruning the subtree rooted at u. (c) The subtree is regrafted, creating new vertex v'. (d) The degree-2 vertex v is contracted.

We now show that given two trees T1 and T2 and their MAF, which was created using the above cut operations, there exist |MAF| − 1 SPR operations that can transform T1 to T2. In particular, for each set of cut operations, there exists an equivalent set of SPR operations.

1. Prune leaves u_{i,1}, v_{i,1}, u_{i,2}, v_{i,2}, u_{i,3}, v_{i,3} from A_i and regraft them onto the chain, forming B_i subtrees in the required positions. Prune the subtree {a_{i,1}, a_{i,2}, a_{i,3}} and regraft it into the position of D_i. In this case, 7 SPR operations are performed.
2. Prune the leaves a_{i,1}, a_{i,2}, a_{i,3} and regraft them onto the chain, forming a D_i subtree in the proper position. Prune the remaining two-leaf subtrees {u_{i,1}, v_{i,1}}, {u_{i,2}, v_{i,2}}, and {u_{i,3}, v_{i,3}} and regraft them onto the chain, forming B_i subtree components in the required positions. 6 SPR operations are used.

There is a one-to-one correspondence between the cuts formed when creating the MAF and the SPR operations that can transform T1 to T2. Thus d_SPR(T1, T2) ≤ |MAF(T1, T2)| − 1 and the proof is completed.

3 Algorithm for d_SPR Computation

3.1 Definitions

All trees referred to in this paper, unless otherwise stated, are unrooted binary phylogenetic trees. Such trees have interior vertices of degree 3 and uniquely labeled leaves. Given a tree T, let V(T), L(T) and E(T) ⊆ {V(T) × V(T)} be the vertex, leaf, and edge sets of T respectively. A tree can be rooted by adding a root vertex of degree 2. A pendant subtree of T is any rooted tree T' such that V(T') ⊆ V(T), L(T') ⊆ L(T) and E(T') ⊆ E(T). An SPR operation on a tree T is defined by the following three steps, illustrated in Figure 2. First, an edge {u, v} ∈ E(T) is removed, effectively pruning a pendant subtree rooted at u from T. A new interior vertex w is created by subdividing an edge in T, and the subtree is then regrafted by creating the edge {u, w}. Finally, the degree-2 vertex v is contracted by identifying its incident edges. The SPR distance between T1 and T2, denoted d_SPR(T1, T2), is the minimum number of SPR operations required to transform T1 into T2. Furthermore, d_SPR is a metric (Allen & Steel 2001).
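To make the three-step definition concrete, the following minimal Python sketch (an illustration, not the authors' C++ implementation) performs a single SPR operation on an unrooted tree stored as a networkx graph. The function name spr and the integer labelling of interior vertices are our own assumptions, and no validity checks are performed on the chosen edges.

import networkx as nx

def spr(tree, u, v, x, y):
    # Perform one SPR operation: prune the pendant subtree rooted at u (as seen
    # from v) by deleting edge {u, v}, regraft it onto edge {x, y} via a new
    # interior vertex w, then contract the resulting degree-2 vertex v.
    t = tree.copy()
    t.remove_edge(u, v)                          # step 1: prune
    w = max(t.nodes) + 1                         # fresh label (assumes integer vertex labels)
    t.remove_edge(x, y)                          # step 2: subdivide {x, y} with w ...
    t.add_edges_from([(x, w), (w, y), (u, w)])   # ... and regraft the pruned subtree
    if t.degree(v) == 2:                         # step 3: contract the degree-2 vertex v
        a, b = list(t.neighbors(v))
        t.remove_node(v)
        t.add_edge(a, b)
    return t

# Tiny example: the quartet ((1,2),(3,4)) with interior vertices 5 and 6.
quartet = nx.Graph([(1, 5), (2, 5), (5, 6), (3, 6), (4, 6)])
moved = spr(quartet, u=1, v=5, x=3, y=6)         # prune leaf 1, regraft next to leaf 3
print(sorted(moved.edges))                       # yields the quartet ((1,3),(2,4))

Applying this operation to every choice of pruned edge and regraft edge corresponds to the neighbour generation that the search described in Section 3.2 performs O(n^2) times per tree.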
3.2 Exhaustive Search

The reduction rules referred to above only serve to transform the original problem into smaller subproblems. These subproblems must still be solved with an exhaustive search, as the problem is NP-Hard (see proof in the Appendix). Let G_SPR(n) be the graph such that each vertex in the graph is associated with a unique tree topology with n leaves, and all possible topologies are in the graph. A pair of vertices in this graph are connected if their SPR distance is one. Computing d_SPR(T1, T2) is therefore equivalent to finding the length of the shortest path between T1 and T2 on G_SPR(n) and can be computed through an exhaustive breadth-first search beginning at T1. In (Allen & Steel 2001), it was shown that each tree will have O(n^2) neighbors in the graph, and it follows that the search will visit O(n^2) trees of distance 1 from T1, O(n^4) trees of distance 2, up to O(n^(2k)) trees of distance k. A hash table is kept to ensure the same tree is not visited twice. Assuming that it can be updated in constant time, each tree can be processed in O(n), bringing the time and space complexity of the search to O(n^(2k+1)).

While it is still an open problem to determine if reduction rules can be found to reduce n to k in the asymptotic complexity above, the value of the exponent can be reduced significantly. Observe that there must be some tree T' such that d_SPR(T1, T') = ⌊k/2⌋ and d_SPR(T2, T') = ⌈k/2⌉, because d_SPR is a metric and therefore satisfies the triangle inequality. T' and, correspondingly, k can be computed by performing two breadth-first searches with origins at T1 and T2 simultaneously. During the i-th iteration of the search, all trees of distance i from first T1 and then T2 are explored and updated into the same hash table. T' is the first tree to be found by both searches, and d_SPR(T1, T2) is 2i − 1 if T' is found in the search for T1, or 2i otherwise. Pseudocode is given in Algorithm 1. The time complexity of this algorithm is O(n^(2⌊k/2⌋+1)) + O(n^(2⌈k/2⌉+1)) = O(n^(k+2)). This is a significant reduction from the simple search, but the complexity is still prohibitive. Fortunately, heuristics can greatly speed up many instances of the problem while still guaranteeing an exact answer.

Algorithm 1 SPRDIST(T1, T2)
1: if T1 = T2 then
2:   return 0
3: end if
4: Apply subtree reductions to T1 and T2
5: d ← 0
6: H ← empty hash table
7: L1, LA ← empty lists
8: Insert T1 into L1
9: Insert T2 into LA
10: loop
11:   L2, LB ← empty lists
12:   if ITERATE(L1, L2, H, T2) = TRUE then
13:     return d
14:   else
15:     L1 ← L2
16:     d ← d + 1
17:   end if
18:   if ITERATE(LA, LB, H, T1) = TRUE then
19:     return d
20:   else
21:     LA ← LB
22:     d ← d + 1
23:   end if
24: end loop

Algorithm 2 ITERATE(L_in, L_out, H, T)
1: for all t ∈ L_in do
2:   if t ∈ H then
3:     return TRUE
4:   else
5:     Append the set of SPR neighbors of t to L_out
6:     Insert t into H
7:   end if
8: end for
9: return FALSE
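The meet-in-the-middle structure of Algorithms 1 and 2 can be separated from the tree machinery. The following Python skeleton, a sketch rather than the authors' code, alternates two breadth-first frontiers over an abstract state space supplied by a neighbors callback; a toy one-dimensional walk stands in for SPR neighbour generation, which a real implementation must provide together with canonical hashing of tree topologies. The visited table here records which side reached each state, so the meeting test and the 2i − 1 / 2i distance rule described above become explicit.

def meet_in_the_middle_distance(start, goal, neighbors):
    # Grow two BFS frontiers, one from each endpoint, sharing a single visited
    # table (cf. SPRDIST/ITERATE).  Each pass expands the current level of the
    # start side and then of the goal side; when a newly generated state is
    # already known to the opposite side, the two depths sum to the distance.
    if start == goal:
        return 0
    seen = {start: ("s", 0), goal: ("g", 0)}       # state -> (side, depth)
    frontier = {"s": [start], "g": [goal]}
    depth = {"s": 0, "g": 0}
    while frontier["s"] or frontier["g"]:
        for side in ("s", "g"):
            other = "g" if side == "s" else "s"
            new_level = []
            for state in frontier[side]:
                for nb in neighbors(state):
                    if nb in seen:
                        o_side, o_depth = seen[nb]
                        if o_side == other:        # the two searches have met
                            return depth[side] + 1 + o_depth
                    else:
                        seen[nb] = (side, depth[side] + 1)
                        new_level.append(nb)
            frontier[side] = new_level
            depth[side] += 1
    return None                                    # goal unreachable

# Toy state space: integers whose neighbours differ by one (stand-in for the
# O(n^2) SPR neighbours of a tree); the distance from 0 to 7 is 7.
print(meet_in_the_middle_distance(0, 7, lambda s: [s - 1, s + 1]))

Searching only to depth roughly k/2 from each side is what turns the O(n^(2k+1)) single-source search into the O(n^(k+2)) bound quoted above.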
3.3 Heuristic Improvements

A subtree reduction replaces any pendant subtree that occurs in both input trees by a single leaf with a new label in each tree, as shown in Figure 3(a). A chain reduction, illustrated in Figure 3(b), replaces any chain of pendant subtrees that occurs identically in both trees by three new leaves with new labels, correctly oriented to preserve the direction. (Allen & Steel 2001) showed that maximum application of both of these rules reduces the size of the input trees to a linear function of d_TBR. This result also holds for d_SPR, as d_SPR ≤ 2 d_TBR for two trees, since each TBR operation can be replaced by 2 SPR operations. It is trivial to show that subtree reductions do not alter d_SPR but, unlike for d_TBR, it is presently unknown whether or not chain reductions affect d_SPR; therefore they cannot be used in an exact algorithm. However, our experimental results, further described in Section 4, do support the conjecture that chain reductions do not affect SPR distance.

Figure 3: Reduction rules applied to a tree. (a) A subtree is reduced to a leaf. (b) A chain of length n is reduced to a chain of length 3.

In addition to applying reductions to the input trees, intermediate trees visited during the breadth-first search can be likewise reduced. For example, if T* is a tree found on the i-th iteration from T1 that has a common pendant subtree with T2, then that subtree can be reduced to a leaf in T* and T2 without affecting d_SPR(T*, T2). Accordingly, the shortest path from T1 to T2 will still be found by a search that applies subtree reductions to the intermediate trees. For ease of maintaining the hash table of trees visited, in our implementation we flag common subtrees rather than remove them and use these flags to avoid performing SPR operations that would prune from or regraft to flagged subtrees. This process has no adverse effect on the asymptotic complexity of the search, as common subtrees and chains can be detected in O(n). It is expected that performing reductions on intermediate trees will lessen the total number of trees searched, but we are unable to show that it will affect the worst-case complexity.

Because the number of trees visited in each iteration of the exhaustive search increases exponentially, the asymptotic complexity is bounded by the number of trees explored in the final iteration. It follows that the order in which these trees are searched can have a critical impact on the running time. We attempt to increase the probability that the tree upon which the search is completed is visited near the beginning of an iteration by sorting the trees in each iteration according to how many leaves are eliminated by subtree reduction.
Our hypothesis is that trees with larger common subtrees are more likely to be near the destination tree. Since at most n leaves can be eliminated by subtree reductions, the trees can be bucket sorted in O(n) time, leaving the asymptotic complexity unchanged. These last two heuristics are employed by replacing the call to ITERATE in SPRDIST with a call to SORT_ITERATE, shown in Algorithm 3.

Algorithm 3 SORT_ITERATE(L_in, L_out, H, T)
1: for all t ∈ L_in do
2:   Flag all subtrees in t that also occur in T
3: end for
4: Bucket sort L_in in decreasing order by number of vertices flagged
5: for all t ∈ L_in do
6:   if t ∈ H then
7:     return TRUE
8:   else
9:     Append the set of SPR neighbors of t which preserve flagged subtrees to L_out
10:    Insert t into H
11:   end if
12: end for
13: return FALSE

A cluster is the leaf set of a pendant subtree. T1 and T2 share a common cluster C if they contain pendant subtrees S1 and S2 respectively such that L(S1) = L(S2) = C. In (Baroni et al. 2006), it was shown that the hybridization number of two trees is equal to the total of the hybridization numbers of all their pairs of maximal common clusters. In (Beiko & Hamilton 2006), the authors made a similar assumption in their heuristic algorithm to measure LGT. Such a decomposition makes intuitive sense for exact SPR distance as well, as it would seem that any SPR operation that affects more than one common cluster would not reduce the distance and therefore not be part of an optimal solution. Unfortunately, this is not the case, as evidenced by the counterexample given in Figure 4, which presents T1 and T2 that share the common cluster {7, 8, 9}. d_SPR(T1, T2) = 3 and 3 SPR operations are shown that transform T1 into T2, the first of which breaks the common cluster. Indeed, an exhaustive simulation showed that no 3 sequential SPR operations exist to transform the trees that do not break the common clusters. This can be more easily seen by observing that any such sequence would have to regraft 7 to 9, and only 2 operations would be left to transform the cluster {1, 2, 3, 4, 5, 6}, which is clearly insufficient.

Figure 4: Example of trees whose common clusters cannot be maintained by a minimal SPR path. T1 (panel a) and T2 (panel d) have an SPR distance of three, but all possible sequences of SPR operations of this length (one is shown by the dotted lines: (b) {3, 4} is regrafted to {9}; (c) {2} is regrafted to {3}) break the common cluster {7, 8, 9}.

4 Experimental Results

4.1 Datasets

The datasets were chosen to analyze the merits of the heuristics discussed in the previous section as well as to evaluate our algorithm in a realistic setting. To these ends, we benchmarked our algorithm on a variety of randomly generated trees, as well as trees created by (Beiko et al. 2005) in the course of analyzing the proteins from the 144 sequenced prokaryotic genomes available at the time. Two sets of random trees were generated, one by the Yule-Harding model and the other by random walks. Yule-Harding trees are constructed by first creating an edge between two randomly selected leaves, then randomly attaching the remaining leaves to the tree until none are left. The random walk dataset consists of pairs of trees in which one tree is generated by the Yule-Harding model and the other is created from the first by applying a sequence of between 2 and 8 random SPR operations (Beiko & Hamilton 2006). The size of the datasets, along with the average distances computed by our algorithm, are presented in Figure 5. In some cases, the program ran out of memory before finding the solution. The fraction of instances successfully resolved for each type of data is listed in the "%Resolved" column.
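For reference, the random datasets can be reproduced in outline. The sketch below (our illustration, not the authors' generator) grows an unrooted binary tree under one reading of the Yule-Harding description above: start from an edge between two randomly chosen leaves, then attach each remaining leaf next to a uniformly chosen existing leaf by subdividing that leaf's edge. Random-walk pairs are then obtained by applying between 2 and 8 random SPR operations to such a tree, e.g. with the spr routine sketched in Section 3.1.

import random
import networkx as nx

def yule_harding_tree(n_leaves, seed=None):
    # Grow an unrooted binary tree on leaves 1..n_leaves: begin with an edge
    # joining two random leaves, then repeatedly pick an existing leaf at
    # random and subdivide its pendant edge with a new interior vertex to
    # which the next leaf is attached.
    rng = random.Random(seed)
    leaves = list(range(1, n_leaves + 1))
    rng.shuffle(leaves)
    t = nx.Graph()
    t.add_edge(leaves[0], leaves[1])
    placed = [leaves[0], leaves[1]]                # leaves already in the tree
    next_internal = n_leaves + 1                   # fresh labels for interior vertices
    for leaf in leaves[2:]:
        x = rng.choice(placed)                     # leaf whose pendant edge is split
        u = next(iter(t.neighbors(x)))
        w = next_internal
        next_internal += 1
        t.remove_edge(x, u)
        t.add_edges_from([(x, w), (w, u), (w, leaf)])
        placed.append(leaf)
    return t

tree = yule_harding_tree(10, seed=42)
print(sorted(d for _, d in tree.degree()))         # ten leaves of degree 1, eight interior vertices of degree 3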
4.2 Performance

The algorithm described in Section 3 was implemented in C++ and benchmarked on a 2.6 GHz Pentium Xeon system with 3 GB of RAM. The source code is available at http://morticia.cs.dal.ca/lab public/?Download. This program was executed for all pairs of trees described in Figure 5, with and without the various heuristic optimizations discussed previously. Graphs (a), (c) and (e) in Figure 6 display the effectiveness of the reduction rules' ability to reduce the input trees. As could be expected, the trees in the protein and random SPR walk datasets are reduced more than the two random datasets, as their ratios of size to distance are much higher. In all cases, the amount of reduction increases in correlation with the mean distance rather than n. Our method is essentially a fixed parameter tractability (FPT) approach (Downey & Fellows 1998), and our experiments indicate that the running time behaves similarly to FPT algorithms. Also encouraging is the fact that the reduction rules perform much better in practice than the worst-case analysis in (Allen & Steel 2001), which predicts a reduction in size to a factor of 28 times the distance. For example, in the random SPR walk dataset, whose mean distance is roughly 2, the reductions are effective for n > 4, whereas in the worst case they are only guaranteed to work for n >= 56.

Figure 5: Size, success rate and distance distributions for each dataset. For the protein data, no trees of size greater than 60 were resolved. Panels: (a) Yule-Harding random, fraction resolved; (b) Yule-Harding random, min/mean/max distances; (c) simulated SPR walk, fraction resolved; (d) simulated SPR walk, min/mean/max distances; (e) protein, fraction resolved; (f) protein, min/mean/max distances.
Similar results are visible in the protein dataset graphs as well. As can be seen in these graphs, chain reductions account for only a small portion (well under 10%) of the overall gain, with subtree reductions making up the rest. We also note that of the roughly 20,000 pairs of trees tested, application of the chain reduction rule did not once affect the SPR distance.

The performance of the remaining heuristics is displayed in terms of running time in graphs (b), (d) and (f) of Figure 6. Applying the reductions to intermediate trees provided very little performance gain, implying that the search space is dominated by trees with few common subtrees and chains. However, sorting the trees visited in each iteration of the search by the number of leaves reduced had a significant impact on the running time for all of the harder cases (d_SPR ≥ 4), speeding up the computation by nearly a factor of 6 for some of the larger protein tree pairs.

5 Conclusion

The computation of SPR distances between unrooted phylogenetic trees can be used to compare the evolutionary histories of different genes and provide a lower bound on the number of lateral transfers. Little previous work has been done on this problem, though many related tree metrics have been relatively well studied in the literature. The reason for this appears to be primarily less insight into the problem's structure (no known MAF reduction) rather than lack of interest. In this paper we revisited the problem of unrooted SPR distance, showing that it is NP-Hard and providing an optimized algorithm and implementation to solve it exactly. The algorithm is based on dividing the problem into two searches and making use of heuristics such as subtree reductions and reordering. This algorithm was able to quickly compute the exact distance between the majority of proteins belonging to 144 available sequenced prokaryotic genomes and their supertree. Our method can also be used to improve the brute-force search component of TBR and rooted SPR distance computation.

Though a polynomial-time solution is unlikely due to its NP-Hardness, some possible avenues of future work on this problem remain. One is to show that chain reductions do not affect the distance, a conjecture that is supported by our experimental results but for which an analytical proof remains absent. This result would be sufficient to show that unrooted SPR distance is fixed parameter tractable, being exponential only in terms of the distance and not the size of the trees. In (Bordewich et al. 2007), a decomposition by common clusters was used with significant practical success. We showed that such a technique cannot be directly applied to the problem of unrooted SPR distances, but perhaps a variation of this technique can.

Acknowledgment

This research was partially supported by the Natural Sciences and Engineering Research Council of Canada and Genome Atlantic.
Figure 6: Experimental evaluation of the different heuristics on the three datasets (Yule-Harding random, simulated SPR walk, and protein). The effect of the reduction rules on the input tree sizes is displayed on the left (number of leaves after no reductions, subtree reductions only, and chain and subtree reductions). The improvements to the running time made by reducing and sorting intermediate trees are displayed on the right (mean wall time with no heuristics, with intermediate subtree reductions, and with intermediate subtree reductions and sorting).