Data reduction, exact, and heuristic algorithms for clique cover

格式：pdf
大小：813.06 KB
文档页数：9

下载文档原格式

Bibliography

Bibliography[1]H.L.Abbott,Lower bounds for some Ramsey numbers,Discrete Math.2(1972),289–293.[2] A.Agresti,Categorical Data Analysis,John Wiley and Sons,New York,1990,xvi+558pp.[3] D.Aldous,On the Markov-chain simulation method for uniform combinatorial simulationand simulated annealing,.Sci.1(1987),33–46.[4] D.Aldous,Some inequalities for reversible Markov chains,J.London Math.Soc.25(1982),564–576.[5]N.Alon,Eigenvalues and expanders,Combinatorica6(1986),86–96.[6]N.Alon and F.R.K.Chung,Explicit construction of linear sized tolerant networks,DiscreteMath.72(1988),15–19.[7]N.Alon,F.R.K.Chung and R.L.Graham,Routing permutations on graphs via matchings,SIAM J.Disc.Math.7(1994),513–530.[8]N.Alon,Z.Galil and man,Better expanders and superconcentrators,J.Algorithms8(1987),337–347.[9]N.Alon and man,λ1isoperimetric inequalities for graphs and superconcentrators,b.Theory B38(1985),73–88.[10]N.Alon,Z.Galil and O.Margalit,On the exponent of the all pairs shortest paths prob-lem,32nd Symposium on Foundations of Computer Science,IEEE Computer Society Press, (1991),569–575.[11]N.Alon and N.Kahale,Approximating the independence number via theθfunction,Math.Programming,80(1998),253–264.[12]N.Alon and J.H.Spencer,The Probabilistic Method,John Wiley and Sons,New York1991,xvi+254pp.[13]S.Arora and S.Safra,Probabilistic checking of proofs,a new characterization of NP.33rdSymposium on Foundations of Computer Science,IEEE Computer Society Press,(1992), 2–13.[14]L.Babai and M.Szegedy,Local expansion of symmetrical graphs,Combinatorics,Probabilityand Computing1(1991),1–11.[15]L.Babai,Automorphism groups,isomorphism,reconstruction,Handbook of Combinatorics(eds.R.L.Graham,M.Gr´o tschel and L.Lov´a sz),North-Holland,Amsterdam,(1996),1447–1540.[16] D.Bakry,L’hypercontractivit´e et son utilisation en th´e orie des semigroupes,In Ecole d’´e t´ede Saint Fleur1992,Springer Lecture Notes1581,1–114.[17]L.A.Bassalygo,Asymptotically optimal switching circuits,Problems Inform.Transmission17(1981),81–88.[18]J.Beck,On size Ramsey number of paths,trees and circuits I.,J.Graph Theory7(1983),115–129.[19]W.Beckner,Inequalities in Fourier analysis,Annals of Mathematics102(1975),159–182.[20]V.E.Ben˘e s,Mathematical Theory of Connecting Networks and Telephone Traﬃc,AcademicPress,New York1965,xiv+319pp.[21]J.C.Bermond and B.Bollob´a s,The diameter of graphs–a survey,Congressus Numerantium32(1981),3–27.[22]J.C.Bermond,C.Delorme and G.Farhi,Large graphs with given degree and diameter,III,Proc.Coll.Cambridge(1981),Ann.Discr.Math.13,North Holland,Amsterdam(1982), 23–31.[23] A.J.Berstein,Maximally connected arrays on the n-cube,SIAM J.Appl.Math.15(1967),1485–1489.201202BIBLIOGRAPHY[24]S.N.Bhatt and F.T.Leighton,A framework for solving VLSI graph layout problems,J.Comput.System Sci.28(1984),300–343.[25] F.Bien,Constructions of telephone networks by group representations,Notices Amer.Math.Soc.36(1989),5–22.[26]N.L.Biggs,Algebraic Graph Theory,(2nd ed.),Cambridge University Press,Cambridge,1993,xvi+205pp.[27]N.L.Biggs and M.H.Hoare,The sextet construction for cubic graphs,Combinatorica3(1983),153–165.[28]N.L.Biggs, E.K.Lloyd and R.J.Wilson,Graph Theory1736-1936,Clarendon Press,Oxford,1976,xi+239pp.[29]Y.Bishop,S.Fienberg,P.Holland,Discrete Multivariate Analysis,MIT Press,Cambridge,1975,x+557pp.[30]M.Blum,R.M.Karp,O.Vornberger,C.H.Papadimitriou,and M.Yannakakis,The complex-ity of testing whether a graph is a superconcentrator,Inf.Proc.Letters13(1981),164–167.[31] B.Bollob´a s,Random Graphs,Academic Press,New York(1985),xvi+447pp.[32] B.Bollob´a s and F.R.K.Chung,The diameter of a cycle plus a random matching,SIAM J.Disc.Math.1(1988),328–333.[33] B.Bollob´a s and I.Leader,Edge-isoperimetric inequalities in the grid,Combinatorica11(1991),299–314.[34] B.Bollob´a s and I.Leader,An isoperimetric inequality on the discrete torus,SIAM J.Disc.Math.3(1990),32–37.[35] B.Bollob´a s,and A.Thomason,Graphs which contain all small graphs,European J.of Com-binatorics2(1981),13–15.[36] B.Bollob´a s and W.F.de la Vega,The diameter of random regular graphs,Combinatorica2(1982),125–134.[37]J.A.Bondy and M.Simonovits,Cycles of even length in graphs,bin.Theory Ser.B16(1974),97–105.[38]R.B.Boppana,Eigenvalues and graph bisection:An average-case analysis,28th Symposiumon Foundations of Computer Science,IEEE Computer Society Press,(1987),280–285. [39]R.Bott and J.P.Mayberry,Matrices and trees,In Economic Activity Analysis,(O.Morgen-stern,ed.),John Wiley and Sons,New York(1954),391–340.[40] A.Broder, A.Frieze and E.Upfal,Existence and construction of edge disjoint paths onexpander graphs,Proc.Sym.Theo.on Computing,ACM(1992),140–149.[41]R.Brooks,The spectral geometry of k-regular graphs,Journal d’Analyse Math´e matique,57(1991),120–151.[42]N.G.de Bruijn,A combinatorial problem,Nederl.Akad.Wetensch.Proc.49(1946),758–764.[43] D.A.Burgess,On character sums and primitive roots,Proc.London Math.Soc.12(1962),179–192.[44]P.Buser,Cubic graphs and theﬁrst eigenvalue of a Riemann surface,Math.Z.162(1978),87–99.[45]P.Buser,Cayley graphs and planar isospectral domains,in Geometry and Analysis on Man-ifolds(T.Sunada,ed.),Springer Lecture Notes1339(1988),64–77.[46]P.Buser,Cubic graphs and theﬁrst eigenvalue of a Riemann surface,Math.Z.162(1978),87–99.[47]L.Caccetta,On extremal graphs with given diameter and connectivity,Ann.New York Acad.Sci.328(1979),76–94.[48] A.Cayley,A theorem on trees,Quart.J.Math.23(1889),376–378.[49]J.Cheeger,A lower bound for the smallest eigenvalue of the Laplacian,Problems in Analysis(R.C.Gunning,ed.),Princeton Univ.Press(1970),195–199.[50]S.Y.Cheng,Peter Li and S.-T.Yau,On the upper estimate of the heat kernel of a completeRiemannian manifold,American Journal of Mathematics103(1981),1021–1063.[51]R.Christensen,Log-Linear Models,Springer-Verlag,New York,1990,xii+408pp.[52] F.R.K.Chung,On concentrators,superconcentrators,generalizers and nonblocking net-works,Bell Systems Tech.J.58(1979),1765–1777.[53] F.R.K.Chung,A note on constructive methods for Ramsey numbers,J.Graph Th.5(1981),109–113.BIBLIOGRAPHY203 [54] F.R.K.Chung,Diameters of communications networks,Mathematics of Information Pro-cessing,AMS Short Course Lecture Notes(1984),1–18.[55] F.R.K.Chung,Diameters of graphs:Old problems and new results,Congressus Numeran-tium60(1987),295–317.[56] F.R.K.Chung,Diameters and eigenvalues,J.of Amer.Math.Soc.2(1989),187–196.[57] F.R.K.Chung,Quasi-random classes of hypergraphs,Random Structures and Algorithms1(1990),363–382.[58] F.R.K.Chung,Regularity lemmas for hypergraphs and quasi-randomness,Random Struc-tures and Algorithms2(1991),241–252.[59] F.R.K.Chung,Constructing random-like graphs,in Probabilistic Combinatorics and ItsApplications,(B.Bollobas ed.),Amer.Math.Soc.,Providence,1991,21–55.[60] F.R.K.Chung,Laplacians of graphs and Cheeger inequalities,in Combinatorics,Paul Erd˝o sis Eighty,Volume2,(D.Mikl´o s,V.T.S´o s,and T.Sz˝o nyi eds.),J´a nos Bolyai Mathematical Society,Budapest(1996),157–172.[61] F.R.K.Chung,V.Faber and T.A.Manteuﬀel,An upper bound on the diameter of a graphfrom eigenvalues associated with its Laplacian,SIAM.J.Discrete Math.7(1994),443–457.[62] F.R.K.Chung and M.R.Garey,Diameter bounds for altered graphs,J.of Graph Theory8(1984),511–534.[63] F.R.K.Chung and R.L.Graham,Quasi-random hypergraphs,Random Structures andAlgorithms1(1990),105–124.[64] F.R.K.Chung and R.L.Graham,Quasi-random tournaments,J.of Graph Theory15(1991),173–198.[65] F.R.K.Chung and R.L.Graham,Maximum cuts and quasirandom graphs,Random Graphs(A.Frieze and T.Luczak,eds.),John Wiley and Sons,New York(1992),23–33.[66] F.R.K.Chung and R.L.Graham,On graphs not containing prescribed induced subgraphs,in A Tribute to Paul Erd¨o s,(A.Baker et al.eds.)Cambridge University Press(1990),111–120.[67] F.R.K.Chung and R.L.Graham,Quasi-random set systems,J.Amer.Math.Soc.4(1991),151–196.[68] F.R.K.Chung and R.L.Graham,Quasi-random subsets of Z n,bin.Th.(A)61(1992),64–86.[69] F.R.K.Chung and R.L.Graham,Cohomological aspects of hypergraphs,Trans.Amer.Math.Soc.334(1992),365–388[70] F.R.K.Chung and R.L.Graham,Random walks on generating sets of groups,Electronicbinatorics,4,no.2,(1997),#R7,14pp.[71] F.R.K.Chung and R.L.Graham,Stratiﬁed random walks on an n-cube,Random Structuresand Algorithms,11(1997),199–222.[72] F.R.K.Chung,R.L.Graham and R.M.Wilson,Quasi-random graphs,Combinatorica9(1989),345–362.[73] F.R.K.Chung,R.L.Graham and S.-T.Yau,On sampling with Markov chains,RandomStructures and Algorithms9(1996),55–77.[74] F.R.K.Chung,A.Grigor’yan,and S.-T.Yau,Upper bounds for eigenvalues of the discreteand continuous Laplace operators,Advances in Mathematics117(1996),165–178.[75] F.R.K.Chung,A.Grigor’yan,and S.-T.Yau,Eigenvalues and diameters for manifolds andgraphs,Tsing Hua Lectures on Geometry and Analysis,International Press,Cambridge,MA, 1997,79–106.[76] F.R.K.Chung and C.M.Grinstead,A survey of bounds for classical Ramsey numbers,J.Graph Theory7(1983),25–37.[77] F.R.K.Chung,B.Kostant and S.Sternberg,Groups and the Buckyball,in Lie Theory andGeometry:In honor of Bertram Kostant(Eds.J.-L.Brylinski,R.Brylinski,V.Guillemin and V.Kac)PM123,Birkh¨a user,Boston,1994,97–126.[78] F.R.K.Chung and nglands,A combinatorial Laplacian with vertex weights,J.Comb.Theory(A)75(1996),316-327.[79] F.R.K.Chung and K.Oden,Weighted graph Laplacians and isoperimetric inequalities,Paciﬁc J.of Math.192(2000),257–273.[80] F.R.K.Chung and S.Sternberg,Laplacian and vibrational spectra for homogenous graphs,J.Graph Theory16(1992),605–627.204BIBLIOGRAPHY[81] F.R.K.Chung and S.Sternberg,Mathematics and the Buckyball,American Scientist81,No.1,(1993),56–71.[82] F.R.K.Chung and P.Tetali,Communication complexity and quasi-randomness,SIAM J.Discrete Math.6(1993),110–123.[83] F.R.K.Chung and P.Tetali,Isoperimetric inequalities for Cartesian products of graphs,Combinatorics,Probability and Computing,7(1998),141–148.[84] F.R.K.Chung and S.-T.Yau,A Harnack inequality for homogeneous graphs and subgraphs,Communications in Analysis and Geometry,2(1994),627–640.[85] F.R.K.Chung and S.-T.Yau,Eigenvalues of graphs and Sobolev inequalities,Combinatorics,Probability and Computing4(1995),11–26.[86] F.R.K.Chung and S.-T.Yau,Logarithmic Harnack inequalities,Mathematics ResearchLetters3(1996),793–812.[87] F.R.K.Chung and S.-T.Yau,Eigenvalue inequalities for graphs and convex subgraphs,Communications in Analysis and Geometry5(1997),575–623.[88] F.R.K.Chung and S.-T.Yau,A Harnack inequality for Dirichlet eigenvalues,J.of GraphTheory34(2000),247–257.[89] D.M.Cvetkovi´c,M.Doob and H.Sachs,Spectra of Graphs,Theory and Application,Aca-demic Press,1980,368pp.[90] D.M.Cvetkovi´c,M.Doob,I.Gutman,and A.Torga¨s ev,Recent results in the Theory ofGraph Spectra,North Holland,Amsterdam1988,xii+306pp.[91] E.B.Davies,Heat kernel bounds,conservation of probability and the Feller property,J.d’Analyse Math.58,(1992),99–119.[92]P.J.Davis,Circulant Matrices,John Wiley and Sons,New York,1979,xv+250pp.[93]P.Deligne,La conjecture de Weil I,Inst.Hautes Etudes Sci.Publ.Math43(1974),273–307.[94] E.D’Hoker and D.H.Phong,On determinants of Laplacians on Riemann surfaces,Comm.Math.Phys.104(1986),537–545.[95]J.-D.Deuschel and D.W.Stroock,Hypercontractivity and spectral gap of symmetric diﬀu-sions with applications to the stochastic Ising models,J.Funct.Anal.92(1990),30–48. [96]P.Diaconis,Group Representations in Probability and Statistics,Institute of Math.Statistics,Hayward,California,1988,vi+198pp.[97]P.Diaconis,R.L.Graham and J.Morrison,Asymptotic analysis of a random walk on ahypercube with many dimensions,Random Structures and Algorithms1(1990),51–72. [98]P.Diaconis and L.Saloﬀ-Coste,Comparison theorems for reversible Markov chains,Annalsof Applied Prob.3(1993),696–730.[99]P.Diaconis and L.Saloﬀ-Coste,Comparison techniques for random walks onﬁnite groupsAnnals of Applied Prob.4(1993),2131–2156.[100]P.Diaconis and L.Saloﬀ-Coste,An application of Harnack inequalities to random walk on nilpotent quotients,J.Fourier Anal.Appl.(1995),189–207.[101]P.Diaconis and L.Saloﬀ-Coste,Logarithmic Sobolev inequalities forﬁnite Markov chains, Ann.Appl.Prob.6(1996),695–750.[102]P.Diaconis and L.Saloﬀ-Coste,Walks on generating sets of groups,Prob.Theory Related Fields105(1996),393–421.[103]P.Diaconis and D.W.Stroock,Geometric bounds for eigenvalues of Markov chains,Annals Applied Prob.1(1991),36–61.[104]P.Diaconis and B.Sturmfels,Algebraic algorithms for sampling from conditional distribu-tions,Ann.Statist.26(1998),363–397.[105]M.J.Dinneen,M.R.Fellows and V.Faber,Algebraic constructions of eﬃcient broadcast networks,Lecture Notes in Comput.Sci.,539,Springer,Berlin,1991,152–158.[106]J.Dodziuk and L.Karp,Spectral and function theory for combinatorial Laplacians,in Geometry of Random Motion,Contemp.Math73,AMS Publication(1988),25–40. [107]M.Dyer,A.Frieze and R.Kannan,A random polynomial time algorithm for approximating the volume of convex bodies,JACM38(1991),1–17.[108]M.Eichler,Quaternary quadratic forms and the Riemann hypothesis for congruence zeta functions,Arch.Math.5(1954),355–366.[109] B.Elspas,Topological constraints on interconnection limited logic,Switching Circuit Theory and Logical Design5(1964),133–147.[110]P.Erd˝o s,Some remarks on the theory of graphs,Bull.Amer.Math.Soc.53(1947),292–294.BIBLIOGRAPHY205 [111]P.Erd˝o s,Some remarks on chromatic graphs,Colloquium Mathematicum16(1967),253-256.[112]P.Erd˝o s,S.Fajtlowicz and A.J.Hoﬀman,Maximum degree in graphs of diameter2, Networks10(1980),87–90.[113]P.Erd˝o s and A.Hajnal,On spanned subgraphs of graphs,Betrage zur Graphentheorie und deren Anwendungen,Kolloq.Oberhof(DDR)(1977),80–96.[114]P.Erd˝o s and A.R´e nyi,On a problem in the theory of graphs,Publ.Math.Inst.Hungar.Acad.Sci.7(1962),623–641.[115]P.Erd˝o s,A.R´e nyi and V.T.S´o s,On a problem of graph theory,Studia Sci.Math.Hungar.1(1966),215–235.[116]P.Erd˝o s and H.Sachs,Regul¨a re Graphen gegenebener Teillenweite mit minimaler Knoten-zahl,Wiss.Z.Univ.Halle–Wittenberg,Math.Nat.R.12(1963),251–257.[117]P.Erd˝o s and V.T.S´o s,On Ramsey-Tur´a n type theorems for hypergraphs,Combinatorica 2(1982),289–295.[118]P.Erd˝o s and J.Spencer,Imbalances in k-colorations,Networks1(1972),379–385. [119]P.Erd˝o s and J.Spencer,Probabilistic Methods in Combinatorics,Academic Press,New York(1974),106pp.[120]R.J.Faudree and M.Simonovits,On a class of degenerate extremal graph problems,Com-binatorica3(1983),83–93.[121]U.Feige,Randomized graph products,chromatic numbers,and the Lov´a szθ-function,Proc.Sym.Theo.on Computing,ACM(1995),635–640.[122]U.Feige,S.Goldwasser,L.Lov´a sz,S.Safra,and M.Szegedy,Approximating clique is al-most NP-complete,32nd Symposium on Foundations of Computer Science,IEEE Computer Society Press,(1991),2–12.[123]M.Fiedler,Algebraic connectivity of graphs,Czech.Math.J.23(98)(1973),298–305. [124]J.A.Fill,Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process,Ann.Appl.Prob.1(1991)62–87.[125]L.R.Ford and D.R.Fulkerson,Flows in Networks,Princeton Univ.Press(1962),xii+194 pp.[126]P.Frankl,A constructive lower bound for some Ramsey numbers,Ars Combinatoria3 (1977),297–302.[127]P.Frankl and V.R¨o dl,Forbidden intersections,Trans.AMS300(1987),259–286. [128]P.Frankl,V.R¨o dl and R.M.Wilson,The number of submatrices of a given type in a Hadamard matrix and related results,binatorial Th.(B)44(1988),317–328. [129]P.Frankl and R.M.Wilson,Intersection theorems with geometric consequences,Combina-torica1(1981),357–368.[130]M.L.Fredman,New bounds on the complexity of the shortest path problem,SIAM J.Computing5(1976),83–89.[131]J.Friedman,On the second eigenvalue and random walks in random d-regular graphs, Combinatorica11(1991),331–362.[132]J.Friedman and N.Pippenger,Expanding graphs contain all small trees,Combinatorica7 (1987),71–76.[133]G.Frobenius,¨Uber die Charaktere der alternierenden Gruppe,Sitzungsberichte Preuss.Akad.Wiss.(1901),303–315.[134]G.Frobenius,¨Uber Matrizen aus nicht negative Elementen,Sitzber.Akad.Wiss.Berlin (1912),456–477.[135]Z.F¨u redi and J.Koml´o s,The eigenvalues of random symmetric matrices,Combinatorica1 (1981),233–241.[136]O.Gabber and Z.Galil,Explicit construction of linear-sized superconcentrators,put.System Sci.22(1981),407–420.[137] A.Galtman,Spectral characterizations of the Lov´a sz number and Delsarte number of a graph,bin.12(2000),131–143.[138] F.R.Gantmacher,The Theory of Matrices,Vol.1,Chelsea Pub.Co.,New York(1977), x+374pp.[139]M.R.Garey and D.S.Johnson,Computers and Intractability,A Guide to the Theory of NP-Completeness,W.H.Freeman and Co.,San Francisco,1979,x+338pp.[140]M.Goemans and D.Williamson,Improved approximation algorithms for maximum cut and satisﬁability problems using semideﬁnite programming,J.ACM42(1995),1115–1145.206BIBLIOGRAPHY[141]P.Ginsbarg,Applied Conformal Field Theory,les Houches(1988),1–168.[142]W.Goddard,private communication.[143]R.L.Graham and V.R¨o dl,Numbers in Ramsey theory,Surveys in Combinatorics(1987),(C.Whitehead,ed.)London Math.Soc.Lecture Notes Series123,111–153.[144]S.W.Graham and C.Ringrose,Lower bounds for least quadratic nonresidues,Analytic Number Theory,(edited by B.Berndt,H.Halberstam,H.Diamond and A.Hildebrand), Birkh¨a user,Boston1990,264–309.[145]R.L.Graham and J.H.Spencer,A constructive solution to a tournament problem,Canad.Math.Bull.14(1971),45–48.[146] A.Grigor’yan,Integral maximum principle and its applications,Proc.of Royal Society, Edinburgh Sect.A,124(1994),353–362.[147]M.Gromov,Groups of polynomial growth and expanding maps,Publ.IHES,53(1981), 53–73.[148]L.Gross,Logarithmic Sobolev inequalities,Amer.J.Math.97(1975),1061-1083. [149]L.Gross,Logarithmic Sobolev inequalities and contractivity properties of semigroups, Springer Lecture Notes1563,(1993),54–88.[150]M.Grotschel,L.Lov´a sz and A.Schrijver,The ellipsoid method and its consequences in combinatorial optimization,Combinatorica1(1981),169–197.[151]M.Grotschel,L.Lov´a sz and A.Schrijver,Geometric Algorithms and Combinatorial Opti-mization,Springer-Verlag,Berlin,1988,xii+362pp.[152]W.Haemers,Eigenvalue methods in Packing and Covering in Combinatorics,(A.Schrijver ed.),Mathematisch Centrum,Amsterdam(1982),15–38.[153]M.M.Halldorsson,A still better performance guarantee for approximating graph coloring, Information Processing Letters45(1993),19–23.[154]L.H.Harper,Optimal numberings and isoperimetric problems on graphs,J.of Comb.Theory1(1966),385–393.[155]S.Hart,A note on the edges of the n-cube,Discrete Math.14(1976),157–163.[156]J.Hastad,Clique is hard to approximate within n1− ,37th Symposium on Foundations of Computer Science,IEEE Computer Society Press,(1996),627–636.[157]J.Haviland and A.Thomason,Pseudo-random hypergraphs,Discrete Math.75(1989), 255–278.[158]W.Hebisch and L.Saloﬀ-Coste,Gaussian estimates for Markov chains and random walks on groups,Ann.Probab.21(1993),673–709.[159] A.J.Hoﬀman,Eigenvalues of graphs,in Studies in Graph Theory II(D.R.Fulkerson,ed.), M.A.A.Studies in Math,Washington D.C.,(1975),225–245.[160] A.J.Hoﬀman and R.R.Singleton,On Moore graphs with diameter2and3,IBM J.of Res.Development4(1960),497–504.[161]R.A.Horn and C.R.Johnson,Matrix Analysis,Cambridge University Press,New York, 1985,xii+561pp.[162]J.Igusa,Fibre systems of Jacobian varieties III,American J.of Math.81(1959),453–476. [163]W.Imrich,Explicit construction of regular graphs without small cycles,Combinatorica4 (1984),53–59.[164]K.Ireland and M.Rosen,A Classical Introduction to Modern Number Theory,Springer-Verlag,New York1982,xiii+341pp.[165]M.Jerrum and A.J.Sinclair,Approximating the permanent,SIAM puting18(1989), 1149–1178.[166] F.Juh´a sz,On the spectrum of a random graph,Colloq.Math.Soc.J´a nos Bolyai25,Alge-braic Methods in Graphs Theory,Szeged(1978),313–316.[167]N.Kahale,Isoperimetric inequalities and eigenvalues,SIAM J.Discrete Math.,10(1997), 30–40.[168] D.Karger,R.Motwani,and M.Sudan,Approximate graph coloring by semi-deﬁnite pro-gramming,35th Symposium on Foundations of Computer science,IEEE Computer Society Press,1994,2–13.[169] B.S.Kashin and S.V.Konyagin,On systems of vectors in a Hilbert space,Trudy Mat.Inst.imeni V.A.Steklova157(1981)64–67.English translation in Proceedings of the Steklow Institute of Math.AMS(1983),67–70.[170]G.O.H.Katona,A theorem ofﬁnite sets,Theory of Graphs,Proc.Colloq.Tihany,Academic Press,New York,1966,187–207.BIBLIOGRAPHY207 [171]N.M.Katz,An estimate for character sums,J.Amer.Math.Soc.2(1989),197–200. [172] F.Kirchhoﬀ,¨Uber die Auﬂ¨o sung der Gleichungen,auf welche man bei der Untersuchung der linearen Verteilung galvanischer Str¨o me gef¨u hrt wird,Ann.Phys.chem.72(1847),497–508 [173]M.Klawe,Non-existence of one-dimensional expanding graphs,22nd Symposium on Foun-dations of Computer Science,IEEE Computer Society Press,(1981),109–113.[174] D.E.Knuth,The Art of Computer Programming,vol.3,Addison-Wesley,Reading,MA, 1973,xi+722pp.[175] D.E.Knuth,The sandwich theorem,Electronic J.of Combinatorics1(1994),A1,48pp. [176]J.B.Kruskal,The number of simplices in a complex,Mathematical Optimization Tech-niques,(ed.R.Bellman),University of California Press,Berkeley,1963,251–78,.[177]nglands and Y.Saint-Aubin,Algebro-geometric aspects of the Bethe equations, Strings and Symmetries,Lecture Notes in Phys.447,Springer,Berlin,1995,40–53. [178] zebnik,timenko and A.J.Woldar,A new series of dense graphs of high girth, Bull.Amer.Math.Soc.32(1995),73–79.[179] F.T.Leighton,Introduction to Parallel Algorithms and Architectures:Arrays,Trees,Hy-percubes,Morgan-Kauﬀman,San Mateo,CA,1992,xx+831pp.[180] F.T.Leighton and S.Rao,An approximate max-ﬂow min-cut theorem for uniform multi-commodityﬂow problem with applications to approximation algorithms,29th Symposium on Foundations of Computer Science,IEEE Computer Society Press,(1988),422–431. [181]T.Lengauer and R.E.Tarjan,Asymptotically tight bounds on time-space trade-oﬀs in a pebble game,put.Mach.29(1982),1087–1130.[182]H.Lenstra,personal communication.[183]G.Lev,Size bounds and parallel algorithms for networks,Ph.D.Thesis,Department of Computer Science,University of Edinburg.[184]W.Li,Character sums and abelian Ramanujan graphs,J.Number Theory41(1992),199–217.[185]P.Li and S.-T.Yau,On the parabolic kernel of the Schr¨o dinger operator,Acta Mathematica 156(1986),153–201.[186]P.Li and S.-T.Yau,Estimates of eigenvalues of a compact Riemannian manifold,Amer.Math.Soc.Proc.Symp.Pure Math.36(1980),205–239.[187]J.H.Lindsey,Assignment of numbers to vertices,Amer.Math.Monthly71(1964),508–516. [188]R.J.Lipton and R.E.Tarjan,A separator theorem for planar graphs,SIAM J.Appl.Math.36(1979),177–189.[189]L.Lov´a sz,On the Shannon capacity of a graph,Transactions on Information Theory-IT-25,IEEE Computer Society Press,(1979),1–7.[190]L.Lov´a sz,Perfect graphs,in Selected Topics in Graph Theory2,(eds R.L.Wilson and L.W.Beineke),Academic Press,New York(1983),55–87.[191]L.Lov´a sz and M.Simonovits,Random walks in a convex body and an improved volume algorithm,Random Structures and Algorithms4(1993),359–412.[192] A.Lubotzky,R.Phillips and P.Sarnak,Ramanujan graphs,Combinatorica8(1988),261–277.[193]G.A.Margulis,Explicit constructions of concentrators,Problemy Peredaci Informacii9 (1973),71–80(English transl.in Problems Inform.Transmission9(1975),325–332. [194]G.A.Margulis,Arithmetic groups and graphs without short cycles,6th Internat.Symp.on Information Theory,Tashkent(1984)Abstracts1,123–125(in Russian).[195]G.A.Margulis,Some new constructions of low-density parity check codes,3rd Internat.Seminar on Information Theory,convolution codes and multi-user communication,Sochi (1987),275–279(in Russian).[196]G.A.Margulis,Explicit group theoretic constructions of combinatorial schemes and their applications for the construction of expanders and concentrators,Problemy Peredaci Infor-macii(1988)(in Russian).[197]J.C.Maxwell,A Treatise on Electricity and Magnetism I,Oxford,Clarendon Press(1892), 403–410.[198]R.J.McEliece, E.R.Rodemich,and H.C.Rumsey,Jr.,The Lov´a sz bound and some generalizations,binatorics,Inform.Syst.Sci.3(1978),134–152.[199] B.Mohar,Isoperimetric number of graphs,J.of Comb.Theory(B)47(1989),274–291. [200]H.L.Montgomery,Topics in Multiplicative Number Theory,Lecture Notes in Math.227, Springer-Verlag,New York(1971),ix+178pp.208BIBLIOGRAPHY[201]J.W.Moon,Counting Labelled Trees,Canadian Mathematical Monographs,Canadian Mathematical Congress,Montreal,1970,x+113pp.[202]Z.Nagy,A constructive estimate of the Ramsey number,pok23(1975),301–302. [203] A.Nilli,On the second eigenvalue of a graph,Discrete Math.91(1991),207–210.[204] B.Osgood,R.Phillips,and P.Sarnak,Extremals of determinants of Laplacians,J.Funct.Anal.80(1988),148-211.[205] B.Osgood,R.Phillips,and P.Sarnak,Moduli space,heights and isospectral sets of plane domains,Ann.Math.129(1989),293–362.[206] E.M.Palmer,Graphical Evolution,John Wiley and Sons,New York(1985),xvii+177pp. [207]W.J.Paul,R.E.Tarjan and J.R.Celoni,Space bounds for a game on graphs,Math.Systems.Theory10(1977),239–251.[208]O.Perron,Zur Theorie der Matrizen,Math.Ann.64(1907),248–263.[209]M.Pinsker,On the complexity of a concentrator,7th Internat.Teletraﬃc Conf.,Stockholm, June1973,318/1–318/4.[210]N.Pippenger,Superconcentrators,SIAM put.6(1977),298–304.[211]N.Pippenger,Advances in pebbling,Internat.Collo.On Automation Languages and Pro-gramming9(1982),407–417.[212]G.Poly´a and S.Szeg¨o,Isoperimetric Inequalities in Mathematical Physics,Annals of Math.Studies,no.27,Princeton University Press,(1951),xvi+279pp.[213] A.M.Polyakov,Quantum geometry of bosonic strings,Phys.Letters B103(1981),207–210. [214]S.Ramanujan,On certain arithmetical functions,Trans.Cambridge Philos.Soc.22(9) (1916),159–184.[215] F.P.Ramsey,On a problem of formal logic,Proc.London Math.Soc.30(1930),264–286. [216] D.Ray and I.M.Singer,Analytic torsion for complex manifolds,Ann.Math.98(1973), 154–177.[217] D.K.Ray-Chaudhuri and R.M.Wilson,On t-designs,Osaka J.Math.12(1975),737–744. [218] A.R´e nyi,On the enumeration of trees,in Combinatorial Structures and their Applications (R.Guy,H.Hanani,N.Sauer,and J.Schonheim,eds.)Gordon and Breach,New York(1970), 355–360.[219]V.R¨o dl,On universality of graphs with uniformly distributed edges,Discrete Math.59 (1986),125–134.[220]P.Sarnak,Determinants of Laplacians,Comm.Math.Phys.110(1987),113–120. [221]P.Sarnak,Some Applications of Modular Forms,Cambridge University Press,Cambridge, 1990,x+111pp.[222] A.Schrijver,A comparison of the Delsarte and Lov´a sz bounds,IEEE Transactions on Information Theory IT-25,(1979),425–429.[223]J.J.Seidel,Graphs and their spectra,Combinatorics and Graph Theory,PWN-Polish Scientiﬁc Publishers,Warsaw(1989),147–162.[224]R.Seidel,On the all-pairs-shortest-path problem,Proc.Sym.Theo.on Computing,ACM (1992),745–749.[225]M.Simonovits and V.T.S´o s,Szemer´e di’s partition and quasirandomness,Random Struc-tures and Algorithms2(1991),1–10.[226] A.J.Sinclair,Algorithms for Random Generation and Counting,Birkhauser,Boston,1993, vi+146pp.[227] A.J.Sinclair and M.R.Jerrum,Approximate counting,uniform generation,and rapidly mixing markov chains,Information and Computation82(1989),93–133.[228]R.Singleton,On minimal graphs of maximum even girth,bin.Theory1(1966), 306–332.[229]J.Spencer,Optimal ranking of tournaments,Networks1(1971),135–138.[230]J.Spencer,Ramsey’s theorem–A new lower bound,binatorial Theory18(1975), 108–115.[231]S.Sternberg,Group Theory in Physics,Cambridge University Press,Cambridge,1994.xiv+429pp.[232] D.W.Stroock,Logarithmic Sobolev inequalities for Gibbs states,Springer Lecture Notes 1563,Berlin,(1993),194–228.[233]J.J.Sylvester,On the change of systems of independent variables,Quarterly Journal of Mathematics1(1857),42–56.Collected Mathematical Papers,Cambridge2(1908),65–85.。

整理蛛形学常用名称

精品文档蛛形学常用名称abdomen-腹部Acari --蜱螨目aciniform gland—葡萄腺Actinopodidae -线足蛛科Adams consen sus cladograms Adams-合意支序图Addition sequences —加入顺序Additions —加入Additive character transformation series —加性性状演变系列Additive coding —加性编码Adult stages一成体状态Advancement index -进步系数Agele ni dae—漏斗蛛科aggregate gland— ** 月泉Alternative cladograms —^候选支序图alveolus 一生殖腔窝Amaurobiidae 一暗蛛科Amaurobioidea --暗蛛总科Ambiguous components 一含糊的组份Amblypygi 一无鞭目Ammoxenidae —沙蛛科Amphi nectidae —菲蛛科ampullata gla nd一壶状腺An unequivocal relationship of generality —普遍性的明确性关系anal tubercle 一肛丘Analysis of distance data 一距离数据分析法Anapidae一安蛛科Ancestral characters —^祖先性状Ancestral taxon 一^祖先分类单元Annotated Linnaean hierarchy —注释林奈等级系统an terior eye row 一前眼歹U anterior laterior eye—前侧眼an terior media n eye—前中眼Antrodiaetidae 一穴蛛科Anyphaenidae —近管蛛科Apomorphies apomorphic characters-近裔性状Arachnida 一蛛形纲Araneae 一蜘蛛目Araneidae 一园蛛科Araneoclada --蜘蛛分支类精品文档精品文档Araneomorphae --新蛛下目Archaeidae -古蛛科Area cladogram—区域支序图Argiopidae --金蛛科（现已并入园蛛科）Arrangement 一安排Artificial classification 一人为分类Artificial taxon 一人为分类单元Atypidae 一地蛛科Atypoidina --地蛛总科Austrochilidae —南蛛科Autopomorphies 一自近裔性状Auxiliary principle—Hennig 的辅助原则Barychelidae —螯耙蛛科Binary character transformation series 一二性状演变系列Binary coding 一二态编码方法Biogenetic law —生物发生律Biotic hierarchy —生物等级系统book lung—书月市Branch swapping 一分支互换Branch-and-bound searches 一分板界限搜索Branching character transformation series 一分支性状演变系列calamistru m一栉Camin-sokal parsim ony - Cami n-sokal 简约法Cap o niidae-开普蛛科carapace-背甲cardiac mark 一心斑Category 一阶元cephalic一头cephalothorax 头胸部cervical groove—颈沟Character code ―性状编码Character data matrix ―性状数据矩阵Character optimization ―性状优化Character polarity —‘确定性状极向’Character polarization —性状极化Character state tree —性状状态树Character state —性状状态Character transformation series —性状演变系列Character tree —性状树Character weighting —性状权衡Characte—性状chelicera一螯肢Chummidae ?Cithaeronidae 一琴蛛科Clade 一分支Cladistics 一支序系统学Cladoge nesis —分支发生Cladogram bisecti on and rec onn ecti on 一二分支序图与重联精品文档Cladogram comparis ons 一支序图的比较Cladogram len gth 一支序图的长度Cladogram measures 一支序图的量度Cladogram rearra ngeme nt 一支序图的重新安排Cladogram 一支序图Cladograms buildi ng —建立支序图Classificati on —分类clavate—棒状claw tuft 一毛簇Clique analysis 一集群分析Clubi on idae 一管巢蛛科clypeus—额colulus —舌状体Commonality 一内群共同性’Com mon ality —共同;性Com mon-Is-Primitive: Twice Character Freque ncy An alysis共同即为原始:次性状分布频率分析Compatibility method 一相容法Comp onent an alysis 一组份分析Comp onents 一组份con ductor一弓1导器Consen sus an alysis —合意分析Consen sus cladogram tech niq ues 一合意支序图技术Consen sus cladogram 一合意支序图Consen sus tree tech niq ues -合意树技术Consen sus tree 一合意树Con ti nu ous characters 一连续性状Conv erge nt similarity —趋同相似性Corinni dae —圆颚蛛科coxal gla nd一基节腺coxa —基节Crassitarsae--厚跑类Cribellatae —筛器类cribellum gland一筛腺cribellum —筛器Cte ni dae —栉足蛛科Cte nizidae—蝗当科*回复收起回复・3楼• 2012-01-28 12:09« keensunl•皇帝巴布10cuspule-疣突Cyatholipidae 一杯蛛科 Cybaeidae-并齿蛛科 cybium -跗舟 Cyclocte nidae —圆栉蛛科 Cyrtauche niidae-弓蛛科 Dei nopidae —妖面蛛科 Dei nopoidea —妖面蛛总科Deletio ns —删除Derivatives of classification tree 一分类树的衍生物Derived characters-衍生性状Desce ndant characters-后裔性状Desidae-潮蛛科 diaxial 一横螯Dicty ni dae —卷叶蛛科Dicty noidea --卷叶蛛总科Differenee matrix -差值矩阵Differe nee -差异Differential weighting —差值权衡Diguetidae —迪格蛛科Dio nycha --两爪类Dipluridae —长尾蛛科Discrete characters 一离散性状 distal haematodocha-端血囊 Dollo parsimony — Dollo 简约法Drymusidae 一丛蛛科Dysderidae 一石蛛科Ecribellatae --无筛器类Orthog natha --直螯类 embolus 一插入器«我也说一句en dite—颚叶en doskeletor—内骨骼en doster nite-内胸板Ensemble consistency indices 一总体一致性指数Entelegynae --复杂生殖器类epigastric furrow 一生殖沟Epigenetic characters —后生性状epig yne, epig ynum —^外雌器Equally parsimonious cladograms 一同等简约支序图Equally parsimonious trees 一同等简约树Eresidae —隆头蛛科Eresoidea --隆头蛛总科Evolutionary systematics—进化系统学Exact algorithms一精确算法Exhaustive searches-彻底搜索exoskelet on—外骨骼False components-错误组分fang一螯牙femur 一腿节fertilization duct —受精囊Filistatidae一管网蛛科Fitch parsimony —Fitch 简约法folium 一叶状斑Fornicephalae --拱头蛛类fovea 一中窝F-statistic—f-统计量Functional ingroup —‘功能内群’Functional outgroup 一功能外群F-value—f-值Gallieniellidae —加利蛛科Generalised parsimony 一般化简约法Generality —普遍性Generalized direct method 一般化的直接方法Global optimum 一全局优化Globally parsimonious analysis 一全局简约分析Gnaphosidae 一平腹蛛科Gn aphosoidea-平腹蛛总科Gradungulidae 一格拉蛛科Group compatibility algorithm 一类群一致算法Hah niidae 一栅蛛科Haplogynae --简单生殖器类Henn ig argume ntation— Henn ig 论证Hersiliidae 一长纺蛛科Heuristic searches —启发式搜索Hexathelidae 一异纺蛛科Hierarchy 一等级系统’Holarchaeidae 一全古蛛科精品文档精品文档Homal ony chidae 一无齿蛛科Homologous characters or homologue 一同源性状Homology 一同源Homoplastic character 一非同源性状Homoplasy excess ratio HER 一非同源过量比率Homoplasy -三日同源Hutto ni idae 一胡通蛛科Hypochilidae —古筛蛛科Idiopidae 一异蛛科Inclusion/exclusion rule —包括用卡除原则Index of consistency c ——致性原贝Uinferior claw一下爪In group an alysis 一内群分析In group node 一内群节点In group —内群Internal branch 一内部分支Internode —节点间隔In tersectio n —交集In terval, INT 一间隔In terve ning character 一中介性状labium 一下唇Lamp oni dae 一灯蛛科lateral codyle 一I侧结节Lept on etidae —弱蛛科Lin ear character tran sformati on series 一线性性状演变系列•回复收起回复*4楼•2012-01-28 12:10*我也说一句精品文档« keensunl•皇帝巴布10Linear codi ng 一线性编码Linn aea n hierarchy —林奈等级系统Lin yphiidae 一皿蛛科Liocra nidae 一光盔蛛科Liphistiidae 一节板蛛科Listing convention 一排列约定lobed gla nd一叶状腺Local optimum 一局部优化Logical consistency —逻辑上一致Lycosidae —狼蛛科Lycosoidea --狼蛛总科Majority consensus cladograms 一多数合意支序图Majority consensus tree 一多数合意树Majority rule —多数规则Malkaridae —马尔卡蛛科malphighia n tube 一马氏管Man hattan dista nee —曼哈顿距离maxilla 一小颚Maximum likelihood 一最大相似法Mecicobothriidae 一墨穴蛛科Mecysmauche niidae-展颈蛛科media n apophysis—中突media n ocular area—中眼域Mesothelae --中纺亚目metatarsus-后跑节Micropholcommatidae 一小幽蛛科Microstigmatidae 一小点蛛科Migidae 一四纺蛛科Mimetidae —^拟态蛛科Miturgidae 一米图蛛科Mixed coding 一混合编码Monophylitic group 一单系群精品文档Monophyly -单系Monotypic genus -单型属Most parsimonious reconstruction sets, MPR sets -最简约的重建集，MPR 集Multicharacter transformation series -多性状演变系列Multistate character transformation series -多态性状演变系列Mygalomorphae --原蛛下目Mysmenidae -密蛛科Natural classification -自然分类Nearest neighbor -最近邻居Nearest neighborur interchange-最近邻居互换Neighborhood —邻居关系Nels on consen sus cladogram Nels on—合意支序图Nels on consen sus tree Nels on -合意树Nemesiidae-线蛛科Neocribellatae --新筛蛛类Nephilidae—络新妇科n ephrocyte-肾原细胞Nesticidae一类球蛛科Nicodamidae一尼可蛛科Node一节点Nonadditive character transformation series 一非加性性状演变系列Nonredundant linear coding 一非冗余线性编码Numerical prefix scheme-数字前缀系统Numerical tax onomy-数值分类学Ochyroceratidae-花洞蛛科ocular area-眼域Oecobiidae-拟壁钱科Ontogenetic transformation 一个体发育变化Ontogeny method 一‘个体发育方法’On toge ny一个体发育Oon opidae-卵形蛛科Opiliones --盲蛛目opistosoma-后体Opsithothelae --后纺亚目Optimal cladogram -'优化支序图Optimization of cladograms 一支序图的优化Orbiculariae --圆网类精品文档精品文档Orbiculariae --圆网蛛类Orsolobidae—激蛛科Outgroup analysis -^外群分析Outgroup node —^外群节点Outgroup —^外群Oxyopidae —猫蛛科palpal organ一触肢器Palpigradi --须脚目Palpimanidae -二纺蛛科Palpimanoidea --二纺蛛总科palp 一触肢paracybium 一副跑舟Parallelism 一平行Par叩hyletic group 一并系群Paraphyly 一并系Pararchaeida一^拟古蛛科Paratropididae-鳞毛蛛科paraxial—直螯Parsim ony—简约法patella 一膝节Patristic dista nee matrix —路径长度矩阵Patter ns of evoluti onary relatio nships一进化关系格局paturo n—螯基pedicel-腹柄pedipalp一触肢Periegopidae ?petiolus—腹柄Phenetic differences 一表型差异Phenetics —表征分类学和表型分类学Philodromidae 一逍遥蛛科Pholcidae—幽灵蛛科Phylogenetic systematics 一系统发育系统学Phylogenetic tree一系统发育树Phyxelididae —菲克蛛科Pimoidae 一派模蛛科Pisauridae-盗蛛科Plectreuridae 一距蛛科Plesiomorphic character —近祖性状Plesiomorphic —近祖的plumose 一羽状pois on gla nd—毒腺Polymorphic taxa 一多态分类单元Polymorphism parsimony 一多态现象简约法Polyphyletic group 一多系群Polyphyly 一多系Polytomy bush 一多分支丛林porti on 一头posterior eye row —后眼歹Uposterior laterior eye—后侧眼posterior media n eye—后中眼Posteriori character weighting 一事后性状权衡Postorder traversal —后根次序遍历Predo minant character method—优势性状原贝UPreorder traversal 一先根次序遍历精品文档精品文档Priori charavter weighting 一事先性状遍历procurved—前凹Prodidomidae 一粗螯蛛科prolateral surface—足的前侧面promarg inal tooth —前堤齿prosoma —前体proximal haematodocha-基血囊Psechridae-褛网蛛科Pseudoscorpi ones--伪蝎目pyriform gland—梨状腺Qualitative characters-定性性状Quan titative characters 一定量性状radial furrow—放射沟rake一螯耙Rastelloidi na --耙蛛总科rastellum —螯耙rebordered-下唇前缘膨大而加厚Recon struct ing phyloge nies 一重建系统发育Recursive swapp ing —递推式分支互换recurved —后凹Refere nee tax on ―参考分类单元Reformulated bioge netic law 一重新阐述的生物发生定律Relati on ships 一关系Rescaled consistency index re 一重新标尺的一致性指数Retention index r 一保留指数Reticulate speciation 一网状物种形成retrolateral surface-足的后侧面retromargi nal tooth —后堤齿Reversal —逆转Reversibility 一逆转性Ricinulei --节腹目Root cladogram—根支序图Root node 一■根节点Root taxon一■根分类单元*回复收起回复*5楼*2012-01-28 12:10«我也说一句•皇帝巴布« keensunl10精品文档精品文档Root tree —根树Root —根RTA clade - RTA 分支类Salticidae—跳蛛科scapus-垂体Schizomida --裂盾目scopula-毛丛Scorpi onida --蝎目Scytodidae-花皮蛛科Segestriidae-类石蛛科Sele nopidae-拟扁蛛科Sen oculidae-六眼蛛科Seque ncing conven tio n 一顺序约定serrated bristle-锯齿毛serrated-锯齿状serrula—微齿Sicariidae 一刺客蛛科sigilla 一肌斑silk gland—丝腺Sin opimoidae —华模蛛科Sister group 一姐妹群Solifugae --避日目Sparassida一巨蟹蛛科spatulate-抹刀状Species category 一种级阶元Species-物种spermatheca-Z 内精囊spigot一纺管spinn eret一纺器squmiform —鳞状Stem species -基干种Stenochilidae -斯坦蛛科Steps -步数Strict comesnsustree -严格合意树Strict consensuscladogram -严格合意支序图stridulating ridge 一发声嵴Subcladogram pruning and regrafting -亚支序图的修剪与嫁接Suboptimal chadograms —亚优化支序图Substitutions 一^替代Subtree pruning and regrafting —亚树修剪与嫁接Successive approximations weighting —连续近似权衡superior claw 一上爪Symbiotic origin 一共生起源的Symphytognathidae 一愈螯蛛科Symplesiomorphous similarity 一共近祖性状相似性Symplesiomorphy or shared ancestral characters 一共近祖性状Synaphridae —? Synapomorphies or shared derived characters 一共近裔性状Sy notaxidae-合蛛科Systematics of biosystematics 一生物系统学tarsa orga n一时节器tarsus-跑节Taxon 一分类单元Telemidae—泰莱蛛科精品文档精品文档Ten gellidae—廷盖蛛科Termi nal branch—终端分支Terminal taxon —终端分类单元Tetrablemmidae—盔蛛科Tetragnathidae 一肖虫肖科Tetrapneumonae --四肺类The alternating out-group rule 一交替外群规则The first doublet rule —第一成对规则The most recent common ancestor —最近共同祖先The smallest closed set 一最小性状闭集Thelyphonida --有鞭目Ther叩hosidae 一狒蛛科Ther叩hosoidae-捕鸟蛛总科Theridiidae一球蛛科Theridiosomatidae 一球体蛛科Thomisidae一蟹蛛科thorax 一胸tibia —月胫节Titanoecidae 一隐石蛛科Topological tree structure -拓扑树结构trachea 气管Transformed cladistics -改造支序系统学Tree bisection and reconnection -二分树与重联Tree rearrangement -树的重新安排Trees building -建树trichobothrium 一听毛Trionycha 一三爪类Trochanteriidae 转蛛科trocha nter—转节True components —真实组份Tuberculotae --眼丘蛛类tubular gla nd一管状腺Uloboridae 一妩蛛科Vector of character ―性状矢量Venn diagram 一套嵌框图Wagner algorithm - Wagner 算法Wagner cladogram - Wagner 支序图Wagner parsim ony - Wagner 简约法Wagner tree - Wagner 树Young stages 一幼体状态Zodariidae 一^拟平腹蛛科Zoridae —狼栉蛛科Zorocratidae ?Zoropsidae 一逸蛛科精品文档。

复杂网络中聚类算法总结

复杂⽹络中聚类算法总结⽹络，数学上称为图，最早研究始于1736年欧拉的哥尼斯堡七桥问题，但是之后关于图的研究发展缓慢，直到1936年，才有了第⼀本关于图论研究的著作。

20世纪60年代，两位匈⽛利数学家Erdos和Renyi建⽴了随机图理论，被公认为是在数学上开创了复杂⽹络理论的系统性研究。

之后的40年⾥，⼈们⼀直讲随机图理论作为复杂⽹络研究的基本理论。

然⽽，绝⼤多数的实际⽹络并不是完全随机的。

1998年，Watts及其导师Strogatz在Nature上的⽂章《Collective Dynamics of Small-world Networks》揭⽰了复杂⽹络的⼩世界性质。

随后，1999年，Barabasi及其博⼠⽣Albert在Science上的⽂章《Emergence of Scaling in Random Networks》⼜揭⽰了复杂⽹络的⽆标度性质(度分布为幂律分布)，从此开启了复杂⽹络研究的新纪元。

随着研究的深⼊，越来越多关于复杂⽹络的性质被发掘出来，其中很重要的⼀项研究是2002年Girvan和Newman在PNAS上的⼀篇⽂章《Community structure in social and biological networks》，指出复杂⽹络中普遍存在着聚类特性，每⼀个类称之为⼀个社团(community)，并提出了⼀个发现这些社团的算法。

从此，热门对复杂⽹络中的社团发现问题进⾏了⼤量研究，产⽣了⼤量的算法，本⽂试图简单整理⼀下复杂⽹络中聚类算法，希望对希望快速了解这⼀部分的⼈有所帮助。

本⽂中所谓的社团跟通常我们将的聚类算法中类(cluster)的概念是⼀致的。

0. 预备知识为了本⽂的完整性，我们⾸先给出⼀些基本概念。

⼀个图通常表⽰为G=(V,E)，其中V表⽰点集合，E表⽰边集合，通常我们⽤n表⽰图的节点数，m表⽰边数。

⼀个图中，与⼀个点的相关联的边的数量称为该点的度。

第七讲卡诺图

11
000 010 110 011 111
m0 m2 m6 m3 m7
卡诺图
卡诺图是真值表的另一种形式，可以帮助是图形化相邻
– 有利于使用结合定理
A 0 1
0 1
A 0 1 0
2 3
B 0 1 0 1
F 1 0 1 0
B
1 1 0
0 0 1 1
12
2和3个变量的卡诺图
A A B ABC ABC ABC ABC
价廉物美的计算设备人的计算方法计算方法描述输入输出行为数据编码真值表优化的组合电路和之积pos积之和sop布尔函数开关电路逻辑门电路逻辑门的实现逻辑映射综合分析应用问题文字规范说明开关电路andornandorandnor其它andorinv功能描述开关函数真值表时序图其它soppos最小项范式最大项范式开关函数化简
23
例3的卡诺图
f(A,B,C,D) = m(1,2,4,6,9)
Step 2 AB CD
0
A 01 11
12 8
00
4
10 Step 1
00
1 5
1
13 9
01
3
1
7 15 11
1 D
11 C 10
2 6 14 10
1
1 B
Step 3
24
卡诺图化简指导原则
n变量卡诺图中的每个单元都有n个逻辑相邻的单元。单元可以被合并为大小为 2,4,8,…,2k 的组。每个被合并组中包含的所有单元都对一些变量有相同的值。合并尽可能多的单元，这将导致组所对应的项内字母的个数最少（扇入）。尽可能用最少的组覆盖所有的最小项，这将导致结果中包含最少的积项（门数和最后一级或门的扇入）。应该从最“孤立”单元开始。

MIT麻省理工学院算法导论公开课 Problem Set 4 solution

Introduction to Algorithms October 29, 2005 Massachusetts Institute of Technology 6.046J/18.410J Professors Erik D. Demaine and Charles E. Leiserson Handout 18Problem Set 4 SolutionsProblem 4-1. TreapsIf we insert a set of n items into a binary search tree using T REE-I NSERT, the resulting tree may be horribly unbalanced. As we saw in class, however, we expect randomly built binary search trees to be balanced. (Precisely, a randomly built binary search tree has expected height O(lg n).) Therefore, if we want to build an expected balanced tree for a ﬁxed set of items, we could randomly permute the items and then insert them in that order into the tree.What if we do not have all the items at once? If we receive the items one at a time, can we still randomly build a binary search tree out of them?We will examine a data structure that answers this question in the afﬁrmative. A treap is a binary search tree with a modiﬁed way of ordering the nodes. Figure 1 shows an example of a treap. As usual, each item x in the tree has a key key[x]. In addition, we assign priority[x], which is a random number chosen independently for each x. We assume that all priorities are distinct and also that all keys are distinct. The nodes of the treap are ordered so that (1) the keys obey the binary-search-tree property and (2) the priorities obey the min-heap order property. In other words,•if v is a left child of u, then key[v]<key[u];•if v is a right child of u, then key[v]>key[u]; and•if v is a child of u, then priority(v)>priority(u).(This combination of properties is why the tree is called a “treap”: it has features of both a binary search tree and a heap.)Figure 1: A treap. Each node x is labeled with key[x]:p riority[x]. For example, the root has key G and priority 4.It helps to think of treaps in the following way. Suppose that we insert nodes x1,x2,...,x n, each with an associated key, into a treap in arbitrary order. Then the resulting treap is the tree that wouldhave been formed if the nodes had been inserted into a normal binary search tree in the order given by their (randomly chosen) priorities. In other words, priority[x i]<priority[x j]means that x i is effectively inserted before x j.(a) Given a set of nodes x1,x2,...,x n with keys and priorities all distinct, show that thereis a unique treap with these nodes.Solution:Prove by induction on the number of nodes in the tree. The base case is a tree withzero nodes, which is trivially unique. Assume for induction that treaps with k−1orfewer nodes are unique. We prove that a treap with k nodes is unique. In this treap, theitem x with minimum priority must be at the root. The left subtree has items with keys<key[x]and the right subtree has items with keys >key[x]. This uniquely deﬁnes theroot and both subtrees of the root. Each subtree is a treap of size ≤k−1, so they areunique by induction.Alternatively, one can also prove this by considering a treap in which nodes are inserted in order of their priority. Assume for induction that the treap with the k−1nodes with smallest priority is unique. For k=0t his is trivially true. Now considerthe treap with the k nodes with smallest priority. Since we know that the structureof a treap is the same as the structure of a binary search tree in which the keys areinserted in increasing priority order, the treap with the k nodes with smallest priorityis the same as the treap with the k−1nodes with smallest priority after inserting thek-th node. Since BST insert is a deterministic algorithm, there is only one place thek-th node could be inserted. Therefore the treap with k nodes is also unique, provingthe inductive hypothesis.(b) Show that the expected height of a treap is O(lg n), and hence the expected time tosearch for a value in the treap is O(lg n).Solution: The idea is to realize that a treap on n nodes is equivalent to a randomlybuilt binary search tree on n nodes. Recall that assigning priorities to nodes as theyare inserted into the treap is the same as inserting the n nodes in the increasing orderdeﬁned by their priorities. So if we assign the priorities randomly, we get a randomorder of n priorities, which is the same as a random permutation of the n inputs, sowe can view this as inserting the n items in random order.The time to search for an item is O(h)where h is the height of the tree. As we saw inlecture, E[h]=O(lg n). (The expectation is taken over permutations of the n nodes,i.e., the random choices of the priorities.)Let us see how to insert a new node x into an existing treap. The ﬁrst thing we do is assign x a random priority priority[x]. Then we call the insertion algorithm, which we call T REAP-I NSERT, whose operation is illustrated in Figure 2.(e) (f)Figure 2: Operation of T REAP-I NSERT. As in Figure 1, each node x is labeled with key[x]: priority[x]. (a) Original treap prior to insertion. (b) The treap after inserting a node with key C and priority 25. (c)–(d) Intermediate stages when inserting a node with key D and priority 9.(e) The treap after insertion of parts (c) and (d) is done. (f) The treap after inserting a node with key F and priority 2.(c) Explain how T REAP-I NSERT works. Explain the idea in English and give pseudocode.(Hint: Execute the usual binary search tree insert and then perform rotations to restorethe min-heap order property.)Solution: The hint gives the idea: do the usual binary search tree insert and thenperform rotations to restore the min-heap order property.T REAP-I NSERT (T,x)inserts x into the treap T(by modifying T). It requires that xhas deﬁned key and priority values. We have used the subroutines T REE-I NSERT,R IGHT-R OTATE, and R IGHT-R OTATE as deﬁned in CLRS.T REAP-I NSERT (T,x)1T REE-I NSERT (T,x)2 while x =root[T]and priority[x]<priority[p[x]]3 do if x=left[p[x]]4 then R IGHT-R OTATE (T,p[x])5 else L EFT-R OTATE (T,p[x])Note that parent pointers simplify the code but are not necessary. Since we only needto know the parent of each node on the path from the root to x(after the call toT REE-I NSERT), we can keep track of these ourselves.(d) Show that the expected running time of T REAP-I NSERT is O(lg n). Solution:T REAP-I NSERT ﬁrst inserts an item in the tree using the normal binary search treeinsert and then performs a number of rotations to restore the min-heap property.The normal binary-search-tree insertion algorithm T REE-I NSERT always places thenew item at a new leaf of tree. Therefore the time to insert an item into a treap isproportional to the height of a randomly built binary search tree, which as we saw inlecture is O(lg n)in expectation.The maximum number of rotations occurs when the new item receives a priority lessthan all priorities in the tree. In this case it needs to be rotated from a leaf to the root.An upper bound on the number of rotations is therefore the height of a randomly builtbinary search tree, which is O(lg n)in expectation. (We will see that this is a fairlyloose bound.) Because each rotation take constant time, the expected time to rotate isO(lg n).Thus the expected running time of T REAP-I NSERT is O(lg n+lg n)=O(lg n).T REAP-I NSERT performs a search and then a sequence of rotations. Although searching and rotating have the same asymptotic running time, they have different costs in practice. A search reads information from the treap without modifying it, while a rotation changes parent and child pointers within the treap. On most computers, read operations are much faster than write operations. Thus we would like T REAP-I NSERT to perform few rotations. We will show that the expected number of rotations performed is bounded by a constant (in fact, less than 2)!(a) (b)Figure 3: Spines of a binary search tree. The left spine is shaded in (a), and the right spine is shaded in (b).In order to show this property, we need some deﬁnitions, illustrated in Figure 3. The left spine of a binary search tree T is the path which runs from the root to the item with the smallest key. In other words, the left spine is the maximal path from the root that consists only of left edges. Symmetrically, the right spine of T is the maximal path from the root consisting only of right edges. The length of a spine is the number of nodes it contains.(e) Consider the treap T immediately after x is inserted using T REAP-I NSERT. Let Cbe the length of the right spine of the left subtree of x. Let D be the length of theleft spine of the right subtree of x. Prove that the total number of rotations that wereperformed during the insertion of x is equal to C+D.Solution: Prove the claim by induction on the number of rotations performed. Thebase case is when x is the parent of y. Performing the rotation so that y is the new rootgives y exactly one child, so C+D=1.Assume for induction that the number of rotations k performed during the insertionof x equals C+D. The base case is when 0 rotations are necessary and x is insertedas a leaf. Then C+D=0. To prove the inductive step, we show that if after k−1rotations C+D=k−1, then after k rotations C+D=k. Draw a picture of aleft and right rotation and observe that C+D increases by 1 in each case. Let y bethe parent of x, and suppose x is a left child of y. After performing a right rotation, ybecomes the right child of x, and the previous right child of x becomes the left childof y. That is, the left spine of the right subtree of x before the rotation is tacked onto y, so the length of that spine increases by one. The left subtree of x is unaffectedby a right rotation. The case of a left rotation is symmetric. Therefore after one morerotation C+D increases by one and k=C+D, proving the inductive hypothesis.We will now calculate the expected values of C and D. For simplicity, we assume that the keys are 1,2,...,n. This assumption is without loss of generality because we only compare keys.�For two distinct nodes x and y , let k = key [x ]and i = key [y ], and deﬁne the indicator random variableX 1 if y is a node on the right spine of the left subtree of x (in T ),i,k =0 otherwise .(f) Show that X i,k =1if and only if (1) priority [y ]> priority [x ], (2) key [y ]< key [x ], and(3) for every z such that key [y ]< key [z ]< key [x ],we have p riority [y ]< priority [z ].Solution:To prove this statement, we must prove both directions of the “if and only if”. Firstwe prove the “if” direction. We prove that if (1) priority [y ]> priority [x ], (2) key [y ]<key [x ], and (3) for every z such that key [y ]< key [z ]< key [x ]are true, priority [y ]< priority [z ], then X i,k =1. We prove this by contradiction. Assume that X i,k =0. That is, assume that y is not on the right spine of the left subtree of x . We show thatthis leads to a contradiction. If y is not on the right spine of the left subtree of x ,it could be in one of three places:1. Suppose y is in the right subtree of x . This contradicts condition (2) becausekey [y ]< key [x ].2. Suppose y is not in one of the subtrees of x . Then x and y must share somecommon ancestor z . Since key [y ]< key [x ], we know that y is in the left subtreeof z and x is in the right subtree of z and key [y ]< key [z ] < key [x ]. Since y isbelow z in the tree, priority [z ]< priority [x ]and priority [z ]< priority [y ]. Thiscontradicts condition (3).3. Suppose that y is in the left subtree of x but not on the right spine of the leftsubtree of x . This implies that there exists some ancestor z of y in the left subtreeof x such that y is in the left subtree of z . Hence key [y ]< key [z ]< key [x ]. Sincez is an ancestor of y , priority [z ]< priority [y ], which contradicts condition (3).All possible cases lead to contradictions, and so X i,k =1.Now for the “only if” part. We prove that if X i,k =1, then statements (1), (2), and (3) are true. If X i,k =1, then y is in the right spine of the left subtree of x . Since y is ina subtree of x , y must have been inserted after x ,so p riority [y ]> priority [x ], proving(1). Since y is in the left subtree of x , key [y ]< key [x ], proving (2). We prove (3) by contradiction: suppose that X i,k =1and there exists a z such that key [y ]< key [z ]< key [x ]and priority [z ]< priority [y ]. In other words, z was inserted before y . There arethree possible cases that satisfy the condition key [z ]< key [x ]:1. Suppose z is in the right spine of the left subtree of x .For y to be inserted into theright spine of the left subtree of x , it will have to be inserted into the right subtreeof z . Since key [y ]< key [z ], this leads to a contradiction.2. Suppose z is in the left subtree of x but not in the right spine. This implies thatz is in the left subtree of some node z in the right spine of x . Therefore for y tobe inserted into the right spine of the left subtree of x , it must be inserted into theright subtree of z . This leads to a contradiction by reasoning similar to case 1.3. Suppose that z is not in one of the subtrees of x . Then z and x have a commonancestor z such that z is in the left subtree of z and x is in the right subtree of x .This implies key [z ] < key [z ] < key [x ]. Since key [y ]< key [z ]< key [z ], y cannotbe inserted into the right subtree of z . Therefore it cannot be inserted in a subtreeof x , which is a contradiction.Therefore there can be no z such that key [y ] < key [z ] < key [x ] and priority [z ] < priority [y ]. This proves statement (3). We have proven both the “if” and “only if” directions, proving the claim.(g) Show that(k − i − 1)! 1 Pr {X i,k =1} == . (k − i +1)! (k − i +1)(k − i )Solution: We showed in the previous part that X i,k =1if and only if the priorities of the items between y and x are ordered in a certain way. Since all orderings are equally likely, to calculate the probability we count the number of permutations of priorities that obey this order and divide by the number of total number of priority permutations. We proved in (e) that whether or not X i,k =1depends only on the relative ordering of the priorities of y , x , and all z such that key [y ] < key [z ] < key [x ]. Since we assumed that the keys of the items come from {1,...,n }, the keys of the items in question are i,i +1,i +2,...,k − 1,k . There are (k − i +1)!permutations of the priorities of these items. Of these permutations, the ones for which X i,k =1are those where i has minimum priority, k has the second smallest priority, and the priorities of the remaining k − i − 1 items follow in any order. There are (k − i − 1)! of these permutations. Thus the probability that we get a “bad” order is (k −i −1)!/(k −i +1)!= 1/(k − i )(k − i +1).(h) Show thatk −1 � 1 1 E [C ]= =1− . j (j +1)k j =1 Solution:X For a node x with key k , E [C ]is the expected number of nodes in the right spine of the left subtree of x . This equals the sum of the expected value of the random variables i,k for all i in the tree. Since X i,k =0for all nodes i ≥ k , we only need to consider i <k .� � � k −1 �k −1 � E [C ]=E [X i,k ]=E X i,k i =1 i =1 k −1 =Pr {X i,k =1}i =1 k −1 � 1 =(k − i )(k − i +1) i =1 k −1 � 1= j (j +1)j =1 1To simplify this sum, observe that j (j 1+1) = j +1−j = − 1 . Therefore the sumj (j +1) j j +1 telescopes and we have 1 E [C ]=1− . kIf you didn’t see this, you could have proven that the equationk −1 � 1 1 =1− j (j +1)k j =1 holds by induction on k . In the proving the inductive hypothesis you might have 11discovered 1 − = .j j +1 j (j +1)(i) Use a symmetry argument to show that1 E [D ]=1− . n − k +1Solution: The idea is to consider the treap produced if the ordering relationship amongthe keys is reversed. That is, for all items x , leave priority [x ]unchanged but replacekey [x ]with n − key [x ]+1.Let T be the binary tree we get when inserting the items (in increasing order of priority) using the original keys. Once we remap the keys and insert them into a newbinary search tree, we get a tree T whose shape is the mirror image of the shape ofT . (reﬂected left to right). Consider the item x with key k in T and therefore has key n − k +1 i n T . The length of the left spine of x ’s right subtree in T has becomethe length of the right spine of x ’s left subtree in T . We know by part (g) that the expected length of the right spine of a left subtree of a node y is 1− 1/idkey [y ],so the expected length of the right spine of the left subtree of x in T is 1− 1/(n − k +1), which means that 1 E [D ]=1− . n − k +1(j) Conclude that the expected number of rotations performed when inserting a node into a treap is less than 2.Solution:11 E [number of rotations ]= E [C +D ]=E [C ]+E [D ]=1− +1− k n − k +1 ≤ 1+1=2. Problem 4-2. Being balancedCall a family of trees balanced if every tree in the family has height O (lg n ), where n is the number of nodes in the tree. (Recall that the height of a tree is the maximum number of edges along any path from the root of the tree to a leaf of the tree. In particular, the height of a tree with just one node is 0.)For each property below, determine whether the family of binary trees satisfying that property is balanced. If you answer is “no”, provide a counterexample. If your answer is “yes”, give a proof (hint: it should be a proof by induction). Remember that being balanced is an asymptotic property, so your counterexamples must specify an inﬁnite set of trees in the family, not just one tree. (a) Every node of the tree is either a leaf or it has two children.Solution: No. Counterexample is a right chain, with each node having a leaf hanging off to the left(b) The size of each subtree can be written as 2k − 1, where k is an integer (k is not the same for each subtree).Solution: Yes.Consider any subtree with root r . We know from the condition that the size of this subtree is 2k 1 − 1. We also know that the size of the subtree rooted at the left child of r is 2k 2 − 1, and the size of the subtree rooted at the right child of r is 2k 3 − 1. But the size of the subtree at r is simply the node r together with the nodes in the left and right subtrees. This leads to the equation 2k 1 − 1=1+(2k 2 − 1)+(2k 3 − 1),or 2k 1 =2k 2 +2k 3. Now we show that k 2 =k 3. This is easy to see if you consider the binary representations of k 1, k 2, and k 3.Otherwise, if we assume WLOG that k 2 ≤ k 3, then we have 2k 1−k 2 =1+2k 3−k 2. 2Now, the only pair of integer powers of 2 that could satisfy this equation are 21 and 0 . Thus k 3 − k 2 =0,or k 2 =k 3, and the left and right subtrees of r have an equal number of nodes. It follows that the tree is perfectly balanced.(c) There is a constant c>0such that, for each node of the tree, the size of the smallerchild subtree of this node is at least c times the size of the larger child subtree.Solution:Yes1. The proof is by induction. Assume that the two subtrees of x with n nodes in itssubtree has two children y and z with subtree sizes n1 and n2. By inductive hypothesis,the height of y’s subtree is at most d lg n1 and the height of z’s subtree is at most d lg n2for some constant d. We now prove that the height of x’s subtree is at most d lg n.Assume wlog that n1 ≥n2. Therefore, by the problem statement, we have n2 ≥cn1.Therefore, we have n=n1 +n2 +1≥(1+c)n1 +1≥(1+c)n1 and the height hof x’s subtree is d lg n1 +1≤d lg(n/(c+1))+1≤d lg n−d lg(1+c)+1≤d lg nif d lg(1+c)≥1. Therefore, for sufﬁciently large d, the height of a tree with n nodesis at most d lg n.(d) There is a constant c such that, for each node of the tree, the heights of its childrensubtrees differ by at most c.Solution: Yes1. Let n(h)be the minimum number of nodes that a tree of height h thatsatisﬁes the stated property can have. We show by induction that n(h)≥(1+α)h−1,for some constant 0<α≤1. We can then conclude that for a tree with n nodes,h≤log1+α(n+1)=O(lg n).For the base case, a subtree of height 0has a single node, and 1≥(1+α)0 −1forany constant α≤1.In the inductive step, assume for all trees of height k<h, that the n(k)≥(1+α)k−1.Now consider a tree of height h, and look at its two subtrees. We know one subtreemust have height h−1, and the other must have height at least h−1−c. Therefore,we known(h)≥n(h−1)+n(h−1−c)+1.Using the inductive hypothesis, we getn(h) ≥(1+α)h−1 −1+(1+α)h−1−c−1+1≥2(1+α)h−1−c−1.Suppose we picked αsmall enough so that (1+α)<21/(c+1). Then (1+α)c+1 <2.Therefore, we getn(h)≥2(1+α)h−1−c−1≥(1+α)h−1.1Note that in this problem we assume that a nil pointer is a subtree of size 0, and so a node with only one child has two subtrees, one of which has size 0. If you assume that a node with only one child has only one subtree, then the answer to this problem part is “no”.�11Handout18: Problem Set4SolutionsTherefore, we satisfy the inductive hypothesis.Note that if we plug this value for αback into h≤log1+α(n+1),we getlg(n+1)h≤≤(c+1)l g(n+1).lg(1+2c+1)(e) The average depth of a node is O(lg n). (Recall that the depth of a node x is thenumber of edges along the path from the root of the tree to x.)Solution: No.√Consider a tree with n−n nodes organized as a complete binary tree, and the other √ √n nodes sticking out as a chain of length n from the balanced tree. The height of√√√the tree is lg(n−n)+n=Ω(n), while the average depth of a node is at most�√√ √(1/n)n·(n+lg n)+(n−n)·lg n√√=(1/n)(n+n lg n+n lg n−nlgn)=(1/n)(n+n lg n)=O(lg n)√Thus, we have a tree with average node depth O(lg n), but height Ω(n).。

基于联合意义度量的Top-K图模式挖掘

基于联合意义度量的Top-K图模式挖掘刘勇高宏李建中(哈尔滨工业大学计算机科学与技术学院哈尔滨 150001)摘要提出了一个新的研究问题:挖掘Top-K图模式,联合起来使某个意义度量最大化.利用信息论的概念,给出了两个具体的问题定义MES和MIGS,并证明它们是NP-hard.提出了两个高效算法Greedy-Topk和Cluster-TopK. Greedy-TopK先产生频繁子图,然后增量贪心选择K个图模式. Cluster-TopK先挖掘频繁子图的一个代表模式集合,然后从代表模式中增量贪心选择K个图模式.当意义度量满足submodular性质时, Greedy-TopK能提供近似比保证. Cluster-TopK没有近似比保证,但比Greedy-TopK更高效. 实验结果显示,在结果可用性方面,本文提出的Top-K挖掘优于传统的Top-K挖掘. Cluster-TopK比Greedy-TopK快至少一个数量级. 而且, 在质量和可用性方面, Cluster-TopK的挖掘结果非常类似于Greedy-TopK的挖掘结果.关键词图挖掘; 图数据库; 频繁子图; 代表模式; 联合熵; 信息增益中图法分类号: TP311Top-K Graph Patterns Mining Based on Some Joint Significance MeasureLiu Yong Gao Hong Li Jianzhong(School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)Abstract This paper proposes a novel problem of mining top-k graph patterns that jointly maximize some significance measure from graph databases. By exploiting the concepts of information theory, we give two problem formulations, MES and MIGS, and prove that they are NP-hard. Two efficient algorithms, Greedy-TopK and Cluster-TopK, are proposed for this new problem. Greedy-TopK first generates frequent subgraphs, and then incrementally and greedily selects K graph patterns from frequent subgraphs. Cluster-TopK first mines a set of representative patterns for frequent subgraphs, and then incrementally and greedily selects K graph patterns from representative patterns. When a given significance measure satisfies the submodular property, Greedy-TopK can provide tight approximation bound. Cluster-TopK has no approximation bound guarantee but is more efficient than Greedy-TopK. Extensive experimental results demonstrate that the Top-K mining proposed in this paper is superior to the traditional Top-K mining in terms of results usefulness. Cluster-TopK can achieve at least an order of magnitude speedup than Greedy-TopK, while achieving comparable mining results in terms of quality and usefulness.Keywords graph mining; graph database; frequent subgraph; representative pattern; joint entropy; information gain1 引言作为一种通用的数据结构,图可以用来表示数据对象之间的各种复杂关系.例如:图可以表示化合物的分子结构,蛋白质交互网络,社会网络,Web结构图等.很多与图有关的应用都需要利用图模式来管理、查询和分析图数据.例如:图查询[1]可以利用图模式建立有效的索引, 图分类[2]可以利用图模式建立有效的分类模型.因此,从图数据库中发现有用的图模式已成为数据挖掘领域一项重要的研究课题.本课题得到国家”九七三”重点基础研究发展计划项目(2006CB303005)和国家自然科学基金(60533110, 60773063)资助.刘勇,男,1975年生,博士生,主要研究领域为图挖掘和图数据管理.E-mail:liuyong123456@. 高宏,女,1966年生,教授,博士生导师,主要研究领域为数据仓库与数据挖掘,传感器网络等. 李建中,男,1950年生,教授,博士生导师,主要研究领域为数据仓库与数据挖掘,传感器网络,数据目前,与图模式挖掘有关的绝大部分研究主要集中在如何高效地挖掘频繁子图[3-8],以及频繁子图的各种简洁表示(频繁闭图模式[9]和频繁最大图模式[10-11]). 然而, 很少有研究考虑如何根据用户给定的意义度量来挖掘图模式. 本质上,频繁子图挖掘所使用的支持度恰恰是一种特殊的意义度量.然而,在不同的应用中,用户很可能需要不同的意义度量.给定一个意义度量,传统Top-K挖掘方法对所有可能的图模式根据度量值的大小排序,输出前K个图模式.然而, 传统Top-K挖掘并不考虑图模式之间的相关性,输出的Top-K模式可能非常相似. 第5节的图5 显示了信息增益作为意义度量时, 传统Top-K挖掘网格,计算生物学等.方法从一个图数据库中挖掘的Top-5图模式.这5个图模式在结构上非常相似.如果用户得到其中一个图模式, 就会对其它图模式失去了兴趣,因为在实际应用中用户希望得到的是一个多样化的图模式集合.为了克服传统Top-K挖掘方法的缺点, 本文研究基于联合意义度量的Top-K图模式挖掘方法. 联合意义度量的作用域是图模式集合而不是图模式, 因此充分考虑了图模式之间的相关性. 给定一个联合意义度量,本文要研究的问题就是挖掘Top-K图模式, 联合起来使该意义度量最大化. 本文的目标是设计一个适用于任何意义度量的通用的Top-K挖掘算法.本文首先讨论了适用于图模式集合的意义度量,并利用信息论中的概念(联合熵和信息增益)给出了两个具体的问题定义MES和MIGS,然后证明了它们是NP-hard问题. 为了高效地挖掘基于联合意义度量的Top-K图模式,本文给出了两个算法Greedy-TopK和Cluster-TopK. Greedy-TopK先产生频繁图模式(或频繁闭图模式),然后增量贪心选择K个图模式.本文证明了如果用户给定的意义度量满足submodular性质, Greedy-TopK能提供近似比保证.然而, 当频繁图模式(或频繁闭图模式)数量很大时,Greedy-TopK效率低,可扩展性差. 为此,本文又提出了另一个更高效的算法Cluster-TopK. Cluster-TopK先从图数据库中挖掘所有频繁图模式的一个代表模式集合, 然后从代表模式中增量贪心选择K个图模式. Cluster-TopK最大的优点是无需产生频繁图模式(或闭图模式)就能快速地挖掘一个代表模式集合.而且, 因为代表模式的数量比闭图模式的数量少很多, 所以Cluster-TopK的贪心选择也比Greedy-TopK的贪心选择快很多.此外,本文还从理论上严格证明了Cluster-TopK产生的解和Greedy-TopK产生的解非常接近.实验结果表明本文提出的Top-K挖掘在结果质量和可用性方面要远远优于传统Top-K挖掘. Cluster-TopK和Greedy-TopK在结果质量和可用性方面非常接近.然而, Cluster-TopK能比Greedy-TopK快1到2个数量级.本文内容安排如下:第2节介绍预备知识; 第3节给出问题定义和NP-hard证明; 第4节给出完整的算法描述和算法分析; 第5节通过实验证明算法的有效性以及挖掘结果的可用性;第6节介绍相关工作;第7节总结全文.2 预备知识本节介绍图挖掘和信息论的一些基本概念.定义1(标号图).标号图G定义为四元组G =(V, E, Σ, l),其中,V是顶点集合,E⊆V×V是边集合,Σ是标号集合, l：V∪E→Σ是一个函数,用来对顶点和边分配标号.定义2(子图同构). 给定两个图G = (V, E, Σ, l)和G′= (V′, E′,Σ′, l′), 一个从G到G′的子图同构是一个单射函数f:V→V′,满足:(1)∀u∈V, l(u)=l′(f(u));(2)∀(u,v)∈E, (f(u), f(v))∈E′并且l((u,v))=l′((f(u),f(v))).单射函数f也称为G在G′中的一个嵌入.如果存在一个从G到G′的子图同构,则G称为G′的子图, G′称为G的超图,记为G⊆G′.如果G⊆G′且G≠G′,则G称为G′的真子图,G′称为G的真超图,记为G⊂G′.子图同构测试已被证明是一个NP-完全问题[12].如果G ⊆G′,也称G′包含G.给定一个图数据库D={G1,G2,…,G n}和一个图模式p, p在D中的支持集定义为D中包含p的图集合,记为D supp(p)={G i|p⊆G i,G i∈D}. |D supp(p)|称为p在D中的支持度,记为supp(p;D).|D supp(p)|/|D|称为p在D中的相对支持度.支持度度量具有反单调性质:如果p1⊆p2,则supp(p1;D)≥supp(p2;D).对于用户给定的一个最小支持度阈值min_sup,如果supp(p;D)≥min_sup,称p在D中是频繁的.D中所有频繁图模式集合记为FS={p| supp(p;D)≥min_sup}.本文中,min_sup既可以表示绝对最小支持度阈值,又可以表示相对最小支持度阈值.如果在D中p的任何真超图与p支持度都不相同,则称p是D中的闭图模式.D中所有频繁闭图模式集合记为CS={p|p∈FS,⌝∃p′∈FS使得p⊂p′并且supp(p;D)=supp(p′;D)}.上下文明确时,可用supp(p)代替supp(p;D).因为本文后面介绍的意义度量将使用信息论的一些概念[13], 这里先回顾一下有关的基本定义.定义3(熵). 随机变量x的熵定义为:H(x)= −()()log(())xx xv dom xp v p v∈∑,其中dom(x)是x的定义域, ()xp v是x等于x v时的概率.定义4(条件熵). 在给定随机变量x的条件下,随机变量y的条件熵定义为:H(y|x) = −()()(,)log((|))x yx y y xv dom x v dom yp v v p v v∈∈∑∑.定义5(联合熵). 随机变量x和y的联合熵定义为:H(x, y) = −()()(,)log((,))x yx y x yv dom x v dom yp v v p v v∈∈∑∑.3 问题定义本节首先讨论用于图模式集合的意义度量,然后给出两个具体的问题定义MES和MIGS,最后证明它们是NP-hard问题.文献中已有大量的用于单一模式(包括项集模式,序列模式和图模式)的意义度量.例如：在统计学领域,x2检验和Pearson相关可以用来度量模式的统计意义.在数据挖掘和机器学习领域,信息增益和交叉熵可以用来度量模式是否适合作为分类特征.文献[14]总结了21个常用的意义度量.尽管文献中给出的大部分意义度量都能直接用来度量单一图模式的意义,然而它们中的绝大部分并不能用来量化一个图模式集合的意义.本文算法需要用户提供一个意义度量M使得M能量化一个图模式集合的意义.信息论中的联合熵和信息增益可以用来量化一个图模式集合的意义.在给出具体的问题定义之前,我们先给出本文要研究的一般问题.假设S是给定的图模式集合(例如：S可以表示某个图数据库中所有频繁图模式集合),T是S的子集合, M(T)是T的意义度量.本文要解决的问题就是发现S的一个大小为K的子集合T*使得M(T*)被最大化,即T* = argmax T⊂S, |T|=k M(T) (1)如果所有候选的图模式给定的话,上面定义的问题显然是一个组合优化问题. 本文的目标是设计求解上述问题的通用算法，不受任何具体意义度量的限制.为了方便描述算法,我们利用信息论中的联合熵和信息增益给出下面两个具体的问题定义.定义6(基于熵的意义度量最大化). 给定图数据库D,D的一个图模式集合S(或者最小支持度min_sup)和一个整数K,最大化基于熵的意义度量问题(MES)就是从S(或者D的所有频繁图模式集合)中发现一个大小为K的子集合T使得H(T)被最大化,其中H(T)是T中随机变量(图模式)的联合熵.定义7(基于信息增益的意义度量最大化). 给定图数据库D, D的一个图模式集合S(或者最小支持度min_sup)和一个整数K.假设D中每个图都有一个类标记,设C是表示类标记的随机变量.最大化基于信息增益的意义度量问题(MIGS)就是从S(或者D的所有频繁图模式集合)中发现一个大小为K的子集合T使得IG(T)=H(C)−H(C|T)被最大化,其中H(C)是C的无条件熵,H(C|T)是T中随机变量(图模式)给定的条件下C的条件熵.基于熵的意义度量经常用来在无监督的环境下度量不确定性,而基于信息增益的意义度量经常用来在有监督的环境下选择强分类特征.在上面的形式化定义中,我们是把每个图模式p看成了一个随机变量v p,如果数据库中的某个图G含有p,则在G上v p等于1，否则v p等于0.通常,当某个具体的意义度量给定时,式(1)定义的问题是NP-hard问题.下面,我们证明MIGS问题(定义7)是NP-hard. 类似地,也可以证明MES问题(定义6)是NP-hard.定理1. 最大化基于信息增益的意义度量问题(MIGS)是NP-hard.证明.通过把最大覆盖问题规约到MIGS问题,我们来证明该定理.设MC = (E,S,K) 是最大覆盖问题的任一实例,其中E ={e1, e2, ......, e n}是元素集合, S={S1, S2, ......, S L}是集合族.求解MC要求从S中选择K个集合,使得这K 个集合覆盖的元素数量最大化.我们可以在多项式时间内把MC转化成MIGS问题的一个实例.1. 把E中每个元素看成一个边的标号,那么S中的某个集合S i={e i1,e i2,…...,e it}可以转换成图1所示的图模式.该图模式用P(S i)表示. P(S i)中所有结点的标号都相同,但是边的标号各不相同.以相同的方式,我们可以把S中的所有集合都转换成对应的图模式.所有这些图模式构成的集合用PS表示.图1 对应S i={e i1,e i2,…...,e it}的图模式P(S i)2. 我们首先为E中的每个元素构造一个图.如果一个元素e能被{S i1, S i2, ......, S im}中的集合覆盖, 我们为e构造一个图2所示的图, 其中P(S i)是对应集合S i的图模式,连接各个图模式P(S i)的边标号都相同.显然,每个元素都将对应一个图,我们把所有这样的图标记为正类,放入数据库DB.此时, |DB|=n.......P(S i1)P(S i2)P(S im)图2 对应{S i1, S i2, ......, S im}的数据库图3. 我们然后随意构造n个图使得这n个图中不含有PS中的任何图模式.我们把这n个图标记为负类, 放入数据库DB. 此时, |DB|=2n.给定上面构造的图数据库DB, DB中的一个图模式集合PS以及正整数K, 我们得到了MIGS问题的一个实例IMIG.DB中正类图个数等于负类图个数,因此H(C)=1,其中C是表示类标记的随机变量.设T是PS中任意一个大小为K的子集合.因为T中每个图模式对应S中的一个集合,因此从T可得到S的一个大小为K的子集合S′.设被S′覆盖的元素数是x.对T中的任何图模式p,DB中的任何负类图都不含有p.根据条件熵定义,H(C|T) =()(log 2()()n x n n x n x n n x n n x n -+---⨯+-+-+log )()()n nn x n n x n-+-+=1(log log )2222n x n x n n n x n x---+--.因此, IG (T )=H (C )−H (C |T )=1+1loglog 2222n x n x nn n x n x --+--.令()f x =1log log 2222n x n x n n n x n x--+--.可以证明()f x 的一阶导数()f x '大于0, ()f x 是一个单调递增函数.当x 的值增大时, IG (T )也增大.因此, IMIG的最优解能被用来导出MC 的一个最优解. □4 挖掘Top-K 图模式MES 和MIGS 都是NP-hard 问题,因此需要设计近似算法或启发式算法求解它们.本文提出了两个高效算法,Greedy-TopK 和Cluster-TopK,来求解它们. Greedy-TopK 从频繁图模式集合中贪心选择Top-K 模式,具有近似比保证. Cluster-TopK 从图数据库中挖掘代表图模式集合,然后再从中贪心选择Top-K 模式. 尽管Cluster-TopK 没有近似比保证, Cluster-TopK 却具有极高的挖掘效率.4.1 Greedy-TopK 算法Greedy-TopK 算法采用著名的贪心策略,从所有频繁图模式集合中增量地选择K 个图模式.为了应用贪心策略,我们定义了一个收益函数b. Greedy-TopK 每次选择使收益函数b 最大的图模式. 假定T 是已经选择的图模式集合. 对MES 和MIGS 问题,图模式p 的收益函数b 定义如下:(,)() MES()(|)(|,) MIGS H T p H T b p H C T H C T p -⎧=⎨-⎩对对 (2) 根据等式(2)给出的贪心规则,我们设计了一个贪心算法Greedy-TopK,如算法1所示. 初试化时,使用某个图挖掘算法(例如: gSpan [6] )挖掘所有频繁图模式集合F ,同时置结果集T 为空.然后算法选择最有意义的图模式p *放入T 中.此后,算法进入一个循环过程,每次从剩余的图模式集合F −T 中选择一个使收益函数b 最大的图模式p ,放入T 中.当结果集T 中图模式个数等于K 时,退出循环,算法结束. 算法1. Greedy-TopK输入:图数据库D ;最小支持度阀值min _sup ;意义度量M ;输出模式个数K . 输出:使M 最大化的K 个图模式1. 根据min _sup ,挖掘D 中所有频繁图模式F ;2. 从F 中选择一个图模式p *使M (p )最大化;3. T = { p *};4. WHILE (|T | < K ) do5. 从F −T 中选择图模式p 使得b (p ) 最大化;6. T = T ∪{ p };7. 输出T ;下面,我们分析算法Greedy-TopK 的近似比. 当意义度量M 满足submodular 性质(见下面的定义)时,贪心算法将给出近似解[15].定义8(submodular 性质). 设M 是定义在一个集合S 上的度量函数.如果对任何T ⊂T '⊆ S 和任何p ∈S , 都有M (T ∪{p })−M (T ) ≥ M (T '∪{p })−M (T '),则称M 是一个submodular 的集合函数.仔细分析本文给出的两个意义度量,我们发现基于熵的意义度量满足submodular 性质,因此算法Greedy-TopK 对MES 问题能给出如下的近似比.定理2. 设T 是算法Greedy-TopK 选择的K 个图模式集合, T*是使联合熵最大化的K 个图模式集合.那么,*()()1H T eH T e ≤-. 证明. 文献[15] 给出了完整的证明过程. □我们发现基于信息增益的意义度量并不满足submodular 性质.因此,对MIGS 问题,算法Greedy-TopK 不能给出像定理2那样的近似比.然而,我们注意到,对MIGS 问题,无条件熵H (C )是任何模式集合信息增益的上界.因此,算法Greedy-TopK 对MIGS 问题能给出如下的联机近似比.定理3. 设T 是算法Greedy-TopK 选择的K 个图模式集合. 那么, 算法Greedy-TopK 对MIGS 问题给出的联机近似比至多是H (C )/IG (T ). 4.1.1 裁剪技术算法Greedy-TopK 的第一步需要遍历图模式搜索空间,挖掘所有频繁图模式.本小节研究如何根据给定的意义度量设计有效的裁剪技术,将其集成到图模式挖掘算法的框架中裁剪图模式搜索空间.所有频繁子图挖掘算法都利用了支持度的反单调性质来裁剪搜索空间.即,一个图模式p 的支持度是p 的所有超图支持度的上界.然而,通常情况下,给定的意义度量并不具有反单调性质.因此,我们不能像利用支持度那样简单地利用给定的意义度量.幸运的是,给定一个图模式p 的意义度量值M (p ),我们可以导出p 的所有超图意义度量值的上界,从而根据这个上界来裁剪图模式搜索空间.我们仍然采用Greedy-TopK 算法的框架来增量地选择Top-K 图模式.因此,我们分两种情况介绍裁剪技术:(1)当结果集为空时,选择第一个图模式;(2)当某些图模式已经在结果集中,选择下一个图模式. 4.1.1.1 选择第一个图模式本小节研究结果集为空时如何设计有效的裁剪技术. 请注意,我们在4.1.1.1和4.1.1.2节给出的裁剪技术可以应用到任何频繁子图挖掘算法的框架中.因此,我们未给出具体的挖掘算法.定理4给出了基于熵的裁剪条件.定理5给出了基于信息增益的裁剪条件.定理4. 设D 是给定的图数据库, q 是p 的任意超图.如果supp (p ; D ) ≤ 1/2,则H (q ) ≤ H (p ).定理5. 给定图数据库D=PD+ND , 其中PD 和ND 分别是正类和反类图集合.设q 是p 的任意超图, PD 中有x 个图含有p , ND 有y 个图含有p .那么,其中,.4.1.1.2 选择下一个图模式本小节研究结果集不空时如何设计有效的裁剪技术.在给出具体的裁剪技术之前,我们先定义一个新的概念—等价类.定义9(等价类).设D是图数据库,G是D中的任意图,T是已经选择的图模式集合.G相对于T的等价类定义为,其中是指示函数.根据上面等价类的定义,一个图模式集合T可以将数据库D划分成若干个等价类(块)集合,并且. 利用等价类的定义,我们可以在结果集非空的情况下导出有效的裁剪条件.定理6给出了基于熵的裁剪条件.定理7给出了基于信息增益的裁剪条件.定理6. 设D是给定的图数据库,T是已经选择的图模式集合, 是用T划分D得到的等价类集合, q是p的任意超图(p⊆q),并且,. 那么,其中, x i是B i 中含有p的图个数.定理7. 给定图数据库D=PD+ND,其中PD 和ND分别是正类和反类图集合.T是已经选择的图模式集合,是用T划分D得到的等价类集合, q是p的任意超图(p⊆q),并且,. 那么,其中,,,m i和n i分别是B i中正类图个数和负类图个数,x i和y i分别是B i中含有p的正类图个数和含有p的负类类图个数.4.2 Cluster-TopK算法Greedy-TopK算法需要先产生所有的频繁图模式,在频繁图模式数量较少的情况下, Greedy-TopK算法具有较高的效率.然而,在实际应用中,当数据库中图较稠密或者用户给定的支持度阀值较低时,频繁子图挖掘算法通常产生大量的甚至指数级的频繁子图.这使得Greedy-TopK算法不能在合理的时间内完成任务,限制了该算法的可用性.即使使用 4.1.1节中的裁剪技术, Greedy-TopK也需要遍历图模式空间中的大量图模式.为此,本节提出了另一个更高效的算法Cluster-TopK.Cluster-TopK算法的基本思想是将所有频繁图模式聚类成若干个簇,选择每个簇的中心作为代表模式构成一个候选集合T,然后再使用贪心策略从T中增量地选择K个图模式.Cluster-TopK算法的优点是不需要先产生所有的频繁图模式,就能从图数据库中快速地挖掘代表模式得到候选集,具体内容见4.2.2节.假设C={p1, p2, ……, p n}是由若干个图模式构成的一个簇,p i是该簇中心.因为p i将作为代表模式被选择, p i应具有这样的特点：∀P j∈C, P j⊆P i, 并且P j 和P i在数据库图中应经常一起出现,即P j和P i有类似的支持度.下面,我们给出这种类型簇的两个形式化定义, δ-簇和Δ-簇.定义9(δ-覆盖).设δ(0≤δ≤1)是用户给定的一个参数,p和q是任意两个图模式.如果满足q⊆p, ≤δ, 则称q被p δ-覆盖.定义10(δ-簇).设δ(0≤δ≤1)是用户给定的一个参数,C={p1, p2, ……, p n}是一个图模式集合.如果C 中存在图模式p i满足∀p j∈C, p j被p i δ-覆盖, 则称C 是一个δ-簇,称p i为该δ-簇的代表模式(中心点).定义11(Δ-覆盖).设Δ(Δ>0的整数)是用户给定的一个参数, p和q是任意两个图模式.如果满足q⊆p, supp(q) −supp(p) ≤Δ, 则称q被p Δ-覆盖.定义12(Δ-簇). 设Δ(Δ>0的整数)是用户给定的一个参数,C={p1, p2, ……, p n}是一个图模式集合.如果C中存在图模式p i满足∀p j∈C, p j被p i Δ-覆盖, 则称C是一个Δ-簇,称p i为该Δ-簇的代表模式(中心点).Cluster-TopK算法就是根据用户给定的参数δ(或Δ),将频繁图模式集合划分成若干个δ-簇(或Δ-簇),选择每个δ-簇(或Δ-簇)的代表模式构成候选集,然后再从中贪心选择K个图模式.4.2.1节分析了Cluster-TopK算法采用这种聚类策略的优点. 4.2.2节研究了如何在不产生所有频繁图模式的情况下快速地挖掘每个δ-簇(或Δ-簇)的代表模式. 4.2.3给出了具体的算法实现.4.2.1 Cluster-TopK聚类策略的优点本小节讨论Cluster-TopK算法为什么采用这种先聚类后贪心的策略.设N是给定图数据库D的大小,即|D| = N. 定义函数f(n)=logn NN n(0≤n≤N).下面的引理和定理将利用该函数.引理1. 设D是给定的图数据库, |D| = N, P1和P2是两个图模式, d是P1和P2之间的海明距离(即d=, 其中I是指示函数).那么, 0 ≤H(P1,P2)−H(P1)≤d(f(1)+f(N-1)).证明.见文献[16]中的Proposition 6.3. □根据引理1,我们容易得到下面的推论1.推论1. 设D是给定的图数据库, |D| = N, F={P1, P2, ……, P n}是任意图模式集合, P n+1是任意一个图模式, 如果P n和P n+1之间的海明距离是d,则H(P1, ……, P n-1, P n) −H(P1, ……, P n-1, P n+1) ≤d(f(1)+ f(N−1)).证明. H(P1, ……, P n-1, P n) −H(P1, ……, P n-1, P n+1) ≤H(P1, ……, P n, P n+1) −H(P1, ……, P n-1, P n+1) = H(P n| P1, ……, P n-1, P n+1) ≤H(P n| P n+1) = H(P n, P n+1)−H(P n+1). 因为P n和P n+1之间的海明距离是d,再根据引理1,可得H(P1, P2, ……, P n) −H(P1, ……, P n-1,P n+1)≤H(P n, P n+1)−H(P n+1)≤d(f(1)+ f(N−1)).□因为频繁图模式的数量通常很大,直接对频繁图模式进行贪心选择,时间复杂性太高. 对频繁图模式先聚类,只提取每个类的代表模式(中心点)构成候选集合,可以将频繁模式的数量降低2到3个数量级(见后面的实验结果).然后对候选集合进行贪心选择,可使算法具有极高的效率.而且, 我们可以证明代表模式集合中的最优解和频繁图模式集合中的最优解差别很小.定理8针对MES问题,分析了最优解之间的差异. 定理9针对MIGS问题,分析了最优解之间的差异.定理8和9给出了利用Δ-簇的分析结果.类似地,我们也容易给出利用δ-簇的分析结果.以MES问题为例,我们分析具体的差异.假设定理8中的N=10000, K=10, Δ=5, 则有H(T*) −H(S*) ≤ΔK (f(1)+f(N−1))≈0.073. 这是理论上的最大可能差异, 而实验中的结果显示：从聚类结果中选择的解和从频繁图模式集合中选择的解差别微乎其微.定理8. 设D是给定的图数据库, |D| = N, F={P1, P2, ……, P n}是D中频繁图模式集合, F被划分成m个Δ-簇的集合CS={C1, C2, ……, C m}, 其中每个C i(1 ≤i≤m)都是一个Δ-簇, 并且F=C1∪C2∪……∪C m. CS 中每个Δ-簇的代表模式被选择构成了一个集合RS={R1, R2, ……, R m},其中R i (1 ≤ i≤m)是C i的代表模式. 再假设T*是F中的一个大小为K的子集合,并且使H(T)最大化.S*是RS的一个大小为K的子集合,并且使H(S*)最大化.那么,H(T*) − H(S*) ≤ΔK (f(1)+ f(N−1)).证明. 设T*={Q1, Q2, ……, Q K}.因为F被划分成m个Δ-簇的集合CS, RS={R1, R2, ……, R m}是CS中Δ-簇的代表模式集合, 并且T*⊂F, 根据Δ-簇定义, 对任意Q i∈T*, 都存在RS中的一个代表模式R使得R能Δ-覆盖Q i.不矢一般性,假定Q1被R1 Δ-覆盖.根据Δ-覆盖定义,可知Q1⊆R1, supp(Q1)−supp(R1) ≤Δ.因此, Q1与R1之间的海明距离d=supp(Q1) −supp(R1) ≤Δ. 我们试图用R1替换Q1.令T =T*−{Q1}∪{R1}. 根据推论1, 可得H(T*) − H(T) ≤d(f(1)+ f(N−1)) ≤Δ(f(1)+ f(N−1)). 以此类推,如果T*中每个图模式都被RS中对应的代表模式所替换,可得到一个新的集合T',并且H(T*) −H(T') ≤ΔK (f(1)+ f(N−1)). 显然,T'是RS的一个大小为K的子集合.因为S*是RS的一个大小为K的子集合,并且使H(S*)最大化.因此,H(T*) −H(S*) ≤H(T*) −H(T') ≤ΔK (f(1)+ f(N−1)).□定理9. 设D是给定的图数据库, |D| = N, F={P1, P2, ……, P n}是D中频繁图模式集合, F被划分成m个Δ-簇的集合CS={C1, C2, ……, C m}, 其中每个C i(1 ≤i≤m)都是一个Δ-簇, 并且F=C1∪C2∪……∪C m. CS 中每个Δ-簇的代表模式被选择构成了一个集合RS={R1, R2, ……, R m},其中R i (1 ≤ i≤m)是C i的代表模式. 再假设T*是F中的一个大小为K的子集合,并且使IG(T)最大化.S*是RS的一个大小为K的子集合,并且使IG(S*)最大化.那么,IG(T*)−IG(S*)≤2ΔK (f(1)+ f(N−1)).证明. 假设C是表示数据库D中图类别的随机变量.类似于定理8中的证明过程,可得|H(T*)− H(S*)|≤ΔK (f(1)+ f(N−1)), |H(C,T*)−H(C,S*)| ≤ΔK(f(1)+ f(N−1)).因为IG(T*) −IG(S*) = (H(C)−H(C|T*)) − (H(C)−H(C|S*))= H(C|S*) −H(C|T*) = (H(C,S*)−H(S*)) − (H(C,T*)−H(T*)) =(H(T*)−H(S*))−(H(C,T*)−H(C,S*)),因此IG(T*)−IG(S*) ≤ 2ΔK(f(1)+ f(N−1)).□4.2.2 Cluster-TopK算法的关键技术如果先得到完整的频繁图模式集合F,再使用传统的聚类算法(例如k-means算法)对F进行聚类,时间复杂性将是O(|F|2),这显然是不可行的.为了使Cluster-TopK算法高效可扩展, Cluster-TopK算法要实现如下两个目标：(1)只扫描频繁子图挖掘算法输出的图模式一遍而能得到一个代表模式集合; (2)裁剪图模式空间中不产生(或极少产生)代表模式的那些分枝.下面两个小节分别介绍实现这两个目标的关键技术.4.2.2.1 产生代表模式图3显示了图模式空间中的一个分枝,每个节点代表一个频繁子图.大部分频繁子图挖掘算法(gSpan[6],FFSM[7]等)都采用深度优先方式访问该空间中的每个节点.采用深度优先方式将会访问每个节点两次：(1)第一次是从父亲节点到当前节点的访问,例如在图3中从节点A到节点C的访问. (2)第二次是在完成了所有后裔的访问后再次回到当前节点,例如在图3中访问完节点E、F和G之后第二次访问节点C.目前的频繁子图挖掘算法是在第一次访问某个节点时输出它.因此,对图3中的节点输出顺序是……A,B,C,E,F,G,D,……....图3 图模式空间中的一个分枝为方便描述,我们将δ-覆盖或Δ-覆盖都简称为覆盖.我们的目标就是产生一个代表模式集合能覆盖所有的频繁图模式.对一个给定的图模式,例如图3中的节点A,根据定义,能覆盖节点A的图模式必然是节点A的超图.因此,图3中的节点P,Q和A的后裔都有可。

《人工智能》课程习题

《人工智能》课程习题第一章绪论1-1. 什么是人工智能？试从学科和能力两方面加以说明。

1-2. 在人工智能的发展过程中，有哪些思想和思潮起了重要作用？1-3. 为什么能够用机器（计算机）模仿人的智能？1-4. 现在人工智能有哪些学派？它们的认知观是什么？1-5. 你认为应从哪些层次对认知行为进行研究？1-6. 人工智能的主要研究和应用领域是什么？其中，哪些是新的研究热点？第二章知识表示方法2-1状态空间法、问题归约法、谓词逻辑法和语义网络法的要点是什么？它们有何本质上的联系及异同点?2-2设有3个传教士和3个野人来到河边，打算乘一只船从右岸渡到左岸去。

该船的负载能力为两人。

在任何时候，如果野人人数超过传教士人数，那么野人就会把传教士吃掉。

他们怎样才能用这条船安全地把所有人都渡过河去?再定义描述过河方案的谓词：L-R(x, x1, y, y1,S)：x1个修道士和y1个野人渡船从河的左岸到河的右岸条件：Safety(L,x-x1,y-y1,S’)∧Safety(R,3-x+x1,3-y+y1,S’)∧Boat(L,S)动作：Safety(L,x-x1,y-y1,S’)∧Safety(R,3-x+x1,3-y+y1,S’)∧Boat(R,S’)R-L (x, x1, y, y1,S)：x2个修道士和y2个野人渡船从河的左岸到河的右岸条件：Safety(R,3-x-x2,3-y-y2,S’)∧Safety(L,x+x2,y+y2,S’)∧Boat(R,S)动作：Safety(R,3-x-x2,3-y-y2,S’)∧Safety(L,x+x2,y+y2,S’)∧Boat(L,S’)(2) 过河方案Safety(L,3,3,S0)∧Safety(R,0,0,S0)∧Boat(L,S0)L-R(3, 1, 3, 1,S0) L-R(3, 0, 3, 2,S0)Safety(L,2,2,S1)∧Safety(R,1,1,S1)∧Boat(R,S1)Safety(L,3,1,S1’)∧Safety(R,0,2,S1’)∧Boat(R,S1’)R-L (2, 1, 2, 0,S1) R-L (3,0, 1, 1,S1’)Safety(L,3,2,S2)∧Safety(R,0,1,S2)∧Boat(L,S2)L-R(3, 0, 2, 2,S2)Safety(L,3,0,S3)∧Safety(R,0,3,S3)∧Boat(R,S3)R-L (3, 0, 0, 1,S3)Safety(L,3,1,S4)∧Safety(R,0,2,S1)∧Boat(L,S4)L-R(3, 2, 1, 0,S4)Safety(L,1,1,S5)∧Safety(R,2,2,S5)∧Boat(R,S5)R-L (1, 1, 1, 1,S5)Safety(L,2,2,S6)∧Safety(R,1,1,S6)∧Boat(L,S6)L-R(2, 2, 2, 0,S6)Safety(L,0,2,S7)∧Safety(R,3,1,S7)∧Boat(R,S7)R-L (0, 0, 2, 1,S7)Safety(L,0,3,S8)∧Safety(R,3,0,S8)∧Boat(L,S8)L-R(0, 0, 3, 2,S8)Safety(L,0,1,S9)∧Safety(R,3,2,S9)∧Boat(R,S9)R-L (0, 1, 1, 0,S9)Safety(L,1,1,S10)∧Safety(R,2,2,S10)∧Boat(L,S10)2-3利用图2.3，用状态空间法规划一个最短的旅行路程：此旅程从城市A开始，访问其他城市不多于一次，并返回A。

6Cluster

The quality of a clustering result depends on both the similarity measure used by the method and its
implementation

The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
It is hard to define “similar enough” or “good enough”

the answer is typically highly subjective.
Data Mining: Concepts and Techniques 7
June 27, 2012
where

m f 1 (x 1 f x 2 f n
...
x nf
).
Calculate the standardized measurement (z-score)
x m f z if if s
f

Using mean absolute deviation is more robust than using standard deviation
2
What is Cluster Analysis?

Cluster: a collection of data objects

Similar to one another within the same cluster
Dissimilar to the objects in other clusters

禁忌搜索算法求解图节点着色问题

— nbiter := nbiter +1；
—产生 N (S) 的一个候选集 V* ，要求候选元素 x 是非禁忌的或是特赦的
f (x) ≤ A( f (S))；
—在V * 中选目标值最优的解 S * ；
( ) ( ) —若 f (S* ) ≤ A( f (S )) ，则 A f (S* ) ( ) := f S* −1；否则若 f (S ) ≤ A f (S* ) ，则 ( ) A f (S* ) := f (S ) −1；
As a new intelligent search algorithm, Tabu search algorithm is widely used in various optimization areas and achieved satisfactory results. This article describes the characteristics of Tabu search algorithms, applications and research processes. The article also outlines the basic processes of the algorithm, and review the keys to the process of algorithm design elements. In addition, we apply it to the Graph node coloring problem. Finally, we discuss the developing trends of the algorithm research.
(1)设定算法参数,产生初始解 x ,置空禁忌表；
(2)判断是否满足终止条件，若是,则结束,并输出结果;否则,继续以下步骤；

DELVER

DELVER: Real-Time, ExtensibleAlgorithms出处AbstractPeer-to-peer communication and semaphores have garnered minimal interest from both computational biologists and mathematicians in the last several years. In fact, few cyberinformaticians would disagree with the investigation of rasterization. In this position paper, we investigate how write-back caches can be applied to the emulation of object-oriented languages.Table of Contents1) Introduction2) Design3) Implementation4) Experimental Evaluation∙ 4.1) Hardware and Software Configuration∙ 4.2) Experiments and Results5) Related Work6) Conclusion1 IntroductionThe refinement of linked lists is a confusing question. To put this in perspective, consider the fact that foremost researchers always use suffix trees to fulfill this ambition. A practical problem in steganography is the exploration of operating systems. To what extent can redundancy be enabled to fulfill this intent?We examine how superpages can be applied to the synthesis of the Turing machine. Our heuristic creates Internet QoS. Nevertheless, the study of I/O automata might not be the panacea that end-users expected. Combinedwith Lamport clocks, it harnesses a novel framework for the synthesis of superblocks.To our knowledge, our work in this position paper marks the first framework improved specifically for flip-flop gates. Existing semantic and lossless applications use multimodal technology to allow RPCs. Existing electronic and cacheable algorithms use the refinement of model checking to request event-driven models [,,,]. It should be noted that our application is maximally efficient. This combination of properties has not yet been studied in previous work.Our main contributions are as follows. To begin with, we verify that though neural networks can be made large-scale, authenticated, and replicated, virtual machines can be made interactive, distributed, and homogeneous. Continuing with this rationale, we verify that Boolean logic and the transistor are largely incompatible. Of course, this is not always the case.The roadmap of the paper is as follows. To begin with, we motivate the need for linked lists. Furthermore, we prove the robust unification of lambda calculus and reinforcement learning []. We place our work in context with the related work in this area. Ultimately, we conclude.2 DesignOur methodology relies on the confirmed methodology outlined in the recent acclaimed work by Johnson in the field of linear-time theory. Consider the early methodology by Kumar and Jackson; our framework is similar, but will actually achieve this mission. The model for DELVER consists of four independent components: interrupts, multimodal modalities, "smart" configurations, and self-learning configurations. This may or may not actually hold in reality. Rather than synthesizing compact epistemologies, DELVER chooses to study active networks.Figure 1: DELVER's relational provision.DELVER relies on the extensive design outlined in the recent well-known work by G. Watanabe et al. in the field of networking. This is a significant property of our methodology. Consider the early methodology by Alan Turing et al.; our methodology is similar, but will actually solve this grand challenge. Similarly, Figure 1details DELVER's client-server synthesis. Thus, the model that our algorithm uses is feasible.Figure 2: The architectural layout used by DELVER.Reality aside, we would like to evaluate a design for how DELVER might behave in theory. Furthermore, we assume that operating systems and the producer-consumer problem can synchronize to surmount this problem. While information theorists always assume the exact opposite, our methodology depends on this property for correct behavior. The question is, will DELVER satisfy all of these assumptions? The answer is yes.3 ImplementationThough many skeptics said it couldn't be done (most notably K. Ramanathan et al.), we construct a fully-working version of our system. DELVER requires root access in order to learn self-learning symmetries. DELVER is composed of a collection of shell scripts, a virtual machine monitor, and a virtual machine monitor. It was necessary to cap the instruction rate used by our system to 2498 teraflops. One is not able to imagine other solutions to the implementation that would have made implementing it much simpler.4 Experimental EvaluationOur evaluation represents a valuable research contribution in and of itself. Our overall evaluation methodology seeks to prove three hypotheses: (1) that e-commerce no longer influences complexity; (2) that interrupt rate stayed constant across successive generations of Apple Newtons; and finally (3) that an approach's user-kernel boundary is not as important as an application's legacy ABI when maximizing median power. An astute reader would now infer that for obvious reasons, we have decided not to refine effective response time. Along these same lines, we are grateful for discrete 16 bit architectures; without them, we could not optimize for usability simultaneously with usability constraints. Third, our logic follows a new model: performance matters only as long as security constraints take a back seat to popularity of replication. We hope to make clear that our patching the ABI of our distributed system is the key to our evaluation method.4.1 Hardware and Software ConfigurationFigure 3: Note that seek time grows as block size decreases - a phenomenon worthemulating in its own right.Our detailed performance analysis necessary many hardware modifications. We scripted a simulation on MIT's 2-node overlay network to quantify T. U. Qian's synthesis of neural networks in 1999 []. To start off with, physicists added 2Gb/s of Ethernet access to our system to understand the effective RAM speed of DARPA's human test subjects. We removed 100MB of flash-memory from the NSA's metamorphic overlay network to understandtechnology. Had we prototyped our trainable overlay network, as opposed to emulating it in courseware, we would have seen duplicated results. Along these same lines, we removed more USB key space from CERN's network to prove collectively linear-time methodologies's impact on the work of Soviet chemist Ivan Sutherland. Next, we quadrupled the hit ratio of our human test subjects to consider information. On a similar note, we removed 25 CPUs from our 10-node overlay network to understand our mobile telephones []. In the end, we removed some NV-RAM from our human test subjects to disprove the paradox of cryptoanalysis.Figure 4: Note that latency grows as work factor decreases - a phenomenon worthimproving in its own right.We ran DELVER on commodity operating systems, such as Ultrix Version 0.8.0, Service Pack 9 and AT&T System V. all software components were hand assembled using Microsoft developer's studio linked against trainable libraries for enabling SCSI disks. All software was hand assembled using GCC 8.0 with the help of Richard Stearns's libraries for provably exploring vacuum tubes. Our purpose here is to set the record straight. All software components were linked using Microsoft developer's studio built on James Gray's toolkit for lazily enabling average throughput. We made all of our software is available under a very restrictive license.Figure 5: These results were obtained by Adi Shamir []; we reproduce them here for clarity. Even though such a hypothesis is continuously a robust ambition, it largely conflicts with the need to provide cache coherence to hackers worldwide.4.2 Experiments and ResultsFigure 6: These results were obtained by Lee et al. []; we reproduce them here forclarity.Figure 7: The median interrupt rate of our application, as a function of hit ratio[].Is it possible to justify the great pains we took in our implementation? Yes. We ran four novel experiments: (1) we measured optical drive throughput as a function of ROM space on a Nintendo Gameboy; (2) we ran 08 trials with a simulated RAID array workload, and compared results to our bioware emulation; (3) we measured E-mail and DNS latency on our system; and (4) we compared median popularity of rasterization on the FreeBSD, L4 and L4 operating systems. All of these experiments completed without paging or unusual heat dissipation. Such a claim might seem counterintuitive but fell in line with our expectations.We first analyze the first two experiments as shown in Figure 5 [,,]. Operator error alone cannot account for these results. These instruction rate observations contrast to those seen in earlier work [], such as B. Harris's seminal treatise on I/O automata and observed ROM throughput. This is essential to the success of our work. Note that 4 bit architectures have smoother clock speed curves than do modified journaling file systems.We next turn to the first two experiments, shown in Figure 4. This is essential to the success of our work. Note the heavy tail on the CDF in Figure 5, exhibiting amplified median time since 2001. the key to Figure 3is closing the feedback loop; Figure 6shows how our heuristic's effective ROM speed does not converge otherwise. Next, note the heavy tail on the CDF in Figure 4, exhibiting amplified median instruction rate.Lastly, we discuss the first two experiments. The many discontinuities in the graphs point to muted median latency introduced with our hardware upgrades. Note how emulating Byzantine fault tolerance rather than emulating them in middleware produce smoother, more reproducible results.Furthermore, error bars have been elided, since most of our data points fell outside of 14 standard deviations from observed means.5 Related WorkWe now consider related work. Thomas [] developed a similar application, unfortunately we verified that our methodology is in Co-NP [,,,,]. Unlike many previous methods [], we do not attempt to control or provide the producer-consumer problem []. DELVER is broadly related to work in the field of hardware and architecture by Suzuki [], but we view it from a new perspective: hierarchical databases [,,,,]. Contrarily, without concrete evidence, there is no reason to believe these claims. Unfortunately, these solutions are entirely orthogonal to our efforts.Instead of architecting read-write configurations [,], we achieve this goal simply by visualizing cooperative theory []. We had our method in mind before Raj Reddy published the recent well-known work on reliable models. Our design avoids this overhead. Unlike many prior approaches, we do not attempt to synthesize or emulate the construction of Scheme []. DELVER represents a significant advance above this work. These frameworks typically require that the memory bus [] and interrupts can collaborate to accomplish this intent, and we validated here that this, indeed, is the case.We now compare our method to previous certifiable technology methods. Further, instead of architecting the development of flip-flop gates [,], we accomplish this purpose simply by controlling spreadsheets []. DELVER represents a significant advance above this work. Unlike many previous approaches, we do not attempt to develop or simulate the deployment of journaling file systems []. Instead of emulating distributed configurations [], we achieve this purpose simply by improving the partition table. All of these approaches conflict with our assumption that wireless technology and highly-available methodologies are structured [,].6 ConclusionIn our research we introduced DELVER, a methodology for superpages. Along these same lines, we also explored a framework for model checking []. Weused compact epistemologies to argue that kernels and theproducer-consumer problem can connect to overcome this riddle. We expect to see many researchers move to visualizing DELVER in the very near future.。

loop BP

Loopy Belief Propagation – p.8/73
Loopy BP
The assumptions made by BP only hold for acyclic graphs. For graphs containing cycles, loopy BP is not guaranteed to converge or be correct. However, it has been applied with experimental success.
Loopy Belief Propagation – p.3/73
Probabilistic Model
These assumptions are a simpliﬁcation of the problem’s true structure. The world appears stochastic in terms of the model. Graphical models are interpreted as describing the probability distribution of random variables.
A B C
If nodes separate nodes from nodes , then and are conditionally independent given . The Hammersley-Clifford theorem: the conditional independence assumptions hold if and only if the distribution factorizes as the product of potential functions over cliques: .

基于商空间粒度的覆盖聚类算法的研究

设Ｋ分为ｓ个子集Ｋ＝｛。ｘ，ｘ ”｝ …，ｘ， …，，Ｋ＝
闵鹏鹏，郑诚，胡哲
（安徽大学计算机学院，安徽合肥２０３）３０９摘要：聚类技术就是将数据分为自然的群体，并给出每个群的特征描述的一种数据方法。但是传统的聚类算法对高维大
规模数据的处理效率不高，张铃教授提出的交叉覆盖算法可以有效地处理大规模的聚类问题，因而本文提出基于覆盖算
ｅｉｉｎｌａｄｅａｌｒｅｎｍｂｒｆａａｆｃｅｔｈｎｌａｇｕｅｔ．Ｏｎｒｓ・ｖｒｌｏｔｍａｅｎｄｖｌｐｄｔｆｃｉｅｙｃｐｔｈｇ —ｃｌａａｙｏｄｅｃｏｓｏｅｇｒｈｈｂｅｅｅｏｅｅｅｔｌｏｅｗｉｔｅｌｅｓａｅｄｔａｉｓｏｖｈｒａ
２０第１期０８年１
计算机与现代化ＪＵＮＩＹＩＮＡＨＡＩＡＪＵＸＡＤＩＵＳ
总第１９期５
文章编号：０６２７（ｏ８１－０２０１０－５２ｏ）１６－４０４
基于商空间粒度的覆盖聚类算法的研究
ＡｂｔａｔＩａａｃｕｔｒｇｔｃｎｌｇ，ｄｔａｅｎｄｖｄｄｉｔａｕａｃｌｎ，ａｄｅｃｏｏｙｃａａｔｒｔｅｃｂｓｏｅｓｒｃ：ｎｄｔｌｓｅｎｅｈｏｏｙａａｈｓｂｅｉｉｅｎｏｎｔｒｌｏｏｙｎａｈｃｌｎｈｒｃｅｉｉｄｓｒｅｎｉｓｃｉｄｔｎｎｔｏ．Ｔｉｉｎａｉａｆａａｍｉｉｇａｄｋｏｌｄｅｄｓｏｅｎｂｔｒｄｔｎｌｓｅｎｌｏｔｍｓｃｎｎｔａａｍｉｉｇｍｅｈｄｈｓｓｏｅｂｓｃｗｙｏｔｎｎｎｗｅｇｉｃｖｒｇ，ｕａｉｏａｃｕｔｒｇａｇｒｈａｏｄｎｉｔｉｌｉｉ

基于遗传算法的3sat问题

%遗传算法具有隐含的并行性 "
因此 ! 我们尝试利用遗传算法来解决 0$!"# 问题 "
’ 主要算法描述
!! "染色体编码由 0$!"# 问题的描述中 ! 我们可以很自然地想到使用二进制编码是比较方便的 " 在此题中 ! 一个染色体就相当于对 8 个逻辑变量的一个指派 " !" "种群种群是个体的集合 ! 也就是染色体的集合 ! 我
九期 "
! " ! " # $ % & ’ "! ( ) * + %
"""""""""""
,--./!
" " Байду номын сангаас " "
""""""""""
@
余峰
& 中山大学计算机科学系 ! 广州 .:-,<. $ 要 # 本文利用遗传算法的具体求解问题无关性以及全局优化的优点 # 仿照自然界进化的过程 # 对染色体进行选择 $ 变异 $ 杂交 # 来解决 0123* 问题 # 并给出 ’AB 程序的实现 %
!!!!!!!!!!!!!!!!!!!!465Y9:94ZJ[Z5[Q-R !!! !!!!!!!!\ !!!!!!!!!!\ !!!!!!!!!!]796KR ! !!!\ \ \
&10 之间的随机数 2! 然后配对的两个串互相对应地交换从 230 到 &10 的位段 " 其特征是它可以产生
出与母体完全不同的子代串 $ 当两个母体串不相同时 %& 我们选取的杂交算子是这样的 #首先随机生成两个数 45675 和 89:;5<!45675 表示开始杂交的位置 !

Appeared in Computational Intelligence (1995), 11(2), 246-267

Incremental concept formation algorithms based on Galois (concept) latticesAppeared in Computational Intelligence (1995), 11(2), 246-267ROBERT GODIN, ROKIA MISSAOUI, HASSAN ALAOUIDépartement d'InformatiqueUniversité du Québec à MontréalC.P. 8888, Succursale "Centre Ville"Montréal (Québec), Canada, H3C 3P8(514)987-3088E-mail: godin.robert@uqam.caAbstract. The Galois (or concept) lattice produced from a binary relation has been proved useful for many applications. Building the Galois lattice can be considered as a conceptual clustering method since it results in a concept hierarchy. This article presents incremental algorithms for updating the Galois lattice and corresponding graph, resulting in an incremental concept formation method. Different strategies are considered based on a characterization of the modifications implied by such an update. Results of empirical tests are given in order to compare the performance of the incremental algorithms to three other batch algorithms. Surprisingly, when the total time for incremental generation is used, the simplest and less efficient variant of the incremental algorithms outperforms the batch algorithms in most cases.When only the incremental update time is used, the incremental algorithm outperforms all the batch algorithms. Empirical evidence shows that, on the average, the incremental update is done in time proportional to the number of instances previously treated. Although the worst case is exponential, when there is a fixed upper bound on the number of features related to an instance, which is usually the case in practical applications, the worst case analysis of the algorithm also shows linear growth with respect to the number of instances.Keywords. Concept lattice, Galois lattice, conceptual clustering, incremental algorithm, concept formation. Subject Categories. Learning, Knowledge Representation.1. IntroductionBuilding the Galois lattice of a binary relation has many important applications. Wille (1982) proposed to consider each element in the lattice as a concept and the corresponding graph (Hasse diagram) as the generalization/specialization relationship between concepts. From this perpective, the lattice represents a concept hierarchy. Each concept is a pair composed of an extension representing a subset of instances and an intension representing the common features for this set of instances. The task of inducing a concept hierarchy in an incremental manner is known as incremental concept formation or simply concept formation (Fisher, Pazzani & Langley, 1991; Gennari, Langley & Fisher, 1990) and is a fundamental process of human learning. In this article we present and analyze incremental algorithms for building Galois lattices. The main distinguishing characteristics of this method with respect to other well-known concept formation methods such as UNIMEM (Lebowitz, 1987), COBWEB (Fisher, 1987) and CLASSIT (Gennari et al., 1990) are that the concept hierarchy is a lattice, not restricted to a tree hierarchy, and although the algorithm is incremental, the resulting hierarchy does not depend on any tunable parameters, algorithm specifics, or the order in which the instances are acquired. A good overview of work on concept formation is found in (Fisher et al., 1991; Gennari et al., 1990).The practical application of using Galois lattices has resulted in many developments concerning the visualization of the lattice using computer generated diagrams (Wille, 1984), its simplification through decompositions or pruning heuristics (Ganter & Wille, 1989; Godin & Mili, 1993; Mephu Nguifo, 1993), its use in an interactive knowledge acquisition process and the generation of rules from the lattice (Guigues & Duquenne, 1986; Wille, 1992) which can be used for knowledge discovery in databases (Godin & Missaoui, 1994). In (Missaoui & Godin, 1994) we show how the concept lattice can usefully be (i) exploited (by avoiding combinatorial computations) as a framework for computing the basic measures usually calculated in the rough sets studies (Pawlak, 1991), (ii) used to define additional notions related to the intensional dimension of a classification. Carpinieto and Romano (1993) show how the lattice can be used successfully for class discovery and class prediction using several machine learning data sets. The Galois lattice can also be useful as a browsing space for information retrieval (Godin, Missaoui & April, 1993; Godin, Saunders & Gecsei, 1986). A structure very similar to the Galois lattice has been proposed for retrieval of classes in a large library based on class features (Oosthuizen, Bekker & Avenant, 1992). Recently, several other software engineering applications have been considered (Godin & Mili, 1993; Godin, Mineau, Missaoui, St-Germain & Faraj, 1995; Krone & Snelting, 1994). Extensions and refinements of the basic theory have been proposed for dealing with richer representations such as attribute-value pairs (Godin et al., 1986; Wille, 1992), conceptual graphs and use of feature taxonomies (Godin & Mili, 1993; Mineau & Godin, 1994).Many algorithms have been proposed for generating the Galois lattice from a binary relation (Bordat, 1986; Carpineto & Romano, 1993; Chein, 1969; Fay, 1975; Ganter, 1984; Godin, Missaoui & Alaoui, 1991; Malgrange, 1962; Norris, 1978). Only two of these (Carpineto & Romano, 1993; Godin et al., 1991) incrementally update the lattice and the corresponding Hasse diagram. This feature is necessary in many applications. As argued in (Frawley, Piatetsky-Shapiro & Matheus, 1991), two important features for the practical feasibility of automated knowledge discovery methods are that they should be incremental and efficient. From the currently known algorithms, the Bordat algorithm (Bordat, 1986) builds the Hasse diagram but is not incremental, and the Norris algorithm is incremental but it does not build the Hasse diagram (Guénoche, 1990). Furthermore, very little is known about the complexity of these algorithms. The algorithms we present in this article generate both the lattice and Hasse diagram incrementally. Furthermore, a detailed comparison is made to show the relative complexity of these algorithms.The following section recalls basic definitions related to the concept of Galois lattice. Section 3 presents incremental algorithms after giving a characterization of the modifications induced by adding a new instance.Section 4 presents detailed comparisons with other well-known algorithms, some of which were adapted to generate the Hasse diagram.2. Basic definitionsThis section recalls basic definitions related to the concept of Galois lattice for a binary relation. We limit our description to the aspects relevant to this paper. The following references give a more complete and formal coverage of the subject (Barbut & Monjardet, 1970; Davey & Priestley, 1992; Wille, 1982; Wille, 1992). Given a set of instances E and a set of features E', and a binary relation R between these two sets, R Õ E x E' (Table 1), there is a unique Galois lattice corresponding to this binary relation (Figure 1). Each element of the lattice is a pair, noted (X, X'), composed of a set X Œ P(E) and a set X' Œ P(E'). P(A) represents the powerset of A. Each pair must be a complete pair as defined in the following.A pair (X, X') from P(E) x P(E') is complete with respect to R if and only if the two following properties are satisfied:1) X' = f(X) where f(X) = {x' ∈ E' | ∀ x ∈ X, xRx'}.2) X = f'(X') where f'(X') = {x ∈ E | ∀ x' ∈ X', xRx'}Notice that f(Ø) = E' and f'(Ø) = E. Only maximally extended pairs are kept in the hierarchy. If a subset X of instances is not maximally extended in the sense that there are other instances described by the features common to X, then the subset has to be extended to contain this instance. Symmetrically, if X' does not contain a feature that is common to every instance in X, it has to be extended to include this feature. The idea of maximally extending the sets is formalized by the mathematical notion of a closure in ordered sets. A closure in an ordered set (E, =) is an application, h: E → E, having the following properties:i) ∀ x ∀ y, x = y ⇒ h(x) = h(y)ii) ∀ x, h(x) = xiii) ∀ x, h(h(x)) = h(x)E'R|a b c d e f g h i______|________________________________________________________1| 1 010010102|101000101E3|1001001014|0110010105|010010100______|________________________________________________________Figure 1. Matrix representation of a binary relation R.({1,2,3},{a})({1,2,4},{c})({4,5},{b})({2,3,5},{g})(Ø,{a,b,c,d,e,f,g,h,i})Figure 1. Galois lattice.The image h(x) of x in E is called h-closure of x; if x = h(x), x is h-closed or a closed element for h. The applications h = f o f' and h' = f' o f are respectively closures in E and E'. Therefore the h-closed elements of E correspond to the maximally extended instance subsets and they form a lattice H isomorphic to the Galois lattice. The isomorphism is simply to put into correspondence the closed elements with the X sets of G. Symmetrically, the h'-closed elements of E' are the maximally extended feature subsets and they correspond to the X' sets of the G.The couple of functions (f, f') is a Galois connection between P(E) and P(E') and the Galois lattice G for the binary relation is the set of all complete pairs (Barbut & Monjardet, 1970) with the following partial order: given C1=(X1,X'1) and C2=(X2,X'2), C1 < C2 <=> X'1⊂ X'2 .There is a dual relationship between the X and X' sets in the lattice, i.e.,X'1⊂ X'2 <=> X2⊂ X1and therefore,C1 < C2 <=> X2⊂ X1.The partial order is used to generate the lattice graph in the following way: there is an edge (C1,C2) if C1 < C2 and there is no other element C3 in the lattice such that C1 < C3 < C2 . C1 is called parent of C2 and C2 child of C1. The graph is usually called a Hasse diagram. When drawing a Hasse diagram, the edge direction is either downwards or upwards. The Hasse diagram therefore reveals the generalization/specializationrelationship between the concepts corresponding to the subset relationship between the extension or intensionof the concepts. Given, C, a set of elements from G, inf(C) and sup(C) will denote respectively the greatest lower bound and least upper bound of the elements in C. A fundamental property of G is that it is a completelattice since for any subset of G (complete pairs) there exists a unique greatest lower bound and a unique leastupper bound (Davey & Priestley, 1992).3. Incremental Update AlgorithmsBefore presenting the incremental update algorithms, a description of the update problem is first given inSection 3.1. The presentation of this paper extends on the material in (Godin et al., 1991) by giving a morecomplete and detailed presentation of the algorithms. Some of this material is also used in (Godin & Missaoui, 1994) to describe an algorithm for the incremental update of rules generated from a database using the lattice.3.1 Characterization of the update problemFor most applications, it is necessary to generate not only the complete pairs of the lattice G but also the Hasse diagram of the lattice. It is also important in many applications to be able to incrementally add a new instance, x*, by modifying the lattice without having to regenerate it from scratch. For example, adding the new element, x* = 6, related to the set, f*({x*}) = {b, c, f, i}, would result in the modifications shown in Figure 2 representing the new lattice noted G*. G* can be obtained from G by taking all the pairs in G and modifying the X part of the nodes for which X' ⊆ f*({x*}) by adding x*. The pairs that remain intact are called old pairs and the others, modified pairs in G*. In addition, new pairs are created. These new pairs are always in the form (Y ∪ {x*}, Y' ∩ f*({x*})) for some pair (Y, Y') ∈ G. They correspond to new X' sets with respect to G. The related (Y, Y') node in G is called the generator for this new pair. If the generators can be characterized in some manner, the new nodes can be generated from them. Proposition 1 is one possible characterization and is the basis for Algorithm 1.Proposition 1.If (X, X') = inf{(Y,Y')∈ G| X'=Y' ∩ f*({x*})} for some set X' and there is no node of the form (Z, X') ∈ G then (Y, Y') is the generator of a new pair (X = Y ∪ {x*}, X' = Y' ∩ f*({x*})) ∈ G*.#1Figure 2. Modifications resulting from adding element 6 related to the set {b, c, f, i}.Any new X' set in G* will have to be the result of intersecting f*({x*}) with some Y' set already present in the lattice G. There may be many pairs in G that give a particular new intersection in this manner. For example, in Figure 2, the new X' set, {i}, in the new pair #15, can be formed by intersecting f*({x*}) = {b, c, f, i} with the Y' set of pair #7, {a, g, i}, or the Y' set of pair #12, {a, d, g, i}. However, there is only one of these that is the generator of the new pair, #7 in this example. This pair is the smallest (inf) old pair that produces the intersection. Unicity of the greatest lower bound is guaranteed by the lattice property of G. In Figure 2, the generator pairs for the new pairs #15, #16, #17, #18, #19 are respectively #7, #11, #8, #13, #14.One minor problem is for the case of the generator being sup(G) = (Ø, E'). If the new instance x* contains new features not contained in E', there will be no generators for the pair ({x*}, f({x*}). In practical applications, we may want E' to grow as new features are encountered. This is easily taken into account by simply adding the new features to E' as a first step in the algorithm and therefore the characterization remains valid.In addition, the edges of the Hasse diagram also have to be updated. First, the generator of a new pair will always be a child to the new pair in the Hasse diagram. The children of old pairs do not change. The parents of generator pairs, however, have to be changed. The generator is the only old pair that becomes a child of athe new pair. There may be another child but it will be a new pair. For example in Figure 2, there is an edgefrom the new pair # 17 to its generator #8 and there is another child #18 that is a new pair. The parents of oldpairs that are not generators remain unchanged.Also, the parents of modified pairs never change. However, the children of some modified pairs may bemodified. There may be new children that are new pairs. This implies that some old children may have to beremoved if the new children fall in between the old child and the modified pair. This is the case when the new pair falls between a generator and one of its parents. The result is that the edge from that parent to thegenerator (e.g., edge(#3,#8) in Figure 2) is replaced by two edges, one from the parent to the new pair (e.g.,edge(#3,#17)) and one from the parent to the generator (e.g., edge(#17,#8)).Table 2 is a summary of the modifications resulting from the update process with respect to our classificationof pairs. The updating is decomposed into four aspects: the X set, the X' set, the parents and children. The firstthree categories represent the pairs that are in G and remain in G* with possibly some modifications. As opposed to category 3, categories 1 and 2 are pairs that have to be modified. The fourth category is for the new pairs.TABLE 2. Summary of the modifications in the update process.Type of node X set X' set parents children1.Modified node (Y, Y') ∈ G Add x*No change No change Add new nodesin some casesRemove agenerator whena new node inbetween2.Old node generator of N No change No change Add new nodeN and removeparent when Nis in betweenthis parent andthe generatorNo change3.Old node nongeneratorNo change No change No change No change4.New node having generator (Y, Y')Y ∪ {x*}Y' ∩ f({x*})Old nodes andnew nodesGenerator andpossibly newnode3.2 Basic algorithmsAlgorithm 1 is a basic algorithm and a refinement which could be considered for large data sets is given in Algorithm 2. For a node H, X(H) and X'(H) will denote respectively the extent (first component) and intent (second component) of the complete pair. Sup(G) and inf(G) will denote respectively the lowest upper bound and greatest lower bound of the lattice.The lattice is initialized with one element: (Ø, Ø). This means that E = E' = Ø. The algorithm updates E and E' as new elements are added. If we suppose that E and E' contain in advance every element with an empty R, the lattice would be initialized with the two elements: (E,Ø) and (Ø,E'). This would slightly simplify the algorithm because adjusting E' by adding new elements from f({x*}) would not be necessary.Algorithm 1 Add (x*: new object; f({x*}) : elements related to x* by R);BEGIN1{Adjust (sup(G)) for new elements in E'}2IF sup(G) = (Ø, Ø) THEN3Replace sup(G) by: ({x*},f({x*}))4ELSE5IF NOT (f*({x*}) ⊆ X'(sup(G))) THEN6IF X(sup(G)) = Ø THEN X'(sup(G)) := X'(sup(G)) ∪ f({x*})7ELSE8Add new pair H {becomes sup(G*)}: (Ø,X'(sup(G)) ∪ f({x*}));9Add new edge sup(G)->H10END IF11END IF;12C[i] <- {H: ||X'(H)||=i}; {Class pairs in buckets with same cardinality of the X' sets}13C'[i] <- Ø; {Initialize the C' sets}14{Treat each bucket in ascending cardinality order}15.FOR i : 0 TO maximum cardinality DO16FOR each pair H in C[i]17 IF X'(H) ⊆ f({x*}) THEN {modified pair}18Add x* to X(H);19 Add H to C'[i];20 IF X'(H) = f({x*}) THEN exit algorithm21 ELSE{old pair}22 int <- X'(H) ∩ f({x*});23 IF ¬∃ H1 ∈ C'[||int||] such that X'(H1)=Int THEN {H is a generator}24 Create new pair H n=(X(H) ∪{x*},int) and add to C'[||int||];25Add edge H n -> H;26 {Modify edges}27 FOR j : 0 TO ||int||-128FOR each H a∈ C'[j]29IF X'(H a) ⊂ int {H a is a potential parent of H n}30 parent<-true;31 FOR each H d child of H a32IF X'(H d) ⊂ Int parent<-false; exit FOR END IF33 END FOR;34 IF parent35IF H a is a parent of H36 eliminate edge H a->H END IF;37 Add edge H a->H n38 END IF39 END IF40 END FOR41END FOR;42 IF Int=f*({x*}) THEN exit algorithm END IF43 END IF44 END IF45END FOR46END FOR47END IFEND {Add}As introduced earlier, the basic idea is to generate G* from G, changing the X sets, and links of the pairs already in G and adding some new pairs for the new X' sets and linking them. The new pairs are generated by finding the generator pairs using the characterization of Proposition 1. The function of lines 1 to 11 is to take into account the case when new features appear by adjusting E' in sup(G). Line 12 classifies the pairs intobuckets with the same ||X'||. The main loop (lines 15-46) iterates on every pair in ascending ||X'||. New pairsare obtained by systematically trying to generate a new intersection from each pair (Y, Y') already in the lattice by intersecting Y' with f({x*}) (line 22). Verifying that this intersection is not already present is doneby looking in the sets already encountered which are subsets of f({x*}) (line 23). These sets are kept in C'(line 19, line 24). This is valid because the pairs are treated in ascending cardinality of the X' sets.Furthermore, the first pair encountered which gives a new intersection is the generator of the new pair because it is necessarily the infimum. Thus, we compute the X set of the New pair by adding x* to the generator's X set (line 24). Also, there is automatically an edge from the new pair to the generator (line 25). When a new pair is added, some edges have to be added from modified or other new pairs to the new pair. The candidates are necessarily in the C' sets since there X' set must be a subset of f({x*}). These parents of the new pair are determined by examining the pairs in C' (lines 27-41), testing if the X' sets are subsets of the X' set of the new pair (line 29) and verifying that no child of the potential parent has this property (lines 30-34). It is necessary to eliminate an edge between the new parent and the generator when there is such an edge (lines 34-35). The exhaustive iteration on the C' set is a fairly simple minded approach to modifying the edges and could be optimized somewhat. Lines 17-20 process the modified pairs and the rest of the treatment is skipped for these pairs because they cannot be generators.Although in the worst case, the lattice may grow exponentially, the growth is linearly bounded with respect to||E||, when there is a fixed upper bound on ||f({x})||. This is the case in practical applications such asinformation retrieval where for each document x in E there is a reasonable fixed upper bound on the number of terms from E' related to x (Godin et al., 1986). Clearly any new X' other than the updating of E' is a subset of f({x*}). This implies that there is at most 2||f({x*})|| additional pairs generated by adding x*. If there is an upper bound K, independent of the ||E||, on ||f({x})||, the lattice grows linearly with respect to ||E||: ||G|| = 2K ||E||.Experimental applications and theoretical results based on a uniform distribution hypothesis show that the average growth factor obtained is far less than the 2K bound. In every application, we observe that: ||G|| = k ||E||,where k is the mean value for ||f({x})||. Furthermore, theoretical estimations based on different values of ksuggest that the lattice may grow linearly with respect to the number of terms per document. Using a formuladerived in (Godin et al., 1986) for the mean value of ||G*|| with random assignment of index terms within documents, we have computed the theoretical values for different values of k, keeping other parameters fixed. Again, we obtain ||G*||/n < k and the observed growth is linear in k (Godin et al., 1993). More details on the complexity of the lattice are found in (Godin, 1989; Godin et al., 1986).The time complexity of iterating on the pairs for creating the intersections and verifying the existence of theintersection in C' is the major factor in analyzing the complexity of the algorithm. Although the linkingprocess is complicated, the pairs affected are limited and this part is only done when a generator pair is encountered. This is why we give a fairly straightforward algorithm for this process. Since ||G|| = 2K ||E|| and the ||C'|| is bounded by 2K, the whole time complexity is:O(||G|| 2K) = O(22K ||E||),which is O(||E||) for a fixed K. In applications where k is large, a more refined algorithm for the edge updating is to consider.Algorithm 1 iterates on almost every pair of G. It is possible to do the work by looking at a limited subset of G: the pairs that correspond to generators and modified pairs. This can be done by keeping for each x' in E' a pointer P x' on the smallest pair containing x' and using these pointers as entry points for a top-down depth-first search starting with every x' in f({x*}). This guarantees that any pair encountered will have at least x' in common with f({x*}). Algorithm 2 performs the search in this manner. Some minor modifications have to be introduced for maintaining these pointers but the details are left out.Algorithm 2 Add(x*: new object; f({x*}) : elements related to x by R);Procedure SelectAndClassifyPairsProcedure Search(H: pair)BEGINMark H as visited and add H to C[||X'(H)||];FOR each H d child of HIF H d is not marked as visitedSearch(H d)END IFEND FOREND {search}BEGINMark inf(G) as visited and add inf(G) to C[||X'(inf(G))||]FOR each x' in f({x*})·Search(P x')END FOREND {SelectAndClassifyPairs}BEGIN....12. SelectAndClassifyPairs;....END. {Add}Compared to Algorithm 1, Algorithm 2 can offer significant gains. For example, in an information retrieval application involving more than 3000 documents (Godin, 1986), the proportion of nodes treated by Algorithm 2 varied between 10% to 15% of the whole lattice. However, even if there is an important gain with this strategy compared to the basic Algorithm 1, the asymptotical complexity is unchanged with respect to Algorithm 1 because generally the subset of pairs is of the same order of complexity as the whole lattice.3.3 Other variantsIn this section we describe some variants of the previous algorithms which could prove to be useful in some contexts. However, we have not considered these variants in our experiments. Although the algorithm by Norris (Norris, 1978). generates the pairs using the same basic method as in Algorithm 1 by scanning each node in G, testing for modified pairs and testing for generators in order to add new pairs, it differs from ours in the way the generator is characterized. In the Norris algorithm, the complete pair property of the potential new pair, (Y ∪ {x*},Y' ∩ f*({x*})), is explicitly computed by testing if f'(Y' ∩ f*({x*})) = Y ∪ {x*}. Therefore, the information necessary for this computation must be incrementally maintained and the computation time grows with ||E|| as opposed to our approach for testing generators. Furthermore, the Norris algorithm does not address the graph update part. Their method of determining generators could nevertheless be incorporated in Algorithms 1 and 2 giving other variants that might be advantageous in some cases. In particular, when the average for ||f({x})|| is large, the number of pairs in C' can grow so much that computing the f' function would be less expensive.Another possible variation to be considered is to use the generator characterization of Proposition 2. It is a more local characterization since it only depends on conditions expressed in terms of the candidate generator and its parents.Proposition 2.(Y,Y') ∈ G is the generator of the new pair (X,X') ∈ G* if and only ifY' ¬⊆ f*({x*}) and for every parent (Z, Z') of (Y, Y') in G, Y' ∩ f*({x*}) ¬⊆ Z'This would mean replacing the search of the C' set to determine if the intersection is new by looking at the set of parents of the pair to be tested. Empirical evidence from a large information retrieval application (Godin, 1986) suggests that the mean number of parents (which is equal to the mean number of children) grows proportionally to ln(||E||), and that this number is often smaller than the number of pairs in C'. This suggests some potential saving in CPU time if the characterization of Proposition 2 is used instead of searching C'. The algorithm proposed in (Carpineto & Romano, 1993) uses this approach to generate the new nodes. One major disadvantage in their algorithm with respect to Algorithm 2 is that it iterates over all the nodes in the lattice as our Algorithm 1. Since this strategy systematically considers the set of parents of a node, a natural way proposed in (Godin, 1986) is to use a bottom-up strategy starting with sup(G) and searching the lattice with a depth-first search and using the characterization of Proposition 2 for the generator. This strategy however looks at more nodes than Algorithm 2 because some nodes for which X'(H) ∩ f*({x*}) = Ø have to be treated. In particular, for sup(G), a large number of non relevant parent nodes have to be considered.To deal with this problem, another algorithm has been tested (Godin, 1986) which searches the lattice as in Algorithm 2 using a depth first search based on the characterization of Proposition 2 but in a different manner than suggested above. Instead of verifying the characterization of Proposition 2 directly by looking at all the parents at once, nodes are created temporarily until a parent that contradicts the property is found. Therefore, some nodes that have been created by mistake will be deleted. The algorithm is more complex than Algorithm 2 and did not give any improvement for our experimental applications. There may be some advantage in this strategy when central memory is limited because only local information is used to process a node.A major advantage of using the characterization of Proposition 2 for generating new nodes is when considering parallel algorithms. Given the localized characterization of the generators, these can be tested independently in parallel. A parallel algorithm for building the lattice in such a manner is proposed in (Ouaggag, Godin, Missaoui & Mili, 1993).3.4 Removing instancesRemoving an instance from the lattice might also be considered for some applications. This problem is much easier than adding an instance. One simple method is to start from the lowest pair containing the instance to remove and to search upwards recursively, verifying if the parents are still necessary. When deleting a parent, care must be taken for relinking nodes, if necessary. The number of pairs to examine is limited to pairs (X, X') where X' ⊆ f({x}), x being the instance. This process will therefore be much more efficient than adding an instance. More details are found in (Godin, 1986).。

聚集(三)

n i =1
I
, Q)
最小
对所有的j=1,…,m有
f r ( A j = q j | X ) ≥ f r ( A j = ck , j | X ) f r ( A j = ck , j | X ) = nck , j n
q j ≠ ck , j
Nck,j是在属性上Ai值为ck,j的对象数
K模算法
1.为每个簇选择初始模，共k个 2.根据d，把对象分配给最近的簇。根据定理重新计算簇的模 3.计算每个对象对当前模的相异度，重新分配对象到簇 4.重复上述2，3过程，直到簇中的对象不再发生变化
p q p1 p o q
DBSCAN基本思想
簇：基于密度可达性的最大的密度相连对
象的集合噪音：不在任何簇中的对象边界对象：不是核心对象，但在簇中，即至少从一个核心对象直接可达
噪音边界点核心对象
ε = 1cm
MinPts =Biblioteka 5DBSCAN算法1）任意选择没有加簇标签的点 p 2）找到从p关于ε and MinPts 密度可达的所有点 3）如果|Nε(q)|≥MinPts ，则p是核心对象,形成一个新的簇，给簇内所有的对象点加簇标签 4）如果p 是边界点, 则处理数据库的下一点 5）重复上述过程，直到所有的点处理完毕
如果 ,（t为一个指定域值），则dist 为NONE. 否则，dist不变.
conflt >t n
统计信息（3）
n=220 m=20.27 s=2.37 min=3.8 max=40 dist=normal
dist4≠dist confl=10 confl/n=0.045<0.05
自顶向下地回答查询
定义
给定半径ε和MinPts ，每个聚类中的对象的 ε-邻域中至少包含MinPts个对象给定对象集合D

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

DataReduction,Exact,andHeuristicAlgorithmsforCliqueCoverJensGramm∗JiongGuo†FalkH¨uﬀner†RolfNiedermeier†

AbstractTocovertheedgesofagraphwithaminimumnumberofcliquesisanNP-completeproblemwithmanyapplications.Thestate-of-the-artsolvingalgorithmisapolynomial-timeheuristicfromthe1970’s.Wepresentanimprovementofthisheuristic.Ourmaincontribution,however,isthedevelopmentofeﬃcientandeﬀectivepolynomial-timedatareductionrulesthat,combinedwithasearchtreealgorithm,allowforexactproblemsolutionsincompetitivetime.Thisisconﬁrmedbyexperimentswithreal-worldandsyntheticdata.Moreover,weprovetheﬁxed-parametertractabilityofcoveringedgesbycliques.

1IntroductionDatareductiontechniquesforexactlysolvingNP-hardcombinatorialoptimizationproblemshaveprovenuse-fulinmanystudies[1,3,10,16].Thepointisthatbypolynomial-timeexecutablereductionrulesmanyinputinstancesofhardcombinatorialproblemscanbesigniﬁ-cantlyshrunkand/orsimpliﬁed,withoutsacriﬁcingthepossibilityofﬁndinganoptimalsolutiontothegivenproblem.Forsuchreducedinstancesthenoftenexhaus-tivesearchalgorithmscanbeappliedtoeﬃcientlyﬁndoptimalsolutionsinreasonabletime.Hencedatareduc-tiontechniquesareconsideredasa“must”whentry-ingtocopewithcomputationalintractability.StudyingtheNP-completeproblemtocovertheedgesofagraphwithaminimumnumberofcliques((Edge)CliqueCover)1,weaddanewexampletothesuccessstoryofdatareduction,presentingbothempiricalaswellastheoreticalﬁndings.tionruleswiththesearchtree,clearlyoutperformingheuristicapproachesinseveralways.Forinstance,wecansolvereal-worldinstancesfromastatisticalapplica-tion[18]—sofarsolvedheuristically[18,11]—optimallywithouttimeloss.Thisindicatesthatforasigniﬁcantfractionofreal-worldinstancesourexactapproachisclearlytobepreferredtoaheuristicapproachwhichiswithoutguaranteedsolutionquality.Wealsoex-perimentedwithrandomgraphsofdiﬀerentdensities,showingthatourexactapproachworksextremelywellforsparsegraphs.Inaddition,ourempiricalresultsre-vealthatfordensegraphsadatareductionrulethatwasdesignedforshowingtheproblemkerneldoesverywell.Inparticular,thisgivesstrongempiricalsupportforfurthertheoreticalstudiesinthedirectionofim-provedﬁxed-parametertractabilityresultsforCliqueCover,nicelydemonstratingafruitfulinterchangebe-tweenappliedandtheoreticalalgorithmicresearch.Notalldetailsandproofsaregiveninthisextendedabstract.

2ImprovedHeuristicKellerman[12]proposedapolynomial-timeheuristicforCliqueCover.ThisheuristicwasimprovedbyKou,Stockmeyer,andWong[14]byaddingapostprocessingstep;thisversionhasbeensuccessfullyappliedtoinstructionschedulingproblems[19]andintheanalysisofstatisticaldata[11].Clearly,bothversionsoftheheuristicruninpolynomialtimebutinbothcasesamorepreciseanalysisoftheirruntimewasnotgiven.Inthissection,forann-vertexandm-edgegraph,weanalyzetheruntimeofbothheuristicsasO(nm2).WeshowhowtoslightlymodifyKellerman’sheuristicsuchthatwecanimprovetheruntimeofbothheuristicstoO(nm)byacarefuluseofadditionaldatastructures.TheCliqueCoverheuristicbyKellerman[12],further-onreferredtoasCC-Heuristic1,isdescribedinpseudo-codeintheleftcolumnofFig.1.Tosimplifynotation,weuseV={1,...,n}asvertexset.Thealgorithmstartswithanemptycliquecoverandsuccessively,fori=1,...,n,updatesthecliquecovertoaccountforedges{i,j}withjarenoedgesbetweenthecurrentlyprocessedvertexiandthesetWofitsalreadyprocessedneighbors,anewcliqueiscreated,containingonlyi.Otherwise,wetrytoadditoexistingcliqueswherepossible.Afterthis,theremaystillremainuncoverededgesbetweeniandW.Tocoverthoseedges,wecreateanewcliquecontainingianditsneighborsfromoneoftheexistingcliquessuchthatthenumberofedgescoveredbythisnewcliqueismaximized.WerepeatthisprocessuntilalledgesbetweeniandverticesfromWarecovered.Toimproveboththeanalysisandinsomecasesthe

Figure2:Theadd-to-cliquesubroutine.resultoftheheuristic,weassumethatthealgorithmisabortedassoonasmcliquesaregenerated.Inthatcasewecansimplytakethesolutionthatcoverseachedgeseparatelybyatwo-vertexclique.ItisrelativelystraightforwardtoobservethatCC-Heuristic1runsinO(nm2)time.Toimprovetheruntime,theideaistoidentifythe“hotspots”andtousecachingdatastructuresthatwillgivethecomputationofthesehotspotsbasicallyforfree,andspreadtheworkofkeepingthisstructureup-to-datethroughouttherestoftheprogram.Wemaintainthefollowingtwotablesforverticesi,1≤i≤n,whereCl,1≤l≤m,arethecliquesgeneratedbythealgorithm:

S[i]:={l|Cl⊆N<(i)},I[i]:={l|Cl∩N<(i)=∅},sorteddescendingaccordingto|Cl∩N<(i)|,

wherethesetN<(i)foravertexiisdeﬁnedasN<(i):={j|jwith{i,j}∈Ecalleduncoveredif∀1≤l≤m:{i,j}⊆Cl.ThesetN>(i)isdeﬁnedanalogously.ThetableS[i]isusedtokeeptrackoftheexistingcliquestowhichaverteximaybeadded.UsingS[i]willavoidtoinspecteveryexistingcliqueindividuallyinordertotestwhethervertexicanbeaddedtotheclique(seelines10and11forCC-Heuristic1inFig.1).ThetableI[i]isusedtokeeptrackoftheexistingcliqueswithwhichavertexihasan“uncoveredoverlap”,i.e.,withwhichisharesyetuncoverededges.Thecliquesarekeptsortedbythesizeofthisuncoveredoverlap.UsingtableI[i],wewillavoidthecostlycomputationofacliquewithmaximumuncoveredoverlapinline18.Thus,tablesSandIhelptoreplacecostlyoperationsduringtheheuristicbyconstant-timelook-upsinthesetables.Thiscomesatthepriceofhavingtokeepthesetablesup-to-datethroughouttheheuristic.Wenowﬁrstexplainhowthenewlyintroducedtablesareupdatedthroughouttheheuristicand,then,

Data reduction, exact, and heuristic algorithms for clique cover

合集下载

Bibliography

整理蛛形学常用名称

复杂网络中聚类算法总结

第七讲卡诺图

MIT麻省理工学院算法导论公开课 Problem Set 4 solution

基于联合意义度量的Top-K图模式挖掘

《人工智能》课程习题

6Cluster

禁忌搜索算法求解图节点着色问题

DELVER

loop BP

基于商空间粒度的覆盖聚类算法的研究

基于遗传算法的3sat问题

Appeared in Computational Intelligence (1995), 11(2), 246-267

聚集(三)

文档推荐

最新文档

Data reduction, exact, and heuristic algorithms for clique cover

合集下载

Bibliography

整理蛛形学常用名称

复杂网络中聚类算法总结

第七讲 卡诺图

MIT麻省理工学院 算法导论公开课 Problem Set 4 solution

基于联合意义度量的Top-K图模式挖掘

《人工智能》课程习题

6Cluster

禁忌搜索算法求解图节点着色问题

DELVER

loop BP

基于商空间粒度的覆盖聚类算法的研究

基于遗传算法的3sat问题

Appeared in Computational Intelligence (1995), 11(2), 246-267

聚集(三)

文档推荐

最新文档

第七讲卡诺图

MIT麻省理工学院算法导论公开课 Problem Set 4 solution