Modularity and community detection in bipartite networks.
- 格式:pdf
- 大小:207.06 KB
- 文档页数:11
Community Detection in Social Networks via GraphicalGameJaewon Yanga project proposal for STATS375,Winter2010-11January17,20111BackgroundCommunities are regarded as an important tool to understand and represent social networks. By nature,the rigorous definition of a community has not existed and many researchers have proposed their own definition of community or own algorithm forfinding communities.The most prominent advance among the recent works is the concept of modularity by Newman et al.[New06]After Newman et al.,extensive amount of researches has focused on the community detection of the optimization of modularity.Though mathematically elegant, community detection solely by optimization of modularity does not provide clean intuitive explanation about the formation of communities.In social networks,each node is individual person,who makes one’s decision actively,but most of researches did not focus on the motivation of individuals.2Community Detection via Graphical GameRecently,Wei Chen et al.[CLSW10]proposed a game-theoretic framework to view commu-nity detection.Wei Chen et al.[CLSW10]proposed the structure of a graphical game of community formation,where the graph of the game is the same as the graph of a social network,and each node’s payoffincreases with the number of communities that the node shares with the node’s neighbors.They proved the existence of pure Nash equilibria and proposed a greedy-hill climbing algorithm to reach a”local”equilibria.As game-theory has been proved to be powerful tool to explain individuals’collective behavior,it would provide intuition about why and how communities are formed between individuals.In this project, I will explore further possibilities of applying graphical game to community detection.13Directions3.1Implementation of Wei Chen et al.I will begin with implementing Wei Chen et al.’s method to detect communities.I will reproduce the same benchmark graphs as in[CLSW10],and check whether the method works as well as in[CLSW10].Also,I will test the method in different benchmark graphs and evaluate performance.3.2Finding approximation of pure Nash equilibria[CLSW10]called the equilibria they found a”local”equilibria in the sense that each indi-vidual is allowed to change its policy by small amount.In the project,I will try tofind pure Nash equilibria of the game.In particular,I will refer to the recent achievement in reduction of a graphical game to Markov networks,[Kea07,DP06]and apply several techniques for approximate inference of Markov networks.3.3Varying the structure of the game[CLSW10]proposed the payoffthat is similar to modularity.I will try different payoffs and evaluate their effects on the performance of the method.References[CLSW10]Wei Chen,Zhenming Liu,Xiaorui Sun,and Yajun Wang.A game-theoretic framework to identify overlapping communities in social networks.Data Min.Knowl.Discov.,21:224–240,September2010.[DP06]Constantinos Daskalakis and Christos puting pure nash equilibria in graphical games via markov randomfields.In Proceedings of the7thACM conference on Electronic commerce,EC’06,pages91–99,New York,NY,USA,2006.ACM.[Kea07]Michael Kearns.Chapter7graphical games,2007.[New06]M.E.J.Newman.Modularity and community structure in networks.Proceedings of the National Academy of Sciences,103(23):8577–8582,June2006.2。
Physica A 374(2007)483–490Identification of overlapping community structure in complexnetworks using fuzzy c -means clusteringShihua Zhang a,Ã,Rui-Sheng Wang b ,Xiang-Sun Zhang aa Academy of Mathematics &Systems Science,Chinese Academy of Science,Beijing 100080,Chinab School of Information,Renmin University of China,Beijing 100872,ChinaReceived 28June 2006Available online 7August 2006AbstractIdentification of (overlapping)communities/clusters in a complex network is a general problem in data mining of network data sets.In this paper,we devise a novel algorithm to identify overlapping communities in complex networks by the combination of a new modularity function based on generalizing NG’s Q function,an approximation mapping of network nodes into Euclidean space and fuzzy c -means clustering.Experimental results indicate that the new algorithm is efficient at detecting both good clusterings and the appropriate number of clusters.r 2006Elsevier B.V.All rights reserved.Keywords:Overlapping community structure;Modular function;Spectral mapping;Fuzzy c -means clustering;Complex network1.IntroductionLarge complex networks representing relationships among set of entities have been one of the focuses of interest of scientists in many fields in the recent years.Various complex network examples include social network,worldwide web network,telecommunication network and biological network.One of the key problems in the field is ‘How to describe/explain its community structure’.Generally,a community in a network is a subgraph whose nodes are densely connected within itself but sparsely connected with the rest of the network.Many studies have verified the community/modularity structure of various complex networks such as protein-protein interaction network,worldwide web network and co-author network.Clearly,the ability to detect community structure in a network has important practical applications and can help us understand the network system.Although the notion of community structure is straightforward,construction of an efficient algorithm for identification of the community structure in a complex network is highly nontrivial.A number of algorithms for detecting the communities have been developed in various fields (for a recent review see Ref.[1]and a recent comparison paper see Ref.[2]).There are two main difficulties in detecting community structure.The first is that we don’t know how many communities there are in a given network.The usual drawback in many /locate/physa0378-4371/$-see front matter r 2006Elsevier B.V.All rights reserved.doi:10.1016/j.physa.2006.07.023ÃCorresponding author.E-mail addresses:zsh@ (S.Zhang),wrs@ (R.-S.Wang),zxs@ (X.-S.Zhang).algorithms is that they cannot give a valid criterion for measuring the community structure.Secondly,it is a common case that some nodes in a network can belong to more than one community.This means the overlapping community structure in complex networks.Overlapping nodes may play a special role in a complex network system.Most known algorithms such as divisive algorithm [3–5]cannot detect them.Only a few community-detecting methods [6,7]can uncover the overlapping community structure.Taking into account the first difficulty,Newman and Girvan [8]has developed a new approach.They introduced a modularity function Q for measuring community structure.In order to write the context properly,we refer to a similar formulation in Ref.[5].In detail,given an undirected graph/network G ðV ;E ;W Þconsisting of the node set V ,the edge set E and a symmetric weight matrix W ¼½w ij n Ân ,where w ij X 0and n is the size of the network,the modularity function Q is defined asQ ðP k Þ¼X k c ¼1L ðV c ;V c ÞL ðV ;V ÞÀL ðV c ;V ÞL ðV ;V Þ 2"#,(1)where P k is a partition of the nodes into k groups and L ðV 0;V 00Þ¼P i 2V 0;j 2V 00w ði ;j Þ.The Q function measuresthe quality of a given community structure organization of a network and can be used to automatically select the optimal number of communities k according to the maximum Q value [8,5].The measure has been used for developing new detection algorithms such as Refs.[5,9,4].White and Smyth [5]showed that optimizing the Q function can be reformulated as a spectral relaxation problem and proposed two spectral clustering algorithms that seek to maximize Q .In this study,we develop an algorithm for detecting overlapping community structure.The algorithm combines the idea of modularity function Q [8],spectral relaxation [5]and fuzzy c -means clustering method[10]which is inspired by the general concept of fuzzy geometric clustering.The fuzzy clustering methods don’t employ hard assignment,while only assign a membership degree u ij to every node v i with respect to the cluster C j .2.MethodSimulation across a wide variety of simulated and real world networks showed that large Q values are correlated with better network clusterings [8].Then maximizing the Q function can obtain final ‘optimal’community structure.It is noted that in many complex networks,some nodes may belong to more than one community.The divisive algorithms based on maximizing the Q function fail to detect such case.Fig.1shows an example of a simple network which visually suggests three clusters and classifying node 5(or node 9)intoFig.1.An example of network showing Q and e Qvalues for different number k of clusters using the same spectral mapping but different cluster methods,i.e.k -means and fuzzy c -means,respectively.For the latter,it shows every node’s soft assignment and membership of final clusters with l ¼0:15.S.Zhang et al./Physica A 374(2007)483–490484two clusters at the same time may be more appropriate intuitively.So we introduce the concept of fuzzy membership degree to the network clustering problem in the following subsection.2.1.A new modular functionIf there are k communities in total,we define a corresponding n Âk ‘soft assignment’matrix U k ¼½u 1;...;u k with 0p u ic p 1for each c ¼1;...;k and P kc ¼1u ic ¼1for each i ¼1;...;n .With this we define the membership of each community as ¯V c ¼f i j u ic 4l ;i 2V g ,where l is a threshold that can convert a soft assignment into final clustering.We define a new modularity function e Q as e Q ðU k Þ¼X k c ¼1A ð¯V c ;¯V c ÞA ðV ;V ÞÀA ð¯V c ;V ÞA ðV ;V Þ 2"#,(2)where U k is a fuzzy partition of the vertices into k groups and A ð¯V c ;¯V c Þ¼P i 2¯V c ;j 2¯V c ððu ic þu jc Þ=2Þw ði ;j Þ,A ð¯V c ;V Þ¼A ð¯V c ;¯V c ÞþP i 2¯V c ;j 2V n ¯V c ððu ic þð1Àu jc ÞÞ=2Þw ði ;j Þand A ðV ;V Þ¼P i 2V ;j 2V w ði ;j Þ.This of coursecan be thought as a generalization of the Newman’s Q function.Our objective is to compute a soft assignment matrix by maximizing the new Q function with appropriate k .How could we do?2.2.Spectral mappingWhite and Smyth [5]showed that the problem of maximizing the modularity function Q can be reformulated as an eigenvector problem and devised two spectral clustering algorithms.Their algorithms are similar in spirit to a class of spectral clustering methods which map data points into Euclidean space by eigendecomposing a related matrix and then grouping them by general clustering methods such as k -means and hierarchical clustering [5,9].Given a network and its adjacent matrix A ¼ða ij Þn Ân and a diagonal matrix D ¼ðd ii Þ,d ii ¼P k a ik ,two matrices D À1=2AD À1=2and D À1A are often used.A recent modification [11]uses the top K eigenvectors of the generalized eigensystem Ax ¼tDx instead of the K eigenvectors of the two matrices mentioned above to form a matrix whose rows correspond to original data points.The authors show that after normalizing the rows using Euclidean norm,their eigenvectors are mathematically identical and emphasize that this is a numerically more stable method.Although their result is designed to cluster real-valued points[11,12],it is also appropriate for network clustering.So in this study,we compute the top k À1eigenvectors of the eigensystem to form a ðk À1Þ-dimensional embedding of the graph into Euclidean space and use ‘soft-assignment’geometric clustering on this embedding to generate a clustering U k (k is the expected number of clusters).2.3.Fuzzy c-meansHere,in order to realize our ‘soft assignment’,we introduce fuzzy c -means (FCM)clustering method [10,13]to cluster these points and maximize the e Qfunction.Fuzzy c -means is a method of clustering which allows one piece of data to belong to two or more clusters.This method (developed by Dunn in 1973[10]and improved by Bezdek in 1981[13])is frequently used in pattern recognition.It is based on minimization of the following objective functionJ m ¼Xn i ¼1X k j ¼1u m ij k x i Àc j k 2,(3)over variables u ij and c with P j u ij ¼1.m 2½1;1Þis a weight exponent controlling the degree of fuzzification.u ij is the membership degree of x i in the cluster j .x i is the i th d -dimensional measured data point.c j is the d -dimensional center of the cluster j ,and k Ãk is any norm expressing the similarity between any measured data and the center.Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above,with the update of membership degree u ij and the cluster centers c j .This procedure converges to a local minimum or a saddle point of J m .S.Zhang et al./Physica A 374(2007)483–4904852.4.The flow of the algorithmGiven an upper bound K of the number of clusters and the adjacent matrix A ¼ða ij Þn Ân of a network.The detailed algorithm is stated straightforward for a given l as follows:Spectral mapping:(i)Compute the diagonal matrix D ¼ðd ii Þ,where d ii ¼P k a ik .(ii)Form the eigenvector matrix E K ¼½e 1;e 2;...;e K by computing the top K eigenvectors of thegeneralized eigensystem Ax ¼tDx .Fuzzy c -means:for each value of k ,2p k p K :(i)Form the matrix E k ¼½e 2;e 3;...;e k from the matrix E K .(ii)Normalize the rows of E k to unit length using Euclidean distance norm.(iii)Cluster the row vectors of E k using fuzzy c -means or any other fuzzy clustering method to obtain a softassignment matrix U k .Maximizing the modular function:Pick the k and the corresponding fuzzy partition U k that maximizes e QðU k Þ.In the algorithm above,we initialize FCM such that the starting centroids are chosen to be as orthogonal as possible which is suggested for k -means clustering method in Ref.[12].The initialization does not change the time complexity,and also can improve the quality of the clusterings,thus at the same time reduces the need for restarting the random initialization process.The framework of our algorithm is similar to several spectral clustering methods in previous studies[5,9,12,11].We also map data points (work nodes in our study)into Euclidean space by computing the top K eigenvectors of a generalized eigen system and then cluster the embedding using a fuzzy clustering method just as others using geometric clustering algorithm or general hierarchical clustering algorithm.Here,we emphasize two key points different from those earlier studies:We introduce a generalized modular function e Q employing fuzzy concept,which is devised for evaluating the goodness of overlapping community structure. In combination with the novel e Qfunction,we introduce fuzzy clustering method into network clustering instead of general hard clustering methods.This means that our algorithm can uncover overlapping clusters,whereas general framework:‘‘Objective function such as Q function and Normalized cut function+Spectral mapping+general geometric clustering/hierarchical clustering’’cannot achieve this.3.Experimental resultsWe have implemented the proposed algorithm by Matlab.And the fuzzy clustering toolbox [14]is used for our analysis.In order to make an intuitive comparison,we also compute the hard clustering based on the original Q -function,spectral mapping (same as we used)and k -means clustering.We illustrate the fuzzy concept and the difference of our method with traditional divisive algorithms by a simple example shown in Fig.1.Just as mentioned above,the network visually suggests three clusters.But classifying node 5(or node 9)simultaneously into two clusters may be more reasonable.We can see from Fig.1that our method did uncover the overlapping communities for this simple network,while the traditional method can only make one node belong to a single cluster.We also present the analysis of two real networks,i.e.the Zachary’s karate club network and the American college football team network for better understanding the differences between our method and traditional methods.S.Zhang et al./Physica A 374(2007)483–490486S.Zhang et al./Physica A374(2007)483–490487 3.1.Zachary’s karate clubThe famous karate club network analyzed by Zachary[15]is widely used as a test example for methods of detecting communities in complex networks[1,8,16,3,4,17,9,18,19].The network consists of34members of a karate club as nodes and78edges representing friendship between members of the club which was observed over a period of two years.Due to a disagreement between the club’s administrator and the club’s instructor, the club split into two smaller ones.The question we concern is that if we can uncover the potential behavior of the network,detect the two communities or multiple groups,and particularly identify which community a node belongs to.The network is presented in Fig.2,where the squares and the circles label the members of the two groups.The results of k-means and our analysis are illustrated in Fig.3.The k-means combined with Q function divides the network into three parts(see in Fig.3A),but we can see that some nodes in one cluster are also connected densely with another cluster such as node9and31in cluster 1densely connecting with cluster2,and node1in cluster2with cluster3.Fig.3B shows the results of our method,from which we can see that node1,9,10,31belong to two clusters at the same time.These nodes in the network link evenly with two clusters.Another thing is that the two methods both uncover three communities but not two.There is a small community included in the instructor’s faction,since the set of nodes5,6,7,11,17only connects with node1in the instructor’s faction.Note that our method also classifies node1into the small community,while k-means does not.work of American college football teamsThe second network we have investigated is the college football network which represents the game schedule of the2000season of Division I of the US college football league.The nodes in the network represent the115 teams,while the links represent613games played in the course of the year.The teams are divided into conferences of8–12teams each and generally games are more frequent between members of the same conference than between teams of different conferences.The natural community structure in the network makes it a commonly used workbench for community-detecting algorithm testing[3,5,7].Fig.4shows how the modularity Q and e Q vary with k with respect to k-means and our method,respectively. The peak for k-means is at k¼12,Q¼0:5398,while for our algorithm at k¼10,e Q¼0:4673with l¼0:10. Both methods identify ten communities which contain ten conferences almost exactly.Only teams labeled as Sunbelt are not recognized as belonging to a same community for both methods.This group is classified as well in the results of Refs.[3,19].This happens because the Sunbelt teams played nearly as many games against Western Athletic teams as they played in their own conference,and they also played quite a number of gamesagainst Mid-American team.Our method identified11nodes(teams)which belong to at least twoFig.2.Zachary’s karate club network.Square nodes and circle nodes represent the instructor’s faction and the administrator’s faction, respectively.Thisfigure is from Newman and Girvan[8].communities (see Fig.5,11red nodes).These nodes generally connect evenly with more than one community,so we cannot classify them into one specific community correctly.These nodes represent ‘fuzzy’points which cannot be classified correctly by employing current link information.Maybe such points play a ‘bridge’role in two or more communities in complex network of other types.4.Conclusion and discussionIn this paper,we present a new method to identify the community structure in complex networks with a fuzzy concept.The method combines a generalized modularity function,spectral mapping,and fuzzy clustering technique.The nodes of the network are projected into d -dimensional Euclidean space which is obtained by computing the top d nontrivial eigenvectors of the generalized eigensystem Ax ¼tDx .Then the fuzzy c -means clustering method is introduced into the d -dimensional space based on general Euclidean distance to cluster the data points.By maximizing the generalized modular function e QðU d Þfor varying d ,we obtain the appropriate number of clusters.The final soft assignment matrix determines the final clusters’membership with a designated threshold l .Fig.3.The results of both k -means and our method applied to karate club network.A:The different colors represent three different communities obtained by k -means and the right table shows values of NG’Q versus different k .B:Four red nodes represent the overlap of two adjacent communities obtained by our method and the right table shows values of new Q versus different k with l ¼0:25.3.t y 0510152000.10.20.30.40.50.6k-meansK N G ' Q 0510152000.10.20.30.40.5fuzzy c-means K N e w Q Fig.4.Q and e Qvalues versus k with respect to k -means and fuzzy c -means clustering methods for the network of American college football team.S.Zhang et al./Physica A 374(2007)483–490488Although spectral mapping has been comprehensively used before to detect communities in complex networks (even in clustering the real-valued points),we believe that our method represents a step forward in this field.A fuzzy method is introduced naturally with the generalized modular function and fuzzy c -means clustering technique.As our tests have suggested,it is very natural that some nodes should belong to more than one community.These nodes may play a special role in a complex network system.For example,in a biological network such as protein interaction network,one node (protein or gene)belonging to two functional modules may act as a bridge between them which transfers biological information or acts as multiple functional units [6].One thing should be noted is that when this method is applied to large complex networks,computational complexity is a key problem.Fortunately,some fast techniques for solving eigensystem have been developed[20]and several methods of FCM acceleration can also be found in the literature [21].For instance,if we adopt the implicitly restarted Lanczos method (IRLM)[20]to compute the K À1eigenvectors and the efficient implementation of the FCM algorithm in Ref.[21],we can have the worse-case complexity of O ðmKh þnK 2h þK 3h Þand O ðnK 2Þ,respectively,where m is the number of edges in the network and h is the number of iteration required until convergence.For large sparse networks where m $n ,and K 5n ,the algorithms will scale roughly linearly as a function of the number of nodes n .Nonetheless,the eigenvector computation is still the most computationally expensive step of the method.We expect that this new method will be employed with promising results in the detection of communities in complex networks.AcknowledgmentsThis work is partly supported by Important Research Direction Project of CAS ‘‘Some Important Problem in Bioinformatics’’,National Natural Science Foundation of China under Grant No.10471141.The authors thank Professor M.E.J.Newman for providing the data of karate club network and the college football team network.Fig.5.Fuzzy communities of American college football team network (k ¼10and e Q¼0:4673)with given l ¼0:10(best viewed in color).S.Zhang et al./Physica A 374(2007)483–490489References[1]M.E.J.Newman,Detecting community structure in networks,Eur.Phys.J.B 38(2004)321–330.[2]L.Danon,J.Duch,A.Diaz-Guilera,A.Arenas,Comparing community structure identification,J.Stat.Mech.P09008(2005).[3]M.Girvan,M.E.J.Newman,Community structure in social and biological networks,A 99(12)(2002)7821–7826.[4]J.Duch,A.Arenas,Community detection in complex networks using extremal optimization,Phys.Rev.E 72(2005)027104.[5]S.White,P.Smyth,A spectral clustering approach to finding communities in graphs,SIAM International Conference on DataMining,2005.[6]G.Palla,I.Derenyi,I.Farkas,T.Vicsek,Uncovering the overlapping community structure of complex networks in nature and society,Nature 435(2005)814–818.[7]J.Reichardt,S.Bornholdt,Detecting fuzzy community structures in complex networks with a Potts model,Phys.Rev.Lett.93(2004)218701.[8]M.E.J.Newman,M.Girvan,Finding and evaluating community structure in networks,Phys.Rev.E 69(2004)026113.[9]L.Donetti,M.A.Mun oz,Detecting network communities:a new systematic and efficient algorithm,J.Stat.Mech.P10012(2004).[10]J.C.Dunn,A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,J.Cybernet.3(1973)32–57.[11]D.Verma,M.Meila,A comparison of spectral clustering algorithms.Technical Report,2003,UW CSE Technical Report 03-05-01.[12]A.Ng,M.Jordan,Y.Weiss,On spectral clustering:analysis and an algorithm,Adv.Neural Inf.Process.Systems 14(2002)849–856.[13]J.C.Bezdek,Pattern Recognition with Fuzzy Objective Function Algorithms,Plenum Press,New York,1981.[14]Fuzzy Clustering Toolbox-h http://www.fmt.vein.hu/softcomp/fclusttoolbox/i .[15]W.W.Zachary,An information flow model for conflict and fission in small groups,J.Anthropol.Res.33(1977)452–473.[16]M.E.J.Newman,Fast algorithm for detecting community structure in networks,Phys.Rev.E 69(2004)066133.[17]F.Radicchi,C.Castellano,F.Cecconi,V.Loreto,D.Parisi,Defining and identifying communities in networks,Proc.Natl.Acad.A 101(9)(2004)2658–2663.[18]F.Wu,B.A.Huberman,Finding communities in linear time:a physics approach,Eur.Phys.J.B 38(2004)331–338.[19]S.Fortunato,tora,M.Marchiori,A method to find community structures based on information centrality,Phys.Rev.E 70(2004)056104.[20]Z.Bai,J.Demmel,J.Dongarra,A.Ruhe,H.Vorst (Eds.),Templates for the Solution of Algebraic Eigenvalue Problems:A PracticalGuide,SIAM,Philadelphia,PA,2000.[21]J.F.Kelen,T.Hutcheson,Reducing the time complexity of the fuzzy c -means algorithm,IEEE Trans.Fuzzy Systems 10(2)(2002)263–267.S.Zhang et al./Physica A 374(2007)483–490490。
动态社区的点增量发现算法顾炎;熊超【摘要】当前复杂网络中动态社区发现方式大多为孤立地考察当前时间节点,没有利用之前时间节点上社区结构的信息,因而产生了大量的冗余计算.为解决此问题,基于动态社会网络在短时间内未发生过多改变的短时平滑性假设,提出了一种增量聚类动态社区发现算法.该算法将物理学领域万有引力的思想引入到动态社区发现中,针对动态社会网络中的节点,定义了节点间的相互作用力,在t-1与t时刻社区变化差量的基础上,通过比较节点间作用力对节点的社区归属进行了分析和调整,以期在t时刻快速准确地发现动态社区.在安然邮件数据集上的实验表明,当网络中的节点数量达到104以上,提出的算法能够在两分钟左右的时间内挖掘出模块度为0.53左右的社区结构,优于其他几种算法,说明该方法能够快速准确地挖掘出较好的社区结构.%Currently,most ways of community detection in dynamic complex networks belongs to separate observations on nonce time nodes without utilization of community structural information on former time nodes,thus more redundant computation has been generated.To solve this problem,on the short-term smoothness assumption that the dynamic community networks could not generate too many changes in short-time interval,an incremental clustering algorithm for detecting dynamic communities has been proposed.The universal gravitation in physic field has been introduced into community detection and mutual forces has been defined between nodes in dynamic community.The community adscription of the node has been analyzed and adjusted through comparison of the mutual forces based on the difference between t-1 andt interval so as to detect dynamic community quickly and accurately at t interval.Results of experiments on Enron email dataset show that when the network has more than 104 vertices,the proposed algorithm can detect community structures with modularity at around 0.53 within about two minutes and is more efficient than other algorithms,and thus it can detect dynamic community structures quickly and accurately.【期刊名称】《计算机技术与发展》【年(卷),期】2017(027)006【总页数】6页(P81-85,90)【关键词】节点;增量;动态网络;社区发现【作者】顾炎;熊超【作者单位】南京邮电大学计算机学院,江苏南京 210023;南京邮电大学计算机学院,江苏南京 210023【正文语种】中文【中图分类】TP391人类是社会性动物,往往基于类似的喜好、共享的利益和共同的地缘等聚集成群体,这样的群体被称之为社区。
Louvain法及其在R语言中的实现1. 介绍Louvain法是一种用于社区发现(Community Detection)的算法,旨在将网络中的节点分组成不同的社区或群组。
它是通过最大化模块度(Modularity)来实现这一目标的。
社区发现是复杂网络分析中的一个重要问题,它可以帮助我们理解网络结构以及节点之间的关系。
Louvain法是一种基于局部优化策略的层次聚类算法,具有高效且可扩展性强的特点。
R语言是一种广泛应用于数据分析和统计建模的编程语言,拥有丰富的包和库,使得实现Louvain法变得相对容易。
本文将详细介绍Louvain法的原理和步骤,并给出在R语言中使用Louvain法进行社区发现的示例代码。
2. Louvain法原理Louvain法采用了一种贪心策略,通过迭代地优化每个节点所属社区来达到全局最优。
其基本思想如下:1.初始化:将每个节点视为一个单独的社区。
2.迭代优化:对每个节点,计算将其移到其他社区时能够获得的模块度增益,选择模块度增益最大的移动方式,直到没有模块度增益或达到最大迭代次数。
3.合并社区:将节点按照当前划分结果进行合并,形成新的网络。
4.重复步骤2和3,直到无法再进行社区合并。
Louvain法通过优化模块度来评估社区划分的质量。
模块度是一个衡量网络内部连接强度与预期连接强度之差的指标。
当模块度接近1时,表示网络内部连接强于预期;当模块度接近0时,表示网络内部连接与预期相当。
3. Louvain法在R语言中的实现在R语言中,我们可以使用igraph包来实现Louvain法进行社区发现。
igraph是一个用于创建、操作和分析图形和网络数据结构的包。
以下是使用Louvain法进行社区发现的示例代码:# 安装和加载igraph包install.packages("igraph")library(igraph)# 创建一个简单的图形对象g <- graph(c(1,2, 2,3, 3,4, 4,1))# 使用Louvain法进行社区发现louvain_communities <- cluster_louvain(g)# 输出社区划分结果print(louvain_communities)上述代码中,我们首先安装并加载了igraph包。
一文读懂社会网络分析(SNA)理论、指标与应用开新坑!社交网络分析(又称复杂网络、社会网络,Social Network Analysis)是诞生于数学图论、计算机科学、物理学的交叉碰撞中的一门有趣的学科。
缘起:我研究SNA已经有近2年的时光,一路坎坷走来有很多收获、踩过一些坑,也在线上给很多学生讲过SNA的入门知识,最近感觉有必要将心得和基础框架分享出来,抛砖引玉,让各位对SNA感兴趣的同学们一起学习进步。
我的能力有限,如果有不足之处大家一起交流,由于我的专业的影响,本文的SNA知识可能会带有情报学色彩。
面向人群:优先人文社科类的无代码学习,Python、R的SNA 包好用是好用,但是对我们这这些社科的同学来说门槛太高,枯燥的代码首先就会让我们丧失学习兴趣。
特征:类综述文章,主要目的是以通俗的语言和精炼的框架带领各位快速对SNA领域建立起一个全面的认知,每个个关键概念会附上链接供感兴趣的同学深入学习。
开胃菜:SNA经典著作分享《网络科学引论》纽曼 (访问密码 : v9d9g3)2 概述篇:什么是网络?我们从哪些角度研究它?1) 认识网络SNA中所说的网络是由节点(node,图论中称顶点vertex)和边(edge)构成,如下图。
每个节点代表一个实体,可以是人、动物、关键词、神经元;连接各节点的边代表一个关系,如朋友关系、敌对关系、合作关系、互斥关系等。
最小的网络是由两个节点与一条边构成的二元组。
Les Miserables人际关系网络2) 构建网络就是建模马克思说过,“人的本质在其现实性上,它是一切社会关系的总和。
” 事实上,当我们想快速了解一个领域,无论该领域是由人、知识、神经元乃至其他实体集合构成,利用SNA的方法将实体及其相互关系进行抽象和网络构建,我们就完成了对某一领域的“建模”,这个模型就是网络图,拿科学网络计量学家陈超美的观点来说,借助网络图,“一图胜千言,一览无余”。
3) 社会网络类型此处展示常见且常用的网络类型名词,想要具体了解可以点击链接仔细查看!•网络中节点的来源集合异同o一模网络 one-modeo二模网络 two-mode•视角:•边权重o加权网络 weight networko无权网络 unweight networko符号网络 Signed network•关系是否有方向o有向网络 Directed networko无向网络 Undirected network4) 网络分析的5大中心问题SNA可以帮助我们快速了解该网络中的分布格局和竞争态势,“孰强孰弱,孰亲孰远,孰新孰老,孰胜孰衰”,这16字箴言是我学习SNA总结的精华所在,初中级甚至高级的社会网络分析学习几乎完全就是围绕着这四个方面开展,后面将要讲到的理论与方法皆为此服务,希望同学们可以重点关注。
社交网络分析中的网络社区发现方法探究社交网络已经成为人们日常生活中不可或缺的一部分。
在这些网络中,人们可以与朋友、家人和陌生人交流、分享信息、建立关系等。
社交网络分析(Social Network Analysis,简称SNA)是一种研究社交网络中个体之间关系的方法。
其中一个重要的任务是发现网络中的社区,即具有紧密联系且相对独立于其他群体的集合。
本文将探讨一些常见的社交网络分析中的网络社区发现方法。
一、基于图论的方法图论是研究图形的一门学科,社交网络可以被看作是一种图形结构,人们之间的关系可以用图中的节点和边来表示。
基于图论的方法是社交网络分析中最常用的方法之一。
其中最常见的方法之一是通过寻找图中的“密度”来发现社区。
密度是指节点之间相互连接的程度,如果一个社区内的节点之间连接紧密,而与其他节点连接较少,则可以判断这个节点集合为一个社区。
一个常见的指标是模块度(modularity),它用于衡量网络中社区结构的优劣。
模块度越高,说明社区结构越优化。
而通过改变社区内边的权重,可以进一步优化社区结构。
二、基于聚类的方法聚类是一种将数据集划分为不相交子集的方法,其目标是使得每个子集内部的相似度最大化,而不同子集之间的相似度最小化。
在社交网络分析中,聚类方法也可以应用于社区发现。
基于聚类的方法可以将社交网络中的节点划分为若干个聚类,每个聚类即为一个社区。
常用的聚类算法有K-means、DBSCAN、谱聚类等。
这些方法通过计算节点之间的距离或相似度,将节点划分为不同的聚类,从而发现社区结构。
然而,由于社交网络中存在节点之间的高度相互连接,传统的聚类方法可能不适用于发现社区结构。
因此,还需要对聚类方法进行改进和优化。
三、基于社区结构属性的方法社交网络中的节点通常会具有一些属性,比如用户的年龄、性别、职业等。
基于社区结构属性的方法是通过分析这些节点属性来发现社区结构。
常见的方法有基于属性相似度的方法和基于属性聚类的方法。
社交网络分析算法的使用教程社交网络分析(Social Network Analysis,SNA)是一种研究人际关系的方法,通过分析个体之间的连接和关联,揭示社交网络中的模式和结构。
在社交媒体时代,社交网络分析算法成为了研究网络社会学、营销学以及信息传播的重要工具。
本文将介绍几种常用的社交网络分析算法及其使用教程。
一、度中心性算法(Degree Centrality)度中心性算法是最简单也是最常用的社交网络分析算法之一,用于计算每个节点在网络中有多少条边与之连接。
该算法可以用来评估一个节点的重要性和影响力。
具体计算方法如下:1. 首先,将网络数据导入社交网络分析工具(如Gephi、Cytoscape等)中。
2. 在工具中选择度中心性算法,并点击运行。
3. 程序会计算每个节点的度中心性,并将结果显示在节点上或作为节点的属性。
4. 分析结果,找出具有较高度中心性的节点,这些节点在网络中起到重要的枢纽作用。
二、介数中心性算法(Betweenness Centrality)介数中心性算法用于衡量节点在网络中的中介地位,即节点在连接其他节点之间的最短路径中扮演的角色。
该算法可以用来识别那些在信息传播、资源传输中起到关键作用的节点。
具体计算方法如下:1. 在社交网络分析工具中导入网络数据。
2. 在工具中选择介数中心性算法,并点击运行。
3. 工具会计算每个节点的介数中心性,并在节点上显示结果。
4. 根据分析结果,找出介数中心性较高的节点,这些节点在信息传播和资源传输中扮演着重要的桥梁角色。
三、聚集系数算法(Clustering Coefficient)聚集系数算法用于衡量节点邻居之间的相互连接程度,用来判断网络中的群组和社区结构。
该算法可以帮助我们理解社交网络中的小世界现象和群体行为。
具体计算方法如下:1. 将网络数据导入社交网络分析工具中。
2. 在工具中选择聚集系数算法,并运行。
3. 工具会计算每个节点的聚集系数,并在节点上显示结果。
modularity算法Modularity algorithm is a popular algorithm used in community detection in complex networks. It was first introduced by Newman and Girvan in 2004 and has since been widely applied in various fields, including social network analysis, biological networks, and web graph analysis. The algorithm aims to identify densely connected groups or communities within a network by maximizing the modularity value.Modularity is a measure of the quality of a partition in a network. It quantifies the difference between the number of edges within a community and the number of edges expected in a random network with the same degree distribution. The modularity value, Q, ranges from -1 to 1, where a higher value indicates a better community structure. The algorithm iteratively merges and splits nodes to maximize the modularity value.The modularity algorithm can be summarized in several steps:1. Start with each node in its own community.2. Calculate the initial modularity value of the network.3. For each pair of nodes, calculate the change in modularity if they are merged into the same community.4. Merge the pair of nodes that results in the highest increase in modularity.5. Recalculate the modularity value after the merge.6. Repeat steps 3-5 until no further increase in modularity is possible or the desired number of communities is reached.The modularity value is calculated using the following formula:Q = (1/2m) * Σ [Ai, j - (ki * kj) / (2m)] * δ(ci, cj)where Ai, j is the number of edges between nodes i and j, ki and kj are the degrees of nodes i and j, m is the total number of edges in the network, ci and cj are the community assignments of nodes i and j, and δ(ci, cj) is the Kronecker delta function which is 1 if nodes i and j are in the same community and 0 otherwise.It is important to note that the modularity algorithm suffers from resolution limit, meaning it tends to detect larger communities while missing smaller ones. Various methods have been proposed to address this issue, such as the Louvain method and the Infomap algorithm.In conclusion, the modularity algorithm is a powerful and widely used algorithm for community detection in complex networks. It provides a quantitative measure of the quality of community structure and can help uncover the hidden organization within a network. By iteratively optimizing the modularity value, the algorithm identifies densely connected groups or communities. However, it is important to consider the limitations of the algorithm, such as the resolution limit, when applying it to real-world networks.。
社会网络分析中的社群发现方法评估社会网络分析是一种研究人际关系与交互的方法,通过分析人与人之间的关系,可以揭示出社会中的各种结构和特征。
其中一个重要的研究领域就是社群发现(Community Detection),它旨在寻找社会网络中的紧密连接的社群或子群体。
社群发现方法的评估是至关重要的,本文将对一些常见的社群发现方法进行评估。
首先介绍一下社群发现方法中的一个基本概念:模块度(Modularity)。
模块度是衡量社群结构划分好坏的指标,它根据社区内部联系紧密、社区之间联系稀疏的特性进行计算。
模块度的值介于-1和1之间,越接近1表示社区划分得越好。
一种常见的社群发现方法是基于聚类的方法,如K-means算法。
K-means算法通过将节点划分为K个不相交的簇来发现社群结构。
然而,K-means算法依赖于初始中心点的选择,对异常值敏感,并且要求簇的数量事先给定。
因此,在应用K-means算法进行社群发现时,需要进行多次实验以找到最佳的结果。
另一种常见的社群发现方法是基于图划分的方法,如Louvain算法和Girvan-Newman算法。
Louvain算法通过优化模块度来划分社群,它通过迭代将节点从一个社区移动到另一个社区,直到模块度不再增加为止。
Louvain算法具有高效性和可扩展性,能够在大规模网络中快速发现社群结构。
然而,Louvain算法也存在一些问题,例如对网络中的重叠社群处理不足。
Girvan-Newman算法是一种基于边缘介数(Edge Betweenness)的方法,它通过计算边缘介数来度量网络中的边缘的重要性。
然后,将具有最高介数的边缘删除,直到网络被划分为多个社群或无法再删除边缘为止。
Girvan-Newman算法能够发现不同规模的社群结构,但是在大规模网络中计算复杂度较高。
此外,还有一些基于模型的社群发现方法,如概率图模型和社交信息传播模型。
这些方法基于一定的假设和模型,通过最大化似然函数或最小化模型的损失函数来划分社群。
基于邻域信息的社区发现方法韩路;张海【摘要】考虑含有节点邻域信息的新模块度函数的社区发现方法和最优分组下标度参数的选择问题,通过谱松弛方法求解模块度函数的最大化问题,最终利用新算法快速求解,并通过真实网络数据验证算法能更好的发现社区.【期刊名称】《纯粹数学与应用数学》【年(卷),期】2015(031)001【总页数】8页(P85-92)【关键词】模块度函数;邻域信息;谱方法【作者】韩路;张海【作者单位】西北大学数学学院,陕西西安710127;西北大学数学学院,陕西西安710127【正文语种】中文【中图分类】O233;TP391.41复杂网络作为一种数据关系的表达方法,成为目前机器学习领域的热点之一.其中,网络中的节点表示研究问题的对象,边表示对象和对象之间的一种属性关系.在现实世界中,复杂网络常分为以下几种类型,如技术网络,社交网络,信息网络和生物网络等[1-2].社区描述网络的结构,它是指在一个较大的网络中,网络的结构特征通过节点位于不同组表现出来.比如组内边的联接比较紧密,组间边的联接比较稀疏.如何有效发现网络中的社区,对于理解网络功能和结构有着重要意义.例如,在一个学术关系网络中,节点表示作者,边表示每两位作者之间是否有合作发表论文.此网络中的社区可能由一些研究方向相同或相近的作者组成,形成不同特征的社区.因此,如何发现此类社区并预测网络中某一位作者所属的社区,对于研究网络的行为具有实际意义.近年来,社区发现是网络研究的热点之一[3-4].Newman和Girvan[5]第一次提出模块度函数Q用于社区发现.尽管模块度函数自提出后得到广泛应用,发展了很多以该函数为目标函数的新算法.如Newman[6]提出的一种贪婪策略下的快速聚合算法,White和Smyth[7]提出的一种谱聚类方法等.但是该函数Q并没有利用节点的邻域信息,对于很多有节点信息的真实网络,则该模块度函数Q并不能很好地度量该网络的社区结构.因此,研究结合节点信息的社区发现方法有着重要意义.而文献[8]利用了节点的邻域信息,扩展并提出了新的模块度函数QDist,它度量了节点的邻域信息,QDist不但适合节点有额外信息的网络,而且可以得到不同标度下的社区结构.虽然该文章给出了在特定标度下的最优分组结果,但是并没有给出如何选取此标度的方法.通常地,基于模块度函数方法发现社区有许多典型的算法,如何利用并推广现有算法到结合了节点信息的新模块度函数发现社区,同时如何选取最优分组时的标度是本文关注的问题.谱分析方法早在20世纪70、80年代就已经被提出[9],该方法后来被发展成许多不同的谱聚类方法[10].其基本思想是通过对邻接矩阵形成的拉普拉斯矩阵或者标准拉普拉斯矩阵的特征值与特征向量进行分析,从而进行网络的社区发现.而Newman[11-12]将谱分析方法与模块度函数最大化相结合,提出一种谱方法并应用于社区发现.本文研究将Newman所提出的谱方法推广到新的模块度,同时解决新模块度函数最优分组时标度参数的选择问题.通过将最大化QDist问题转化为谱松弛问题,进而提出一种二分的谱算法,同时给出了最优分组时标度的选取方法.最后,通过在三个真实网络数据上进行实验,结果表明该算法能够有效的给出实际网络二分的社区结构.一个网络通常包括两组信息,节点个数n和邻接矩阵A=(Aij)1≤i,j≤n.其中,Aij取值为1或者0.当Aij=1时,表示节点i和j之间有边连接,当Aij=0时,表示节点i和j之间没有边连接.模块度函数的定义如下:上述三种距离分别描述两节点之间的联接强度,不同距离的选择包含网络中不同的结构信息.例如,Jaccard距离[13-14]包含网络的节点的邻域信息,欧式距离包含网络的节点的属性信息,最短距离包含网络的拓扑信息.一般地,对于一个网络,当知道网络的真实分组时,可以计算QDist的值,并且QDist的值越大,社区结构越明显.本文仅考虑无向网络的两分社区情况,使得QDist最大化.对于节点i,若si的值为1,则表示节点i属于组1,若si的值为−1,则表示节点i属于组2,那么δ(li,lj)可以化为(sisj+1)/2.则本节通过对真实数据Zachary空手道俱乐部网络,海豚社交网络和美国政治书籍网络试验说明算法的有效性.本实验中的相似距离 dij都采用 Jaccard距离.即这里Γ(i)表示节点i的邻居节点.实验一本实验通过对 Zachary空手道俱乐部网络[15]进行实验,该网络是Zachary 在1970年代初,研究了一所美国大学的空手道俱乐部成员的社交网络.网络中的节点代表34位俱乐部成员,边代表每个成员之间的友谊关系.但是由于在是否涨学费问题上的分歧,俱乐部主席(节点34)和教练(节点1)的之间发生了冲突,俱乐部自发形成了支持管理者和教练的两组成员.不同的分组按红色和蓝色区分.现在的问题是在只知道邻接矩阵的情况下,能否正确检测出空手道俱乐部网络真实的社区结构?本实验参数ε=10−3.实验分析图1(b)表示空手道俱乐部网络在利用本文算法得出分组的QLaplace值的情况,当σ∈(0,0.20)和σ=1.04时,网络的分组结果如图1(a)所示,由图1(b)可知,此时网络的分组的QLaplace值最高(忽略了0值,因为此时的分组是全部节点分成一组),和真实分组比较,除了节点3与真实网络分组不同之外,其他节点的分组完全相同.实验二本实验通过对海豚社交网络[16-17]进行实验,该网络是Lusseau在神奇湾观察62只海豚后建立的.网络中的节点代表62只海豚,如果两只海豚之间有边,则表示这两只海豚被观察在一起次数多于期望的次数,代表海豚之间某种亲密关系.但是由于一只海豚的暂时离开导致海豚群体分成了20只和 42只两个组.不同的分组按红色和蓝色区分.本实验参数ε=10−3.实验分析图 2(c)表示海豚社交网络在利用本文算法得出分组的 QLaplace值的情况. 当σ∈(0,0.34)和σ∈(0.72,+∞)时,网络的分组结果如图2(a)所示.和真实分组比较发现,除了节点31和节点40与真实网络分组不同之外,其他节点的分组完全相同.当σ=0.4时,网络的分组情况如图2(b)所示,由图2(c)可知,此时网络分组的QLaplace值最高(忽略了0值,因为此时的分组是全部节点分成一组),和真实分组比较发现,只有节点40和真实网络分组不同,此时的分组结果比其他QLaplace值的结果都要好.实验三本实验通过对美国政治书籍网络进行实验.该网络节点表示在亚马逊网站销售的105本关于美国政治的书籍,边表示两本书经常被同一消费者购买.该书籍被Mark Newman划分为关于自由党和保守党两种书籍,还有少部分书籍被划分为中间派书籍.不同的分组按红色和蓝色区分.本实验参数ε=10−2.实验分析图3(c)表示美国政治书籍网络在利用本文算法得出分组的QLaplace值的情况. 当σ∈(0,0.32)和σ∈(1.34,+∞)时,美国政治书籍网络的分组结果如图3(a)所示.该结果将节点59和节点78错分.但是当σ∈(0.88,1.06)时,该网络分组结果如图3(b)所示.由图3(c)可知,此时网络分组的QLaplace值最高,该结果同图3(a)的结果相比较,节点53的分组结果不同.此时把节点53错分.节点53的5个邻居节点中有3个被分为自由党,2个被分为保守党,所以将节点53错分了.本文研究了网络的社区结构问题,通过将包含邻域信息的模块度函数QDist的最大化问题转化为谱松弛问题,同时提出一种二分的谱算法进行求解.将Newman的二分谱方法推广到新模块度函数模型上,同时解决的新模块度函数下网络最优分组时的标度选取问题.最后,通过实验证明了新算法可以有效的辨识网络的二分社区结构.【相关文献】[1]Newman M E works:An Introduction[M].New York:Oxford University Press,2010.[2]Albert R,Barabsi A.Statistical mechanics of complex networks[J].Reviews of Modern Physics,2002,74:47-97.[3]Newman M E J.The structure and function of complex networks[J].SIAM Review,2003,45:167-256.[4]Newman M E J,Leicht E.Mixture models and exploratory analysis innetworks[J].Proceedings of the National Academy of Sciences,2007,104:9564-9569. [5]Newman M E J,Girvan M.Finding and evaluating community structure innetworks[J].Phys.Rev.E.,2004,69:026113.[6]Newman M E J.Fast algorithm for detecting community structure innetworks[J].Phys.Rev.E.,2004,69:066133.[7]White S,Smyth P.A spectral clustering approach to finding communities in graphs[J].In:Kamath C,Goodman A,eds.Proc.of the 5th SIAM Int Conf.on Data Mining.Philadelphia:SIAM,2005,76-84.[8]Liu X,Murara T,Wakita K.Extending modularity by capturing the similarity attraction feature in the null model[J].2013,arXiv:1210.4007 v3[cs.SI].[9]Fiedler M.Algebraic connectivity of graphs[J].Czech Math J.,1973,23(98):298-305.[10]Von Luxburg U.A tutorial on spectral clustering[J].Statistics and Computing,2007,17:395-416.[11]Newman M E J.Modularity and community structure innetworks[J]A,2006,103(23):8577-8582.[12]Newman M E J.Spectral methods for network community detection and graph partitioning[J].Phys.Rev. E.,2013,88:042822.[13]Levandowsky M,Winter D.Distance between sets[J].Nature,1971,234:34-35.[14]Jaccard P.Etude comparative de la distribution florale dans une portion des alpes et des jura[J].Bull.Soc. Vaudoise Sci.Nat.,1901,37:547-579.[15]Zachary W W.An information flow model for conflict and fission in smallgroups[J].Journal of Anthropological Research,1977,33:452-473.[16]Lusseau D.The emergent properties of a dolphin social network[J].Proceedings of the Royal Society of London Series B,2003,270:S186-S188.[17]Lusseau D,Schneider K,Boisseau O J,et al.The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations[J].Behavioral Ecology and Sociobiology,2003,54:396-405.。
基于重叠社团发现的微博话题检测方法程飞;姬东鸿【期刊名称】《计算机工程与应用》【年(卷),期】2015(000)006【摘要】Micro-blog topic detection has become the current research focuses. A micro-blog topics detection method based on overlap community detection is proposed. Raw micro-blog data will be preprocessed. After micro-blog content segmen-tation, topic words will be extracted with part of speech and temporal distribution. And edge is created between high rele-vance topic words to get a complex network. The concept of community independent modularity is introduced, and the overlap community is detected with the model of community independent modularity maximization; each community is taken as a micro-blog topic. The method of overlap community detection can solve the problem of low accurate rate of topic detecting caused by one or more keywords belonging to more than one topic. The experiments results show that the proposed method is feasible in micro-blog topic detection.%微博话题检测是当前研究的热点,提出一种基于复杂网络重叠社团发现的微博话题检测方法。
DOI: 10.1126/science.1184819, 876 (2010);328 Science , et al.Peter J. Mucha NetworksCommunity Structure in Time-Dependent, Multiscale, and MultiplexThis copy is for your personal, non-commercial use only.clicking here.colleagues, clients, or customers by , you can order high-quality copies for your If you wish to distribute this article to othershere.following the guidelines can be obtained by Permission to republish or repurpose articles or portions of articles): December 2, 2010 (this infomation is current as of The following resources related to this article are available online at/content/329/5989/277.3.full.html A correction has been published for this article at:/content/328/5980/876.full.html version of this article at:including high-resolution figures, can be found in the online Updated information and services,/content/suppl/2010/05/13/328.5980.876.DC1.html can be found at:Supporting Online Material /content/328/5980/876.full.html#related found at:can be related to this article A list of selected additional articles on the Science Web sites /content/328/5980/876.full.html#ref-list-1, 3 of which can be accessed free:cites 19 articles This article /content/328/5980/876.full.html#related-urls 1 articles hosted by HighWire Press; see:cited by This article has been/cgi/collection/comp_math Computers, Mathematicssubject collections:This article appears in the following registered trademark of AAAS.is a Science 2010 by the American Association for the Advancement of Science; all rights reserved. The title Copyright American Association for the Advancement of Science, 1200 New York Avenue NW, Washington, DC 20005. (print ISSN 0036-8075; online ISSN 1095-9203) is published weekly, except the last week in December, by the Science o n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o mCommunity Structure inTime-Dependent,Multiscale,and Multiplex NetworksPeter J.Mucha,1,2*Thomas Richardson,1,3Kevin Macon,1Mason A.Porter,4,5Jukka-Pekka Onnela 6,7Network science is an interdisciplinary endeavor,with methods and applications drawn from across the natural,social,and information sciences.A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities.We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks,which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices.This framework allows studies of community structure in a general setting encompassing networks that evolve over time,have multiple types of links (multiplexity),and have multiple scales.The study of graphs,or networks,has a long tradition in fields such as sociology and mathematics,and it is now ubiquitous in academic and everyday settings.An important tool in network analysis is the detection of mesoscopic structures known as communities (or cohesive groups),which are defined intuitively as groups of nodes that are more tightly connected to each other than they are to the rest of the network (1–3).One way to quantify communities is by a quality function that compares the number of intracommunity edges to what one would expect at random.Given the network adjacency matrix A ,where the element A ij details a direct connection between nodes i and j ,one can construct a qual-ity function Q (4,5)for the partitioning of nodes into communities as Q =∑ij (A ij −P ij )d (g i ,g j ),where d (g i ,g j )=1if the community assignments g i and g j of nodes i and j are the same and 0otherwise,and P ij is the expected weight of the edge between i and j under a specified null model.The choice of null model is a crucial con-sideration in studying network community struc-ture (2).After selecting a null model appropriate to the network and application at hand,one can use a variety of computational heuristics to assign nodes to communities to optimize the quality Q (2,3).However,such null models have not been available for time-dependent networks;analyses have instead depended on ad hoc methods topiece together the structures obtained at different times (6–9)or have abandoned quality functions in favor of such alternatives as the Minimum Description Length principle (10).Although tensor decompositions (11)have been used to cluster network data with different types of connections,no quality-function method has been developed for such multiplex networks.We developed a methodology to remove these limits,generalizing the determination of commu-nity structure via quality functions to multislice networks that are defined by coupling multiple adjacency matrices (Fig.1).The connections encoded by the network slices are flexible;they can represent variations across time,variations across different types of connections,or even community detection of the same network at different scales.However,the usual procedure for establishing a quality function as a direct count of the intracommunity edge weight minus thatexpected at random fails to provide any contribu-tion from these interslice couplings.Because they are specified by common identifications of nodes across slices,interslice couplings are either present or absent by definition,so when they do fall inside communities,their contribution in the count of intra-community edges exactly cancels that expected at random.In contrast,by formulating a null model in terms of stability of communities under Laplacian dynamics,we have derived a principled generaliza-tion of community detection to multislice networks,1Carolina Center for Interdisciplinary Applied Mathematics,Department of Mathematics,University of North Carolina,Chapel Hill,NC 27599,USA.2Institute for Advanced Materials,Nanoscience and Technology,University of North Carolina,Chapel Hill,NC 27599,USA.3Operations Research,North Carolina State University,Raleigh,NC 27695,USA.4Oxford Centre for Industrial and Applied Mathematics,Mathematical Institute,University of Oxford,Oxford OX13LB,UK.5CABDyN Complexity Centre,University of Oxford,Oxford OX11HP,UK.6Department of Health Care Policy,Harvard Medical School,Boston,MA 02115,USA.7Harvard Kennedy School,Harvard University,Cambridge,MA 02138,USA.*To whom correspondence should be addressed.E-mail:mucha@1234Fig.1.Schematic of a multislice network.Four slices s ={1,2,3,4}represented by adjacencies A ijs encode intraslice connections (solid lines).Interslice con-nections (dashed lines)are encoded by C jrs ,specifying the coupling of node j to itself between slices r and s .For clarity,interslice couplings are shown for only two nodes and depict two different types of couplings:(i)coupling between neighboring slices,appropriate for ordered slices;and (ii)all-to-all interslice coupling,appropriate for categoricalslices.n o d e sresolution parameterscoupling = 0123451015202530n o d e s resolution parameterscoupling = 0.1123451015202530n o d e sresolution parameterscoupling = 1123451015202530Fig. 2.Multislice community detection of the Zachary Karate Club network (22)across multiple resolutions.Colors depict community assignments of the 34nodes (renumbered vertically to group similarly assigned nodes)in each of the 16slices (with resolution parameters g s ={0.25,0.5,…,4}),for w =0(top),w =0.1(middle),and w =1(bottom).Dashed lines bound the communities obtained using the default resolution (g =1).14MAY 2010VOL 328SCIENCE876CORRECTED 16 JULY 2010; SEE LAST PAGEo n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o mwith a single parameter controlling the interslice correspondence of communities.Important to our method is the equivalence between the modularity quality function (12)[with a resolution parameter (5)]and stability of com-munities under Laplacian dynamics (13),which we have generalized to recover the null models for bipartite,directed,and signed networks (14).First,we obtained the resolution-parameter generaliza-tion of Barber ’s null model for bipartite networks (15)by requiring the independent joint probability contribution to stability in (13)to be conditional on the type of connection necessary to step between two nodes.Second,we recovered the standard null model for directed networks (16,17)(again with a resolution parameter)by generaliz-ing the Laplacian dynamics to include motion along different kinds of connections —in this case,both with and against the direction of a link.By this generalization,we similarly recovered a null model for signed networks (18).Third,we interpreted the stability under Laplacian dynamics flexibly to permit different spreading weights on the different types of links,giving multiple reso-lution parameters to recover a general null model for signed networks (19).We applied these generalizations to derive null models for multislice networks that extend the existing quality-function methodology,including an additional parameter w to control the coupling between slices.Representing each network slice s by adjacencies A ijs between nodes i and j ,with interslice couplings C jrs that connect node j in slice r to itself in slice s (Fig.1),we have restricted our attention to unipartite,undirected network slices (A ijs =A jis )and couplings (C jrs =C jsr ),but we can incorporate additional structure in the slices and couplings in the same manner as demonstrated for single-slice null models.Notating the strengths of each node individually in each slice by k js =∑i A ijs and across slices by c js =∑r C jsr ,we define the multislice strength by k js =k js +c js .The continuous-time Laplacian dynamics given byp˙is ¼∑jr ðA ijs d sr þd ij C jsr Þp jrk jr−p isð1Þrespects the intraslice nature of A ijs and the interslice couplings of C jsr .Using the steady-state probability distribution p ∗jr ¼k jr =2m ,where 2m =∑jr k jr ,we obtained the multislice null model in terms of the probability r is |jr of sampling node i in slice s conditional on whether the multislice struc-ture allows one to step from (j ,r )to (i ,s ),accounting for intra-and interslice steps separately asr is j jr p ∗jr ¼k is2m s k jr k jr d sr þC jsr c jr c jr k jr d ijk jr 2m ð2Þwhere m s =∑j k js .The second term in parentheses,which describes the conditional probability of motion between two slices,leverages the definition of the C jsr coupling.That is,the conditional probability of stepping from (j ,r )to (i ,s )along an interslice coupling is nonzero if and only if i =j ,and it is proportional to the probability C jsr /k jr of selecting the precise interslice link that connects to slice s .Subtracting this conditional joint probability from the linear (in time)approximation of the exponential describing the Laplacian dynamics,we obtained a multislice generalization of modularity (14):Q multislice ¼12m ∑ijsrhA ijs −g sk is k js 2m s d sr þd ij C jsr id ðg is ,g jr Þð3Þwhere we have used reweighting of the conditionalprobabilities,which allows a different resolution g s in each slice.We have absorbed the resolution pa-rameter for the interslice couplings into the mag-nitude of the elements of C jsr ,which,for simplicity,we presume to take binary values {0,w }indicating the absence (0)or presence (w )of interslice links.YearS e n a t o rCTMARI DENYIL IN MIWI IA KSMONDVA AL ARFL GALA MSSC KYOK WVCOID MTNMWYORAK HI Congress #ABFig.3.Multislice community detection of U.S.Senate roll call vote similarities (23)with w =0.5coupling of 110slices (i.e.,the number of 2-year Congresses from 1789to 2008)across time.(A )Colors indicate assignments to nine communities of the 1884unique senators (sorted vertically and connected across Congresses by dashed lines)in each Congress in which they appear.The dark blue and red communities correspond closely to the modern Democratic and Republican parties,respectively.Horizontal bars indicate the historical period of each community,with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress):PA,pro-administration;AA,anti-administration;F,Federalist;DR,Democratic-Republican;W,Whig;AJ,anti-Jackson;A,Adams;J,Jackson;D,Democratic;R,Republican.Vertical gray bars indicate Congresses in which three communities appeared simultaneously.(B )The same assignments according to state affiliations.SCIENCEVOL 32814MAY 2010877REPORTSo n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o mCommunity detection in multislice networks can then proceed using many of the same com-putational heuristics that are currently available for single-slice networks [although,as with the stan-dard definition of modularity,one must be cautious about the resolution of communities (20)and the likelihood of complex quality landscapes that necessitate caution in interpreting results on real networks (21)].We studied examples that have multiple resolutions [Zachary Karate Club (22)],vary over time [voting similarities in the U.S.Senate (23)],or are multiplex [the “Tastes,Ties,and Time ”cohort of university students (24)].We provide additional details for each example in (14).We performed simultaneous community de-tection across multiple resolutions (scales)in the well-known Zachary Karate Club network,which encodes the friendships between 34members of a 1970s university karate club (22).Keeping the same unweighted adjacency matrix across slices (A ijs =A ij for all s ),the resolution associated with each slice is dictated by a specified sequence of g s parameters,which we chose to be the 16values g s ={0.25,0.5,0.75,…,4}.In Fig.2,we depict the community assignments obtained for cou-pling strengths w ={0,0.1,1}between each neighboring pair of the 16ordered slices.These results simultaneously probe all scales,includ-ing the partition of the Karate Club into four com-munities at the default resolution of modularity (3,25).Additionally,we identified nodes that have an especially strong tendency to break off from larger communities (e.g.,nodes 24to 29in Fig.2).We also considered roll call voting in the U.S.Senate across time,from the 1st Congress to the 110th,covering the years 1789to 2008and includ-ing 1884distinct senator IDs (26).We defined weighted connections between each pair of sen-ators by a similarity between their voting,specified independently for each 2-year Congress (23).We studied the multislice collection of these 110networks,with each individual senator coupled to himself or herself when appearing in consecutive Congresses.Multislice community detection un-covered interesting details about the continuity of individual and group voting trends over time that are not captured by the union of the 110in-dependent partitions of the separate Congresses.Figure 3depicts a partition into nine communities that we obtained using coupling w =0.5.The Congresses in which three communities appeared simultaneously are each historically noteworthy:The 4th and 5th Congresses were the first with political parties;the 10th and 11th Congresses occurred during the political drama of former Vice President Aaron Burr ’s indictment for treason;the 14th and 15th Congresses witnessed the beginning of changing group structures in the Democratic-Republican party amidst the dying Federalist party (23);the 31st Congress included the Compromise of 1850;the 37th Congress occurred during the beginning of the American Civil War;the 73rd and 74th Congresses followed the landslide 1932election (during the Great Depression);and the 85th to 88th Congresses brought the major American civil rights acts,including the congressio-nal fights over the Civil Rights Acts of 1957,1960,and 1964.Finally,we applied multislice community detection to a multiplex network of 1640college students at a northeastern American university (24),including symmetrized connections from the first wave of this data representing (i)Facebook friendships,(ii)picture friendships,(iii)roommates,and (iv)student housing-group preferences.Be-cause the different connection types are categorical,the natural interslice couplings connect an individ-ual in a slice to himself or herself in each of the other three network slices.This coupling between categorical slices thus differs from that above,which connected only neighboring (ordered)slices.Table 1indicates the numbers of communities and the percentages of individuals assigned to one,two,three,or four communities across the four types of connections for different values of w ,as a first investigation of the relative redundancy across the connection types.Our multislice framework makes it possible to study community structure in a much broader class of networks than was previously possible.Instead of detecting communities in one static network at a time,our formulation generalizing the Laplacian dynamics approach of (13)permits the simulta-neous quality-function study of community struc-ture across multiple times,multiple resolution parameter values,and multiple types of links.Weused this method to demonstrate insights in real-world networks that would have been difficult or impossible to obtain without the simultaneous consideration of multiple network slices.Although our examples included only one kind of variation at a time,our framework applies equally well to networks that have multiple such features (e.g.,time-dependent multiplex networks).We expect multislice community detection to become a powerful tool for studying such systems.References and Notes1.M.Girvan,M.E.J.Newman,Proc.Natl.Acad.Sci.U.S.A.99,7821(2002).2.M.A.Porter,J.-P.Onnela,P.J.Mucha,Not.Am.Math.Soc.56,1082(2009).3.S.Fortunato,Phys.Rep.486,75(2010).4.M.E.J.Newman,Phys.Rev.E 74,036104(2006).5.J.Reichardt,S.Bornholdt,Phys.Rev.E 74,016110(2006).6.J.Hopcroft,O.Khan,B.Kulis,B.Selman,Proc.Natl.Acad.Sci.U.S.A.101(suppl.1),5249(2004).7.T.Y.Berger-Wolf,J.Saia,in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006),p.523(10.1145/1150402.1150462).8.G.Palla,A.-L.Barabási,T.Vicsek,Nature 446,664(2007).9.D.J.Fenn et al .,Chaos 19,033119(2009).10.J.Sun,C.Faloutsos,S.Papadimitriou,P.S.Yu,inProceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007),p.687(10.1145/1281192.1281266).11.T.M.Selee,T.G.Kolda,W.P.Kegelmeyer,J.D.Griffin,CSRI Summer Proceedings 2007,Technical Report SAND2007-7977,Sandia National Laboratories,Albuquerque,NM and Livermore,CA ,M.L.Parks,S.S.Collis,Eds.(2007),p.87(/CSRI/Proceedings).12.M.E.J.Newman,M.Girvan,Phys.Rev.E 69,026113(2004)mbiotte,J.C.Delvenne,M.Barahona,http://arxiv.org/abs/0812.1770(2008).14.See supporting material on Science Online.15.M.J.Barber,Phys.Rev.E 76,066102(2007).16.A.Arenas,J.Duch,A.Fernandez,S.Gomez,N.J.Phys.9,176(2007).17.E.A.Leicht,M.E.J.Newman,Phys.Rev.Lett.100,118703(2008).18.S.Gómez,P.Jensen,A.Arenas,Phys.Rev.E 80,016114(2009).19.V.A.Traag,J.Bruggeman,Phys.Rev.E 80,036115(2009).20.S.Fortunato,M.Barthélemy ,Proc.Natl.Acad.Sci.U.S.A.104,36(2007).21.B.H.Good,Y.-A.de Montjoye,A.Clauset,Phys.Rev.E81,046106(2010).22.W.W.Zachary,J.Anthropol.Res.33,452(1977).23.A.S.Waugh,L.Pei,J.H.Fowler,P.J.Mucha,M.A.Porter,/abs/0907.3509(2009).24.K.Lewis,J.Kaufman,M.Gonzalez,A.Wimmer,N.Christakis,works 30,330(2008).25.T.Richardson,P.J.Mucha,M.A.Porter,Phys.Rev.E 80,036111(2009).26.K.T.Poole,Voteview ()(2008).27.We thank N.A.Christakis,L.Meneades,and K.Lewis foraccess to and helping with the “Tastes,Ties,and Time ”data;S.Reid and A.L.Traud for help developing code;and A.Clauset,J.-C.Delvenne,S.Fortunato,M.Gould,and V.Traag for discussions.Congressional roll call data are from (26).Supported by NSF grant DMS-0645369(P.J.M.),James S.McDonnellFoundation grant 220020177(M.A.P.),and the Fulbright Program (J.-P.O.).Supporting Online Material/cgi/content/full/328/5980/876/DC1SOM Text References17November 2009;accepted 22March 201010.1126/science.1184819Table munities in the first wave of the multiplex “Tastes,Ties,and Time ”network (24),using the default resolution (g =1)in each of the four slices of data (Facebook friendships,picture friendships,roommates,and housing groups)under various couplings w across slices,which changed the number of communities and percentages of individuals assigned on a per-slice basis to one,two,three,or four communities.w Number of communitiesCommunities per individual (%)1234010360001000.112214.040.537.38.20.26619.949.125.3 5.70.34926.248.321.6 3.90.43631.847.018.4 2.80.53139.342.416.8 1.511610014MAY 2010VOL 328SCIENCE878REPORTSo n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m1 sCiEnCE erratum post date 16 july 2010 ErratumReports: “Community structure in time-dependent, multiscale, and multiplex networks” by P. J. Mucha et al . (14 May, p. 876). Equation 3 contained a typographical error that was not caught during the editing process: The δsr term should have been outside of the paren-theses within the square brackets. The correct equation, which also appears in the support-ing online material as equation 9, is as follows:See the revised supporting online material (/cgi/content/full/sci;328/5980/876/DC2), which also includes a correction to equation 11. The computations supporting the examples described in the Report were all performed with the correct for-mula for Q multislice . The authors thank Giuseppe Mangioni for pointing out the error.Post date 16 July 2010o n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o mCOMMENTARY16 JULY 2010 VOL 329 SCIENCE 276LETTERSedited by Jennifer SillsLETTERS I BOOKS I POLICY FORUM I EDUCATION FORUM I PERSPECTIVESC R ED I T : ME H M E T K A R A T A Y /W I K I M E D I A C O M M O N SBrazilian Law:Full Speed in Reverse?IS IT POSSIBLE TO COMBINE MODERN TROPI-cal agriculture with environmental conserva-tion? Brazilian agriculture offers encourag-ing examples that achieve high production together with adequate environmental pro-tection (1, 2). However, these effective prac-tices may soon lose ground to the conven-tional custom of resource overexploitation and environmental degradation.A revision to the Forest Act, the main Bra-zilian environmental legislation on private land, has just been submitted to Congress, and there is a strong chance that it will be approved. The proposed revision raises seri-ous concerns in the Brazilian scientifi c com-munity, which was largely ignored during its elaboration. The new rules will benefi t sectors that depend on expanding frontiers by clear-cutting forests and savannas and will reduce mandatory restoration of native vegetation illegally cleared since 1965. If approved, CO 2 emissions may increase substantially, instead of being reduced as was recently pledged in Copenhagen. Simple species-area relation-ship analyses also pro j ect the extinction of more than 100,000 species, a massive loss that will invalidate any commitment to biodi-versity conservation. Proponents of the new law, with well-known ties to specifi c agribusi-ness groups, claim an alleged shortage of land for agricultural expansion, and accuse the current legislation of being overprotective ofFunding Should Come to Those Who WaitWE APPLAUD THE PERSPECTIVE BY T. CLUTTON-BROCK ANDB. C. Sheldon (“The Seven Ages of Pan ,” 5 March, p. 1207) on the value of long-term behavior and ecologi-cal research. We pick up where they left off: funding. Long-term research has cumulative value that far exceeds its annual rate of return. Sadly, quick empiri-cal studies trump long-term research in the reward sys-tem for academic promotion in ecology and behavior. If long-term research is to fl ourish, we must build a reward system for studies characterized by deferred gratifi ca-tion. A sea change in these values must precede attemptsto address funding.To secure the future of long-term fi eld projects, we must act on three fronts:(i) We must devise funding mechanisms for “legacy” projects deemed too valuable to falter. Whereas the National Science Foundation’s (NSF’s) National Ecological Observatory Network and Long-Term Ecological Research programs support long-term collaborative, site-based research, there is a compelling need to support the diversity of long-term investiga-tor-initiated programs. As implemented, NSF’s Long-Term Research in Environmental Biol-ogy program is a fi rst step, but has insuffi cient support to maintain many valuable projects.(ii) We must develop mechanisms to fund the establishment of new programs with long-term potential. Such potential may not be initially appreciated, but with vision and support, new systems studied over the long run will produce novel insights.(iii) Support for ecological research must be increased. We do not advocate robbing Peter (short-term research) to pay Paul (long-term research). However, we maintain that Paul has already been robbed and some balance needs to be restored.Most of us involved in long-term research have a story to share, in which time-lim-ited funding shortages took our programs to the edge of a precipice. Investigators that suc-ceed and become known for long-term research, almost by defi nition, have found a way to adapt to funding shortfalls, usually at great personal sacrifi ce. A recent case at the Los Amigos Biological Station in the Peruvian Amazon speaks to the value of funding continuity (1). During a 4-year period of programmatic support, the scientifi c productivity of the station surged, producing many valuable fi ndings and building substantial scientifi c capacity for the region. Since the funding evaporated, the station has failed to return to its former glory, at great loss to our ability to make scientifi c inroads into understanding the ecology of this area, characterized by unrivaled biodiversity.Of course, long-term programs must remain intellectually vibrant and methodologically rigorous if they are to be supported. In the end, the onus is on ecologists to convince ourselves, society, and funding agencies that long-term research has unique and irreplaceable value.RONALD R. SWAISGOOD,1* JOHN W. TERBORGH,2 DANIEL T. BLUMSTEIN 31Applied Animal Ecology, San Diego Zoo’s Institute for Conservation Research, San Diego, CA 92027, USA. 2Center for Tropi-cal Conservation, Duke University, Durham, NC 27705, USA. 3Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA.*To whom correspondence should be addressed. E-mail: rswaisgood@Reference1. N. C. A. Pitman, Trends Ecol. Evol . 25, 381 (2010).Long-term studies. Studies spanning decades have yielded insights into red deer and other species. Published by AAASo n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m SCIENCE VOL 329 16 JULY 2010277the environment in response to foreign inter-ests fronted by green nongovernmental orga-nizations. However, recent studies (3) show that, without further conversion of natural vegetation, crop production can be increased by converting suitable pastures to agriculture and intensifying livestock production on the remaining pasture. Brazil has a high poten-tial for achieving sustainable development and thereby conserving its unique biological heritage. Although opposed by the Ministry of the Environment and most scientists, the combination of traditional politicians, oppor-tunistic economic groups, and powerful land-owners may be hard to resist. The situation is delicate and serious. Under the new ForestAct, Brazil risks suffering its worst environ-mental setback in half a century, with criti-cal and irreversible consequences beyond itsborders.JEAN PAUL METZGER,1* THOMAS M. LEWINSOHN,2CARLOS A. JOLY,3 LUCIANO M. VERDADE,4 LUIZ ANTONIO MARTINELLI,5 RICARDO R. RODRIGUES 61Department of Ecology, Institute of Bioscience, University of São Paulo, 05508-900, São Paulo, SP, Brazil. 2Depart-ment of Animal Biology, State University of Campinas, Campinas, SP, Brazil. 3Department of Plant Biology, Biol-ogy Institute, State University of Campinas, Campinas, SP, Brazil. 4Center of Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil. 5Program on Food Secu-rity and the Environment, Stanford University, Stanford, CA94305, USA. 6Department of Biological Sciences, “Luiz deQueiroz” College of Agriculture, University of São Paulo, Piracicaba, Brazil.*To whom correspondence should be addressed. E-mail: jpm@p.brReferences1. D. Nepstad et al., Science 326, 1350 (2009).2. C. R. Fonseca et al., Biol. Conserv. 142, 1209 (2009).3. G. Sparovek et al., Considerações sobre o Código Florestalbrasileiro (“Luiz de Queiroz” College of Agriculture, Uni-versity of São Paulo, Piracicaba, Brazil, 2010); p.br/lepac/codigo_fl orestal/Sparovek_etal_2010.pdf.Sponsors of Traumatic Brain Injury Project I’M DELIGHTED THAT SCIENCE TOOK THE TIMEto highlight the ongoing efforts of the Common Data Elements Project for research in psychological health and traumatic brain injury (“New guidelines a im to improve studies of traumatic brain injury,” G. Miller,News of the Week, 16 April, p. 297). The level of interagency collaboration that made the project possible is exactly the type of lea dership tha t America ns should expectfrom the federal government.As noted in the story, the project is co-sponsored by four federal agencies—threeof whom were mentioned. The other agency is the National Institute on Disability andRehabilitation Research (NIDRR) withinthe Department of Education. NIDRR hasleadership, resources, and subject matter experts without which this project would nothave been nearly as successful. Together, all four agencies will continue to develop rec-ommendations and support ongoing efforts to improve and refine the Common Data Elements.GEOFFREY MANLEYDepartment of Neurosurgery, Brain and Spinal Injury Cen-ter, University of California, San Francisco, CA 94110, USA. E-mail: manleyg@Warming, Photoperiods, and Tree PhenologyC. KÖRNER ANDD. BASLER (“PHENOLOGY under global warming,” Perspectives, 19 March, p. 1461) suggest that because of photoperiodic constraints, observed effects of temperature on spring life-cycle events cannot be extrapolated to future tempera-ture conditions.However, no study has demonstrated that photoperiod is more dominant than temper-ature when predicting leaf senescence (1), leafing, or flowering, even in beech—one of the species most sensitive to photoperiod (2, 3). On the contrary, the literature [e.g., (4, 5)] supports the idea that spring phenol-ogy is highly dependent on temperature dur-ing both the endodormancy phase (the period during which the plant remains dormant dueTECHNICAL COMMENT ABSTRACTS Comment on “Observational and Model Evidence for Positive Low-Level Cloud Feedback”Anthony J. Broccoli and Stephen A. KleinClement et al . (Reports, 24 July 2009, p. 460) provided observational evidence for systematic relationships between variations in marine low cloudiness and other climatic variables and found that most current-generation climate models were defi cient in reproducing such relationships. Our analysis of one of these models (GFDL CM2.1), using more com-plete model output, indicates better agreement with observations, suggesting that more detailed analysis of climate model simulations is necessary.Full text at /cgi/content/full/329/5989/277-aResponse to Comment on “Observational and Model Evidence for Positive Low-Level Cloud Feedback”Amy C. Clement, Robert Burgman, Joel R. NorrisBroccoli and Klein argue for additional diagnostics to better assess the simulation of cloud feedbacks in climate models. We agree, and here provide additional analysis of two climate models that reveals where model defi ciencies in cloud simulation in the Northeast Pacifi c may occur. Cloud diagnostics from the forthcoming Climate Model Intercomparison Project 5 should make such additional analyses possible for a large number of climate models.Full text at /cgi/content/full/329/5989/277-bCORRECTIONS AND CLARIFICATIONSNews of the Week: “Invisibility cloaks for visible light must remain tiny, theorists predict” by A. Cho (25 June, p. 1621). The size limit on a cloak for infrared or visible light was misstated. It is a few hundred micrometers, not a few micrometers.News Focus: “Putting light’s light touch to work as optics meets mechanics” by A. Cho (14 May, p. 812). In the third para-graph, “pitchfork” should have been “tuning fork.”Reports: “Community structure in time-dependent, multiscale, and multiplex networks” by P. J. Mucha et al . (14 May, p. 876). Equation 3 contained a typographical error that was not caught during the editing process: The δsr term should have been outside of the parentheses within the square brackets. The correct equation, which also appears in the support-ing online material as equation 9, is to the right. See the revised supporting online material (/cgi/content/full/sci;328/5980/876/DC2), which also includes a correction to equation 11. The computations supporting theexamples described in the Report were allperformed with the correct formula for Q multislice . The authors thank Giuseppe Mangioni for point-ing out the error.Published by AAASo n D e c e m b e r 2, 2010w w w .s c i e n c e m a g .o r g D o w n l o a d e d f r o m。
社区发现(Community Detection)算法社区发现(Community Detection)算法用来发现网络中的社区结构,也可以视为一种广义的聚类算法。
以下是我的一个PPT 报告,分享给大家。
从上述定义可以看出:社区是一个比较含糊的概念,只给出了一个定性的刻画。
另外需要注意的是,社区是一个子图,包含顶点和边。
下面我们以新浪微博用户对应的网络图为例,来介绍相应的社区发现算法。
这里在相互关注的用户之间建立连接关系,主要是为了简化模型,此时对应的图为无向图。
当然,我们也可以采用单向关注来建边,此时将对应有向图。
这个定义看起来很拗口,但通过层层推导,可以得到如下(4.2)的数学表达式。
定义中的随机网络也称为Null Model,其构造方法为:the null model used has so far been a random graph with the same number of nodes, the same number of edges and the same degree distribution as in the original graph, but with links among nodes randomly placed.注意,(4.2) 是针对无向图的,因此这里的m 表示无向边的条数,即若节点i 和节点j 有边相连,则节点(i, j) 对m 只贡献一条边。
标签传播算法(LPA)的做法比较简单:第一步: 为所有节点指定一个唯一的标签;第二步: 逐轮刷新所有节点的标签,直到达到收敛要求为止。
对于每一轮刷新,节点标签刷新的规则如下:对于某一个节点,考察其所有邻居节点的标签,并进行统计,将出现个数最多的那个标签赋给当前节点。
当个数最多的标签不唯一时,随机选一个。
注:算法中的记号N_n^k 表示节点n 的邻居中标签为k 的所有节点构成的集合。
SLPA 中引入了Listener和Speaker两个比较形象的概念,你可以这么来理解:在刷新节点标签的过程中,任意选取一个节点作为listener,则其所有邻居节点就是它的speaker 了,speaker 通常不止一个,一大群speaker 在七嘴八舌时,listener 到底该听谁的呢?这时我们就需要制定一个规则。
2022年 2月 February 2022Digital Technology &Application 第40卷 第2期Vol.40 No.2数字技术与应用102中图分类号:TP393 文献标识码:A 文章编号:1007-9416(2022)02-0102-03DOI:10.19695/12-1369.2022.02.34基于改进小生境粒子群的社区发现算法*广东茂名幼儿师范专科学校教育技术与网络中心 张金霜 黄旭彬社区发现对增加教育虚拟社区用户粘性,提高学习者学习成效具有积极作用。
为解决传统社区发现算法在复杂网络结构不清晰时划分效果不佳的问题,提出一种基于小生境的二进制粒子群优化算法NIBPSO。
算法将每个粒子编码作为社区发现的一种解,以模块度作为优化函数。
在粒子迭代过程中,选取粒子的邻域最优替代全局最优,同时根据粒子各维度的速度,采用轮盘赌算法确定粒子中各节点的社区归属。
通过控制粒子信息传播速度和范围,能有效解决粒子陷入局部最优,提高了社区发现效果。
实验表明,该算法获得较好的社区发现结果。
教育虚拟社区为学习者提供了一个开放的网络学习空间,学习者在这里交流信息、探讨问题、扩展思路、创新观念、达成共识,充分发挥集体智慧。
随着人工智能和大数据技术的发展,精准教育逐步变成现实,教育虚拟社区作为精准教育的载体,在教学过程中发挥了重要作用。
提升教育虚拟社区发现的效率和质量,有助于学习资源精准推送,促进学习者找到兴趣相投的学习伙伴,增加学习者对教育虚拟社区的粘性。
现实生活中,许多复杂系统都被抽象成复杂网络形式,如社交网络、论文引文网络、生物网络等。
复杂网络内部可划分为多个社区,社区内部的节点联系更加紧密,而社区间关系较为稀疏。
社区发现仍是目前复杂网络研究热点之一,社区划分的好坏影响社区价值的挖掘。
在研究初期,有学者将社区发现理解为图分割问题、聚类问题等,提出了各种社区发现算法,比较有代表性的包括GN分裂算法[1]、LPA标签传播算法[2-3]、随机游走算法、层次聚类算法、Fast Unfolding算法等。
Heterogeneous Network Community Detection Algorithm Based on Maximum BipartiteCliqueXiaodong Qian1(&),Lei Yang2,and Jinhao Fang31School of Economics and Business Administration,Lanzhou Jiaotong University,Lanzhou730070,Chinaqiaoxd@2School of Electronic and Information Engineering,Lanzhou Jiaotong University,Lanzhou730070,China3School of Institute of Electrical and Automation Engineering,Lanzhou Jiaotong University,Lanzhou730070,ChinaAbstract.The community mining of heterogeneous networks is a hot issue tostudy the big data of the network,and the original structure of the heterogeneousnetwork and its information can fully exploit the community structure in thenetwork.However,the existing algorithms mainly analysis one type of objectsin the heterogeneous network,and the algorithms about the heterogeneous nodeswhich constitute the community structure is rarely studied.Therefore,this paperintroduces the theory about maximum bipartite clique:Firstly,regarding thelargest maximum bipartite clique that the key node belongs to as initial com-munity.Then,the community is expanded based on the similarity between theneighbor node of the community and the initial community in quantitative.Finally,a reasonable community structure is mined and the simulation experi-ments are carried out on artificial networks and real heterogeneous networks.The experimental results show that the algorithm has relatively high communityaccuracy and modularity in community detection,which proves the rationalityand validity of the algorithm.Keywords:Heterogeneous networkÁCommunity detectionMaximum bipartite cliqueÁSimilarity1ForewordWith the rapid development of communication technology and IT technology,the scale of networks have gradually expanded,its structures have become more intricate, meanwhile,producing a huge number of information data,what we called:Big Data. The emergence of Big Data promotes human being’s technology transmit from Information Age to Data Age.Big Data in internet have penetrated into every corner of human’s life,especially in suchfields:society,economy,and scientific research;at the same time,it provides enough information bases for people to cognize,understand the material world[15].The huge wealth concealed in Big Data has attracted a lot of attention from scholars inside and outside the country.©Springer Nature Singapore Pte Ltd.2018Q.Zhou et al.(Eds.):ICPCSEE2018,CCIS901,pp.253–268,2018.https:///10.1007/978-981-13-2203-7_19。
基于共邻节点相似度的社区划分算法作者:付立东郝伟李丹李凡来源:《计算机应用》2019年第07期摘要:复杂网络中的社区结构能帮助人们认识网络的基本结构及其功能。
针对目前多数社区划分算法准确率低、复杂度高的问题,提出了一种基于共邻节点相似度的社区划分算法。
首先,为了计算节点间相似度值,提出了相似度模型,该模型通过将被测节点对的邻居节点引入一并计算,提高了相似度度量的准确性;然后,计算节点局部影响力值,能客观地表现出节点在所处网络中的重要性;其次,结合节点相似度值和节点局部影响力值对节点进行层次聚类,完成网络社区结构的初步划分;最后,通过聚合初步划分的子社区,获得复杂网络的最优模块度值。
仿真结果表明,在网络的社区特征模糊时,与新的基于局部相似度的社区发现算法(CDALS)相比,所提算法的准确率提高了14%,证明了所提提法更能够准确、有效地划分复杂网络的社区结构。
关键词:共邻节点;相似度度量;节点局部影响力;模块度;社区划分Abstract: The community structure in complex networks can help people recognize basic structure and functions of network. Aiming at the problems of low accuracy and high complexity of most community division algorithms, a community division algorithm based on similarity of common neighbor nodes was proposed. Firstly, a similarity model was proposed in order to calculate the similarity between nodes. In the model, the accuracy of similarity measurement was improved by calculating the tested node pairs and their neighbor nodes together. Secondly, local influence values of nodes were calculated, objectively showing the importances of nodes in the network. Thirdly, the nodes were hierarchically clustered according to the similarity and local influence values of nodes, and preliminary division of network community structure was completed. Finally, the preliminary divided sub-communities were clustered until the optimal modularity value was obtained. The simulation results show that compared with the new Community Detection Algorithm based on Local Similarity (CDALS), the proposed algorithm has the accuuracy improved by 14%, which proves that the proposed algorithm can divide the community structure of complex networks accurately and effectively.Key words: common neighbor node; similarity measurement; local influence of node; modularity; community division0 引言復杂网络是复杂系统的典型表现形式,得到了经济学、生物学和社会学等不同学科学者的广泛关注,成为当今的热点研究课题[1-2],而通过网络的社区结构来分析研究网络的功能和特征,已经成为复杂网络的研究趋势。
a r X i v :p h y s i c s /0602124v 1 [p h y s i c s .d a t a -a n ] 17 F eb 2006Modularity and community structure in networksM. E.J.NewmanDepartment of Physics and Center for the Study of Complex Systems,Randall Laboratory,University of Michigan,Ann Arbor,MI 48109–1040Many networks of interest in the sciences,including a variety of social and biological networks,are found to divide naturally into communities or modules.The problem of detecting and characterizing this community structure has attracted considerable recent attention.One of the most sensitive detection methods is optimization of the quality function known as “modularity”over the possible divisions of a network,but direct application of this method using,for instance,simulated annealing is computationally costly.Here we show that the modularity can be reformulated in terms of the eigenvectors of a new characteristic matrix for the network,which we call the modularity matrix,and that this reformulation leads to a spectral algorithm for community detection that returns results of better quality than competing methods in noticeably shorter running times.We demonstrate the algorithm with applications to several network data sets.IntroductionMany systems of scientific interest can be represented as networks—sets of nodes or vertices joined in pairs by lines or edges .Examples include the Internet and the worldwide web,metabolic networks,food webs,neural networks,communication and distribution networks,and social networks.The study of networked systems has a history stretching back several centuries,but it has expe-rienced a particular surge of interest in the last decade,especially in the mathematical sciences,partly as a result of the increasing availability of large-scale accurate data describing the topology of networks in the real world.Statistical analyses of these data have revealed some un-expected structural features,such as high network tran-sitivity [1],power-law degree distributions [2],and the existence of repeated local motifs [3];see [4,5,6]for reviews.One issue that has received a considerable amount of attention is the detection and characterization of com-munity structure in networks [7,8],meaning the appear-ance of densely connected groups of vertices,with only sparser connections between groups (Fig.1).The abil-ity to detect such groups could be of significant practical importance.For instance,groups within the worldwide web might correspond to sets of web pages on related top-ics [9];groups within social networks might correspond to social units or communities [10].Merely the finding that a network contains tightly-knit groups at all can convey useful information:if a metabolic network were divided into such groups,for instance,it could provide evidence for a modular view of the network’s dynamics,with dif-ferent groups of nodes performing different functions with some degree of independence [11,12].Past work on methods for discovering groups in net-works divides into two principal lines of research,both with long histories.The first,which goes by the name of graph partitioning ,has been pursued particularly in computer science and related fields,with applications in parallel computing and VLSI design,among other ar-eas [13,14].The second,identified by names such as blockFIG.1:The vertices in many networks fall naturally into groups or communities,sets of vertices (shaded)within which there are many edges,with only a smaller number of edges between vertices of different groups.modeling ,hierarchical clustering ,or community structure detection ,has been pursued by sociologists and more re-cently also by physicists and applied mathematicians,with applications especially to social and biological net-works [7,15,16].It is tempting to suggest that these two lines of re-search are really addressing the same question,albeit by somewhat different means.There are,however,impor-tant differences between the goals of the two camps that make quite different technical approaches desirable.A typical problem in graph partitioning is the division of a set of tasks between the processors of a parallel computer so as to minimize the necessary amount of interprocessor communication.In such an application the number of processors is usually known in advance and at least an approximate figure for the number of tasks that each pro-cessor can handle.Thus we know the number and size of the groups into which the network is to be split.Also,the goal is usually to find the best division of the network re-gardless of whether a good division even exists—there is little point in an algorithm or method that fails to divide the network in some cases.Community structure detection,by contrast,is per-2haps best thought of as a data analysis technique used to shed light on the structure of large-scale network datasets,such as social networks,Internet and web data, or biochemical munity structure meth-ods normally assume that the network of interest divides naturally into subgroups and the experimenter’s job is to find those groups.The number and size of the groups is thus determined by the network itself and not by the experimenter.Moreover,community structure methods may explicitly admit the possibility that no good division of the network exists,an outcome that is itself considered to be of interest for the light it sheds on the topology of the network.In this paper our focus is on community structure de-tection in network datasets representing real-world sys-tems of interest.However,both the similarities and differences between community structure methods and graph partitioning will motivate many of the develop-ments that follow.The method of optimal modularity Suppose then that we are given,or discover,the struc-ture of some network and that we wish to determine whether there exists any natural division of its vertices into nonoverlapping groups or communities,where these communities may be of any size.Let us approach this question in stages and focus ini-tially on the problem of whether any good division of the network exists into just two communities.Perhaps the most obvious way to tackle this problem is to look for divisions of the vertices into two groups so as to mini-mize the number of edges running between the groups. This“minimum cut”approach is the approach adopted, virtually without exception,in the algorithms studied in the graph partitioning literature.However,as discussed above,the community structure problem differs crucially from graph partitioning in that the sizes of the commu-nities are not normally known in advance.If community sizes are unconstrained then we are,for instance,at lib-erty to select the trivial division of the network that puts all the vertices in one of our two groups and none in the other,which guarantees we will have zero intergroup edges.This division is,in a sense,optimal,but clearly it does not tell us anything of any worth.We can,if we wish,artificially forbid this solution,but then a division that puts just one vertex in one group and the rest in the other will often be optimal,and so forth.The problem is that simply counting edges is not a good way to quantify the intuitive concept of commu-nity structure.A good division of a network into com-munities is not merely one in which there are few edges between communities;it is one in which there are fewer than expected edges between communities.If the num-ber of edges between two groups is only what one would expect on the basis of random chance,then few thought-ful observers would claim this constitutes evidence of meaningful community structure.On the other hand,if the number of edges between groups is significantly less than we expect by chance—or equivalently if the number within groups is significantly more—then it is reasonable to conclude that something interesting is going on. This idea,that true community structure in a network corresponds to a statistically surprising arrangement of edges,can be quantified using the measure known as modularity[17].The modularity is,up to a multiplicative constant,the number of edges falling within groups mi-nus the expected number in an equivalent network with edges placed at random.(A precise mathematical formu-lation is given below.)The modularity can be either positive or negative,with positive values indicating the possible presence of com-munity structure.Thus,one can search for community structure precisely by looking for the divisions of a net-work that have positive,and preferably large,values of the modularity[18].The evidence so far suggests that this is a highly effective way to tackle the problem.For instance, Guimer`a and Amaral[12]and later Danon et al.[8]op-timized modularity over possible partitions of computer-generated test networks using simulated annealing.In di-rect comparisons using standard measures,Danon et al. found that this method outperformed all other methods for community detection of which they were aware,in most cases by an impressive margin.On the basis of con-siderations such as these we consider maximization of the modularity to be perhaps the definitive current method of community detection,being at the same time based on sensible statistical principles and highly effective in practice.Unfortunately,optimization by simulated annealing is not a workable approach for the large network problems facing today’s scientists,because it demands too much computational effort.A number of alternative heuris-tic methods have been investigated,such as greedy algo-rithms[18]and extremal optimization[19].Here we take a different approach based on a reformulation of the mod-ularity in terms of the spectral properties of the network of interest.Suppose our network contains n vertices.For a par-ticular division of the network into two groups let s i=1 if vertex i belongs to group1and s i=−1if it belongs to group2.And let the number of edges between ver-tices i and j be A ij,which will normally be0or1,al-though larger values are possible in networks where mul-tiple edges are allowed.(The quantities A ij are the el-ements of the so-called adjacency matrix.)At the same time,the expected number of edges between vertices i and j if edges are placed at random is k i k j/2m,where k i and k j are the degrees of the vertices and m=14m ijA ij−k i k j4m s T Bs,(1)where s is the vector whose elements are the s i.The leading factor of1/4m is merely conventional:it is in-cluded for compatibility with the previous definition of modularity[17].We have here defined a new real symmetric matrix B with elementsk i k jB ij=A ij−FIG.2:Application of our eigenvector-based method to the “karate club”network of Ref.[23].Shapes of vertices indi-cate the membership of the corresponding individuals in the two known factions of the network while the dotted line indi-cates the split found by the algorithm,which matches the fac-tions exactly.The shades of the vertices indicate the strength of their membership,as measured by the value of the corre-sponding element of the eigenvector.groups,but to place them on a continuous scale of“how much”they belong to one group or the other.As an example of this algorithm we show in Fig.2the result of its application to a famous network from the so-cial science literature,which has become something of a standard test for community detection algorithms.The network is the“karate club”network of Zachary[23], which shows the pattern of friendships between the mem-bers of a karate club at a US university in the1970s. This example is of particular interest because,shortly after the observation and construction of the network, the club in question split in two as a result of an inter-nal dispute.Applying our eigenvector-based algorithm to the network,wefind the division indicated by the dotted line in thefigure,which coincides exactly with the known division of the club in real life.The vertices in Fig.2are shaded according to the val-ues of the elements in the leading eigenvector of the mod-ularity matrix,and these values seem also to accord well with known social structure within the club.In partic-ular,the three vertices with the heaviest weights,either positive or negative(black and white vertices in thefig-ure),correspond to the known ringleaders of the two fac-tions.Dividing networks into more than two communities In the preceding section we have given a simple matrix-based method forfinding a good division of a network into two parts.Many networks,however,contain more than two communities,so we would like to extend our method tofind good divisions of networks into larger numbers of parts.The standard approach to this prob-lem,and the one adopted here,is repeated division into two:we use the algorithm of the previous sectionfirst to divide the network into two parts,then divide those parts,and so forth.In doing this it is crucial to note that it is not correct, afterfirst dividing a network in two,to simply delete the edges falling between the two parts and then apply the algorithm again to each subgraph.This is because the degrees appearing in the definition,Eq.(1),of the mod-ularity will change if edges are deleted,and any subse-quent maximization of modularity would thus maximize the wrong quantity.Instead,the correct approach is to define for each subgraph g a new n g×n g modularity matrix B(g),where n g is the number of vertices in the subgraph.The correct definition of the element of this matrix for vertices i,j isB(g)ij=A ij−k i k j2m ,(4)where k(g)i is the degree of vertex i within subgraph g and d g is the sum of the(total)degrees k i of the vertices in the subgraph.Then the subgraph modularity Q g=s T B(g)s correctly gives the additional contribution to the total modularity made by the division of this subgraph.In particular,note that if the subgraph is undivided,Q g is correctly zero.Note also that for a complete network Eq.(4)reduces to the previous definition for the modu-larity matrix,Eq.(2),since k(g)i→k i and d g→2m in that case.In repeatedly subdividing our network,an important question we need to address is at what point to halt the subdivision process.A nice feature of our method is that it provides a clear answer to this question:if there exists no division of a subgraph that will increase the modular-ity of the network,or equivalently that gives a positive value for Q g,then there is nothing to be gained by divid-ing the subgraph and it should be left alone;it is indi-visible in the sense of the previous section.This happens when there are no positive eigenvalues to the matrix B(g), and thus our leading eigenvalue provides a simple check for the termination of the subdivision process:if the lead-ing eigenvalue is zero,which is the smallest value it can take,then the subgraph is indivisible.Note,however,that while the absence of positive eigen-values is a sufficient condition for indivisibility,it is not a necessary one.In particular,if there are only small positive eigenvalues and large negative ones,the terms in Eq.(3)for negativeβi may outweigh those for positive.It is straightforward to guard against this possibility,how-ever:we simply calculate the modularity contribution for each proposed split directly and confirm that it is greater than zero.Thus our algorithm is as follows.We construct the modularity matrix for our network andfind its leading (most positive)eigenvalue and eigenvector.We divide the network into two parts according to the signs of the elements of this vector,and then repeat for each of the parts.If at any stage wefind that the proposed split makes a zero or negative contribution to the total mod-5ularity,we leave the corresponding subgraph undivided. When the entire network has been decomposed into in-divisible subgraphs in this way,the algorithm ends. One immediate corollary of this approach is that all “communities”in the network are,by definition,indi-visible subgraphs.A number of authors have in the past proposed formal definitions of what a community is[9,16,24].The present method provides an alter-native,first-principles definition of a community as an indivisible subgraph.Further techniques for modularity maximization In this section we describe briefly another method we have investigated for dividing networks in two by mod-ularity optimization,which is entirely different from our spectral method.Although not of especial interest on its own,this second method is,as we will shortly show,very effective when combined with the spectral method.Let us start with some initial division of our vertices into two groups:the most obvious choice is simply to place all vertices in one of the groups and no vertices in the other.Then we proceed as follows.Wefind among the vertices the one that,when moved to the other group, will give the biggest increase in the modularity of the complete network,or the smallest decrease if no increase is possible.We make such moves repeatedly,with the constraint that each vertex is moved only once.When all n vertices have been moved,we search the set of in-termediate states occupied by the network during the operation of the algorithm tofind the state that has the greatest modularity.Starting again from this state,we repeat the entire process iteratively until no further im-provement in the modularity results.Those familiar with the literature on graph partitioning mayfind this algo-rithm reminiscent of the Kernighan–Lin algorithm[25], and indeed the Kernighan–Lin algorithm provided the inspiration for our method.Despite its simplicity,wefind that this method works moderately well.It is not competitive with the best pre-vious methods,but it gives respectable modularity val-ues in the trial applications we have made.However, the method really comes into its own when it is used in combination with the spectral method introduced ear-lier.It is a common approach in standard graph par-titioning problems to use spectral partitioning based on the graph Laplacian to give an initial broad division of a network into two parts,and then refine that division us-ing the Kernighan–Lin algorithm.For community struc-ture problems wefind that the equivalent joint strategy works very well.Our spectral approach based on the leading eigenvector of the modularity matrix gives an ex-cellent guide to the general form that the communities should take and this general form can then befine-tuned by our vertex moving method,to reach the best possible modularity value.The whole procedure is repeated to subdivide the network until every remaining subgraph is indivisible,and no further improvement in the modular-ity is possible.Typically,thefine-tuning stages of the algorithm add only a few percent to thefinal value of the modularity, but those few percent are enough to make the difference between a method that is merely good and one that is, as we will see,exceptional.Example applicationsIn practice,the algorithm developed here gives excel-lent results.For a quantitative comparison between our algorithm and others we follow Duch and Arenas[19] and compare values of the modularity for a variety of networks drawn from the literature.Results are shown in Table I for six different networks—the exact same six as used by Duch and Arenas.We compare mod-ularityfigures against three previously published algo-rithms:the betweenness-based algorithm of Girvan and Newman[10],which is widely used and has been incor-porated into some of the more popular network analysis programs(denoted GN in the table);the fast algorithm of Clauset et al.[26](CNM),which optimizes modularity using a greedy algorithm;and the extremal optimization algorithm of Duch and Arenas[19](DA),which is ar-guably the best previously existing method,by standard measures,if one discounts methods impractical for large networks,such as exhaustive enumeration of all parti-tions or simulated annealing.The table reveals some interesting patterns.Our al-gorithm clearly outperforms the methods of Girvan and Newman and of Clauset et al.for all the networks in the task of optimizing the modularity.The extremal opti-mization method on the other hand is more competitive. For the smaller networks,up to around a thousand ver-tices,there is essentially no difference in performance be-tween our method and extremal optimization;the mod-ularity values for the divisions found by the two algo-rithms differ by no more than a few parts in a thousand for any given network.For larger networks,however,our algorithm does better than extremal optimization,and furthermore the gap widens as network size increases, to a maximum modularity difference of about a6%for the largest network studied.For the very large networks that have been of particular interest in the last few years, therefore,it appears that our method for detecting com-munity structure may be the most effective of the meth-ods considered here.The modularity values given in Table I provide a use-ful quantitative measure of the success of our algorithm when applied to real-world problems.It is worthwhile, however,also to confirm that it returns sensible divisions of networks in practice.We have given one example demonstrating such a division in Fig.2.We have also checked our method against many of the example net-works used in previous studies[10,17].Here we give two more examples,both involving network representationsmodularity Q network GN CNM DA this paper3419845311331068027519maximal value of the quantity known as modularity over possible divisions of a network.We have shown that this problem can be rewritten in terms of the eigenval-ues and eigenvectors of a matrix we call the modularity matrix,and by exploiting this transformation we have created a new computer algorithm for community de-tection that demonstrably outperforms the best previ-ous general-purpose algorithms in terms of both quality of results and speed of execution.We have applied our algorithm to a variety of real-world network data sets, including social and biological examples,showing it to give both intuitively reasonable divisions of networks and quantitatively better results as measured by the modu-larity.AcknowledgmentsThe author would like to thank Lada Adamic,Alex Arenas,and Valdis Krebs for providing network data and for useful comments and suggestions.This work was funded in part by the National Science Foundation un-der grant number DMS–0234188and by the James S. McDonnell Foundation.[1]D.J.Watts and S.H.Strogatz,Collective dynamics of‘small-world’networks.Nature393,440–442(1998). [2]A.-L.Barab´a si and R.Albert,Emergence of scaling inrandom networks.Science286,509–512(1999).[3]o,S.Shen-Orr,S.Itzkovitz,N.Kashtan,D.Chklovskii,and U.Alon,Network motifs:Simplebuilding blocks of complex networks.Science298,824–827(2002).[4]R.Albert and A.-L.Barab´a si,Statistical mechanics ofcomplex networks.Rev.Mod.Phys.74,47–97(2002).[5]S.N.Dorogovtsev and J.F.F.Mendes,Evolution ofnetworks.Advances in Physics51,1079–1187(2002). [6]M.E.J.Newman,The structure and function of complexnetworks.SIAM Review45,167–256(2003).[7]M.E.J.Newman,Detecting community structure in net-works.Eur.Phys.J.B38,321–330(2004).[8]L.Danon,J.Duch, A.Diaz-Guilera,and A.Arenas,Comparing community structure identification.J.Stat.Mech.p.P09008(2005).[9]G.W.Flake,wrence,C.L.Giles,and F.M.Co-etzee,Self-organization and identification of Web com-munities.IEEE Computer35,66–71(2002).[10]M.Girvan and M.E.J.Newman,Community structurein social and biological networks.Proc.Natl.Acad.Sci.USA99,7821–7826(2002).[11]P.Holme,M.Huss,and H.Jeong,Subnetwork hierar-chies of biochemical pathways.Bioinformatics19,532–538(2003).[12]R.Guimer`a and L.A.N.Amaral,Functional cartogra-phy of complex metabolic networks.Nature433,895–900 (2005).[13]U.Elsner,Graph partitioning—a survey.Technical Re-port97-27,Technische Universit¨a t Chemnitz(1997). [14]P.-O.Fj¨a llstr¨o m,Algorithms for graph partitioning:Asurvey.Link¨o ping Electronic Articles in Computer and Information Science3(10)(1998).[15]H.C.White,S.A.Boorman,and R.L.Breiger,Socialstructure from multiple networks:I.Blockmodels of roles and positions.Am.J.Sociol.81,730–779(1976). [16]S.Wasserman and K.Faust,Social Network Analysis.Cambridge University Press,Cambridge(1994).[17]M.E.J.Newman and M.Girvan,Finding and evaluat-ing community structure in networks.Phys.Rev.E69, 026113(2004).[18]M.E.J.Newman,Fast algorithm for detecting com-munity structure in networks.Phys.Rev.E69,066133 (2004).[19]J.Duch and A.Arenas,Community detection in complexnetworks using extremal optimization.Phys.Rev.E72, 027104(2005).[20]F.R.K.Chung,Spectral Graph Theory.Number92in CBMS Regional Conference Series in Mathematics, American Mathematical Society,Providence,RI(1997).[21]M.Fiedler,Algebraic connectivity of graphs.Czech.Math.J.23,298–305(1973).[22]A.Pothen,H.Simon,and K.-P.Liou,Partitioning sparsematrices with eigenvectors of graphs.SIAM J.Matrix Anal.Appl.11,430–452(1990).[23]W.W.Zachary,An informationflow model for conflictandfission in small groups.Journal of Anthropological Research33,452–473(1977).[24]F.Radicchi,C.Castellano,F.Cecconi,V.Loreto,andD.Parisi,Defining and identifying communities in net-A101,2658–2663 (2004).[25]B.W.Kernighan and S.Lin,An efficient heuristic proce-dure for partitioning graphs.Bell System Technical Jour-nal49,291–307(1970).[26]A.Clauset,M.E.J.Newman,and C.Moore,Findingcommunity structure in very large networks.Phys.Rev.E70,066111(2004).[27]P.Gleiser and L.Danon,Community structure in jazz.Advances in Complex Systems6,565–573(2003). [28]H.Jeong,B.Tombor,R.Albert,Z.N.Oltvai,and A.-L.Barab´a si,The large-scale organization of metabolic networks.Nature407,651–654(2000).[29]H.Ebel,L.-I.Mielsch,and S.Bornholdt,Scale-free topol-ogy of e-mail networks.Phys.Rev.E66,035103(2002).[30]X.Guardiola,R.Guimer`a,A.Arenas,A.Diaz-Guilera,D.Streib,and L. A.N.Amaral,Macro-and micro-structure of trust networks.Preprint cond-mat/0206240 (2002).[31]M.E.J.Newman,The structure of scientific collabora-tion A98,404–409 (2001).[32]L.A.Adamic and N.Glance,The political blogosphereand the2004us election.In Proceedings of the WWW-2005Workshop on the Weblogging Ecosystem(2005).。