Random-Graphs-with-Arbitrary-Degree-Distributions-and-Their-Applications
- 格式:pdf
- 大小:211.62 KB
- 文档页数:17
英⽂论⽂写作中⼀些可能⽤到的词汇英⽂论⽂写作过程中总是被⾃⼰可怜的词汇量击败, 所以我打算在这⾥记录⼀些在阅读论⽂过程中见到的⼀些⾃⼰不曾见过的词句或⽤法。
这些词句查词典都很容易查到,但是只有带⼊论⽂原⽂中才能体会内涵。
毕竟原⽂和译⽂中间总是存在⼀条看不见的思想鸿沟。
形容词1. vanilla: adj. 普通的, 寻常的, 毫⽆特⾊的. ordinary; not special in any way.2. crucial: adj. ⾄关重要的, 关键性的.3. parsimonious:adj. 悭吝的, 吝啬的, ⼩⽓的.e.g. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity.4. diverse: adj. 不同的, 相异的, 多种多样的, 形形⾊⾊的.5. intriguing: adj. ⾮常有趣的, 引⼈⼊胜的; 神秘的. *intrigue: v. 激起…的兴趣, 引发…的好奇⼼; 秘密策划(加害他⼈), 密谋.e.g. The results of this paper carry several intriguing implications.6. intimate: adj. 亲密的; 密切的. v.透露; (间接)表⽰, 暗⽰.e.g. The above problems are intimately linked to machine learning on graphs.7. akin: adj. 类似的, 同族的, 相似的.e.g. Akin to GNN, in LOCAL a graph plays a double role: ...8. abundant: adj. ⼤量的, 丰盛的, 充裕的.9. prone: adj. 有做(坏事)的倾向; 易于遭受…的; 俯卧的.e.g. It is thus prone to oversmoothing when convolutions are applied repeatedly.10.concrete: adj. 混凝⼟制的; 确实的, 具体的(⽽⾮想象或猜测的); 有形的; 实在的.e.g. ... as a concrete example ...e.g. More concretely, HGCN applies the Euclidean non-linear activation in...11. plausible: adj. 有道理的; 可信的; 巧⾔令⾊的, 花⾔巧语的.e.g. ... this interpretation may be a plausible explanation of the success of the recently introduced methods.12. ubiquitous: adj. 似乎⽆所不在的;⼗分普遍的.e.g. While these higher-order interac- tions are ubiquitous, an evaluation of the basic properties and organizational principles in such systems is missing.13. disparate: adj. 由不同的⼈(或事物)组成的;迥然不同的;⽆法⽐较的.e.g. These seemingly disparate types of data have something in common: ...14. profound: adj. 巨⼤的; 深切的, 深远的; 知识渊博的; 理解深刻的;深邃的, 艰深的; ⽞奥的.e.g. This has profound consequences for network models of relational data — a cornerstone in the interdisciplinary study of complex systems.15. blurry: adj. 模糊不清的.e.g. When applying these estimators to solve (2), the line between the critic and the encoders $g_1, g_2$ can be blurry.16. amenable: adj. 顺从的; 顺服的; 可⽤某种⽅式处理的.e.g. Ou et al. utilize sparse generalized SVD to generate a graph embedding, HOPE, from a similarity matrix amenableto de- composition into two sparse proximity matrices.17. elaborate: adj. 复杂的;详尽的;精⼼制作的 v.详尽阐述;详细描述;详细制订;精⼼制作e.g. Topic Modeling for Graphs also requires elaborate effort, as graphs are relational while documents are indepen- dent samples.18. pivotal: adj. 关键性的;核⼼的e.g. To ensure the stabilities of complex systems is of pivotal significance toward reliable and better service providing.19. eminent: adj. 卓越的,著名的,显赫的;⾮凡的;杰出的e.g. To circumvent those defects, theoretical studies eminently represented by percolation theories appeared.20. indispensable: adj. 不可或缺的;必不可少的 n. 不可缺少的⼈或物e.g. However, little attention is paid to multipartite networks, which are an indispensable part of complex networks.21. post-hoc: adj. 事后的e.g. Post-hoc explainability typically considers the question “Why the GNN predictor made certain prediction?”.22. prevalent: adj. 流⾏的;盛⾏的;普遍存在的e.g. A prevalent solution is building an explainer model to conduct feature attribution23. salient: adj. 最重要的;显著的;突出的. n. 凸⾓;[建]突出部;<军>进攻或防卫阵地的突出部分e.g. It decomposes the prediction into the contributions of the input features, which redistributes the probability of features according to their importance and sample the salient features as an explanatory subgraph.24. rigorous: adj. 严格缜密的;严格的;谨慎的;细致的;彻底的;严厉的e.g. To inspect the OOD effect rigorously, we take a causal look at the evaluation process with a Structural Causal Model.25. substantial: adj. ⼤量的;价值巨⼤的;重⼤的;⼤⽽坚固的;结实的;牢固的. substantially: adv. ⾮常;⼤⼤地;基本上;⼤体上;总的来说26. cogent: adj. 有说服⼒的;令⼈信服的e.g. The explanatory subgraph $G_s$ emphasizes tokens like “weak” and relations like “n’t→funny”, which is cogent according to human knowledge.27. succinct: adj. 简练的;简洁的 succinctly: adv. 简⽽⾔之,简明扼要地28. concrete: adj. 混凝⼟制的;确实的,具体的(⽽⾮想象或猜测的);有形的;实在的 concretely: adv. 具体地;具体;具体的;有形地29. predominant:adj. 主要的;主导的;显著的;明显的;盛⾏的;占优势的动词1. mitigate: v. 减轻, 缓和. (反 enforce)e.g. In this work, we focus on mitigating this problem for a certain class of symbolic data.2. corroborate: v. [VN] [often passive] (formal) 证实, 确证.e.g. This is corroborated by our experiments on real-world graph.3. endeavor: n./v. 努⼒, 尽⼒, 企图, 试图.e.g. It encourages us to continue the endeavor in applying principles mathematics and theory in successful deployment of deep learning.4. augment: v. 增加, 提⾼, 扩⼤. n. 增加, 补充物.e.g. We also augment the graph with geographic information (longitude, latitude and altitude), and GDP of the country where the airport belongs to.5. constitute: v. (被认为或看做)是, 被算作; 组成, 构成; (合法或正式地)成⽴, 设⽴.6. abide: v. 接受, 遵照(规则, 决定, 劝告); 逗留, 停留.e.g. Training a graph classifier entails identifying what constitutes a class, i.e., finding properties shared by graphs in one class but not the other, and then deciding whether new graphs abide to said learned properties.7. entail: v. 牵涉; 需要; 使必要. to involve sth that cannot be avoided.e.g. Due to the recursive definition of the Chebyshev polynomials, the computation of the filter $g_α(\Delta)f$ entails applying the Laplacian $r$ times, resulting cal operator affecting only 1-hop neighbors of a vertex and in $O(rn)$ operations.8. encompass: v. 包含, 包括, 涉及(⼤量事物); 包围, 围绕, 围住.e.g. This model is chosen as it is sufficiently general to encompass several state-of-the-art networks.e.g. The k-cycle detection problem entails determining if G contains a k-cycle.9. reveal: v. 揭⽰, 显⽰, 透露, 显出, 露出, 展⽰.10. bestow: v. 将(…)给予, 授予, 献给.e.g. Aiming to bestow GCNs with theoretical guarantees, one promising research direction is to study graph scattering transforms (GSTs).11. alleviate: v. 减轻, 缓和, 缓解.12. investigate: v. 侦查(某事), 调查(某⼈), 研究, 调查.e.g. The sensitivity of pGST to random and localized noise is also investigated.13. fuse: v. (使)融合, 熔接, 结合; (使)熔化, (使保险丝熔断⽽)停⽌⼯作.e.g. We then fuse the topological embeddings with the initial node features into the initial query representations using a query network$f_q$ implemented as a two-layer feed-forward neural network.14. magnify: v. 放⼤, 扩⼤; 增强; 夸⼤(重要性或严重性); 夸张.e.g. ..., adding more layers also leads to more parameters which magnify the potential of overfitting.15. circumvent: v. 设法回避, 规避; 绕过, 绕⾏.e.g. To circumvent the issue and fulfill both goals simultaneously, we can add a negative term...16. excel: v. 擅长, 善于; 突出; 胜过平时.e.g. Nevertheless, these methods have been repeatedly shown to excel in practice.17. exploit: v. 利⽤(…为⾃⼰谋利); 剥削, 压榨; 运⽤, 利⽤; 发挥.e.g. In time series and high-dimensional modeling, approaches that use next step prediction exploit the local smoothness of the signal.18. regulate: v. (⽤规则条例)约束, 控制, 管理; 调节, 控制(速度、压⼒、温度等).e.g. ... where $b >0$ is a parameter regulating the probability of this event.19. necessitate: v. 使成为必要.e.g. Combinatorial models reproduce many-body interactions, which appear in many systems and necessitate higher-order models that capture information beyond pairwise interactions.20. portray:描绘, 描画, 描写; 将…描写成; 给⼈以某种印象; 表现; 扮演(某⾓⾊).e.g. Considering pairwise interactions, a standard network model would portray the link topology of the underlying system as shown in Fig. 2b.21. warrant: v. 使有必要; 使正当; 使恰当. n. 执⾏令; 授权令; (接受款项、服务等的)凭单, 许可证; (做某事的)正当理由, 依据.e.g. Besides statistical methods that can be used to detect correlations that warrant higher-order models, ... (除了可以⽤来检测⽀持⾼阶模型的相关性的统计⽅法外, ...)22. justify: v. 证明…正确(或正当、有理); 对…作出解释; 为…辩解(或辩护); 调整使全⾏排满; 使每⾏排齐.e.g. ..., they also come with the assumption of transitive, Markovian paths, which is not justified in many real systems.23. hinder:v. 阻碍; 妨碍; 阻挡. (反 foster: v. 促进; 助长; 培养; ⿎励; 代养, 抚育, 照料(他⼈⼦⼥⼀段时间))e.g. The eigenvalues and eigenvectors of these matrix operators capture how the topology of a system influences the efficiency of diffusion and propagation processes, whether it enforces or mitigates the stability of dynamical systems, or if it hinders or fosters collective dynamics.24. instantiate:v. 例⽰;⽤具体例⼦说明.e.g. To learn the representation we instantiate (2) and split each input MNIST image into two parts ...25. favor:v. 赞同;喜爱, 偏爱; 有利于, 便于. n. 喜爱, 宠爱, 好感, 赞同; 偏袒, 偏爱; 善⾏, 恩惠.26. attenuate: v. 使减弱; 使降低效⼒.e.g. It therefore seems that the bounds we consider favor hard-to-invert encoders, which heavily attenuate part of the noise, over well conditioned encoders.27. elucidate:v. 阐明; 解释; 说明.e.g. Secondly, it elucidates the importance of appropriately choosing the negative samples, which is indeed a critical component in deep metric learning based on triplet losses.28. violate: v. 违反, 违犯, 违背(法律、协议等); 侵犯(隐私等); 使⼈不得安宁; 搅扰; 亵渎, 污损(神圣之地).e.g. Negative samples are obtained by patches from different images as well as patches from the same image, violating the independence assumption.29. compel:v. 强迫, 迫使; 使必须; 引起(反应).30. gauge: v. 判定, 判断(尤指⼈的感情或态度); (⽤仪器)测量, 估计, 估算. n. 测量仪器(或仪表);计量器;宽度;厚度;(枪管的)⼝径e.g. Yet this hyperparameter-tuned approach raises a cubic worst-case space complexity and compels the user to traverse several feature sets and gauge the one that attains the best performance in the downstream task.31. depict: v. 描绘, 描画; 描写, 描述; 刻画.e.g. As they depict different aspects of a node, it would take elaborate designs of graph convolutions such that each set of features would act as a complement to the other.32. sketch: n. 素描;速写;草图;幽默短剧;⼩品;简报;概述 v. 画素描;画速写;概述;简述e.g. Next we sketch how to apply these insights to learning topic models.33. underscore:v. 在…下⾯划线;强调;着重说明 n.下划线e.g. Moreover, the walk-topic distributions generated by Graph Anchor LDA are indeed sharper than those by ordinary LDA, underscoring the need for selecting anchors.34. disclose: v. 揭露;透露;泄露;使显露;使暴露e.g. Another drawback lies in their unexplainable nature, i.e., they cannot disclose the sciences beneath network dynamics.35. coincide: v. 同时发⽣;相同;相符;极为类似;相接;相交;同位;位置重合;重叠e.g. The simulation results coincide quite well with the theoretical results.36. inspect: v. 检查;查看;审视;视察 to look closely at sth/sb, especially to check that everything is as it should be名词1. capacity: n. 容量, 容积, 容纳能⼒; 领悟(或理解、办事)能⼒; 职位, 职责.e.g. This paper studies theoretically the computational capacity limits of graph neural networks (GNN) falling within the message-passing framework of Gilmer et al. (2017).2. implication: n. 可能的影响(或作⽤、结果); 含意, 暗指; (被)牵连, 牵涉.e.g. Section 4 analyses the implications of restricting the depth $d$ and width $w$ of GNN that do not use a readout function.3. trade-off:(在需要⽽⼜相互对⽴的两者间的)权衡, 协调.e.g. This reveals a direct trade-off between the depth and width of a graph neural network.4. cornerstone:n. 基⽯; 最重要部分; 基础; 柱⽯.5. umbrella: n. 伞; 综合体; 总体, 整体; 保护, 庇护(体系).e.g. Community detection is an umbrella term for a large number of algorithms that group nodes into distinct modules to simplify and highlight essential structures in the network topology.6. folklore:n. 民间传统, 民俗; 民间传说.e.g. It is folklore knowledge that maximizing MI does not necessarily lead to useful representations.7. impediment:n. 妨碍,阻碍,障碍; ⼝吃.e.g. While a recent approach overcomes this impediment, it results in poor quality in prediction tasks due to its linear nature.8. obstacle:n. 障碍;阻碍; 绊脚⽯; 障碍物; 障碍栅栏.e.g. However, several major obstacles stand in our path towards leveraging topic modeling of structural patterns to enhance GCNs.9. vicinity:n. 周围地区; 邻近地区; 附近.e.g. The traits with which they engage are those that are performed in their vicinity.10. demerit: n. 过失,缺点,短处; (学校给学⽣记的)过失分e.g. However, their principal demerit is that their implementations are time-consuming when the studied network is large in size. Another介/副/连词1. notwithstanding:prep. 虽然;尽管 adv. 尽管如此.e.g. Notwithstanding this fundamental problem, the negative sampling strategy is often treated as a design choice.2. albeit: conj. 尽管;虽然e.g. Such methods rely on an implicit, albeit rigid, notion of node neighborhood; yet this one-size-fits-all approach cannot grapple with the diversity of real-world networks and applications.3. Hitherto:adv. 迄今;直到某时e.g. Hitherto, tremendous endeavors have been made by researchers to gauge the robustness of complex networks in face of perturbations.短语1.in a nutshell: 概括地说, 简⾔之, ⼀⾔以蔽之.e.g. In a nutshell, GNN are shown to be universal if four strong conditions are met: ...2. counter-intuitively: 反直觉地.3. on-the-fly:动态的(地), 运⾏中的(地).4. shed light on/into:揭⽰, 揭露; 阐明; 解释; 将…弄明⽩; 照亮.e.g. These contemporary works shed light into the stability and generalization capabilities of GCNs.e.g. Discovering roles and communities in networks can shed light on numerous graph mining tasks such as ...5. boil down to: 重点是; 将…归结为.e.g. These aforementioned works usually boil down to a general classification task, where the model is learnt on a training set and selected by checking a validation set.6. for the sake of:为了.e.g. The local structures anchored around each node as well as the attributes of nodes therein are jointly encoded with graph convolution for the sake of high-level feature extraction.7. dates back to:追溯到.e.g. The usual problem setup dates back at least to Becker and Hinton (1992).8. carry out:实施, 执⾏, 实⾏.e.g. We carry out extensive ablation studies and sensi- tivity analysis to show the effectiveness of the proposed functional time encoding and TGAT-layer.9. lay beyond the reach of:...能⼒达不到e.g. They provide us with information on higher-order dependencies between the components of a system, which lay beyond the reach of models that exclusively capture pairwise links.10. account for: ( 数量或⽐例上)占; 导致, 解释(某种事实或情况); 解释, 说明(某事); (某⼈)对(⾏动、政策等)负有责任; 将(钱款)列⼊(预算).e.g. Multilayer models account for the fact that many real complex systems exhibit multiple types of interactions.11. along with: 除某物以外; 随同…⼀起, 跟…⼀起.e.g. Along with giving us the ability to reason about topological features including community structures or node centralities, network science enables us to understand how the topology of a system influences dynamical processes, and thus its function.12. dates back to:可追溯到.e.g. The usual problem setup dates back at least to Becker and Hinton (1992) and can conceptually be described as follows: ...13. to this end:为此⽬的;为此计;为了达到这个⽬标.e.g. To this end, we consider a simple setup of learning a representation of the top half of MNIST handwritten digit images.14. Unless stated otherwise:除⾮另有说明.e.g. Unless stated otherwise, we use a bilinear critic $f(x, y) = x^TWy$, set the batch size to $128$ and the learning rate to $10^{−4}$.15. As a reference point:作为参照.e.g. As a reference point, the linear classification accuracy from pixels drops to about 84% due to the added noise.16. through the lens of:透过镜头. (以...视⾓)e.g. There are (at least) two immediate benefits of viewing recent representation learning methods based on MI estimators through the lens of metric learning.17. in accordance with:符合;依照;和…⼀致.e.g. The metric learning view seems hence in better accordance with the observations from Section 3.2 than the MI view.It can be shown that the anchors selected by our Graph Anchor LDA are not only indicative of “topics” but are also in accordance with the actual graph structures.18. be akin to:近似, 类似, 类似于.e.g. Thus, our learning model is akin to complex contagion dynamics.19. to name a few:仅举⼏例;举⼏个来说.e.g. Multitasking, multidisciplinary work and multi-authored works, to name a few, are ingrained in the fabric of science culture and certainly multi-multi is expected in order to succeed and move up the scientific ranks.20. a handful of:⼀把;⼀⼩撮;少数e.g. A handful of empirical work has investigated the robustness of complex networks at the community level.21. wreak havoc: 破坏;肆虐;严重破坏;造成破坏;浩劫e.g. Failures on one network could elicit failures on its coupled networks, i.e., networks with which the focal network interacts, and eventually those failures would wreak havoc on the entire network.22. apart from: 除了e.g. We further posit that apart from node $a$ node $b$ has $k$ neighboring nodes.。
基于随机投影与加权稀疏表示残差的光照鲁棒人脸识别方法李燕;章玥【期刊名称】《计算机工程与科学》【年(卷),期】2018(040)011【摘要】针对人脸识别中的光照变化问题,利用随机投影对传统稀疏表示分类器进行改进,提出一种基于随机投影与加权稀疏表示残差的光照鲁棒人脸识别方法.通过对人脸图像进行光照规范化处理,尽量消除人脸图像上的恶劣光照,取得经光照校正的人脸样本后进行多次随机空间投影,进一步丰富样本的光照不变特征,以减小光照变化对人脸识别带来的影响.在此基础上,对利用单一残差分类的传统稀疏表示分类方法进行改进,样本经过多次随机投影和稀疏表示会产生多个样本特征和重构残差,利用样本特征的能量来确定各个重构残差的融合权值,最终得到一种稳定性和可靠性更强的加权残差.在Yale B和CMU PIE两个光照变化较大的人脸库上的实验结果表明,改进的方法具有较强的光照鲁棒性.与传统稀疏表示方法相比,本文提出的方法在Yale B人脸库上两组实验的平均识别率分别提高了25.76%和46.39%,在CMU PIE上的平均识别率提高了10%左右.【总页数】8页(P2015-2022)【作者】李燕;章玥【作者单位】华东师范大学教育部软硬件协同设计技术与应用工程研究中心,上海200062;华东师范大学教育部软硬件协同设计技术与应用工程研究中心,上海200062【正文语种】中文【中图分类】TP391.4【相关文献】1.组合范数正则化稀疏编码和自适应加权残差的鲁棒跟踪 [J], 孔军;成静;蒋敏;柳晨华;顾晓峰2.基于自步学习的加权稀疏表示人脸识别方法 [J], 王学军;王文剑;曹飞龙3.基于加权分块稀疏表示的光照鲁棒性人脸识别 [J], 范守科;朱明4.改进的基于残差加权的稀疏表示人脸识别 [J], 高志荣;熊承义;笪邦友5.基于多方向Gabor特征图稀疏表示的鲁棒人脸识别方法 [J], 徐望明;张培;伍世虔因版权原因,仅展示原文概要,查看原文内容请购买。
A phase transition for the diameter of the configuration modelRemco van der Hofstad∗Gerard Hooghiemstra†and Dmitri Znamenski‡August31,2007AbstractIn this paper,we study the configuration model(CM)with i.i.d.degrees.We establisha phase transition for the diameter when the power-law exponentτof the degrees satisfiesτ∈(2,3).Indeed,we show that forτ>2and when vertices with degree2are present withpositive probability,the diameter of the random graph is,with high probability,bounded frombelow by a constant times the logarithm of the size of the graph.On the other hand,assumingthat all degrees are at least3or more,we show that,forτ∈(2,3),the diameter of the graphis,with high probability,bounded from above by a constant times the log log of the size of thegraph.1IntroductionRandom graph models for complex networks have received a tremendous amount of attention in the past decade.See[1,22,26]for reviews on complex networks and[2]for a more expository account.Measurements have shown that many real networks share two fundamental properties. Thefirst is the fact that typical distances between vertices are small,which is called the‘small world’phenomenon(see[27]).For example,in the Internet,IP-packets cannot use more than a threshold of physical links,and if the distances in terms of the physical links would be large,e-mail service would simply break down.Thus,the graph of the Internet has evolved in such a way that typical distances are relatively small,even though the Internet is rather large.The second and maybe more surprising property of many networks is that the number of vertices with degree k falls offas an inverse power of k.This is called a‘power law degree sequence’,and resulting graphs often go under the name‘scale-free graphs’(see[15]for a discussion where power laws occur in the Internet).The observation that many real networks have the above two properties has incited a burst of activity in network modelling using random graphs.These models can,roughly speaking,be divided into two distinct classes of models:‘static’models and’dynamic’models.In static models, we model with a graph of a given size a snap-shot of a real network.A typical example of this kind of model is the configuration model(CM)which we describe below.A related static model, which can be seen as an inhomogeneous version of the Erd˝o s-R´e nyi random graph,is treated in great generality in[4].Typical examples of the‘dynamical’models,are the so-called preferential attachment models(PAM’s),where added vertices and edges are more likely to be attached to vertices that already have large degrees.PAM’s often focus on the growth of the network as a way to explain the power law degree sequences.∗Department of Mathematics and Computer Science,Eindhoven University of Technology,P.O.Box513,5600 MB Eindhoven,The Netherlands.E-mail:rhofstad@win.tue.nl†Delft University of Technology,Electrical Engineering,Mathematics and Computer Science,P.O.Box5031,2600 GA Delft,The Netherlands.E-mail:G.Hooghiemstra@ewi.tudelft.nl‡EURANDOM,P.O.Box513,5600MB Eindhoven,The Netherlands.E-mail:znamenski@eurandom.nlPhysicists have predicted that distances in PAM’s behave similarly to distances in the CM with similar degrees.Distances in the CM have attracted considerable attention(see e.g.,[14,16,17,18]), but distances in PAM’s far less(see[5,19]),which makes it hard to verify this prediction.Together with[19],the current paper takes afirst step towards a rigorous verification of this conjecture.At the end of this introduction we will return to this observation,but let usfirst introduce the CM and present our diameter results.1.1The configuration modelThe CM is defined as follows.Fix an integer N.Consider an i.i.d.sequence of random variables D1,D2,...,DN.We will construct an undirected graph with N vertices where vertex j has degreeD j.We will assume that LN =Nj=1D j is even.If LNis odd,then we will increase DNby1.Thissingle change will make hardly any difference in what follows,and we will ignore this effect.We will later specify the distribution of D1.To construct the graph,we have N separate vertices and incident to vertex j,we have D j stubs or half-edges.The stubs need to be paired to construct the graph.We number the stubs in a givenorder from1to LN .We start by pairing at random thefirst stub with one of the LN−1remainingstubs.Once paired,two stubs form a single edge of the graph.Hence,a stub can be seen as the left-or the right-half of an edge.We continue the procedure of randomly choosing and pairing the stubs until all stubs are connected.Unfortunately,vertices having self-loops,as well as multiple edges between vertices,may occur,so that the CM is a multigraph.However,self-loops are scarce when N→∞,as shown e.g.in[7].The above model is a variant of the configuration model[3],which,given a degree sequence,is the random graph with that given degree sequence.The degree sequence of a graph is the vector of which the k th coordinate equals the fraction of vertices with degree k.In our model,by the law of large numbers,the degree sequence is close to the distribution of the nodal degree D of which D1,...,DNare i.i.d.copies.The probability mass function and the distribution function of the nodal degree law are denoted byP(D=k)=f k,k=1,2,...,and F(x)= xk=1f k,(1.1)where x is the largest integer smaller than or equal to x.We pay special attention to distributions of the form1−F(x)=x1−τL(x),(1.2) whereτ>2and L is slowly varying at infinity.This means that the random variables D j obey a power law,and the factor L is meant to generalize the model.We denote the expectation of D byµ,i.e.,µ=∞k=1kf k.(1.3)1.2The diameter in the configuration modelIn this section we present the results on the bounds on the diameter.We use the abbreviation whp for a statement that occurs with probability tending to1if the number of vertices of the graph N tends to∞.Theorem1.1(Lower bound on diameter)Forτ>2,assuming that f1+f2>0and f1<1, there exists a positive constantαsuch that whp the diameter of the configuration model is bounded below byαlog N.A more precise result on the diameter in the CM is presented in[16],where it is proved that under rather general assumptions on the degree sequence of the CM,the diameter of the CM divided by log N converges to a constant.This result is also valid for related models,such as the Erd˝o s-R´e nyi random graph,but the proof is quite difficult.While Theorem1.1is substantially weaker,the fact that a positive constant times log N appears is most interesting,as we will discuss now in more detail.Indeed,the result in Theorem1.1is most interesting in the case whenτ∈(2,3).By[18, Theorem1.2],the typical distance forτ∈(2,3)is proportional to log log N,whereas we show here that the diameter is bounded below by a positive constant times log N when f1+f2>0and f1<1. Therefore,we see that the average distance and the diameter are of a different order of magnitude. The pairs of vertices where the distance is of the order log N are thus scarce.The proof of Theorem 1.1reveals that these pairs are along long lines of vertices with degree2that are connected to each other.Also in the proof of[16],one of the main difficulties is the identification of the precise length of these long thin lines.Our second main result states that whenτ∈(2,3),the above assumption that f1+f2>0is necessary and sufficient for log N lower bounds on the diameter.In Theorem1.2below,we assume that there exists aτ∈(2,3)such that,for some c>0and all x≥1,1−F(x)≥cx1−τ,(1.4) which is slightly weaker than the assumption in(1.2).We further define for integer m≥2and a real numberσ>1,CF =CF(σ,m)=2|log(τ−2)|+2σlog m.(1.5)Then our main upper bound on the diameter when(1.4)holds is as follows:Theorem1.2(A log log upper bound on the diameter)Fix m≥2,and assume that P(D≥m+1)=1,and that(1.4)holds.Then,for everyσ>(3−τ)−1,the diameter of the configuration model is,whp,bounded above by CFlog log N.1.3Discussion and related workTheorem1.2has a counterpart for preferential attachment models(PAM)proved in[19].In these PAM’s,at each integer time t,a new vertex with m≥1edges attached to it,is added to the graph. The new edges added at time t are then preferentially connected to older edges,i.e.,conditionally on the graph at time t−1,which is denoted by G(t−1),the probability that a given edge is connected to vertex i is proportional to d i(t−1)+δ,whereδ>−m is afixed parameter and d i(t−1)is the degree of vertex i at time t−1.A substantial literature exists,see e.g.[10],proving that the degree sequence of PAM’s in rather great generality satisfy a power law(see e.g.the references in [11]).In the above setting of linear preferential attachment,the exponentτis equal to[21,11]τ=3+δm.(1.6)A log log t upper bound on the diameter holds for PAM’s with m≥2and−m<δ<0,which, by(1.6),corresponds toτ∈(2,3)[19]:Theorem1.3(A log log upper bound on the diameter of the PAM)Fix m≥2andδ∈(−m,0).Then,for everyσ>13−τ,and withCG (σ)=4|log(τ−2)|+4σlog mthe diameter of the preferential attachment model is,with high probability,bounded above by CGlog log t,as t→∞.Observe that the condition m≥2in the PAM corresponds to the condition P(D≥m+1)=1 in the CM,where one half-edge is used to attach the vertex,while in PAM’s,vertices along a pathhave degree at least three when m≥2.Also note from the definition of CG and CFthat distancesin PAM’s tend to be twice as big compared to distances in the CM.This is related to the structure of the graphs.Indeed,in both graphs,vertices of high degree play a crucial role in shortest paths. In the CM vertices of high degree are often directly connected to each other,while in the PAM, they tend to be connected through a later vertex which links to both vertices of high degree.Unfortunately,there is no log t lower bound in the PAM forδ>0and m≥2,or equivalently τ>3.However,[19]does contain a(1−ε)log t/log log t lower bound for the diameter when m≥1 andδ≥0.When m=1,results exists on log t asymptotics of the diameter,see e.g.[6,24].The results in Theorems1.1–1.3are consistent with the non-rigorous physics predictions that distances in the PAM and in the CM,for similar degree sequences,behave similarly.It is an interesting problem,for both the CM and PAM,to determine the exact constant C≥0such that the diameter of the graph of N vertices divided by log N converges in probability to C.For the CM,the results in[16]imply that C>0,for the PAM,this is not known.We now turn to related work.Many distance results for the CM are known.Forτ∈(1,2) distances are bounded[14],forτ∈(2,3),they behave as log log N[25,18,9],whereas forτ>3 the correct scaling is log N[17].Observe that these results induce lower bounds for the diameter of the CM,since the diameter is the supremum of the distance,where the supremum is taken over all pairs of vertices.Similar results for models with conditionally independent edges exist,see e.g. [4,8,13,23].Thus,for these classes of models,distances are quite well understood.The authors in [16]prove that the diameter of a sparse random graph,with specified degree sequence,has,whp, diameter equal to c log N(1+o(1)),for some constant c.Note that our Theorems1.1–1.2imply that c>0when f1+f2>0,while c=0when f1+f2=0and(1.4)holds for someτ∈(2,3).There are few results on distances or diameter in PAM’s.In[5],it was proved that in the PAMand forδ=0,for whichτ=3,the diameter of the resulting graph is equal to log tlog log t (1+o(1)).Unfortunately,the matching result for the CM has not been proved,so that this does not allow us to verify whether the models have similar distances.This paper is organized as follows.In Section2,we prove the lower bound on the diameter formulated in Theorem1.1and in Section3we prove the upper bound in Theorem1.2.2A lower bound on the diameter:Proof of Theorem1.1We start by proving the claim when f2>0.The idea behind the proof is simple.Under the conditions of the theorem,one can,whp,find a pathΓ(N)in the random graph such that this path consists exclusively of vertices with degree2and has length at least2αlog N.This implies that the diameter is at leastαlog N,since the above path could be a cycle.Below we define a procedure which proves the existence of such a path.Consider the process of pairing stubs in the graph.We are free to choose the order in which we pair the free stubs,since this order is irrelevant for the distribution of the random graph.Hence,we are allowed to start with pairing the stubs of the vertices of degree2.Let N(2)be the number of vertices of degree2and SN(2)=(i1,...,i N(2))∈N N(2)the collection of these vertices.We will pair the stubs and at the same time define a permutationΠ(N)=(i∗1, (i)N(2))of SN(2),and a characteristicχ(N)=(χ1,...,χN(2))onΠ(N),whereχj is either0or1.Π(N)andχ(N)will be defined inductively in such a way that for any vertex i∗k ∈Π(N),χk=1,if and only if vertex i∗k is connected to vertex i∗k+1.Hence,χ(N)contains a substring ofat least2αlog N ones precisely when the random graph contains a pathΓ(N)of length at least 2αlog N.We initialize our inductive definition by i∗1=i1.The vertex i∗1has two stubs,we consider the second one and pair it to an arbitrary free stub.If this free stub belongs to another vertex j=i∗1 in SN(2)then we choose i∗2=j andχ1=1,otherwise we choose i∗2=i2,andχ1=0.Suppose forsome1<k≤N(2),the sequences(i∗1, (i)k )and(χ1,...,χk−1)are defined.Ifχk−1=1,then onestub of i∗k is paired to a stub of i∗k−1,and another stub of i∗kis free,else,ifχk−1=0,vertex i∗khastwo free stubs.Thus,for every k≥1,the vertex i∗k has at least one free stub.We pair this stubto an arbitrary remaining free stub.If this second stub belongs to vertex j∈SN (2)\{i∗1, (i)k},then we choose i∗k+1=j andχk=1,else we choose i∗k+1as thefirst stub in SN(2)\{i∗1, (i)k},andχk=0.Hence,we have defined thatχk=1precisely when vertex i∗k is connected to vertexi∗k+1.We show that whp there exists a substring of ones of length at least2αlog N in thefirsthalf ofχN ,i.e.,inχ12(N)=(χi∗1,...,χi∗N(2)/2).For this purpose,we couple the sequenceχ12(N)with a sequence B12(N)={ξk},whereξk are i.i.d.Bernoulli random variables taking value1withprobability f2/(4µ),and such that,whp,χi∗k ≥ξk for all k∈{1,..., N(2)/2 }.We write PNforthe law of the CM conditionally on the degrees D1,...,DN.Then,for any1≤k≤ N(2)/2 ,thePN-probability thatχk=1is at least2N(2)−CN(k)LN −CN(k),(2.1)where,as before,N(2)is the total number of vertices with degree2,and CN(k)is one plus thetotal number of paired stubs after k−1pairings.By definition of CN(k),for any k≤N(2)/2,we haveCN(k)=2(k−1)+1≤N(2).(2.2) Due to the law of large numbers we also have that whpN(2)≥f2N/2,LN≤2µN.(2.3) Substitution of(2.2)and(2.3)into(2.1)then yields that the right side of(2.1)is at leastN(2) LN ≥f24µ.Thus,whp,we can stochastically dominate all coordinates of the random sequenceχ12(N)with ani.i.d.Bernoulli sequence B12(N)of Nf2/2independent trials with success probability f2/(4µ)>0.It is well known(see e.g.[12])that the probability of existence of a run of2αlog N ones convergesto one whenever2αlog N≤log(Nf2/2) |log(f2/(4µ))|,for some0< <1.We conclude that whp the sequence B1(N)contains a substring of2αlog N ones.Since whpχN ≥B12(N),where the ordering is componentwise,whp the sequenceχNalso contains the samesubstring of2αlog N ones,and hence there exists a required path consisting of at least2αlog N vertices with degree2.Thus,whp the diameter is at leastαlog N,and we have proved Theorem 1.1in the case that f2>0.We now complete the proof of Theorem1.1when f2=0by adapting the above argument. When f2=0,and since f1+f2>0,we must have that f1>0.Let k∗>2be the smallest integer such that f k∗>0.This k∗must exist,since f1<1.Denote by N∗(2)the total number of vertices of degree k∗of which itsfirst k∗−2stubs are connected to a vertex with degree1.Thus,effectively, after thefirst k∗−2stubs have been connected to vertices with degree1,we are left with a structure which has2free stubs.These vertices will replace the N(2)vertices used in the above proof.It is not hard to see that whp N∗(2)≥f∗2N/2for some f∗2>0.Then,the argument for f2>0can be repeated,replacing N(2)by N∗(2)and f2by f∗2.In more detail,for any1≤k≤ N∗(2)/(2k∗) , the PN-probability thatχk=1is at least2N∗(2)−C∗N(k)LN −CN(k),(2.4)where C∗N(k)is the total number of paired stubs after k−1pairings of the free stubs incident to the N∗(2)vertices.By definition of C∗N(k),for any k≤N∗(2)/(2k∗),we haveCN(k)=2k∗(k−1)+1≤N∗(2).(2.5)Substitution of(2.5),N∗(2)≥f∗2N/2and the bound on LNin(2.3)into(2.4)gives us that the right side of(2.4)is at leastN∗(2) LN ≥f∗24µ.Now the proof of Theorem1.1in the case where f2=0and f1∈(0,1)can be completed as above.We omit further details. 3A log log upper bound on the diameter forτ∈(2,3)In this section,we investigate the diameter of the CM when P(D≥m+1)=1,for some integer m≥2.We assume(1.4)for someτ∈(2,3).We will show that under these assumptions CFlog log N isan upper bound on the diameter of the CM,where CFis defined in(1.5).The proof is divided into two key steps.In thefirst,in Proposition3.1,we give a bound on the diameter of the core of the CM consisting of all vertices with degree at least a certain power of log N.This argument is very close in spirit to the one in[25],the only difference being that we have simplified the argument slightly.After this,in Proposition3.4,we derive a bound on the distance between vertices with small degree and the core.We note that Proposition3.1only relies on the assumption in(1.4),while Proposition3.4only relies on the fact that P(D≥m+1)=1,for some m≥2.The proof of Proposition3.1can easily be adapted to a setting where the degrees arefixed, by formulating the appropriate assumptions on the number of vertices with degree at least x for a sufficient range of x.This assumption would replace(1.5).Proposition3.4can easily be adapted to a setting where there are no vertices of degree smaller than or equal to m.This assumption would replace the assumption P(D≥m+1)=1,for some m≥2.We refrain from stating these extensions of our results,and start by investigating the core of the CM.We takeσ>13−τand define the core CoreNof the CM to beCoreN={i:D i≥(log N)σ},(3.1)i.e.,the set of vertices with degree at least(log N)σ.Also,for a subset A⊆{1,...,N},we define the diameter of A to be equal to the maximal shortest path distance between any pair of vertices of A.Note,in particular,that if there are pairs of vertices in A that are not connected,then the diameter of A is infinite.Then,the diameter of the core is bounded in the following proposition:Proposition3.1(Diameter of the core)For everyσ>13−τ,the diameter of CoreNis,whp,bounded above by2log log N|log(τ−2)|(1+o(1)).(3.2)Proof.We note that(1.4)implies that whp the largest degree D(N)=max1≤i≤N D i satisfiesD(N)≥u1,where u1=N1τ−1(log N)−1,(3.3) because,when N→∞,P(D(N)>u1)=1−P(D(N)≤u1)=1−(F(u1))N≥1−(1−cu1−τ1)N=1−1−c(log N)τ−1NN∼1−exp(−c(log N)τ−1)→1.(3.4)DefineN (1)={i :D i ≥u 1},(3.5)so that,whp ,N (1)=∅.For some constant C >0,which will be specified later,and k ≥2we define recursively u k =C log N u k −1 τ−2,and N (k )={i :D i ≥u k }.(3.6)We start by identifying u k :Lemma 3.2(Identification of u k )For each k ∈N ,u k =C a k (log N )b k N c k ,(3.7)with c k =(τ−2)k −1τ−1,b k =13−τ−4−τ3−τ(τ−2)k −1,a k =1−(τ−2)k −13−τ.(3.8)Proof .We will identify a k ,b k and c k recursively.We note that,by (3.3),c 1=1τ−1,b 1=−1,a 1=0.By (3.6),we can,for k ≥2,relate a k ,b k ,c k to a k −1,b k −1,c k −1as follows:c k =(τ−2)c k −1,b k =1+(τ−2)b k −1,a k =1+(τ−2)a k −1.(3.9)As a result,we obtain c k =(τ−2)k −1c 1=(τ−2)k −1τ−1,(3.10)b k =b 1(τ−2)k −1+k −2 i =0(τ−2)i =1−(τ−2)k −13−τ−(τ−2)k −1,(3.11)a k =1−(τ−2)k −13−τ.(3.12)The key step in the proof of Proposition 3.1is the following lemma:Lemma 3.3(Connectivity between N (k −1)and N (k ))Fix k ≥2,and C >4µ/c (see (1.3),and (1.4)respectively).Then,the probability that there exists an i ∈N (k )that is not directly connected to N (k −1)is o (N −γ),for some γ>0independent of k .Proof .We note that,by definition,i ∈N (k −1)D i ≥u k −1|N (k −1)|.(3.13)Also,|N (k −1)|∼Bin N,1−F (u k −1) ,(3.14)and we have that,by (1.4),N [1−F (u k −1)]≥cN (u k −1)1−τ,(3.15)which,by Lemma 3.2,grows as a positive power of N ,since c k ≤c 2=τ−2τ−1<1τ−1.We use a concentration of probability resultP (|X −E [X ]|>t )≤2e −t 22(E [X ]+t/3),(3.16)which holds for binomial random variables[20],and gives that that the probability that|N(k−1)|is bounded below by N[1−F(u k−1)]/2is exponentially small in N.As a result,we obtain that for every k,and whpi∈N(k)D i≥c2N(u k)2−τ.(3.17)We note(see e.g.,[18,(4.34)]that for any two sets of vertices A,B,we have thatPN (A not directly connected to B)≤e−D A D BL N,(3.18)where,for any A⊆{1,...,N},we writeD A=i∈AD i.(3.19)On the event where|N(k−1)|≥N[1−F(u k−1)]/2and where LN≤2µN,we then obtain by(3.18),and Boole’s inequality that the PN-probability that there exists an i∈N(k)such that i is not directly connected to N(k−1)is bounded byNe−u k Nu k−1[1−F(u k−1)]2L N≤Ne−cu k(u k−1)2−τ4µ=N1−cC4µ,(3.20)where we have used(3.6).Taking C>4µ/c proves the claim. We now complete the proof of Proposition3.1.Fixk∗= 2log log N|log(τ−2)|.(3.21)As a result of Lemma3.3,we have whp that the diameter of N(k∗)is at most2k∗,because thedistance between any vertex in N(k∗)and the vertex with degree D(N)is at most k∗.Therefore,weare done when we can show thatCoreN⊆N(k∗).(3.22)For this,we note thatN(k∗)={i:D i≥u k∗},(3.23)so that it suffices to prove that u k∗≥(log N)σ,for anyσ>13−τ.According to Lemma3.2,u k∗=C a k∗(log N)b k∗N c k∗.(3.24) Because for x→∞,and2<τ<3,x(τ−2)2log x|log(τ−2)|=x·x−2=o(log x),(3.25) wefind with x=log N thatlog N·(τ−2)2log log N|log(τ−2)|=o(log log N),(3.26) implying that N c k∗=(log N)o(1),(log N)b k∗=(log N)13−τ+o(1),and C a k∗=(log N)o(1).Thus,u k∗=(log N)13−τ+o(1),(3.27)so that,by picking N sufficiently large,we can make13−τ+o(1)≤σ.This completes the proof ofProposition3.1. For an integer m≥2,we defineC(m)=σ/log m.(3.28)Proposition 3.4(Maximal distance between periphery and core)Assume that P (D ≥m +1)=1,for some m ≥2.Then,for every σ>(3−τ)−1the maximal distance between any vertex and the core is,whp ,bounded from above by C (m )log log N .Proof .We start from a vertex i and will show that the probability that the distance between i and Core N is at least C (m )log log N is o (N −1).This proves the claim.For this,we explore the neighborhood of i as follows.From i ,we connect the first m +1stubs (ignoring the other ones).Then,successively,we connect the first m stubs from the closest vertex to i that we have connected to and have not yet been explored.We call the arising process when we have explored up to distance k from the initial vertex i the k -exploration tree .When we never connect two stubs between vertices we have connected to,then the number of vertices we can reach in k steps is precisely equal to (m +1)m k −1.We call an event where a stub on the k -exploration tree connects to a stub incident to a vertex in the k -exploration tree a collision .The number of collisions in the k -exploration tree is the number of cycles or self-loops in it.When k increases,the probability of a collision increases.However,for k of order log log N ,the probability that more than two collisions occur in the k -exploration tree is small,as we will prove now:Lemma 3.5(Not more than one collision)Take k = C (m )log log N .Then,the P N -probab-ility that there exists a vertex of which the k -exploration tree has at least two collisions,before hittingthe core Core N ,is bounded by (log N )d L −2N ,for d =4C (m )log (m +1)+2σ.Proof .For any stub in the k -exploration tree,the probability that it will create a collision beforehitting the core is bounded above by (m +1)m k −1(log N )σL −1N .The probability that two stubs will both create a collision is,by similar arguments,bounded above by (m +1)m k −1(log N )σL −1N2.The total number of possible pairs of stubs in the k -exploration tree is bounded by[(m +1)(1+m +...+m k −1)]2≤[(m +1)m k ]2,so that,by Boole’s inequality,the probability that the k -exploration tree has at least two collisions is bounded by (m +1)m k 4(log N )2σL −2N .(3.29)When k = C (m )log log N ,we have that (m +1)m k 4(log N )2σ≤(log N )d ,where d is defined inthe statement of the lemma. Finally,we show that,for k = C (m )log log N ,the k -exploration tree will,whp connect to the Core N :Lemma 3.6(Connecting exploration tree to core)Take k = C (m )log log N .Then,the probability that there exists an i such that the distance of i to the core is at least k is o (N −1).Proof .Since µ<∞we have that L N /N ∼µ.Then,by Lemma 3.5,the probability that there exists a vertex for which the k -exploration tree has at least 2collisions before hitting the core is o (N −1).When the k -exploration tree from a vertex i does not have two collisions,then there are at least (m −1)m k −1stubs in the k th layer that have not yet been connected.When k = C (m )log log N this number is at least equal to (log N )C (m )log m +o (1).Furthermore,the expected number of stubs incident to the Core N is at least N (log N )σP (D 1≥(log N )σ)so that whp the number of stubs incident to Core N is at least (compare (1.4))12N (log N )σP (D 1≥(log N )σ)≥c 2N (log N )2−τ3−τ.(3.30)By(3.18),the probability that we connect none of the stubs in the k th layer of the k-exploration tree to one of the stubs incident to CoreNis bounded byexp−cN(log N)2−τ3−τ+C(m)log m2LN≤exp−c4µ(log N)2−τ3−τ+σ=o(N−1),(3.31)because whp LN /N≤2µ,and since2−τ3−τ+σ>1.Propositions3.1and3.4prove that whp the diameter of the configuration model is boundedabove by CF log log N,with CFdefined in(1.5).This completes the proof of Theorem1.2.Acknowledgements.The work of RvdH and DZ was supported in part by Netherlands Organ-isation for Scientific Research(NWO).References[1]R.Albert and A.-L.Barab´a si.Statistical mechanics of complex networks.Rev.Mod.Phys.74,47-97,(2002).[2]A.-L.Barab´a si.Linked,The New Science of Networks.Perseus Publishing,Cambridge,Mas-sachusetts,(2002).[3]B.Bollob´a s.Random Graphs,2nd edition,Academic Press,(2001).[4]B.Bollob´a s,S.Janson,and O.Riordan.The phase transition in inhomogeneous randomgraphs.Random Structures and Algorithms31,3-122,(2007).[5]B.Bollob´a s and O.Riordan.The diameter of a scale-free random binatorica,24(1):5–34,(2004).[6]B.Bollob´a s and O.Riordan.Shortest paths and load scaling in scale-free trees.Phys.Rev.E.,69:036114,(2004).[7]T.Britton,M.Deijfen,and A.Martin-L¨o f.Generating simple random graphs with prescribeddegree distribution.J.Stat.Phys.,124(6):1377–1397,(2006).[8]F.Chung and L.Lu.The average distances in random graphs with given expected degrees.A,99(25):15879–15882(electronic),(2002).[9]R.Cohen and S.Havlin.Scale free networks are ultrasmall,Physical Review Letters90,058701,(2003).[10]C.Cooper and A.Frieze.A general model of web graphs.Random Structures Algorithms,22(3):311–335,(2003).[11]M.Deijfen,H.van den Esker,R.van der Hofstad and G.Hooghiemstra.A preferential attach-ment model with random initial degrees.Preprint(2007).To appear in Arkiv f¨o r Matematik.[12]P.Erd¨o s and A.R´e nyi.On a new law of large numbers,J.Analyse Math.23,103–111,(1970).[13]H.van den Esker,R.van der Hofstad and G.Hooghiemstra.Universality forthe distance infinite variance random graphs.Preprint(2006).Available from http://ssor.twi.tudelft.nl/∼gerardh/[14]H.van den Esker,R.van der Hofstad,G.Hooghiemstra and D.Znamenski.Distances inrandom graphs with infinite mean degrees,Extremes8,111-141,2006.。
第51卷第3期2021年5月吉林大学学报(工学版)Journal of Jilin University (Engineering and Technology Edition)Vol. 51 No. 3May 2021基于连续密度隐马尔可夫模型的矿下异常行为识别算法王淑敏,陈伟(哈尔滨工程大学经济管理学院,哈尔滨150001)摘要:针对当前方法识别精度不高的问题,提出了基于连续密度隐马尔可夫模型的矿下异常行 为识别算法。
获取待识别矿下视频帧数据,通过级联分类器实现运动区域的初步检测,并读入下 帧数据,直到所有帧检测完毕。
引入连续密度隐马尔可夫(H M M)模型,将人体图像分解成若干相 等区域,获取图像区域中的标准差值特征,对连续密度H M M进行训练,完成异常行为识别。
实验 结果证明,本文算法的识别结果具有精度高和检测率高的特性,说明其具有可靠性。
关键词:计算机应用;连续密度;马尔可夫模型;矿下异常;识别中图分类号:T P39;T D76 文献标志码:A文章编号:1671_5497(2021)03-1067-06D O I:10. 13229/ki.j d x b g x b20200160Algorithm for identifying abnormal behavior in undergroundmines based on continuous density hidden Markov modelW A N G S h u-m i n,C H E N W e i(School of E conomics and Management, Harbin Engineering University y Harbin150001, China)Abstract:In the current research of abnormal behaviors in underground m i n e s the result accuracy needs i m p r o v e m e n t.In order to solve this problem and improve the safety of underground m i n e s,a continuous density hidden M a r k o v m o d e l i s proposed to identify abnormal behaviors in underground m i n e s.In the detection of h u m a n motion area under the m i n e,first,the video frame data to be identified is obtained;the h u m a n m o t i o n area is preliminarily extracted for each i m a g e,and the preliminary detection of h u m a n motion area i s realized by cascading classifiers to get m o r e accurate h u m a n motion area.T h e n the next frame data is read in,a n d this process i s iterated until all frames are detected.According to the preliminary detection results,the continuous density hidden M a r k o v m o d e l i s introduced to d e c o m p o s e the h u m a n b o d y i m a g e into several equal areas.T h e representative color features and the standard difference features in the image area are obtained.T h e continuous density H M M of the target i s trained through the obtained feature data.T h e abnor m a l behavior recognition of the h u m a n b o d y target under the m i n e is completed according to the training m o d e l.T h e experimental results s h o w that the proposed algorithm has the characteristics of high accuracy and high detection rate in different收稿日期:2020-03-16.基金项目:国家社科基金项目(17CSH016);国家社科基金一般项目(19BJL008);国家自然科学基金项目(61371178).作者简介:王淑敏(1980-),女,博士研究生.研究方向:企业经营与管理,信息与控制工程.E-mail :*****************•1068 •吉林大学学报(工学版)第51卷n u m b e r of h u m a n behavior recognition,wh i c h s h o w s that the algorithm i s reliable.Key words:c o m p u t e r application;continuous density;M a r k o v m o d e l;underground a n o m a l y;identification〇引言矿井下的灯会周期性地改变照射点,致使矿 井下光照不是十分均匀,而且不恒定[u2]。
多尺度变换域隐马尔可夫树纹理图像特征提取模型多尺度几何分析相对于小波分析逼近性能的提高,其意义丝毫不亚于小波分析相对于Fourier分析逼进性能的提高。
Curvelet变换不仅多尺度,引入的方向参量使其具有高度各向异性,对图像边缘有优越表达能力,能用较少非零系数表示出图像边缘信息。
将Curvelet应用于去噪时噪声去除比较彻底,去噪效率高,边缘保护能力强,但存在“阶梯”效应。
基于偏微分方程(PDE)的图像去噪在图像处理领域中崭露头角,能很好对图像进行平滑处理。
其中四阶偏微分方程去噪能很好的恢复平滑区域保护细小纹理,并且避免“阶梯”效应产生,但去噪效率不高,保护边缘能力不强。
为了对含噪图像进行去噪处理,提高图像特征提取和检索效率,本文提出利用权函数将Curvelet去噪和四阶偏微分方程中LLT模型相结合的去噪模型。
实验表明该模型能很好发挥二者优点,PSNR和视觉效果都优于单一的Curvelet或LLT方法。
Contourlet变换由多尺度分析与方向分析构成,具有各向异性尺度关系,其分解系数具有非高斯性和持续性等特点。
为很好的利用相关性描述纹理图像特征,针对目前Contourlet域隐马尔可夫树模型(CHMT)同等考虑父结点的相邻结点对子结点的影响,本文提出一种加权Contourlet域隐马尔可夫树模型对纹理图像进行特征提取。
分析子结点的状态时,不仅考虑父结点信息,而且利用权重评价父结点兄弟结点对子结点的影响,并将通过附加状态转移矩阵体现出来,使得新模型能更加准确的描述Contourlet系数和HMT的内在联系。
同时运用K-L距离计算图像间的相似度,实验结果表明,本模型比CHMT平均检索率高出7%-46%。
同主题文章[1].毕晓君,李文秀. 一种新型纹理图像生成方法研究' [J]. 自动化技术与应用. 2005.(01)[2].张树良,毕笃彦. 一种基于纹理图像的目标提取方法' [J]. 空军工程大学学报(自然科学版). 2004.(04)[3].崔宇红,倪国强,范宏深. 基于模糊推理的纹理图像融合方法研究' [J]. 光学技术. 2005.(01)[4].张光新,崔扬,周泽魁. 基于小波包分解的纹理图像去噪' [J]. 华南理工大学学报(自然科学版). 2005.(03)[5].苏赫峰,张申生,王江春,罗建强. 跨区域纹理映射的一种实现方法' [J]. 计算机仿真. 2004.(03)[6].李峻,边馥苓,谈晓军. DEM及纹理图像的集成数据库研究' [J]. 测绘科学. 2000.(03)[7].尚赵伟,张明新,沈钧毅,相明. 基于双密度小波变换的纹理图像检索' [J]. 西安交通大学学报. 2005.(10)[8].夏飞,李志斌,陈亮,孟宪明. 数据融合在纹理识别中的应用' [J]. 航空计算技术. 2007.(01)[9].叶桦,章国宝,陈维南. 基于小波变换的纹理图像分割' [J]. 东南大学学报(自然科学版). 1999.(01)[10].余鹏,封举富,童行伟. 一种新的基于高斯混合模型的纹理图像分割方法' [J]. 武汉大学学报(信息科学版). 2005.(06)【关键词相关文档搜索】:电路与系统; 图像去噪; Curvelet; 四阶偏微分方程; LLT模型; Contourlet; 加权CHMT; 纹理图像; 特征提取【作者相关信息搜索】:湖南师范大学;电路与系统;杨家红;王玲华;。
隐马尔可夫模型分词 python隐马尔可夫模型(HMM)是使用频率最高的分词算法之一。
在自然语言处理中,分词是一个很基本的任务。
分词的目的是将一句话分成若干个词语,供计算机处理。
在中文分词中,由于中文没有像英文或德文那样明显的单词边界,因此中文分词任务显得更加复杂。
但是,隐马尔可夫模型却是一种很好的用于中文分词的算法。
Python作为一门强大的编程语言,它有着众多的科学计算库和自然语言处理包,可以非常方便地实现HMM分词算法。
下面,本文将会介绍如何使用Python实现HMM分词算法。
一、隐马尔可夫模型(HMM)隐马尔可夫模型(HMM)是关于时序问题的概率模型。
它的基本想法是:一个状态随机地产生一个输出,产生输出的动作和当前状态有关,但是外部观察者并不能直接观察到状态,只能观察到由状态产生的输出结果。
在分词任务中,HMM模型将文本序列看作是由一个个状态进行转移,产生不同的输出。
HMM模型包含以下几个成分:1. 状态集合:一个离散集合,表示所有可能的隐藏状态,对于分词问题,可以将其看作是所有可能的分词。
2. 观测集合:一个离散集合,表示所有可能的观测值,对于分词问题,可以将其看作是单个字符或者单个标点符号。
3. 状态转移概率矩阵:一个矩阵,表示从一个状态转移到另一个状态的概率。
例如,当一个状态为B时,转移到另一个状态E的概率是多少。
4. 发射概率矩阵:一个矩阵,表示从每个状态产生每个观测值的概率。
例如,在一个状态S下,发射出字符“我”的概率是多少。
5. 初始状态概率向量:一个向量,表示在状态序列中,第一个状态属于每个状态的概率。
在分词问题中,状态集合为所有可能的分词,观测集合为单个字符或标点符号。
例如,下面的计算机专业英语文本:计算机专业英语,是一门介绍计算机技术和计算机应用的英语语言课程。
学生除了学习英语之外,还需要学习一些有关计算机领域的基本知识和词汇。
可以将其转换为状态序列和观测序列:状态序列:B E B M M E B M E E观测序列:计算机专业英语,是一门介绍计算机技术和计算机应用的英语语言课程。
Statistical mechanics of complex networksRe´ka Albert*and Albert-La´szlo´Baraba´siDepartment of Physics,University of Notre Dame,Notre Dame,Indiana46556(Published30January2002)Complex networks describe a wide range of systems in nature and society.Frequently cited examples include the cell,a network of chemicals linked by chemical reactions,and the Internet,a network of routers and computers connected by physical links.While traditionally these systems have been modeled as random graphs,it is increasingly recognized that the topology and evolution of real networks are governed by robust organizing principles.This article reviews the recent advances in the field of complex networks,focusing on the statistical mechanics of network topology and dynamics.After reviewing the empirical data that motivated the recent interest in networks,the authors discuss the main models and analytical tools,covering random graphs,small-world and scale-free networks, the emerging theory of evolving networks,and the interplay between topology and the network’s robustness against failures and attacks.CONTENTSI.Introduction48II.The Topology of Real Networks:Empirical Results49A.World Wide Web49B.Internet50C.Movie actor collaboration network52D.Science collaboration graph52E.The web of human sexual contacts52F.Cellular networks52G.Ecological networks53H.Phone call network53I.Citation networks53works in linguistics53 K.Power and neural networks54 L.Protein folding54 III.Random-Graph Theory54A.The Erdo˝s-Re´nyi model54B.Subgraphs55C.Graph evolution56D.Degree distribution57E.Connectedness and diameter58F.Clustering coefficient58G.Graph spectra59 IV.Percolation Theory59A.Quantities of interest in percolation theory60B.General results601.The subcritical phase(pϽp c)602.The supercritical phase(pϾp c)61C.Exact solutions:Percolation on a Cayley tree61D.Scaling in the critical region62E.Cluster structure62F.Infinite-dimensional percolation62G.Parallels between random-graph theory andpercolation63 V.Generalized Random Graphs63A.Thresholds in a scale-free random graph64B.Generating function formalism64ponent sizes and phase transitions652.Average path length65C.Random graphs with power-law degreedistribution66D.Bipartite graphs and the clustering coefficient66 VI.Small-World Networks67A.The Watts-Strogatz model67B.Properties of small-world networks681.Average path length682.Clustering coefficient693.Degree distribution704.Spectral properties70 VII.Scale-Free Networks71A.The Baraba´si-Albert model71B.Theoretical approaches71C.Limiting cases of the Baraba´si-Albert model73D.Properties of the Baraba´si-Albert model741.Average path length742.Node degree correlations753.Clustering coefficient754.Spectral properties75 VIII.The Theory of Evolving Networks76A.Preferential attachment⌸(k)761.Measuring⌸(k)for real networks762.Nonlinear preferential attachment773.Initial attractiveness77B.Growth781.Empirical results782.Analytical results78C.Local events791.Internal edges and rewiring792.Internal edges and edge removal79D.Growth constraints801.Aging and cost802.Gradual aging81petition in evolving networks811.Fitness model812.Edge inheritance82F.Alternative mechanisms for preferentialattachment821.Copying mechanism822.Edge redirection823.Walking on a network834.Attaching to edges83G.Connection to other problems in statisticalmechanics831.The Simon model832.Bose-Einstein condensation85*Present address:School of Mathematics,University of Min-nesota,Minneapolis,Minnesota55455.REVIEWS OF MODERN PHYSICS,VOLUME74,JANUARY20020034-6861/2002/74(1)/47(51)/$35.00©2002The American Physical Society47IX.Error and Attack Tolerance86A.Numerical results861.Random network,random node removal872.Scale-free network,random node removal873.Preferential node removal87B.Error tolerance:analytical results88C.Attack tolerance:Analytical results89D.The robustness of real networks90munication networks902.Cellular networks913.Ecological networks91X.Outlook91A.Dynamical processes on networks91B.Directed networks92C.Weighted networks,optimization,allometricscaling92D.Internet and World Wide Web93E.General questions93F.Conclusions94 Acknowledgments94 References94 I.INTRODUCTIONComplex weblike structures describe a wide variety of systems of high technological and intellectual impor-tance.For example,the cell is best described as a com-plex network of chemicals connected by chemical reac-tions;the Internet is a complex network of routers and computers linked by various physical or wireless links; fads and ideas spread on the social network,whose nodes are human beings and whose edges represent various social relationships;the World Wide Web is an enormous virtual network of Web pages connected by hyperlinks.These systems represent just a few of the many examples that have recently prompted the scien-tific community to investigate the mechanisms that de-termine the topology of complex networks.The desire to understand such interwoven systems has encountered significant challenges as well.Physics,a major benefi-ciary of reductionism,has developed an arsenal of suc-cessful tools for predicting the behavior of a system as a whole from the properties of its constituents.We now understand how magnetism emerges from the collective behavior of millions of spins,or how quantum particles lead to such spectacular phenomena as Bose-Einstein condensation or superfluidity.The success of these mod-eling efforts is based on the simplicity of the interactions between the elements:there is no ambiguity as to what interacts with what,and the interaction strength is uniquely determined by the physical distance.We are at a loss,however,to describe systems for which physical distance is irrelevant or for which there is ambiguity as to whether two components interact.While for many complex systems with nontrivial network topology such ambiguity is naturally present,in the past few years we have increasingly recognized that the tools of statistical mechanics offer an ideal framework for describing these interwoven systems as well.These developments have introduced new and challenging problems for statistical physics and unexpected links to major topics in condensed-matter physics,ranging from percolation to Bose-Einstein condensation.Traditionally the study of complex networks has been the territory of graph theory.While graph theory ini-tially focused on regular graphs,since the1950s large-scale networks with no apparent design principles have been described as random graphs,proposed as the sim-plest and most straightforward realization of a complex network.Random graphs werefirst studied by the Hun-garian mathematicians Paul Erdo˝s and Alfre´d Re´nyi. According to the Erdo˝s-Re´nyi model,we start with N nodes and connect every pair of nodes with probability p,creating a graph with approximately pN(NϪ1)/2 edges distributed randomly.This model has guided our thinking about complex networks for decades since its introduction.But the growing interest in complex sys-tems has prompted many scientists to reconsider this modeling paradigm and ask a simple question:are the real networks behind such diverse complex systems as the cell or the Internet fundamentally random?Our in-tuition clearly indicates that complex systems must dis-play some organizing principles,which should be at some level encoded in their topology.But if the topology of these networks indeed deviates from a random graph, we need to develop tools and measurements to capture in quantitative terms the underlying organizing prin-ciples.In the past few years we have witnessed dramatic ad-vances in this direction,prompted by several parallel de-velopments.First,the computerization of data acquisi-tion in allfields led to the emergence of large databases on the topology of various real networks.Second,the increased computing power allowed us to investigate networks containing millions of nodes,exploring ques-tions that could not be addressed before.Third,the slow but noticeable breakdown of boundaries between disci-plines offered researchers access to diverse databases, allowing them to uncover the generic properties of com-plex networks.Finally,there is an increasingly voiced need to move beyond reductionist approaches and try to understand the behavior of the system as a whole.Along this route,understanding the topology of the interac-tions between the components,i.e.,networks,is un-avoidable.Motivated by these converging developments and cir-cumstances,many new concepts and measures have been proposed and investigated in depth in the past few years.However,three concepts occupy a prominent place in contemporary thinking about complex net-works.Here we define and briefly discuss them,a discus-sion to be expanded in the coming sections.Small worlds:The small-world concept in simple terms describes the fact that despite their often large size,in most networks there is a relatively short path between any two nodes.The distance between two nodes is defined as the number of edges along the short-est path connecting them.The most popular manifesta-tion of small worlds is the‘‘six degrees of separation’’concept,uncovered by the social psychologist Stanley Milgram(1967),who concluded that there was a path of48R.Albert and A.-L.Baraba´si:Statistical mechanics of complex networks Rev.Mod.Phys.,Vol.74,No.1,January2002acquaintances with a typical length of about six betweenmost pairs of people in the United States(Kochen,1989).The small-world property appears to characterizemost complex networks:the actors in Hollywood are onaverage within three co-stars from each other,or thechemicals in a cell are typically separated by three reac-tions.The small-world concept,while intriguing,is notan indication of a particular organizing principle.In-deed,as Erdo˝s and Re´nyi have demonstrated,the typi-cal distance between any two nodes in a random graphscales as the logarithm of the number of nodes.Thusrandom graphs are small worlds as well.Clustering:A common property of social networks isthat cliques form,representing circles of friends or ac-quaintances in which every member knows every othermember.This inherent tendency to cluster is quantifiedby the clustering coefficient(Watts and Strogatz,1998),a concept that has its roots in sociology,appearing underthe name‘‘fraction of transitive triples’’(Wassermannand Faust,1994).Let us focusfirst on a selected node iin the network,having k i edges which connect it to k iother nodes.If the nearest neighbors of the originalnode were part of a clique,there would be k i(k iϪ1)/2 edges between them.The ratio between the number E iof edges that actually exist between these k i nodes andthe total number k i(k iϪ1)/2gives the value of the clus-tering coefficient of node i,C iϭ2E ik i͑k iϪ1͒.(1)The clustering coefficient of the whole network is the average of all individual C i’s.An alternative definition of C that is often used in the literature is discussed in Sec.VI.B.2(Barrat and Weigt,2000;Newman,Strogatz, and Watts,2000).In a random graph,since the edges are distributed randomly,the clustering coefficient is Cϭp(Sec.III.F). However,in most,if not all,real networks the clustering coefficient is typically much larger than it is in a compa-rable random network(i.e.,having the same number of nodes and edges as the real network).Degree distribution:Not all nodes in a network have the same number of edges(same node degree).The spread in the node degrees is characterized by a distri-bution function P(k),which gives the probability that a randomly selected node has exactly k edges.Since in a random graph the edges are placed randomly,the major-ity of nodes have approximately the same degree,close to the average degree͗k͘of the network.The degree distribution of a random graph is a Poisson distribution with a peak at P(͗k͘).One of the most interesting de-velopments in our understanding of complex networks was the discovery that for most large networks the de-gree distribution significantly deviates from a Poisson distribution.In particular,for a large number of net-works,including the World Wide Web(Albert,Jeong, and Baraba´si,1999),the Internet(Faloutsos et al.,1999), or metabolic networks(Jeong et al.,2000),the degree distribution has a power-law tail,P͑k͒ϳkϪ␥.(2) Such networks are called scale free(Baraba´si and Al-bert,1999).While some networks display an exponential tail,often the functional form of P(k)still deviates sig-nificantly from the Poisson distribution expected for a random graph.These discoveries have initiated a revival of network modeling in the past few years,resulting in the introduc-tion and study of three main classes of modeling para-digms.First,random graphs,which are variants of the Erdo˝s-Re´nyi model,are still widely used in manyfields and serve as a benchmark for many modeling and em-pirical studies.Second,motivated by clustering,a class of models,collectively called small-world models,has been proposed.These models interpolate between the highly clustered regular lattices and random graphs.Fi-nally,the discovery of the power-law degree distribution has led to the construction of various scale-free models that,by focusing on the network dynamics,aim to offer a universal theory of network evolution.The purpose of this article is to review each of these modeling efforts,focusing on the statistical mechanics of complex networks.Our main goal is to present the the-oretical developments in parallel with the empirical data that initiated and support the various models and theo-retical tools.To achieve this,we start with a brief de-scription of the real networks and databases that repre-sent the testing ground for most current modeling efforts.II.THE TOPOLOGY OF REAL NETWORKS:EMPIRICAL RESULTSThe study of most complex networks has been initi-ated by a desire to understand various real systems, ranging from communication networks to ecological webs.Thus the databases available for study span sev-eral disciplines.In this section we review briefly those that have been studied by researchers aiming to uncover the general features of complex networks.Beyond a de-scription of the databases,we shall focus on three robust measures of a network’s topology:average path length, clustering coefficient,and degree distribution.Other quantities,as discussed in the following sections,will again be tested on these databases.The properties of the investigated databases,as well as the obtained expo-nents,are summarized in Tables I and II.A.World Wide WebThe World Wide Web represents the largest network for which topological information is currently available. The nodes of the network are the documents(web pages)and the edges are the hyperlinks(URL’s)that point from one document to another(see Fig.1).The size of this network was close to one billion nodes at the end of1999(Lawrence and Giles,1998,1999).The in-terest in the World Wide Web as a network boomed after it was discovered that the degree distribution of the web pages follows a power law over several orders of magnitude(Albert,Jeong,and Baraba´si,1999;Kumar49R.Albert and A.-L.Baraba´si:Statistical mechanics of complex networks Rev.Mod.Phys.,Vol.74,No.1,January2002et al.,1999).Since the edges of the World Wide Web aredirected,the network is characterized by two degree dis-tributions:the distribution of outgoing edges,P out(k), signifies the probability that a document has k outgoinghyperlinks,and the distribution of incoming edges, P in(k),is the probability that k hyperlinks point to acertain document.Several studies have established that both P out(k)and P in(k)have power-law tails: P out͑k͒ϳkϪ␥out and P in͑k͒ϳkϪ␥in.(3) Albert,Jeong,and Baraba´si(1999)have studied asubset of the World Wide Web containing325729nodes and have found␥outϭ2.45and␥inϭ2.1.Kumar et al. (1999)used a40-million-document crawl by Alexa Inc., obtaining␥outϭ2.38and␥inϭ2.1(see also Kleinberg et al.,1999).A later survey of the World Wide Web to-pology by Broder et al.(2000)used two1999Altavistacrawls containing in total200million documents,obtain-ing␥outϭ2.72and␥inϭ2.1with scaling holding close to five orders of magnitude(Fig.2).Adamic and Huber-man(2000)used a somewhat different representation of the World Wide Web,with each node representing a separate domain name and two nodes being connected if any of the pages in one domain linked to any page in the other.While this method lumped together pages that were on the same domain,representing a nontrivial ag-gregation of the nodes,the distribution of incoming edges still followed a power law with␥in domϭ1.94.Note that␥in is the same for all measurements at the document level despite the two-years’time delay be-tween thefirst and last web crawl,during which the World Wide Web had grown at leastfive times larger. However,␥out has a tendency to increase with the sample size or time(see Table II).Despite the large number of nodes,the World Wide Web displays the small-world property.This wasfirst re-ported by Albert,Jeong,and Baraba´si(1999),who found that the average path length for a sample of 325729nodes was11.2and predicted,usingfinite size scaling,that for the full World Wide Web of800million nodes that would be a path length of around19.Subse-quent measurements by Broder et al.(2000)found that the average path length between nodes in a50-million-node sample of the World Wide Web is16,in agreement with thefinite size prediction for a sample of this size. Finally,the domain-level network displays an average path length of3.1(Adamic,1999).The directed nature of the World Wide Web does not allow us to measure the clustering coefficient using Eq.(1).One way to avoid this difficulty is to make the net-work undirected,making each edge bidirectional.This was the path followed by Adamic(1999),who studied the World Wide Web at the domain level using a1997 Alexa crawl of50million web pages distributed among 259794sites.Adamic removed the nodes that had have only one edge,focusing on a network of153127sites. While these modifications are expected to increase the clustering coefficient somewhat,she found Cϭ0.1078, orders of magnitude higher than C randϭ0.00023corre-sponding to a random graph of the same size and aver-age degree.B.InternetThe Internet is a network of physical links between computers and other telecommunication devices(Fig.TABLE I.The general characteristics of several real networks.For each network we have indicated the number of nodes,the average degree͗k͘,the average path length l,and the clustering coefficient C.For a comparison we have included the average path length l rand and clustering coefficient C rand of a random graph of the same size and average degree.The numbers in the last column are keyed to the symbols in Figs.8and9.Network Size͗k͘l l rand C C rand Reference Nr. WWW,site level,undir.15312735.21 3.1 3.350.10780.00023Adamic,19991 Internet,domain level3015–6209 3.52–4.11 3.7–3.76 6.36–6.180.18–0.30.001Yook et al.,2001a,Pastor-Satorras et al.,20012 Movie actors22522661 3.65 2.990.790.00027Watts and Strogatz,19983 LANL co-authorship529099.7 5.9 4.790.43 1.8ϫ10Ϫ4Newman,2001a,2001b,2001c4 MEDLINE co-authorship152025118.1 4.6 4.910.066 1.1ϫ10Ϫ5Newman,2001a,2001b,2001c5 SPIRES co-authorship56627173 4.0 2.120.7260.003Newman,2001a,2001b,2001c6 NCSTRL co-authorship11994 3.599.77.340.4963ϫ10Ϫ4Newman,2001a,2001b,2001c7 Math.co-authorship70975 3.99.58.20.59 5.4ϫ10Ϫ5Baraba´si et al.,20018 Neurosci.co-authorship20929311.56 5.010.76 5.5ϫ10Ϫ5Baraba´si et al.,20019 E.coli,substrate graph2827.35 2.9 3.040.320.026Wagner and Fell,200010 E.coli,reaction graph31528.3 2.62 1.980.590.09Wagner and Fell,200011 Ythan estuary food web1348.7 2.43 2.260.220.06Montoya and Sole´,200012 Silwood Park food web154 4.75 3.40 3.230.150.03Montoya and Sole´,200013 Words,co-occurrence460.90270.13 2.67 3.030.4370.0001Ferrer i Cancho and Sole´,200114 Words,synonyms2231113.48 4.5 3.840.70.0006Yook et al.,2001b15 Power grid4941 2.6718.712.40.080.005Watts and Strogatz,199816C.Elegans28214 2.65 2.250.280.05Watts and Strogatz,199817 50R.Albert and A.-L.Baraba´si:Statistical mechanics of complex networksRev.Mod.Phys.,Vol.74,No.1,January20021).The topology of the Internet is studied at two differ-ent levels.At the router level,the nodes are the routers,and edges are the physical connections between them.At the interdomain (or autonomous system)level,eachwork structure of the World Wide Web and the Internet.Upper panel:the nodes of the World Wide Web are web documents,connected with directed hyperlinks (URL’s).Lower panel:on the Internet the nodes are the routers and computers,and the edges are the wires and cables that physi-cally connect them.Figure courtesy of Istva´n Albert.TABLE II.The scaling exponents characterizing the degree distribution of several scale-free networks,for which P (k )follows apower law (2).We indicate the size of the network,its average degree ͗k ͘,and the cutoff for the power-law scaling.For directed networks we list separately the indegree (␥in )and outdegree (␥out )exponents,while for the undirected networks,marked with an asterisk (*),these values are identical.The columns l real ,l rand ,and l pow compare the average path lengths of real networks with power-law degree distribution and the predictions of random-graph theory (17)and of Newman,Strogatz,and Watts (2001)[also see Eq.(63)above],as discussed in Sec.V .The numbers in the last column are keyed to the symbols in Figs.8and workSize͗k ͘␥out ␥in lreallrandlpowReference Nr.WWW 325729 4.51900 2.45 2.111.28.32 4.77Albert,Jeong,and Baraba´si 19991WWW 4ϫ1077 2.38 2.1Kumar et al.,19992WWW 2ϫ1087.54000 2.72 2.1168.857.61Broder et al.,20003WWW,site 260000 1.94Huberman and Adamic,20004Internet,domain *3015–4389 3.42–3.7630–40 2.1–2.2 2.1–2.246.3 5.2Faloutsos,19995Internet,router *3888 2.5730 2.48 2.4812.158.757.67Faloutsos,19996Internet,router *150000 2.6660 2.4 2.41112.87.47Govindan,20007Movie actors *21225028.78900 2.3 2.3 4.54 3.65 4.01Baraba´si and Albert,19998Co-authors,SPIRES *566271731100 1.2 1.24 2.12 1.95Newman,2001b9Co-authors,neuro.*20929311.54400 2.1 2.16 5.01 3.86Baraba´si et al.,200110Co-authors,math.*70975 3.9120 2.5 2.59.58.2 6.53Baraba´si et al.,200111Sexual contacts *2810 3.4 3.4Liljeros et al.,200112Metabolic,E.coli 7787.4110 2.2 2.2 3.2 3.32 2.89Jeong et al.,200013Protein,S.cerev.*1870 2.39 2.4 2.4Jeong,Mason,et al.,200114Ythan estuary *1348.735 1.05 1.052.43 2.26 1.71Montoya and Sole´,200014Silwood Park *154 4.7527 1.13 1.13 3.43.232Montoya and Sole´,200016Citation 7833398.573Redner,199817Phone call 53ϫ1063.16 2.1 2.1Aiello et al.,200018Words,co-occurrence *46090270.13 2.7 2.7Ferrer i Cancho and Sole´,200119Words,synonyms *2231113.48 2.8 2.8Yook et al.,2001b20FIG.2.Degree distribution of the World Wide Web from twodifferent measurements:ᮀ,the 325729-node sample of Albert et al.(1999);᭺,the measurements of over 200million pages by Broder et al.(2000);(a)degree distribution of the outgoing edges;(b)degree distribution of the incoming edges.The data have been binned logarithmically to reduce noise.Courtesy of Altavista and Andrew Tomkins.The authors wish to thank Luis Amaral for correcting a mistake in a previous version of this figure (see Mossa et al.,2001).51R.Albert and A.-L.Baraba´si:Statistical mechanics of complex networks Rev.Mod.Phys.,Vol.74,No.1,January 2002domain,composed of hundreds of routers and comput-ers,is represented by a single node,and an edge is drawn between two domains if there is at least one route that connects them.Faloutsos et al.(1999)have studied the Internet at both levels,concluding that in each case the degree distribution follows a power law.The inter-domain topology of the Internet,captured at three dif-ferent dates between 1997and the end of 1998,resultedin degree exponents between ␥I as ϭ2.15and ␥I asϭ2.2.The 1995survey of Internet topology at the router level,containing 3888nodes,found ␥I rϭ2.48(Faloutsos et al.,1999).Recently Govindan and Tangmunarunkit (2000)mapped the connectivity of nearly 150000router inter-faces and nearly 200000router adjacencies,confirmingthe power-law scaling with ␥I rӍ2.3[see Fig.3(a)].The Internet as a network does display clustering and small path length as well.Yook et al.(2001a)and Pastor-Satorras et al.(2001),studying the Internet at the do-main level between 1997and 1999,found that its clus-tering coefficient ranged between 0.18and 0.3,to be compared with C rand Ӎ0.001for random networks with similar parameters.The average path length of the In-ternet at the domain level ranged between 3.70and 3.77(Pastor-Satorras et al.,2001;Yook et al.2001a)and at the router level it was around 9(Yook et al.,2001a),indicating its small-world character.C.Movie actor collaboration networkA much-studied database is the movie actor collabo-ration network,based on the Internet Movie Database,which contains all movies and their casts since the 1890s.In this network the nodes are the actors,and two nodes have a common edge if the corresponding actors have acted in a movie together.This is a continuously expand-ing network,with 225226nodes in 1998(Watts and Stro-gatz,1998),which grew to 449913nodes by May 2000(Newman,Strogatz,and Watts,2000).The average path length of the actor network is close to that of a random graph with the same size and average degree,3.65com-pared with 2.9,but its clustering coefficient is more than 100times higher than a random graph (Watts and Stro-gatz,1998).The degree distribution of the movie actor network has a power-law tail for large k [see Fig.3(b)],following P (k )ϳk Ϫ␥actor ,where ␥actor ϭ2.3Ϯ0.1(Bara-ba´si and Albert,1999;Albert and Baraba ´si,2000;Ama-ral et al.,2000).D.Science collaboration graphA collaboration network similar to that of the movie actors can be constructed for scientists,where the nodes are the scientists and two nodes are connected if the two scientists have written an article together.To uncover the topology of this complex graph,Newman (2001a,2001b,2001c)studied four databases spanning physics,biomedical research,high-energy physics,and computer science over a five-year window (1995–1999).All these networks show a small average path length but a high clustering coefficient,as summarized in Table I.The de-gree distribution of the collaboration network of high-energy physicists is an almost perfect power law with an exponent of 1.2[Fig.3(c)],while the other databases display power laws with a larger exponent in the tail.Baraba´si et al.(2001)investigated the collaboration graph of mathematicians and neuroscientists publishing between 1991and 1998.The average path length of these networks is around l math ϭ9.5and l nsci ϭ6,their clustering coefficient being C math ϭ0.59and C nsci ϭ0.76.The degree distributions of these collaboration networks are consistent with power laws with degree ex-ponents 2.1and 2.5,respectively [see Fig.3(d)].E.The web of human sexual contactsMany sexually transmitted diseases,including AIDS,spread on a network of sexual relationships.Liljeros et al.(2001)have studied the web constructed from the sexual relations of 2810individuals,based on an exten-sive survey conducted in Sweden in 1996.Since the edges in this network are relatively short lived,they ana-lyzed the distribution of partners over a single year,ob-taining for both females and males a power-law degree distribution with an exponent ␥f ϭ3.5Ϯ0.2and ␥m ϭ3.3Ϯ0.2,respectively.F .Cellular networksJeong et al.(2000)studied the metabolism of 43or-ganisms representing all three domains of life,recon-structing them in networks in which the nodes aretheFIG.3.The degree distribution of several real networks:(a)Internet at the router level.Data courtesy of Ramesh Govin-dan;(b)movie actor collaboration network.After Baraba´si and Albert 1999.Note that if TV series are included as well,which aggregate a large number of actors,an exponential cut-off emerges for large k (Amaral et al.,2000);(c)co-authorship network of high-energy physicists.After Newman (2001a,2001b);(d)co-authorship network of neuroscientists.AfterBaraba´si et al.(2001).52R.Albert and A.-L.Baraba´si:Statistical mechanics of complex networks Rev.Mod.Phys.,Vol.74,No.1,January 2002。
随机图中正则Laplace矩阵的谱分析张玲;丁雪【摘要】We investigated the convergence of the empirical spectral distribution (ESD)of normalized Laplacian matrix from random graph with given expected degree.It was shown that the ESD of normalized Laplacian matrix converges to a fixed probability distribution when the expected degree satisfys some assumptions,but the fixed probability distribution may be different at different places.%用随机矩阵中的矩方法研究给定期望度数随机图中正则 Laplace 矩阵经验谱分布的收敛性,结果表明,在期望度数满足一定条件时,相应正则Laplace 矩阵的经验谱分布几乎处处收敛到固定的概率分布,但在不同的期望度数下,此概率分布可能不同。
【期刊名称】《吉林大学学报(理学版)》【年(卷),期】2014(000)005【总页数】7页(P954-960)【关键词】随机图;随机矩阵;正则Laplace矩阵;经验谱分布【作者】张玲;丁雪【作者单位】长春工程学院理学院,长春 130012;吉林大学数学学院,长春130012【正文语种】中文【中图分类】O211.4随机图在物理和化学等领域应用广泛,目前已有许多研究成果[1-2]. 随机图的谱理论主要研究与其有关的相邻矩阵和Laplace矩阵特征值和特征向量的性质[3-4]. 给定期望度数随机图是随机图中一类较重要的随机图,文献[5-12]研究了给定期望度数随机图的谱理论. 本文研究一类期望度数特殊的随机图所对应正则Laplace矩阵的谱性质.设图G=(V(G),E(G)),其中: V(G)={v1,v2,…,vn}表示图G的n个顶点; E(G)表示图的边集. 对于任意一个非负序列w=(ω1,ω2,…,ωn),与其对应的给定期望度数随机图G(w)定义为:每对顶点(vi,vj)以概率ωiωjρ形成G的一条边,并且所有边的形成都是独立的,这里ρ=1/∑ωi. 为方便,称vi,vj是相连的,如果(vi,vj)是G的边,记作vi~vj. 令d(vi)表示所有与顶点vi相连的顶点个数,称d(vi)为顶点vi的度数. 为了使ωiωjρ对所有的i,j确实是概率,本文总假设中顶点vi度数的期望为此时,G(w)称为给定期望度数的随机图模型. 对于随机图G(w),与其相关的相邻矩阵和正则Laplace矩阵定义如下:1) 相邻矩阵An=(aij)1≤i,j≤n是一个对称矩阵,其中2) 正则Laplace矩阵:其中: In是n×n单位矩阵; An是相邻矩阵; Dn是一个n×n 对角矩阵,(i,i)位置的元素是顶点vi的度数d(vi). 对期望度数w=(ω1,ω2,…,ωn)做如下假设:(H1) ωmin≫,这里:当n→∞时,|ωiωjρ-φts|→0,其中: φts∈[0,1]是常数;Int(t=1,2,…,M)是{1,2,…,n}的不交子集,并且|Int|/n→αt,0<αt<1是常数满足是任意正整数.如果λ1(M)≤λ2(M)≤…≤λn(M)表示矩阵M的特征值,在不混淆的情况下简写为λ1≤λ2≤…≤λn,则矩阵M的经验谱分布定义为).定理1 假设w满足(H1),G(w)是具有期望度数w的随机图,则(/2)Ln的经验谱分布几乎处处收敛到一个非随机的概率分布.注1 在定理1的假设下,如果所有的φts都相等,则(/2)Ln的经验谱分布几乎处处收敛到半圆率,如果M=2,则(/2)Ln的经验谱分布几乎处处收敛到半圆率或者半圆率与另外一个已知概率分布的自由卷积,对M>2,(/2)Ln的经验谱分布极限非常复杂,本文只证明该分布的存在性.设k,n是两个正整数,i1,i2,…,ik是从{1,2,…,n}中取值的k个正整数,定义路径π={(i1,i2),(i2,i3),…,(ik-1,ik),(ik,i1)},称该路径为长度为k的循环,称(il,il+1)为循环π的一条边,il,il+1是边(il,il+1)的顶点. 对π的两条边(il,il+1),(is,is+1),如果il=is,il+1=is+1或者il=is+1,il+1=is,则称这两条边是相同的,如果π的边(il,il+1)与所有的边都不同,则称它是单独的.引理1[1] 对任意的正整数k,n,所有长度为k正好拥有l个不同边的循环个数不多于θlknl+1,这里θlk是一个只依赖于l,k的常数.引理2[2] 所有长度为2k正好有l个不同边,且没有单独边的循环个数不多于当l=k时,所有长度为2k正好拥有k个不同边,且没有单独的边,有k+1个不同顶点的循环个数正好是. 而所有长度为2k正好拥有k个不同边,且没有单独边但顶点个数不超过k的循环个数至多为Qknk,这里Qk是只依赖k的常数.令π1,π2,π3,π4是4个长度为k的在{1,2,…,n}中取值的循环,如果π1,π2,π3,π4的所有边中没有单独的边,则称它们是匹配的; 如果πi的所有边与πj(j≠i)中的任何边都不同,则称πi为自匹配的.引理3[1] 假设k,n是两个正整数,用S表示所有长度为k匹配的,没有自匹配的,并且不同的边个数正好是l的循环π1,π2,π3,π4,则S中元素的个数最多为ηlknl+2,这里ηlk是不依赖n的常数.下面证明定理1. 正则Laplace矩阵有如下表述:其中: Vol(G)=∑d(vi); Kn是所有值都为1的n×n矩阵. 定义这里: Wn是n×n的对角矩阵,(i,i)位置的元素为顶点vi的期望度数ωi;Vol(G)=∑ωi是G的期望体积; χ=(,,…,)是一个n维的行向量.命题1 假设w满足(H1),G(w)是具有期望度数w的随机图,则(/2)Cn的经验谱分布几乎处处收敛到一个非随机的概率分布.证明: 应用矩方法证明随机矩阵(/2)Cn经验谱分布的收敛性,只需证明下列事实成立[13]:存在,2) β2k≤,τ是已知常数;对任意的整数r成立,这里κr是只依赖r的常数.注意到这里取遍所有长度为r,在{1,2,…,n}中取值的循环. 由Cn的定义可见,Cn是对称矩阵,且有如下分布:计算可得对任意的M≥2成立.由{Cij,1≤i≤j≤n}的独立性可见,如果π有单独的边,则ECπ=0. 因此只需在式(2)中考虑每条边至少重复两次的循环,这样的循环中最多有r/2条边,如果π正好有l(1≤l≤r/2)条边,则利用式(3)可得|ECπ|≤. 又由引理1可知,正好有l个不同边的循环个数不超过θlrnl+1. 因此有由于假设易计算当r=2k+1时,事实1)的第二部分成立; 当r=2k时,由上述讨论知这里: π是取遍所有长度为2k每条边至少重复两次的循环; S1是所有长度为2k每条边正好重复两次且正好有k+1个不同顶点的循环组成的集合; S2是所有长度为2k每条边重复至少两次,并且至少有一条边重复三次以上的循环组成的集合; S3是所有长度为2k在{1,2,…,n}中取值的每条边正好重复两次且不同顶点个数不超过k 的循环组成的集合. 对S2中的任何循环π,π中不同边的个数至多为k-1,则有式(5)用到了引理2及对任意的π∈S3有|ECπ|≤ρk. 因此为了证明事实1)的第一部分只需证明存在,并且由式(3)知E(Cij)2=ρ(1-ωiωjρ),由假设(H1)可找到一个正数列δn→0,使得由于S1中共有种循环,且每种循环都有相同的符号序列,因此可把S1分成个等价集,分别记为T1,T2,…,T(2k)!/((k+1)!k!). 为了证明式(7),只需对每种等价集证明存在,并且由于每个Tq中的循环都正好有k个不同边,有k+1个不同顶点,每个顶点的取值在Int(t=1,2,…,M)中. 因此,根据每个顶点的取值,可把Tq分成Mk+1个等价集每个等价集中循环的每个顶点只在Int(t=1,2,…,M)中有一个取值.中的每个循环π满足这里表示两个顶点分别在Int,Ins中取值边的个数,则是只依赖于k的常数. 注意到有这里表示集合的个数. 注意到是因为中的每个循环正好有k+1个不同的顶点,而且每个顶点都在{1,2,…,n}中取值. 因此有令表示中循环顶点在Int中取值的个数,则而由假设(H1)可知这里τ=max φts. 综上有由的定义可断定从而至此证明了断言1)的第一部分和断言2)都成立.考虑其中π1,π2,π3,π4取遍所有长度为r在{1,2,…,n}中取值的循环. 如果π1,π2,π3,π4有单独边或有自匹配,则可断定其期望为零. 因此在式(13)中,只需考虑没有单独边且没有自匹配的π1,π2,π3,π4. 由于没有单独边,则π1,π2,π3,π4中至多有2r条边,如果正好有1≤l≤2r条边,则由式(3)知如果π1中有l1个不同边,π2,π3,π4中有l2个不同边,则l1+l2≥l. 如果π1中有单独边,则E(Cπ1)=0;反之对所有有l1条边的π1成立. 同理有因此式(15)中第二个不等式成立是由于假设ωi≤n,则即同理可得将式(13)中的求和项展开,有其中第一个求和这里Sl表示所有长度为r没有单独边,且没有自匹配的π1,π2,π3,π4正好有l个不同边的4个循环的集合. 由引理3可知,集合Sl的元素个数最多为ηlrnl+2个,结合式(14)可知,式(17)中最后的等式成立是由于同理可得于是由于故可断定又由式(13)可知存在只依赖于r的常数κr,使得当n充分大时,有证毕.命题2 假设w满足(H1),G(w)是具有期望度数w的随机图,则(/2)Cn和(/2)(In-Ln)的经验谱分布几乎处处收敛到同一个概率分布.证明: 由于这里‖f‖|f(x)|. 因此(/2)(In-Ln)和(/2)Mn的经验谱分布收敛到相同的概率分布. 将Mn做如下分解:其中: Cn由式(1)定义; Bn,Rn,Sn是对称的随机矩阵,分别有元素aij是相邻矩阵An的元素,di是顶点vi的实际度数d(vi). 易见这里:易见rank(Rn)≤1. rank(Sn)≤1. 从而有表明(/2)(In-Ln)和(/2)(Cn+Bn)的经验谱分布都收敛到相同的概率分布. 由这里L表示两个分布的Levy距离,如果‖Bn‖→0 a.s.,则可断定和有相同的极限分布. 于是因此有这里:故有其中:易见由Borel-Cantell引理[14]容易计算:‖y‖+‖y‖‖z‖)→0 a.s. 从而‖Bn‖→0 a.s. 综上可知和都收敛到相同的概率分布. 证毕.下面证明定理1.由命题2可知,(/2)(In-Ln)和(/2)Cn的经验谱分布几乎处处收敛到相同的概率分布,由命题1可知(/2)Cn的经验谱分布几乎处处收敛到一个非随机的概率分布,因此可断定(/2)(In-Ln)的经验谱分布几乎处处收敛到一个非随机的概率分布,易见(/2)Ln的经验谱分布几乎处处收敛到一个非随机的概率分布. 即定理1成立.【相关文献】[1]Gugelmann L,Spöhel R. On Balanced Coloring Games in Random Graphs [J]. European J Combin,2014,35: 297-312.[2]Fountoulakis N F,Kang R J,McDiarmid C. Largest Sparse Subgraphs of Random Graphs [J]. European J Combin,2014,35: 232-244.[3]Fan C,Radcliffe M. On the Spectra of General Random Graphs [J]. Electron J Combin,2011,18(1): Paper 215.[4]Fan C,Lu L Y. Complex Graphs and Networks [M]. Vol.107. Boston: AMS,2004.[5]Fan C,Lu L Y,Vu V. Spectra of Random Graphs with Given Expected Degrees [J]. PNAS,2003,100(11): 6313-6318.[6]Fan C,Lu L Y,Vu V. Eigenvalues of Random Power Law Graphs [J]. Ann Comb,2003,7(1): 21-33.[7]Fan C,Lu L Y. The Average Distances in Random Graphs with Given Expected Degrees [J]. Proc Natl Acad Sci USA,2002,99(25): 15879-15882.[8]Anand K,Bianconi G,Severini S. Shannon and Von Neumann Entropy of Random Networks with Heterogeneous Expected Degree [J]. Phys Rev E,2011,83(3): 036109.[9]Coja-Oghlan A,Lanka A. The Spectral Gap of Random Graphs with Given Expected Degrees [M]. Lecture Notes in Comput Sci. Vol.4051. New York: Springer,2006.[10]Fan C,Graham R. Quasi-random Graphs with Given Degree Sequences [J]. Random Structures Algorithms,2008,32(1): 1-19.[11]SHANG Yilun. A Remark on the Spectra of Random Graphs with Given Expected Degrees [J]. J Discrete Math Sci Cryptogr,2012,15(6): 317-321.[12]Miller J C,Hagberg A. Efficient Generation of Networks with Given Eexpected Degrees[C]//Proceeding of the 8th International Conference on Algorithms and Models for the Web Graph. Berlin: Springer-Verlag,2011: 115-126.[13]BAI Zhidong,Silverstein J W. Spectral Analysis of Large Dimensinal Random Matrices [M]. 2nd ed. Beijing: Science Press,2010.[14]汪嘉冈. 现代概率论基础 [M]. 2版. 上海: 复旦大学出版社,2006. (WANG Jiagang. Foundations of Modern Probability [M]. 2nd ed. Shanghai: Fudan University Press,2006.)。
Random graphs with arbitrary degree distributions and their applications M.E.J.Newman,1,2S.H.Strogatz,2,3and D.J.Watts1,41Santa Fe Institute,1399Hyde Park Road,Santa Fe,New Mexico875012Center for Applied Mathematics,Cornell University,Ithaca,New York14853-3401 3Department of Theoretical and Applied Mechanics,Cornell University,Ithaca,New York14853-1503 4Department of Sociology,Columbia University,1180Amsterdam Avenue,New York,New York10027͑Received19March2001;published24July2001͒Recent work on the structure of social networks and the internet has focused attention on graphs with distributions of vertex degree that are significantly different from the Poisson degree distributions that have been widely studied in the past.In this paper we develop in detail the theory of random graphs with arbitrary degree distributions.In addition to simple undirected,unipartite graphs,we examine the properties of directed and bipartite graphs.Among other results,we derive exact expressions for the position of the phase transition at which a giant componentfirst forms,the mean component size,the size of the giant component if there is one,the mean number of vertices a certain distance away from a randomly chosen vertex,and the average vertex-vertex distance within a graph.We apply our theory to some real-world graphs,including the world-wide web and collaboration graphs of scientists and Fortune1000company directors.We demonstrate that in some cases random graphs with appropriate distributions of vertex degree predict with surprising accuracy the behavior of the real world,while in others there is a measurable discrepancy between theory and reality, perhaps indicating the presence of additional social structure in the network that is not captured by the random graph.DOI:10.1103/PhysRevE.64.026118PACS number͑s͒:89.75.Hc,87.23.Ge,05.90.ϩmI.INTRODUCTIONA random graph͓1͔is a collection of points,or vertices, with lines,or edges,connecting pairs of them at random ͓Fig.1͑a͔͒.The study of random graphs has a long history.Starting with the influential work of Erdo¨s and Re´nyi in the 1950s and1960s͓2–4͔,random graph theory has developed into one of the mainstays of modern discrete mathematics, and has produced a prodigious number of results,many of them highly ingenious,describing statistical properties of graphs,such as distributions of component sizes,existence and size of a giant component,and typical vertex-vertex dis-tances.In almost all of these studies the assumption has been made that the presence or absence of an edge between two vertices is independent of the presence or absence of any other edge,so that each edge may be considered to be present with independent probability p.If there are N verti-ces in a graph,and each is connected to an average of z edges,then it is trivial to show that pϭz/(NϪ1),which for large N is usually approximated by z/N.The number of edges connected to any particular vertex is called the degree k of that vertex,and has a probability distribution p k given byp kϭͩN kͪp k͑1Ϫp͒NϪkӍz k eϪz k!,͑1͒where the second equality becomes exact in the limit of large N.This distribution we recognize as the Poisson distribution: the ordinary random graph has a Poisson distribution of ver-tex degrees,a point which turns out to be crucial,as we now explain.Random graphs are not merely a mathematical toy;they have been employed extensively as models of real-world net-works of various types,particularly in epidemiology.The passage of a disease through a community depends strongly on the pattern of contacts between those infected with the disease and those susceptible to it.This pattern can be de-picted as a network,with individuals represented by vertices and contacts capable of transmitting the disease by edges.A large class of epidemiological models known as susceptible/ infectious/recovered models͓5–7͔makes frequent use of the so-called fully mixed approximation,which is the assump-tion that contacts are random and uncorrelated,i.e.,they form a random graph.Random graphs however turn out to have severe short-comings as models of such real-world phenomena.Although it is difficult to determine experimentally the structure of the network of contacts by which a disease is spread͓8͔,studies have been performed of other social networks such as net-works of friendships within a variety of communities͓9–11͔, networks of telephone calls͓12,13͔,airline timetables͓14͔, and the power grid͓15͔,as well as networks in physicalor FIG.1.͑a͒A schematic representation of a random graph,the circles representing vertices and the lines representing edges.͑b͒A directed random graph,i.e.,one in which each edge runs in only one direction.PHYSICAL REVIEW E,VOLUME64,026118biological systems,including neural networks͓15͔,the struc-ture and conformation space of polymers͓16,17͔,metabolic pathways͓18,19͔,and food webs͓20,21͔.It is found͓13,14͔that the distribution of vertex degrees in many of these net-works is measurably different from a Poisson distribution—often wildly different—and this strongly suggests,as has been emphasized elsewhere͓22͔,that there are features of such networks that we would miss if we were to approximatethem by an ordinary͑Poisson͒random graph.Another very widely studied network is the internet,whose structure has attracted an exceptional amount of scru-tiny,academic and otherwise,following its meteoric rise topublic visibility starting in1993.Pages on the world-wideweb may be thought of as the vertices of a graph and thehyperlinks between them as edges.Empirical studies͓23–26͔have shown that this graph has a distribution of vertexdegree which is heavily right skewed and possesses a fat ͑power law͒tail with an exponent betweenϪ2andϪ3.͑The underlying physical structure of the internet also has a degree distribution of this type͓27͔.͒This distribution is veryfar from Poisson,and therefore we would expect that asimple random graph would give a very poor approximationof the structural properties of the web.However,the webdiffers from a random graph in another way also:it is di-rected.Links on the web lead from one page to another inonly one direction͓see Fig.1͑b͔͒.As discussed by Broderet al.͓26͔,this has a significant practical effect on the typical accessibility of one page from another,and this effect also will not be captured by a simple͑undirected͒random graph model.A further class of networks that has attracted scrutiny is the class of collaboration networks.Examples of such net-works include the boards of directors of companies͓28–31͔, co-ownership networks of companies͓32͔,and collabora-tions of scientists͓33–37͔and movie actors͓15͔.As well as having strongly non-Poisson degree distributions͓14,36͔, these networks have a bipartite structure;there are two dis-tinct kinds of vertices on the graph with links running only between vertices of unlike kinds͓38͔—see Fig.2.In the case of movie actors,for example,the two types of vertices are movies and actors,and the network can be represented as a graph with edges running between each movie and the actors that appear in it.Researchers have also considered the pro-jection of this graph onto the unipartite space of actors only, also called a one-mode network͓38͔.In such a projection two actors are considered connected if they have appeared in a movie together.The construction of the one-mode network however involves discarding some of the information con-tained in the original bipartite network,and for this reason it is more desirable to model collaboration networks using the full bipartite structure.Given the high current level of interest in the structure of many of the graphs described here͓39͔,and given their sub-stantial differences from the ordinary random graphs that have been studied in the past,it would clearly be useful if we could generalize the mathematics of random graphs to non-Poisson degree distributions,and to directed and bipartite graphs.In this paper we do just that,demonstrating in detail how the statistical properties of each of these graph types can be calculated exactly in the limit of large graph size.We also give examples of the application of our theory to the model-ing of a number of real-world networks,including the world-wide web and collaboration graphs.II.RANDOM GRAPHS WITH ARBITRARYDEGREE DISTRIBUTIONSIn this section we develop a formalism for calculating a variety of quantities,both local and global,on large unipar-tite undirected graphs with arbitrary probability distribution of the degrees of their vertices.In all respects other than their degree distribution,these graphs are assumed to be entirely random.This means that the degrees of all vertices are inde-pendent identically distributed random integers drawn from a specified distribution.For a given choice of these degrees, also called the‘‘degree sequence,’’a graph is chosen uni-formly at random from the set of all graphs with that degree sequence.All properties calculated in this paper are averaged over the ensemble of graphs generated in this way.In the limit of large graph size an equivalent procedure is to study only one particular degree sequence,averaging uniformly over all graphs with that sequence,where the sequence is chosen to approximate as closely as possible the desired probability distribution.The latter procedure can be thought of as a‘‘microcanonical ensemble’’for random graphs, where the former is a‘‘canonical ensemble.’’Some results are already known for random graphs with arbitrary degree distributions:in two beautiful recent papers ͓40,41͔,Molloy and Reed have derived formulas for the po-sition of the phase transition at which a giant componentfirst appears,and the size of the giant component.͑These results are calculated within the microcanonical ensemble,butapply FIG.2.A schematic representation͑top͒of a bipartite graph, such as the graph of movies and the actors who have appeared in them.In this small graph we have four movies,labeled1to4,and 11actors,labeled A to K,with edges joining each movie to the actors in its cast.In the lower part of the picture we show the one-mode projection of the graph for the11actors.M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E64026118equally to the canonical one in the large system size limit.͒The formalism we present in this paper yields an alternative derivation of these results and also provides a framework for obtaining other quantities of interest,some of which we cal-culate.In Secs.III and IV we extend our formalism to the case of directed graphs ͑such as the world-wide web ͒and bipartite graphs ͑such as collaboration graphs ͒.A.Generating functionsOur approach is based on generating functions ͓42͔,the most fundamental of which,for our purposes,is the generat-ing function G 0(x )for the probability distribution of vertex degrees k .Suppose that we have a unipartite undirected graph—an acquaintance network,for example—of N verti-ces,with N large.We defineG 0͑x ͒ϭ͚k ϭ0ϱp k x k ,͑2͒where p k is the probability that a randomly chosen vertex onthe graph has degree k .The distribution p k is assumed cor-rectly normalized,so thatG 0͑1͒ϭ1.͑3͒The same will be true of all generating functions consideredhere,with a few important exceptions,which we will note at the appropriate point.Because the probability distribution is normalized and positive definite,G 0(x )is also absolutely convergent for all ͉x ͉р1,and hence has no singularities in this region.All the calculations of this paper will be confined to the region ͉x ͉р1.The function G 0(x ),and indeed any probability generat-ing function,has a number of properties that will prove use-ful in subsequent developments.Derivatives .The probability p k is given by the k th deriva-tive of G 0according top k ϭ1k !d k G 0dx kͯx ϭ0.͑4͒Thus the one function G 0(x )encapsulates all the information contained in the discrete probability distribution p k .We say that the function G 0(x )‘‘generates’’the probability distribu-tion p k .Moments .The average over the probability distribution generated by a generating function—for instance,the aver-age degree z of a vertex in the case of G 0(x )—is given byz ϭ͗k ͘ϭ͚kkp k ϭG 0Ј͑1͒.͑5͒Thus if we can calculate a generating function we can alsocalculate the mean of the probability distribution which it generates.Higher moments of the distribution can be calcu-lated from higher derivatives also.In general,we have͗k n͘ϭ͚kk n p k ϭͫͩx d dxͪnG 0͑x ͒ͬx ϭ1.͑6͒Powers .If the distribution of a property k of an object isgenerated by a given generating function,then the distribu-tion of the total of k summed over m independent realizations of the object is generated by the m th power of that generat-ing function.For example,if we choose m vertices at random from a large graph,then the distribution of the sum of the degrees of those vertices is generated by ͓G 0(x )͔m .To see why this is so,consider the simple case of just two vertices.The square ͓G 0(x )͔2of the generating function for a single vertex can be expanded as͓G 0͑x ͔͒2ϭ͚ͫkp k xkͬ2ϭ͚jkp j p k x j ϩk ϭp 0p 0x 0ϩ͑p 0p 1ϩp 1p 0͒x 1ϩ͑p 0p 2ϩp 1p 1ϩp 2p 0͒x 2ϩ͑p 0p 3ϩp 1p 2ϩp 2p 1ϩp 3p 0͒x 3ϩ•••.͑7͒It is clear that the coefficient of the power of x n in this expression is precisely the sum of all products p j p k such that j ϩk ϭn ,and hence correctly gives the probability that the sum of the degrees of the two vertices will be n .It is straight-forward to convince oneself that this property extends also to all higher powers of the generating function.All of these properties will be used in the derivations given in this paper.Another quantity that will be important to us is the distri-bution of the degree of the vertices that we arrive at by following a randomly chosen edge.Such an edge arrives at a vertex with probability proportional to the degree of that vertex,and the vertex therefore has a probability distribution of degree proportional to kp k .The correctly normalized dis-tribution is generated by͚k kp k x k ͚kkp k ϭxG 0Ј͑x ͒G 0Ј͑1͒.͑8͒If we start at a randomly chosen vertex and follow each ofthe edges at that vertex to reach the k nearest neighbors,then the vertices arrived at each have the distribution of remaining outgoing edges generated by this function,less one power of x ,to allow for the edge that we arrived along.Thus the distribution of outgoing edges is generated by the functionG 1͑x ͒ϭG 0Ј͑x ͒G 0Ј͑1͒ϭ1z G 0Ј͑x ͒,͑9͒where z is the average vertex degree,as before.The prob-ability that any of these outgoing edges connects to the origi-RANDOM GRAPHS WITH ARBITRARY DEGREE ...PHYSICAL REVIEW E 64026118nal vertex that we started at,or to any of its other immediate neighbors,goes as N Ϫ1and hence can be neglected in the limit of large N .Thus,making use of the ‘‘powers’’property of the generating function described above,the generating function for the probability distribution of the number of second neighbors of the original vertex can be written as͚kp k ͓G 1͑x ͔͒k ϭG 0...G 1͑x ͒. (10)Similarly,the distribution of third-nearest neighbors is gen-erated by G 0(G 1…G 1(x )…),and so on.The average numberz 2of second neighbors isz 2ϭͫddx G 0…G 1͑x ͒…ͬx ϭ1ϭG 0Ј͑1͒G 1Ј͑1͒ϭG 0Љ͑1͒,͑11͒where we have made use of the fact that G 1(1)ϭ1.͑Onemight be tempted to conjecture that since the average num-ber of first neighbors is G 0Ј(1),Eq.͑5͒,and the average number of second neighbors is G 0Љ(1),Eq.͑11͒,then the average number of m th neighbors should be given by the m th derivative of G 0evaluated at x ϭ1.As we show in Sec.II F,however,this conjecture is wrong.͒B.ExamplesTo make things more concrete,we immediately introduce some examples of specific graphs to illustrate how these cal-culations are carried out.(a)Poisson-distributed graphs .The simplest example of a graph of this type is one for which the distribution of degree is binomial,or Poisson in the large N limit.This distribution yields the standard random graph studied by many mathema-ticians and discussed in Sec.I.In this graph the probability p ϭz /N of the existence of an edge between any two vertices is the same for all vertices,and G 0(x )is given by G 0͑x ͒ϭ͚k ϭ0NͩN kͪp k ͑1Ϫp ͒N Ϫk x k ϭ͑1Ϫp ϩpx ͒N ϭe z (x Ϫ1),͑12͒where the last equality applies in the limit N →ϱ.It is thentrivial to show that the average degree of a vertex is indeed G 0Ј(1)ϭz and that the probability distribution of degree is given by p k ϭz k e Ϫz /k !,which is the ordinary Poisson distri-bution.Notice also that for this special case we have G 1(x )ϭG 0(x ),so that the distribution of outgoing edges at a vertex is the same,regardless of whether we arrived there by choosing a vertex at random,or by following a randomly chosen edge.This property,which is peculiar to the Poisson-distributed random graph,is the reason why the theory of random graphs of this type is especially simple.(b)Exponentially distributed graphs .Perhaps the next simplest type of graph is one with an exponential distribution of vertex degreesp k ϭ͑1Ϫe Ϫ1/͒e Ϫk /,͑13͒where is a constant.The generating function for this dis-tribution isG 0͑x ͒ϭ͑1ϪeϪ1/͚͒k ϭ0ϱeϪk /x kϭ1Ϫe Ϫ1/1Ϫxe Ϫ1/,͑14͒andG 1͑x ͒ϭͫ1Ϫe Ϫ1/1Ϫxe Ϫ1/ͬ2.͑15͒An example of a graph with an exponential degree distribu-tion is given in Sec.V A.(c)Power-law distributed graphs .The recent interest in the properties of the world-wide web and of social networks leads us to investigate the properties of graphs with a power-law distribution of vertex degrees.Such graphs have beendiscussed previously by Baraba´si and co-workers ͓22,23͔and by Aiello et al.͓13͔.In this paper,we will look at graphs with degree distribution given byp k ϭCk Ϫe Ϫk /for k у1,͑16͒where C ,,and are constants.The reason for including the exponential cutoff is twofold:first many real-world graphs appear to show this cutoff ͓14,36͔;second it makes the distribution normalizable for all ,and not just у2.The constant C is fixed by the requirement of normaliza-tion,which gives C ϭ͓Li (e Ϫ1/)͔Ϫ1and hencep k ϭk Ϫe Ϫk /Li ͑e Ϫ1/͒for k у1,͑17͒where Li n (x )is the n th polylogarithm of x .͓For those unfa-miliar with this function,its salient features for our purposes are that it is zero at x ϭ0and,real,finite,and monotonically increasing in the range 0рx Ͻ1,for all n .It also decreases with increasing n ,and has a pole at x ϭ1for n р1only,although it has a valid analytic continuation below n ϭ1which takes the value (n )at x ϭ1.͔Substituting Eq.͑17͒into Eq.͑2͒,we find that the gener-ating function for graphs with this degree distribution isG 0͑x ͒ϭLi ͑xe Ϫ1/͒Li ͑e Ϫ1/͒.͑18͒In the limit →ϱ—the case considered in Refs.͓13͔and ͓23͔—this simplifies toG 0͑x ͒ϭLi ͑x ͒͑͒,͑19͒where ()is the Riemann function.The function G 1(x )is given byG 1͑x ͒ϭLi Ϫ1͑xe Ϫ1/͒x Li Ϫ1͑e Ϫ1/͒.͑20͒M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E 64026118Thus,for example,the average number of neighbors of a randomly chosen vertex iszϭG0Ј͑1͒ϭLiϪ1͑eϪ1/͒Li͑eϪ1/͒,͑21͒and the average number of second neighbors isz2ϭG0Љ͑1͒ϭLiϪ2͑eϪ1/͒ϪLiϪ1͑eϪ1/͒Li͑eϪ1/͒.͑22͒(d)Graphs with arbitrary specified degree distribution.In some cases we wish to model specific real-world graphs that have known degree distributions—known because we can measure them directly.A number of the graphs described in the Introduction fall into this category.For these graphs,we know the exact numbers n k of vertices having degree k,and hence we can write down the exact generating function for that probability distribution in the form of afinite polynomialG0͑x͒ϭ͚kn k x k͚kn k,͑23͒where the sum in the denominator ensures that the generating function is properly normalized.As an example,suppose that in a community of1000people,each person knows between zero andfive of the others,the exact numbers of people in each category being,from zero tofive:͕86,150,363,238,109,54͖.This distribution will then be gen-erated by the polynomialG0͑x͒ϭ86ϩ150xϩ363x2ϩ238x3ϩ109x4ϩ54x51000.͑24͒ponent sizesWe are now in a position to calculate some properties of interest for our graphs.First let us consider the distribution of the sizes of connected components in the graph.Let H1(x)be the generating function for the distribution of the sizes of components that are reached by choosing a random edge and following it to one of its ends.We explicitly ex-clude from H1(x)the giant component,if there is one;the giant component is dealt with separately below.Thus,except when we are precisely at the phase transition where the giant component appears,typical component sizes arefinite,and the chances of a component containing a closed loop of edges goes as NϪ1,which is negligible in the limit of large N.This means that the distribution of components generated by H1(x)can be represented graphically as in Fig.3;each component is treelike in structure,consisting of the single site we reach by following our initial edge,plus any number ͑including zero͒of other treelike clusters,with the same size distribution,joined to it by single edges.If we denote by q k the probability that the initial site has k edges coming out of it other than the edge we came in along,then,making use of the‘‘powers’’property of Sec.II A,H1(x)must satisfy a self-consistency condition of the formH1͑x͒ϭxq0ϩxq1H1͑x͒ϩxq2͓H1͑x͔͒2ϩ•••.͑25͒However,q k is nothing other than the coefficient of x k in the generating function G1(x),Eq.͑9͒,and hence Eq.͑25͒can also be writtenH1͑x͒ϭxG1...H1͑x͒. (26)If we start at a randomly chosen vertex,then we have one such component at the end of each edge leaving that vertex, and hence the generating function for the size of the whole component isH0͑x͒ϭxG0...H1͑x͒. (27)In principle,therefore,given the functions G0(x)and G1(x),we can solve Eq.͑26͒for H1(x)and substitute into Eq.͑27͒to get H0(x).Then we canfind the probability that a randomly chosen vertex belongs to a component of size s by taking the s th derivative of H0.In practice,unfortunately, this is usually impossible;Equation͑26͒is a complicated and frequently transcendental equation,which rarely has a known solution.On the other hand,we note that the coeffi-cient of x s in the Taylor expansion of H1(x)͑and therefore also the s th derivative͒are given exactly by only sϩ1itera-tions of Eq.͑27͒,starting with H1ϭ1,so that the distribution generated by H0(x)can be calculated exactly tofinite order infinite time.With current symbolic manipulation programs, it is quite possible to evaluate thefirst one hundred or soderivatives in this way.Failing this,an approximate solution can be found by numerical iteration and the distribution of cluster sizes calculated from Eq.͑4͒by numerical differen-tiation.Since direct evaluation of numerical derivatives is prone to machine-precision problems,we recommend evalu-ating the derivatives by numerical integration of the Cauchy formula,giving the probability distribution P s of cluster sizes thus:P sϭ1s!d s H0dz sͯzϭ0ϭ12iͶH0͑z͒z sϩ1dz.͑28͒FIG.3.Schematic representation of the sum rule for the con-nected component of vertices reached by following a randomly cho-sen edge.The probability of each such component͑left-hand side͒can be represented as the sum of the probabilities͑right-hand side͒of having only a single vertex,having a single vertex connected to one other component,or two other components,and so forth.The entire sum can be expressed in closed form as Eq.͑26͒.RANDOM GRAPHS WITH ARBITRARY DEGREE...PHYSICAL REVIEW E64026118The best numerical precision is obtained by using the largestpossible contour,subject to the condition that it encloses nopoles of the generating function.The largest contour forwhich this condition is satisfied in general is the unit circle ͉z͉ϭ1͑see Sec.II A͒,and we recommend using this contour for Eq.͑28͒.It is possible tofind thefirst thousand deriva-tives of a function without difficulty using this method͓43͔.D.The mean component size,the phase transition,and thegiant componentAlthough it is not usually possible tofind a closed-formexpression for the complete distribution of cluster sizes on agraph,we canfind closed-form expressions for the averageproperties of clusters from Eqs.͑26͒and͑27͒.For example,the average size of the component to which a randomly cho-sen vertex belongs,for the case where there is no giant com-ponent in the graph,is given in the normal fashion by͗s͘ϭH0Ј͑1͒ϭ1ϩG0Ј͑1͒H1Ј͑1͒.͑29͒From Eq.͑26͒we haveH1Ј͑1͒ϭ1ϩG1Ј͑1͒H1Ј͑1͒,͑30͒and hence͗s͘ϭ1ϩG0Ј͑1͒1ϪG1Ј͑1͒ϭ1ϩz12z1Ϫz2,͑31͒where z1ϭz is the average number of neighbors of a vertex and z2is the average number of second neighbors.We see that this expression diverges whenG1Ј͑1͒ϭ1.͑32͒This point marks the phase transition at which a giant com-ponentfirst appears.Substituting Eqs.͑2͒and͑9͒into Eq.͑32͒,we can also write the condition for the phase transition as͚kk͑kϪ2͒p kϭ0.͑33͒Indeed,since this sum increases monotonically as edges are added to the graph,it follows that the giant component exists if and only if this sum is positive.This result has been de-rived by different means by Molloy and Reed͓40͔.An equivalent and intuitively reasonable statement,which can also be derived from Eq.͑31͒,is that the giant component exists if and only if z2Ͼz1.Our generating function formalism still works when there is a giant component in the graph,but,by definition,H0(x) then generates the probability distribution of the sizes of components excluding the giant component.This means that H0(1)is no longer unity,as it is for the other generating functions considered so far,but instead takes the value 1ϪS,where S is the fraction of the graph occupied by the giant component.We can use this to calculate the size of the giant component from Eqs.͑26͒and͑27͒thus:Sϭ1ϪG0͑u͒,͑34͒where uϵH1(1)is the smallest non-negative real solution ofuϭG1͑u͒.͑35͒This result has been derived in a different but equivalent form by Molloy and Reed͓41͔,using different methods.The correct general expression for the average component size,excluding the͑formally infinite͒giant component,if there is one,is͗s͘ϭH0Ј͑1͒H0͑1͒ϭ1H0͑1͒ͫG0…H1͑1͒…ϩG0Ј…H1͑1͒…G1…H1͑1͒…1ϪG1Ј…H1͑1͒…ͬϭ1ϩzu2͓1ϪS͔͓1ϪG1Ј͑u͔͒,͑36͒which is equivalent to Eq.͑31͒when there is no giant com-ponent(Sϭ0,uϭ1).For example,in the ordinary random graph with Poisson degree distribution,we have G0(x)ϭG1(x)ϭe z(xϪ1)͓Eq.͑12͔͒,and hence wefind simply that1ϪSϭu is a solution of uϭG0(u),or equivalently thatSϭ1ϪeϪzS.͑37͒The average component size is given by͗s͘ϭ11ϪzϩzS.͑38͒These are both well-known results͓1͔.For graphs with purely power-law distributions͓Eq.͑17͒with→ϱ͔,S is given by Eq.͑34͒with u the smallest non-negative real solution ofuϭLiϪ1͑u͒u͑Ϫ1͒.͑39͒For allр2this gives uϭ0,and hence Sϭ1,implying that a randomly chosen vertex belongs to the giant component with probability tending to1as→ϱ.For graphs withϾ2,the probability of belonging to the giant component is strictly less than1,even for infinite.In other words,the giant component essentiallyfills the entire graph forр2, but not forϾ2.These results have been derived by differ-ent means by Aiello et al.͓13͔.E.Asymptotic form of the cluster size distributionA variety of results are known about the asymptotic prop-erties of the coefficients of generating functions,some of which can usefully be applied to the distribution of cluster sizes P s generated by H0(x).Close to the phase transition, we expect the tail of the distribution P s to behave asP sϳsϪ␣eϪs/s*,͑40͒M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E64026118。