Random-Graphs-with-Arbitrary-Degree-Distributions-and-Their-Applications

格式：pdf
大小：211.62 KB
文档页数：17

下载文档原格式

英文论文写作中一些可能用到的词汇

英⽂论⽂写作中⼀些可能⽤到的词汇英⽂论⽂写作过程中总是被⾃⼰可怜的词汇量击败, 所以我打算在这⾥记录⼀些在阅读论⽂过程中见到的⼀些⾃⼰不曾见过的词句或⽤法。

这些词句查词典都很容易查到，但是只有带⼊论⽂原⽂中才能体会内涵。

毕竟原⽂和译⽂中间总是存在⼀条看不见的思想鸿沟。

形容词1. vanilla: adj. 普通的, 寻常的, 毫⽆特⾊的. ordinary; not special in any way.2. crucial: adj. ⾄关重要的, 关键性的.3. parsimonious：adj. 悭吝的, 吝啬的, ⼩⽓的.e.g. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity.4. diverse: adj. 不同的, 相异的, 多种多样的, 形形⾊⾊的.5. intriguing: adj. ⾮常有趣的, 引⼈⼊胜的; 神秘的. *intrigue: v. 激起…的兴趣, 引发…的好奇⼼; 秘密策划(加害他⼈), 密谋.e.g. The results of this paper carry several intriguing implications.6. intimate: adj. 亲密的; 密切的. v.透露; (间接)表⽰, 暗⽰.e.g. The above problems are intimately linked to machine learning on graphs.7. akin: adj. 类似的, 同族的, 相似的.e.g. Akin to GNN, in LOCAL a graph plays a double role: ...8. abundant: adj. ⼤量的, 丰盛的, 充裕的.9. prone: adj. 有做(坏事)的倾向; 易于遭受…的; 俯卧的.e.g. It is thus prone to oversmoothing when convolutions are applied repeatedly.10.concrete: adj. 混凝⼟制的; 确实的, 具体的(⽽⾮想象或猜测的); 有形的; 实在的.e.g. ... as a concrete example ...e.g. More concretely, HGCN applies the Euclidean non-linear activation in...11. plausible: adj. 有道理的; 可信的; 巧⾔令⾊的, 花⾔巧语的.e.g. ... this interpretation may be a plausible explanation of the success of the recently introduced methods.12. ubiquitous: adj. 似乎⽆所不在的;⼗分普遍的.e.g. While these higher-order interac- tions are ubiquitous, an evaluation of the basic properties and organizational principles in such systems is missing.13. disparate: adj. 由不同的⼈(或事物)组成的;迥然不同的;⽆法⽐较的.e.g. These seemingly disparate types of data have something in common: ...14. profound: adj. 巨⼤的; 深切的, 深远的; 知识渊博的; 理解深刻的;深邃的, 艰深的; ⽞奥的.e.g. This has profound consequences for network models of relational data — a cornerstone in the interdisciplinary study of complex systems.15. blurry: adj. 模糊不清的.e.g. When applying these estimators to solve (2), the line between the critic and the encoders $g_1, g_2$ can be blurry.16. amenable: adj. 顺从的; 顺服的; 可⽤某种⽅式处理的.e.g. Ou et al. utilize sparse generalized SVD to generate a graph embedding, HOPE, from a similarity matrix amenableto de- composition into two sparse proximity matrices.17. elaborate: adj. 复杂的;详尽的;精⼼制作的 v.详尽阐述;详细描述;详细制订;精⼼制作e.g. Topic Modeling for Graphs also requires elaborate effort, as graphs are relational while documents are indepen- dent samples.18. pivotal: adj. 关键性的；核⼼的e.g. To ensure the stabilities of complex systems is of pivotal significance toward reliable and better service providing.19. eminent: adj. 卓越的，著名的，显赫的;⾮凡的;杰出的e.g. To circumvent those defects, theoretical studies eminently represented by percolation theories appeared.20. indispensable: adj. 不可或缺的;必不可少的 n. 不可缺少的⼈或物e.g. However, little attention is paid to multipartite networks, which are an indispensable part of complex networks.21. post-hoc: adj. 事后的e.g. Post-hoc explainability typically considers the question “Why the GNN predictor made certain prediction?”.22. prevalent: adj. 流⾏的;盛⾏的;普遍存在的e.g. A prevalent solution is building an explainer model to conduct feature attribution23. salient: adj. 最重要的;显著的;突出的. n. 凸⾓;[建]突出部;<军>进攻或防卫阵地的突出部分e.g. It decomposes the prediction into the contributions of the input features, which redistributes the probability of features according to their importance and sample the salient features as an explanatory subgraph.24. rigorous: adj. 严格缜密的;严格的;谨慎的;细致的;彻底的;严厉的e.g. To inspect the OOD effect rigorously, we take a causal look at the evaluation process with a Structural Causal Model.25. substantial: adj. ⼤量的;价值巨⼤的;重⼤的;⼤⽽坚固的;结实的;牢固的. substantially: adv. ⾮常;⼤⼤地;基本上;⼤体上;总的来说26. cogent: adj. 有说服⼒的;令⼈信服的e.g. The explanatory subgraph $G_s$ emphasizes tokens like “weak” and relations like “n’t→funny”, which is cogent according to human knowledge.27. succinct: adj. 简练的;简洁的 succinctly: adv. 简⽽⾔之，简明扼要地28. concrete: adj. 混凝⼟制的;确实的，具体的(⽽⾮想象或猜测的);有形的;实在的 concretely: adv. 具体地;具体;具体的;有形地29. predominant：adj. 主要的;主导的;显著的;明显的;盛⾏的;占优势的动词1. mitigate: v. 减轻, 缓和. (反 enforce)e.g. In this work, we focus on mitigating this problem for a certain class of symbolic data.2. corroborate: v. [VN] [often passive] (formal) 证实, 确证.e.g. This is corroborated by our experiments on real-world graph.3. endeavor: n./v. 努⼒, 尽⼒, 企图, 试图.e.g. It encourages us to continue the endeavor in applying principles mathematics and theory in successful deployment of deep learning.4. augment: v. 增加, 提⾼, 扩⼤. n. 增加, 补充物.e.g. We also augment the graph with geographic information (longitude, latitude and altitude), and GDP of the country where the airport belongs to.5. constitute: v. (被认为或看做)是, 被算作; 组成, 构成; (合法或正式地)成⽴, 设⽴.6. abide: v. 接受, 遵照(规则, 决定, 劝告); 逗留, 停留.e.g. Training a graph classifier entails identifying what constitutes a class, i.e., finding properties shared by graphs in one class but not the other, and then deciding whether new graphs abide to said learned properties.7. entail: v. 牵涉; 需要; 使必要. to involve sth that cannot be avoided.e.g. Due to the recursive definition of the Chebyshev polynomials, the computation of the filter $g_α(\Delta)f$ entails applying the Laplacian $r$ times, resulting cal operator affecting only 1-hop neighbors of a vertex and in $O(rn)$ operations.8. encompass: v. 包含, 包括, 涉及(⼤量事物); 包围, 围绕, 围住.e.g. This model is chosen as it is sufficiently general to encompass several state-of-the-art networks.e.g. The k-cycle detection problem entails determining if G contains a k-cycle.9. reveal: v. 揭⽰, 显⽰, 透露, 显出, 露出, 展⽰.10. bestow: v. 将(…)给予, 授予, 献给.e.g. Aiming to bestow GCNs with theoretical guarantees, one promising research direction is to study graph scattering transforms (GSTs).11. alleviate: v. 减轻, 缓和, 缓解.12. investigate: v. 侦查(某事), 调查(某⼈), 研究, 调查.e.g. The sensitivity of pGST to random and localized noise is also investigated.13. fuse: v. (使)融合, 熔接, 结合; (使)熔化, (使保险丝熔断⽽)停⽌⼯作.e.g. We then fuse the topological embeddings with the initial node features into the initial query representations using a query network$f_q$ implemented as a two-layer feed-forward neural network.14. magnify: v. 放⼤, 扩⼤; 增强; 夸⼤(重要性或严重性); 夸张.e.g. ..., adding more layers also leads to more parameters which magnify the potential of overfitting.15. circumvent: v. 设法回避, 规避; 绕过, 绕⾏.e.g. To circumvent the issue and fulfill both goals simultaneously, we can add a negative term...16. excel: v. 擅长, 善于; 突出; 胜过平时.e.g. Nevertheless, these methods have been repeatedly shown to excel in practice.17. exploit: v. 利⽤(…为⾃⼰谋利); 剥削, 压榨; 运⽤, 利⽤; 发挥.e.g. In time series and high-dimensional modeling, approaches that use next step prediction exploit the local smoothness of the signal.18. regulate: v. (⽤规则条例)约束, 控制, 管理; 调节, 控制(速度、压⼒、温度等).e.g. ... where $b >0$ is a parameter regulating the probability of this event.19. necessitate: v. 使成为必要.e.g. Combinatorial models reproduce many-body interactions, which appear in many systems and necessitate higher-order models that capture information beyond pairwise interactions.20. portray:描绘, 描画, 描写; 将…描写成; 给⼈以某种印象; 表现; 扮演(某⾓⾊).e.g. Considering pairwise interactions, a standard network model would portray the link topology of the underlying system as shown in Fig. 2b.21. warrant: v. 使有必要; 使正当; 使恰当. n. 执⾏令; 授权令; (接受款项、服务等的)凭单, 许可证; (做某事的)正当理由, 依据.e.g. Besides statistical methods that can be used to detect correlations that warrant higher-order models, ... (除了可以⽤来检测⽀持⾼阶模型的相关性的统计⽅法外, ...)22. justify: v. 证明…正确(或正当、有理); 对…作出解释; 为…辩解(或辩护); 调整使全⾏排满; 使每⾏排齐.e.g. ..., they also come with the assumption of transitive, Markovian paths, which is not justified in many real systems.23. hinder:v. 阻碍; 妨碍; 阻挡. (反 foster: v. 促进; 助长; 培养; ⿎励; 代养, 抚育, 照料(他⼈⼦⼥⼀段时间))e.g. The eigenvalues and eigenvectors of these matrix operators capture how the topology of a system influences the efficiency of diffusion and propagation processes, whether it enforces or mitigates the stability of dynamical systems, or if it hinders or fosters collective dynamics.24. instantiate:v. 例⽰；⽤具体例⼦说明.e.g. To learn the representation we instantiate (2) and split each input MNIST image into two parts ...25. favor:v. 赞同;喜爱, 偏爱; 有利于, 便于. n. 喜爱, 宠爱, 好感, 赞同; 偏袒, 偏爱; 善⾏, 恩惠.26. attenuate: v. 使减弱; 使降低效⼒.e.g. It therefore seems that the bounds we consider favor hard-to-invert encoders, which heavily attenuate part of the noise, over well conditioned encoders.27. elucidate:v. 阐明; 解释; 说明.e.g. Secondly, it elucidates the importance of appropriately choosing the negative samples, which is indeed a critical component in deep metric learning based on triplet losses.28. violate: v. 违反, 违犯, 违背(法律、协议等); 侵犯(隐私等); 使⼈不得安宁; 搅扰; 亵渎, 污损(神圣之地).e.g. Negative samples are obtained by patches from different images as well as patches from the same image, violating the independence assumption.29. compel:v. 强迫, 迫使; 使必须; 引起(反应).30. gauge: v. 判定, 判断(尤指⼈的感情或态度); (⽤仪器)测量, 估计, 估算. n. 测量仪器(或仪表);计量器;宽度;厚度;(枪管的)⼝径e.g. Yet this hyperparameter-tuned approach raises a cubic worst-case space complexity and compels the user to traverse several feature sets and gauge the one that attains the best performance in the downstream task.31. depict: v. 描绘, 描画; 描写, 描述; 刻画.e.g. As they depict different aspects of a node, it would take elaborate designs of graph convolutions such that each set of features would act as a complement to the other.32. sketch: n. 素描;速写;草图;幽默短剧;⼩品;简报;概述 v. 画素描;画速写;概述;简述e.g. Next we sketch how to apply these insights to learning topic models.33. underscore：v. 在…下⾯划线;强调;着重说明 n.下划线e.g. Moreover, the walk-topic distributions generated by Graph Anchor LDA are indeed sharper than those by ordinary LDA, underscoring the need for selecting anchors.34. disclose: v. 揭露;透露;泄露;使显露;使暴露e.g. Another drawback lies in their unexplainable nature, i.e., they cannot disclose the sciences beneath network dynamics.35. coincide: v. 同时发⽣;相同;相符;极为类似;相接;相交;同位;位置重合;重叠e.g. The simulation results coincide quite well with the theoretical results.36. inspect: v. 检查;查看;审视;视察 to look closely at sth/sb, especially to check that everything is as it should be名词1. capacity: n. 容量, 容积, 容纳能⼒; 领悟(或理解、办事)能⼒; 职位, 职责.e.g. This paper studies theoretically the computational capacity limits of graph neural networks (GNN) falling within the message-passing framework of Gilmer et al. (2017).2. implication: n. 可能的影响(或作⽤、结果); 含意, 暗指; (被)牵连, 牵涉.e.g. Section 4 analyses the implications of restricting the depth $d$ and width $w$ of GNN that do not use a readout function.3. trade-off:(在需要⽽⼜相互对⽴的两者间的)权衡, 协调.e.g. This reveals a direct trade-off between the depth and width of a graph neural network.4. cornerstone:n. 基⽯; 最重要部分; 基础; 柱⽯.5. umbrella: n. 伞; 综合体; 总体, 整体; 保护, 庇护(体系).e.g. Community detection is an umbrella term for a large number of algorithms that group nodes into distinct modules to simplify and highlight essential structures in the network topology.6. folklore:n. 民间传统, 民俗; 民间传说.e.g. It is folklore knowledge that maximizing MI does not necessarily lead to useful representations.7. impediment:n. 妨碍,阻碍,障碍; ⼝吃.e.g. While a recent approach overcomes this impediment, it results in poor quality in prediction tasks due to its linear nature.8. obstacle:n. 障碍;阻碍; 绊脚⽯; 障碍物; 障碍栅栏.e.g. However, several major obstacles stand in our path towards leveraging topic modeling of structural patterns to enhance GCNs.9. vicinity:n. 周围地区; 邻近地区; 附近.e.g. The traits with which they engage are those that are performed in their vicinity.10. demerit: n. 过失,缺点,短处; (学校给学⽣记的)过失分e.g. However, their principal demerit is that their implementations are time-consuming when the studied network is large in size. Another介/副/连词1. notwithstanding:prep. 虽然;尽管 adv. 尽管如此.e.g. Notwithstanding this fundamental problem, the negative sampling strategy is often treated as a design choice.2. albeit: conj. 尽管;虽然e.g. Such methods rely on an implicit, albeit rigid, notion of node neighborhood; yet this one-size-fits-all approach cannot grapple with the diversity of real-world networks and applications.3. Hitherto:adv. 迄今;直到某时e.g. Hitherto, tremendous endeavors have been made by researchers to gauge the robustness of complex networks in face of perturbations.短语1.in a nutshell: 概括地说, 简⾔之, ⼀⾔以蔽之.e.g. In a nutshell, GNN are shown to be universal if four strong conditions are met: ...2. counter-intuitively: 反直觉地.3. on-the-fly:动态的(地), 运⾏中的(地).4. shed light on/into:揭⽰, 揭露; 阐明; 解释; 将…弄明⽩; 照亮.e.g. These contemporary works shed light into the stability and generalization capabilities of GCNs.e.g. Discovering roles and communities in networks can shed light on numerous graph mining tasks such as ...5. boil down to: 重点是; 将…归结为.e.g. These aforementioned works usually boil down to a general classification task, where the model is learnt on a training set and selected by checking a validation set.6. for the sake of:为了.e.g. The local structures anchored around each node as well as the attributes of nodes therein are jointly encoded with graph convolution for the sake of high-level feature extraction.7. dates back to:追溯到.e.g. The usual problem setup dates back at least to Becker and Hinton (1992).8. carry out:实施, 执⾏, 实⾏.e.g. We carry out extensive ablation studies and sensi- tivity analysis to show the effectiveness of the proposed functional time encoding and TGAT-layer.9. lay beyond the reach of:...能⼒达不到e.g. They provide us with information on higher-order dependencies between the components of a system, which lay beyond the reach of models that exclusively capture pairwise links.10. account for: ( 数量或⽐例上)占; 导致, 解释(某种事实或情况); 解释, 说明(某事); (某⼈)对(⾏动、政策等)负有责任; 将(钱款)列⼊(预算).e.g. Multilayer models account for the fact that many real complex systems exhibit multiple types of interactions.11. along with: 除某物以外; 随同…⼀起, 跟…⼀起.e.g. Along with giving us the ability to reason about topological features including community structures or node centralities, network science enables us to understand how the topology of a system influences dynamical processes, and thus its function.12. dates back to:可追溯到.e.g. The usual problem setup dates back at least to Becker and Hinton (1992) and can conceptually be described as follows: ...13. to this end:为此⽬的;为此计;为了达到这个⽬标.e.g. To this end, we consider a simple setup of learning a representation of the top half of MNIST handwritten digit images.14. Unless stated otherwise:除⾮另有说明.e.g. Unless stated otherwise, we use a bilinear critic $f(x, y) = x^TWy$, set the batch size to $128$ and the learning rate to $10^{−4}$.15. As a reference point:作为参照.e.g. As a reference point, the linear classification accuracy from pixels drops to about 84% due to the added noise.16. through the lens of:透过镜头. (以...视⾓)e.g. There are (at least) two immediate benefits of viewing recent representation learning methods based on MI estimators through the lens of metric learning.17. in accordance with:符合；依照；和…⼀致.e.g. The metric learning view seems hence in better accordance with the observations from Section 3.2 than the MI view.It can be shown that the anchors selected by our Graph Anchor LDA are not only indicative of “topics” but are also in accordance with the actual graph structures.18. be akin to:近似, 类似, 类似于.e.g. Thus, our learning model is akin to complex contagion dynamics.19. to name a few:仅举⼏例;举⼏个来说.e.g. Multitasking, multidisciplinary work and multi-authored works, to name a few, are ingrained in the fabric of science culture and certainly multi-multi is expected in order to succeed and move up the scientific ranks.20. a handful of:⼀把;⼀⼩撮;少数e.g. A handful of empirical work has investigated the robustness of complex networks at the community level.21. wreak havoc: 破坏;肆虐;严重破坏;造成破坏;浩劫e.g. Failures on one network could elicit failures on its coupled networks, i.e., networks with which the focal network interacts, and eventually those failures would wreak havoc on the entire network.22. apart from: 除了e.g. We further posit that apart from node $a$ node $b$ has $k$ neighboring nodes.。

基于随机投影与加权稀疏表示残差的光照鲁棒人脸识别方法

基于随机投影与加权稀疏表示残差的光照鲁棒人脸识别方法李燕;章玥【期刊名称】《计算机工程与科学》【年(卷),期】2018(040)011【摘要】针对人脸识别中的光照变化问题,利用随机投影对传统稀疏表示分类器进行改进,提出一种基于随机投影与加权稀疏表示残差的光照鲁棒人脸识别方法.通过对人脸图像进行光照规范化处理,尽量消除人脸图像上的恶劣光照,取得经光照校正的人脸样本后进行多次随机空间投影,进一步丰富样本的光照不变特征,以减小光照变化对人脸识别带来的影响.在此基础上,对利用单一残差分类的传统稀疏表示分类方法进行改进,样本经过多次随机投影和稀疏表示会产生多个样本特征和重构残差,利用样本特征的能量来确定各个重构残差的融合权值,最终得到一种稳定性和可靠性更强的加权残差.在Yale B和CMU PIE两个光照变化较大的人脸库上的实验结果表明,改进的方法具有较强的光照鲁棒性.与传统稀疏表示方法相比,本文提出的方法在Yale B人脸库上两组实验的平均识别率分别提高了25.76％和46.39％,在CMU PIE上的平均识别率提高了10％左右.【总页数】8页(P2015-2022)【作者】李燕;章玥【作者单位】华东师范大学教育部软硬件协同设计技术与应用工程研究中心,上海200062;华东师范大学教育部软硬件协同设计技术与应用工程研究中心,上海200062【正文语种】中文【中图分类】TP391.4【相关文献】1.组合范数正则化稀疏编码和自适应加权残差的鲁棒跟踪 [J], 孔军;成静;蒋敏;柳晨华;顾晓峰2.基于自步学习的加权稀疏表示人脸识别方法 [J], 王学军;王文剑;曹飞龙3.基于加权分块稀疏表示的光照鲁棒性人脸识别 [J], 范守科;朱明4.改进的基于残差加权的稀疏表示人脸识别 [J], 高志荣;熊承义;笪邦友5.基于多方向Gabor特征图稀疏表示的鲁棒人脸识别方法 [J], 徐望明;张培;伍世虔因版权原因，仅展示原文概要，查看原文内容请购买。

A new approach to the planted clique problem Alan Frieze Department of Mathematical Scienc

i,j,k
Ai,j,k xi yj zk .
x, y, z will denote vectors of length 1 throughout. We will reserve u for the scaled characteristic vector of P deﬁned earlier. The following Theorem (which is the Main Theorem of the paper) will imply (see Corollary 2 below) that if at least one of x, y, z is orthogonal to u, then we have
2
The cubic form and the main result
We deﬁne the 3-dimensional array : if i, j, k are distinct and G contains 1 or 3 edges of the triangle i, j, k . 1 −1 if i, j, k are distinct and G contains 0 or 2 edges of the triangle i, j, k . Ai,j,k = 0 if i, j, k are not distinct. We assume that p = C1 n1/3 (log n)4 . ere C1 , C2 , . . . , are unspeciﬁed positive absolute constants. For vectors x, y, z , we deﬁne A(x, y, z ) =
∗ †
Supported in part by NSF grant ccr0200945 Supported in part by an NSF grant

Cryptographic Primitives

with each the degree
vertex v some of freedom of
xed value for #I(v) and vertex v. In the following
we all
lcoaglalrditfh(vm)s:=halvoeg#thZe#bIa(svis)
e#dZgesanaddjlaocgenmt etaontshleogve#rZt.iceFsoirnaUs.ubset U V we let E(U) denote the set of
Cryptographic primitives for encryption, hashing and pseudo-random generation are judged according to e ciency and security. It is a major problem to prove security in a realistic model, in particular since reasonable complexity lower bounds seem to be impossible to prove for the model of boolean networks. By black box cryptanalysis we can prove interesting complexity lower bounds for a family of attacks that comprises exhaustive search and the birthday paradox. Black box cryptanalysis complements di erential and linear cryptanalysis 5, 7] which analyse particular boxes from the point of view of their non-linearity.

A phase transition for the diameter of the configuration model Remco van der Hofstad

A phase transition for the diameter of the conﬁguration modelRemco van der Hofstad∗Gerard Hooghiemstra†and Dmitri Znamenski‡August31,2007AbstractIn this paper,we study the conﬁguration model(CM)with i.i.d.degrees.We establisha phase transition for the diameter when the power-law exponentτof the degrees satisﬁesτ∈(2,3).Indeed,we show that forτ>2and when vertices with degree2are present withpositive probability,the diameter of the random graph is,with high probability,bounded frombelow by a constant times the logarithm of the size of the graph.On the other hand,assumingthat all degrees are at least3or more,we show that,forτ∈(2,3),the diameter of the graphis,with high probability,bounded from above by a constant times the log log of the size of thegraph.1IntroductionRandom graph models for complex networks have received a tremendous amount of attention in the past decade.See[1,22,26]for reviews on complex networks and[2]for a more expository account.Measurements have shown that many real networks share two fundamental properties. Theﬁrst is the fact that typical distances between vertices are small,which is called the‘small world’phenomenon(see[27]).For example,in the Internet,IP-packets cannot use more than a threshold of physical links,and if the distances in terms of the physical links would be large,e-mail service would simply break down.Thus,the graph of the Internet has evolved in such a way that typical distances are relatively small,even though the Internet is rather large.The second and maybe more surprising property of many networks is that the number of vertices with degree k falls oﬀas an inverse power of k.This is called a‘power law degree sequence’,and resulting graphs often go under the name‘scale-free graphs’(see[15]for a discussion where power laws occur in the Internet).The observation that many real networks have the above two properties has incited a burst of activity in network modelling using random graphs.These models can,roughly speaking,be divided into two distinct classes of models:‘static’models and’dynamic’models.In static models, we model with a graph of a given size a snap-shot of a real network.A typical example of this kind of model is the conﬁguration model(CM)which we describe below.A related static model, which can be seen as an inhomogeneous version of the Erd˝o s-R´e nyi random graph,is treated in great generality in[4].Typical examples of the‘dynamical’models,are the so-called preferential attachment models(PAM’s),where added vertices and edges are more likely to be attached to vertices that already have large degrees.PAM’s often focus on the growth of the network as a way to explain the power law degree sequences.∗Department of Mathematics and Computer Science,Eindhoven University of Technology,P.O.Box513,5600 MB Eindhoven,The Netherlands.E-mail:rhofstad@win.tue.nl†Delft University of Technology,Electrical Engineering,Mathematics and Computer Science,P.O.Box5031,2600 GA Delft,The Netherlands.E-mail:G.Hooghiemstra@ewi.tudelft.nl‡EURANDOM,P.O.Box513,5600MB Eindhoven,The Netherlands.E-mail:znamenski@eurandom.nlPhysicists have predicted that distances in PAM’s behave similarly to distances in the CM with similar degrees.Distances in the CM have attracted considerable attention(see e.g.,[14,16,17,18]), but distances in PAM’s far less(see[5,19]),which makes it hard to verify this prediction.Together with[19],the current paper takes aﬁrst step towards a rigorous veriﬁcation of this conjecture.At the end of this introduction we will return to this observation,but let usﬁrst introduce the CM and present our diameter results.1.1The conﬁguration modelThe CM is deﬁned as follows.Fix an integer N.Consider an i.i.d.sequence of random variables D1,D2,...,DN.We will construct an undirected graph with N vertices where vertex j has degreeD j.We will assume that LN =Nj=1D j is even.If LNis odd,then we will increase DNby1.Thissingle change will make hardly any diﬀerence in what follows,and we will ignore this eﬀect.We will later specify the distribution of D1.To construct the graph,we have N separate vertices and incident to vertex j,we have D j stubs or half-edges.The stubs need to be paired to construct the graph.We number the stubs in a givenorder from1to LN .We start by pairing at random theﬁrst stub with one of the LN−1remainingstubs.Once paired,two stubs form a single edge of the graph.Hence,a stub can be seen as the left-or the right-half of an edge.We continue the procedure of randomly choosing and pairing the stubs until all stubs are connected.Unfortunately,vertices having self-loops,as well as multiple edges between vertices,may occur,so that the CM is a multigraph.However,self-loops are scarce when N→∞,as shown e.g.in[7].The above model is a variant of the conﬁguration model[3],which,given a degree sequence,is the random graph with that given degree sequence.The degree sequence of a graph is the vector of which the k th coordinate equals the fraction of vertices with degree k.In our model,by the law of large numbers,the degree sequence is close to the distribution of the nodal degree D of which D1,...,DNare i.i.d.copies.The probability mass function and the distribution function of the nodal degree law are denoted byP(D=k)=f k,k=1,2,...,and F(x)= xk=1f k,(1.1)where x is the largest integer smaller than or equal to x.We pay special attention to distributions of the form1−F(x)=x1−τL(x),(1.2) whereτ>2and L is slowly varying at inﬁnity.This means that the random variables D j obey a power law,and the factor L is meant to generalize the model.We denote the expectation of D byµ,i.e.,µ=∞k=1kf k.(1.3)1.2The diameter in the conﬁguration modelIn this section we present the results on the bounds on the diameter.We use the abbreviation whp for a statement that occurs with probability tending to1if the number of vertices of the graph N tends to∞.Theorem1.1(Lower bound on diameter)Forτ>2,assuming that f1+f2>0and f1<1, there exists a positive constantαsuch that whp the diameter of the conﬁguration model is bounded below byαlog N.A more precise result on the diameter in the CM is presented in[16],where it is proved that under rather general assumptions on the degree sequence of the CM,the diameter of the CM divided by log N converges to a constant.This result is also valid for related models,such as the Erd˝o s-R´e nyi random graph,but the proof is quite diﬃcult.While Theorem1.1is substantially weaker,the fact that a positive constant times log N appears is most interesting,as we will discuss now in more detail.Indeed,the result in Theorem1.1is most interesting in the case whenτ∈(2,3).By[18, Theorem1.2],the typical distance forτ∈(2,3)is proportional to log log N,whereas we show here that the diameter is bounded below by a positive constant times log N when f1+f2>0and f1<1. Therefore,we see that the average distance and the diameter are of a diﬀerent order of magnitude. The pairs of vertices where the distance is of the order log N are thus scarce.The proof of Theorem 1.1reveals that these pairs are along long lines of vertices with degree2that are connected to each other.Also in the proof of[16],one of the main diﬃculties is the identiﬁcation of the precise length of these long thin lines.Our second main result states that whenτ∈(2,3),the above assumption that f1+f2>0is necessary and suﬃcient for log N lower bounds on the diameter.In Theorem1.2below,we assume that there exists aτ∈(2,3)such that,for some c>0and all x≥1,1−F(x)≥cx1−τ,(1.4) which is slightly weaker than the assumption in(1.2).We further deﬁne for integer m≥2and a real numberσ>1,CF =CF(σ,m)=2|log(τ−2)|+2σlog m.(1.5)Then our main upper bound on the diameter when(1.4)holds is as follows:Theorem1.2(A log log upper bound on the diameter)Fix m≥2,and assume that P(D≥m+1)=1,and that(1.4)holds.Then,for everyσ>(3−τ)−1,the diameter of the conﬁguration model is,whp,bounded above by CFlog log N.1.3Discussion and related workTheorem1.2has a counterpart for preferential attachment models(PAM)proved in[19].In these PAM’s,at each integer time t,a new vertex with m≥1edges attached to it,is added to the graph. The new edges added at time t are then preferentially connected to older edges,i.e.,conditionally on the graph at time t−1,which is denoted by G(t−1),the probability that a given edge is connected to vertex i is proportional to d i(t−1)+δ,whereδ>−m is aﬁxed parameter and d i(t−1)is the degree of vertex i at time t−1.A substantial literature exists,see e.g.[10],proving that the degree sequence of PAM’s in rather great generality satisfy a power law(see e.g.the references in [11]).In the above setting of linear preferential attachment,the exponentτis equal to[21,11]τ=3+δm.(1.6)A log log t upper bound on the diameter holds for PAM’s with m≥2and−m<δ<0,which, by(1.6),corresponds toτ∈(2,3)[19]:Theorem1.3(A log log upper bound on the diameter of the PAM)Fix m≥2andδ∈(−m,0).Then,for everyσ>13−τ,and withCG (σ)=4|log(τ−2)|+4σlog mthe diameter of the preferential attachment model is,with high probability,bounded above by CGlog log t,as t→∞.Observe that the condition m≥2in the PAM corresponds to the condition P(D≥m+1)=1 in the CM,where one half-edge is used to attach the vertex,while in PAM’s,vertices along a pathhave degree at least three when m≥2.Also note from the deﬁnition of CG and CFthat distancesin PAM’s tend to be twice as big compared to distances in the CM.This is related to the structure of the graphs.Indeed,in both graphs,vertices of high degree play a crucial role in shortest paths. In the CM vertices of high degree are often directly connected to each other,while in the PAM, they tend to be connected through a later vertex which links to both vertices of high degree.Unfortunately,there is no log t lower bound in the PAM forδ>0and m≥2,or equivalently τ>3.However,[19]does contain a(1−ε)log t/log log t lower bound for the diameter when m≥1 andδ≥0.When m=1,results exists on log t asymptotics of the diameter,see e.g.[6,24].The results in Theorems1.1–1.3are consistent with the non-rigorous physics predictions that distances in the PAM and in the CM,for similar degree sequences,behave similarly.It is an interesting problem,for both the CM and PAM,to determine the exact constant C≥0such that the diameter of the graph of N vertices divided by log N converges in probability to C.For the CM,the results in[16]imply that C>0,for the PAM,this is not known.We now turn to related work.Many distance results for the CM are known.Forτ∈(1,2) distances are bounded[14],forτ∈(2,3),they behave as log log N[25,18,9],whereas forτ>3 the correct scaling is log N[17].Observe that these results induce lower bounds for the diameter of the CM,since the diameter is the supremum of the distance,where the supremum is taken over all pairs of vertices.Similar results for models with conditionally independent edges exist,see e.g. [4,8,13,23].Thus,for these classes of models,distances are quite well understood.The authors in [16]prove that the diameter of a sparse random graph,with speciﬁed degree sequence,has,whp, diameter equal to c log N(1+o(1)),for some constant c.Note that our Theorems1.1–1.2imply that c>0when f1+f2>0,while c=0when f1+f2=0and(1.4)holds for someτ∈(2,3).There are few results on distances or diameter in PAM’s.In[5],it was proved that in the PAMand forδ=0,for whichτ=3,the diameter of the resulting graph is equal to log tlog log t (1+o(1)).Unfortunately,the matching result for the CM has not been proved,so that this does not allow us to verify whether the models have similar distances.This paper is organized as follows.In Section2,we prove the lower bound on the diameter formulated in Theorem1.1and in Section3we prove the upper bound in Theorem1.2.2A lower bound on the diameter:Proof of Theorem1.1We start by proving the claim when f2>0.The idea behind the proof is simple.Under the conditions of the theorem,one can,whp,ﬁnd a pathΓ(N)in the random graph such that this path consists exclusively of vertices with degree2and has length at least2αlog N.This implies that the diameter is at leastαlog N,since the above path could be a cycle.Below we deﬁne a procedure which proves the existence of such a path.Consider the process of pairing stubs in the graph.We are free to choose the order in which we pair the free stubs,since this order is irrelevant for the distribution of the random graph.Hence,we are allowed to start with pairing the stubs of the vertices of degree2.Let N(2)be the number of vertices of degree2and SN(2)=(i1,...,i N(2))∈N N(2)the collection of these vertices.We will pair the stubs and at the same time deﬁne a permutationΠ(N)=(i∗1, (i)N(2))of SN(2),and a characteristicχ(N)=(χ1,...,χN(2))onΠ(N),whereχj is either0or1.Π(N)andχ(N)will be deﬁned inductively in such a way that for any vertex i∗k ∈Π(N),χk=1,if and only if vertex i∗k is connected to vertex i∗k+1.Hence,χ(N)contains a substring ofat least2αlog N ones precisely when the random graph contains a pathΓ(N)of length at least 2αlog N.We initialize our inductive deﬁnition by i∗1=i1.The vertex i∗1has two stubs,we consider the second one and pair it to an arbitrary free stub.If this free stub belongs to another vertex j=i∗1 in SN(2)then we choose i∗2=j andχ1=1,otherwise we choose i∗2=i2,andχ1=0.Suppose forsome1<k≤N(2),the sequences(i∗1, (i)k )and(χ1,...,χk−1)are deﬁned.Ifχk−1=1,then onestub of i∗k is paired to a stub of i∗k−1,and another stub of i∗kis free,else,ifχk−1=0,vertex i∗khastwo free stubs.Thus,for every k≥1,the vertex i∗k has at least one free stub.We pair this stubto an arbitrary remaining free stub.If this second stub belongs to vertex j∈SN (2)\{i∗1, (i)k},then we choose i∗k+1=j andχk=1,else we choose i∗k+1as theﬁrst stub in SN(2)\{i∗1, (i)k},andχk=0.Hence,we have deﬁned thatχk=1precisely when vertex i∗k is connected to vertexi∗k+1.We show that whp there exists a substring of ones of length at least2αlog N in theﬁrsthalf ofχN ,i.e.,inχ12(N)=(χi∗1,...,χi∗N(2)/2).For this purpose,we couple the sequenceχ12(N)with a sequence B12(N)={ξk},whereξk are i.i.d.Bernoulli random variables taking value1withprobability f2/(4µ),and such that,whp,χi∗k ≥ξk for all k∈{1,..., N(2)/2 }.We write PNforthe law of the CM conditionally on the degrees D1,...,DN.Then,for any1≤k≤ N(2)/2 ,thePN-probability thatχk=1is at least2N(2)−CN(k)LN −CN(k),(2.1)where,as before,N(2)is the total number of vertices with degree2,and CN(k)is one plus thetotal number of paired stubs after k−1pairings.By deﬁnition of CN(k),for any k≤N(2)/2,we haveCN(k)=2(k−1)+1≤N(2).(2.2) Due to the law of large numbers we also have that whpN(2)≥f2N/2,LN≤2µN.(2.3) Substitution of(2.2)and(2.3)into(2.1)then yields that the right side of(2.1)is at leastN(2) LN ≥f24µ.Thus,whp,we can stochastically dominate all coordinates of the random sequenceχ12(N)with ani.i.d.Bernoulli sequence B12(N)of Nf2/2independent trials with success probability f2/(4µ)>0.It is well known(see e.g.[12])that the probability of existence of a run of2αlog N ones convergesto one whenever2αlog N≤log(Nf2/2) |log(f2/(4µ))|,for some0< <1.We conclude that whp the sequence B1(N)contains a substring of2αlog N ones.Since whpχN ≥B12(N),where the ordering is componentwise,whp the sequenceχNalso contains the samesubstring of2αlog N ones,and hence there exists a required path consisting of at least2αlog N vertices with degree2.Thus,whp the diameter is at leastαlog N,and we have proved Theorem 1.1in the case that f2>0.We now complete the proof of Theorem1.1when f2=0by adapting the above argument. When f2=0,and since f1+f2>0,we must have that f1>0.Let k∗>2be the smallest integer such that f k∗>0.This k∗must exist,since f1<1.Denote by N∗(2)the total number of vertices of degree k∗of which itsﬁrst k∗−2stubs are connected to a vertex with degree1.Thus,eﬀectively, after theﬁrst k∗−2stubs have been connected to vertices with degree1,we are left with a structure which has2free stubs.These vertices will replace the N(2)vertices used in the above proof.It is not hard to see that whp N∗(2)≥f∗2N/2for some f∗2>0.Then,the argument for f2>0can be repeated,replacing N(2)by N∗(2)and f2by f∗2.In more detail,for any1≤k≤ N∗(2)/(2k∗) , the PN-probability thatχk=1is at least2N∗(2)−C∗N(k)LN −CN(k),(2.4)where C∗N(k)is the total number of paired stubs after k−1pairings of the free stubs incident to the N∗(2)vertices.By deﬁnition of C∗N(k),for any k≤N∗(2)/(2k∗),we haveCN(k)=2k∗(k−1)+1≤N∗(2).(2.5)Substitution of(2.5),N∗(2)≥f∗2N/2and the bound on LNin(2.3)into(2.4)gives us that the right side of(2.4)is at leastN∗(2) LN ≥f∗24µ.Now the proof of Theorem1.1in the case where f2=0and f1∈(0,1)can be completed as above.We omit further details. 3A log log upper bound on the diameter forτ∈(2,3)In this section,we investigate the diameter of the CM when P(D≥m+1)=1,for some integer m≥2.We assume(1.4)for someτ∈(2,3).We will show that under these assumptions CFlog log N isan upper bound on the diameter of the CM,where CFis deﬁned in(1.5).The proof is divided into two key steps.In theﬁrst,in Proposition3.1,we give a bound on the diameter of the core of the CM consisting of all vertices with degree at least a certain power of log N.This argument is very close in spirit to the one in[25],the only diﬀerence being that we have simpliﬁed the argument slightly.After this,in Proposition3.4,we derive a bound on the distance between vertices with small degree and the core.We note that Proposition3.1only relies on the assumption in(1.4),while Proposition3.4only relies on the fact that P(D≥m+1)=1,for some m≥2.The proof of Proposition3.1can easily be adapted to a setting where the degrees areﬁxed, by formulating the appropriate assumptions on the number of vertices with degree at least x for a suﬃcient range of x.This assumption would replace(1.5).Proposition3.4can easily be adapted to a setting where there are no vertices of degree smaller than or equal to m.This assumption would replace the assumption P(D≥m+1)=1,for some m≥2.We refrain from stating these extensions of our results,and start by investigating the core of the CM.We takeσ>13−τand deﬁne the core CoreNof the CM to beCoreN={i:D i≥(log N)σ},(3.1)i.e.,the set of vertices with degree at least(log N)σ.Also,for a subset A⊆{1,...,N},we deﬁne the diameter of A to be equal to the maximal shortest path distance between any pair of vertices of A.Note,in particular,that if there are pairs of vertices in A that are not connected,then the diameter of A is inﬁnite.Then,the diameter of the core is bounded in the following proposition:Proposition3.1(Diameter of the core)For everyσ>13−τ,the diameter of CoreNis,whp,bounded above by2log log N|log(τ−2)|(1+o(1)).(3.2)Proof.We note that(1.4)implies that whp the largest degree D(N)=max1≤i≤N D i satisﬁesD(N)≥u1,where u1=N1τ−1(log N)−1,(3.3) because,when N→∞,P(D(N)>u1)=1−P(D(N)≤u1)=1−(F(u1))N≥1−(1−cu1−τ1)N=1−1−c(log N)τ−1NN∼1−exp(−c(log N)τ−1)→1.(3.4)DeﬁneN (1)={i :D i ≥u 1},(3.5)so that,whp ,N (1)=∅.For some constant C >0,which will be speciﬁed later,and k ≥2we deﬁne recursively u k =C log N u k −1 τ−2,and N (k )={i :D i ≥u k }.(3.6)We start by identifying u k :Lemma 3.2(Identiﬁcation of u k )For each k ∈N ,u k =C a k (log N )b k N c k ,(3.7)with c k =(τ−2)k −1τ−1,b k =13−τ−4−τ3−τ(τ−2)k −1,a k =1−(τ−2)k −13−τ.(3.8)Proof .We will identify a k ,b k and c k recursively.We note that,by (3.3),c 1=1τ−1,b 1=−1,a 1=0.By (3.6),we can,for k ≥2,relate a k ,b k ,c k to a k −1,b k −1,c k −1as follows:c k =(τ−2)c k −1,b k =1+(τ−2)b k −1,a k =1+(τ−2)a k −1.(3.9)As a result,we obtain c k =(τ−2)k −1c 1=(τ−2)k −1τ−1,(3.10)b k =b 1(τ−2)k −1+k −2 i =0(τ−2)i =1−(τ−2)k −13−τ−(τ−2)k −1,(3.11)a k =1−(τ−2)k −13−τ.(3.12)The key step in the proof of Proposition 3.1is the following lemma:Lemma 3.3(Connectivity between N (k −1)and N (k ))Fix k ≥2,and C >4µ/c (see (1.3),and (1.4)respectively).Then,the probability that there exists an i ∈N (k )that is not directly connected to N (k −1)is o (N −γ),for some γ>0independent of k .Proof .We note that,by deﬁnition,i ∈N (k −1)D i ≥u k −1|N (k −1)|.(3.13)Also,|N (k −1)|∼Bin N,1−F (u k −1) ,(3.14)and we have that,by (1.4),N [1−F (u k −1)]≥cN (u k −1)1−τ,(3.15)which,by Lemma 3.2,grows as a positive power of N ,since c k ≤c 2=τ−2τ−1<1τ−1.We use a concentration of probability resultP (|X −E [X ]|>t )≤2e −t 22(E [X ]+t/3),(3.16)which holds for binomial random variables[20],and gives that that the probability that|N(k−1)|is bounded below by N[1−F(u k−1)]/2is exponentially small in N.As a result,we obtain that for every k,and whpi∈N(k)D i≥c2N(u k)2−τ.(3.17)We note(see e.g.,[18,(4.34)]that for any two sets of vertices A,B,we have thatPN (A not directly connected to B)≤e−D A D BL N,(3.18)where,for any A⊆{1,...,N},we writeD A=i∈AD i.(3.19)On the event where|N(k−1)|≥N[1−F(u k−1)]/2and where LN≤2µN,we then obtain by(3.18),and Boole’s inequality that the PN-probability that there exists an i∈N(k)such that i is not directly connected to N(k−1)is bounded byNe−u k Nu k−1[1−F(u k−1)]2L N≤Ne−cu k(u k−1)2−τ4µ=N1−cC4µ,(3.20)where we have used(3.6).Taking C>4µ/c proves the claim. We now complete the proof of Proposition3.1.Fixk∗= 2log log N|log(τ−2)|.(3.21)As a result of Lemma3.3,we have whp that the diameter of N(k∗)is at most2k∗,because thedistance between any vertex in N(k∗)and the vertex with degree D(N)is at most k∗.Therefore,weare done when we can show thatCoreN⊆N(k∗).(3.22)For this,we note thatN(k∗)={i:D i≥u k∗},(3.23)so that it suﬃces to prove that u k∗≥(log N)σ,for anyσ>13−τ.According to Lemma3.2,u k∗=C a k∗(log N)b k∗N c k∗.(3.24) Because for x→∞,and2<τ<3,x(τ−2)2log x|log(τ−2)|=x·x−2=o(log x),(3.25) weﬁnd with x=log N thatlog N·(τ−2)2log log N|log(τ−2)|=o(log log N),(3.26) implying that N c k∗=(log N)o(1),(log N)b k∗=(log N)13−τ+o(1),and C a k∗=(log N)o(1).Thus,u k∗=(log N)13−τ+o(1),(3.27)so that,by picking N suﬃciently large,we can make13−τ+o(1)≤σ.This completes the proof ofProposition3.1. For an integer m≥2,we deﬁneC(m)=σ/log m.(3.28)Proposition 3.4(Maximal distance between periphery and core)Assume that P (D ≥m +1)=1,for some m ≥2.Then,for every σ>(3−τ)−1the maximal distance between any vertex and the core is,whp ,bounded from above by C (m )log log N .Proof .We start from a vertex i and will show that the probability that the distance between i and Core N is at least C (m )log log N is o (N −1).This proves the claim.For this,we explore the neighborhood of i as follows.From i ,we connect the ﬁrst m +1stubs (ignoring the other ones).Then,successively,we connect the ﬁrst m stubs from the closest vertex to i that we have connected to and have not yet been explored.We call the arising process when we have explored up to distance k from the initial vertex i the k -exploration tree .When we never connect two stubs between vertices we have connected to,then the number of vertices we can reach in k steps is precisely equal to (m +1)m k −1.We call an event where a stub on the k -exploration tree connects to a stub incident to a vertex in the k -exploration tree a collision .The number of collisions in the k -exploration tree is the number of cycles or self-loops in it.When k increases,the probability of a collision increases.However,for k of order log log N ,the probability that more than two collisions occur in the k -exploration tree is small,as we will prove now:Lemma 3.5(Not more than one collision)Take k = C (m )log log N .Then,the P N -probab-ility that there exists a vertex of which the k -exploration tree has at least two collisions,before hittingthe core Core N ,is bounded by (log N )d L −2N ,for d =4C (m )log (m +1)+2σ.Proof .For any stub in the k -exploration tree,the probability that it will create a collision beforehitting the core is bounded above by (m +1)m k −1(log N )σL −1N .The probability that two stubs will both create a collision is,by similar arguments,bounded above by (m +1)m k −1(log N )σL −1N2.The total number of possible pairs of stubs in the k -exploration tree is bounded by[(m +1)(1+m +...+m k −1)]2≤[(m +1)m k ]2,so that,by Boole’s inequality,the probability that the k -exploration tree has at least two collisions is bounded by (m +1)m k 4(log N )2σL −2N .(3.29)When k = C (m )log log N ,we have that (m +1)m k 4(log N )2σ≤(log N )d ,where d is deﬁned inthe statement of the lemma. Finally,we show that,for k = C (m )log log N ,the k -exploration tree will,whp connect to the Core N :Lemma 3.6(Connecting exploration tree to core)Take k = C (m )log log N .Then,the probability that there exists an i such that the distance of i to the core is at least k is o (N −1).Proof .Since µ<∞we have that L N /N ∼µ.Then,by Lemma 3.5,the probability that there exists a vertex for which the k -exploration tree has at least 2collisions before hitting the core is o (N −1).When the k -exploration tree from a vertex i does not have two collisions,then there are at least (m −1)m k −1stubs in the k th layer that have not yet been connected.When k = C (m )log log N this number is at least equal to (log N )C (m )log m +o (1).Furthermore,the expected number of stubs incident to the Core N is at least N (log N )σP (D 1≥(log N )σ)so that whp the number of stubs incident to Core N is at least (compare (1.4))12N (log N )σP (D 1≥(log N )σ)≥c 2N (log N )2−τ3−τ.(3.30)By(3.18),the probability that we connect none of the stubs in the k th layer of the k-exploration tree to one of the stubs incident to CoreNis bounded byexp−cN(log N)2−τ3−τ+C(m)log m2LN≤exp−c4µ(log N)2−τ3−τ+σ=o(N−1),(3.31)because whp LN /N≤2µ,and since2−τ3−τ+σ>1.Propositions3.1and3.4prove that whp the diameter of the conﬁguration model is boundedabove by CF log log N,with CFdeﬁned in(1.5).This completes the proof of Theorem1.2.Acknowledgements.The work of RvdH and DZ was supported in part by Netherlands Organ-isation for Scientiﬁc Research(NWO).References[1]R.Albert and A.-L.Barab´a si.Statistical mechanics of complex networks.Rev.Mod.Phys.74,47-97,(2002).[2]A.-L.Barab´a si.Linked,The New Science of Networks.Perseus Publishing,Cambridge,Mas-sachusetts,(2002).[3]B.Bollob´a s.Random Graphs,2nd edition,Academic Press,(2001).[4]B.Bollob´a s,S.Janson,and O.Riordan.The phase transition in inhomogeneous randomgraphs.Random Structures and Algorithms31,3-122,(2007).[5]B.Bollob´a s and O.Riordan.The diameter of a scale-free random binatorica,24(1):5–34,(2004).[6]B.Bollob´a s and O.Riordan.Shortest paths and load scaling in scale-free trees.Phys.Rev.E.,69:036114,(2004).[7]T.Britton,M.Deijfen,and A.Martin-L¨o f.Generating simple random graphs with prescribeddegree distribution.J.Stat.Phys.,124(6):1377–1397,(2006).[8]F.Chung and L.Lu.The average distances in random graphs with given expected degrees.A,99(25):15879–15882(electronic),(2002).[9]R.Cohen and S.Havlin.Scale free networks are ultrasmall,Physical Review Letters90,058701,(2003).[10]C.Cooper and A.Frieze.A general model of web graphs.Random Structures Algorithms,22(3):311–335,(2003).[11]M.Deijfen,H.van den Esker,R.van der Hofstad and G.Hooghiemstra.A preferential attach-ment model with random initial degrees.Preprint(2007).To appear in Arkiv f¨o r Matematik.[12]P.Erd¨o s and A.R´e nyi.On a new law of large numbers,J.Analyse Math.23,103–111,(1970).[13]H.van den Esker,R.van der Hofstad and G.Hooghiemstra.Universality forthe distance inﬁnite variance random graphs.Preprint(2006).Available from http://ssor.twi.tudelft.nl/∼gerardh/[14]H.van den Esker,R.van der Hofstad,G.Hooghiemstra and D.Znamenski.Distances inrandom graphs with inﬁnite mean degrees,Extremes8,111-141,2006.。

基于连续密度隐马尔可夫模型的矿下异常行为识别算法

第51卷第3期2021年5月吉林大学学报（工学版）Journal of Jilin University (Engineering and Technology Edition)Vol. 51 No. 3May 2021基于连续密度隐马尔可夫模型的矿下异常行为识别算法王淑敏，陈伟(哈尔滨工程大学经济管理学院，哈尔滨150001)摘要：针对当前方法识别精度不高的问题，提出了基于连续密度隐马尔可夫模型的矿下异常行为识别算法。

获取待识别矿下视频帧数据，通过级联分类器实现运动区域的初步检测，并读入下帧数据，直到所有帧检测完毕。

引入连续密度隐马尔可夫(H M M)模型，将人体图像分解成若干相等区域，获取图像区域中的标准差值特征，对连续密度H M M进行训练，完成异常行为识别。

实验结果证明，本文算法的识别结果具有精度高和检测率高的特性，说明其具有可靠性。

关键词：计算机应用；连续密度；马尔可夫模型；矿下异常；识别中图分类号：T P39;T D76 文献标志码：A文章编号：1671_5497(2021)03-1067-06D O I：10. 13229/ki.j d x b g x b20200160Algorithm for identifying abnormal behavior in undergroundmines based on continuous density hidden Markov modelW A N G S h u-m i n，C H E N W e i(School of E conomics and Management, Harbin Engineering University y Harbin150001, China)Abstract：In the current research of abnormal behaviors in underground m i n e s the result accuracy needs i m p r o v e m e n t.In order to solve this problem and improve the safety of underground m i n e s,a continuous density hidden M a r k o v m o d e l i s proposed to identify abnormal behaviors in underground m i n e s.In the detection of h u m a n motion area under the m i n e,first,the video frame data to be identified is obtained；the h u m a n m o t i o n area is preliminarily extracted for each i m a g e,and the preliminary detection of h u m a n motion area i s realized by cascading classifiers to get m o r e accurate h u m a n motion area.T h e n the next frame data is read in,a n d this process i s iterated until all frames are detected.According to the preliminary detection results,the continuous density hidden M a r k o v m o d e l i s introduced to d e c o m p o s e the h u m a n b o d y i m a g e into several equal areas.T h e representative color features and the standard difference features in the image area are obtained.T h e continuous density H M M of the target i s trained through the obtained feature data.T h e abnor m a l behavior recognition of the h u m a n b o d y target under the m i n e is completed according to the training m o d e l.T h e experimental results s h o w that the proposed algorithm has the characteristics of high accuracy and high detection rate in different收稿日期：2020-03-16.基金项目：国家社科基金项目（17CSH016);国家社科基金一般项目（19BJL008);国家自然科学基金项目(61371178).作者简介：王淑敏（1980-)，女，博士研究生.研究方向：企业经营与管理，信息与控制工程.E-mail ：*****************•1068 •吉林大学学报（工学版）第51卷n u m b e r of h u m a n behavior recognition,wh i c h s h o w s that the algorithm i s reliable.Key words：c o m p u t e r application；continuous density；M a r k o v m o d e l；underground a n o m a l y；identification〇引言矿井下的灯会周期性地改变照射点，致使矿井下光照不是十分均匀，而且不恒定[u2]。

on the evolution of random graphs

ON THE EVOLUTION OF RANDOM GRAPHS
by
P. ERD6S and A.
R~NYI
Dedicated to
his 60th birthday.
Profe88m- P. Turdn at
lotroduction
Our aim is to study the probable structure of a ra.ridom gra.ph r,.N which ha.s n given labelled vertices P 1, P2 , ••• , P, and N edges; we suppose that these N edges are chosen a.t random among the (;) possible edges,
' . 1 with Gn.N is - - . If .A is a. property which a. graph may or may not possess,
We denote by p . ...,N (.A) the probability that the A random graph F,.N possesses the property .4, i. e. we put P .. N (A) = 0 n,N where AnN . denotes the
r,.
nected, the pair 0 1 (n) =_!_n log n, 0 11(n) = n is a. pair of strong threshold functions for the property and the corresponding sharp-threshold distribution function is e-r-2ll'; thus we have proved1 that putting 1 N(n) = - n log n + y n+ o(n) we have

Targeted Learning Causal Inference for Observation

June 6-8, 2011
Complications of Human Art in Statistics
1. The parametric model is misspecified. 2. The target parameter is interpreted as if
parametric model is correct.ten data-
parameter of interest
• General decrease in bias for parameter of Interest
• Fewer false positives • Honest p-values, inference, multiple testing
Targeted Maximum Likelihood Estimation Flow Chart
Targeted Learning
• Avoid reliance on human art and non-realistic (parametric) models
• Define interesting parameters • Target the fit of data-generating distribution
to the parameter of interest • Statistical Inference
TMLE/SL
Targeted Maximum Likelihood coupled with Super Learner methodology
TMLE/SL Toolbox
Targeted effects
Targeted Learning
Causal Inference for Observational and Experimental Data

多尺度变换域隐马尔可夫树纹理图像特征提取模型解读

多尺度变换域隐马尔可夫树纹理图像特征提取模型多尺度几何分析相对于小波分析逼近性能的提高,其意义丝毫不亚于小波分析相对于Fourier分析逼进性能的提高。

Curvelet变换不仅多尺度,引入的方向参量使其具有高度各向异性,对图像边缘有优越表达能力,能用较少非零系数表示出图像边缘信息。

将Curvelet应用于去噪时噪声去除比较彻底,去噪效率高,边缘保护能力强,但存在“阶梯”效应。

基于偏微分方程(PDE)的图像去噪在图像处理领域中崭露头角,能很好对图像进行平滑处理。

其中四阶偏微分方程去噪能很好的恢复平滑区域保护细小纹理,并且避免“阶梯”效应产生,但去噪效率不高,保护边缘能力不强。

为了对含噪图像进行去噪处理,提高图像特征提取和检索效率,本文提出利用权函数将Curvelet去噪和四阶偏微分方程中LLT模型相结合的去噪模型。

实验表明该模型能很好发挥二者优点,PSNR和视觉效果都优于单一的Curvelet或LLT方法。

Contourlet变换由多尺度分析与方向分析构成,具有各向异性尺度关系,其分解系数具有非高斯性和持续性等特点。

为很好的利用相关性描述纹理图像特征,针对目前Contourlet域隐马尔可夫树模型(CHMT)同等考虑父结点的相邻结点对子结点的影响,本文提出一种加权Contourlet域隐马尔可夫树模型对纹理图像进行特征提取。

分析子结点的状态时,不仅考虑父结点信息,而且利用权重评价父结点兄弟结点对子结点的影响,并将通过附加状态转移矩阵体现出来,使得新模型能更加准确的描述Contourlet系数和HMT的内在联系。

同时运用K-L距离计算图像间的相似度,实验结果表明,本模型比CHMT平均检索率高出7%-46%。

同主题文章[1].毕晓君,李文秀. 一种新型纹理图像生成方法研究' [J]. 自动化技术与应用. 2005.(01)[2].张树良,毕笃彦. 一种基于纹理图像的目标提取方法' [J]. 空军工程大学学报(自然科学版). 2004.(04)[3].崔宇红,倪国强,范宏深. 基于模糊推理的纹理图像融合方法研究' [J]. 光学技术. 2005.(01)[4].张光新,崔扬,周泽魁. 基于小波包分解的纹理图像去噪' [J]. 华南理工大学学报(自然科学版). 2005.(03)[5].苏赫峰,张申生,王江春,罗建强. 跨区域纹理映射的一种实现方法' [J]. 计算机仿真. 2004.(03)[6].李峻,边馥苓,谈晓军. DEM及纹理图像的集成数据库研究' [J]. 测绘科学. 2000.(03)[7].尚赵伟,张明新,沈钧毅,相明. 基于双密度小波变换的纹理图像检索' [J]. 西安交通大学学报. 2005.(10)[8].夏飞,李志斌,陈亮,孟宪明. 数据融合在纹理识别中的应用' [J]. 航空计算技术. 2007.(01)[9].叶桦,章国宝,陈维南. 基于小波变换的纹理图像分割' [J]. 东南大学学报(自然科学版). 1999.(01)[10].余鹏,封举富,童行伟. 一种新的基于高斯混合模型的纹理图像分割方法' [J]. 武汉大学学报(信息科学版). 2005.(06)【关键词相关文档搜索】：电路与系统; 图像去噪; Curvelet; 四阶偏微分方程; LLT模型; Contourlet; 加权CHMT; 纹理图像; 特征提取【作者相关信息搜索】：湖南师范大学;电路与系统;杨家红;王玲华;。

隐马尔可夫模型分词 python

隐马尔可夫模型分词 python隐马尔可夫模型（HMM）是使用频率最高的分词算法之一。

在自然语言处理中，分词是一个很基本的任务。

分词的目的是将一句话分成若干个词语，供计算机处理。

在中文分词中，由于中文没有像英文或德文那样明显的单词边界，因此中文分词任务显得更加复杂。

但是，隐马尔可夫模型却是一种很好的用于中文分词的算法。

Python作为一门强大的编程语言，它有着众多的科学计算库和自然语言处理包，可以非常方便地实现HMM分词算法。

下面，本文将会介绍如何使用Python实现HMM分词算法。

一、隐马尔可夫模型（HMM）隐马尔可夫模型（HMM）是关于时序问题的概率模型。

它的基本想法是：一个状态随机地产生一个输出，产生输出的动作和当前状态有关，但是外部观察者并不能直接观察到状态，只能观察到由状态产生的输出结果。

在分词任务中，HMM模型将文本序列看作是由一个个状态进行转移，产生不同的输出。

HMM模型包含以下几个成分：1. 状态集合：一个离散集合，表示所有可能的隐藏状态，对于分词问题，可以将其看作是所有可能的分词。

2. 观测集合：一个离散集合，表示所有可能的观测值，对于分词问题，可以将其看作是单个字符或者单个标点符号。

3. 状态转移概率矩阵：一个矩阵，表示从一个状态转移到另一个状态的概率。

例如，当一个状态为B时，转移到另一个状态E的概率是多少。

4. 发射概率矩阵：一个矩阵，表示从每个状态产生每个观测值的概率。

例如，在一个状态S下，发射出字符“我”的概率是多少。

5. 初始状态概率向量：一个向量，表示在状态序列中，第一个状态属于每个状态的概率。

在分词问题中，状态集合为所有可能的分词，观测集合为单个字符或标点符号。

例如，下面的计算机专业英语文本：计算机专业英语，是一门介绍计算机技术和计算机应用的英语语言课程。

学生除了学习英语之外，还需要学习一些有关计算机领域的基本知识和词汇。

可以将其转换为状态序列和观测序列：状态序列：B E B M M E B M E E观测序列：计算机专业英语，是一门介绍计算机技术和计算机应用的英语语言课程。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

Random graphs with arbitrary degree distributions and their applications M.E.J.Newman,1,2S.H.Strogatz,2,3and D.J.Watts1,41Santa Fe Institute,1399Hyde Park Road,Santa Fe,New Mexico875012Center for Applied Mathematics,Cornell University,Ithaca,New York14853-3401 3Department of Theoretical and Applied Mechanics,Cornell University,Ithaca,New York14853-1503 4Department of Sociology,Columbia University,1180Amsterdam Avenue,New York,New York10027͑Received19March2001;published24July2001͒Recent work on the structure of social networks and the internet has focused attention on graphs with distributions of vertex degree that are signiﬁcantly different from the Poisson degree distributions that have been widely studied in the past.In this paper we develop in detail the theory of random graphs with arbitrary degree distributions.In addition to simple undirected,unipartite graphs,we examine the properties of directed and bipartite graphs.Among other results,we derive exact expressions for the position of the phase transition at which a giant componentﬁrst forms,the mean component size,the size of the giant component if there is one,the mean number of vertices a certain distance away from a randomly chosen vertex,and the average vertex-vertex distance within a graph.We apply our theory to some real-world graphs,including the world-wide web and collaboration graphs of scientists and Fortune1000company directors.We demonstrate that in some cases random graphs with appropriate distributions of vertex degree predict with surprising accuracy the behavior of the real world,while in others there is a measurable discrepancy between theory and reality, perhaps indicating the presence of additional social structure in the network that is not captured by the random graph.DOI:10.1103/PhysRevE.64.026118PACS number͑s͒:89.75.Hc,87.23.Ge,05.90.ϩmI.INTRODUCTIONA random graph͓1͔is a collection of points,or vertices, with lines,or edges,connecting pairs of them at random ͓Fig.1͑a͔͒.The study of random graphs has a long history.Starting with the inﬂuential work of Erdo¨s and Re´nyi in the 1950s and1960s͓2–4͔,random graph theory has developed into one of the mainstays of modern discrete mathematics, and has produced a prodigious number of results,many of them highly ingenious,describing statistical properties of graphs,such as distributions of component sizes,existence and size of a giant component,and typical vertex-vertex dis-tances.In almost all of these studies the assumption has been made that the presence or absence of an edge between two vertices is independent of the presence or absence of any other edge,so that each edge may be considered to be present with independent probability p.If there are N verti-ces in a graph,and each is connected to an average of z edges,then it is trivial to show that pϭz/(NϪ1),which for large N is usually approximated by z/N.The number of edges connected to any particular vertex is called the degree k of that vertex,and has a probability distribution p k given byp kϭͩN kͪp k͑1Ϫp͒NϪkӍz k eϪz k!,͑1͒where the second equality becomes exact in the limit of large N.This distribution we recognize as the Poisson distribution: the ordinary random graph has a Poisson distribution of ver-tex degrees,a point which turns out to be crucial,as we now explain.Random graphs are not merely a mathematical toy;they have been employed extensively as models of real-world net-works of various types,particularly in epidemiology.The passage of a disease through a community depends strongly on the pattern of contacts between those infected with the disease and those susceptible to it.This pattern can be de-picted as a network,with individuals represented by vertices and contacts capable of transmitting the disease by edges.A large class of epidemiological models known as susceptible/ infectious/recovered models͓5–7͔makes frequent use of the so-called fully mixed approximation,which is the assump-tion that contacts are random and uncorrelated,i.e.,they form a random graph.Random graphs however turn out to have severe short-comings as models of such real-world phenomena.Although it is difﬁcult to determine experimentally the structure of the network of contacts by which a disease is spread͓8͔,studies have been performed of other social networks such as net-works of friendships within a variety of communities͓9–11͔, networks of telephone calls͓12,13͔,airline timetables͓14͔, and the power grid͓15͔,as well as networks in physicalor FIG.1.͑a͒A schematic representation of a random graph,the circles representing vertices and the lines representing edges.͑b͒A directed random graph,i.e.,one in which each edge runs in only one direction.PHYSICAL REVIEW E,VOLUME64,026118biological systems,including neural networks͓15͔,the struc-ture and conformation space of polymers͓16,17͔,metabolic pathways͓18,19͔,and food webs͓20,21͔.It is found͓13,14͔that the distribution of vertex degrees in many of these net-works is measurably different from a Poisson distribution—often wildly different—and this strongly suggests,as has been emphasized elsewhere͓22͔,that there are features of such networks that we would miss if we were to approximatethem by an ordinary͑Poisson͒random graph.Another very widely studied network is the internet,whose structure has attracted an exceptional amount of scru-tiny,academic and otherwise,following its meteoric rise topublic visibility starting in1993.Pages on the world-wideweb may be thought of as the vertices of a graph and thehyperlinks between them as edges.Empirical studies͓23–26͔have shown that this graph has a distribution of vertexdegree which is heavily right skewed and possesses a fat ͑power law͒tail with an exponent betweenϪ2andϪ3.͑The underlying physical structure of the internet also has a degree distribution of this type͓27͔.͒This distribution is veryfar from Poisson,and therefore we would expect that asimple random graph would give a very poor approximationof the structural properties of the web.However,the webdiffers from a random graph in another way also:it is di-rected.Links on the web lead from one page to another inonly one direction͓see Fig.1͑b͔͒.As discussed by Broderet al.͓26͔,this has a signiﬁcant practical effect on the typical accessibility of one page from another,and this effect also will not be captured by a simple͑undirected͒random graph model.A further class of networks that has attracted scrutiny is the class of collaboration networks.Examples of such net-works include the boards of directors of companies͓28–31͔, co-ownership networks of companies͓32͔,and collabora-tions of scientists͓33–37͔and movie actors͓15͔.As well as having strongly non-Poisson degree distributions͓14,36͔, these networks have a bipartite structure;there are two dis-tinct kinds of vertices on the graph with links running only between vertices of unlike kinds͓38͔—see Fig.2.In the case of movie actors,for example,the two types of vertices are movies and actors,and the network can be represented as a graph with edges running between each movie and the actors that appear in it.Researchers have also considered the pro-jection of this graph onto the unipartite space of actors only, also called a one-mode network͓38͔.In such a projection two actors are considered connected if they have appeared in a movie together.The construction of the one-mode network however involves discarding some of the information con-tained in the original bipartite network,and for this reason it is more desirable to model collaboration networks using the full bipartite structure.Given the high current level of interest in the structure of many of the graphs described here͓39͔,and given their sub-stantial differences from the ordinary random graphs that have been studied in the past,it would clearly be useful if we could generalize the mathematics of random graphs to non-Poisson degree distributions,and to directed and bipartite graphs.In this paper we do just that,demonstrating in detail how the statistical properties of each of these graph types can be calculated exactly in the limit of large graph size.We also give examples of the application of our theory to the model-ing of a number of real-world networks,including the world-wide web and collaboration graphs.II.RANDOM GRAPHS WITH ARBITRARYDEGREE DISTRIBUTIONSIn this section we develop a formalism for calculating a variety of quantities,both local and global,on large unipar-tite undirected graphs with arbitrary probability distribution of the degrees of their vertices.In all respects other than their degree distribution,these graphs are assumed to be entirely random.This means that the degrees of all vertices are inde-pendent identically distributed random integers drawn from a speciﬁed distribution.For a given choice of these degrees, also called the‘‘degree sequence,’’a graph is chosen uni-formly at random from the set of all graphs with that degree sequence.All properties calculated in this paper are averaged over the ensemble of graphs generated in this way.In the limit of large graph size an equivalent procedure is to study only one particular degree sequence,averaging uniformly over all graphs with that sequence,where the sequence is chosen to approximate as closely as possible the desired probability distribution.The latter procedure can be thought of as a‘‘microcanonical ensemble’’for random graphs, where the former is a‘‘canonical ensemble.’’Some results are already known for random graphs with arbitrary degree distributions:in two beautiful recent papers ͓40,41͔,Molloy and Reed have derived formulas for the po-sition of the phase transition at which a giant componentﬁrst appears,and the size of the giant component.͑These results are calculated within the microcanonical ensemble,butapply FIG.2.A schematic representation͑top͒of a bipartite graph, such as the graph of movies and the actors who have appeared in them.In this small graph we have four movies,labeled1to4,and 11actors,labeled A to K,with edges joining each movie to the actors in its cast.In the lower part of the picture we show the one-mode projection of the graph for the11actors.M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E64026118equally to the canonical one in the large system size limit.͒The formalism we present in this paper yields an alternative derivation of these results and also provides a framework for obtaining other quantities of interest,some of which we cal-culate.In Secs.III and IV we extend our formalism to the case of directed graphs ͑such as the world-wide web ͒and bipartite graphs ͑such as collaboration graphs ͒.A.Generating functionsOur approach is based on generating functions ͓42͔,the most fundamental of which,for our purposes,is the generat-ing function G 0(x )for the probability distribution of vertex degrees k .Suppose that we have a unipartite undirected graph—an acquaintance network,for example—of N verti-ces,with N large.We deﬁneG 0͑x ͒ϭ͚k ϭ0ϱp k x k ,͑2͒where p k is the probability that a randomly chosen vertex onthe graph has degree k .The distribution p k is assumed cor-rectly normalized,so thatG 0͑1͒ϭ1.͑3͒The same will be true of all generating functions consideredhere,with a few important exceptions,which we will note at the appropriate point.Because the probability distribution is normalized and positive deﬁnite,G 0(x )is also absolutely convergent for all ͉x ͉р1,and hence has no singularities in this region.All the calculations of this paper will be conﬁned to the region ͉x ͉р1.The function G 0(x ),and indeed any probability generat-ing function,has a number of properties that will prove use-ful in subsequent developments.Derivatives .The probability p k is given by the k th deriva-tive of G 0according top k ϭ1k !d k G 0dx kͯx ϭ0.͑4͒Thus the one function G 0(x )encapsulates all the information contained in the discrete probability distribution p k .We say that the function G 0(x )‘‘generates’’the probability distribu-tion p k .Moments .The average over the probability distribution generated by a generating function—for instance,the aver-age degree z of a vertex in the case of G 0(x )—is given byz ϭ͗k ͘ϭ͚kkp k ϭG 0Ј͑1͒.͑5͒Thus if we can calculate a generating function we can alsocalculate the mean of the probability distribution which it generates.Higher moments of the distribution can be calcu-lated from higher derivatives also.In general,we have͗k n͘ϭ͚kk n p k ϭͫͩx d dxͪnG 0͑x ͒ͬx ϭ1.͑6͒Powers .If the distribution of a property k of an object isgenerated by a given generating function,then the distribu-tion of the total of k summed over m independent realizations of the object is generated by the m th power of that generat-ing function.For example,if we choose m vertices at random from a large graph,then the distribution of the sum of the degrees of those vertices is generated by ͓G 0(x )͔m .To see why this is so,consider the simple case of just two vertices.The square ͓G 0(x )͔2of the generating function for a single vertex can be expanded as͓G 0͑x ͔͒2ϭ͚ͫkp k xkͬ2ϭ͚jkp j p k x j ϩk ϭp 0p 0x 0ϩ͑p 0p 1ϩp 1p 0͒x 1ϩ͑p 0p 2ϩp 1p 1ϩp 2p 0͒x 2ϩ͑p 0p 3ϩp 1p 2ϩp 2p 1ϩp 3p 0͒x 3ϩ•••.͑7͒It is clear that the coefﬁcient of the power of x n in this expression is precisely the sum of all products p j p k such that j ϩk ϭn ,and hence correctly gives the probability that the sum of the degrees of the two vertices will be n .It is straight-forward to convince oneself that this property extends also to all higher powers of the generating function.All of these properties will be used in the derivations given in this paper.Another quantity that will be important to us is the distri-bution of the degree of the vertices that we arrive at by following a randomly chosen edge.Such an edge arrives at a vertex with probability proportional to the degree of that vertex,and the vertex therefore has a probability distribution of degree proportional to kp k .The correctly normalized dis-tribution is generated by͚k kp k x k ͚kkp k ϭxG 0Ј͑x ͒G 0Ј͑1͒.͑8͒If we start at a randomly chosen vertex and follow each ofthe edges at that vertex to reach the k nearest neighbors,then the vertices arrived at each have the distribution of remaining outgoing edges generated by this function,less one power of x ,to allow for the edge that we arrived along.Thus the distribution of outgoing edges is generated by the functionG 1͑x ͒ϭG 0Ј͑x ͒G 0Ј͑1͒ϭ1z G 0Ј͑x ͒,͑9͒where z is the average vertex degree,as before.The prob-ability that any of these outgoing edges connects to the origi-RANDOM GRAPHS WITH ARBITRARY DEGREE ...PHYSICAL REVIEW E 64026118nal vertex that we started at,or to any of its other immediate neighbors,goes as N Ϫ1and hence can be neglected in the limit of large N .Thus,making use of the ‘‘powers’’property of the generating function described above,the generating function for the probability distribution of the number of second neighbors of the original vertex can be written as͚kp k ͓G 1͑x ͔͒k ϭG 0...G 1͑x ͒. (10)Similarly,the distribution of third-nearest neighbors is gen-erated by G 0(G 1…G 1(x )…),and so on.The average numberz 2of second neighbors isz 2ϭͫddx G 0…G 1͑x ͒…ͬx ϭ1ϭG 0Ј͑1͒G 1Ј͑1͒ϭG 0Љ͑1͒,͑11͒where we have made use of the fact that G 1(1)ϭ1.͑Onemight be tempted to conjecture that since the average num-ber of ﬁrst neighbors is G 0Ј(1),Eq.͑5͒,and the average number of second neighbors is G 0Љ(1),Eq.͑11͒,then the average number of m th neighbors should be given by the m th derivative of G 0evaluated at x ϭ1.As we show in Sec.II F,however,this conjecture is wrong.͒B.ExamplesTo make things more concrete,we immediately introduce some examples of speciﬁc graphs to illustrate how these cal-culations are carried out.(a)Poisson-distributed graphs .The simplest example of a graph of this type is one for which the distribution of degree is binomial,or Poisson in the large N limit.This distribution yields the standard random graph studied by many mathema-ticians and discussed in Sec.I.In this graph the probability p ϭz /N of the existence of an edge between any two vertices is the same for all vertices,and G 0(x )is given by G 0͑x ͒ϭ͚k ϭ0NͩN kͪp k ͑1Ϫp ͒N Ϫk x k ϭ͑1Ϫp ϩpx ͒N ϭe z (x Ϫ1),͑12͒where the last equality applies in the limit N →ϱ.It is thentrivial to show that the average degree of a vertex is indeed G 0Ј(1)ϭz and that the probability distribution of degree is given by p k ϭz k e Ϫz /k !,which is the ordinary Poisson distri-bution.Notice also that for this special case we have G 1(x )ϭG 0(x ),so that the distribution of outgoing edges at a vertex is the same,regardless of whether we arrived there by choosing a vertex at random,or by following a randomly chosen edge.This property,which is peculiar to the Poisson-distributed random graph,is the reason why the theory of random graphs of this type is especially simple.(b)Exponentially distributed graphs .Perhaps the next simplest type of graph is one with an exponential distribution of vertex degreesp k ϭ͑1Ϫe Ϫ1/␬͒e Ϫk /␬,͑13͒where ␬is a constant.The generating function for this dis-tribution isG 0͑x ͒ϭ͑1ϪeϪ1/␬͚͒k ϭ0ϱeϪk /␬x kϭ1Ϫe Ϫ1/␬1Ϫxe Ϫ1/␬,͑14͒andG 1͑x ͒ϭͫ1Ϫe Ϫ1/␬1Ϫxe Ϫ1/␬ͬ2.͑15͒An example of a graph with an exponential degree distribu-tion is given in Sec.V A.(c)Power-law distributed graphs .The recent interest in the properties of the world-wide web and of social networks leads us to investigate the properties of graphs with a power-law distribution of vertex degrees.Such graphs have beendiscussed previously by Baraba´si and co-workers ͓22,23͔and by Aiello et al.͓13͔.In this paper,we will look at graphs with degree distribution given byp k ϭCk Ϫ␶e Ϫk /␬for k у1,͑16͒where C ,␶,and ␬are constants.The reason for including the exponential cutoff is twofold:ﬁrst many real-world graphs appear to show this cutoff ͓14,36͔;second it makes the distribution normalizable for all ␶,and not just ␶у2.The constant C is ﬁxed by the requirement of normaliza-tion,which gives C ϭ͓Li ␶(e Ϫ1/␬)͔Ϫ1and hencep k ϭk Ϫ␶e Ϫk /␬Li ␶͑e Ϫ1/␬͒for k у1,͑17͒where Li n (x )is the n th polylogarithm of x .͓For those unfa-miliar with this function,its salient features for our purposes are that it is zero at x ϭ0and,real,ﬁnite,and monotonically increasing in the range 0рx Ͻ1,for all n .It also decreases with increasing n ,and has a pole at x ϭ1for n р1only,although it has a valid analytic continuation below n ϭ1which takes the value ␨(n )at x ϭ1.͔Substituting Eq.͑17͒into Eq.͑2͒,we ﬁnd that the gener-ating function for graphs with this degree distribution isG 0͑x ͒ϭLi ␶͑xe Ϫ1/␬͒Li ␶͑e Ϫ1/␬͒.͑18͒In the limit ␬→ϱ—the case considered in Refs.͓13͔and ͓23͔—this simpliﬁes toG 0͑x ͒ϭLi ␶͑x ͒␨͑␶͒,͑19͒where ␨(␶)is the Riemann ␨function.The function G 1(x )is given byG 1͑x ͒ϭLi ␶Ϫ1͑xe Ϫ1/␬͒x Li ␶Ϫ1͑e Ϫ1/␬͒.͑20͒M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E 64026118Thus,for example,the average number of neighbors of a randomly chosen vertex iszϭG0Ј͑1͒ϭLi␶Ϫ1͑eϪ1/␬͒Li␶͑eϪ1/␬͒,͑21͒and the average number of second neighbors isz2ϭG0Љ͑1͒ϭLi␶Ϫ2͑eϪ1/␬͒ϪLi␶Ϫ1͑eϪ1/␬͒Li␶͑eϪ1/␬͒.͑22͒(d)Graphs with arbitrary speciﬁed degree distribution.In some cases we wish to model speciﬁc real-world graphs that have known degree distributions—known because we can measure them directly.A number of the graphs described in the Introduction fall into this category.For these graphs,we know the exact numbers n k of vertices having degree k,and hence we can write down the exact generating function for that probability distribution in the form of aﬁnite polynomialG0͑x͒ϭ͚kn k x k͚kn k,͑23͒where the sum in the denominator ensures that the generating function is properly normalized.As an example,suppose that in a community of1000people,each person knows between zero andﬁve of the others,the exact numbers of people in each category being,from zero toﬁve:͕86,150,363,238,109,54͖.This distribution will then be gen-erated by the polynomialG0͑x͒ϭ86ϩ150xϩ363x2ϩ238x3ϩ109x4ϩ54x51000.͑24͒ponent sizesWe are now in a position to calculate some properties of interest for our graphs.First let us consider the distribution of the sizes of connected components in the graph.Let H1(x)be the generating function for the distribution of the sizes of components that are reached by choosing a random edge and following it to one of its ends.We explicitly ex-clude from H1(x)the giant component,if there is one;the giant component is dealt with separately below.Thus,except when we are precisely at the phase transition where the giant component appears,typical component sizes areﬁnite,and the chances of a component containing a closed loop of edges goes as NϪ1,which is negligible in the limit of large N.This means that the distribution of components generated by H1(x)can be represented graphically as in Fig.3;each component is treelike in structure,consisting of the single site we reach by following our initial edge,plus any number ͑including zero͒of other treelike clusters,with the same size distribution,joined to it by single edges.If we denote by q k the probability that the initial site has k edges coming out of it other than the edge we came in along,then,making use of the‘‘powers’’property of Sec.II A,H1(x)must satisfy a self-consistency condition of the formH1͑x͒ϭxq0ϩxq1H1͑x͒ϩxq2͓H1͑x͔͒2ϩ•••.͑25͒However,q k is nothing other than the coefﬁcient of x k in the generating function G1(x),Eq.͑9͒,and hence Eq.͑25͒can also be writtenH1͑x͒ϭxG1...H1͑x͒. (26)If we start at a randomly chosen vertex,then we have one such component at the end of each edge leaving that vertex, and hence the generating function for the size of the whole component isH0͑x͒ϭxG0...H1͑x͒. (27)In principle,therefore,given the functions G0(x)and G1(x),we can solve Eq.͑26͒for H1(x)and substitute into Eq.͑27͒to get H0(x).Then we canﬁnd the probability that a randomly chosen vertex belongs to a component of size s by taking the s th derivative of H0.In practice,unfortunately, this is usually impossible;Equation͑26͒is a complicated and frequently transcendental equation,which rarely has a known solution.On the other hand,we note that the coefﬁ-cient of x s in the Taylor expansion of H1(x)͑and therefore also the s th derivative͒are given exactly by only sϩ1itera-tions of Eq.͑27͒,starting with H1ϭ1,so that the distribution generated by H0(x)can be calculated exactly toﬁnite order inﬁnite time.With current symbolic manipulation programs, it is quite possible to evaluate theﬁrst one hundred or soderivatives in this way.Failing this,an approximate solution can be found by numerical iteration and the distribution of cluster sizes calculated from Eq.͑4͒by numerical differen-tiation.Since direct evaluation of numerical derivatives is prone to machine-precision problems,we recommend evalu-ating the derivatives by numerical integration of the Cauchy formula,giving the probability distribution P s of cluster sizes thus:P sϭ1s!d s H0dz sͯzϭ0ϭ12iͶH0͑z͒z sϩ1dz.͑28͒FIG.3.Schematic representation of the sum rule for the con-nected component of vertices reached by following a randomly cho-sen edge.The probability of each such component͑left-hand side͒can be represented as the sum of the probabilities͑right-hand side͒of having only a single vertex,having a single vertex connected to one other component,or two other components,and so forth.The entire sum can be expressed in closed form as Eq.͑26͒.RANDOM GRAPHS WITH ARBITRARY DEGREE...PHYSICAL REVIEW E64026118The best numerical precision is obtained by using the largestpossible contour,subject to the condition that it encloses nopoles of the generating function.The largest contour forwhich this condition is satisﬁed in general is the unit circle ͉z͉ϭ1͑see Sec.II A͒,and we recommend using this contour for Eq.͑28͒.It is possible toﬁnd theﬁrst thousand deriva-tives of a function without difﬁculty using this method͓43͔.D.The mean component size,the phase transition,and thegiant componentAlthough it is not usually possible toﬁnd a closed-formexpression for the complete distribution of cluster sizes on agraph,we canﬁnd closed-form expressions for the averageproperties of clusters from Eqs.͑26͒and͑27͒.For example,the average size of the component to which a randomly cho-sen vertex belongs,for the case where there is no giant com-ponent in the graph,is given in the normal fashion by͗s͘ϭH0Ј͑1͒ϭ1ϩG0Ј͑1͒H1Ј͑1͒.͑29͒From Eq.͑26͒we haveH1Ј͑1͒ϭ1ϩG1Ј͑1͒H1Ј͑1͒,͑30͒and hence͗s͘ϭ1ϩG0Ј͑1͒1ϪG1Ј͑1͒ϭ1ϩz12z1Ϫz2,͑31͒where z1ϭz is the average number of neighbors of a vertex and z2is the average number of second neighbors.We see that this expression diverges whenG1Ј͑1͒ϭ1.͑32͒This point marks the phase transition at which a giant com-ponentﬁrst appears.Substituting Eqs.͑2͒and͑9͒into Eq.͑32͒,we can also write the condition for the phase transition as͚kk͑kϪ2͒p kϭ0.͑33͒Indeed,since this sum increases monotonically as edges are added to the graph,it follows that the giant component exists if and only if this sum is positive.This result has been de-rived by different means by Molloy and Reed͓40͔.An equivalent and intuitively reasonable statement,which can also be derived from Eq.͑31͒,is that the giant component exists if and only if z2Ͼz1.Our generating function formalism still works when there is a giant component in the graph,but,by deﬁnition,H0(x) then generates the probability distribution of the sizes of components excluding the giant component.This means that H0(1)is no longer unity,as it is for the other generating functions considered so far,but instead takes the value 1ϪS,where S is the fraction of the graph occupied by the giant component.We can use this to calculate the size of the giant component from Eqs.͑26͒and͑27͒thus:Sϭ1ϪG0͑u͒,͑34͒where uϵH1(1)is the smallest non-negative real solution ofuϭG1͑u͒.͑35͒This result has been derived in a different but equivalent form by Molloy and Reed͓41͔,using different methods.The correct general expression for the average component size,excluding the͑formally inﬁnite͒giant component,if there is one,is͗s͘ϭH0Ј͑1͒H0͑1͒ϭ1H0͑1͒ͫG0…H1͑1͒…ϩG0Ј…H1͑1͒…G1…H1͑1͒…1ϪG1Ј…H1͑1͒…ͬϭ1ϩzu2͓1ϪS͔͓1ϪG1Ј͑u͔͒,͑36͒which is equivalent to Eq.͑31͒when there is no giant com-ponent(Sϭ0,uϭ1).For example,in the ordinary random graph with Poisson degree distribution,we have G0(x)ϭG1(x)ϭe z(xϪ1)͓Eq.͑12͔͒,and hence weﬁnd simply that1ϪSϭu is a solution of uϭG0(u),or equivalently thatSϭ1ϪeϪzS.͑37͒The average component size is given by͗s͘ϭ11ϪzϩzS.͑38͒These are both well-known results͓1͔.For graphs with purely power-law distributions͓Eq.͑17͒with␬→ϱ͔,S is given by Eq.͑34͒with u the smallest non-negative real solution ofuϭLi␶Ϫ1͑u͒u␨͑␶Ϫ1͒.͑39͒For all␶р2this gives uϭ0,and hence Sϭ1,implying that a randomly chosen vertex belongs to the giant component with probability tending to1as␬→ϱ.For graphs with␶Ͼ2,the probability of belonging to the giant component is strictly less than1,even for inﬁnite␬.In other words,the giant component essentiallyﬁlls the entire graph for␶р2, but not for␶Ͼ2.These results have been derived by differ-ent means by Aiello et al.͓13͔.E.Asymptotic form of the cluster size distributionA variety of results are known about the asymptotic prop-erties of the coefﬁcients of generating functions,some of which can usefully be applied to the distribution of cluster sizes P s generated by H0(x).Close to the phase transition, we expect the tail of the distribution P s to behave asP sϳsϪ␣eϪs/s*,͑40͒M.E.J.NEWMAN,S.H.STROGATZ,AND D.J.WATTS PHYSICAL REVIEW E64026118。