A Robust and Scalable Peer-to-Peer Gossiping Protocol
Performance Analysis of Probabilistic Flooding Using Random Graphs

Konstantinos Oikonomou
Dept. of Informatics, Ionian University
Tsirigoti Square 7, 49100 Corfu, Greece
okon@ionio.gr

Ioannis Stavrakakis
Dept. of Inform. & Telecoms, Univ. of Athens
Panepistimiopolis, Ilissia, 15784, Athens, Greece
ioannis@di.uoa.gr

Abstract

Probabilistic flooding (parameterized by a forwarding probability) has frequently been considered in the past as a means of limiting the large message overhead associated with traditional (full) flooding approaches that are used to disseminate information globally in unstructured peer-to-peer and other networks. A key challenge in using probabilistic flooding is the determination of the forwarding probability so that global network outreach is achieved while keeping the message overhead as low as possible. By showing that a probabilistic flooding network generated by applying probabilistic flooding to a connected random graph network can be bounded by properly parameterized random graph networks, and by invoking random graph theory results, bounds on the value of the forwarding probability are derived that guarantee global network outreach with high probability, while significantly reducing the message overhead. Bounds on the average number of messages (as well as asymptotic expressions) and on the average time required to complete network outreach are also derived, illustrating the benefits of the properly parameterized probabilistic flooding scheme.

1. Introduction

In modern network architectures such as peer-to-peer networks, global node outreach (i.e., reaching all network nodes) is a major challenge. Reaching all nodes in a network is frequently required either to disseminate information (e.g., advertise a certain service) or to retrieve information (e.g., service discovery). In structured peer-to-peer networks, the available structure facilitates global network outreach, or reaching the appropriate node, with relatively low delay and message overhead [17], [4], [22], [15], [2], [20]. In unstructured peer-to-peer networks, though (e.g., Gnutella [14]), global network outreach is far more challenging to achieve efficiently, as there is no structure to take advantage of when designing an effective scheme. As a result, a brute-force approach is typically followed, implemented through resource-wasteful approaches such as flooding [14], [9], [8], [10].

Traditional flooding, which traverses all network links and reaches all network nodes, is not an efficient approach, as it requires a number of messages equal to the number of network links. In view of the typically large size of peer-to-peer networks in terms of nodes and links, it is clear that traditional flooding would not be effective in such environments. However, flooding is frequently considered for comparison purposes and in order to establish the relative efficiency of alternative schemes.

Many variations of traditional flooding have been proposed for service discovery in unstructured environments.
In Gnutella [14], a TTL (Time-To-Live) value is used to restrict message flooding to a small number of hops around the node that has initiated the searching process (this node will be referred to as the initiator node). This approach may be scalable for small values of TTL, but at the same time it significantly reduces the probability of locating the requested node(s) of interest in large peer-to-peer networks.

Random walks, e.g., [9], [18], have been proposed to reduce the total number of messages by sending a limited number of special messages (agents) into the network. Each of them follows its own path by randomly choosing the next-hop node. Messages terminate their walk either after some time (e.g., TTL expiration), or after checking with the initiator node and learning that the node of interest has already been discovered by another message, or a combination of both. Hybrid probabilistic schemes (e.g., a local flooding process initiated after a random walk) have also been proposed and analyzed [8], as well as other schemes that adapt the employed TTL values in a probabilistic manner [10]. Another modification [24] allows network nodes to forward messages to their neighbors in a random manner, thus significantly reducing the number of messages in the network.

The aforementioned idea of reducing the messages of traditional flooding by selectively choosing the next-hop nodes also lies behind probabilistic flooding [6], [26], [12], [23]. Under probabilistic flooding, messages are forwarded to neighbor nodes based on a certain forwarding probability. There is clearly a trade-off between the induced total number of messages and the number of nodes that are actually reached by such messages: the smaller the probability, the smaller the message overhead and the larger the set of nodes in the network not being accessed through these messages.

The work presented here investigates probabilistic flooding when the underlying network is a random graph [13], and aims at designing such a scheme in a way that the aforementioned trade-off is well managed; that is, achieving high node reachability with a relatively small number of messages. Analytical tools and results borrowed from random graph theory [13], [7] are used for analyzing probabilistic flooding. One of the main contributions of this work is establishing a connection between random graphs and the probabilistic flooding network; the latter is defined to be the network consisting of the (sub)set of links and nodes of the underlying random graph network that are traversed by the messages under probabilistic flooding.

It should be noted that the idea of randomly choosing the next neighbor node is not new and has been the subject of many research works in the past [6], [26], [12], [23], [5], [25], [16], [21], [1], [27]. Even though many of these works are related to probabilistic flooding (e.g., [6], [26], [12], [23]), none of them has addressed the problem of analytically deriving bounds on the value of the forwarding probability that achieves global node outreach over a random graph. In fact, another contribution of this work is the derivation of analytical bounds on the appropriate value of the forwarding probability, defined to be the value for which the probabilistic flooding network contains (with high probability) all network nodes (i.e., all nodes are reached) using the smallest possible number of messages. To the best of the authors' knowledge, this is the first time that such a result has been derived for this particular environment.
Equally important is the use of random graph theory for the analysis of a particular algorithm (i.e., probabilistic flooding), as it may trigger more such considerations in this research area and facilitate the study of information dissemination schemes from a new perspective.

Finally, another contribution of this work is the derivation of an upper bound on the (average) total number of messages under probabilistic flooding. It turns out that this number (under the appropriate value of the forwarding probability) is significantly smaller than that induced under traditional flooding. However, as is analytically shown in this paper, the price paid for this reduction is (a) an increase of the time required to reach all network nodes (the global outreach time), and (b) that global node outreach is achieved with high probability, as opposed to with certainty under traditional flooding.

Section 2 summarizes important results from random graph theory that will be used throughout this work. Section 3 presents the probabilistic flooding scheme and Section 4 discusses its connection to random graphs. Analytical results are presented in Section 5 and conclusions are drawn in Section 6.

2. The Random Graph Model

Usually a network is represented by a graph $G(V, L)$, where $V$ is the set of nodes and $L$ is the set of (bidirectional) links connecting the nodes. For example, if a link $(u, v)$ exists between node $u$ and node $v$, then $(u, v) \in L$. Random graphs, mainly introduced by the pioneering work of P. Erdős and A. Rényi [13], have some properties that help to shed light on various aspects of networks. These properties appear in many different networks: social contacts, biological networks, telecommunication networks, etc. [3], [19]. In the sequel, a random graph (and the corresponding network) will be represented by $G_p(N)$, where $N$ is the number of nodes in the network and $p$ is an independent probability that a link exists between any pair of network nodes [7]. In most cases, as is also the case in this work, $N$ is considered to be significantly large.

A simple construction model to create a $G_p(N)$ network [7], [3], [19] is to consider at the beginning only one node present in the network (e.g., node 0) and to assume that nodes entering the network in any order (e.g., node 1 enters first, followed by node 2, etc.) follow the next rule: each node arriving in the network creates a link with any of the already existing nodes with probability $p$, and does not create the particular link with probability $1-p$. Consequently, when node 1 enters the network (only node 0 is present), a link is created (or not) with probability $p$ (or $1-p$). When node 2 enters the network (and nodes 0 and 1 are already present), a link is created (or not) between node 2 and node 0 with probability $p$ (or $1-p$), and another link is created (or not) between node 2 and node 1 with probability $p$ (or $1-p$).
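As a minimal illustration (not part of the original paper), the construction model above can be sketched in a few lines of Python; the helper name random_graph and its parameters are our own choices. It builds $G_p(N)$ exactly by the sequential rule just described and compares the resulting link count with the expected $p\,N(N-1)/2$.

```python
import random

def random_graph(n, p, seed=None):
    """Sequential construction of G_p(N): each arriving node links to
    every node already present with independent probability p.
    Returns the graph as a list of adjacency sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for v in range(1, n):            # node v enters the network
        for u in range(v):           # nodes 0 .. v-1 are already present
            if rng.random() < p:     # link (u, v) is created with probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj

# The expected number of links is p * N * (N - 1) / 2.
g = random_graph(1000, 0.01, seed=42)
print(sum(len(nbrs) for nbrs in g) // 2, 0.01 * 1000 * 999 / 2)
```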
By the time the $N$-th node enters the network, there will be on average $p\frac{N(N-1)}{2}$ links in $G_p(N)$ [7]. From the aforementioned construction process, it is evident that for $p = 0$ there are no links in the resulting graph, whereas for $p = 1$ the resulting graph is the complete graph (i.e., it contains all possible links among the $N$ nodes, which amount to $\frac{N(N-1)}{2}$).

At this point it is important to note that in most cases the arguments are made with high probability (w.h.p.). For example, for $p < \frac{1}{N}$, $\Pr[\text{the giant component exists}] \to 0$, while for $p > \frac{1}{N}$, $\Pr[\text{the giant component exists}] \to 1$ [18]. Actually, at $p = \frac{1}{N}$, where the shape of the network suddenly changes, a phase transition phenomenon takes place [7]. For $p = \frac{\log(N)}{N}$, all nodes become part of the giant component and the network becomes connected w.h.p. [7]. Thus, for any value of $p \geq \frac{\log(N)}{N}$, $G_p(N)$ is connected w.h.p.

The average number of links, $|L|$, for the network corresponding to $G_p(N)$ when $p = \frac{\log(N)}{N}$ [7] is given by $|L| = \frac{1}{2}(N-1)\log(N)$. The diameter of the resulting connected network, denoted by $D$, has been proved to "concentrate around" $\frac{\log(N)}{\log(pN)}$ [3], [11], which allows for $D \approx \frac{\log(N)}{\log(pN)}$.

3. Probabilistic Flooding

Under probabilistic flooding [6], the initiator node sends a message to each of its neighbor nodes with an (independent) forwarding probability $p_f$. Any node receiving such a message forwards it to each of its own neighbor nodes (except the node the message arrived from) with probability $p_f$. Clearly, for $p_f = 0$ no messages are sent in the network, while for $p_f = 1$ probabilistic flooding reduces to traditional flooding. As a result of probabilistic flooding, a network can be defined that consists of the set of nodes that have been reached by the messages and the set of links over which these messages have been forwarded. This particular network will be referred to hereafter as the probabilistic flooding network. It is easy to show (based on the definition of probabilistic flooding) that the probabilistic flooding network is actually a connected network, each link of which corresponds to exactly one forwarded message.

The main objective in this paper is to derive appropriate values of the forwarding probability that yield a probabilistic flooding network that includes all network nodes w.h.p. (i.e., all nodes are reached under probabilistic flooding) while, at the same time, the average number of links contained in this probabilistic flooding network is as small as possible (to keep the average total number of messages small).

Consider a connected (i.e., $p \geq \frac{\log(N)}{N}$) random graph network $G_p(N)$ as the underlying network. Let $F_{p_f}(G_p(N))$ denote the probabilistic flooding network generated over the (random graph) network $G_p(N)$ when probabilistic flooding is employed with probability $p_f$. Under probabilistic flooding, a message is forwarded with probability $p_f$ over each of the links of $G_p(N)$ attached to a node (except the link from which the message arrived). As a link connects two different nodes, and these nodes may each receive a message through a different link (one of the other links attached to them), it is possible that both nodes attempt a message transmission over this common link (at the same or at different times). This will happen, for example, if one of the nodes receives a message first through a different link and makes a failed attempt to forward a message over the common link (with probability $1-p_f$), and the other node later receives a message (through another link) and consequently attempts a message forwarding over the common link (which again fails with probability $1-p_f$). In such cases, a link has two opportunities to forward a message and thus become part of $F_{p_f}(G_p(N))$. Other links have only one opportunity, though; for instance, this is the case when a message forwarding attempt over a link fails and the node at the other end of the link never receives a message through one of its other links. Links of $G_p(N)$ which have only one opportunity to forward a message will be included in $F_{p_f}(G_p(N))$ with probability $p_f$, while links of $G_p(N)$ which have two opportunities to forward a message will be included in $F_{p_f}(G_p(N))$ with probability $1-(1-p_f)(1-p_f) = 2p_f - p_f^2$ (to simplify the notation, let $\tilde{p} = 2p_f - p_f^2$).

By construction, the resulting probabilistic flooding network over a connected $G_p(N)$ network, $F_{p_f}(G_p(N))$, seems to have a certain resemblance to random graphs. Such observations, allowing for the use of random graph theory for the analysis of probabilistic flooding, are discussed and taken advantage of in the following section.
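To make the scheme concrete, probabilistic flooding can also be simulated directly. The sketch below is illustrative only (it is not from the paper) and builds on the random_graph helper from the previous snippet; it assumes that a node makes one forwarding decision per incident link upon first receiving the message, which matches the one-or-two-opportunities-per-link accounting above.

```python
import math
import random
from collections import deque

def probabilistic_flood(adj, initiator, p_f, seed=None):
    """Simulate probabilistic flooding over `adj` (a list of adjacency
    sets) and return the node and link sets of F_{p_f}(G_p(N))."""
    rng = random.Random(seed)
    reached = {initiator}
    used_links = set()                  # links that carried a message
    queue = deque([(initiator, None)])  # (node, node it was reached from)
    while queue:
        v, parent = queue.popleft()
        for u in adj[v]:
            if u == parent:
                continue                # never forward back over the arrival link
            if rng.random() < p_f:      # independent forwarding decision
                used_links.add(frozenset((v, u)))
                if u not in reached:    # a node forwards only on first reception
                    reached.add(u)
                    queue.append((u, v))
    return reached, used_links

# Underlying network G_0.05(1000), flooded with the "safe" p_f = log(N)/(pN):
n, p = 1000, 0.05
g = random_graph(n, p, seed=1)          # helper from the previous sketch
reached, links = probabilistic_flood(g, 0, math.log(n) / (p * n), seed=2)
print(len(reached) / n, len(links))     # fraction of nodes reached, links used
```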
4. Random Graph Network Representation of Probabilistic Flooding

Given that each node of a connected network is associated with at least one link, and most likely with several, removing a link from a network does not necessarily disconnect (or remove) an associated node as well. In other words, the decrease in the number of nodes in a network as a result of a decrease in the number of links is expected to be lower than the decrease in the number of links. Consequently, it is conceivable that all network nodes continue to be included in a network (i.e., remain connected) with high probability (w.h.p.) despite the removal of a number of links. This observation suggests that a probabilistic flooding network with a sufficiently high forwarding probability may still keep all the nodes connected and in the network, despite a potentially significant removal of links due to decisions not to forward a message over such links.

In view of the above discussion, it is evident that as $p_f$ decreases, the number of links in $F_{p_f}(G_p(N))$ decreases as well, while the number of nodes in $F_{p_f}(G_p(N))$ decreases at a lower rate. Consequently, for a small reduction of $p_f$ below the value of 1, it is expected that all network nodes are still included in $F_{p_f}(G_p(N))$ w.h.p. It is thus expected that there is a certain value of the forwarding probability, denoted by $p_{f,0}$, such that: (a) if $p_f < p_{f,0}$, then the probabilistic flooding network does not include all network nodes w.h.p.; (b) if $p_f \geq p_{f,0}$, then the probabilistic flooding network does include all network nodes w.h.p. $p_{f,0}$ will be referred to as the appropriate value of the forwarding probability. The determination of $p_{f,0}$ is not an easy task, and the focus in the sequel is on the analytical derivation of upper and lower bounds.

First consider the $G_{p \times p_f}(N)$ random graph. $G_{p \times p_f}(N)$ can be constructed using the construction model presented in Section 2, or simply by considering $G_p(N)$ and then independently selecting each link of $G_p(N)$ with probability $p_f$. Keeping in mind that the probabilistic flooding network is created by independently selecting links from $G_p(N)$ with probability $p_f$ for some of them and with probability $\tilde{p}$ for the others, it is evident that $F_{p_f}(G_p(N))$ contains on average more links than $G_{p \times p_f}(N)$. Consequently, when $G_{p \times p_f}(N)$ is connected w.h.p., then $F_{p_f}(G_p(N))$ is also connected w.h.p. and thus includes all network nodes w.h.p. Before proceeding, it is interesting to see whether $F_{p_f}(G_p(N))$ can have (on average) exactly as many links as $G_{p \times p_f}(N)$. This is true when all the links of $G_p(N)$ are selected with the same probability $p_f$ under probabilistic flooding; this is the case, for example, when $G_p(N)$ is actually a tree, since then all links are selected with the same probability $p_f$.

A second observation relates $F_{p_f}(G_p(N))$ and $G_{p \times \tilde{p}}(N)$. $G_{p \times \tilde{p}}(N)$ can be created by independently selecting links from $G_p(N)$ with the same probability $\tilde{p}$. Clearly, $G_{p \times \tilde{p}}(N)$ contains (on average) more links than $G_{p \times p_f}(N)$ (note that $p_f \leq 2p_f - p_f^2$, the equality holding for $p_f = 1$), and when the latter network is connected w.h.p. the former is also connected w.h.p. However, in contrast to the previous case, $F_{p_f}(G_p(N))$ may not (on average) be as dense as $G_{p \times \tilde{p}}(N)$. This is easily concluded, since under probabilistic flooding there exists at least one link that has been selected with probability $p_f$; actually, there are more than one: all links over which messages have been forwarded for the first time to a particular node (e.g., from the initiator node to its neighbor nodes). Consequently, $F_{p_f}(G_p(N))$ contains (on average) fewer links than $G_{p \times \tilde{p}}(N)$.

From the previous observations it is now possible to derive analytical bounds on $p_{f,0}$. From the discussion presented in Section 2, the random graph $G_{p \times p_f}(N)$ becomes connected w.h.p. for $p\,p_f = \frac{\log(N)}{N}$, or $p_f = \frac{\log(N)}{pN}$. $G_{p \times \tilde{p}}(N)$ becomes connected w.h.p. for $p\,\tilde{p} = \frac{\log(N)}{N}$, or $p(2p_f - p_f^2) = \frac{\log(N)}{N}$, or $pNp_f^2 - 2pNp_f + \log(N) = 0$. The latter polynomial has two solutions for $p_f$:

$p_{f,1\text{-}2} = \frac{2pN \pm \sqrt{4p^2N^2 - 4pN\log(N)}}{2pN} = 1 \pm \sqrt{1 - \frac{\log(N)}{pN}}.$

Note that since $p \geq \frac{\log(N)}{N}$ ($G_p(N)$ is connected w.h.p.), $1 - \frac{\log(N)}{pN} \geq 0$. The solution that satisfies $0 \leq p_f \leq 1$ is given by $p_{f,1} = 1 - \sqrt{1 - \frac{\log(N)}{pN}}$.

As $p_f$ increases, by the time $G_{p \times p_f}(N)$ becomes connected w.h.p., $F_{p_f}(G_p(N))$ has already become connected as well and includes (on average) all network nodes w.h.p., since it contains on average more links than $G_{p \times p_f}(N)$ (see earlier). Therefore, it is evident that $p_{f,0} \leq \frac{\log(N)}{pN}$ is satisfied (the equality holds only in the particular case that all links under probabilistic flooding are selected with probability $p_f$). On the other hand, the random graph $G_{p \times \tilde{p}}(N)$ contains more links (on average) than $F_{p_f}(G_p(N))$ and therefore becomes connected w.h.p. for smaller values of $p_f$. Therefore, $p_{f,0} > 1 - \sqrt{1 - \frac{\log(N)}{pN}}$. Eventually,

$1 - \sqrt{1 - \frac{\log(N)}{pN}} < p_{f,0} \leq \frac{\log(N)}{pN}.$    (1)

The latter inequality is an important one, since it bounds the particular value of the forwarding probability $p_f$ for which all network nodes are reachable w.h.p. with the smallest possible number of messages. Actually, the observation that $F_{p_f}(G_p(N))$ "lays" between $G_{p \times p_f}(N)$ and $G_{p \times \tilde{p}}(N)$ allows for (a) the use of the aforementioned "safe" value for $p_f$ to ensure that all network nodes are reached, and (b) the derivation of analytical upper bounds (i.e., worst-case scenarios) with respect to the total number of messages and the termination time. The corresponding analysis is the focus of the following section.

5. Analysis

For comparison purposes, traditional flooding is also considered here. Given that the number of edges of a random graph is on average $p\frac{N(N-1)}{2}$, the average number of messages under traditional flooding, $M_t$, is given by $M_t = p\frac{N(N-1)}{2}$. Since for $p_f = \frac{\log(N)}{pN}$, $F_{p_f}(G_p(N))$ contains all network nodes of $G_p(N)$ w.h.p., it is useful to derive the corresponding average number of messages, $M_p$. Clearly, $M_p$ corresponds to the average number of network links, and it is already known that $M_p$ lays (on average) between the average number of links of $G_{p \times p_f}(N)$ and that of $G_{p \times \tilde{p}}(N)$. The average number of links for $G_{p \times p_f}(N)$ (resp. $G_{p \times \tilde{p}}(N)$) is equal to $p\,p_f\frac{N(N-1)}{2}$ (resp. $p\,\tilde{p}\frac{N(N-1)}{2}$), and for $p_f = \frac{\log(N)}{pN}$ this number becomes $\frac{1}{2}(N-1)\log(N)$ (resp. $\left(1 - \frac{\log(N)}{2pN}\right)(N-1)\log(N)$). Eventually,

$\frac{1}{2}(N-1)\log(N) \leq M_p < \left(1 - \frac{\log(N)}{2pN}\right)(N-1)\log(N).$    (2)

From Equation (2) it is clear that $M_p$ increases with $N$ as $N\log(N)$, while $M_t$ increases as $N^2$. Consequently, probabilistic flooding can reduce the message overhead substantially when $N$ is large. Let $R_M = \frac{M_p^u}{M_t}$, where $M_p^u$ denotes the upper bound on the average number of messages under probabilistic flooding (i.e., $M_p^u = \left(1 - \frac{\log(N)}{2pN}\right)(N-1)\log(N)$). Eventually,

$R_M = \frac{2\log(N)}{pN} - \left(\frac{\log(N)}{pN}\right)^2.$    (3)

As expected from the previous discussion, $R_M \to 0$ as $N \to +\infty$. Figure 1.a illustrates the message overhead savings by plotting $R_M$ versus $N$ ($100 \leq N \leq 2000$) for various values of $p$. Note that throughout this work $N$ is considered to be significantly large, which is also the regime in which the random graph results invoked here hold. For instance, for a network $G_{0.8}(1000)$, global network outreach can be achieved under probabilistic flooding with $p_f$ in the range $(0.00188, 0.00375)$ (see Equation (1)) with only around 1% of the messages expected under traditional flooding.

The overhead savings of probabilistic flooding come at the following costs. First, the global network outreach achieved under traditional flooding with certainty is now achieved only with high probability; that is, global outreach is only probabilistically guaranteed.

[Figure 1. $R_M$ (a) and $R_t$ (b) as a function of $N$ for various values of $p$.]

The second cost paid for the message overhead savings concerns the time to complete the global outreach. The average maximum such time (assuming that the initiator is located at the network boundary) is equal to the network diameter. Thus, $t_t = \frac{\log(N)}{\log(pN)}$ under traditional flooding, since the diameter of $G_p(N)$ is equal to $\frac{\log(N)}{\log(pN)}$. Since the diameter of $F_{p_f}(G_p(N))$ lays between those of $G_{p \times p_f}(N)$ and $G_{p \times \tilde{p}}(N)$ w.h.p., the average global outreach time $t_p$ is bounded as follows:

$\frac{\log(N)}{\log(p\tilde{p}N)} < t_p \leq \frac{\log(N)}{\log(p\,p_f N)}.$    (4)

Let $t_p^u$ denote the upper bound in Equation (4) and let $R_t = \frac{t_p^u}{t_t}$. Eventually,

$R_t = \frac{\log(pN)}{\log(\log(N))}.$    (5)

Since the diameter of $F_{p_f}(G_p(N))$ can never be smaller than that of $G_p(N)$, $R_t \geq 1$; equality holds when the two diameters are equal, in which case the global network outreach times are also equal. $R_t$ shows the time required to achieve global outreach in $F_{p_f}(G_p(N))$ as a multiple of that in $G_p(N)$. Figure 1.b illustrates $R_t$ as a function of $N$ for various values of $p$. For instance, for $G_{0.8}(1000)$, the global network outreach time under probabilistic flooding is about 3.5 times that under traditional flooding. Recall from Figure 1.a that, for the same example, only 1% of the total number of messages under traditional flooding was needed (on average) for global network outreach under probabilistic flooding.
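The closed-form bounds above are cheap to evaluate numerically. The sketch below is ours, not the paper's, and reproduces the $G_{0.8}(1000)$ figures. One caveat we flag as our own observation: the quoted $p_f$ range and the roughly 1% message figure are recovered with base-10 logarithms, while the 3.5x outreach-time figure matches natural logarithms, so the sketch prints both variants.

```python
import math

def flooding_bounds(n, p, log=math.log):
    """Evaluate Equations (1), (3) and (5) for G_p(N)."""
    x = log(n) / (p * n)
    p_f_lower = 1 - math.sqrt(1 - x)  # strict lower bound on p_f,0, Eq. (1)
    p_f_upper = x                     # upper ("safe") bound on p_f,0, Eq. (1)
    r_m = 2 * x - x ** 2              # relative message overhead, Eq. (3)
    r_t = log(p * n) / log(log(n))    # relative outreach time, Eq. (5)
    return p_f_lower, p_f_upper, r_m, r_t

# The paper's example, G_0.8(1000):
print(flooding_bounds(1000, 0.8, log=math.log10))  # ~(0.00188, 0.00375, 0.0075, 6.08)
print(flooding_bounds(1000, 0.8))                  # ~(0.00433, 0.00863, 0.0172, 3.46)
```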
6. Conclusions

In this work, the problem of information dissemination in large, unstructured networks is considered, and specifically the focus has been on schemes that can achieve global network outreach. Global network outreach is available in a network if the employed information dissemination scheme is capable of taking a message from any originating node to any other network node. Such network outreach is needed in order to support routing protocols, advertise a new service, search for some information, etc. Traditional flooding schemes achieve global network outreach in unstructured networks with certainty (deterministically, for a connected network) at a large message overhead cost. In this paper, probabilistic flooding schemes have been considered in order to reduce this large overhead, at the price of providing probabilistic global network outreach guarantees.

It is shown here that the network created by probabilistic flooding over a random graph network lays between two random graph networks, which are determined here, facilitating in this way the derivation of analytical bounds on the value of the forwarding probability that results in a fairly decreased (compared to traditional flooding) number of messages, while achieving global node outreach with high probability (w.h.p.). In particular, it is shown that the probabilistic flooding network $F_{p_f}(G_p(N))$, generated by applying some forwarding probability $p_f$ over a connected random graph underlying network $G_p(N)$, "lays" between two random graph networks: $G_{p \times p_f}(N)$ and $G_{p \times \tilde{p}}(N)$. Actually, the probabilistic flooding network $F_{p_f}(G_p(N))$ includes (on average) at least as many links as $G_{p \times p_f}(N)$ and at most as many as $G_{p \times \tilde{p}}(N)$. If $G_{p \times p_f}(N)$ is a connected network w.h.p., then the probabilistic flooding network contains all network nodes w.h.p. The latter observation leads to the determination of a "safe" value for the forwarding probability $p_f$ that ensures w.h.p. that all nodes are reached under probabilistic flooding, even though some extra messages might be required compared to the case of the optimal value, which is hard to determine.

A comparison of probabilistic flooding under the safe forwarding probability with traditional flooding is carried out. It is shown that the number of messages under probabilistic flooding increases as $N\log(N)$, as opposed to $N^2$ under traditional flooding. Thus, significant message overhead reduction can be achieved, especially for large networks (large $N$). The relative message overhead (compared to that under traditional flooding) is also derived and shown to yield substantial message overhead savings even for low to medium values of $N$. However, as mentioned, the global network outreach achieved under traditional flooding with certainty is now achieved only with high probability. Finally, the increase in the time needed to get the message across the network is also derived under both schemes and compared. As expected, this time is higher under probabilistic flooding (the price paid for the message reduction).

Although theoretical results from random graphs support the w.h.p. claims made in various places of this paper, it would be useful to assess these theoretical statements by simulating connected random graph networks of various sizes and measuring the node outreach percentile; that is, measuring the probability that at least x% of the nodes are included in the probabilistic flooding network and are, thus, reachable.

7. Acknowledgments

This work has been supported in part by the project Autonomic Network Architecture (ANA) (IST-27489), which is funded by the IST FET Program of the European Commission.

References

[1] A. Ganesh, L. Massoulié, and D. Towsley. The effect of network topology on the spread of epidemics. In INFOCOM 2005, volume 2, pages 1455–1466, 13–17 March 2005.
[2] B. Y. Zhao, J. Kubiatowicz, and A. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Tech. Rep. UCB/CSD-01-1141, UCB, 2001.
[3] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74:47–97, 2002.
[4] B. Awerbuch and D. Peleg. Concurrent online tracking of mobile users. In ACM SIGCOMM Symposium on Communication Architectures and Protocols, 1991.
[5] B. Williams, D. Mehta, T. Camp, and W. Navidi. Predictive models to rebroadcast in mobile ad hoc networks. IEEE Transactions on Mobile Computing (TMC), 3:295–303, 2004.
[6] F. Banaei-Kashani and C. Shahabi. Criticality-based analysis and design of unstructured peer-to-peer networks as "complex systems". In Proceedings of the Third International Symposium on Cluster Computing and the Grid, pages 351–358, 2003.
[7] B. Bollobás. Random Graphs. Cambridge University Press, second edition, 1999.
[8] C. Gkantsidis, M. Mihail, and A. Saberi. Hybrid search schemes for unstructured peer-to-peer networks. In IEEE INFOCOM 2005, 2005.
[9] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker. Search and replication in unstructured peer-to-peer networks. In ICS 2002, 2002.
[10] N. B. Chang and M. Liu. Optimal controlled flooding search in a large wireless network. In Third International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt'05), pages 229–237, 2005.
[11] F. Chung and L. Lu. The diameter of sparse random graphs. Adv. in Appl. Math., 26(4):257–279, 2001.
[12] V. V. Dimakopoulos and E. Pitoura. On the performance of flooding-based resource discovery. IEEE Transactions on Parallel and Distributed Systems, 17(11):1242–1252, November 2006.
[13] P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
[14] Gnutella. Gnutella RFC. http://rfc-gnutella./, 2002.
[15] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of ACM SIGCOMM'01, 2001.
[16] R. Karp, C. Schindelhauer, S. Shenker, and B. Vöcking. Randomized rumor spreading. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, FOCS'00, 2000.
[17] M. van Steen, F. Hauck, and A. Tanenbaum. A model for worldwide tracking of distributed objects. In TINA'96 Conference, 1996.
[18] P. Van Mieghem. Performance Analysis of Communications Networks and Systems. Cambridge University Press, 2006.
[19] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, 2003.
[20] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), pages 329–350, 2001.
[21] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Randomized gossip algorithms. IEEE/ACM Transactions on Networking, 14:2508–2530, 2006.
[22] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A scalable content-addressable network. In Proceedings of ACM SIGCOMM'01, 2001.
[23] D. Tsoumakos and N. Roussopoulos. Adaptive probabilistic search for peer-to-peer networks. In 3rd IEEE International Conference on P2P Computing, 2003.
[24] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti. A local search mechanism for peer-to-peer networks. In CIKM (International Conference on Information and Knowledge Management), 2002.
[25] B. Williams and T. Camp. Comparison of broadcasting techniques for mobile ad hoc networks. In ACM Symposium on Mobile Ad Hoc Networking and Computing (MOBIHOC), pages 194–205, 2002.
[26] Y. Sasson, D. Cavin, and A. Schiper. Probabilistic broadcast for flooding in wireless mobile ad hoc networks. Technical Report IC/2002/54, Swiss Federal Institute of Technology (EPFL).
[27] Z. J. Haas, J. Y. Halpern, and L. Li. Gossip-based ad hoc routing. IEEE/ACM Transactions on Networking, 14:479–491, 2006.
Mix Cascades vs. Peer-to-Peer: Is One Concept Superior?

Rainer Böhme¹, George Danezis², Claudia Díaz³, Stefan Köpsell¹, and Andreas Pfitzmann¹

¹ Technische Universität Dresden, Germany
  {rainer.boehme, sk13, pfitza}@inf.tu-dresden.de
² University of Cambridge, United Kingdom
  george.danezis@
³ Katholieke Universiteit Leuven, Belgium
  claudia.diaz@esat.kuleuven.ac.be

Abstract. After almost two decades of research on anonymous network communication, the development has forked into two main directions, namely Mix cascades and peer-to-peer (P2P) networks. As these design options have implications for the achievable anonymity and performance, this paper aims to elaborate the advantages and disadvantages of either concept. After clarifying the scope of the discussion, we present arguments for Mix cascades and P2P designs in multiple areas of interest: the level of anonymity, the incentives to cooperate, aspects of availability, and performance issues. Pointed theses and antitheses are given for both sides, before a final synthesis tries to articulate the status quo of the discussion.

1 Introduction

David Chaum's initial work [6] on Mixes led to a vast number of proposals on how to provide anonymous communication on the Internet. Some of these approaches have already been put into practice, while others still rest as blueprints. They all have in common that multiple Mixes are used to establish a certain amount of anonymity. The most salient difference between those approaches is the way in which the Mixes are connected and organised. Two idealised concepts set the range of a continuum of possible designs. On the one end, we have Mix cascades: dedicated servers joining traffic from a large set of users and uniformly redirecting it on a predefined route. The other end is defined by peer-to-peer (P2P) systems: widely distributed and equal client applications unpredictably routing their traffic over all possible routes.

The design of a Mix system has implications for the achievable anonymity and performance. This article tries to cover a wide range of aspects which are all related to this essential design decision. It is not so much focused on a detailed analysis of certain specific implementations (e.g., [12, 13, 29]), but rather on the underlying explicit or implicit threat models. As the area of relevant aspects is broad and evidence still sparse, this paper is unlikely to deliver solid proofs. It should rather be regarded as an attempt to collect softer arguments (call them assumptions or beliefs) for both sides of the discussion: arguments which we consider highly important, but which do not fit into the usually better-grounded research papers because they are so hard to quantify [14, 26, 28]. Apart from documenting these points, we try to structure the discussion and eventually make it more salient to the community.

We assume that the reader is already familiar with the basic concepts of Mixes, so that our paper is structured straight off. In the next section, we propose a scheme to arrange the different approaches along two dimensions: flexibility of routing and task sharing (resources as well as trust and liability). In Section 3, we pick up the arguments for Mix cascades and evaluate the impact of the respective designs on four aspects: anonymity, incentives to cooperate, availability, and performance. We underline our claims with a number of pointed theses. In Section 4 we contrast these points with arguments for peer-to-peer systems and assert a set of antitheses. Finally, a concluding synthesis is given in Section 5.

2 Classification of Approaches

To clarify the subject of the discussion, we propose to break down the properties of the different approaches along two dimensions. The first difference between Mix cascades and P2P networks is the freedom of choice in traffic routing. Typical Mix cascades offer no choice to their users, while free choice of link partners, and hence full flexibility of routing, is a key feature of P2P concepts. The advantages and drawbacks of precisely this design issue have been addressed in the literature, for example in [5] and [8]. However, none of the pure concepts seems to dominate the other one in all aspects.

The second difference between Mix cascades and P2P networks concerns the organisation of the system. Mix cascades are typically provided by a small set of entities. Their Mixes are located at a small number of places, usually run on dedicated servers, and are administered by professionals. In contrast to this institutionalised approach, pure P2P networks avoid any exposed instances and organise themselves in a rather anarchical manner. All nodes act as equal peers and there is no allocation of duties. Hence, we name the second dimension task sharing. The division includes technical resources (e.g., load balancing) as well as intangible aspects such as trust and responsibility.

According to these two dimensions, we can arrange the different approaches for anonymity in networks in a two-by-two matrix, as shown in Table 1.

Table 1. Classification of different concepts for anonymous networks

                    Flexibility of routing
  Task sharing      System defined        User's choice
  Asymmetric        Mix cascade           Mix network
  Symmetric         DC-net, Broadcast     P2P network

The pure concept of Mix cascades is characterised by asymmetric task sharing and fixed routes. At the opposite end, P2P networks consist of equal peers performing the same tasks and routing their traffic over a multitude of user-selected routes. Mix networks lie between these two extremes: they allow users to choose their respective routes, but still rely on dedicated Mix servers, which handle different tasks than the multitude of clients. Further topologies for anonymous network communication, such as DC-nets [7] and broadcast [20], exceed the scope of this discussion.

The proposed model helps to structure the topic of discussion, because certain advantages or disadvantages can be precisely linked with one of the dimensions. However, the routing dimension may need a further refinement, because argumentation on the pros and cons of the pure concepts tends to drift somewhere in the middle. For example, the literature responds to the critics of fully connected networks with restricted routes in sparse networks [8]. Also, the recent synchronous batching approach tries to join the predictable timing feature of Mix cascades with decentralised topologies [17]. Thus it might be useful to subdivide the routing dimension into four sections, as shown in Table 2.

Table 2. Extended classification of approaches for network anonymity

                    Flexibility of routing
                    System defined        User's choice
  Task sharing      Fixed     Variable    Restricted    Free
  Asymmetric        A1        A2          A3            A4
  Symmetric         S1        S2          S3            S4

In the remainder of this paper we will discuss the impact of design decisions on each of the dimensions on the total quality of an anonymity service. The next section will show why some scholars believe that the solutions A1 and A2 provide the highest anonymity, coupled with the best performance in the case of A1.
In Section 4 we will elaborate why other researchers consider the solutions S3 and S4, or at least A3 and A4, as more desirable.

3 Arguments for Mix Cascades

The quality of anonymity services can be broadly divided into aspects of security and usability. However, as usability affects the size of the anonymity set, it also matters for the security of a system [2]. Hence, the ideal system can be described as follows: it provides anonymous access to, or sending of, any information (availability), by hiding all information about the recipient's and/or sender's identity (anonymity) within a maximally large set of users (incentives to cooperate), with minimum degradation in usable bandwidth and response time (performance). To evaluate how Mix cascades and P2P networks perform in realising these goals, we analyse the impact of (1) flexibility of routing and (2) task sharing on the above-mentioned aspects.

3.1 Anonymity

The maximum anonymity that can be provided by Mix cascades means that
– all other senders or recipients of the messages of a particular time interval, or
– all Mixes
have to cooperate to trace a message against the intention of its sender or recipient. A proof of this fact, in a possibilistic as well as a probabilistic setting, is given in [25].¹

The effects of flexible routing on anonymity are described in [5]. Assuming a globally listening and participating adversary, the authors conclude: "If only one Mix of a route is trustworthy, then the achievable anonymity is distinctly lower in a Mix network compared to a synchronously working Mix cascade" (p. 37).

Thesis 1. P2P networks assume adversaries outside a closed group of people.

So, it is evident that the requirements for an anonymity service heavily depend on the threat model considered. However, even if the threat model "all other senders or recipients ..., or all Mixes" is deemed too strong, there are good arguments for modifying the parameters rather than replacing the design.
First, the anonymity level of a cascade can be downgraded by reducing the number of Mixes or the number of messages required per batch. Both measures have favourable side effects: they lead to an increase in performance and quality of service. Second, upholding a cascade design keeps the advantages of a clear structure of the system: in contrast to distributed approaches, cascades are easy to analyse and hence easy to prove.² This means that the system is more robust to remaining threats from information outside of the considered model. Third, when the threat model changes because stronger anonymity may be required in the future, the switching costs are kept to moderate levels only if a cascade infrastructure is already set up: changing parameters is far cheaper than switching to a new topology.

¹ The idea of the proof is also outlined in this document: / 2004/talks/pfitza.pdf
² The degree of complexity of P2P systems apparently inspires researchers to write fancy papers. However, as a consequence of this complexity, some really important aspects are regularly either neglected (e.g., exit policies) or simplified with unrealistic assumptions (e.g., a low number or small fraction, respectively, of colluding nodes).

However, on the task sharing dimension, the low number of known Mix servers makes Mix cascades more vulnerable to adversaries that try to gain control over these servers [1]. Cascade proponents argue that symmetric task sharing in P2P networks is even more dangerous, because the peer nodes are run by end-users on out-of-the-box operating systems. This combination is less likely to withstand decent attacks compared to dedicated Mix servers administered by security experts, as in the asymmetric case. Equal peers in a symmetric architecture are also more likely to be down, hence forcing their users to change to another route. Frequent route switching increases the probability of using, at least one time, only collusive nodes.

Thesis 2. P2P proponents err in assuming that the sheer number of nodes implies that only a minority colludes.

This thesis has implications in terms of nodes, and in terms of links. The danger of a powerful attacker participating with a large set of nodes is quite well understood. Some measures have been proposed, such as reputation systems [15, 18], which at least make this attack more difficult. But a node does not necessarily need to be malicious because of its owner's bad intentions. It is also possible that an attacker selectively throws out precisely those nodes which impede his observation. Thus, the often-cited strength of distributed systems against denial of service (DoS) does at the same time attract active attackers, just because routes are dynamically switched and partial failures stay unnoticed.
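The route-switching risk can be quantified with a back-of-the-envelope model. This sketch is ours and deliberately simplified: it assumes each node is corrupt independently with probability q and that successive routes are independent, which real route selection violates (routes share nodes). A route of length l is then fully collusive with probability q^l, and after r route switches the probability of having used at least one fully collusive route is 1 − (1 − q^l)^r.

```python
def ever_fully_corrupt(q, length, switches):
    """Probability that at least one of `switches` independently chosen
    routes consisted only of colluding nodes (each corrupt w.p. q)."""
    return 1 - (1 - q ** length) ** switches

# Illustration: 30% colluding nodes, routes of three Mixes.
for r in (1, 10, 100, 1000):
    print(r, round(ever_fully_corrupt(0.3, 3, r), 4))
# A single route is fully collusive w.p. 0.027, but after 10 switches
# the cumulative risk is ~0.24 and after 100 switches ~0.94.
```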
These observations bear obvious relations to aspects of availability, which are discussed in the next section.

3.2 Availability

Availability of anonymity services involves three steps: first, the possibility to reach an access point; second, the correct operation of the Mixes and their communication; and third, the accessibility of the requested information. The first step concerns possible blocking efforts by authoritarian states or organisations. Depending on the details of the filter method, a powerful blocker can always restrict access to anonymity services. However, it is obvious that publicly known or even central access points are more vulnerable than distributed networks. Concerning the first step, symmetric systems dominate asymmetric ones if, and only if, their protocols do not unveil too much information about the network's topology (which could otherwise be exploited to block packet forwarding, another rarely addressed aspect in the literature).

The second step also seems to be more robust with a symmetric structure, because denial of service attacks would have to hit a large set of peers instead of a few central servers. But Mix cascades can also be equipped with redundant backup servers. In addition, professional administration makes Mix servers less vulnerable than user-run peers. The last step is probably the most difficult to discuss. If the peers used white-lists to manage their exit policies (see below), the probability of getting arbitrary information from the Internet would be quite low. However, institutionalised last Mixes of cascades are also likely to suppress access to certain content, depending on the respective jurisdiction.

Thesis 3. Concerning availability, P2P seems to dominate the pure (no redundancy) cascade approach. Given redundant cascades, no concept is superior.

3.3 Incentives to Cooperate

It is evident that a large user base is a prerequisite for strong anonymity. As a common implication, researchers discuss usability and performance aspects to evaluate the incentives to participate. These aspects depend on both flexibility of routing and task sharing.

Apart from that, also on the task sharing dimension, the responsibility of exit nodes is a very important point which is only marginally addressed in the literature [1]. For a risk-averse user, being accountable for arbitrary crimes that are linkable with one's IP address or computer is the major drawback of P2P structures. If the first case of, say, "music industry sues participant of P2P anonymity service" went public, the user base would be in danger of vanishing. The juridical risk of participation is a major disincentive for users with honourable intentions. Even if we assume that each user can restrict access to a limited set of obviously innocuous websites by administering exit policies (usually black- or white-lists), the cost of managing these policies will exceed the resources of the majority of participants. As black-lists will never completely evade the risk, and demand a huge effort to keep up to date, white-lists might be used.
This will probably exclude most of the requested information from the network and thus render the system useless.

Thesis 4. P2P systems are unlikely to receive support from an ample community.

So, if the masses will be using Mixes at all, strong arguments suggest it will be Mix cascades. Talking about masses means considering the end users, individuals often described as spoiled and reluctant. Here, apart from what has been said before, performance will be an equally critical success factor.

3.4 Performance

P2P proponents often state that centralised routes set an upper limit on the service's performance and that Mix cascades would thus perform worse. While the first statement is true, the second one needs further consideration. In fact, the batch size (or, more generally, the flushing strategy) sets the upper limit on the level of anonymity a service can provide. The frequency of packet arrivals determines the response latency of each Mix. Hence, up to a certain limit, packets pass Mixes the faster the more packets arrive. As Mix cascades are usually connected with higher bandwidth than distributed peers, the upper limit for a cascade is far beyond the limited bandwidth between P2P nodes. Because of this atypical relationship between volume and delay, Mix cascades dominate P2P systems in both performance and batch size.

Thesis 5. High load on cascades leads to reduced response latencies for a given batch size. P2P can never reach such capacity effects and therefore P2P always performs worse.

Dummy traffic has been described as an effective measure to prevent long-term intersection attacks [4]. The designers of many P2P systems suggest using dummy traffic, and hence bandwidth, to reduce the risk of traffic analysis by an observing adversary. However, dummy traffic between two adjacent Mixes does not prevent attacks from insiders, i.e., Mixes, because they can always separate dummy packets from real data. To make dummy traffic indistinguishable from real data, it also has to be routed over multiple peers. This leads to further inefficiencies, because dummy packets, now by definition indistinguishable, must be treated with the same quality of service as payload packets and thus cause additional load on already critical bottlenecks in the network topology.

Thesis 6. Given an insider adversary, dummy traffic in flexible routing systems is either useless or jams connections. In both cases, performance suffers for little reward.

Summing up all arguments on a more general level, we can reduce them to two points. First, the security of the cascade design can be proven with few assumptions, so we should not replace this design by a more obscure one unless we have very good reasons. Second, many people tend to give up privacy if it is inconvenient, so if anonymity systems shall appeal to a broader public, then quality of service and low costs of maintenance are crucial. Mix cascades provide both very well.
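The capacity effect claimed in Thesis 5 can be illustrated with a toy queueing model; the sketch is ours, not from the paper. For a threshold Mix that flushes every b messages under Poisson arrivals of rate lambda, the mean wait is about (b−1)/(2·lambda), so for a fixed batch size, and hence a fixed anonymity level, latency falls as load rises.

```python
import random

def mean_batch_wait(rate, batch, n_messages=100_000, seed=0):
    """Average delay in a threshold Mix that flushes every `batch`
    messages, given Poisson arrivals with intensity `rate`."""
    rng = random.Random(seed)
    now, arrivals, waits = 0.0, [], []
    for _ in range(n_messages):
        now += rng.expovariate(rate)   # time of the next Poisson arrival
        arrivals.append(now)
        if len(arrivals) == batch:     # threshold reached: flush the batch
            waits.extend(now - t for t in arrivals)
            arrivals.clear()
    return sum(waits) / len(waits)

# Fixed batch size 50 (fixed anonymity level), increasing load:
for lam in (1.0, 10.0, 100.0):
    print(lam, round(mean_batch_wait(lam, batch=50), 3))  # ~ (50-1)/(2*lam)
```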
4 Arguments for Mix Network Designs

Mix systems need to be trusted to hide the correspondence between input and output messages, something that cannot be proved or observed in any other way. There is a need to distribute Mix functionality in order to distribute this trust, which is necessary to provide anonymity, and in order to balance the load across the system. Mix cascades provide some distribution of trust but, since all traffic is routed through all nodes in a Mix cascade, no load balancing at all.

Antithesis 1. Greater capacity, through load balancing, is a security property.

General Mix networks allow for both distribution of trust and load balancing. Each new node that is added to the network provides extra capacity and, provided that there is sufficient traffic to prevent traffic analysis, increases the anonymity of all traffic. At the same time, the latency of the anonymous communications remains relatively low, since path lengths do not need to grow nearly as fast as the size of the network [8]. This has to be contrasted with the Mix cascade approach, where new cascades need to be constructed to accommodate more traffic. These cascades do not mix traffic amongst them, and as a result provide less anonymity to their users. Therefore our thesis holds that a system that is able to mix together more traffic is, in the long run, not simply more scalable. In the case of anonymity properties, which intrinsically rely on other people being present, it is also more secure.

Antithesis 2. Robustness and availability are security properties.

Mix cascades have intrinsic single points of failure. The failure of any node in the cascade will bring the operation of the whole cascade to a halt. Bypassing the failed node will require manual intervention and key redistribution. This makes Mix cascades very fragile in comparison with fully connected Mix networks, where failures in the network do not interrupt service: new routes that do not use the unavailable nodes are constructed and used to route traffic.

The fragility of cascades makes them more susceptible to denial of service attacks. It is not necessary to subject the whole network to such an attack, since flooding a single node (such as the entry node to the cascade, which needs to be publicly visible) would be sufficient. The same applies to legal or compulsion attacks: it is sufficient to stop the operation of one node to disrupt all communications. Since by default there can only be fewer nodes participating in a cascade, due to the traffic load, such legal or compulsion attacks are easier to mount from a technical point of view. Finally, Mix cascades are vulnerable to even a small minority of insiders that would attempt to disrupt service by dropping packets or flooding subsequent nodes.

A rich literature exists on how to make cascade designs publicly verifiable, yet most of these designs rely on extremely expensive cryptographic primitives and extremely complex protocols [24]. None of them has so far been implemented. On the other hand, two more practical suggestions to provide robustness, batch signing [6, 3] and random partial checking [22], can be applied equally well to Mix networks and Mix cascades.

Unreliable anonymous communication channels are likely to frustrate users and drive them away from using the system altogether. This will reduce anonymity sets and therefore lower the overall security of the channel. At the same time, denial of service itself can be used to assist other attacks, such as the n−1 attack [29].

Antithesis 3. Trust means choice!

The key to Mix-mediated anonymous communications is that the user trusts that the Mixes will not reveal the relation between input and output messages. This choice cannot, and must not, be 'outsourced' to any other third party.
Furthermore, it is desirable to be able to easily set up a specific Mix node for the use of particular communities that would trust it. Mix networks can easily accommodate such a trust model, and deployment model. On the other hand, the cost of running a cascade and its rigid structure make it impossible for small communities to run trusted nodes, or to join and blend in the anonymity of larger groups.

The assumption that all but one Mix node are going to be corrupt, which the proponents of the cascade paradigm often make, is based on the false premise that there is a fixed set of nodes that everybody trusts and uses. Mix networks, on the other hand, allow different users to trust different subsets of nodes. In extreme cases this would split the anonymity sets: two disjoint trusted groups will emerge, with different user bases. In most cases users will choose to trust overlapping sets of Mix nodes (sketched below), in such a way that the anonymity sets are still confounded and entangled. This provides maximal anonymity, and the ability of users or operators to make their own trust judgements.

Restricted-routes networks, where the network graph is not complete, allow Mix server operators to make trust judgements and only interconnect with a small set of others. This again increases the resilience of the network against Sybil attacks [19], or other ways a large number of corrupt nodes could infiltrate the network.

Antithesis 4. Mix networks increase the attack surface.

Mix networks allow traffic to come in and out of many nodes in the network. Therefore a Global Passive Adversary (GPA) or a Global Active Adversary (GAA) needs to make an effort proportionate to the number of nodes in the network to retain its capabilities. Furthermore, using the peer-to-peer paradigm [21, 27] to do mixing increases the cost to the attacker even further, by multiplying the number of nodes and making them transient. Therefore it is likely that an attacker will never be able to attain the status of GPA or GAA.

This has to be contrasted with Mix cascade structures that offer very well defined entry and exit points. These can be observed at a fixed cost, and intersection attacks can be mounted against the participants [23, 9]. Combined with the intrinsically smaller anonymity sets, such an attack would be devastating.
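Both of the last two antitheses come down to node selection: each user samples a route from the subset of Mixes they personally trust, which distributes trust (Antithesis 3) and multiplies the entry and exit points an adversary must watch (Antithesis 4). A minimal sketch, with hypothetical node names, not taken from any deployed system:

```python
import random

def choose_route(trusted_mixes: set[str], hops: int = 3) -> list[str]:
    """Free-route path selection: sample a path of distinct nodes from
    the subset of Mixes this particular user trusts."""
    if len(trusted_mixes) < hops:
        raise ValueError("not enough trusted Mixes for the requested path")
    return random.sample(sorted(trusted_mixes), hops)

# Alice and Bob trust different but overlapping sets; their traffic
# still mixes wherever the sets intersect (mixC, mixD), while a global
# observer must watch every node that can appear on some user's path.
alice = {"mixA", "mixB", "mixC", "mixD"}
bob = {"mixC", "mixD", "mixE", "mixF"}
print(choose_route(alice))
print(choose_route(bob))
```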
To restate the attack-surface point: by trying to protect against an assumed very powerful adversary, Mix cascades make the existence of such an adversary easily possible.

The single point of entry and exit also makes traffic analysis of anonymised streams easier. In the absence of a lot of cover traffic, which none of the fielded systems over the Internet have been able to provide, it is easier for an adversary to gain all the information necessary to perform traffic analysis [11]. Mix network based systems, such as Onion Routing [16], face the same problems, but make it more difficult for an adversary to gain all the necessary information by using more nodes and links.

Antithesis 5. Anonymity is hard; general-purpose anonymous channels are even harder!

Anonymising traffic between users requires the system to make all traffic 'look the same'. In the same way as Quality of Service algorithms operate, perfect anonymity systems require an intimate knowledge of the traffic characteristics they will carry. Anonymous remailers do so by assuming that mail messages will be of a certain size and can tolerate very high latencies. Peer-to-peer systems that attempt to facilitate file sharing or covert communications try to use the application-specific knowledge they have to construct a more robust anonymous channel for their specific purpose. Mix networks are the natural structure for such channels, since the established topologies, the trust judgements, and the pre-existing connections can be used to carry the anonymous channel and secure it. On the other hand, Mix cascades require a lot of cooperation to set up, cooperation that is specific to the Mix cascade channel and maps with difficulty onto any other pre-existing structure that nodes might have established amongst themselves. It is difficult to imagine how complete collaborative applications could be built to set up cascades.

Antithesis 6. Mix networks offer the flexibility to handle unforeseen problems and opportunities.

Mix cascades can be seen technically as a Mix network with an extremely restricted topology, namely a cascade. Systems that support Mix networks can therefore be turned into cascades during their operation [10], if that proves necessary. On the other hand, a cascade-based system does not carry the information necessary (the routing information) to be easily converted into any other topology.

Mix networks can also be used to provide hybrid solutions relating to the topology. A solution such as a 'core cascade' with peer-to-peer nodes as the entrance and exit points could, for example, be implemented. Mix network systems can also be adapted more easily to environments where cascades are not possible, such as anonymising ad-hoc wireless traffic, where messages have to travel across a set of wireless points restricted by the physical layout of the network.

5 Conclusions

In this paper we have discussed the issues that should be taken into account when choosing an anonymous communication system. We have classified existing systems according to the flexibility-of-routing and task-sharing dimensions. The choice between symmetric and asymmetric systems and the appropriate flexibility of routing depends on the threat model considered; the requirements of the services that are implemented on top of the anonymous infrastructure, in terms of performance, anonymity and availability; and the incentives for the participants in the system.

In order to provide a good anonymity service, we need to attract a large number of users. Therefore the quality of service and the liability issues should be taken into
account. In this respect, asymmetric systems seem to be more appropriate than symmetric systems, because the users are not liable for others' actions, they require fewer resources, and the available bandwidth is higher (better quality of service).

Regarding resistance to attacks, Mix networks require the attacker to cover a larger attack surface, given that the number of entry and exit points is larger. Moreover, Mix cascades are more vulnerable to Mix failures, insider adversaries or denial-of-service attacks than Mix networks. Mix cascades also require more trust from the user than Mix networks, given that the user can neither choose which nodes to trust nor add his own Mix to the cascade. Symmetric systems are more vulnerable to attacks than asymmetric systems because the security level of the nodes is lower. Contrary to the claims of many P2P designs, we state that having many nodes in the network does not imply that a strong attacker is unable to control a significant number of these nodes.

Symmetric systems typically offer a much larger number of entry and exit points than asymmetric systems. This is a feature that enhances the availability of the system, especially towards strong adversaries who want to prevent users from accessing the anonymity service (these symmetric systems must conceal the topology of the network in order to prevent blocking of access to the service). Regarding the flexibility-of-routing dimension, Mix networks have better availability properties than cascades, because the number of entry and exit points is larger, and it is also more difficult for an adversary to mount a denial-of-service attack that shuts down the anonymity service.

As a final conclusion, we should say that more research and empirical data are needed in order to find concrete answers, as well as to develop policies or methodologies that can simplify the decision on the type of system we should implement according to the requirements of our application. We hope that this paper will help identify the important issues that need to be taken into account by the designers of systems for anonymous network communication.

Acknowledgements

This paper summarises a panel discussion chaired by Claudia Díaz at the Privacy Enhancing Technologies Workshop 2004 in Toronto, Canada. It would not have been possible to collect this wide range of arguments without the input from the panelists, namely George Danezis, Christian Grothoff, Andreas Pfitzmann, and Paul Syverson, as well as many engaged contributions from the audience.

References

1. Acquisti, A., Dingledine, R., Syverson, P.: On the Economics of Anonymity. In: Wright, R.N. (ed.): Financial Cryptography (FC 2003), LNCS 2742, Springer-Verlag, Berlin Heidelberg (2003) 84-102
2. Back, A., Möller, U., Stiglic, A.: Traffic Analysis Attacks and Trade-Offs in Anonymity Providing Systems. In: Moskowitz, I.S. (ed.): Information Hiding (IH 2001), LNCS 2137, Springer-Verlag, Berlin Heidelberg (2001) 245-257
How to Balance Digital Communication and Face-to-Face Interaction (English essay)

In today's fast-paced world, we are constantly bombarded with emails, text messages, and social media notifications. While digital communication has made it easier to stay connected with others, it has also taken a toll on our face-to-face interactions. It is important to strike a balance between digital communication and in-person communication in order to maintain healthy relationships and social skills. In this article, we will discuss some tips on how to achieve this balance.

First and foremost, it is important to prioritize face-to-face communication whenever possible. While digital communication is convenient, nothing can truly replace the personal connection that comes from meeting someone in person. Make an effort to schedule regular meetups with friends and family, whether it be for coffee, lunch, or a night out. By spending quality time together in person, you can strengthen your relationships and build a deeper connection with those around you.

When it comes to digital communication, be mindful of how often and how you are using it. Set boundaries for yourself and limit the amount of time you spend on your phone or computer each day. Consider turning off notifications or setting specific times for checking your messages in order to minimize distractions and focus on the present moment.

In addition, it is important to be aware of the impact that digital communication can have on your mental health. Studies have shown that excessive use of social media and technology can lead to feelings of loneliness, anxiety, and depression. To combat this, make an effort to engage in meaningful face-to-face interactions and limit your screen time. Take time to disconnect from your devices and be fully present in the moment.

Another way to strike a balance between digital communication and face-to-face interaction is to use technology as a tool to enhance your relationships, rather than replace them. Utilize video calls, voice messages, and social media to stay in touch with loved ones who may be far away. These tools can be a great way to maintain connections with those you care about, even when you cannot be together in person.

Overall, finding a balance between digital communication and face-to-face interaction is essential for maintaining healthy relationships and social skills. By prioritizing in-person communication, setting boundaries for yourself, and using technology mindfully, you can cultivate meaningful connections with those around you. Remember to take time to disconnect, be present in the moment, and nurture your relationships both on and offline.
Robust Control and Estimation

Here are some tips for writing long-form content that is both deep and impactful.

Connect on an emotional level. Humans are emotional beings, so it is important to connect with your readers on an emotional level. This means writing in a way that is relatable, engaging, and thought-provoking. Use storytelling, personal anecdotes, and vivid language to draw readers in and make them feel something. Don't be afraid to be vulnerable and share your own experiences; authenticity builds trust.

Go beyond the surface. Don't be afraid to explore complex topics in depth. Dig deep into the subject matter and offer new insights and perspectives. Research thoroughly and cite credible sources. Present evidence and data to support your claims, but weave them into compelling narratives.

Show, don't tell. Instead of just stating facts, use descriptive language to create vivid images in the reader's mind. Help them experience the ideas rather than just reading about them. Use metaphors, analogies, and sensory details to bring your writing to life.

Craft a compelling narrative structure. Even in long-form content, the way you structure your piece matters. Think of it as a journey for the reader. Start with a strong hook that grabs their attention. Build tension and anticipation, leading them through the complexities of your subject. Offer moments of reflection and resolution. A clear narrative arc will keep them engaged.

Embrace vulnerability and authenticity. Readers can tell when you are being genuine. Don't be afraid to share your own struggles, doubts, and triumphs. This will help you connect with your audience on a deeper level and make your writing more relatable.

Use a conversational tone. Write as if you were talking to a friend. This means using simple language, avoiding jargon, and being personable. Long-form content doesn't have to be stiff and formal; in fact, a friendly, conversational tone can make it more engaging.

Edit ruthlessly. Once you've finished writing, take some time to edit your work carefully. Check for grammar and spelling errors, and make sure that your writing is clear, concise, and easy to read. Get feedback from others and be willing to revise.

Remember: the goal of writing impactful long-form content is not just to inform, but to move, inspire, and provoke thought.
Peer Best Practices

Introduction

Peer-to-peer (P2P) networking is a decentralized form of communication where participants in the network act as both clients and servers. This technology has gained popularity in recent years due to its efficiency, scalability, and resistance to censorship. However, to ensure the smooth functioning of P2P networks, it is essential to follow certain best practices. In this article, we will explore the key considerations for peer best practices.

Ensuring Security

Authentication and Authorization
1. Implement strong authentication mechanisms to verify the identity of peers.
2. Use secure protocols for communication to prevent eavesdropping and tampering.
3. Define proper access control policies to restrict unauthorized access to network resources.

Data Encryption
1. Encrypt sensitive data to maintain confidentiality.
2. Utilize robust encryption algorithms and key management protocols.
3. Regularly update encryption mechanisms to address emerging vulnerabilities.

Intrusion Detection and Prevention
1. Deploy intrusion detection systems to monitor network traffic for suspicious activity.
2. Implement intrusion prevention systems to block malicious traffic and prevent unauthorized access.
3. Keep intrusion detection and prevention systems up to date with the latest threat intelligence.

Ensuring Reliability

Redundancy and Load Balancing
1. Implement redundant peer nodes to ensure high availability and fault tolerance.
2. Use load balancing techniques to distribute network traffic evenly across multiple peers.
3. Regularly test and monitor the performance of peer nodes to identify and resolve bottlenecks.

Error Handling and Recovery
1. Implement error handling mechanisms to gracefully handle exceptions and recover from failures.
2. Use error codes and messages to provide meaningful feedback to peers.
3. Implement backup and recovery strategies to minimize data loss in the event of a failure.

Bandwidth Management
1. Implement bandwidth management techniques to prioritize critical network traffic.
2. Use Quality of Service (QoS) mechanisms to ensure a consistent level of service for different types of traffic.
3. Monitor and analyze network traffic to identify potential bandwidth bottlenecks and optimize resource allocation.

Ensuring Scalability

Peer Discovery and Routing
1. Implement efficient peer discovery mechanisms to enable new peers to join the network.
2. Use robust routing protocols to facilitate communication between peers.
3. Regularly update routing tables to reflect changes in the network topology.

Distributed Hash Tables (DHT)
1. Utilize DHTs to create a distributed index of network resources for efficient lookup and retrieval (a minimal lookup sketch follows after this article).
2. Implement mechanisms to ensure data consistency and availability in DHTs.
3. Regularly monitor and maintain DHTs to handle changes in the network and prevent data loss.

Network Partitioning and Replication
1. Implement techniques to handle network partitions and ensure data availability across different network segments.
2. Use replication mechanisms to distribute data across multiple peers for increased fault tolerance.
3. Regularly test and validate the integrity of replicated data to ensure consistency.

Conclusion

By following these peer best practices, P2P networks can achieve enhanced security, reliability, and scalability. It is crucial for developers and network administrators to continuously evaluate and update their practices to adapt to evolving threats and challenges. Implementing these best practices will contribute to the overall success and efficiency of P2P networks.
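As one concrete illustration of the peer-discovery and DHT practices above, here is a minimal consistent-hashing sketch in Python. It is a toy example with hypothetical names rather than any particular DHT design (Kademlia, Chord, etc.), but it shows why peer churn only relocates the keys adjacent to the affected peer.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a key or peer ID onto a 2**32-point hash ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

class HashRing:
    """Toy consistent-hashing ring: each resource key is owned by the
    first peer clockwise from the key's hash, so a peer joining or
    leaving only moves the keys in the arc next to that peer."""

    def __init__(self, peers):
        self.ring = sorted((h(p), p) for p in peers)

    def lookup(self, key: str) -> str:
        points = [pt for pt, _ in self.ring]
        i = bisect.bisect_right(points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["peer1", "peer2", "peer3"])
print(ring.lookup("some-file.iso"))  # the peer responsible for this key
```

Real systems add virtual nodes per peer to even out the load and replicate each key on several successors, but the ownership rule above is the core idea.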
2025 National College English Test Band 4 (CET-4): Review Questions with Reference Answers

Part I: Writing (15 points)

Directions: For this part, you are allowed 30 minutes to write a short essay on the topic "The Impact of Artificial Intelligence on Daily Life." You should start your essay with a brief introduction to the topic, then give specific examples to illustrate your point, and finally provide a conclusion with your personal view. Your essay should be about 120 to 150 words, but no less than 100 words.

Writing Sample:

The advent of artificial intelligence (AI) has revolutionized our daily lives in numerous ways. From smart homes to advanced medical diagnosis, AI has become an integral part of modern society.

In smart homes, AI systems like voice assistants and smart security cameras enhance our convenience and safety. These systems learn from our habits and preferences, making our homes more comfortable and efficient. Moreover, in the healthcare sector, AI algorithms are being used to analyze medical images and identify potential diseases at an early stage, which can significantly improve patient outcomes.

However, the rise of AI also brings challenges. For example, job displacement is a major concern, as AI can perform certain tasks more efficiently than humans. Additionally, there are ethical questions about privacy, data security, and the potential misuse of AI technology.

In conclusion, while AI has brought substantial benefits to our daily lives, we must also address its challenges to ensure a balanced and ethical integration of AI into our society.

Writing Analysis:
- Introduction: The essay starts with a clear introduction to the topic of AI and its impact on daily life, providing a broad perspective.
- Body paragraphs: The body of the essay presents two distinct impacts of AI: the positive impact of AI in smart homes and healthcare, and the negative impacts of job displacement and ethical concerns.
- Conclusion: The essay concludes with a balanced view, acknowledging both the benefits and challenges of AI, and emphasizing the need for ethical considerations.
- Structure and coherence: The essay has a clear structure and is well organized, making the flow of ideas easy to follow.
- Length: The essay meets the required word count, at about 120 words, demonstrating the writer's ability to convey the main points concisely.

Part II: Listening Comprehension - Short News (multiple choice, 7 points)

Item 1

News:
In recent years, global attention has been drawn to the rapid development of electric vehicles (EVs). According to a recent report by the International Energy Agency (IEA), the number of electric vehicles on the roads worldwide reached 13 million in 2021, up from just 2 million in 2015. The report also indicates that by 2030, the number of electric vehicles is expected to surpass 145 million.

Question 1: According to the recent IEA report, what had the number of electric vehicles on the roads reached by 2021?
A) 1 million
B) 13 million
C) 2 million
Answer: B

Question 2: How many years does the report cover between 2015 and 2021?
A) 5 years
B) 6 years
C) 7 years
Answer: B

Question 3: What is the expected number of electric vehicles by 2030 according to the report?
A) 13 million
B) 2 million
C) 145 million
Answer: C

Item 2

News Item 1:
A new study reveals that the global use of electric scooters has increased significantly in recent years. These scooters are becoming a popular form of transportation in cities around the world.
However, the study also highlights the environmental and safety concerns associated with the rapid growth in electric scooter usage. Cities are faced with the challenge of managing the increased demand for parking spaces, as well as the potential risks of accidents involving these scooters. Improved infrastructure and regulations are being considered to address these issues.

Questions:
1. What is the primary topic of the news item?
A. The decline of traditional scooters
B. The environmental impact of electric scooters
C. The safety concerns of using electric scooters
D. The rise in global use of electric scooters

2. "These scooters are becoming a popular form of transportation in cities around the world." Which of the following is true regarding the use of electric scooters?
A. They are only popular in developed countries.
B. They have no environmental impact.
C. They are causing a decrease in car usage.
D. They have become a common mode of transportation globally.

3. "Improved infrastructure and regulations are being considered to address these issues." What is the implied issue that needs to be addressed?
A. The overuse of public transportation.
B. The need for more parking spaces for cars.
C. The decline in bicycle usage.
D. The potential safety risks and management challenges posed by electric scooters.

Answers: 1. D  2. D  3. D

Part III: Listening Comprehension - Long Conversations (multiple choice, 8 points)

First Question

Conversation:
A: Hey, Sarah! Did you finish listening to the podcast this morning?
B: Yeah, I did. It was quite fascinating. Have you checked the transcript on their webpage?
A: Not yet. I plan to review what we heard today after work. By the way, I was thinking it would be nice to form a study circle this semester.
B: That sounds like a good idea. Could you host a meeting this weekend?
A: Sure, I can. I'll prepare some questions for us to discuss, and you can bring your notes. It'll make our learning more productive.
B: Great! Should we stick to the topics in the podcast or choose something else?
A: Let's talk about the topics in the podcast first. That way, it'll help us understand the context better.
B: Sounds perfect. I have a couple of questions for you. How long have you been listening to podcasts?
A: For about a year now. I find it's a great way to learn English while doing something productive.
B: I agree. What's your favorite podcast?
A: Hmm, I really like "The Economist Briefing." It covers current events and history, which are topics I find interesting.
B: Nice choice. I'm a fan of "TED Talks Daily." It's a bit different from "The Economist Briefing" but still educational.
A: That's true. We can switch up the topics as we like. What are you studying?
B: I'm majoring in international relations. The podcast really helps me get more insights into what I'm studying.
A: That's awesome. What about your plans for the future?
B: I hope to travel around Europe for my study abroad program next year, so I'm trying to learn more European languages. It would be a great opportunity to practice my English as well.
A: That sounds exciting! This weekend, let's meet for an hour at my place, okay?
B: Sure, that works for me.

Q1. What is one reason Sarah likes listening to this podcast?
a) To practice her English.
b) To pass CET-4.
c) To prepare for a trip.
d) To learn about her major subject.
Answer: a

Q2. How long has the speaker been listening to podcasts?
a) One year
b) Two years
c) Three years
d) Half a year
Answer: a

Q3. Who does the speaker admire for choosing "TED Talks Daily"?
a) Sarah
b) A friend
c) A professor
d) Another student
Answer: a

Q4.
What will they do this weekend?
a) Meet for an hour at the speaker's place.
b) Join a club activity.
c) Go to a coffee shop.
d) Attend a lecture on English.
Answer: a

Question 2: Why does Liu feel a bit nervous about the exam?
A) He has been preparing for it for too long.
B) He hasn't studied hard enough.
C) His friends are also enrolled in CET-4 course classes.
D) He needs to take a break soon.
Answer: A

Question 3: What advice does Amy give to Liu?
A) Enroll in a CET-4 course class.
B) Review the past papers.
C) Study every day.
D) Take a break.
Answer: B

Question 4: What can be inferred about Liu from the conversation?
A) He is confident about the exam.
B) He has been preparing for the exam for a long time.
C) He is ready for the upcoming exam.
D) He doesn't like studying hard.
Answer: B

Part IV: Listening Comprehension - Passages (multiple choice, 20 points)

Item 1

Directions: In this section, you will hear a passage. Listen carefully and answer the questions that follow.

Passage:
In today's fast-paced digital world, it has become increasingly important for businesses to adopt technologies that improve their efficiency and customer satisfaction. The rise of artificial intelligence (AI) and machine learning (ML) has led to significant advancements in the field of business operations. Companies are now exploring various ways to integrate these technologies to enhance their processes.

1. What aspect of business operations has seen significant advancements due to AI and ML integration?
A) Customer service
B) Logistics
C) Financial management
Answer: A

2. Why is the adoption of AI and ML technologies regarded as important for businesses?
A) To reduce operational costs
B) To improve customer satisfaction
C) To increase operational efficiency
Answer: C

3. Which of the following is NOT an example of how businesses can integrate AI and ML?
A) Enhancing predictive analytics
B) Automating routine tasks
C) Increasing manual data entry
Answer: C

Item 2

Passage 1:
The globalization of the economy has brought about significant changes in the world, and one area that has been heavily affected is the sports industry. In this essay, we will explore how globalization has impacted the sports industry, focusing on the growth of international sports events and the role of sports in global culture.

1. Why is globalization having a profound impact on the sports industry?
A) Because it allows sports to be practiced anywhere in the world.
B) Because it has led to the growth of international sports events.
C) Because it has changed the way people experience culture around the world.
D) Because it has increased the salaries of professional athletes.

2. Which of the following is not mentioned as a change brought about by globalization in the sports industry?
A) The increase in cross-cultural interactions.
B) The decline in local sports teams.
C) The rise of regional sports leagues.
D) The increase in global fan bases for various sports.

3. What is the main argument made by the essay about the role of sports in global culture?
A) Sports have a single focus on winning and losing.
B) Sports help to foster national pride and identity.
C) Sports have become a way for countries to compete cooperatively.
D) Sports have lost their relevance due to increased commercialization.

Answer key: 1. B  2. B

Third Question: Listening Comprehension - Listening Passage

Passage:
Welcome to our final research trip to India. We are in a small village in the state of Kerala, known for its rich cultural heritage and scenic beauty. The village, named Paravoor, has a population of approximately 15,000. Today, we focus on the local economy, which is largely dependent on farming, tourism, and small-scale industries.
Currently, the village is facing several challenges, including water scarcity and a lack of proper infrastructure. The government plans to implement a new irrigation project, which will provide a significant boost to the agricultural sector. In addition, the village is promoting eco-tourism to diversify its economic base. However, these initiatives require support and investment from both the government and the local community.

1. Which of the following is NOT a challenge facing Paravoor Village?
A. Water scarcity
B. Lack of proper infrastructure
C. Dependence on large-scale industries
Answer: C

2. What is the villagers' plan to diversify their economic base?
A. Developing new industries
B. Promoting eco-tourism
C. Increasing agricultural production

3. Which of the following is a potential benefit of the new irrigation project?
A. It will help diversify the local economy.
B. It will improve the infrastructure.
C. It will provide water to the entire state.
Answer: C

Part V: Reading Comprehension - Vocabulary (fill in the blanks, 5 points)

Item 1

Reading Passage:
Alice, receiving a ring, was extremely pleased. Her father promptly asked, "Have you made up your mind, my dear?" "Not quite," said Alice ominously, stepping out of her ring. "But I will do so directly," she declared. With a faint shiver of delight, the father received her civil but firm decision, and then together they went to bet {?1?} her little servant girl a seventeen-pound horse. While they were thus occupied, the children saw their disagreement. The richest and keenest among them perfectly struck their fancy, and though their competitive position was by no means satisfactory, they had no objection to feeling very sorry for the seller.

1. civil
A. very pleased
B. polite; civilized
C. countless; endless
D. outstanding

2. competitive
A. competitive; of a contest
B. jealous; hostile
C. disgusting; annoying
D. incompetent; unqualified

3. keen
A. bitter; sharp
B. acute; perceptive
C. happy; pleasant
D. dull; tedious

4. ominous
A. inauspicious; foreboding
B. mild; refined
C. cheerful; pleasant
D. excited; agitated

5. shiver
A. a tremble; shaking
B. fresh water
C. a rapid shower of rain
D. a soft animal

Answers: 1. B  2. A  3. B  4. A  5. A

Item 2

Directions: Read the following text and complete the sentences below. There is one word or phrase missing in each sentence. Choose the most appropriate word or phrase from the options given below each sentence.

Reading Passage:
The rapid growth of technology has profoundly transformed our social fabric. From the emergence of the internet to the advent of smartphones, our daily interactions and work routines have been fundamentally altered. These technological advancements have not only facilitated instant communication but also expanded our access to information. However, this shift comes with its own set of challenges. For instance, while the internet provides a vast array of resources, it also exposes us to misinformation, and the need for digital literacy is increasingly important.
Moreover, the reliance on technology in the workplace has raised concerns about job security, as automation and artificial intelligence continue to evolve and change the nature of work.

1. The word "fabric" (Line 1) most closely relates to the following word:
[Options: a) fabric  b) structure  c) society  d) clothing]
Answer: c) society

2. The phrase "emergence of the internet" (Line 3) can be replaced with which of the following:
[Options: a) the start of the internet  b) the appearance of the internet  c) the deployment of the internet  d) the invention of the internet]
Answer: b) the appearance of the internet

3. The word "instant" (Line 4) is synonymous with:
[Options: a) immediate  b) brief  c) quick  d) rapid]
Answer: a) immediate

4. The challenge mentioned in the passage regarding the internet is:
[Options: a) accessing information  b) exposure to misinformation  c) maintaining digital literacy  d) balancing physical and digital interactions]
Answer: b) exposure to misinformation

5. The phrase "nature of work" (Line 7) refers to:
[Options: a) the quality of work  b) the purpose of work  c) the essence of work  d) the value of work]
Answer: c) the essence of work

Part VI: Reading Comprehension - Long Passages (multiple choice, 10 points)

Item 1

Reading Passage One

It is widely accepted that education is of great importance to all people. However, there are many arguments about its necessity. While some people believe it is important to receive an education, others argue that education is not essential in one's life.

One of the main arguments for education is that it offers opportunities for personal development. With a good education, individuals can acquire the knowledge and skills needed to succeed in life. They can also improve their critical thinking abilities and make informed decisions. Furthermore, an education can help individuals become more adaptable and flexible, enabling them to thrive in a changing world.

Opponents of education argue that people can succeed without it. They cite examples of successful individuals who dropped out of school, such as Steve Jobs and Bill Gates. They believe that talent and opportunities can compensate for a lack of formal education.

In the following passage, there are some statements about education. Choose the most suitable answer for each of the following questions.

Questions 1-5

1. Which of the following is the main issue discussed in the reading passage?
A. The benefits of education
B. The drawbacks of education
C. The importance of personal development
D. The relationship between education and success

2. What do the proponents of education believe about the role of education in personal development?
A. Education hinders personal growth.
B. Education does not contribute to skill acquisition.
C. Education improves critical thinking and decision-making skills.
D. Education makes individuals less adaptable.

3. What is the main argument against education mentioned in the passage?
A. Education limits personal development.
B. Successful individuals can compensate for a lack of education.
C. Education stifles creativity and innovation.
D. Education takes away opportunities for self-betterment.

4. Which of the following does the reading passage NOT mention as a reason for supporting education?
A. Increased opportunities for employment.
B. Enhanced critical thinking abilities.
C. Improved adaptability and flexibility.
D. Involvement in international cooperation.

5. What is the author's attitude towards the debate on education?
A. The author believes that education is unnecessary.
B. The author supports the idea that education is essential for personal development.
C.
The author prefers talent and opportunities over education.
D. The author is neutral on the issue of education.

Answer Key: 1. A  2. C  3. B  4. D  5. B

Item 2

Passage:
The concept of cloud computing has been discussed for decades, but it has only recently become a practical solution for businesses and individuals. It all began with the idea of using the Internet as a transmission medium for data and applications. As technology advanced, the costs of storage and bandwidth became more affordable, making cloud computing a viable option. Today, cloud services range from simple file storage to complex application delivery, and they are accessible via web browsers or special software applications.

The benefits of cloud computing are numerous. First, there is no need for costly hardware or maintenance. Cloud providers handle all the backend operations, ensuring that the service runs smoothly without requiring any intervention from users. Second, cloud services are highly scalable, meaning they can handle sudden increases in demand without additional investment. Third, cloud computing encourages collaboration and mobility, as users can access data and applications from anywhere with an internet connection. Finally, cloud services often come with robust security features, which are continuously updated, minimizing the risk of data breaches.

However, cloud computing also comes with challenges. Security remains a significant concern, as data is stored remotely and vulnerable to cyberattacks. Additionally, there is the issue of data sovereignty, where data stored outside a country's borders may be subject to the laws of that country. Furthermore, some companies may be hesitant to switch to cloud services due to the lack of control over their data, a common concern known as "control issues."

Questions:
1. What is the main idea of the passage?
a) The history of cloud computing.
b) The benefits and challenges of cloud computing.
c) The security concerns of cloud computing.
d) The scalability of cloud computing.

2. Why did cloud computing become practical recently?
a) Because of the decreased costs of storage and bandwidth.
b) Because of the widespread availability of the Internet.
c) Because of the advancement in technology.
d) Because of the decreasing demand for hardware.

3. What are the benefits of cloud computing mentioned in the passage?
a) No need for costly hardware, scalability, collaboration and mobility, and robust security features.
b) High scalability, easy maintenance, and data sovereignty.
c) Low costs, easy access, and increased data security.
d) Remote access, data availability, and decreased bandwidth requirements.

4. Which of the following is a challenge of cloud computing?
a) The lack of mobility.
b) The high costs of hardware.
c) The security risks associated with remote data storage.
d) The limited availability of web browsers.

5. What is the common concern known as "control issues" mentioned in the passage?
a) Users have no control over their data.
b) Users have control over their data, but it is stored remotely.
c) Data stored outside a country's borders may be subject to the laws of that country.
d) Users can choose to control their data through special software applications.

Answers:
1. b) The benefits and challenges of cloud computing.
2. a) Because of the decreased costs of storage and bandwidth.
3. a) No need for costly hardware, scalability, collaboration and mobility, and robust security features.
4. c) The security risks associated with remote data storage.
5. a) Users have no control over their data.

Part VII: Reading Comprehension - Careful Reading (multiple choice, 20 points)
Item 1

Reading Passage:
The following is a passage about the importance of exercise for mental health and productivity. The passage is followed by some questions whose answers can be found in the passage.

In today's fast-paced world, stress has become an integral part of our lives. It is essential to find ways to manage and reduce stress to maintain both our mental and physical health. One effective way to combat stress is through regular exercise. Research has consistently shown that physical activity can have a profound impact on our mental well-being and productivity.

1. Physical activity has been found to:
A) improve mental health
B) enhance productivity
C) both improve mental health and enhance productivity
D) have no effect on mental health

2. The passage primarily discusses:
A) the negative impact of stress on mental health
B) the benefits of exercise in reducing stress
C) the effectiveness of various stress management techniques
D) the effects of different types of stress on the body

3. It is mentioned that physical activity can have a "profound impact" on our:
A) attention span
B) mood
C) ability to sleep
D) All of the above

4. The word "integral" in the first paragraph most closely means:
A) essential
B) foundational
C) simple
D) occasional

5. According to the passage, what is one effective way to combat stress?
A) Avoiding situations that cause stress
B) Seeking professional help
C) Regular physical activity
D) Meditating for a few minutes daily

Answers: 1. C  2. B  3. D  4. A  5. C

Item 2

Read the following passage, then answer the questions.
2024 Beijing Senior Three First Mock Exams: English Reading Comprehension Compilation, Passage D

I. Reading Comprehension (2024 Beijing Mentougou Senior Three First Mock Exam)

A recent global study, which surveyed 10,000 young people from 10 countries, showed that nearly 60 percent of them were extremely worried about the future state of the planet. The report, which was published in The Lancet, also showed that nearly half of the respondents said that such distress affected them daily, and three quarters agreed with the statement that "the future is frightening." This, along with many other studies, shows clearly that climate change is not just a threat to the environment that we inhabit. It also poses a very real threat to our emotional well-being. Psychologists have categorized these feelings of grief and worry about the current climate emergency, a common occurrence among youth today, under the label of "eco-anxiety."

Eco-anxiety doesn't just affect young people. It also affects researchers who work in climate and ecological science, burdened by the reality depicted by their findings, and it affects the most economically marginalized across the globe, who bear the damaging impacts of climate breakdown.

In 2024, eco-anxiety will rise to become one of the leading causes of mental health problems. The reasons are obvious. Scientists estimate that the world is likely to breach safe limits of temperature rise above pre-industrial levels for the first time by 2027. In recent years, we've seen wildfires tear through Canada and Greece, and summer floods ruin regions in Pakistan that are home to nearly 33 million people. Studies have shown that those impacted by air pollution and rising temperatures are more likely to experience psychological distress.

To make matters worse, facing the climate crisis, our political class is not offering strong leadership. The COP28 conference in Dubai will be headed by an oil and gas company executive. In the UK, the government is backtracking on its green commitments.

Fortunately, greater levels of eco-anxiety will also offer an avenue for resolving the climate crisis directly. According to Caroline Hickman, a researcher on eco-anxiety from the University of Bath, anyone experiencing eco-anxiety is displaying entirely natural and rational reactions to the climate crisis. This is why, in 2024, we will also see more people around the world join the fight for climate justice and seek jobs that prioritize environmental sustainability. Campaigners will put increased pressure on fossil fuel industries and governments to rapidly abandon the use of polluting coal, oil, and gas.

It's now clear that not only are these industries the main causes of the climate crisis, they are also responsible for the mental health crisis, which is starting to affect most of us.
Eco-anxiety is not something we will defeat with therapy, but something we will tackle by taking action.

1. What can we learn from the passage?
A. The cause of eco-anxiety is emotions existing in our mind.
B. People in developed countries are more likely to suffer from eco-anxiety.
C. Eco-anxiety is a new kind of psychological disease due to climate change.
D. The author is disappointed about government behaviour towards the climate crisis.

2. What does the underlined word "breach" in Paragraph 3 most probably mean?
A. Break.
B. Reach.
C. Raise.
D. Affect.

3. As for Caroline Hickman's opinion on eco-anxiety, the author is ______.
A. puzzled
B. favourable
C. suspicious
D. unconcerned

4. What would be the best title for the passage?
A. Who Is to Blame for Eco-anxiety?
B. How Should You See Eco-anxiety?
C. How Will Eco-anxiety Be Resolved?
D. Why Do People Suffer from Eco-anxiety?

(2024 Beijing Yanqing Senior Three First Mock Exam)

It is rapidly emerging as one of the most important technological, and increasingly ideological, divides of our times: should powerful generative artificial intelligence systems be open or closed?

Supporters say open systems broaden access to the technology, stimulate innovation and improve reliability by encouraging outside scrutiny. Far cheaper to develop and deploy, smaller open models also inject competition into a field dominated by big US companies such as Google, Microsoft and OpenAI that have invested billions developing massive, closed and closely controlled generative AI systems.

But detractors argue open models risk lifting the lid on a Pandora's box of troubles. Bad actors can exploit them to spread personalised disinformation, while terrorists might use them to manufacture cyber or bio weapons. "The danger of open source is that it enables more crazies to do crazy things," Geoffrey Hinton, one of the pioneers of modern AI, has warned.

The history of OpenAI, which developed the popular ChatGPT chatbot, is itself instructive. As its name suggests, the research company was founded in 2015 with a commitment to develop the technology as openly as possible. But it later abandoned that approach for both competitive and safety reasons. Once OpenAI realised that its generative AI models were going to be "unbelievably potent," it made little sense to open source them, Ilya Sutskever, OpenAI's chief scientist, said.

Supporters of open models hit back, ridiculing the idea that open generative AI models enable people to access information they could not otherwise find on the internet or from a rogue scientist. They also highlight the competitive self-interest of the big tech companies in shouting about the dangers of open models, whose intention is to firmly establish their own market dominance.

But there is an ideological dimension to this debate, too. Yann LeCun, chief scientist of Meta, has likened the arguments for controlling the technology to medieval obscurantism: the belief that only a self-selecting priesthood of experts is wise enough to handle knowledge.

In the future, all our interactions with the vast digital repository of human knowledge will be mediated through AI systems. We should not want a handful of Silicon Valley companies to control that access. Just as the internet flourished by resisting attempts to enclose it, so AI will thrive by remaining open, LeCun argues.

Wendy Hall, Regius Professor of Computer Science at Southampton University, says we do not want to live in a world where only the big companies run generative AI. Nor do we want to allow users to do anything they like with open models.
"We have to find some compromise," she suggests.

We should certainly resist the tyranny of the binary when it comes to thinking about AI models. Both open and closed models have their benefits and flaws. As the capabilities of these models evolve, we will constantly have to tweak the weightings between competition and control.

5. What does the underlined word "potent" in Paragraph 4 most probably mean?
A. Accessible.
B. Powerful.
C. Significant.
D. Unnoticeable.

6. What can we learn from this passage?
A. It needs billions of dollars to develop and deploy open-source models.
B. The field of generative AI systems is dominated by big companies.
C. Only self-selecting experts can handle open models wisely.
D. Users can do anything they like with open models at this moment.

7. Regarding Wendy Hall's suggestions, the author is ______.
A. sympathetic
B. puzzled
C. unconcerned
D. opposed

8. Which of the following would be the best title for the passage?
A. How to Keep the Lid on the Pandora's Box of Open AI
B. Divides on Open AI: Technology and Ideology
C. Where Does the Debate on Open AI End?
D. Pros and Cons of Open AI

(2024 Beijing Dongcheng Senior Three First Mock Exam)

When I teach research methods, a major focus is peer review. As a process, peer review evaluates academic papers for their quality, integrity and impact on a field, largely shaping what scientists accept as "knowledge." By instinct, any academic follows up a new idea with the question, "Was that peer reviewed?"

Although I believe in the importance of peer review, and I help do peer reviews for several academic journals, I know how vulnerable the process can be. I had my first encounter with peer review during my first year as a Ph.D. student. One day, my adviser handed me an essay and told me to have my written review back to him in a week. But at the time, I certainly was not a "peer": I was too new in my field. Manipulated data or substandard methods could easily have gone undetected. Knowledge is not self-evident. Only experts would be able to notice such problems, and even then, experts do not always agree on what they notice.

Let's say in my life I only see white swans. Maybe I write an essay, concluding that all swans are white. And a "peer" says, "Wait a minute, I've seen black swans." I would have to refine my knowledge.

The peer plays a key role evaluating observations, with the overall goal of advancing knowledge. For example, if the above story were reversed, and peer reviewers who all believed that all swans were white came across the first study observing a black swan, the study would receive a lot of attention.

So why was a first-year graduate student getting to stand in for an expert? Why would my review count the same as an expert's review? One answer: the process relies almost entirely on unpaid labor.

Despite the fact that peers are professionals, peer review is not a profession. As a result, the same overworked scholars often receive masses of peer review requests. Besides the labor inequity, a small pool of experts can lead to a narrowed process of what is publishable or what counts as knowledge, directly threatening diversity of perspectives and scholars. Without a large enough reviewer pool, the process can easily fall victim to biases, arising from a small community recognizing each other's work, and to compromising conflicts of interest.

Despite these challenges, I still tell my students that peer review offers the best method for evaluating studies and advancing knowledge. As a process, peer review theoretically works.
The question is whether the issues with peer review can be addressed by professionalizing the field.

9. What can we learn about peer review in the first paragraph?
A. It generates knowledge.
B. It is commonly practiced.
C. It is a major research method.
D. It is questioned by some scientists.

10. What can be inferred about the example of swans?
A. Complexity of peer review ensures its reliability.
B. Contradictions between scientists may be balanced.
C. Individuals can be limited by personal experiences.
D. Experts should detect unscientific observation methods.

11. What is the author's major concern about peer review?
A. Workload for scholars.
B. Toughness of the process.
C. Diversification of publications.
D. Financial support to reviewers.

12. The passage is mainly about ______.
A. what fuels peer review
B. why peer review is imperfect
C. how new hands advance peer review
D. whether peer reviewers are underrated

(2024 Beijing Xicheng Senior Three First Mock Exam)

While some allergies disappear over time or with treatment, others last a lifetime. For decades, scientists have been searching for the source of these lifetime allergies.

Recently, researchers found that memory B cells may be involved. These cells produce a different class of antibodies known as IgG, which ward off viral infections. But no one had identified exactly which of those cells were recalling allergens, or how they switched to making the IgE antibodies responsible for allergies. To uncover the mysterious cells, two research teams took a deep dive into the immune cells of people with allergies and some without.

Immunologist Joshua Koenig and colleagues examined more than 90,000 memory B cells from six people with birch allergies, four people allergic to dust mites and five people with no allergies. Using a technique called RNA sequencing, the team identified specific memory B cells, which they named MBC2s, that make antibodies and proteins associated with the immune response that causes allergies.

In another experiment, Koenig and colleagues used a peanut protein to go fishing for memory B cells from people with peanut allergies. The team pulled out the same type of cells found in people with birch and dust mite allergies. In people with peanut allergies, those cells increased in number and produced IgE antibodies as the people started treatment to desensitize them to peanut allergens.

Another group, led by Maria Curotto de Lafaille, an immunologist at the Icahn School of Medicine at Mount Sinai in New York City, also found that similar cells were more plentiful in 58 children allergic to peanuts than in 13 kids without allergies. The team found that the cells are ready to switch from making protective IgG antibodies to allergy-causing IgE antibodies. Even before the switch, the cells were making RNA for IgE but didn't produce the protein. Making that RNA enables the cells to switch the type of antibodies they make when they encounter allergens. The signal to switch partially depends on a protein called JAK, the group discovered. "Stopping JAK from sending the signal could help prevent the memory cells from switching to IgE production," Lafaille says. She also predicts that allergists may be able to examine aspects of these memory cells to forecast whether a patient's allergy is likely to last or disappear with time or treatment.

"Knowing which population of cells store allergies in long-term memory may eventually help scientists identify other ways to kill the allergy cells," says Cecilia Berin, an immunologist at Northwestern University Feinberg School of Medicine.
"You could potentially get rid of not only your peanut allergy but also all of your allergies. "13. Why did scientists investigate the immune cells of individuals with and without allergies?A. To explore the distinctions between IgG and IgE.B. To uncover new antibodies known as IgG and IgE.C. To identify cells responsible for defending against allergies.D. To reveal cells associated with the development of allergies.14. What does the word "desensitize" underlined in Paragraph 4 most probably mean?A. Make. . . less destructive.B. Make. . . less responsive.C. Make. . . less protective.D. Make. . . less effective.15. What can we learn from the two research teams' work?A. MBC2s make antibodies and proteins that prevent allergies.B. Memory B cells generate both RNA for IgE and the corresponding protein.C. JAK plays a role in controlling antibody production when exposed to allergens.D. Allergists are capable of predicting whether an allergy will last or disappear.16. Which could be the best title for the passage?A. RNA Sequencing Is Applied in Immunology ResearchB. Specific Cells Related to Peanut Allergies Are IdentifiedC. Unmasking Cells' Identities Helps Diagnose and Treat AllergiesD. Newfound Immune Cells Are Responsible for Long-lasting Allergies(2024北京石景山高三一模)On Feb.21, four students were standing on the side of Pacific Coast Highway in Malibu when a driver going 110 miles per hour lost control of his car and it crashed into the parked vehicles.12 people were killed at the scene, including 2 drivers.This kind of traffic death shouldn't be called an accident. In Los Angeles, we seem to have accepted constant carnage(屠杀)in our streets in exchange for maximizing driver speed and convenience. The official responses to proven traffic dangers are mere gestures, if even that.Los Angeles is a uniquely deadly city with a death rate that is four times the national average. Unsurprisingly, it's also a city that has been designed with one thing in mind: a concept called level of service, which grades streets on how well they serve those in automobiles. To many Angelenos, that makes sense—to design our streets for car traffic, which is the way many get around the city. Unfortunately, we don't recognize that there's a trade-off. We can either have streets bettered for free-flowing traffic, or we can design streets for people to move around safely outside of cars.City leaders consistently choose for the easy but deadly option. In one recent example, a resident asked the city's Department of Transportation to block drivers from using Cochran Avenue at Venice Boulevard as a cut-through street, as they were speeding through a quiet residential neighbourhood. The department responded by suggesting a "speed awareness campaign" in which neighbours put up yard signs urging drivers to slow down.People don't drive based on signage, but they drive on the design of the street. The trunk roads of Los Angeles such as Venice Boulevard all need to be revised so that people are prioritized over cars. This would include narrowing travel lanes(道), building bike lanes, and banning right turns at red lights. These measures would make drivers feel like they're in a city and not on a highway. A recent John Hopkins study says this would have substantial safety benefits.With more than 7,500 miles of streets in the city of Los Angeles, they won't all be rebuilt anytime soon. 
But with each road construction project, or each crash, we should be revising streets to make them safer for all road users.

The solution to traffic jams isn't to make more space for cars. It's to design the streets to be safe enough for alternatives such as biking, walking and mass transit, especially for the 50% of daily trips in Los Angeles that are less than three miles. The solution to protecting people dining outdoors isn't crash barriers. It's a street design that forces drivers to go slowly. The problem is carnage in the streets, and we know the solutions.

17. Why should the traffic deaths in Los Angeles be called "constant carnage"?
A. The traffic accidents happen quite often.
B. Too many people are killed in the traffic accidents.
C. The drivers' speeding is to blame for the traffic deaths.
D. City leaders' consistent choice contributes to the traffic deaths.

18. What does the word "trade-off" underlined in Paragraph 3 most probably mean?
A. Balance.
B. Guideline.
C. Conflict.
D. Resolution.

19. According to the passage, which is a likely solution to the traffic problem?
A. To widen travel lanes.
B. To add more crosswalks.
C. To arrange more traffic police.
D. To punish speeding drivers.

20. Which would be the best title for the passage?
A. Drivers first or walkers first?
B. Traffic death or constant carnage?
C. More warning signs or safer designs?
D. More narrow lanes or speedy highways?

(2024 Beijing Fengtai Senior Three First Mock Exam)

Several dozen graduate students in London were recently tasked with outwitting a large language model (LLM), a type of AI designed to hold useful conversations. LLMs are often programmed with guardrails designed to stop them giving harmful replies: instructions on making bombs in a bathtub, say, or the confident statement of "facts" that are not actually true.

The aim of the task was to break those guardrails. Some results were merely stupid. For example, one participant got the chatbot to claim ducks could be used as indicators of air quality. But the most successful efforts were those that made the machine produce the titles, publication dates and host journals of non-existent academic articles.

AI has the potential to be a big benefit to science. Optimists talk of machines producing readable summaries of complicated areas of research, tirelessly analysing oceans of data to suggest new drugs and even, one day, coming up with hypotheses of their own. But AI comes with downsides, too.

Start with the simplest problem: academic misconduct. Some journals allow researchers to use LLMs to help write papers. But not everybody is willing to admit to it. Sometimes, the fact that LLMs have been used is obvious. Guillaume Cabanac, a computer scientist, has uncovered dozens of papers that contain phrases such as "regenerate response", the text of a button in some versions of ChatGPT that commands the program to rewrite its most recent answer, probably copied into the manuscript by mistake.

Another problem arises when AI models are trained on AI-generated data. LLMs are trained on text from the Internet. As they churn out more such text, the risk of LLMs taking in their own outputs grows. That can cause "model collapse." In 2023 Ilia Shumailov, a computer scientist, co-authored a paper in which a model was fed handwritten digits and asked to generate digits of its own, which were fed back to it in turn. After a few cycles, the computer's numbers became more or less illegible.
After 20iterations (迭代),it could produce only rough circles or blurry lines.Some worry that computer-generated insights might come from models whose inner workings are not understood. Inexplainable models are not useless, says David Leslie at an AI-research outfit in London, but their outputs will need rigorous testing in the real world. That is perhaps less unnerving than it sounds. Checking models against reality is what science is supposed to be about, after all.For now, at least, questions outnumber answers. The threats that machines pose to the scientific method are, at the end of the day, the same ones posed by humans.AI could accelerate the production of nonsense just as much as it accelerates good science. As the Royal Society has it, nullius in verba: take nobody's word for it. No thing's, either.21.The result of the task conducted in London shows that___________.A. LLMs give away useful informationB. the guardrails turn out to be ineffectiveC.AI's influence will potentially be decreasedD. the effort put into the study of AI hardly pays off22.What does “model collapse”indicate?A. The readability of the models' output is underestimated.B. The diverse sources of information confuse the models.C. Training on regenerated data stops models working well.D. The data will become reliable after continuous iterations.23.According to the passage, people's worry over the inexplainable models is___________.A. impracticalB. unjustifiedC. groundlessD. unsettling24.What would be the best title for the passage?A. Faster Nonsense: AI Could Also Go WrongB. Imperfect Models: How Will AI Make Advances?C. The Rise of LLMs: AI Could Still Be PromisingD. Bigger Threats: AI Will Be Uncontrollable参考答案1.D 2.A 3.B 4.B【导语】这是一篇说明文。
Analyzing and Improving a BitTorrent Network's Performance Mechanisms
Ashwin R. Bharambe (Carnegie Mellon University), Cormac Herley (Microsoft Research), Venkata N. Padmanabhan (Microsoft Research)

Abstract: In recent years, BitTorrent has emerged as a very scalable peer-to-peer file distribution mechanism. While early measurement and analytical studies have verified BitTorrent's performance, they have also raised questions about various metrics (upload utilization, fairness, etc.), particularly in settings other than those measured. In this paper, we present a simulation-based study of BitTorrent. Our goal is to deconstruct the system and evaluate the impact of its core mechanisms, both individually and in combination, on overall system performance under a variety of workloads. Our evaluation focuses on several important metrics, including peer link utilization, file download time, and fairness amongst peers in terms of volume of content served. Our results confirm that BitTorrent performs near-optimally in terms of uplink bandwidth utilization and download time, except under certain extreme conditions. We also show that low-bandwidth peers can download more than they upload to the network when high-bandwidth peers are present. We find that the rate-based tit-for-tat policy is not effective in preventing unfairness. We show how simple changes to the tracker and a stricter, block-based tit-for-tat policy greatly improve fairness.

I. INTRODUCTION

BitTorrent [1] has recently emerged as a very scalable P2P content distribution tool. In a P2P system like BitTorrent, peers not only download content from the server but also serve it to other peers. Thus the serving capacity of the system grows with the number of nodes, making the system potentially self-scaling. In BitTorrent, a file is broken down into a large number of blocks, and peers can start serving other peers as soon as they have downloaded their first block. Peers preferentially download blocks that are rarest among their local peers so as to maximize their usefulness to other peers. These strategies allow BitTorrent to use bandwidth between peers (i.e., "perpendicular bandwidth" [2]) effectively and handle flash crowds well. In addition, BitTorrent incorporates a tit-for-tat (TFT) incentive mechanism, whereby nodes preferentially upload to peers from whom they are able to download at a fast rate in return. The soundness of these architectural choices is borne out by the success of the system in actual deployment. Recent measurement and analytical studies [3-5] (discussed in Section III) indicate that BitTorrent scales well with system size and has performed well in real large-scale file distributions.
However, we believe that these studies leave a number of questions unanswered. For example:
• Izal et al. [3] reported that clients observed high download rates. However, was this optimal? Could BitTorrent have achieved even higher bandwidth utilization in this setting?
• Does BitTorrent's Local Rarest First (LRF) policy for picking new blocks to download from peers effectively avoid the last-block problem?
• How effective is BitTorrent's TFT policy in avoiding unfairness, i.e., a node uploading much less than it downloads? Fairness can be important to encourage peer participation, for instance, in settings where the ISP charges users based on upload traffic volume.
• Previous studies have assumed that at least a fraction of nodes perform altruistic uploading even after finishing their downloads. However, if nodes depart as soon as they finish, is the stability or scalability of the system hurt significantly?

The answers depend on a number of parameters that BitTorrent uses. It would be difficult, if not impossible, to control such a large space of possibilities in an analytical or live measurement setting. Hence, in this paper, we attempt to answer these questions using a simulator which models the data-plane of BitTorrent (we do not consider control-plane issues such as the performance of the centralized tracker used for locating peers). We believe our study is complementary to previous BitTorrent studies. Our principal findings are as follows. First, we find BitTorrent to be remarkably robust and scalable at ensuring high uplink bandwidth utilization. It scales well as the number of nodes increases, keeping the load on the origin server bounded, even when nodes depart immediately after completing their downloads. The LRF policy performs better than alternative block-choosing policies in a wide range of environments (e.g., flash crowd and post-flash crowd situations, small network sizes, etc.). By successfully getting rid of the last-block problem, it provides a simpler alternative to previously proposed source coding strategies, e.g., Digital Fountain [6].
Second, we find that BitTorrent shows sub-optimal behavior in certain "extreme" workloads:
1) The bandwidth of the origin server is a precious resource, especially when it is scarce. It is important that the server deliver unique packets to the network at least as quickly as they can be diffused among the peers. We present a simple way of ensuring this.
2) BitTorrent's rate-based TFT mechanisms do not prevent unfairness in terms of the data served by nodes, especially in node populations with heterogeneous bandwidths. We demonstrate that clustering of similar nodes using bandwidth matching is key to ensuring fairness without sacrificing uplink bandwidth utilization.
3) BitTorrent's LRF policy "equalizes" the replication rate of all blocks. Hence, in a setting where there is a wide range in the fraction of a file that different nodes already possess (say, because nodes that have downloaded a file partially rejoin during a subsequent flash crowd), BitTorrent is not very effective at allowing nodes who have most of a file to rapidly find the few blocks that they are missing.

In addition to elaborating on these findings, we present simple yet effective techniques to alleviate the unfairness in BitTorrent. The main focus of our evaluation is on BitTorrent's performance in flash-crowd scenarios. We believe that such scenarios are a critical test for systems such as BitTorrent, since it is such overload conditions that make provisioning a server-based content distribution system hard and make peer-to-peer alternatives such as BitTorrent seem appealing. That said, we do consider a few alternative scenarios as well. The rest of the paper is organized as follows: in Section II, we present a brief overview of the BitTorrent system. Section III discusses related analytical and measurement-based studies. Section IV describes our simulation environment and the evaluation metrics. More details about our methodology can be found in [7]. The main contribution of our work appears in Sections V through VIII, where we present a simulation-based evaluation of BitTorrent under a variety of configurations and workloads. We conclude in Section IX.

II. BITTORRENT OVERVIEW

BitTorrent [1] is a P2P application whose goal is to enable fast and efficient distribution of large files by leveraging the upload bandwidth of the downloading peers. The basic idea is to divide the file into equal-sized blocks (typically 32-256 KB in size) and have nodes download the blocks from multiple peers concurrently. The blocks are further subdivided into sub-blocks to enable pipelining of requests so as to mask the request-response latency [8]. Corresponding to each large file available for download (called a torrent), there is a central component called the tracker that keeps track of the nodes currently in the system. The tracker receives updates from nodes periodically (every 30 minutes) as well as when nodes join or leave the torrent.
Nodes in the system are either seeds, i.e., nodes that have a complete copy of the file and are willing to serve it to others, or leechers, i.e., nodes that are still downloading the file but are willing to serve the blocks that they already have to others. (In the rest of the paper, unless otherwise specified, we will use the terms "node" and "leecher" interchangeably.) When a new node joins a torrent, it contacts the tracker to obtain a list containing a random subset of the nodes currently in the system (both seeds and leechers). The new node then attempts to establish connections to about 40 existing nodes, which then become its neighbors. If the number of neighbors of a node ever dips below 20, the node contacts the tracker again to obtain a list of additional peers it could connect to.

Each node looks for opportunities to download blocks from and upload blocks to its neighbors. In general, a node has a choice of several blocks that it could download. It employs a local rarest first (LRF) policy in picking which block to download: it tries to download the block that is least replicated among its neighbors. A tit-for-tat (TFT) policy is employed to guard against free-riding: a node preferentially uploads to neighbors that provide it the best download rates. Each node limits the number of concurrent uploads to a small number, typically 5. Seeds have nothing to download, but they follow a similar policy: they upload to up to 5 nodes that have the highest download rate.

The mechanism used to limit the number of concurrent uploads is called choking, which is the temporary refusal of a node to upload to a neighbor. Only the connections to the chosen neighbors (up to 5) are unchoked at any point in time. A node reevaluates the download rate that it is receiving from its neighbors every 10 seconds to decide whether a currently unchoked neighbor should be choked and replaced with a different neighbor. BitTorrent also incorporates an optimistic unchoke policy, wherein a node, in addition to the normal unchokes described above, unchokes a randomly chosen neighbor regardless of the download rate achieved from that neighbor. Optimistic unchokes are typically performed every 30 seconds. This mechanism allows nodes to discover neighbors offering higher download rates, as well as enables new nodes to obtain their first block.
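To make the two selection rules concrete, the following is a minimal sketch of LRF block picking and rate-based choking as just described (Java; PeerPolicies and all other names are illustrative assumptions for this exposition, not taken from the BitTorrent code base):

    import java.util.*;

    class PeerPolicies {

        // Local Rarest First: among the blocks we still need, pick one that is
        // least replicated among our neighbors, breaking ties at random.
        static int pickBlockLRF(BitSet have, int numBlocks,
                                List<BitSet> neighborHave, Random rnd) {
            int best = -1, bestCount = Integer.MAX_VALUE, ties = 0;
            for (int b = 0; b < numBlocks; b++) {
                if (have.get(b)) continue;                 // already downloaded
                int count = 0;
                for (BitSet nbr : neighborHave)
                    if (nbr.get(b)) count++;
                if (count == 0) continue;                  // no neighbor can serve it
                if (count < bestCount) {
                    bestCount = count; best = b; ties = 1;
                } else if (count == bestCount && rnd.nextInt(++ties) == 0) {
                    best = b;                              // uniform tie-break
                }
            }
            return best;                                   // -1: nothing useful
        }

        // Rate-based tit-for-tat: unchoke the u-1 neighbors currently giving
        // us the best download rates, plus one random optimistic unchoke.
        static Set<Integer> pickUnchoked(double[] downloadRate, int u, Random rnd) {
            Integer[] order = new Integer[downloadRate.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            Arrays.sort(order, (a, b) -> Double.compare(downloadRate[b], downloadRate[a]));
            Set<Integer> unchoked = new HashSet<>();
            for (int i = 0; i < Math.min(u - 1, order.length); i++)
                unchoked.add(order[i]);
            if (order.length > 0)
                unchoked.add(rnd.nextInt(order.length));   // optimistic unchoke
            return unchoked;
        }
    }

In the real client, the rate-based choices are reevaluated every 10 seconds and the optimistic unchoke rotates every 30 seconds, as noted above.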
III. RELATED WORK

There have been analytical as well as measurement-based studies of the BitTorrent system. At the analytical end, Qiu and Srikant [5] have considered a simple fluid model of BitTorrent. Their main findings are: (a) the system scales very well, i.e., the average download time is not dependent on the node arrival rate, and (b) file sharing is very effective, i.e., there is a high likelihood that a node holds a block that is useful to its peers. Izal et al. [3] and Pouwelse et al. [4] present measurement-based studies of BitTorrent based on tracker logs of different torrents. The main findings of these studies are: (a) the average download rate is consistently high; (b) as soon as a node has obtained a few chunks, it is able to start uploading to its peers (the local rarest first policy works); (c) the node download and upload rates are positively correlated (the tit-for-tat policy works); and (d) nodes might not stay on in the system (as seeds) after completing their download, so it is the few long-lived seeds that are critical for file availability. Accordingly, in our experiments, we simulate a small number of long-lived seeds, with leechers departing as soon as they have finished downloading.

Gkantsidis and Rodriguez [9] present a simulation-based study of a BitTorrent-like system. They show results indicating that the download time of a BitTorrent-like system is not optimal, especially in settings where there is heterogeneity in node bandwidth. They go on to propose a network coding [10] based scheme called Avalanche that alleviates these problems. Our study differs from previous studies of BitTorrent in the following important ways. First, while the analytical study reported in [5] presents the steady-state scalability properties of BitTorrent, it ignores a number of important BitTorrent parameters (e.g., node degree (d), maximum concurrent uploads (u)) and environmental conditions (e.g., seed bandwidth, etc.) that could affect uplink bandwidth utilization. Second, previous studies only briefly allude to free-riding; in this paper, we quantify systematic unfairness in BitTorrent and present mechanisms to alleviate it. Finally, there have been a number of proposals for peer-to-peer content distribution systems that employ a BitTorrent-like swarming approach (although at least some of these were proposed independently of BitTorrent). The innovations in these proposals include the use of erasure coding (e.g., [2]), adaptive strategies for peer selection (e.g., Slurpie [11]), and a combination of the two (e.g., Bullet [12] and Bullet' [13]).
While some of the techniques we evaluate here overlap with those presented in this body of work, we believe that our analysis is able to uncover the impact of each technique individually in the context of BitTorrent, something that is hard to do with new and different systems that incorporate a number of new techniques. Also, the issue of fairness that we focus on has not received much attention in this previous work.

IV. METHODOLOGY

To explore aspects of BitTorrent that are difficult to study using traces of real torrents [3,4] or analysis [5], we use a simulation-based approach for understanding and deconstructing BitTorrent performance. Such an approach provides the flexibility of carefully controlling the configuration parameters of the various BitTorrent mechanisms, or even selectively turning off certain mechanisms and replacing them with alternatives. This would be difficult or even impossible to achieve using live Internet measurement techniques (e.g., using tracker logs [3,4] or by participating in a live torrent). Thus, while certain interactions specific to a real deployment will be missed, we believe the abstraction is rich enough to expose most details that are relevant to our experiments. We briefly describe our simulator and define the metrics used in our evaluation. More details can be found in [7].

A. Simulator Details

Our discrete-event simulator models peer activity (joins, leaves, block exchanges) as well as many of the associated BitTorrent mechanisms (local rarest first, tit-for-tat, etc.) in detail. We have made the software available [14]. The network model associates a downlink and an uplink bandwidth with each node, which allows modeling asymmetric access networks. The simulator uses these bandwidth settings to appropriately delay the blocks exchanged by nodes. The delay calculation takes into account the number of flows that are sharing the uplink or downlink at either end, which may vary with time. Doing bandwidth-sensitive delay computation for each block transmission is expensive enough that we have to limit the maximum scale of our experiments to 8000 nodes on a P4 2.7 GHz, 1 GB RAM machine.

Given the computational complexity of even the simple model above, we decided to simplify our network model in the following ways. First, we do not model network propagation delay, which is relevant only for the small-sized control packets (e.g., the packets used by nodes to request blocks from their neighbors). We believe that this simplification does not have a significant impact on our results because (a) the download time is dominated by the data traffic (i.e., block transfers), and (b) BitTorrent's pipelining mechanism (Section II) masks much of the control traffic latency in practice. Second, we do not model the packet-level dynamics of TCP connections. Instead, we make the assumption that connections traversing a link share the link bandwidth equally, with the share of each connection fluctuating as the number of connections varies. Note that BitTorrent's store-and-forward mode of operation at the granularity of blocks is unaffected by this simplification.
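The equal-share, edge-bottleneck bandwidth model described above can be sketched as follows (a static snapshot under the stated assumptions; the actual simulator recomputes shares whenever the number of flows on a link changes, and all names here are illustrative):

    class EdgeBottleneck {
        // Achievable rate of one transfer: the sender's uplink and the
        // receiver's downlink are each split equally among their flows,
        // and the smaller share is the bottleneck.
        static double transferRateKbps(double senderUplinkKbps, int senderFlows,
                                       double receiverDownlinkKbps, int receiverFlows) {
            double upShare = senderUplinkKbps / Math.max(1, senderFlows);
            double downShare = receiverDownlinkKbps / Math.max(1, receiverFlows);
            return Math.min(upShare, downShare);
        }

        // Time to move one 256 KB block at that rate, in seconds.
        static double blockDelaySeconds(double rateKbps) {
            return (256 * 8) / rateKbps;    // 256 KB = 2048 kbit
        }
    }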
Although this simplification means that TCP "anomalies" (e.g., timeouts) are not modeled, we believe that the relatively long length of the connections mitigates this issue. Note that, while an individual block in BitTorrent may not be very long (32-256 KB), data flow is kept as continuous as possible using pipelining of block requests. Finally, we do not model shared bottleneck links in the interior of the network. We assume that the bottleneck link is either the uplink of the sending node or the downlink of the receiving node. While Akella et al. [15] characterize bandwidth bottlenecks in the interior of the network, their study specifically ignores edge bottlenecks by conducting measurements only from well-connected sites (e.g., academic sites). The interior bottlenecks they find are generally fast enough (≥5 Mbps) that the edge bottleneck is likely to dominate in most realistic settings. Hence we believe that our focus on just edge bottlenecks is reasonable.

We also make one simplification in modeling BitTorrent itself. We ignore the endgame mode [8], which is used by BitTorrent to make the end of a download faster by allowing a node to request the sub-blocks it is looking for in parallel from multiple peers. This is inconsequential to our study because (a) the endgame mode has no effect on steady-state performance, and (b) it does not help with the "last block" problem, which we study in Section VI-C. Recall that the last-block problem occurs when a node has difficulty finding a peer that possesses the last block, increasing the overall download time significantly in many distribution systems. Since the endgame mode assumes the availability of the last block at multiple peers, it plays no role in removing this potential bottleneck.

B. Metrics

We quantify the effectiveness of BitTorrent in terms of the following metrics.

Link utilization: We use the mean utilization of the peers' uplinks and downlinks over time as the main metric for evaluating BitTorrent's efficacy. The utilization at any point in time is computed as the ratio of the aggregate traffic flow on all uplinks/downlinks to the aggregate capacity of all uplinks/downlinks in the system, i.e., the ratio of the actual flow to the maximum possible. Notice that if all the uplinks in the system are saturated, the system as a whole is serving data at the maximum possible rate. While downlink utilization is also an important metric to consider, the asymmetry of most Internet access links makes the uplink the key determinant of performance. Also, by design, duplicate file blocks (i.e., blocks that a leecher already has) are never downloaded. Note that a small amount of uplink bandwidth could be "wasted" in BitTorrent due to duplicate sub-block requests during endgame mode; however, as noted above, we do not model this mode. Hence, the mean download time for leechers is inversely related to the average uplink utilization. Because of this and the fact that observed uplink utilization is easier to compare against the optimal value (i.e., 100%), we do not explicitly report the mean download time for most of our experiments.

Fairness: The system should be fair in terms of the number of blocks served by the individual nodes. No node should be compelled to upload much more than it has downloaded. Nodes that willingly serve the system as seeds are, of course, welcome, but deliberate free-riding should not be possible.
Fairness is important for there to be an incentive for nodes to participate, especially in settings where ISPs charge based on uplink usage or uplink bandwidth is scarce. We quantify fairness using the normalized count of file blocks uploaded by a node, i.e., the number of blocks uploaded divided by the number of blocks in the file. So, for example, a normalized load of 1.5 means that the node serves a volume of data equivalent to 1.5 copies of the file. We also use the normalized count of blocks served to quantify the load on the seed(s) in the system.

Optimality: Throughout this paper we will refer to a system as having optimal utilization if it achieves the maximum possible link utilization, and having complete fairness if every leecher downloads as many blocks as it uploads. We will refer to the system as being optimal on the whole if it has optimal utilization as well as complete fairness. Note that a heterogeneous setting can have differing degrees of fairness for the same level of bandwidth utilization.

V. EXPERIMENT OVERVIEW

A. Workload Derived from a Real Torrent

In order to set the stage for the experiments to follow, we first examine how our simulator performs under a realistic workload. We consider two important workload parameters: (a) node arrival pattern, and (b) uplink and downlink bandwidth distribution. To derive realistic arrival patterns, we use the tracker log for the Redhat 9 distribution torrent [3]. Table I describes the distribution of peer bandwidth, which was derived from the Gnutella study reported in [16]. While discretizing the CDFs presented in [16], we excluded the tail of the distribution. This means that (a) dial-up modems are eliminated, since it is unlikely that they will participate in such large downloads, and (b) very high bandwidth nodes are eliminated, making the setting more bandwidth constrained. We set the seed bandwidth to 6000 kbps.

  Downlink (kbps)   Uplink (kbps)   Fraction of nodes
  784               128             0.2
  1500              384             0.4
  3000              1000            0.25
  10000             5000            0.15

TABLE I: Bandwidth distribution of nodes derived from the actual distribution of Gnutella nodes [16].

In order to make the simulations tractable, we made two changes. First, we used a file size of 200 MB (with a block size of 256 KB), which is much smaller than the actual size of the Redhat torrent (1.7 GB). This means the download time for a node is smaller, and the number of nodes in the system at any single point is also correspondingly smaller. Second, we present results only for the second day of the flash crowd.
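A minimal sketch of the two metrics as defined above (illustrative bookkeeping only; these helpers are assumptions for this exposition, not from the released simulator [14]):

    class Metrics {
        // Utilization at an instant: aggregate flow on all uplinks divided by
        // their aggregate capacity (1.0 means every uplink is saturated).
        static double uplinkUtilization(double[] uplinkFlowKbps, double[] uplinkCapacityKbps) {
            double flow = 0, capacity = 0;
            for (double f : uplinkFlowKbps) flow += f;
            for (double cap : uplinkCapacityKbps) capacity += cap;
            return flow / capacity;
        }

        // Normalized load: e.g. 600 blocks uploaded of a 400-block file is
        // 1.5, i.e., the node served 1.5 copies of the file.
        static double normalizedLoad(long blocksUploaded, long blocksInFile) {
            return (double) blocksUploaded / blocksInFile;
        }
    }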
This day witnesses over 10000 node arrivals; however, due to the smaller file download time, the maximum number of active nodes in the system at any time during our simulations was about 300. The simulation results can be summarized as follows. The observed uplink utilization was 91%; this means that the overall upload capability of the network is almost fully utilized. However, this comes at the cost of considerable skew in load across the system. The seed serves approximately 127 copies of the file into the network. Worse, some clients uploaded 6.26 times as many blocks as they downloaded, which represents significant unfairness. These results lead to a number of interesting questions:
1) How robust is the high uplink utilization to variations in system configuration and workload, e.g., differing numbers of seeds and leechers, join-leave patterns, bandwidth distribution, etc.?
2) Can the fairness of the system be improved without hurting link utilization?
3) How well does the system perform when there is heterogeneity in terms of the extent to which leechers have completed their download, e.g., new nodes coexisting with nodes that have already completed most of their download?
4) How sensitive is system performance to parameters such as the node degree (i.e., the number of neighbors of a node) and the maximum number of concurrent uploads?
To answer these questions, we present a detailed simulation-based study of BitTorrent in the sections that follow.

B. Road-map of Experiments

Unless otherwise specified, we use the following default settings in our experiments:
• File size: 100 MB (400 blocks of 256 KB each)
• Number of initial seeds: 1 (the origin server, which stays on throughout the duration of the experiment)
• Seed uplink bandwidth: 6000 Kbps
• Number of leechers that join the system (n): 1000
• Leecher downlink/uplink bandwidth: 1500/400 Kbps
• Join/leave process: a flash crowd where all leechers join within a 10-second interval. Leechers depart as soon as they finish downloading.
• Number of neighbors of each node (degree d): 7
• Maximum number of concurrent upload transfers (u): 5

The key parameters that affect the evolution of a torrent are: (1) the number of seed(s) and their serving capacity, (2) the number of leechers that wish to download, (3) the policies that nodes use to swap blocks among themselves, (4) the distribution of node upload/download capacities, and (5) the arrival/departure pattern of the leechers. We start in Section VI by examining only (1), (2) and (3). That is, we consider a homogeneous setting where all leechers have the same downlink/uplink bandwidth (1500/400 Kbps by default, as noted above) and arrive in a flash crowd. We explore the impact of the number of leechers, the number of initial seeds, the aggregate bandwidth of seeds, etc. This section also evaluates BitTorrent's LRF policy for picking blocks. We wish to point out that, although we use a small node degree (d=7), our results are not affected at higher node degrees. We present results with a small node degree to emphasize BitTorrent's resilience with respect to this parameter. Then in Section VII we examine (4) and turn to a heterogeneous flash-crowd setting where there is a wide range in leecher bandwidth. We consider 3 kinds of connectivity for leechers: high-end cable (6000/3000 Kbps), high-end DSL (1500/400 Kbps), and low-end DSL (784/128 Kbps).
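For reference, the default settings listed in Section V-B above, collected as constants (the values are the paper's defaults; the class itself is just a convenience sketch):

    final class DefaultSettings {
        static final int FILE_SIZE_MB = 100;            // 400 blocks of 256 KB
        static final int BLOCK_SIZE_KB = 256;
        static final int NUM_BLOCKS = 400;
        static final int INITIAL_SEEDS = 1;             // origin server, always on
        static final int SEED_UPLINK_KBPS = 6000;
        static final int NUM_LEECHERS = 1000;           // n
        static final int LEECHER_DOWNLINK_KBPS = 1500;
        static final int LEECHER_UPLINK_KBPS = 400;
        static final int FLASH_CROWD_WINDOW_SEC = 10;   // all joins within 10 s
        static final int NODE_DEGREE = 7;               // d
        static final int MAX_CONCURRENT_UPLOADS = 5;    // u
    }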
Finally, in Section VIII, we turn to (5) and consider workloads other than a pure flash-crowd scenario. In particular, we consider cases where leechers with very different download "objectives" coexist in the system. For instance, new nodes in the post-flash crowd phase compete with nodes that have already downloaded most of the blocks. Likewise, an old node that reconnects during the start of a new flash crowd to finish the remaining portion of its download competes with new nodes that start their downloads.

VI. HOMOGENEOUS ENVIRONMENT

In this section, we study the performance of BitTorrent in a setting consisting of a homogeneous (with respect to bandwidth) collection of leechers. Figures presented in this section and the next (Sec. VII) represent the results of a single simulation run, since the differences across multiple runs were found to be statistically insignificant.

A. Number of nodes

First we examine the performance of the system with increasing network size. We vary the number of nodes that join the system from 50 to 8000. All nodes join during a 10-second period and remain in the system until they have completed the download. Figure 1a shows that upload capacity utilization (see Section IV) is close to 100% regardless of system size. Utilization is a little short of 100% because of the start-up phase, when nodes are unable to utilize their uplinks effectively. The high uplink utilization indicates that the system is performing almost optimally in terms of mean download time. The downlink utilization, on the other hand, is considerably lower. This is expected given the asymmetric access links of the leechers. Another important measure of scalability is how the work done by the seed varies with the number of leechers. We measure this in terms of the normalized number of blocks served, i.e., the number of blocks served divided by the number of blocks in one full copy of the file. Ideally, we would like the work done by the seed to remain constant or increase very slowly with system size. Figure 1b shows that this is actually the case. The normalized number of blocks served by the seed rises sharply initially (as seen from the extreme left of Figure 1b) but then flattens out. The initial rise indicates that the seed is called upon to do much of the serving when the system size is very small, but once the system has a critical mass of 50 or so nodes, peer-to-peer serving becomes very effective and the seed has to do little additional work even as the system size grows to 8000. In summary, BitTorrent performance scales very well with increasing system size, both in terms of bandwidth utilization and the work done by the seed.

B. Number of seeds and bandwidths of seeds

Next we consider the impact of the number of seeds and the aggregate seed bandwidth on the performance of BitTorrent. We first consider the case where there is a single seed, and then move on to the case of multiple seeds. Figure 1c shows the mean upload utilization as the bandwidth of a single seed varies from 200 Kbps to 1000 Kbps.
The "nosmartseed" curve corresponds to default BitTorrent behavior. We see that upload utilization is very low (under 40%) when the seed bandwidth is only 200 Kbps. This is not surprising, since the low seed bandwidth is not sufficient to keep the uplink bandwidth of the leechers (400 Kbps) fully utilized, at least during the start-up phase. However, even when the seed bandwidth is increased to 400 or 600 Kbps, the upload utilization is still considerably below optimal. Part of the reason for poor upload utilization is that seed bandwidth is wasted serving duplicate blocks prematurely, i.e., even before one full copy of the file has been served. We find that about 50% of the blocks are served prematurely when the seed bandwidth is 400 kbps. Thus, despite the LRF policy, multiple nodes connected to the seed can operate in an uncoordinated manner and independently request the same block. Once identified, there is a simple fix for this problem. We have implemented a smartseed policy, which has two components: (a) the seed does not choke a leecher to which it has transferred an incomplete block, which maximizes the opportunity for leechers to download, and hence serve, complete blocks; and (b) for connections to the seed, the LRF policy is replaced with the following: among the blocks that a leecher is looking for, the seed serves the one that it has served the least. This policy improves the diversity of blocks in the system.
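A compact sketch of the smartseed block choice in component (b) above (Java; servedCount and the method name are illustrative assumptions, not the authors' code):

    import java.util.BitSet;

    class SmartSeed {
        // Among the blocks the leecher is still missing, serve the one this
        // seed has served the least often so far.
        static int pickBlockForLeecher(BitSet leecherHas, int[] servedCount) {
            int best = -1;
            for (int b = 0; b < servedCount.length; b++) {
                if (leecherHas.get(b)) continue;              // leecher has it already
                if (best == -1 || servedCount[b] < servedCount[best]) best = b;
            }
            if (best >= 0) servedCount[best]++;               // record the upload
            return best;                                      // -1: download complete
        }
    }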
A Robust and Scalable Peer-to-Peer Gossiping Protocol
Spyros Voulgaris (Vrije Universiteit Amsterdam), Márk Jelasity (University of Bologna), Maarten van Steen (Vrije Universiteit Amsterdam)

Abstract. The newscast model is a general approach for communication in large agent-based distributed systems. The two basic services, membership management and information dissemination, are implemented by the same epidemic-style protocol. In this paper we present the newscast model and report on experiments using a Java implementation. The experiments involve communication in a large, wide-area cluster computer. By analysis of the outcome of the experiments we demonstrate that the system indeed shows the scalability and dependability properties predicted by our previous theoretical and simulation results.

1 Introduction

The popularity of peer-to-peer systems in the last couple of years illustrates how the Internet is gradually shifting toward a distributed system that supports more than only client-server applications. A key issue in peer-to-peer systems is that the distribution of data and control across processes is symmetric. Moreover, this distribution is done in such a way that processes are highly autonomous and independent of each other. The important advantage of this approach is scalability. A well-designed peer-to-peer system can easily scale to millions of processes, each of which can join or leave whenever it pleases without seriously disrupting the system's overall quality of service. A crucial aspect of large-scale peer-to-peer systems is that they are easy to manage. Any system that attempts to centrally manage how processes connect to each other and distribute data and control will fail, notably when processes join and leave all the time. Instead, it should be a property of the design itself that the distribution of data and control takes place in an automated fashion that requires no global management at all.
In effect, we are looking at the design of self-managing systems. There are many different types of peer-to-peer systems. In most cases, these systems can be divided into two separate layers. The lowest layer consists of protocols for handling group membership and communication, whereas the highest layer implements the required functionality for a specific application. The lowest layer thus forms the core of the peer-to-peer system. Roughly speaking, there are currently three types of core systems. The first, and most popular, type is designed to efficiently support content-based searching. In many cases, these systems operate with centralized index servers that keep track of where content is located. The index servers are often constructed dynamically in the form of super peers [16]. Examples include Gnutella and KaZaa. The second type aims at efficiently routing a request to its destination through an overlay network formed by the collection of peers. Examples of such systems are Chord [13], Pastry [12], Tapestry [17], and CAN [10]. A third type deploys epidemic protocols [3]. In these systems, the goal is not so much enabling point-to-point communication between peers, but rather the rapid and efficient dissemination of information. Examples in this class include Scamp [6] and the systems underlying probabilistic reliable broadcasting [4,5,9].

In this paper, we concentrate on information-dissemination based systems that deploy epidemic protocols. A crucial element in an epidemic protocol is that a participating peer can randomly select another peer to exchange information with. Traditional protocols supported this random selection by providing a list of all other participating peers. Clearly, such an approach cannot scale to large networks. As an alternative, approaches such as those described in [4,7] ensure that a peer always has a list that represents an independent and randomly selected sample from the entire set of peers. We have recently developed an epidemic protocol for disseminating information in large, dynamically changing sets of autonomous agents. This, so-named, newscast protocol solves two problems inherent to large sets of agents: (1) information dissemination, and (2) efficient membership management. The main distinction in comparison to similar epidemic-based solutions is that agents can join and leave at virtually no cost at all, and without affecting the information-dissemination properties of the protocol. The associated model of newscasting, that is, the model of information dissemination and membership management as presented to agents, is described in detail in a separate paper [8], along with theoretical analyses partly based on simulations.

To better substantiate our claims regarding scalability, we have implemented the newscast protocol (in Java). We subsequently used this implementation to conduct a series of experiments that emulate large-scale agent-based applications on a real network. In particular, we set up a series of experiments with 128,000 agents scattered across a hierarchically organized cluster of 320 processors. These processors, in turn, are geographically spread over four different sites in the Netherlands. An important and interesting aspect of these experiments is that the underlying communication network is heterogeneous. It includes interprocess communication facilities on a single workstation, point-to-point local-area high-speed links, as well as wide-area links. In this way, we are better able to measure the effect that a real communication infrastructure has on the properties of our dissemination model.
In this paper, we describe the newscast protocol and report on our experiments involving emulation of large networks of agents. We show that the theoretical results, which are based on an idealized underlying communication infrastructure, still hold when dealing with a realistic infrastructure, thus further substantiating our claims that newscasting is a highly robust and scalable model for information dissemination in large and rapidly changing sets of agents. In the following, we subsequently discuss our protocol, the experimental setup, and the results of our experiments, to end with conclusions.

2 The Newscast Protocol

In our implementation of the newscast model, a large group of agents is connected through a simple peer-to-peer data exchange protocol. The protocol is extremely simple: each agent knows only a (continuously changing) small set of peers, of which one is randomly chosen to exchange information with. In this section, we start with explaining how the protocol works; after that, we explore some remarkable theoretical properties of its emerging behavior. These properties are further investigated in Section 3, where we report on our large-scale emulation experiments.

2.1 Principal Operation

The two main building blocks of our newscast model are a collective of agents and a news agency, as shown in Figure 1. The basic idea is that the news agency asks all agents regularly for news by means of a callback function getNews(). In addition, the news agency provides each agent with news about the other agents in the collective, again through a callback function newsUpdate(news[]).

Fig. 1. The organization of a newscast application.

The definition of what counts as news is application dependent. The agents simply live their lives (perform computations, listen to sensors and the news, etc.) and, based on the computations they have completed and the information they have collected, they must provide the news agency with news when asked. The model itself can be fully specified in terms of the functional and statistical properties of the operations getNews() and newsUpdate(news[]). Instead of this definitional style of specification, we take a much simpler approach in this paper by describing the semantics of the model in terms of the newscast protocol, of which we have shown that it meets the model's specifications [8].

Each agent has an associated correspondent that runs on the same machine hosting the agent. The correspondents jointly form the distributed implementation of the news agency. Each correspondent maintains a fixed-sized cache of news items. Whenever an agent passes a news item to its correspondent, the latter timestamps the item, adds its own network address, and subsequently caches the item. A news item itself consists of an agent identifier and the actual news as provided by the agent, as shown in Figure 2.

Fig. 2. The format of news items and cache entries.

Correspondents regularly exchange caches as follows. Assume the size of each cache is c. Omitting specific details (which are found in [8]), each correspondent executes the following five steps once every ∆T time units (∆T is referred to as the refresh interval):
1. Request a fresh news item from the local agent by calling getNews(). Add the item to the cache.
2. Randomly select a peer correspondent by considering the network addresses of other (and available) correspondents as found in the cache.
3. Send all cache entries to the selected peer, and, in turn, receive all the peer's cache entries. Merge the received entries into the local cache.
4. Pass the received cache entries from the peer agent to the local agent by calling newsUpdate().
5. The correspondent now has 2c cache entries; it subsequently throws away the c oldest ones.

The selected peer correspondent executes the last three steps as well, so that after the exchange both correspondents have the same cache. Note, however, that as soon as either of these two correspondents executes the protocol again, their respective caches will most likely be different again.
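The five steps above can be summarized in code. The following is a compact, in-memory sketch of one exchange, including the peer's side; the Agent callbacks and CacheEntry fields mirror Figures 1 and 2, but all class and method names are illustrative assumptions, not the authors' implementation, and the UDP transport and the timestamp normalization described below are elided:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    interface Agent {
        String getNews();                         // used in step 1
        void newsUpdate(List<CacheEntry> news);   // used in step 4
    }

    class CacheEntry {
        final String agentId, address, news;
        final double timestamp;                   // local time when cached
        CacheEntry(String agentId, String address, String news, double timestamp) {
            this.agentId = agentId; this.address = address;
            this.news = news; this.timestamp = timestamp;
        }
    }

    class Correspondent {
        final String id, address;
        final int c;                              // fixed cache size, e.g. 20
        final Agent agent;
        final List<CacheEntry> cache = new ArrayList<>();
        final Random rnd = new Random();

        Correspondent(String id, String address, int c, Agent agent) {
            this.id = id; this.address = address; this.c = c; this.agent = agent;
        }

        // One refresh interval for the initiating correspondent. 'peers' maps
        // addresses found in cache entries to reachable correspondents.
        void runOnce(Map<String, Correspondent> peers, double now) {
            cache.add(new CacheEntry(id, address, agent.getNews(), now));  // step 1
            CacheEntry pick = cache.get(rnd.nextInt(cache.size()));        // step 2
            Correspondent peer = peers.get(pick.address);
            if (peer == null || peer == this) return;    // self/unknown: skip round
            List<CacheEntry> mine = new ArrayList<>(cache);                // step 3:
            List<CacheEntry> theirs = new ArrayList<>(peer.cache);         // swap caches
            agent.newsUpdate(theirs);                                      // step 4,
            peer.agent.newsUpdate(mine);                                   // both sides
            cache.addAll(theirs);                                          // step 5: keep
            peer.cache.addAll(mine);                                       // the c freshest
            keepFreshest(cache, c);                                        // of 2c entries
            keepFreshest(peer.cache, c);
        }

        static void keepFreshest(List<CacheEntry> cache, int c) {
            cache.sort(Comparator.comparingDouble((CacheEntry e) -> e.timestamp).reversed());
            while (cache.size() > c) cache.remove(cache.size() - 1);       // drop c oldest
        }
    }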
The protocol does not require that the clocks of correspondents are synchronized, but only that the timestamps of news items in a single cache are mutually consistent. We assume that the communication time between two correspondents is negligible in comparison to ∆T (which is generally in the order of minutes). When a correspondent A passes its cache to B, it also sends along its current local time, T_A. When B receives the cache entries, it subsequently adjusts the timestamp of each entry by a value T_A − T_B, effectively normalizing the time of each new entry to those already cached.

2.2 Properties of Newscasting

As it turns out, this simple model of communication has desirable statistical properties. To understand the behavior of newscasting, we consider the communication graphs G_t at different time instants t that are induced by maintaining caches at each correspondent. Each such graph is constructed from a corresponding directed graph D_t as follows. The vertex set V_t of D_t contains the correspondents. For correspondents a and b in V_t we have the link a→b if and only if the address of b is in the cache of a at time t. The cache-exchange algorithm leads to a series of directed graphs, given an initial directed graph D_0. The communication graph G_t is now simply constructed by dropping the orientation in D_t. G_t expresses the possibility of cache exchanges. Now consider the series of graphs G_0, G_∆T, G_2∆T, .... Note that during a time interval ∆T each correspondent initiates the cache-exchange algorithm. In other words, after ∆T time units, all correspondents will have fetched a news item from their agent, exchanged caches with at least one of their neighbors (and possibly more), and have passed c news items to their agent. We say that a communication cycle has completed.

We have conducted simulations with up to 50,000 correspondents, assuming an idealized communication infrastructure with no communication delays and packet losses. Our simulations show that even for relatively small cache sizes (say, c = 20) each graph G_k∆T stays connected. Moreover, it turns out that the average length of each shortest path between two nodes is small, as shown in Figure 3(a).

Fig. 3. (a) Average shortest path length between two nodes for different cache sizes (c = 20 and c = 40), as a function of the number of agents (1000 to 50,000). (b) Average clustering coefficient taken over all nodes.

In fact, further investigations revealed that the induced communication graphs have many properties in common with what are known as small worlds [1,15]. An important property of these types of networks is that they show a relatively high clustering coefficient, which, for a given node, is the ratio of the number of edges between the neighbors of the node and the number of all possible edges between the neighbors. For example, in a complete graph all nodes have a clustering coefficient of 1, while in random graphs this coefficient is typically small (if our graphs were random, the clustering coefficient would be expected to be c/n).
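The clustering coefficient just defined can be computed per node as follows (a sketch; adj is assumed to hold one neighbor set per node of the undirected communication graph G_t):

    import java.util.List;
    import java.util.Set;

    class Clustering {
        // Fraction of the possible edges between v's neighbors that exist.
        static double coefficient(int v, List<Set<Integer>> adj) {
            Set<Integer> nbrs = adj.get(v);
            int k = nbrs.size();
            if (k < 2) return 0.0;           // fewer than 2 neighbors: treat as 0
            int links = 0;
            for (int a : nbrs)
                for (int b : nbrs)
                    if (a < b && adj.get(a).contains(b)) links++;
            return 2.0 * links / (k * (double) (k - 1));   // links / (k choose 2)
        }
    }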
Figure 3(b) shows the clustering coefficient for different cache sizes c and communication graphs G_k∆T.

Simulations also show that we need only an extremely simple way of handling membership, which is an important improvement in comparison to other epidemic models. Consider the worst solution to handling membership that could possibly disrupt the emergent behavior of our protocol: an agent contacts a well-known central server and simply initiates the cache-exchange protocol with that server. This approach systematically biases the content of caches, which now all depend on what is stored at the central server. We conducted a simulation experiment in which we admitted 50 new agents at every communication cycle until 5000 agents had joined the network, after which no new agents were allowed to join. When measuring the average path length again, we obtain the results shown in Figure 4. What is seen is that shortly after the last 50 agents have been added (i.e., after 100 completed cycles), the average path length quickly converges to the one we would expect in a stable graph. We can conclude that even this worst-imaginable membership protocol does not affect the properties of newscasting. In effect, when a node wants to join, it needs to know only the address of a single other node and can simply start with executing the newscast protocol. Leaving is done by simply stopping communication.

Fig. 4. Average shortest path length from node 0 while adding 50 agents every cycle until 5000 agents have been added (growing by 50 each cycle toward a random old vertex, growing by 50 each cycle toward a central vertex, and a random graph).

3 Validation of the Newscast Protocol

Using theoretical analyses and simulations, we are able to show that the statistical properties of the protocol meet the specifications of the newscast model. However, in the world of large-scale systems, theory and practice often diverge. Therefore, to investigate how the protocol would behave in practice and to further substantiate our claims, we conducted a series of emulation experiments. Emulation, as opposed to simulation, involves implementing the protocol and conducting experiments on a real network of computers. In our case, we carried out experiments with a collective of up to 128,000 agents distributed across a 320-node wide-area cluster of workstations. We emphasize that these experiments involve the actual implementation of the newscast protocol and, as such, favorably compete with the large-scale simulations used for the evaluation of peer-to-peer protocols such as CAN [10], Chord [14], and Pastry [2,12], or the real-world experiments involving tens of nodes across wide-area links as conducted for OceanStore [11].

3.1 The Implementation

In order to experiment with the newscast model described earlier, we implemented its underlying protocol. Java was chosen for portability reasons, allowing us to easily execute the protocol in heterogeneous environments. Our implementation is organized as three modules: the core, the application, and the utility module. The core module implements a correspondent materializing the epidemic protocol described in Section 2.1.
The core module has no dependencies on the other two modules. It is a self-contained implementation of the newscast protocol. All communication is based on UDP. Multiple instances of the core module can coexist in a single Java virtual machine, behaving as separate, independent correspondents.

The application module provides the implementation of an agent. One instance of the core module has exactly one associated instance of the application module. Our experiments were focused on the properties of the epidemic protocol itself, without considering any particular application. Therefore the agent we defined has only basic functionality: it returns empty content in the getNews() operation, and ignores any content delivered to it through the newsUpdate(news[]) operation.

The utility module serves the specific needs of our experiments, such as batch running, coordinating the experiments, and logging. Exactly one utility module instance exists in each virtual machine. In particular, the utility module takes care of starting multiple agent-correspondent pairs, each running on a separate thread within the same Java virtual machine, to allow emulation of a large network. It coordinates with utility modules running on other Java virtual machines (possibly on remote hosts) to determine initial connection addresses for the correspondents. The utility module also contains logging functionality: it periodically freezes the agent-correspondent pairs running in the Java virtual machine, logs their state, and then resumes operation. Utility modules coordinate to ensure that freezing and resuming for logging occur simultaneously on all the Java virtual machines spread across the different hosts.

3.2 The DAS-2

We conducted our experiments on the Distributed ASCI Supercomputer (DAS-2), a wide-area distributed cluster-based system consisting of five clusters of dual-processor PCs located at different sites across the Netherlands. The cluster at the Vrije Universiteit consists of 72 nodes, while the other clusters consist of 32 nodes each, giving a total of 200 nodes (400 processors). Each node has two 1-GHz Pentium-III processors and at least 1 GB of RAM. Nodes within a single cluster are connected by a Fast Ethernet (100 Mbps) network dedicated to their cluster. Clusters, in turn, communicate over wide-area links, which are shared for all traffic between the universities and which have been shown to support an aggregated bandwidth of 20 Mbps. Figure 5 shows the DAS-2 architecture.

Fig. 5. The DAS-2 architecture.

3.3 Experimental Setting

We carried out experiments with a network of 128,000 agents distributed across 160 dual-processor nodes on four of the five DAS-2 clusters.
different threads within the same Java virtual machine.–Interprocess communication for agents run by separate Java virtual machines,but on the same DAS-2node.–Local-area(or intracluster)communication for agents residing on different nodes, but within the same cluster.These agents were communicating through a100Mbps Fast Ethernet network.–Wide-area(or intercluster)communication for agents belonging to different clus-ters.This type of communication was carried out over the wide-area links shared with other wide-area traffic.This diverse environment(with respect to networking)provided us with a valuable testbed for studying the newscast model.It is important to observe that even though800agents run within each DAS-2node, more than99%of the communication between agents is either across wide-area or local-area links.For any given agent,799other agents run on the same node,and 127,200run on other nodes,which account for0.6%and99.4%of the total128,000 agents respectively.As we observed in our experiments,the items in an agent’s cache are randomly distributed over all the participating agents,irrespective of their location. Therefore we expect only0.6%of the total communication to be within or between processes on the same node,and all the rest to be across local-area or wide-area links. In particular,agents in the three32-node clusters are expected to experience80%wide-area and19.4%local-area traffic,while agents in our64-node cluster are expected to have60%wide-area and39.4%local-area traffic.Another parameter of our experiments that is worth noting,is the bootstrapping mechanism.By bootstrapping we refer to the procedure of providing agents with the information required to jump-start the newscast network’s formation.In principle,a new agent joins by contacting any existing agent and exchanging caches.When the whole network starts from scratch,a systematic way has to be present to provide one or more initial communication points to each agent.In our experiments this task was handled by the utility module.All agents were provided with the single address of one selected agent.Providing agents with a choice of(possibly random)agents to connect to initially, enhances the randomness of the network in the early cycles.However,a bootstrapping mechanism as simple and centralized as the one we chose further substantiates our claims of the protocol’s convergent behavior,as discussed in the following section.4ResultsThis section presents a thorough analysis of the output of our three large-scale exper-iments with128,000correspondents using cache sizes of20,30,and40,respectively. 
We will often compare the emerging communication graphs to random graphs. In all cases, the random graphs we refer to are generated by selecting exactly c undirected edges randomly for each node. As a result, in such a graph each node has at least c edges (but usually more).

4.1 Statistical Properties of the Communication Graph

Figure 6 illustrates the two most important properties of the emergent communication graphs. The number of cycles actually performed was over 5000; however, only the initial cycles are depicted, because the values remain the same throughout the experiment, indicating a convergent behavior. The average path length from a node is defined as the average of the minimal path lengths of that node to all other nodes. To get a finite value we have to have a connected graph. We can observe very low average path lengths, which coincide with the expected lengths after extrapolation of the simulation data shown in Figure 3(a). The initial peak is explained by the applied bootstrapping mechanism described in Section 3. This mechanism results in an initially unbalanced neighborhood structure. However, after all correspondents get connected to the collective, the average path length converges quickly to its final value.

Fig. 6. (a) The evolution of the average path length from a node over the first 140 complete cycles, for c = 20 and c = 40. (b) The clustering coefficient.

The average clustering coefficient taken over all nodes is shown in Figure 6(b), and again corresponds to our simulation results. Together with the values found for the average path lengths, we can indeed conclude that our communication graphs are small-world graphs. Small-world graphs come in very different flavors, however. One interesting property to investigate is whether our graphs are scale free or not. The degree of a random node defines a random variable. If this variable follows a power law (linear on the log-log scale), then the graph is scale free. Figure 7 shows the distribution of the node degree for the case of c = 20, which deviates the most from the random case.

Fig. 7. Degree distribution on (a) linear and (b) log-log scale using the same range. The depicted values belong to a single converged communication graph with n = 128000 and c = 20 (converged) and a graph with c random edges generated for each vertex (random).

It can be seen clearly that our communication graph is not scale free. From a dependability point of view this is an advantage, since scale-free graphs are sensitive to the removal of highly connected nodes (even though they are less sensitive to random node removal). The effect of node removal in our graphs is discussed next.
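The random reference graphs used in the comparisons of this section (including the random-graph curves of Figure 8 below) can be generated as sketched here; this is illustrative code for the construction described at the start of Section 4, with made-up names:

    import java.util.HashSet;
    import java.util.Random;
    import java.util.Set;

    class RandomReferenceGraph {
        // Add exactly c new undirected edges per node, to uniformly chosen
        // other nodes, so every node ends up with degree >= c. Each edge
        // (a,b) with a < b is packed into one long key.
        static Set<Long> generate(int n, int c, Random rnd) {
            Set<Long> edges = new HashSet<>();
            for (int v = 0; v < n; v++) {
                int added = 0;
                while (added < c) {
                    int w = rnd.nextInt(n);
                    if (w == v) continue;                      // no self-loops
                    long key = (long) Math.min(v, w) * n + Math.max(v, w);
                    if (edges.add(key)) added++;               // count new edges only
                }
            }
            return edges;
        }
    }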
Fig. 8. Partitioning of the communication graph as a function of the percentage of removed random nodes (node failures): (a) the number of independent clusters and (b) the size of the largest cluster, for c = 20, 30, and 40 and the corresponding random graphs. The curves belong to a single graph. The largest clusters of random graphs are omitted for clarity; their relationship to the communication graphs is similar to the relationship in the case of the number of clusters.

4.2 Robustness to Node Removal

Figure 8 shows the effect of node removal on the connectivity of the communication graph. Note that the number of clusters decreases when approaching 100% removal, because the remaining graph itself becomes small. The graph shows very similar behavior to a random graph, especially if the cache is large. These results indicate considerable robustness to node failures, especially considering the size of the largest cluster, which indicates that most of the clusters are in fact very small and most of the nodes are still in a single connected cluster.

5 Conclusions

In this paper we presented experiments with a Java implementation of the newscast model of information dissemination. The experiments involved 128,000 correspondents communicating with each other using UDP over a wide-area, large-scale heterogeneous cluster of processors. The outcome of these experiments is particularly valuable since it represents the real implementation of our model, as opposed to previously conducted simulations, yet the size of the system is comparable with the scale of typical simulation results as well. The results are in complete agreement with the theoretical predictions and simulations presented in [8], providing practical evidence concerning the correctness of our algorithm and of the statistical properties of the emerging communication graphs. As we demonstrated, the series of communication graphs shows stable small-world properties, which make it a dependable and effective device for information dissemination and membership management. Most importantly, these properties are not maintained explicitly, but are emergent from the underlying simple epidemic-style information exchange protocol.

References

1. R. Albert and A.-L. Barabasi. "Statistical Mechanics of Complex Networks." Reviews of Modern Physics, 74(1):47-97, Jan. 2001.
2. M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron. "Exploiting Network Proximity in Peer-to-Peer Overlay Networks." Technical Report MSR-TR-2002-82, Microsoft Research, Cambridge, UK, June 2002.
3. A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. "Epidemic Algorithms for Replicated Database Management." In Proc. Sixth Symp. on Principles of Distributed Computing, pp. 1-12, Aug. 1987. ACM.
4. P. Eugster, R. Guerraoui, S. Handurukande, A.-M. Kermarrec, and P. Kouznetsov. "Lightweight Probabilistic Broadcast." In Proc. Second Int'l Conf. Dependable Systems and Networks, pp. 443-452, June 2001. IEEE Computer Society Press, Los Alamitos, CA.
5. P. T. Eugster and R. Guerraoui. "Probabilistic Multicast." In Proc. Int'l Conf. Dependable Systems and Networks, June 2002. IEEE Computer Society Press, Los Alamitos, CA.
6. A. Ganesh, A.-M. Kermarrec, and L. Massoulié. "Scamp: Peer-to-Peer Lightweight Membership Service for Large-Scale Group Communication." In Networked Group Communication Workshop, volume 2233 of Lect. Notes Comp. Sc., pp. 44-56, Nov. 2001. Springer-Verlag, Berlin.
7. A. Ganesh, A.-M. Kermarrec, and L. Massoulié. "Peer-to-Peer Membership Management for Gossip-based Protocols." IEEE Trans. Comp., 52(2):139-149, Feb. 2003.
8. M. Jelasity and M. van Steen. "Large-Scale Newscast Computing on the Internet." Technical Report IR-503, Vrije Universiteit, Department of Computer Science, Oct. 2002.
9. A.-M. Kermarrec, L. Massoulié, and A. Ganesh. "Probabilistic Reliable Dissemination in Large-Scale Systems." IEEE Trans. Par. Distr. Syst., 14(3):248-258, Mar. 2003.
10. S. P. Ratnasamy. A Scalable Content-Addressable Network. PhD thesis, University of California at Berkeley, Oct. 2002.
11. S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz. "Pond: the OceanStore Prototype." In Proc. Second USENIX Conf. File and Storage Techn., Mar. 2003. USENIX, Berkeley, CA.
12. A. Rowstron and P. Druschel. "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems." In R. Guerraoui (ed.), Proc. Middleware 2001, volume 2218 of Lect. Notes Comp. Sc., pp. 329-350, Nov. 2001. Springer-Verlag, Berlin.
13. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications." In Proc. SIGCOMM, Aug. 2001. ACM.
14. I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications." IEEE/ACM Trans. Netw., 2003. To appear.
15. D. J. Watts. Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, Princeton, NJ, 1999.
16. B. Yang and H. Garcia-Molina. "Designing a Super-Peer Network." In Proc. 19th Int'l Conf. Data Engineering, Mar. 2003. IEEE Computer Society Press, Los Alamitos, CA.
17. B. Y. Zhao, J. Kubiatowicz, and A. D. Joseph. "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing." Technical Report CSD-01-1141, Computer Science Division, University of California, Berkeley, Apr. 2001.