Implementing a Highly Scalable and Adaptive Agent-Based Management Framework
Hillstone Secure Connect: A Comprehensive Guide to Secure Network Connectivity

Introduction

In today's digitally driven world, secure network connectivity is of paramount importance for organizations of all sizes. With the increasing number of cyber threats and data breaches, businesses need reliable solutions to protect their networks and sensitive data. One such solution that has gained prominence is Hillstone Secure Connect. In this comprehensive guide, we will explore the various aspects of Hillstone Secure Connect, including its features, benefits, implementation, and best practices.

1. What is Hillstone Secure Connect?

Hillstone Secure Connect is a software-defined wide area network (SD-WAN) solution designed to provide secure and reliable network connectivity for organizations. It combines advanced threat prevention capabilities with high-performance networking to ensure seamless communication across multiple sites and branches. Hillstone Secure Connect offers centralized management, end-to-end encryption, intelligent routing, and real-time threat intelligence, making it an ideal choice for businesses seeking a robust network security solution.

2. Key Features of Hillstone Secure Connect

2.1 Centralized Management: Hillstone Secure Connect offers a centralized management platform that allows administrators to easily configure and monitor the entire network. This feature enables efficient provisioning of new sites, policy management, and real-time visibility into network performance.

2.2 End-to-End Encryption: Data security is a top priority for any organization. Hillstone Secure Connect provides end-to-end encryption, ensuring that all network traffic is secure and protected from unauthorized access. This feature is essential for organizations that need to comply with regulatory requirements and protect sensitive information.

2.3 Intelligent Routing: Hillstone Secure Connect utilizes intelligent routing algorithms to optimize network performance. It dynamically selects the most efficient path for data transmission, ensuring low latency and high availability. This feature enhances overall network performance and user experience.

2.4 Real-time Threat Intelligence: Protection against cyber threats is a critical component of any comprehensive network security solution. Hillstone Secure Connect integrates real-time threat intelligence, leveraging advanced threat detection techniques to identify and block malicious traffic. This proactive approach helps organizations stay one step ahead of potential cyberattacks.

3. Benefits of Hillstone Secure Connect

3.1 Enhanced Network Security: By combining advanced threat prevention capabilities with secure connectivity, Hillstone Secure Connect ensures that your network remains protected from both external and internal threats. It provides comprehensive security features, including firewall protection, intrusion prevention, and secure remote access, safeguarding your critical assets from unauthorized access and data breaches.

3.2 Increased Productivity: With Hillstone Secure Connect, organizations can experience increased productivity due to improved network performance and reliable connectivity. The intelligent routing feature optimizes network traffic, reducing latency and ensuring that critical applications and services run smoothly. This helps employees work efficiently and collaborate seamlessly across multiple locations.

3.3 Cost Savings: Implementing Hillstone Secure Connect can result in significant cost savings for organizations. The centralized management platform simplifies network administration, reducing the need for dedicated IT resources. Additionally, the intelligent routing feature optimizes bandwidth usage, minimizing the requirement for costly network upgrades.

3.4 Scalability and Flexibility: Hillstone Secure Connect is highly scalable and can seamlessly adapt to the evolving needs of your organization. Whether you are expanding your business or adding new sites, the solution can easily accommodate the growing demands of your network infrastructure. Moreover, it supports a wide range of connectivity options, including MPLS, broadband, and wireless connections, providing flexibility in network deployment.

4. Implementation Considerations

4.1 Assessing Network Requirements: Before implementing Hillstone Secure Connect, it is essential to assess your organization's network requirements. Evaluate factors such as the number of sites, bandwidth requirements, security policies, and regulatory compliance. This assessment will help determine the optimal configuration and ensure a smooth implementation process.

4.2 Collaborating with Service Providers: For a successful implementation, it is advisable to collaborate with experienced service providers who have expertise in deploying Hillstone Secure Connect. They can assist with the initial setup, configuration, and ongoing maintenance, ensuring that your network remains secure and operational.

4.3 Employee Training and Awareness: To fully leverage the benefits of Hillstone Secure Connect, it is crucial to provide adequate training and awareness programs to employees. Educate them about the importance of network security, establish best practices for remote access, and encourage adherence to security policies. This will contribute to a strong security culture within the organization.

Conclusion

Hillstone Secure Connect offers organizations a comprehensive solution for secure network connectivity. With its advanced security features, centralized management, and intelligent routing capabilities, it addresses the evolving network security challenges of the modern business landscape. By implementing Hillstone Secure Connect, organizations can ensure enhanced network security, increased productivity, cost savings, and scalability. Taking into account the implementation considerations and adhering to best practices, organizations can leverage the full potential of Hillstone Secure Connect and protect their networks against cyber threats.
Implementing a high-performance, scalable messaging infrastructure can be a complex and formidable endeavor, requiring a great deal of specialized knowledge and more time than many businesses can afford. TIBCO SmartSockets® real-time messaging software addresses these challenges by masking your distributed applications from the underlying complexities of your network structure: whether network servers are local or remote, for example, or whether the network is multicast- or only unicast-enabled. In this way, SmartSockets frees your development team to focus on their core competencies. In addition, SmartSockets automatically provides an extra layer of reliability to your distributed applications at the messaging level, complementing and extending any reliability capabilities that may reside at the deeper network level.

In the increasingly complex and heterogeneous environment in which your business must exchange and distribute revenue-critical information, SmartSockets provides the technology your organization needs to stay ahead: exceptional performance, very high scalability, optimum bandwidth efficiency, robust fault tolerance, reliable real-time messaging using industry-standard protocols, and much more.

A Comprehensive, Flexible Offering

From data delivery to connectivity, security to monitoring and management, SmartSockets is a flexible messaging solution tailored to meet your critical real-time data delivery needs. Complementing the core SmartSockets solution, TIBCO offers an extensive range of add-on products that allow you to extend the power of SmartSockets. All SmartSockets products deliver advanced functionality without sacrificing performance.

Benefits
• Significantly increase message volume
• Transparently adapt messaging systems to changing requirements
• Scale applications across the enterprise or the internet
• More efficiently utilize network bandwidth

Features
• Publish-subscribe for one-to-many communications
• Multithreaded, multiprocessor architecture for full system exploitation
• Online security safeguards vital communications
• Real-time monitoring of network applications
• Performance optimization for maximum throughput
• Robust, enterprise-quality fault-tolerant GMD for reliable message delivery

Speed and Flexibility

SmartSockets increases development productivity and reduces development cycles by allowing your development teams to work in their preferred platform and language environments, and by embracing industry standards. SmartSockets offers:
• Support for multiple programming languages
• Consistent, intuitive interfaces across all supported platforms
• Easy-to-use callbacks to respond to asynchronous event notifications
• Support for eXtensible Mark-up Language (XML)
• Isolation from network programming complexities

SmartSockets is based on industry standards and protocols, including JMS, PGM (Pragmatic General Multicast), TCP/IP, and SSL, ensuring that companies maximize investment by eliminating barriers to communication and interaction.

TIBCO SmartSockets: The American Stock Exchange is taking advantage of TIBCO SmartSockets' unique multicast capability to efficiently transmit market data to multiple subscribers in a single operation, greatly improving network utilization and speed. A TIBCO Messaging Solution.
d365

1. Introduction

d365 is a powerful software solution developed by Microsoft that offers comprehensive enterprise resource planning (ERP) and customer relationship management (CRM) capabilities. It helps businesses streamline their operations, manage finances, automate processes, and enhance customer engagement. This document provides an overview of d365 and its key features.

2. Features

2.1 ERP Capabilities

d365 offers a wide range of ERP capabilities that help businesses effectively manage their financial transactions, supply chain, inventory, manufacturing processes, and more. Some key features of d365's ERP capabilities include:
• Financial Management: d365 provides a robust financial management module that enables businesses to efficiently handle their accounting, budgeting, cash flow management, and financial reporting processes. It helps businesses gain real-time insights into their financial performance and make informed decisions.
• Supply Chain Management: With d365, businesses can optimize their supply chain processes by efficiently managing procurement, inventory, demand forecasting, and logistics. It helps businesses improve order fulfillment, reduce inventory costs, and enhance overall supply chain visibility.
• Manufacturing: d365's manufacturing module helps businesses optimize their production processes, manage bills of materials, schedule production orders, track work in progress, and ensure timely delivery of products. It supports discrete, process, and lean manufacturing methods.

2.2 CRM Capabilities

In addition to its strong ERP capabilities, d365 offers advanced CRM features that enable businesses to effectively manage their customer relationships, sales pipelines, marketing campaigns, and service activities. Some key features of d365's CRM capabilities include:
• Sales Automation: d365's sales automation module helps businesses track leads, manage opportunities, automate sales processes, and forecast revenue. It provides sales representatives with a holistic view of their customers, enabling them to nurture relationships and close deals more effectively.
• Marketing Automation: d365 enables businesses to automate their marketing campaigns, segment customer data, track campaign performance, and generate actionable insights. It helps businesses target the right audience, personalize marketing messages, and improve overall marketing effectiveness.
• Service Management: d365's service management module allows businesses to effectively capture, manage, and resolve customer service requests. It provides a centralized platform for managing service tickets, tracking service performance, and ensuring timely resolution of customer issues.

3. Benefits

d365 offers several benefits to businesses, including:
• Improved Efficiency: By automating various business processes, d365 helps businesses streamline operations and improve efficiency. It eliminates manual tasks, reduces errors, and accelerates decision-making.
• Enhanced Collaboration: d365 provides a unified platform that enables seamless collaboration among different departments within an organization. It promotes knowledge sharing, improves communication, and facilitates cross-functional teamwork.
• Better Customer Insights: With its integrated CRM capabilities, d365 allows businesses to gain deeper insights into customer behavior, preferences, and needs. This helps businesses tailor their offerings, provide personalized experiences, and build stronger customer relationships.
• Scalability: d365 is highly scalable and can easily adapt to the changing needs of businesses. Whether a business is experiencing rapid growth or expanding into new markets, d365 can support its evolving requirements.

4. Conclusion

d365 is a comprehensive ERP and CRM solution that offers powerful capabilities for managing various aspects of a business. From financial management to supply chain optimization, from sales automation to service management, d365 provides businesses with the tools they need to succeed in today's competitive landscape. With its numerous benefits, d365 is an ideal choice for businesses looking to streamline their operations, improve customer engagement, and drive business growth.
PRIDE: A Data Abstraction Layer for Large-Scale 2-tier Sensor Networks

Woochul Kang, University of Virginia, Email: wk5f@
Sang H. Son, University of Virginia, Email: son@
John A. Stankovic, University of Virginia, Email: stankovic@

Abstract—It is a challenging task to provide timely access to global data from sensors in large-scale sensor network applications. Current data storage architectures for sensor networks have to make trade-offs between timeliness and scalability. PRIDE is a data abstraction layer for 2-tier sensor networks, which enables timely access to global data from the sensor tier to all participating nodes in the upper storage tier. The design of PRIDE is heavily influenced by collaborative real-time applications such as search-and-rescue tasks for high-rise building fires, in which multiple devices have to collect and manage data streams from massive sensors in cooperation. PRIDE achieves scalability, timeliness, and flexibility simultaneously for such applications by combining a model-driven full replication scheme and an adaptive data quality control mechanism in the storage tier. We show the viability of the proposed solution by implementing and evaluating it on a large-scale 2-tier sensor network testbed. The experimental results show that the model-driven replication provides the benefit of full replication in a scalable and controlled manner.

I. INTRODUCTION

Recent advances in sensor technology and wireless connectivity have paved the way for next-generation real-time applications that are highly data-driven, where data represent real-world status. For many of these applications, data streams from sensors are managed and processed by application-specific devices such as PDAs, base stations, and micro servers. Further, as sensors are deployed in increasing numbers, a single device cannot handle all sensor streams due to their scale and geographic distribution. Often, a group of such devices needs to collaborate to achieve a common goal. For instance, during a search-and-rescue task for a building fire, while PDAs carried by firefighters collect data from nearby sensors to check the dynamic status of the building, a team of such firefighters has to collaborate by sharing their locally collected real-time data with peer firefighters, since each individual firefighter has only limited information from nearby sensors [1]. The building-wide situation assessment requires fusing data from all (or most of) the firefighters.

As this scenario shows, many future real-time applications will interact with the physical world via large numbers of underlying sensors. The data from the sensors will be managed by distributed devices in cooperation. These devices can be either stationary (e.g., base stations) or mobile (e.g., PDAs and smartphones). Sharing data, and allowing timely access to global data for each participating entity, is mandatory for successful collaboration in such distributed real-time applications. Data replication [2] has been a key technique that enables each participating entity to share data and obtain an understanding of the global status without the need for a central server.
In particular, for distributed real-time applications, data replication is essential to avoid unpredictable communication delays [3][4]. PRIDE (Predictive Replication In Distributed Embedded systems) is a data abstraction layer for devices performing collaborative real-time tasks. It is linked to the application(s) at each device, and provides transparent and timely access to global data from underlying sensors via a scalable and robust replication mechanism. Each participating device can transparently access the global data from all underlying sensors without noticing whether it comes from local sensors or from remote sensors, which are covered by peer devices. Since global data from all underlying sensors are available at each device, queries on global spatio-temporal data can be efficiently answered using local data access methods, e.g., B+ tree indexing, without further communication. Further, since all participating devices share the same set of data, any of them can be a primary device that manages a sensor. For example, when entities (either sensor nodes or devices) are mobile, any device that is close to a sensor node can be a primary storage node of the sensor node. This flexibility, obtained by decoupling the data source tier (sensors) from the storage tier, is very important if we consider the highly dynamic nature of wireless sensor network applications.

Even with these advantages, the high overhead of replication limits its applicability [2]. Since potentially a vast number of sensor streams are involved, it is not generally possible to propagate every sensor measurement to all devices in the system. Moreover, the data arrival rate can be high and unpredictable. During critical situations, the data rates can significantly increase and exceed system capacity. If no corrective action is taken, queues will form and the latencies of queries will increase without bound. In the context of centralized systems, several intelligent resource allocation schemes have been proposed to dynamically control the high and unpredictable rate of sensor streams [5][6][7]. However, no work has been done in the context of distributed and replicated systems.

In this paper, we focus on providing a scalable and robust replication mechanism. The contributions of this paper are:
1) a model-driven scalable replication mechanism, which significantly reduces the overall communication and computation overheads;
2) a global snapshot management scheme for efficient support of spatial queries on global data;
3) a control-theoretic quality-of-data management algorithm for robustness against unpredictable workload changes; and
4) the implementation and evaluation of the proposed approach on a real device with realistic workloads.

To make the replication scalable, PRIDE provides a model-driven replication scheme, in which the models of sensor streams are replicated to peer storage nodes, instead of the data themselves. Once a model for a sensor stream is replicated from the primary storage node of the sensor to peer nodes, the updates from the sensor are propagated to peer nodes only if the prediction from the current model is not accurate enough.
Our evaluation in Section 5 shows that this model-driven approach makes PRIDE highly scalable by significantly reducing the communication/computation overheads. Moreover, the Kalman filter-based modeling technique in PRIDE is light-weight and highly adaptable because it dynamically adjusts its model parameters at run-time without training.

Spatial queries on global data are efficiently supported by taking snapshots from the models periodically. The snapshot is an up-to-date reflection of the monitored situation. Given this fresh snapshot, PRIDE supports a rich set of local data organization mechanisms such as B+ tree indexing to efficiently process spatial queries.

In PRIDE, robustness against unpredictable workloads is achieved by dynamically adjusting the precision bounds at each node to maintain a proper level of system load, CPU utilization in particular. Coordination is performed among the nodes such that relatively under-loaded nodes synchronize their precision bound with a relatively overloaded node. Using this coordination, we ensure that the congestion at the overloaded node is effectively resolved.

To show the viability of the proposed approach, we implemented a prototype of PRIDE on a large-scale testbed composed of Nokia N810 Internet tablets [8], a cluster computer, and a realistic sensor stream generator. We chose the Nokia N810 since it represents emerging ubiquitous computing platforms such as PDAs, smartphones, and mobile computers, which will be expected to interact with ubiquitous sensors in the near future. Based on the prototype implementation, we investigated system performance attributes such as communication/computation loads, energy efficiency, and robustness. Our evaluation results demonstrate that PRIDE takes advantage of full replication in an efficient, highly robust, and scalable manner.

The rest of this paper is organized as follows. Section 2 presents the overview of PRIDE. Section 3 presents the details of the model-driven replication. Section 4 discusses our prototype implementation, and Section 5 presents our experimental results. We present related work in Section 6 and conclusions in Section 7.

II. OVERVIEW OF PRIDE

A. System Model

Fig. 1. A collaborative application on a 2-tier sensor network.
PRIDE envisions 2-tier sensor network systems with a sensor tier and a storage tier, as shown in Figure 1. The sensor tier consists of a large number of cheap and simple sensors, S = {s_1, s_2, ..., s_n}, where s_i is a sensor. Sensors are assumed to be highly constrained in resources, and perform only primitive functions such as sensing and multi-hop communication without local storage. Sensors stream data or events to the nearest storage node. These sensors can be either stationary or mobile; e.g., sensors attached to a firefighter are mobile.

The storage tier consists of more powerful devices such as PDAs, smartphones, and base stations, D = {d_1, d_2, ..., d_m}, where d_i is a storage node. These devices are relatively resource-rich compared with sensor nodes. However, they also have limited resources in terms of processor cycles, memory, power, and bandwidth. Each storage node provides in-network storage for underlying sensors, and stores data from sensors in its vicinity. Each node supports multiple radios: an 802.11 radio to connect to a wireless mesh network and an 802.15.4 radio to communicate with underlying sensors. Each node in this tier can be either stationary (e.g., base stations) or mobile (e.g., smartphones and PDAs).

The sensor tier and the storage tier are loosely coupled; the storage node that a sensor belongs to can be changed dynamically without coordination between the two tiers. This loose coupling is required in many sensor network applications if we consider the highly dynamic nature of such systems. For example, the mobility of sensors and storage nodes makes the system design very complex and inflexible if the two tiers are tightly coupled; a complex group management and hand-off procedure is required to handle the mobility of entities [9].

Applications at each storage node are linked to the PRIDE layer. Applications issue queries to the underlying PRIDE layer either autonomously, or by simply forwarding queries from external users. In the search-and-rescue task example, each storage node serves both as in-network data storage for nearby sensors and as a device to run autonomous real-time applications for the mission; the applications collect data by issuing queries and analyze the situation to report results to the firefighter. In the remainder of this paper, "node" refers to a storage node unless explicitly stated otherwise.

Fig. 2. The architecture of PRIDE (gray boxes).

B. Usage Model

In PRIDE, all nodes in the storage tier are homogeneous in terms of their roles; no asymmetrical function is placed on a sub-group of the nodes. All or part of the nodes in the storage tier form a replication group R to share the data from underlying sensors, where R ⊂ D. Once a node joins the replication group, updates from its local sensors are propagated to peer nodes; conversely, the node can receive updates from remote sensors via peer nodes. Any storage node that is receiving updates directly from a sensor becomes a primary node for that sensor, and it broadcasts the updates from the sensor to peer nodes. However, it should be noted that, as will be shown in Section 3, the PRIDE layer at each node performs model-driven replication, instead of replicating sensor data, to make the replication efficient and scalable.

PRIDE is characterized by the queries that it supports.
PRIDE supports both temporal queries on each individual sensor stream and spatial queries on current global data. Temporal queries on sensor s_i's historical data can be answered using the model for s_i. An example of a temporal query is "What was the value of sensor s_i 5 minutes ago?" For spatial queries, each storage node provides a snapshot of the entire set of underlying sensors (both local and remote). The snapshot is similar to a view in database systems. Using the snapshot, PRIDE provides traditional data organization and access methods for efficient spatial query processing. The access methods can be applied to any attribute, e.g., sensor value, sensor ID, and location; therefore, value-based queries can be efficiently supported. Basic operations on the access methods, such as insertion, deletion, retrieval, and iterating cursors, are supported. Special operations such as join cursors are also supported by building indexes on multiple attributes, e.g., temperature and location. This join operation is required to efficiently support complex spatial queries such as "Return the current temperatures of sensors located in room #4."

III. PRIDE DATA ABSTRACTION LAYER

The architecture of PRIDE is shown in Figure 2. PRIDE consists of three key components: (i) the filter & prediction engine, which is responsible for sensor stream filtering, model update, and broadcasting of updates to peer nodes; (ii) the query processor, which handles queries on spatial and temporal data by using a snapshot and temporal models, respectively; and (iii) the feedback controller, which determines proper precision bounds of data for scalability and overload protection.

A. Filter & Prediction Engine

The goals of the filter & prediction engine are to filter out updates from local sensors using models, and to synchronize the models at each storage node. The premise of using models is that the physical phenomena observed by sensors can be captured by models, and a large amount of sensor data can be filtered out using the models. In PRIDE, when a sensor stream s_i is covered by PRIDE replication group R, each storage node in R maintains a model m_i for s_i. Therefore, all storage nodes in R maintain the same set of synchronized models, M = {m_1, m_2, ..., m_n}, for all sensor streams in the underlying sensor tier. Each model m_i for sensor s_i is synchronized at run-time by s_i's current primary storage node (note that s_i's primary node can change at run-time because of network topology changes at either the sensor tier or the storage tier).

Algorithms 1 and 2 show the basic framework for model synchronization at a primary node and at peer nodes, respectively.

Algorithm 1: OnUpdateFromSensor
Input: update v from sensor s_i
1:  v̂ = prediction from model for s_i
2:  if |v̂ − v| ≥ δ then
3:      broadcast v to peer storage nodes
4:      update data for s_i in the snapshot
5:      update model m_i for s_i
6:      store v to cache for later temporal query processing
7:  else
8:      discard v (or store for logging)
9:  end

Algorithm 2: OnUpdateFromPeer
Input: update v for sensor s_i, broadcast by its primary node
1:  update data for s_i in the snapshot
2:  update model m_i for s_i
3:  store v to cache for later temporal query processing

In Algorithm 1, when an update v is received from sensor s_i at its primary storage node d_j, the model m_i is looked up and a prediction is made using m_i. If the gap between the predicted value from the model, v̂, and the sensor update v is less than the precision bound δ (line 2), the new data is discarded (or saved locally for logging). This implies that the current models (both at the primary node and at the peer nodes) are precise enough to predict the sensor output within the given precision bound. However, if the gap is bigger than the precision bound, this implies that the model cannot capture the current behavior of the sensor output. In this case, m_i at the primary node is updated and v is broadcast to all peer nodes (line 3). In Algorithm 2, as a reaction to the broadcast from d_j, each peer node receives the new update v and updates its own model m_i with v. The value v is stored in local caches at all nodes for later temporal query processing. As shown in the algorithms, communication among nodes happens only when the model is not precise enough.
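To make the control flow concrete, here is a minimal Python sketch of Algorithms 1 and 2. It is not the authors' code: LastValueModel is a toy stand-in for the per-sensor Kalman filter model introduced below, and the broadcast helper, peer list, and cache structure are illustrative assumptions.

    # Sketch of PRIDE's model-driven filtering (Algorithms 1 and 2).
    # All names are illustrative, not from the PRIDE implementation.
    class LastValueModel:
        """Toy stand-in for the per-sensor Kalman filter model."""
        def __init__(self):
            self.value = 0.0
        def predict(self):
            return self.value
        def update(self, v):
            self.value = v

    def broadcast(peers, sensor_id, v):
        """Stub for a network send of (sensor_id, v) to every peer node."""
        for peer_inbox in peers:
            peer_inbox.append((sensor_id, v))

    def on_update_from_sensor(models, snapshot, cache, peers, sensor_id, v, delta):
        """Algorithm 1, run at the primary node for sensor_id."""
        m = models[sensor_id]
        v_hat = m.predict()                    # line 1: prediction from the model
        if abs(v_hat - v) >= delta:            # line 2: precision-bound check
            broadcast(peers, sensor_id, v)     # line 3: propagate to peers
            snapshot[sensor_id] = v            # line 4: refresh the snapshot
            m.update(v)                        # line 5: re-synchronize the model
            cache.append((sensor_id, v))       # line 6: keep for temporal queries
        # else (lines 7-8): discard v, or store it locally for logging

    def on_update_from_peer(models, snapshot, cache, sensor_id, v):
        """Algorithm 2, run at each peer node on a broadcast update."""
        snapshot[sensor_id] = v                # line 1: refresh the snapshot
        models[sensor_id].update(v)            # line 2: keep the model in sync
        cache.append((sensor_id, v))           # line 3: keep for temporal queries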
Models, Filtering, and Prediction: So far, we have not discussed a specific modeling technique for PRIDE. Several distinctive requirements guide the choice. First, the computation and communication costs of model maintenance should be low, since PRIDE handles a large number of sensors (and a corresponding model for each sensor) through the collaboration of multiple nodes; the cost of model maintenance increases linearly with the number of sensors. Second, the parameters of the models should be obtainable without an extensive learning process, because many collaborative real-time applications, e.g., a search-and-rescue task in a building fire, are short-term and deployed without a previous monitoring history. A statistical model that needs extensive historical data for training is less applicable, even given its highly efficient filtering and prediction performance. Finally, the modeling should be general enough to apply to a broad range of applications; ad-hoc modeling techniques for a particular application cannot be used for other applications. Since PRIDE is a data abstraction layer for a wide range of collaborative applications, the generality of the modeling is important. To this end, we choose the Kalman filter [10][6], which provides a systematic mechanism to estimate the past, current, and future state of a system from noisy measurements. A short summary of the Kalman filter follows.

Kalman Filter: The Kalman filter model assumes the true state at time k evolves from the state at (k−1) according to

x_k = F_k x_{k−1} + w_k,   (1)

where F_k is the state transition matrix relating x_{k−1} to x_k, and w_k is the process noise, which follows N(0, Q_k). At time k, an observation z_k of the true state x_k is made according to

z_k = H_k x_k + v_k,   (2)

where H_k is the observation model and v_k is the measurement noise, which follows N(0, R_k).

The Kalman filter is a recursive minimum mean-square error estimator: only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current and future state. In contrast to batch estimation techniques, no history of observations is required. In what follows, the notation x̂_{n|m} represents the estimate of x at time n given observations up to, and including, time m. The state of a filter is defined by two variables:

x̂_{k|k}: the estimate of the state at time k given observations up to time k;
P_{k|k}: the error covariance matrix (a measure of the estimated accuracy of the state estimate).

The Kalman filter has two distinct phases: Predict and Update. The predict phase uses the state estimate from the previous timestep k−1 to produce an estimate of the state at the next timestep k. In the update phase, measurement information at the current timestep k is used to refine this prediction to arrive at a new, more accurate state estimate, again for the current timestep k. When a new measurement z_k is available from a sensor, the true state of the sensor is estimated using the previous prediction x̂_{k|k−1} and the weighted prediction error. The weight is called the Kalman gain K_k, and it is updated on each prediction/update cycle.
The true state of the sensor is estimated as follows:

x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − H_k x̂_{k|k−1}),   (3)

P_{k|k} = (I − K_k H_k) P_{k|k−1}.   (4)

The Kalman gain K_k is updated as follows:

K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R_k)^{−1}.   (5)

At each prediction step, the next state of the sensor is predicted by

x̂_{k|k−1} = F_k x̂_{k−1|k−1}.   (6)

Example: For instance, a temperature sensor can be described by the linear state space x_k = (x, dx/dt)^T, where x is the temperature and dx/dt is the derivative of the temperature with respect to time. As a new (noisy) measurement z_k arrives from the sensor¹, the true state and the model parameters are estimated by Equations 3-5. The future state of the sensor at the (k+1)-th time step, after Δt, can be predicted using Equation 6, where the state transition matrix is

F = \begin{pmatrix} 1 & Δt \\ 0 & 1 \end{pmatrix}.   (7)

It should be noted that the parameters of the Kalman filter, e.g., K and P, do not have to be accurate in the beginning; they can be estimated at run-time, and their accuracy improves gradually as more sensor measurements arrive. We do not need massive past data for modeling at deployment time. In addition, the update cycle of the Kalman filter (Equations 3-5) is performed at all storage nodes when a new measurement is broadcast, as shown in Algorithm 1 (line 5) and Algorithm 2 (line 2); no further communication is required to synchronize the parameters of the models. Finally, as will be shown in Section 5, the prediction/update cycle of the Kalman filter incurs insignificant overhead on the system.

¹Note that the temperature component of z_k is directly acquired from the sensor, while dx/dt is estimated.
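Restated as a runnable sketch, the predict/update cycle for the temperature example looks as follows. This is Python/NumPy rather than anything from the paper, and the noise covariances Q and R, the sampling interval, and the sample readings are assumed values.

    # Kalman filter for the state x = [temperature, d(temperature)/dt],
    # implementing Equations (3)-(7). Parameters are illustrative.
    import numpy as np

    dt = 1.0                                  # sampling interval (assumed)
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition matrix, Eq. (7)
    H = np.array([[1.0, 0.0]])                # only temperature is observed
    Q = np.eye(2) * 1e-4                      # process noise covariance (assumed)
    R = np.array([[0.25]])                    # measurement noise covariance (assumed)

    x = np.zeros((2, 1))                      # state estimate, starts rough
    P = np.eye(2)                             # error covariance, starts rough

    def predict(x, P):
        """Predict phase: Eq. (6) for the state, plus covariance propagation."""
        return F @ x, F @ P @ F.T + Q

    def update(x_pred, P_pred, z):
        """Update phase: gain Eq. (5), state Eq. (3), covariance Eq. (4)."""
        K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Eq. (5)
        x_new = x_pred + K @ (z - H @ x_pred)                    # Eq. (3)
        P_new = (np.eye(2) - K @ H) @ P_pred                     # Eq. (4)
        return x_new, P_new

    # Parameters refine at run-time as measurements arrive; no offline training.
    for z_k in [20.1, 20.4, 20.9, 21.5]:
        x, P = predict(x, P)
        x, P = update(x, P, np.array([[z_k]]))
    print(x.ravel())   # estimated [temperature, rate of change]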
B. Query Processor

The query processor of PRIDE supports both temporal queries and spatial queries, with a planned extension to support spatio-temporal queries.

Temporal Queries: Historical data for each sensor stream can be processed at any storage node by exploiting data in the local cache and a linear smoother [10]. Unlike the estimation of current and future states using one Kalman filter, the optimized estimation of historical data (sometimes called smoothing) requires two Kalman filters, a forward filter x̂ and a backward filter x̂^b. Smoothing is a non-real-time data processing scheme that uses all measurements between 0 and T to estimate the state of a system at a certain time t, where 0 ≤ t ≤ T (see Figure 3). The smoothed estimate x̂(t|T) can be obtained as a linear combination of the two filters as follows:

x̂(t|T) = A x̂(t) + A′ x̂^b(t),   (8)

where A and A′ are weighting matrices. For a detailed discussion of smoothing techniques using Kalman filters, the reader is referred to [10].

Fig. 3. Smoothing for temporal query processing.

Spatial Queries: Each storage node maintains a snapshot of all underlying local and remote sensors to handle queries on global spatial data. Each element (or data object) of the snapshot is an up-to-date value from the corresponding sensor. The snapshot is dynamically updated either by new measurements from sensors or by models². Algorithm 1 (line 4) and Algorithm 2 (line 1) show the snapshot updates when a new observation is pushed from a local sensor and from a peer node, respectively. As explained in the previous section, there is no communication among storage nodes when the models represent the current observations from sensors well.

When there are no updates from peer nodes, the freshness of the values in the snapshot deteriorates over time. To maintain the freshness of the snapshot even when there are no updates from peer nodes, each value in the snapshot is periodically updated by its local model. Each storage node can estimate the current state of sensor s_i using Equation 6, without communication with the primary storage node of s_i. For example, the temperature after 30 seconds can be predicted by setting Δt of the transition matrix in Equation 7 to 30 seconds.

The update period of data object i for sensor s_i is determined such that the precision bound δ is observed. Intuitively, when a sensor value changes rapidly, the data object should be updated more frequently to keep the data object in the snapshot valid. In the example of Section 3.1.1, the period can be dynamically estimated as

p[i] = δ / |dx/dt|,

where δ / |dx/dt| is the absolute validity interval (avi) before the data object in the snapshot violates the precision bound, which is ±δ. The update period should be as short as half of the avi to keep the data object fresh [11].

²Note that the data structures for the snapshot, such as indexes, are also updated when each value of the snapshot is updated.

Since each storage node has an up-to-date snapshot, spatial queries on global data from sensors can be efficiently handled using local data access methods (e.g., B+ tree) without incurring further communication delays.
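As a worked example of the avi formula above, the short sketch below computes the refresh period p[i] from a precision bound and the model's current rate estimate. The fallback cap for near-constant values is an assumption, not something the paper specifies.

    # Snapshot refresh period from the avi: the value stays within +/- delta
    # for about delta/|dx/dt| seconds, and the entry is refreshed at half that.
    def update_period(delta, rate_estimate, max_period=30.0):
        """Refresh period p[i] for one snapshot entry, in seconds."""
        if abs(rate_estimate) < 1e-9:         # value is essentially static
            return max_period                 # assumed coarse fallback period
        avi = delta / abs(rate_estimate)      # absolute validity interval
        return min(avi / 2.0, max_period)

    print(update_period(delta=5.0, rate_estimate=0.4))   # 6.25 seconds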
Fig. 4. Varying data precision: (a) δ = 5°C; (b) δ = 10°C.

Figure 4 shows how the value of one data object in the snapshot changes over time when we apply different precision bounds. As the precision bound gets bigger, the gap between the real state of the sensor (dashed lines) and the current value in the snapshot (solid lines) increases. In the solid lines, the discontinued points are where the gap between the model prediction and the real measurement from the sensor is bigger than the precision bound, and subsequent communication is made among storage nodes for model synchronization. For applications and users, maintaining a smaller precision bound implies having a more accurate view of the monitored situation. However, the overhead also increases as the precision bound becomes smaller.

Given the unpredictable data arrival rates and resource constraints, compromising data quality for system survivability is unavoidable in many situations. In PRIDE, we consider processor cycles as the primary limited resource, and resource allocation is performed to maintain the desired CPU utilization. Utilization control is used so that appropriate schedulable utilization bounds of applications can be guaranteed despite significant uncertainties in system workloads [12][5]. In utilization control, it is assumed that any cycles recovered as a result of control in the PRIDE layer are used sensibly by the scheduler in the application layer to relieve congestion, or to save power [12][5]. It can also enhance system survivability by providing overload protection against workload fluctuation.

Specification: At each node, the system specification (U, δ_max) consists of a utilization specification U and a precision specification δ_max. The desired utilization U ∈ [0..1] gives the CPU utilization required to avoid overloading the system while satisfying the target system performance, such as latency and energy consumption. The precision specification δ_max denotes the maximum tolerable precision bound. Note that there is no lower bound on the precision, as in general users require as tight a precision bound as possible (if the system is not overloaded).

Local Feedback Control to Guarantee the System Specification: Using feedback control has been shown to be very effective for a large class of computing systems that exhibit unpredictable workloads and model inaccuracies [13]. Therefore, to guarantee the system specification without a priori knowledge of the workload or an accurate system model, we apply feedback control.

Fig. 5. The feedback control loop.

The overall feedback control loop at each storage node is shown in Figure 5. Let T be the sampling period. The utilization u(k) is measured at each sampling instant 0T, 1T, 2T, ..., and the difference between the target utilization and u(k) is fed into the controller. Using the difference, the controller computes a local precision bound δ(k) such that u(k) converges to U.

The first step of local controller design is modeling the target system (storage node) by relating δ(k) to u(k). We model the relationship between δ(k) and u(k) using profiling and statistical methods [13]. Since δ(k) has a higher impact on u(k) as the size of the replication group increases, we need different models for different sizes of the group. We change the number of members of the replication group exponentially from 2 to 64 and have tuned a set of first-order models G_n(z), where n ∈ {2, 4, 8, 16, 32, 64}. G_n(z) is the z-transform transfer function of the first-order model, in which n is the size of the replication group. After the modeling, we design a controller for the model. We have found that a proportional-integral (PI) controller [13] is sufficient in terms of providing a zero steady-state error, i.e., a zero difference between u(k) and the target utilization bound. Further, a gain-scheduling technique [13] has been used to apply different controller gains for different sizes of replication groups. For instance, the gain for G_32(z) is applied if the size of a replication group is bigger than 24 and less than or equal to 48. Due to space limitations, we do not provide a full description of the design and tuning methods.

Coordination among Replication Group Members: If each node independently sets its own precision bound, the net precision bound of the data becomes unpredictable. For example, at node d_j, the precision bounds for local sensor streams are determined by d_j itself, while the precision bounds for remote sensor streams are determined by their own primary storage nodes. PRIDE takes a conservative approach in coordinating storage nodes in the group. As Algorithm 3 shows, the global precision bound for the k-th period is determined by taking the maximum of the precision bounds of all nodes in the replication group:

Algorithm 3: Coordination of precision bounds
Input: myid: my storage node id
/* Get local δ. */
measure u(k) from the monitor
calculate δ_myid(k) from the local controller
foreach peer node d in R − {d_myid} do
    /* Exchange local δs. */
    /* Use piggybacking to save communication cost. */
    send δ_myid(k) to d
    receive δ_i(k) from d
end
/* Get the final global δ. */
δ_global(k) = max(δ_i(k)), where i ∈ R
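To illustrate one control step, here is a hedged sketch of the local PI update and the max-based coordination of Algorithm 3. The gains kp and ki are placeholders; the paper tunes a separate gain per replication-group size via gain scheduling, and the utilization numbers below are made up.

    # One sampling instant of the per-node feedback loop, plus Algorithm 3.
    def pi_controller(delta_prev, err, err_prev, kp=0.5, ki=0.1):
        """Velocity-form PI update of the local precision bound (gains assumed)."""
        return max(0.0, delta_prev + kp * (err - err_prev) + ki * err)

    def coordinate(local_delta, peer_deltas):
        """Algorithm 3: the group adopts the largest, i.e. most relaxed, bound."""
        return max([local_delta] + list(peer_deltas))

    U = 0.7                                   # target CPU utilization
    u_k, u_prev = 0.85, 0.80                  # measured utilization (made up)
    delta_local = pi_controller(delta_prev=2.0, err=u_k - U, err_prev=u_prev - U)
    delta_global = coordinate(delta_local, peer_deltas=[1.5, 3.0])
    print(delta_local, delta_global)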
Isilon IQ and VMware vSphere 4.0
Configuration Guide for VMWare vSphere and Isilon IQ™ with OneFS® v5.x
By Shai Harmelin, Sr. Solutions Architect
An Isilon Systems® Technical Configuration Guide
Updated August 2009

Table of Contents
1. Introduction
   Scale-out NAS for Server Virtualization
2. Using Network Attached Storage (NAS) with VMWare
   VMWare Support for NAS
3. Advantages of Isilon Scale-Out NAS over Traditional NAS
4. Isilon IQ Cluster Configuration
   Cluster Configuration Considerations
   Isilon Networking Concepts
   Isilon Network Configuration
   High Availability with NFS Failover
   Isilon Network Design Best Practices
5. ESX Configuration
   Creating a Virtual Switch
   Configuring a Service Console (VI3 Only)
   Creating a VMware Datastore
6. Virtual Machine Configuration
   Placing VM Files in a Datastore on Isilon Cluster
   Creating Virtual Disks in a Datastore on Isilon IQ Cluster
7. Migrating VMs between ESX Hosts
8. Working with Snapshots
   Snapshots Explained
   Taking Snapshots with the ESX Snapshot Utility
   Reverting To a Snapshot
9. Disaster Recovery with SyncIQ
10. Performing Backups
   Introduction
   Best Practices for NDMP Backup of VMs
   VMware Consolidated Backup (VCB) Best Practices

1. Introduction

Scale-out NAS for Server Virtualization

Isilon IQ scale-out NAS, with its clustered architecture and single file system, is designed to be a scalable, highly reliable, and easy-to-manage platform for storing virtual machines (VMs) hosted on VMware vSphere 4.0 with ESX 4.0 or ESX 3.0, using the industry-standard file-sharing protocol NFS.

Server virtualization is quickly becoming a standard in major enterprises to simplify overhead and reduce the costs of managing large-scale server environments for test, development, and production applications, as well as hosted services in the cloud. VMware is a leader in this trend, vastly simplifying server management, driving up server utilization rates, driving down costs, and providing a new virtual infrastructure to simplify the ongoing management of many VMs.

While virtualization offers a solution for server sprawl, a new challenge arises for enterprises with traditional SAN or scale-up NAS, especially when consolidating or deploying large numbers of VMs along with non-virtualized servers. These challenges with traditional storage can often negate the cost savings and efficiencies expected in virtual environments.

While Storage Area Networks (SANs) enable sharing of storage across multiple ESX Server hosts (an advancement over Direct Attached Storage), each LUN requires a separate management point, either dedicated or shared across a set of VMs. If LUNs are dedicated to individual VMs, the number of management points grows quickly along with the number of VMs. In turn, if VMs are consolidated on an individual LUN, changes to a storage device or LUN often impact a large number of VMs. With traditional SAN or scale-up NAS in a virtualized environment, the complexity of volume sprawl, capacity and load balancing, mount management, and other storage administration adds to management time and slows deployments.

With a single file system and clustered architecture, Isilon scale-out NAS is a scalable, agile, and easy-to-manage storage platform for your virtualized environment and large-scale deployments.

Isilon Certifications

Isilon is a VMware Ready certified storage vendor for both ESX 3.0 and ESX 4.0 (vSphere).
This certification for the Isilon IQ product family, including IQ1920x, IQ3000x, IQ6000x, IQ9000x, and IQ12000x, ensures compatibility with VMware vSphere™ 4 and that Isilon is ready for deployment in customer environments.

Assumptions

In writing this Configuration Guide, it was assumed the reader has:
• An understanding of the NFS protocol
• Working knowledge of Isilon IQ storage, the OneFS® operating system, the WebUI, and the command-line interface
• Working knowledge of VMware ESX Server and Virtual Center

This configuration guide provides the necessary steps to configure the Isilon IQ storage cluster and ESX hosts to manage virtual machine datastores on Isilon storage systems over NFS.

2. Using Network Attached Storage (NAS) with VMWare

VMWare Support for NAS

VMware introduced support for NAS datastores in ESX Server 3.0. Prior to ESX 3.0, VMware supported only block-level storage options, i.e., direct attach storage (DAS) or storage area networks (SAN). With NAS support in ESX Server, customers have a more manageable and flexible alternative to traditional block-level DAS or SAN storage.

Fundamentally, NAS stores large VM datastores on a Network File System (an industry-standard file-sharing protocol) export rather than on VMFS volumes and storage LUNs. The storage system is presented to each ESX Server as a network mount, and ESX then stores and accesses VMs on the storage system using NFS.

Among the advantages of NAS-based datastores are:
• Rapid and simple storage provisioning: once storage is allocated to an ESX host, it can be used and re-used as required.
• Lower costs: implementing a NAS infrastructure costs less than comparable SAN-based architectures. This is primarily due to lower networking and hardware costs, but also due to lower management costs, as dedicated storage administrators are typically not required.
• Higher storage utilization rates: VMware disk files (VMDK files) are thin-provisioned by default with NAS datastores.
• Easier management of storage with multiple VMs: instead of managing LUNs for individual VMs, all VMDK files may be stored on a common file export.
• Simplified backup scenarios: all VM files may be backed up behind a single, central mount point.

However, not all NAS products are created equal. Traditional NAS vendors that rely on a single-head architecture and simply manage SAN storage "under the covers" have some disadvantages:
• Capacity scalability is limited to a single device: traditional NAS systems are based on a scale-up architecture, where a finite amount of storage capacity is added within an individual storage device. However, a volume or LUN is often the limiting factor to scalability, typically at 2 to 16 TB.
• Single point of failure: if the device or head fails, access to VMs may be lost. The traditional failover clustering options provided by NAS vendors are often not sufficient.
• Management complexity at scale: while traditional NAS systems are relatively easy to set up and configure, they become complicated to manage in large numbers.
As more ESX hosts require storage, multiple file systems and mount points need to be provisioned and managed across multiple storage devices, each of which represents a separate management point.
• Performance scalability is limited to a single device: while individual NAS heads may provide adequate performance for a limited number of VMs (like a traditional server), at some point the NAS system will run out of performance resources, depending on the number of VMs (and associated application workloads) stored on the device.
• Unsupported features: some features for ESX Server (or through vSphere) may not be available using traditional NAS-based solutions. These include the ability to boot ESX Server from SAN, Raw Device Mapping (RDM) for accessing SAN LUNs directly, Microsoft server clustering services, and VMware SRM 1.0 (however, SRM 1.1 will support NAS storage).

3. Using Isilon Scale-Out NAS with VMware

The Isilon IQ platform is a scale-out approach to NAS storage and offers significant reductions in management overhead by simplifying management of the cluster through a single, common management point. In an Isilon cluster, petabytes of storage can be administered in a single file system instead of many small islands of storage. Isilon's OneFS® operating system combines the conventional, separate layers for RAID, volume management, and file system into one unified software layer, creating a single symmetric cluster file system that spans all nodes within a cluster.

Scalability and Performance

Each Isilon IQ node contains disk capacity, CPU, memory, and network connectivity. As additional Isilon IQ nodes are added to a cluster, all aspects of the cluster scale symmetrically, including capacity, throughput, memory, CPU, and network connectivity. In contrast to traditional NAS designs, adding capacity to an Isilon node does not create bottlenecks in other system resources. Isilon IQ offers aggregate throughput from a single file system of up to 45 GB/second, with up to 5.2 petabytes of storage.

Availability and Reliability

Isilon IQ is a fully distributed architecture where all nodes work together to form a unified file system, tolerant of any component failure, including entire nodes. The Isilon file system goes beyond traditional RAID to protect against multiple failures in a cluster without losing data availability, and leverages the compute power of all nodes to deliver fast drive rebuild times. In addition, Isilon IQ clusters provide flexible protection levels on a file-by-file basis, protecting files independently of the location where they are stored. Additionally, with local data protection available with the Isilon SnapshotIQ application and seamless NFS failover available with Isilon SmartConnect, Isilon provides the high availability and reliability required for a virtualized datacenter.

4. Isilon IQ Cluster Configuration

This section provides requirements and best practices for configuring an Isilon cluster for use with an ESX Server.

Cluster Configuration Considerations

• When an ESX datastore is created, the directory where the datastore will be pointed to must already exist.
• Take care with directory ownership, as in most cases the directories for VM images will be created locally on the cluster by the root user. By default, root access to the cluster over NFS is limited by mapping the root user to the user nobody. If the directory is created by root and the ownership isn't changed, the ESX server(s) can't write to the directory. Write access can be assured by one of two methods (see the example following these considerations):
  1. Using chown to change the owner to nobody, i.e. chown nobody:wheel <directory>
  2. Using the NFS Exports page in the WebUI to map root access to the root user. However, this is not recommended, as it can be a security hole.
• By default, the cluster's NFS write commit behavior is set to synchronous. This ensures every write operation issued by a VM is committed to disk as soon as possible. This extra level of data consistency incurs per-operation latency overhead and may not be necessary for many virtualized applications. Disabling synchronous writes may increase performance, but be careful when determining whether applications can support asynchronous writes.
• To configure this behavior:
  1. From the WebUI, select File System > File Sharing > Services > Configure NFS.
  2. On the Configure NFS page, select Synchronous or Asynchronous in the Write commit behavior section.
  3. Click Save.
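For example, the first method might be applied from the cluster console as shown below; the /ifs/vmware/datastore1 path is illustrative, not a required location.

    # Create the datastore directory as root, then hand it to nobody so
    # ESX hosts can write to it over NFS (method 1 above).
    mkdir -p /ifs/vmware/datastore1
    chown nobody:wheel /ifs/vmware/datastore1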
Figure 1 - ESX Datastore on Isilon IQ Cluster

Isilon Networking Concepts

FlexNet 2.0

FlexNet™ is the OneFS subsystem used for configuring and managing network interfaces. With the introduction of OneFS 5.0, major improvements were made to FlexNet, now at version 2.0.
• FlexNet 2.0 is designed to support complex and variable network topologies. It has several hierarchical and overlapping management objects that allow for extremely flexible configurations, simply defined and managed.
• FlexNet 2.0 is tightly integrated with Isilon SmartConnect to provide increased network connectivity and availability, as well as easy management.

The following terms are important for understanding the operation of FlexNet 2.0:
1. Subnet – Specifies a network subnet, netmask, gateway, and other parameters related to layer-3 networking. VLAN tagging is configured here. A subnet contains one or more pool objects, which assign a range of IP addresses to network interfaces on the cluster nodes.
2. Pool – Also referred to as an IP Address Pool, containing one or more network interfaces (e.g. External-1) and a set of IP addresses to be assigned to them. SmartConnect settings, such as the zone name and whether IPs are allocated statically or dynamically, are also configured at the pool level.
3. SmartConnect – Provides the ability to distribute client connections across a set of IPs in the pool based on a common DNS name. SmartConnect Advanced dynamic IP allocation allows the IPs in the dynamic pool to migrate across all interface members in the pool and fail over from one member to another in case of an interface or complete node failure.
4. Provisioning Rule – Specifies subnet and IP pool assignment actions when a node is added to the cluster, based on the node type and interface. For example, a rule could state that when a node of type storage is added, External-1 and External-2 are assigned to two different pools, which in turn belong to two separate subnets.

Figure 2 - FlexNet 2.0 Pools, Subnets and Rules

FlexNet 2.0 and SmartConnect

FlexNet 2.0 is tightly coupled with SmartConnect, the OneFS client load-balancing and failover application. At the subnet level, the SmartConnect Service IP Address, formerly known as the Virtual IP (VIP), is specified. This is the IP address used primarily by a DNS server to forward SmartConnect zone lookups to the cluster. SmartConnect options having to do with zone name, load balancing, and failover are set at the pool level. Different pools inside the same subnet can have different configurations for different use cases.

NOTE: There are limitations to using SmartConnect with Virtual Center.
Please see "Limitations of Virtual Center and SmartConnect", below, for details.

Isilon Network Configuration

Initial Configuration

FlexNet 2.0 introduces a new process for configuring external networking. When initially configuring a cluster, the first external network interface (typically External-1) is set up as part of the configuration process. In order for this process to complete successfully, the following information is required:
• Netmask
• IP address range
• Default gateway
• Domain name server list (optional)
• DNS search list (optional)
• SmartConnect zone name (optional)
• SmartConnect service address (optional)

When this information is provided, the following actions occur:
• A default external subnet is created, named subnet0, with the netmask and optional SmartConnect service address.
• A default IP address pool is created, named pool0, with the specified IP address range, the gateway, the optional SmartConnect zone name, and the initial external interface of the first node in the cluster as the only member.
• A default network provisioning rule is created, named rule0, which automatically assigns the first external interface of all newly added nodes to pool0.
• pool0 is added to subnet0 and configured to use subnet0's SmartConnect service address.
• The global outbound DNS settings are configured with the optional domain name server list and DNS search list, if provided.

Upgrade Configuration

When an Isilon cluster is upgraded to OneFS v5.0, the following external networking and connection balancing configuration changes occur automatically:

Each FlexNet profile from earlier versions of OneFS is transformed into a subnet. You can view the new subnets by clicking Networking on the Cluster menu in the web administration interface.

In a simple external network configuration consisting of one SmartConnect zone with dynamic IP addresses, the upgrade retains all settings from the earlier OneFS version, including the dynamic IP addresses that were part of the FlexNet profile, the load balancing policy, the SmartConnect zone name, and the interface members.

If your Isilon cluster contains multiple SmartConnect zones with both dynamic and static IP addresses, then after upgrading to OneFS v5.0 all the dynamic IP addresses are consolidated into one SmartConnect zone and all the static IP addresses are consolidated into a second SmartConnect zone. External network settings can be edited using the WebUI or CLI. Figure 3 shows the Edit Subnet page from the WebUI.

Figure 3 - WebUI Subnet Configuration

VLAN Tagging

Virtual LANs (VLANs) are used to logically group together network endpoints and to partition network traffic, e.g. for security. VLANs are tagged with a unique identifier to segregate traffic. FlexNet 2.0 supports VLAN tagging for use in external networks using VLANs. In FlexNet, VLANs are configured at the subnet level.

Configuring Link Aggregation

Isilon OneFS supports the use of redundant NICs to provide layer-2 failover. OneFS link aggregation supports the IEEE 802.3ad static LAG protocol and works with switches and clients that support this protocol.

Note: OneFS uses link aggregation primarily for Network Interface Card (NIC) failover purposes. Both NICs are used for client I/O, but the two channels are not bonded into a single 2-Gigabit link. Each NIC serves a separate TCP connection.

Link Aggregation Switch Support

Isilon network link aggregation requires 802.3ad static support and proper configuration on the switch. Cisco switches offer this support using the EtherChannel feature.
It is highly recommended to configure cross-stack EtherChannel to provide protection against switch failures as well as NIC failures.

Link aggregation can be configured for a new subnet or an existing one. It requires creating an IP pool with the aggregated interface on each node as the pool's members:
1. On the Edit Subnet page, at the top of the IP Address Pools section, click the Add pool link.
2. In the Create Pool wizard, enter a name for the pool, an optional description, and a range of IP addresses to use for this pool. Click Next.
3. If SmartConnect is used, options for the pool can be set on the next page of the wizard. Once these options have been selected, click Next.
4. On the next page, select the interfaces that will be members of this pool. To use link aggregation, select the ext-agg interface for each node to be in the pool. The interface type is also listed as AGGREGATION.
5. Click Submit to complete the wizard.

Note: Link aggregation provides protection against NIC failures but does not increase performance. A recommended alternative is to assign both NICs in a node to the same dynamic IP pool, gaining both a performance increase and NIC failure redundancy through dynamic IP failover. This is covered in the section to follow, High-Availability with NFS Failover.

Configuring SmartConnect

Limitations of Virtual Center and SmartConnect

Due to the way Virtual Center manages datastore location paths, it does not support a DNS infrastructure in which a hostname is bound to multiple IP addresses, which is required for SmartConnect zone names to work for datastore creation and use. This means datastores must be created using the IP addresses of a cluster.

This limitation does not preclude the use of dynamic IP addresses to implement NFS failover on the cluster. NFS failover is supported with VI 3 and vSphere when the dynamic IP addresses of cluster nodes are used.

High Availability with NFS Failover

How NFS Failover Works

SmartConnect implements NFS failover by assigning one or more dynamic IP addresses to each node in the cluster from a configured range of addresses. If a single interface or an entire node experiences a failure, SmartConnect moves the dynamic IP address to the remaining interfaces or to another node. Any I/O taking place on the failed node continues without interruption.

When a node's interface or the entire node is brought back online, the dynamic IPs in the pool are redistributed across the new set of interface pool members. This failback mechanism can occur automatically or manually.

NFS failover is configured at the FlexNet pool object level, either at the time the pool is created or by changing the pool settings on the network configuration page of the WebUI. Please see the OneFS 5.0 User Manual for specific steps.

When a VMware NFS datastore is created, the dynamic IP address of the node is used, not the static IP address assigned to the node during initial cluster configuration. In case of an NFS failover, NFS datastore traffic continues uninterrupted on the new storage interface to which the dynamic IP used in the datastore path has been reassigned. In case of failback, the dynamic IPs on the storage cluster are redistributed, again without interruption to the NFS datastore traffic.

Isilon Network Design Best Practices

Isilon has developed a network topology that provides maximum performance, flexibility, and availability for VMware installations.
The design is, in effect, a mesh connectivity design in which every ESX server is connected to every IP address on a cluster, up to configuration maximums (see the next section). Connecting "everything to everything" enables the following capabilities:
• Since by definition all servers are connected to all datastores, VMotion can be performed between any two ESX servers in the knowledge that both servers can see the same datastore, so the migration will succeed.
• VMs can be created on different datastores to balance the I/O load between ESX servers and the cluster; these can be easily moved between datastores to eliminate hot spots. The more NFS datastores are created, the more TCP connections an ESX host can leverage to balance VM I/O.

Figure 4 illustrates an example of this recommended configuration. Each ESX host has a primary datastore, with secondary connections to additional datastores located on the cluster.

Figure 4 – Multiple datastores on a single NFS volume

Increasing Performance with the Maximum Number of NFS Mounts

In VMware, every NFS datastore represents a separate TCP connection to the NFS server, increasing aggregate storage throughput by parallelizing VM I/O to the storage system.

With the Isilon clustered storage architecture, multiple NFS datastores can all point to the same NFS mount on the Isilon cluster through different (preferably dynamic) IPs. These multiple NFS datastores can also share a single pool of storage, granting each datastore access to all VMs and allowing an administrator to quickly register and unregister a VM across datastores. This single pool of storage increases availability, performance, and adaptability to changing performance requirements and growth.

When this topology is implemented with larger numbers of ESX servers and/or cluster IP addresses, it may be necessary to increase the number of NFS mounts available to an ESX server machine from the default of eight:
1. In the VI console, select the ESX Server, then select the Configuration tab.
2. In the Software section, select Advanced Settings.
3. In the Advanced Settings dialog, select NFS from the left-side list.
4. Locate the setting NFS.MaxVolumes, then set the value to a number between 8 and 32 inclusive.
5. Click OK.

ESX Configuration

This section details the steps necessary to configure ESX Server for use with Isilon storage. Follow the steps below to configure a network between the ESX server machine and the Isilon cluster.

Creating a Virtual Switch

The first step is to create a virtual switch for all network traffic between the ESX server machine and the Isilon cluster.
1. In the VMware Infrastructure or vSphere Client, select the ESX server machine in the left-side tree view, then select the Configuration tab in the right-side pane.
2. Under Hardware, select Networking, then select Add Networking.
3. In the Add Network Wizard, in the Connection Types section, select VMkernel, then click Next.
4. On the Network Access screen, select Create a virtual switch, or select an existing virtual switch. Click Next.
Best practice: Create the virtual switch using at least one dedicated network interface card (NIC) for network storage traffic. This ensures good performance and isolates any problems from other traffic.
5. On the Connection Settings screen, enter a network label and optional VLAN ID.
It is often helpful to give the virtual switch a meaningful label, such as "NFS Storage".

Note: For more information on VLAN usage in ESX Server, see the VMware whitepaper VMware ESX Server 3 802.1Q VLAN Solutions.
6. In the IP Settings section, enter an IP address and subnet mask for the VMkernel port.
7. If necessary, click the Edit button to change the default gateway. Click Next to go to the Summary screen.
8. On the Summary screen, review the settings, and if correct, click Finish.

Figure 5 provides an example configuration with virtual machine and VMkernel networks using separate physical NICs.

Figure 5: Example Network Configuration

Configuring a Service Console (V3 Only)

It is important to configure a service console on the virtual switch you just created. Without a service console, it is possible for the ESX server machine to lose connectivity to storage located on the virtual switch. This step is NOT necessary for vSphere and ESX 4.0.
1. In the VI Client, on the Configuration tab for the ESX server machine, select Properties next to the virtual switch that you just created.
2. In the Properties dialog, on the Ports tab, click Add.
3. In the Add Network Wizard, in the Connection Types section, select Service Console, then click Next.
4. On the Connection Settings screen, enter a network label and optional VLAN ID.
5. Give the console a static IP address or have it obtain one via DHCP, then click Next.
6. On the Summary screen, review the settings, and if correct, click Finish.

Using Jumbo Frames

Best practice: Isilon recommends using jumbo frames with an MTU of 9000 rather than the default of 1500. This requires both the ESX NIC and the switch to support jumbo frames. Isilon storage nodes (4.7 and above) already support jumbo frames. Enabling jumbo frames on ESX is performed through the ESX service console CLI:
1. Assuming the VMkernel port for NFS storage is created on vSwitch1, run the following command:
esxcfg-vswitch -m 9000 vSwitch1
2. A quick run of "esxcfg-vswitch -l" (that's a lowercase L) will show that the vSwitch's MTU is now 9000; in addition, "esxcfg-nics -l" (again, a lowercase L) will show that the MTU for the NICs linked to that vSwitch is now set to 9000 as well.
3. Create a VMkernel interface with jumbo frames (unfortunately, an existing VMkernel interface cannot be updated to use jumbo frames). This step is a bit more complicated, because a port group must already be in place, and that port group needs to be on the vSwitch whose MTU we set previously:
esxcfg-vmknic -a -i 172.16.1.1 -n 255.255.0.0 -m 9000 "NFS Storage"
4. This creates a port group called "NFS Storage" on vSwitch1 (the vSwitch whose MTU was previously set to 9000) and then creates a VMkernel port with an MTU of 9000 on that port group. Be sure to use an IP address that is appropriate for your network when creating the VMkernel interface.
5. Go back to the Isilon Cluster Network Management WebUI page and make sure the subnet MTU is set to 9000.
6. Set up your switch to support jumbo frame traffic. On a Cisco Catalyst 3750 switch, the command is:
system mtu jumbo 9000
7. To test that everything is working so far, use the vmkping command from the ESX service console:
vmkping -s 9000 172.16.1.20

Configuring Link Aggregation

Link aggregation, also known as NIC failover or NIC teaming, is one approach to ensuring higher network availability between the ESX server and the Isilon cluster. NIC teaming is based on a layer-2 IEEE standard known as 802.3ad.
Perform the following steps to configure NIC teaming on an ESX server.

Note: NIC teaming requires that both NICs involved in the team are on the same subnet.
1. If two NICs are not configured in the virtual switch, add a second NIC by selecting Properties on the Configuration tab for the ESX Server.
2. On the Properties dialog, select the Network Adapters tab, then click Add. Follow the instructions in the Add Adapter wizard.
3. After adding the second NIC, the virtual switch diagram will look like Figure 6.

Figure 6: NIC Teaming Configured on ESX host

Once the second NIC is added to the virtual switch, teaming is enabled using a default configuration. To change NIC teaming options, select Properties... for the virtual switch.
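For administrators who prefer the service console, the second uplink can also be added from the CLI. The following is a sketch that assumes the storage vSwitch is vSwitch1 and the second physical NIC is vmnic1; substitute the names reported by "esxcfg-nics -l" on your host:

esxcfg-vswitch -L vmnic1 vSwitch1
esxcfg-vswitch -l

The first command links vmnic1 to vSwitch1 as an additional uplink; the second lists the vSwitches so you can verify that both uplinks are now present.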
Title: A Comparative Analysis of Different Types of Renewable Energy Sources
Introduction:

In today's world, the pursuit of renewable energy sources has become paramount due to environmental concerns and the finite nature of fossil fuels. This essay aims to compare and analyze various types of renewable energy sources, including solar, wind, hydroelectric, and biomass energy, highlighting their advantages, disadvantages, and potential applications.

Solar Energy:

Solar energy harnesses the power of sunlight through photovoltaic cells or solar thermal systems. One major advantage of solar energy is its abundance and accessibility in most regions. Additionally, solar panels have minimal maintenance requirements and can be installed on rooftops or in large solar farms. However, solar energy generation is intermittent and depends on weather conditions and daylight availability.

Wind Energy:

Wind energy utilizes wind turbines to convert kinetic energy from the wind into electricity. Wind power is a clean and renewable energy source with relatively low operational costs once the turbines are installed. Wind farms can be established both onshore and offshore, tapping into different wind patterns. However, wind energy is also intermittent, and turbine noise and visual impact may pose challenges, particularly in densely populated areas.

Hydroelectric Energy:

Hydroelectric power generates electricity by harnessing the energy of flowing water through dams or run-of-river systems. It is a mature technology with proven reliability and long operational life spans. Hydroelectric plants provide steady and predictable power generation, making them suitable for base-load electricity demand. Nonetheless, large-scale dam projects can have significant environmental and social impacts, including habitat disruption and displacement of communities.

Biomass Energy:

Biomass energy is derived from organic materials such as wood, crop residues, or municipal solid waste. It can be utilized for heat generation, electricity production, or biofuel production. Biomass is considered carbon-neutral since it recycles carbon dioxide absorbed during the growth of plants. However, concerns exist regarding land use, competition with food crops, and emissions from combustion processes. Moreover, inefficient biomass production methods may lead to deforestation and habitat loss.

Comparison and Analysis:

When comparing these renewable energy sources, several factors need to be considered, including environmental impact, reliability, scalability, and cost-effectiveness. Solar and wind energy have minimal environmental impact during operation but may require significant land or space for deployment. Hydroelectric energy offers reliable power generation but faces challenges related to habitat alteration and social displacement. Biomass energy can be versatile but raises concerns about sustainability and emissions.

In terms of reliability, hydroelectric power stands out for its consistent output, followed by solar and wind energy, which are dependent on weather conditions. Biomass energy can provide a reliable heat source but may not be suitable for large-scale electricity generation due to feedstock availability and processing limitations.

Scalability varies among these renewable energy sources. Solar and wind energy are highly scalable and can be deployed at various scales, from individual households to utility-scale installations. Hydroelectric projects require suitable geographic locations and substantial investment, limiting their scalability.
Biomass energy scalability depends on feedstock availability and processing infrastructure.

Cost-effectiveness is influenced by factors such as technology maturity, resource availability, and regulatory incentives. Solar and wind energy costs have decreased significantly in recent years, making them competitive with fossil fuels in many regions. Hydroelectric projects often involve high upfront costs but offer long-term economic benefits. Biomass energy costs can vary widely depending on feedstock availability and processing technologies.

Conclusion:

In conclusion, each renewable energy source has its strengths and limitations, and the optimal choice depends on factors such as geographic location, resource availability, and policy frameworks. Solar and wind energy are rapidly expanding due to declining costs and technological advancements. Hydroelectric power remains a significant contributor to global electricity generation, particularly in regions with suitable geography. Biomass energy offers a versatile solution but requires careful consideration of sustainability and environmental impacts. By diversifying our energy mix and investing in renewable technologies, we can mitigate climate change and ensure a sustainable energy future.
English Microservices Reference Literature

Microservices have become a widely adopted architectural style in the development of modern software systems. This approach to software design emphasizes the decomposition of a large application into smaller, independent services that communicate with each other through well-defined interfaces. The concept of microservices has gained significant traction in the industry due to its ability to address the challenges posed by monolithic architectures, such as scalability, flexibility, and maintainability.

The microservices architectural style has its roots in the principles of service-oriented architecture (SOA) and the idea of breaking down complex systems into more manageable components. However, microservices take this concept further by emphasizing the autonomy and independence of each service, as well as the use of lightweight communication protocols and the adoption of a decentralized approach to data management.

One of the key benefits of microservices is the ability to scale individual services independently, allowing for more efficient resource utilization and the ability to handle increased traffic or workloads in specific areas of the application. This scalability is achieved through the deployment of individual services on separate infrastructure resources, such as virtual machines or containers, and the use of load-balancing mechanisms to distribute the workload across these resources.

Another advantage of microservices is the increased flexibility and agility in software development. With each service being independent and loosely coupled, teams can work on different services concurrently, using different programming languages, frameworks, and deployment strategies. This allows for a more rapid and iterative development process, where new features or improvements can be introduced without disrupting the entire application.

Maintainability is another significant benefit of the microservices architecture. By breaking down a large application into smaller, independent services, the codebase becomes more manageable, and the impact of changes or updates is localized to individual services. This reduces the risk of unintended consequences and makes it easier to identify and address issues within the system.

However, the adoption of microservices also introduces new challenges and complexities. The need for effective communication and coordination between services, the management of distributed data, and the complexity of monitoring and troubleshooting a distributed system are just a few of the challenges that organizations must address when implementing a microservices architecture.

To address these challenges, a variety of tools and technologies have been developed to support the development, deployment, and management of microservices. These include service discovery mechanisms, API gateways, message brokers, distributed tracing systems, and container orchestration platforms, among others.

One of the most prominent examples of a microservices-based architecture is the Netflix platform. Netflix has been a pioneer in the adoption of microservices, using this approach to build a highly scalable and resilient streaming platform that can handle millions of concurrent users.
Netflix has also contributed significantly to the open-source community by releasing several tools and frameworks that facilitate the development and management of microservices, such as Eureka (a service discovery tool), Hystrix (a circuit breaker library), and Zuul (an API gateway).

Another well-known example of a microservices-based architecture is the PayPal platform. PayPal has leveraged the microservices approach to modernize its legacy systems and improve the agility and scalability of its payment processing services. By breaking down its monolithic application into smaller, independent services, PayPal has been able to respond more quickly to changing market demands and customer needs.

The adoption of microservices has also been prevalent in the e-commerce industry, where companies like Amazon and eBay have used this architectural style to build highly scalable and resilient platforms that can handle large volumes of transactions and user traffic.

In the healthcare sector, microservices have been used to build integrated patient management systems that bring together various clinical and administrative services, such as appointment scheduling, medical records management, and billing. This approach has enabled healthcare providers to more easily integrate new technologies and services into their existing systems, improving the overall quality of patient care.

The financial services industry has also embraced the microservices architecture, with banks and fintech companies using this approach to build flexible and scalable platforms for managing various financial products and services, such as lending, investment, and insurance.

As the adoption of microservices continues to grow, the need for comprehensive reference literature on the subject has also increased. Numerous books, articles, and online resources have been published to provide guidance and best practices for the design, implementation, and management of microservices-based systems.

Some of the key areas covered in the microservices reference literature include:

1. Architectural Patterns and Design Principles: Discussions on the fundamental principles and patterns that underpin the microservices architecture, such as the use of bounded contexts, event-driven communication, and the Strangler Fig pattern.

2. Communication and Integration: Exploration of the various communication protocols and integration patterns used in microservices, including REST APIs, message queues, and event-driven architectures.

3. Deployment and Orchestration: Examination of the tools and techniques used for the deployment and management of microservices, such as container technologies (e.g., Docker), orchestration platforms (e.g., Kubernetes), and continuous integration/continuous deployment (CI/CD) pipelines.

4. Resilience and Fault Tolerance: Strategies for building resilient and fault-tolerant microservices, including the use of circuit breakers, retries, and fallbacks, as well as the implementation of distributed tracing and monitoring systems (a minimal circuit-breaker sketch follows at the end of this section).

5. Scalability and Performance: Discussions on the approaches to scaling microservices, such as horizontal scaling, load balancing, and the use of caching and asynchronous processing techniques.

6. Data Management: Exploration of the challenges and best practices for managing data in a distributed microservices architecture, including the use of event sourcing, CQRS (Command Query Responsibility Segregation), and polyglot persistence.
7. Security and Governance: Examination of the security considerations and governance models for microservices, such as authentication, authorization, and the management of API versioning and deprecation.

8. Observability and Monitoring: Discussions on the tools and techniques used for monitoring and troubleshooting microservices-based systems, including distributed tracing, log aggregation, and metrics collection.

9. Testing and Debugging: Exploration of the approaches to testing and debugging microservices, including the use of contract testing, consumer-driven contracts, and chaos engineering.

10. Organizational and Cultural Considerations: Examination of the organizational and cultural changes required to support the successful adoption of a microservices architecture, such as the shift towards cross-functional teams, DevOps practices, and a culture of continuous improvement.

The microservices reference literature provides a comprehensive guide for software architects, developers, and operations teams who are looking to design, implement, and manage microservices-based systems. By drawing on the collective experience and best practices of the industry, this literature helps organizations navigate the complexities and challenges associated with the adoption of a microservices architecture, ultimately enabling them to build more scalable, flexible, and resilient software systems.
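As a concrete companion to the resilience patterns in item 4 above, here is a minimal circuit-breaker sketch in Python. It is an illustrative toy, not the API of Hystrix or any other real library, and all names are invented:

import time

class CircuitBreaker:
    # Open the circuit after max_failures consecutive errors; allow a
    # trial call (half-open) once reset_timeout seconds have elapsed.
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback  # fail fast while the circuit is open
            self.opened_at = None  # half-open: permit one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback
        self.failures = 0  # a success closes the circuit again
        return result

A wrapper like this (e.g., breaker.call(fetch_user, user_id, fallback=None)) keeps a failing downstream service from exhausting the threads of its callers, which is the behaviour the reference literature describes for circuit breakers.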
Cloud computing node server

A cloud computing node server refers to a physical or virtual server that is used to deploy and run cloud computing services. These servers are an essential component of cloud infrastructure, providing the processing power, storage, and networking capabilities needed to support cloud-based applications and services.

Cloud computing node servers are typically deployed in data centers, where they are connected to the internet and other networking infrastructure. They are designed to be highly scalable and flexible, allowing cloud service providers to quickly add or remove server capacity as needed to meet changing demand.

These servers are also often used in virtualized environments, where multiple virtual servers can run on a single physical node. This allows for greater resource utilization and efficiency, as well as easier management of server resources.

In terms of hardware, cloud computing node servers are typically built using standard server hardware, such as multi-core processors, large amounts of memory, and high-speed storage devices. They are often designed to be highly reliable, with redundant power supplies, cooling systems, and network connections to ensure continuous operation.

From a software perspective, cloud computing node servers often run a variety of virtualization and cloud management software, such as VMware, OpenStack, or Microsoft Hyper-V. This software allows the servers to be managed and provisioned dynamically, and to support the deployment of virtual machines and cloud-based applications.

In addition to these technical considerations, cloud computing node servers also play a crucial role in ensuring the security and privacy of cloud-based services. They often include features such as encryption, secure access controls, and monitoring and logging capabilities to protect sensitive data and ensure compliance with privacy regulations.

Overall, cloud computing node servers are a fundamental building block of cloud infrastructure, providing the essential computing resources needed to support a wide range of cloud-based applications and services.
Highly Available Distributed RAM (HADRAM): Scalable Availability for Scalable Distributed Data Structures

Damian Cieslicki, Stefan Schäckeler, Thomas Schwarz, S.J.
Department of Computer Engineering
Santa Clara University
500 El Camino Real, Santa Clara, CA 95053
dcieslicki@, sschaeck@, TJSchwarz@

Abstract

We propose that the challenges in the design and implementation of an SDDS can be significantly eased by separating the design of the scalable high-availability part (HADRAM) from the design of the SDDS proper. We have designed and partially implemented a HADRAM system that allows measurements to prove the validity of the concept. All other highly available SDDSs provide failure tolerance at the record level, whereas HADRAM provides it at the memory level.

1 Introduction

With the advent of high-speed, high-bandwidth computer networks, clusters of computers, also known as multicomputers, have emerged as a technology with an excellent performance/cost ratio. These systems need new data structures whose performance is independent of the number of nodes in the multicomputer, i.e. a Scalable Distributed Data Structure (SDDS). A number of SDDSs were developed, most prominently LH* [LNS96] and RP* [LNS94], that use the collective RAM (or sometimes disk storage) of the nodes to provide seemingly unlimited storage at attractive speeds. As the number of nodes increases, the availability and reliability of nodes emerges as a new problem. If, for example, a system uses 100 nodes, each available at the five nines level (up 99.999% of the time), then the combined system is only available at the three nines level (up 99.9% of the time). In response, failure-tolerant (or, to be more precise, unavailability-tolerant) SDDSs were developed, such as LH*g, LH*m, and LH*RS [LN95, L98, LMS05]. The latter even implements scalable availability; that is, the level of protection of data against node unavailability grows with the number of nodes to achieve a high system-wide availability of all the data. We propose here Highly Available Distributed Random Access Memory (HADRAM) as a layer that provides scalably-available data storage in the combined RAM of the multicomputer, on which arbitrary SDDS structures can be built. High availability then becomes a generic module for all SDDSs. In a manner of speaking, what Boxwood [M&al05] proposes for distributed disk storage, we propose for SDDS.

HADRAM is similar to LH*RS, and a comparison between the two shows the design goals of HADRAM. LH*RS achieves scalable availability by changing the basic LH* data structure through the addition of parity records and parity buckets. If one were to port scalable availability to another SDDS, e.g. RP*, then one would have to re-implement RP* completely to obtain RP*RS, and so on for every SDDS. Implementing a functioning prototype for LH*RS took more than three programmer-years, and there is no reason to hope that implementing RP*RS would be any faster. In contrast, we try to achieve scalable high-availability for all SDDSs with HADRAM.

Our contribution here is three-fold. First, we give the API for HADRAM. Second, we report on our current implementation of the HADRAM core, which in our opinion shows that an SDDS such as LH* on top of HADRAM should be about as fast as the customized scalable, high-availability version of the SDDS, such as LH*RS. Third, we report on some improvements we made to the messaging layer and to the erasure coding that also benefit LH*RS. Our work is not finished.
Ultimately, our goal is to run LH* on top of HADRAM and prove (or disprove) that its functionality is essentially the same as that of LH*RS. HADRAM has greatly benefited from the work on LH*RS and represents a different design strategy.

2 Scalable Distributed Data Structures

An SDDS stores data in the distributed RAM of the nodes of a multicomputer while offering access to these data in constant time, independent of the number of nodes over which the data is spread. For this reason, an SDDS cannot have a central look-up scheme, which would turn into a bottleneck if the system grows too much. However, it is possible to have a central coordinator, as long as the coordinator is only rarely invoked. When the SDDS file (the collection of the data administered by the SDDS) changes size, data is moved to new nodes or data is evacuated from some nodes and moved to others. A good SDDS design will only move data sparingly so that data can be found quickly. For this reason, clients usually do not have an accurate picture of where data is, but SDDSs have a mechanism that allows updating the client's picture so that the same mistake is not made twice.

Most SDDSs present a simple interface to any application that uses them. They store data in a file made up of records, indexed by a key, and provide at least access (insert, delete, update, read) by key as well as parallel search. An SDDS is implemented as middleware (Figure 1). Clients run an application which shares an SDDS file. The clients access the files by sending their requests to the local SDDS client, which then uses the SDDS protocol to forward the request eventually (but usually directly) to the correct SDDS server running on one node of the multicomputer. The SDDS server itself stores the data in an SDDS bucket. It is possible for SDDS clients and SDDS servers to reside on the same node, and for SDDS servers to administer more than a single SDDS bucket. The designer/implementer of an SDDS thus faces a complicated design task that includes the design of the client interface, the design of the SDDS itself including the addressing algorithm, the movement of data when the SDDS file grows or shrinks, the update of the local image of the state of the SDDS at the client, the design of the communication between clients and servers, and the design of the SDDS bucket. When the requirement of high availability, especially scalable availability, is added, the challenges are indeed great.

Figure 1: SDDS as Middleware

3 Related Work

High-speed, high-capacity networks (including the Internet) enable cluster computing. To be successful, multicomputers need to provide service properties that include the ability to scale to large, rapidly growing user populations, high availability in the face of partial failures, consistency maintenance of users' data, and operational manageability. In response to these needs, Scalable Distributed Data Structures (SDDSs) were proposed in 1993 [LNS93]. SDDSs gave rise to an important research effort that resulted in several algorithms, papers, and implementations. It was shown that SDDS files are able to scale to thousands of sites and to terabytes in distributed RAM, with constant access performance and search times under a millisecond. Multi-key searches that require an hour or so in a traditional file, e.g., a k-d file, may succeed in less than a second in an SDDS file [LN95, L96]. All these properties are of prime importance, especially in the DBMS design arena [ASS94]. Some SDDSs are:

SDDSs for hashed files.
These extend the more traditional dynamic hash data structures, especially linear hashing [L80], [SPW90] and dynamic hashing [L88], to multicomputers [D93], [KLR96], [LNS93], [LN96], [VBWY94]. The basic algorithm for such SDDSs, LH*, has found recognition in the computer literature [K98], [R98].

SDDSs for ordered files. These extend traditional ordered data structures, B-trees or binary trees, to multicomputers [LNS94], [KW94].

SDDSs for multi-attribute (multi-key) files. These algorithms extend the traditional k-d data structures [S89] to multicomputers [LN95], [N95], [L96]. The performance of multi-attribute search can improve by several orders of magnitude.

High-availability SDDSs. These structures are designed to transparently survive failures of server sites. They typically apply principles of mirroring, striping, or grouping of records, revised to support file scalability [LN95], [LN96], [LR97], [LMR98], [LMS05].

There are several implementations of SDDS structures. An interesting implementation of a high-availability variant of LH* in Java is presented in [L98]. Finally, [R98] discusses an application of another variant of LH* to telecommunication databases.

The Boxwood project [M&al05] at Microsoft Research explored the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure, fulfilling basic requirements such as redundancy and backup to tolerate failures, expansion mechanisms for load and capacity balancing, and consistency maintenance in the presence of failures. Boxwood targets a multicomputer with locally attached storage. Each node runs the Boxwood software components to implement abstractions and other services on the disks attached to the system. Failure tolerance results from using chained-declustered replication [HD90]. We make Boxwood's central thesis our own and apply it to primarily RAM-based SDDSs.

4 Highly Available Distributed RAM (HADRAM)

Storing data in RAM is fast, even over a network, but also expensive. As we need to store data redundantly in order to recover from node failure or node unavailability, we therefore use the less storage-intensive redundancy offered by erasure correcting codes. A large number of erasure correcting codes are available and their properties are well documented. In our scheme, we place m buckets on m different nodes in a reliability group and add to these m data buckets k additional buckets, the parity buckets, containing parity data calculated from the m data buckets. This is the same redundancy mechanism as in LH*RS (provided we use a generalized Reed-Solomon code), but in contrast, HADRAM treats buckets as big, flat chunks of main memory without any internal organization.

HADRAM is designed to be the highly available memory layer of an SDDS. It provides memory allocation and de-allocation as well as failure tolerance and simultaneous atomic writes. The SDDS server stores its data in a bucket in HADRAM-provided memory. The bucket can be organized in any way the SDDS designer chooses: as a Linear Hash file, a B+-tree, a link tree, etc. (In the spirit of modularization, the design of the SDDS bucket can be "off-the-shelf".) When a bucket operation demands more memory, it requests it from the HADRAM layer through a malloc-like call; when the memory is no longer needed, it frees it by a call to the HADRAM layer that operates like the "free" operation.
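As a rough illustration of this malloc-like interface, consider the following hypothetical Python sketch. The names and signatures below are invented for exposition and are not the actual HADRAM API:

class HadramBucket:
    # Hypothetical client-side view of one HADRAM data bucket.

    def alloc(self, size: int) -> int:
        # Reserve `size` bytes; returns an offset within the bucket
        # (analogous to malloc).
        ...

    def free(self, offset: int) -> None:
        # Release a previously allocated region (analogous to free).
        ...

    def read(self, offset: int, length: int) -> bytes:
        # Local read; no network round-trip is needed.
        ...

    def write(self, changes: list[tuple[int, bytes]]) -> None:
        # Apply a list of (offset, new_bytes) modifications as one
        # atomic transaction, propagated to the parity buckets.
        ...

The key point mirrored from the text: reads stay local, while each write is a grouped, atomic update that must also reach the parity buckets over the network.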
The SDDS server can request an adjustment of the availability provided by the HADRAM layer.

To detect unavailable buckets, the n = m + k buckets in a HADRAM reliability group monitor each other. If they detect node unavailability, they collectively react to it. The central part of the reaction is to rebuild the lost data on another node in the SDDS. If a parity bucket is lost, then rebuilding it is all that needs to be done, and the occurrence is transparent to the SDDS layer. In order to find a place for the replacement bucket, HADRAM can contact a HADRAM coordinator or can use a broadcast message. If a data bucket is lost, then the SDDS layer needs to be involved, since requests need to be redirected from the unavailable data bucket to the new one.

We summarize the interaction between the SDDS and the HADRAM layer in Figure 2. The bulk of the interaction between the SDDS server and HADRAM consists of memory interactions: allocate HADRAM memory, free HADRAM memory, read HADRAM memory, and write HADRAM memory. These need very little explanation beyond the HADRAM design that supports them (Section 5). Note, however, that an operation such as a record insert changes many different memory locations. For example, if we insert into an LH bucket implemented according to Ellis [E87] and the insert results in an LH bucket split, then we reset quite a number of pointers and deallocate and allocate memory. For performance reasons, these changes should be processed by the HADRAM system in a single, atomic step, since changes in a HADRAM bucket need to be processed at all k parity buckets and involve at least one network round-trip. Because the SDDS server stores data in a local HADRAM data bucket, all reads become local and only writes incur network delays.

Figure 2: HADRAM – SDDS Server Layers at a Node.

The high-availability interactions impose changes in the design of the SDDS server, which we now describe. If an SDDS file grows or shrinks, it needs to adjust the degree of availability of individual buckets. HADRAM can increase the availability by adding a parity bucket and decrease it by releasing a bucket. It can also change it by combining two HADRAM reliability groups ((m + k) + (m + k) → (2m + k)) or splitting them. However, this also changes the performance of writes, which need to percolate to all parity buckets, which in turn see an increase or a decrease in write requests. In addition, it involves the SDDS layer in locating a group to merge with. The SDDS can communicate its need for a change in reliability by giving a command to HADRAM.

The HADRAM layer needs to allocate replacement buckets in case of unavailability. In addition, if the SDDS creates a new data bucket, it needs to either find a slot in an existing HADRAM reliability group or create a new one. Depending on the needs of the SDDS, we might have to provide these services through a central HADRAM coordinator.

5 HADRAM Implementation

As we have seen, HADRAM implements redundant, distributed storage in the main memory of a cluster of commodity computers. We implemented its basic interchange with the SDDS bucket and in particular investigated a messaging architecture that provides transactional updates even in the presence of (limited) message loss.

5.1 HADRAM Reliability

Internally, HADRAM data buckets are placed in reliability groups to which parity buckets are added. The size of the data buckets is quite large, so that reconstruction of unavailable buckets is done in bulk.
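For intuition, the following simplified Python sketch shows a reliability group with k = 1 using plain XOR parity. The paper itself uses a generalized Reed-Solomon code, which handles arbitrary k, so this is only a toy illustration of the principle:

from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length memory blocks.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# m data buckets of equal size, one parity bucket (k = 1).
data_buckets = [b'\x01\x02\x03', b'\x10\x20\x30', b'\x0a\x0b\x0c']
parity = xor_blocks(data_buckets)

# If any single data bucket becomes unavailable, it can be rebuilt
# from the surviving data buckets plus the parity bucket.
lost = data_buckets[1]
rebuilt = xor_blocks([data_buckets[0], data_buckets[2], parity])
assert rebuilt == lost

With a Reed-Solomon code, the XOR is replaced by sums of Galois field products, which is why the recovery procedure described below must invert a matrix and compute product tables.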
In order to provide transactional updates to the local SDDS bucket, HADRAM allows many writes to be grouped into a single transaction, even in the presence of failures. Bundling not only speeds up communications, but also simplifies the memory interface design. A write request to the HADRAM layer is simply a list of offsets within the HADRAM bucket and a list of modifications. HADRAM guarantees that all the modifications are performed simultaneously and atomically. It provides two types of acknowledgements: a first one that the request was received by the HADRAM layer, and a second one that the request was performed and is permanent in the sense that it survives failures and unavailabilities (within the limits of our recovery capacity, i.e. not more than k unavailabilities within a reliability group).

HADRAM itself is responsible for failure/unavailability detection and repair. Operationally, it detects the failure of parity buckets when they do not respond to an update request. In addition, parity buckets monitor the data buckets through "heartbeat monitoring". In case an unavailable bucket is detected, the HADRAM buckets in a reliability group act in conjunction. We can use a distributed consensus protocol such as Paxos [GM02] or simply a leader election algorithm, though the latter can run into difficulties if messages can be severely delayed. The group (or the leader) first decides which buckets are available and which are unavailable. It then determines which buckets need to be reconstructed and finds locations for the new buckets. It also determines which buckets (typically m of them) are involved in the reconstruction. The reconstruction itself depends on the erasure correcting code. We use a generalized Reed-Solomon code as in [LMS05]. Accordingly, the group leader first inverts an m-by-m matrix formed from the code's generator matrix. For each bucket to be reconstructed, every symbol is then calculated as the XOR of Galois field products involving one symbol from each of the reconstructing buckets. As an improvement over LH*RS, we use extensive, and if necessary staggered, table look-ups to compute these products. In consequence, the leader first calculates the tables and ships them to the sites about to store the reconstructed buckets. All buckets involved in the reconstruction then send their contents (in large slices, as in [LMS05]) to those sites, which recalculate the bucket.

5.2 HADRAM Messaging Architecture: Transactions Even in the Presence of Failures

In HADRAM, every modification of a block is an update and needs to be performed reliably at the data bucket and at all the parity buckets. In [LMS05], the well-known 1-Phase-Commit (1PC) [S79, BG83, CS84] was implemented; unfortunately, if messages can get lost, then 1PC does not guarantee that a once-acknowledged update cannot be inadvertently rolled back by a bucket failure and subsequent reconstruction. Litwin et al. [LMS05] propose two variants of 2-Phase-Commit that trade speed for transactional behavior even in the presence of failures. We found that another protocol, called high-watermark, maintains the transactional quality while offering much better performance. In this protocol, we maintain all updates in a log. We purge entries from the log by applying the update to the contents of the bucket. This happens only if the site knows that all other sites have the update in their log. The sites gain this knowledge through a series of status update messages that are, in part, piggy-backed on other messages.
In more detail, all updates are identified by the data bucket from which they come and by a message number. A data site maintains the status of all parity buckets, and a parity bucket maintains the status of all buckets. They exchange status messages periodically (i.e. every l updates), but a data bucket also sends its status, and what it knows about the status of all parity buckets, whenever there is an update request. When an unavailability is detected, all surviving buckets send each other their status and exchange update requests that were not delivered. It turns out that an update message can fail to be processed only if a total of k messages or sites are lost. This messaging scheme can be proven to be k-available (surviving k node unavailabilities or message losses) and gains performance because it bundles acknowledgments and thus cuts down on the total number of messages.

Table 1: Update Times
Model type         Time [µseconds]
1PC                260
2PC                1050
Watermark (100)    100
Watermark (40)     233
Watermark (20)     251
Watermark (10)     274

Figure 3: HADRAM performance: Messaging. (Left: comparison of 1PC, watermarking-10, and 0PC, plotting time in microseconds against the number of requests. Right: the watermark algorithm with various intervals, same axes.)

Figure 4: HADRAM Recovery. (Left: recovery performance with 4-byte data records, plotting time in microseconds against the number of records. Right: recovery comparison for different HADRAM block sizes, same axes.)

Table 2: Data Reconstruction Times (4B records)
# of records    Time [microseconds]
50              2930
100             3706
200             5976
300             7586
400             9780
500             12034
1000            21546
10000           99654
50000           961496

6 Experimental Results

Implementing HADRAM as an abstraction for SDDS memory is a difficult task, and we have only implemented it partially. In particular, we implemented the messaging structure and have shown that watermarking, because of its delayed acknowledgements, is preferable to 1PC and 2PC, unless the system needs to acknowledge early to the client. Figure 3 (left) shows the result of such an experiment, where we pitch 1PC, 0PC (no acknowledgements), and watermarking with status exchange messages every 10 messages against each other. Table 1 gives these times in more explicit form. Figure 3 (right) shows the performance improvement when we acknowledge less frequently. Compared to the results in [LMS05], it appears that watermarking improves performance considerably. In addition to this speed advantage, watermarking guarantees that a message acknowledged by a data server after it has sent out the ∆-updates to the parity buckets is committed for sure, unless a total of k messages or buckets are lost or become unavailable. That is, watermarking guarantees k-availability. We also tested the speed of recovery (Figure 4, Table 2). It turns out that recovery is speedy, as could also have been expected from the corresponding measurements in [LMS05]. However, we achieve a small performance improvement by building and using tables for the Galois field operations instead of the logarithm-antilogarithm method implemented in [LMS05]. Recall that the sites exchange their logs and bring them up to the same state before recovery proper starts. It turns out that the time for this procedure is dominated by the messaging delay and thus does not add significantly to the reconstruction overhead. Furthermore, setting up the reconstruction of the actual memory contents is also fast.
The main delay here is inverting a Galois field matrix, and Figure 5 shows that for reasonably sized reliability groups this is a matter of microseconds.

Figure 5: Matrix Inversion Times (time in microseconds as a function of m)

Currently, we are finishing the implementation of our prototype. Our next set of measurements will give us the exact time of each component of the recovery procedure. We will also experimentally address heartbeat unavailability monitoring between the buckets in a reliability group. Our final goal is to implement LH* over HADRAM and compare it to LH*RS.

References
[ASS94] Amin, M., Schneider, D., Singh, V. An Adaptive, Load Balancing Parallel Join Algorithm. 6th International Conference on Management of Data, Bangalore, India, December 1994.
[BG83] Bernstein, P., Goodman, N. The Failure and Recovery Problem for Replicated Databases. ACM Symposium on Principles of Distributed Computing, Montreal, Canada, 114-122, 1983.
[CS84] Carey, M., Stonebraker, M. The Performance of Concurrency Control Algorithms for Database Management Systems. VLDB, 1984.
[D93] Devine, R. Design and Implementation of DDH: Distributed Dynamic Hashing. Intl. Conf. on Foundations of Data Organization, FODO-93, Lecture Notes in Computer Science, Springer-Verlag, Oct. 1993.
[E87] Ellis, C. S. Concurrency in Linear Hashing. ACM Transactions on Database Systems, 1987.
[GM02] Chockler, G., Malhki, D. Active Disk Paxos with Infinitely Many Processes. Proceedings of the 21st ACM Symp. on Principles of Distributed Computing (PODC-21), 2002.
[G96] Gray, J. Super-Servers: Commodity Computer Clusters Pose a Software Challenge. Microsoft, 1996.
[HD90] Hsiao, H., DeWitt, D. J. Chained Declustering: A New Availability Strategy for Multiprocessor Database Machines. Proceedings of the 6th International Conference on Data Engineering, 456-465, February 1990.
[K98] Knuth, D. The Art of Computer Programming. 3rd ed., Addison-Wesley, 1998.
[KLR96] Karlsson, J., Litwin, W., Risch, T. LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers. Intl. Conf. on Extending Database Technology, EDBT-96, Avignon, March 1996.
[KW94] Kroll, B., Widmayer, P. Distributing a Search Tree Among a Growing Number of Processors. ACM-SIGMOD Intl. Conf. on Management of Data, 1994.
[L80] Litwin, W. Linear Hashing: A New Tool for File and Table Addressing. Reprinted from VLDB-80 in Readings in Databases, 2nd ed., Stonebraker, M. (Ed.), Morgan Kaufmann Publishers, Inc., 1994.
[L88] Larson, P. Dynamic Hash Tables. CACM, 31 (4), 1988.
[L98] Lindberg, R. A Java Implementation of a Highly Available Scalable and Distributed Data Structure LH*g. Master Th. LiTH-IDA-Ex-97/65, U. Linkoping, 1997, 62 pp.
[LNS93] Litwin, W., Neimat, M-A., Schneider, D. LH*: Linear Hashing for Distributed Files. ACM-SIGMOD Intl. Conf. on Management of Data, 1993.
[LNS94] Litwin, W., Neimat, M-A., Schneider, D. RP*: A Family of Order-Preserving Scalable Distributed Data Structures. 20th Intl. Conf. on Very Large Data Bases (VLDB), 1994.
[LN95] Litwin, W., Neimat, M-A. k-RP*: A Family of High Performance Multi-attribute Scalable Distributed Data Structures. IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, Dec. 1996.
[LN95] Litwin, W., Neimat, M-A. LH*s: A High-Availability and High-Security Scalable Distributed Data Structure. IEEE Workshop on Research Issues in Data Engineering, IEEE Press, 1997.
[LN96] Litwin, W., Neimat, M-A. High-Availability LH* Schemes with Mirroring. Intl. Conf. on Coope. Inf. Syst.,
COOPIS-96, Brussels, 1996.
[LNS96] Litwin, W., Neimat, M-A., Schneider, D. LH*: A Scalable Distributed Data Structure. ACM Transactions on Database Systems (ACM TODS), Dec. 1996.
[LR97] Litwin, W., Risch, T. LH*g: A High-Availability Scalable Distributed Data Structure through Record Grouping. U. Paris 9 Tech. Rep., May 1997.
[LMR98] Litwin, W., Menon, J., Risch, T. LH* Schemes with Scalable Availability. IBM Almaden Research Rep., May 1998.
[LMS05] Litwin, W., Moussa, R., Schwarz, T. LH*RS – A Highly-Available Scalable Distributed Data Structure. ACM Transactions on Database Systems (TODS), September 2005.
[L96] Lomet, D. Replicated Indexes for Distributed Data. IEEE Intl. Conf. on Par. & Distr. Systems, PDIS-96, Dec. 1996.
[M&al05] MacCormick, J., Murphy, N., Narjok, M., Thekkath, C., Zhou, L. Boxwood: Abstractions as the Foundation for Storage Infrastructure. 6th Symposium on Operating System Design and Implementation, OSDI, San Francisco, Dec. 6-8, 2004.
[N95] Nardelli, E. Distributed Searching of Multi-dimensional Data: A Performance Evaluation Study. Journal of Par. & Distr. Computing 49, 11-134, 1998.
[R98] Ronstrom, M. Design and Modelling of a Parallel Data Server for Telecom Applications. Ph.D. Thesis, U. Linkoping, 1998, 250 pp.
[S89] Samet, H. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Mass., 1990.
[SPW90] Severance, C., Pramanik, S., Wolberg, P. Distributed Linear Hashing and Parallel Projection in Main Memory Databases. VLDB-90.
[VBWY94] Vingralek, R., Breitbart, Y., Weikum, G. Distributed File Organization with Scalable Cost/Performance. ACM-SIGMOD Intl. Conf. on Management of Data, 1994.
Internet of Things (IoT): Transforming the World Through Seamless Connectivity

In an era marked by technological advancements, the Internet of Things (IoT) has emerged as a transformative force, revolutionizing the way we interact with the world around us. IoT refers to the vast interconnected network of physical devices, vehicles, home appliances, and other items embedded with sensors, software, and internet connectivity, allowing them to collect, exchange, and analyze data in real-time.

Genesis and Evolution of the IoT

The concept of IoT can be traced back to the early days of the internet, with the introduction of interconnected devices in the 1980s. However, it was not until the advent of wireless technologies and the proliferation of smartphones in the 2000s that IoT truly began to take shape. Today, IoT devices are ubiquitous, ranging from smart thermostats and security systems to industrial machinery and medical devices, creating a vast and interconnected ecosystem.

Key Components and Functionalities

IoT systems consist of several key components:

Devices: Physical objects equipped with sensors, actuators, and embedded software.

Connectivity: Wireless protocols like Wi-Fi, Bluetooth, or cellular networks enabling devices to communicate with each other and the internet.

Data Platform: Cloud-based platforms responsible for storing, managing, and analyzing the vast amounts of data generated by IoT devices.

Applications: Software programs that utilize IoT data to provide valuable insights, automate tasks, or enhance user experiences.

The functionality of IoT systems varies widely depending on the specific application and industry. For example, in smart homes, IoT devices can monitor energy consumption, control lighting and temperature, and enhance security. In industrial settings, IoT sensors can optimize production processes, predict equipment failures, and improve safety.
In healthcare, IoT devices can track patient vitals, monitor medical conditions remotely, and facilitate remote consultations.

Benefits and Applications of IoT

The widespread adoption of IoT technology across various sectors has led to a plethora of benefits and applications:

Improved Efficiency: Automated processes and real-time data enable businesses to streamline operations, reduce costs, and increase productivity.

Enhanced Customer Experience: IoT devices provide personalized services, tailored recommendations, and improved product support, leading to increased customer satisfaction.

New Business Models: IoT data and insights allow companies to develop innovative products, services, and revenue streams.

Environmental Sustainability: IoT sensors and smart systems contribute to energy conservation, waste reduction, and improved resource management.

Healthcare Advancements: IoT devices enable remote patient monitoring, personalized treatments, and early disease detection, improving healthcare outcomes.

Smart Cities: IoT technologies enhance urban infrastructure, optimize traffic flow, improve public safety, and provide data-driven decision-making for city management.

Challenges and Considerations

Despite its transformative potential, IoT also presents certain challenges and considerations:

Security and Privacy: IoT devices often collect and transmit sensitive data, raising concerns about data breaches, privacy violations, and security vulnerabilities.

Interoperability and Standardization: The lack of standardized protocols and data formats can hinder the seamless integration and interoperability of IoT devices from different vendors.

Scalability and Data Management: The vast amounts of data generated by IoT devices require scalable and efficient storage, processing, and analysis capabilities.

Cost and Complexity: Implementing and managing IoT systems can be costly and complex, especially for large-scale deployments.

Ethical Implications: The widespread use of IoT devices raises ethical questions about the potential for surveillance, loss of autonomy, and impact on human relationships.

Future Prospects and Trends

As technology continues to advance, the IoT market is poised for significant growth and evolution. Some key trends to watch include:

5G and Edge Computing: Enhanced connectivity speeds and distributed computing capabilities will enable real-time data processing and analytics closer to the edge devices.

Artificial Intelligence (AI) and Machine Learning (ML): AI and ML algorithms will play a critical role in analyzing IoT data, providing predictive insights, and automating decision-making.

Blockchain Technology: Blockchain-based solutions can enhance data security, ensure data integrity, and facilitate secure transactions in IoT ecosystems.

Low-Power Wide-Area Networks (LPWANs): LPWAN technologies will enable long-range, low-power communication for IoT devices, expanding the reach of IoT deployments into remote areas.

Internet of Everything (IoE): The convergence of IoT, mobile devices, and cloud technologies will create a seamlessly interconnected ecosystem, enabling new possibilities and applications.

Conclusion

The Internet of Things has revolutionized the way we interact with the physical world, creating a vast and interconnected network of devices that collect, exchange, and analyze data in real-time. Its transformative power has brought about countless benefits and applications across various sectors, enhancing efficiency, improving customer experience, and driving innovation.
While challenges remainin terms of security, privacy, and scalability, the futureof IoT holds immense potential for shaping a more interconnected, intelligent, and sustainable world. As technology continues to advance and new applications emerge, the IoT will undoubtedly continue to play a pivotal role in defining the digital landscape of tomorrow.。
Implementing a Highly Scalable and Adaptive Agent-Based Management Framework

Damianos Gavalas†, Dominic Greenwood*, Mohammed Ghanbari†, Mike O'Mahony†

†Communication Networks Research Group, Electronic Systems Engineering Department, University of Essex, Colchester, CO4 3SQ, U.K.
E-mail: {dgaval, ghan, mikej}@

*Network Agent Research, Fujitsu Laboratories of America, Inc., 595 Lawrence Expressway, Sunnyvale, CA 94086, USA.
E-mail: d.greenwood@

Abstract - This paper introduces the concept of dynamic hierarchical management, enabled by Mobile Agent (MA) technology. The proposed framework addresses the scalability limitations of the centralised paradigm and the poor adaptability of static hierarchical management architectures to changing networking conditions. The increased adaptability of our framework is enabled by a novel management entity, termed the Mobile Distributed Manager (MDM). MDMs, being MAs themselves, can dynamically migrate to an assigned network domain (provided that certain requirements are met) and undertake its management responsibility, operating at an intermediary level between the central manager and SNMP agents and localising the associated management traffic. The paper also presents the design decisions and implementation experiences of the proposed architecture.

I. INTRODUCTION

The network management (NM) world witnessed several revolutions during the 1990s. The main objective has been to devise new management models characterised by increased flexibility and scalability. It is now agreed that the traditional centralised archetypes (adopted by widely deployed standards such as SNMP) exhibit severe scalability limitations, as they typically involve massive transfers of data. The situation deteriorates seriously when the management of remote subnetworks is considered: the traffic associated with these management tasks typically traverses several network segments and, in aggregate, wastes considerable bandwidth. Furthermore, the processing load at the manager station increases, requiring expensive computers to deal with relatively simple but repetitive tasks [1].

A major shift towards decentralisation was realised through the SNMPv2 [2] standard, which introduces the concept of the "proxy agent", leading to hierarchical management models. When placed across a WAN link, remote from the manager platform, the proxy obviates the need for normal SNMP polling over that link; it thereby reduces the polling traffic on the WAN link and achieves significant cost savings. Hierarchical paradigms address the main shortcoming of centralised models, scalability, but they lack flexibility: once a task has been defined in an agent, there is no way to modify it dynamically; it remains static [1]. In addition, the management roles in such hierarchies are statically defined. For instance, the assignment of managed entities to specific physical locations, functioning under the supervision of higher-level entities, cannot be dynamically reconfigured. This is out of step with the continuously evolving topological and traffic characteristics of large-scale enterprise networks, which require an analogous adaptation of the management system.

The first clear effort towards fully-distributed management was the Management by Delegation (MbD) framework [3].
MbD agents are interpolated between the managers and the static management agents, with their functionality dynamically extended at runtime.

The idea of management distribution is taken further by solutions that exploit Mobile Agents (MAs) [4], which can be regarded as a superset of MbD agents in the sense that they may be downloaded to a managed device and execute a management function, with the additional benefit of mobility. Incoming MAs are received and dispatched by Mobile Agent Servers (MAS), which serve as execution environments and inter-operate with the legacy systems. The data throughput problem can be addressed by delegating authority from managers to MAs, which are able to process and filter data locally without the need for frequent communication with the central manager.

As a result of these advantages, several Mobile Agent Frameworks (MAFs) have recently been proposed for NM applications [5][6][7][8]. Notably though, most of these frameworks assume a 'flat' network architecture, i.e. a single MA is launched from the manager platform and sequentially visits all the managed NEs, regardless of the underlying topology [5]. This approach does not conform to the hierarchical structure of modern networks, and it does not adequately address scalability for the following reasons: (a) in large networks, the round-trip delay of the MA greatly increases; (b) when managing remote LANs connected to the backbone through low-bandwidth WAN links, frequent MA transfers are likely to create bottlenecks and considerably increase the management cost.

Rubinstein et al. [6] argue that MAF scalability improves when the managed network is partitioned into several domains and a single MA object is assigned to each of them (i.e. when multiple MAs are used in parallel), as the overall response time is reduced. However, domain-based approaches fail to limit the number of MA transfers over the links connecting the managed network segments.

With this work, we address these problems by introducing a hierarchical MAF tailored to distributed NM applications. Such a model presupposes the presence of an additional, novel management element, termed the Mobile Distributed Manager (MDM), operating at an intermediary level between the manager and the stationary agents. MDMs are essentially MAs that take full control of managing a specific network domain and localise the associated traffic, leading to robust and highly scalable management systems. Apart from the fact that management functionality may be added or configured at runtime, the architecture can also dynamically adapt to fluctuating networking conditions: an MDM entity may be assigned to, or removed from, a network segment to reflect a change in network traffic patterns, or move to the least loaded host in order to minimise its impact on local resources.

The remainder of the paper is organised as follows: Section II discusses the design considerations and requirements for our hierarchical MA-based approach, Section III presents the implementation details of the introduced architecture, and Section IV concludes the paper.

II. OVERVIEW OF THE HIERARCHICAL MANAGEMENT FRAMEWORK

In this work, we combine the concepts of hierarchical and MA-based distributed management. The introduced MDM entities resemble SNMPv2 proxy agents, with their mobility feature used to increase management flexibility.
MDMs are assigned to a domain when certain criteria (determined by the administrator) are satisfied. For instance, when the manager station ascertains that the number of managed devices in a remote segment has grown beyond a pre-specified limit, it deploys an MDM to that segment (Figure 1a). Upon arriving at its assigned remote domain, the MDM takes over the management of the local devices from the central manager. As a result, the traffic related to the management of that domain becomes localised, since the MDM is able to dispatch and receive MAs that collect NM data from the local hosts, or even to execute centralised management operations on them (Figure 1b). Management functionality may be downloaded at runtime, i.e. the central manager may send distributed MDMs new MA configurations corresponding to newly introduced management tasks. In addition, the architecture adapts to changing networking conditions, since the location and the roles of the entities involved in the management procedure may be modified dynamically: an MDM entity can be deployed to, or removed from, a network segment in response to a change in network topology or traffic distribution.

Figure 1. The proposed architecture

Certainly, the fact that MDMs rely on other MAs to sequentially visit managed devices and collect data raises performance issues, especially when these MAs need to be transferred frequently. However, in a variety of monitoring applications, MAs may beneficially use the knowledge (data) already obtained from previously visited hosts to apply a second level of data filtering at each hop, thereby minimising the use of network resources [9] (see the sketch below). In performance management applications, only aggregated values and statistics are sent to the manager at regular intervals, diminishing the amount of data transferred over the WAN link. The duration of these intervals is task-dependent and determined by the administrator.
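To make the per-hop filtering concrete, the following is a minimal sketch of a monitoring MA that carries only a running summary along its itinerary instead of raw samples. It is illustrative only: the class and method names (MonitoringAgent, onArrival(), etc.) are ours, not the prototype's, and in the actual framework the sampled value would be obtained from the local SNMP agent through the hosting MAS rather than passed in by the caller.

```java
import java.io.Serializable;
import java.util.List;

/**
 * Illustrative sketch of a monitoring MA that filters data at each hop:
 * it keeps only a running summary (mean and maximum), so the state it
 * carries stays constant in size along the whole itinerary.
 */
public class MonitoringAgent implements Serializable {

    private final List<String> itinerary;  // hosts to be visited, in order
    private int hopsVisited = 0;
    private double runningSum = 0.0;       // accumulates samples for the mean
    private double runningMax = Double.NEGATIVE_INFINITY;

    public MonitoringAgent(List<String> itinerary) {
        this.itinerary = itinerary;
    }

    /** Invoked at each hop with the value polled on the local device. */
    public void onArrival(double localSample) {
        hopsVisited++;
        runningSum += localSample;
        // Second-level filtering: only the summary survives the hop;
        // the raw sample itself is never carried over the network.
        runningMax = Math.max(runningMax, localSample);
    }

    /** Aggregated statistics delivered to the MDM after the last hop. */
    public double mean() { return hopsVisited == 0 ? 0.0 : runningSum / hopsVisited; }
    public double max()  { return runningMax; }
}
```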
MDMs also improve the system's fault tolerance, as they continue to perform their tasks without the manager's intervention even if the interconnecting link fails. It is noted that the management domain assigned to an MDM entity may be confined to a single network segment or span a larger set of hosts. In the latter case, when the population of the remotely managed devices grows beyond a certain limit, the MDM is instructed to clone itself, with the duplicate object transparently sent to a nearby segment to take over its management.

A key issue in the framework's design has been to distribute the total workload evenly among the processors of the underlying subsystems. Hence, MDMs are initially deployed to the least loaded host to minimise the usage of local resources, and they can transparently migrate to another device as soon as their hosting system becomes overloaded. In conclusion, the proposed hierarchical NM model adds flexibility and scalability to the management system: neither the location where MDMs run nor the set of hosts under their control is fixed. MDMs can be transparently sent to a domain when the associated cost savings are considerable, or removed when their presence is no longer necessary. It is noted that the described architecture has been developed on top of the framework presented in [7], which, among other features, includes a security-enhanced MA execution environment and a tool that automatically generates the code of task-oriented MAs.

Similar work has been reported in [8], which comprises an interesting study of an MA-based management architecture adopting a hierarchical, multi-level approach. However, no implementation supplements that work, and the authors have not considered providing mobility features to their "Middle Managers" so that they could dynamically change location, which results in a static management hierarchy. In addition, the criteria according to which the managed network is segmented into domains, and the way these domains are assigned to Middle Managers, are not clarified.

III. ARCHITECTURE DESIGN AND IMPLEMENTATION

In order to provide a functional verification of the proposed hierarchical framework and to assess its impact on realistic network environments, we complemented our design ideas with a prototype. Java was chosen as the implementation platform due to its inherent portability, rich class hierarchy and dynamic class loading capability. The prototype has been tested on a LAN comprising a number of WinNT and Solaris machines.

A. Topology Map of Active Devices

An important element of our framework is the topology map, a graphical component of the manager application used to view the devices with currently active MAS servers and the underlying topology of the managed network (Figure 2a). In terms of implementation, the topology map is internally represented by a tree structure (termed the "topology tree"), where each tree node corresponds to a specific subnetwork. The node representing the manager's location is the root of the topology tree (see Figure 2b).

Figure 2. (a) Topology map snapshot, (b) The topology tree structure

Each tree node consists of the following attributes:
- the subnetwork's name;
- the names of the hosts and routers physically connected to this subnetwork;
- a flag indicating the presence of an active MDM on this subnetwork;
- the number n_l of local active hosts on this subnetwork;
- the number n_s of active hosts on the subnetwork's "subtree" (the term subtree here denotes the set of subnetworks located at hierarchically lower levels of the topology tree, including the present subnetwork itself), hence n_s >= n_l;
- a pointer to the upper-level tree node;
- pointers to the next-level nodes;
- a list of graphical components, each corresponding to a specific host, made visible upon discovering an active MAS entity on that host.

For instance, the number of active hosts in the subtree of Subnetwork A (in Figure 2b) will be:

n_s,subA = n_l,subA + n_l,subB + n_l,subC + n_l,subD    (1)

All the information related to the managed network topology described above is given to the manager application upon its initialisation, through parsing a text file (the "network configuration" file). The configuration file does not, of course, include the activity status information, which is discovered automatically by the manager (the manager application 'listens' for the broadcast messages of activated MASs). For each file entry, a new subnetwork node is created and inserted into the topology tree: its 'parent' (upper-level) subnetwork is located, and then the next-level pointer of the parent node, as well as the upper-level pointer of the inserted node, are updated.

As shown below, the topology tree plays a crucial role when the manager application needs to decide which subnets require the deployment of an MDM entity.
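To illustrate this bookkeeping, here is a minimal sketch of a topology tree node holding the attributes listed above. The names (SubnetNode, hostActivated(), etc.) are ours, not taken from the prototype; the incremental n_s update mirrors the root-ward traversal described in subsection C below, and requiresMdm() expresses the population-based deployment check (Policy 1) introduced in the next subsection.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of a topology tree node. nLocal (n_l) counts the
 * active hosts on this subnetwork; nSubtree (n_s) counts the active
 * hosts in the whole subtree rooted here, so n_s >= n_l always holds.
 */
public class SubnetNode {
    final String name;                                    // the subnetwork's name
    final SubnetNode parent;                              // upper-level node, null at the root
    final List<SubnetNode> children = new ArrayList<>();  // next-level nodes
    boolean mdmPresent = false;                           // active-MDM flag
    int nLocal = 0;                                       // n_l
    int nSubtree = 0;                                     // n_s, maintained incrementally

    SubnetNode(String name, SubnetNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    /** Called when an active MAS server is discovered on this subnetwork:
     *  n_l grows here, and n_s on every node up to the root. */
    void hostActivated() {
        nLocal++;
        for (SubnetNode n = this; n != null; n = n.parent) n.nSubtree++;
    }

    /** Symmetric update when an MAS server is shut down. */
    void hostDeactivated() {
        nLocal--;
        for (SubnetNode n = this; n != null; n = n.parent) n.nSubtree--;
    }

    /** Policy 1 check: deploy an MDM once the population (local hosts or
     *  whole subtree, as configured) reaches the threshold N. */
    boolean requiresMdm(int threshold, boolean countSubtree) {
        int population = countSubtree ? nSubtree : nLocal;
        return !mdmPresent && population >= threshold;
    }
}
```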
B. MDMs Deployment Policies

A key characteristic of this work is the dynamic adaptation of our architecture to changes in the managed network. The structure of the proposed model is not rigidly designed: MDMs may be dynamically deployed to specific network domains, given that certain requirements are met. Namely, the administrator may explicitly set the policies that define the operation of the hierarchical NM system, i.e. specify the criteria that should be satisfied before an MDM is deployed to a network segment. In general, the administrator may choose one of the following two policies to determine the MDM deployment strategy:

Policy 1: the population of remotely active managed devices.
Policy 2: the overall cost involved in the management of a remote set of devices.

In the former case (Policy 1), the administrator specifies the number of remote managed NEs that justifies the deployment of an MDM to a particular network segment. This number may denote either n_l or n_s: if the specified number N denotes the population n_l of the examined subnetwork's local devices, an MDM is deployed to every network segment S with n_l,S >= N; otherwise, to every segment with n_s,S >= N. In the latter case (Policy 2), the management cost may be either (a) proportional to the inverse of the link bandwidth, or (b) manually specified. By choosing appropriate constants, the administrator may either enforce or impede the deployment of MDMs.

C. Implementing MDMs Deployment

Upon discovering an active MAS module, the corresponding host is located by scanning the topology tree for the subnetwork to which the host belongs, and the host icon is instantly made visible on the topology map. The number n_l of active hosts on that subnetwork is then increased by one and subsequently, by following the pointers to the upper-level nodes, all the topology tree nodes up to the root are traversed and their subtree counts n_s are updated as well. A similar procedure is followed when an MAS server is shut down.

The discovery or termination of an MAS server triggers an event at the manager host. The topology tree is then scanned and an MDM is sent to each subnetwork that meets the configured requirements. If 'Policy 1' (of the policies listed in the preceding section) is employed, the chosen subnetworks are those with n_l or n_s (depending on whether MDM deployment is a function of the active devices running locally or in the whole subtree) greater than the specified constant N. If 'Policy 2' is employed, the cost corresponding to the management of each subnetwork is evaluated and the list of subnetworks is created accordingly. Ultimately, an MDM is deployed to each of the subnetworks included in the list. The MDM deployment algorithm is illustrated in the flow diagram of Figure 3 (Policy 1 is assumed).

Figure 3. MDMs deployment algorithm diagram

Certainly, the set of management tasks already performed by the manager on these subnetworks needs to be conveyed to the MDM deployed therein. This is achieved by sending the Polling Thread (PT) configurations along with the MDM. PTs are originally started and controlled by the manager application, with each of them corresponding to a single monitoring task. Unfortunately, PTs cannot be transferred transparently along with the MDM while retaining their execution state, due to a Java constraint (Java does not support thread serialisation/deserialisation). Hence, the PT attributes are saved in configuration files, which are 'attached' to the MDMs when these are sent to a remote domain. Upon its arrival at the remote subnet, the MDM instantiates the PTs from their configurations (sketched below). The PTs thereafter perform their tasks without any further disruption of the management process: they launch the required number of MAs (supplied with their corresponding itineraries) and then 'sleep' for one polling interval; when this period elapses, the same process is repeated. Meanwhile, a listener daemon of the MDM receives the MAs that return carrying their collected data.
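The following is a minimal sketch of this 'passive description' idea: a serialisable configuration object from which a fresh polling thread is rebuilt on arrival. The names (PollingTaskConfig, instantiate(), launchAgents()) are hypothetical, and the actual prototype stores the attributes in configuration files rather than serialised objects.

```java
import java.io.Serializable;
import java.util.List;

/**
 * Illustrative sketch: since a live Java thread cannot migrate with its
 * execution state, only this passive description of a Polling Thread
 * travels with the MDM, which rebuilds the thread upon arrival.
 */
public class PollingTaskConfig implements Serializable {
    private final String taskName;
    private final List<String> itinerary;      // hosts the launched MAs visit
    private final long pollingIntervalMillis;  // 'sleep' period between polls

    public PollingTaskConfig(String taskName, List<String> itinerary,
                             long pollingIntervalMillis) {
        this.taskName = taskName;
        this.itinerary = itinerary;
        this.pollingIntervalMillis = pollingIntervalMillis;
    }

    /** Rebuilds a live Polling Thread from the passive description. */
    public Thread instantiate() {
        Thread pt = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                launchAgents(itinerary);  // dispatch the MAs for this task
                try {
                    Thread.sleep(pollingIntervalMillis);  // one polling interval
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // exit cleanly
                }
            }
        }, taskName);
        pt.setDaemon(true);
        return pt;
    }

    private void launchAgents(List<String> hosts) {
        // Placeholder: the prototype would create the required number of
        // MAs here and dispatch them through the local MAS.
    }
}
```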
D. Optimising Host Resources Utilisation

Although MDMs have been designed to be as lightweight as possible, they inevitably consume memory and processing resources on the NE where they execute. The framework should therefore be sufficiently flexible to allow MDMs to autonomously move to another host when their current hosting device is overloaded, so as to provide a more balanced distribution of the overall processing load.

Figure 4. Migration of the MDM to the least loaded host within its assigned domain

This is accomplished through the regular inspection of the domain's NEs in terms of their memory and CPU utilisation: an MA object, termed the Resources Inspector (RI), is periodically dispatched to visit all the local devices, obtaining these figures before delivering the results to the MDM. If the hosting processor is seriously overloaded compared to the neighbouring devices, the MDM transparently moves to the least loaded node. In the example depicted in Figure 4, an RI sequentially visits all the managed devices in the MDM's local domain. At each host, the RI obtains the average CPU and memory load values over the last interval, keeping track of the least loaded device (in this example, Host D). Finally, the RI reports its results to the MDM, which in turn transparently migrates from Host A to Host D after informing the manager application of its decision.

E. Obtaining Host Load Profile

In order to obtain a view of device load, we have built a tool, developed in C, able to accurately measure the CPU and memory load profile. On Windows platforms, the low-level functions included in the Win32 API [10] are used, whereas standard UNIX commands (e.g. the ps command) are executed under Solaris.

The integration of this tool with the MAS application, which is developed in Java, is achieved through the Java Native Interface (JNI) [11]. The JNI allows Java code running within a Java Virtual Machine to interoperate with applications and libraries written in other languages, such as C or C++; it is used to write native methods for the situations in which an application cannot be written entirely in Java. The Java front-end (methods) accessed by the incoming RIs provides them with a uniform handle onto the local resources, whilst hiding the underlying architecture, i.e. the native method implementations (a sketch follows below).

'Snapshots' of a host's load profile are taken at regular intervals. The duration of these intervals should be carefully set: long enough to avoid sensitivity to sporadic load peaks and, at the same time, short enough not to miss potentially prolonged increases of the processing load.
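A minimal sketch of what such a Java front-end might look like is given below. The class, method and library names (HostLoadProbe, loadprobe, etc.) are illustrative assumptions, not those of the prototype; the native methods would be implemented in C against the Win32 API on Windows and around commands such as ps on Solaris.

```java
/**
 * Illustrative sketch of the Java front-end to the C measurement tool.
 * Visiting Resources Inspector agents call the public methods, which
 * delegate to platform-specific native code; the Java side thus hides
 * the underlying architecture completely.
 */
public class HostLoadProbe {

    static {
        // Loads loadprobe.dll on Windows or libloadprobe.so on Solaris.
        System.loadLibrary("loadprobe");
    }

    // Implemented in C via JNI (Win32 API / UNIX commands such as ps).
    private native double nativeCpuLoad();
    private native double nativeMemLoad();

    /** Average CPU utilisation (0-100%) over the last snapshot interval. */
    public double cpuLoad() { return nativeCpuLoad(); }

    /** Memory utilisation (0-100%) at the last snapshot. */
    public double memLoad() { return nativeMemLoad(); }
}
```

A visiting RI simply calls cpuLoad() and memLoad() on each host and keeps track of the minimum, remaining oblivious to the platform-specific measurement code behind them.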
F. Manager-MDMs Communication

One of the key advantages of our framework is that it greatly reduces the amount of information exchanged between the manager platform and the managed devices, owing to the introduction of the intermediate management level (the MDMs). That does not, however, obviate the need for bi-directional communication between the MDMs and the manager host. In particular, MDMs often need to send the manager the statistics obtained by filtering the raw data collected from the local devices, to inform the manager when migrating to another host, and so on. In the opposite direction, the manager may request an MDM to terminate its execution, move to another domain, create a clone and send it to a nearby segment, update a PT configuration, modify the statistics delivery frequency, undertake the management responsibility of a host that has just started execution on the MDM's local segment, download an additional management service at runtime, etc. Java Remote Method Invocation (RMI) has been chosen for implementing the communication bus between the distributed MDMs and the manager host, due to its inherent simplicity and the rapid prototype development it offers.

IV. CONCLUSIONS

This paper proposed the use of MA technology for dynamic hierarchical management. In this context, we introduced the MDM, a novel management entity that can be assigned to a given network segment at runtime to localise the associated management traffic. Enhancing MDMs with mobility capabilities allows the management system to adapt instantly to changes in the managed network's topology or traffic distribution and to optimise the use of local resources. In addition, the use of MAs with filtering capabilities reduces the cost associated with the actual collection of management data. The design of our framework is supplemented by a prototype implemented in Java and tested under realistic network conditions.

ACKNOWLEDGEMENTS

This work has been funded by Fujitsu Telecommunications Europe Ltd. We are also grateful to Paolo Bellavista, Prof. Antonio Corradi and Dr. Christina Politi for their insightful ideas during the framework's development phase.

REFERENCES

[1] Martin-Flatin J.-P., Znaty S., "Two Taxonomies of Distributed Network and Systems Management Paradigms", Chapter 3 in "Emerging Trends and Challenges in Network Management", Plenum Press, New York, NY, USA, 2000.
[2] Perkins D.T., "SNMP Versions", The Simple Times, 5(1):13-14, 1997.
[3] Goldszmidt G., "Distributed Management by Delegation", PhD thesis, Columbia University, New York, NY, USA, Dec. 1995.
[4] Pham V., Karmouch A., "Mobile Software Agents: An Overview", IEEE Communications Magazine, Vol. 36, No. 7, pp. 26-37, 1998.
[5] Puliafito A., Tomarchio O., "Using Mobile Agents to Implement Flexible Network Management Strategies", Computer Communications, 23(8), pp. 708-719, April 2000.
[6] Rubinstein M., Duarte O.C., Pujolle G., "Reducing the Response Time in Network Management by Using Multiple Mobile Agents", Proc. of the 4th Int. Conf. on Autonomous Agents (Agents'2000), June 2000.
[7] Gavalas D., Greenwood D., Ghanbari M., O'Mahony M., "An Infrastructure for Distributed and Dynamic Network Management based on Mobile Agent Technology", Proc. of the IEEE Int. Conf. on Communications (ICC'99), pp. 1362-1366, June 1999.
[8] Liotta A., Knight G., Pavlou G., "Modelling Network and System Monitoring Over the Internet with Mobile Agents", Proc. of the IEEE/IFIP Network Operations and Management Symposium (NOMS'98), pp. 303-312, Feb. 1998.
[9] Gavalas D., Greenwood D., Ghanbari M., O'Mahony M., "Enabling Mobile Agent Technology for Intelligent Bulk Management Data Filtering", Proc. of the 2000 IEEE/IFIP Network Operations and Management Symposium (NOMS'2000), pp. 623-636, April 2000.
[10] Platform SDK: Win32 API, /library/psdk/portals/win32start_1n6t.htm.
[11] Java Native Interface (JNI), /docs/books/tutorial/native1.1/index.html.