Distributed Systems Design: Graduation-Thesis Foreign Literature Translations and Original Texts
Undergraduate Graduation Design (Thesis) Foreign Literature Translation
Original title: MVC Design Pattern for the multi framework distributed applications using XML, spring and struts framework
Translated title: A multi-framework distributed application using XML, Spring and Struts with MVC as the design pattern
Author's department: School of Computer and Remote Sensing Information Technology
Author's class: B12511
Author: Wang Shuo
Student ID: 20124051117
Supervisor: Geng Yan
Supervisor's title: Dean
Date of completion: January 2015
Prepared by the Academic Affairs Office, North China Institute of Aerospace Engineering
Author of the original: Praveen Gupta (transliterated as 普利文·古塔); nationality: India
Source of the original: International Journal on Computer Science and Engineering

A multi-framework distributed application using XML, Spring and Struts with MVC as the design pattern
Abstract: Model-View-Controller (MVC) is a fundamental design pattern used to separate the user interface from the business logic.
In recent years applications have grown ever larger, and the MVC design pattern loosely couples the different layers of an application.
This paper presents a web application framework on the J2EE platform that extends MVC with XML and is easy to maintain.
The result is a multi-tier system comprising a presentation layer, a business layer, a data-persistence layer and a database layer; because the code of each layer is independent, maintainability and reusability are greatly improved.
In this paper we implement the Spring and Struts frameworks under MVC.
Our work shows that, when an application is designed with multiple frameworks, applying the MVC concept makes building the application easier than relying on a single framework.
Keywords: MVC, Spring, XML
I. Introduction
In recent years the web has become a very complex problem domain.
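The layered separation that the abstract describes can be illustrated with a short controller sketch. The paper combines Struts and Spring through XML configuration; the fragment below uses Spring Web MVC's annotation style purely for brevity, and the class, service and view names (ProductController, ProductService, product/list) are hypothetical rather than taken from the paper.

```java
import java.util.List;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;

// Controller layer: accepts the request and delegates to the business layer.
@Controller
public class ProductController {

    private final ProductService productService;

    public ProductController(ProductService productService) {
        this.productService = productService;
    }

    @GetMapping("/products")
    public String listProducts(Model model) {
        // The business and persistence layers stay behind the service interface.
        List<String> products = productService.findAllNames();
        model.addAttribute("products", products);
        // The logical view name is resolved to a JSP or template (the View).
        return "product/list";
    }
}

// Business-layer contract; a real implementation would call the persistence layer.
interface ProductService {
    List<String> findAllNames();
}
```

Because the controller knows nothing about how ProductService is implemented, the business and persistence layers can be replaced or tested in isolation, which is the loose coupling the abstract refers to.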
Foreign literature translation for a Hadoop distributed storage platform (the document contains both the English original and its Chinese translation). Original text:

Technical Issues of Forensic Investigations in Cloud Computing Environments
Dominik Birk
Ruhr-University Bochum, Horst Goertz Institute for IT Security, Bochum, Germany

Abstract—Cloud Computing is arguably one of the most discussed information technologies today. It presents many promising technological and economical opportunities. However, many customers remain reluctant to move their business IT infrastructure completely to the cloud. One of their main concerns is Cloud Security and the threat of the unknown. Cloud Service Providers (CSP) encourage this perception by not letting their customers see what is behind their virtual curtain. A seldom discussed, but in this regard highly relevant, open issue is the ability to perform digital investigations. This continues to fuel insecurity on the sides of both providers and customers. Cloud Forensics constitutes a new and disruptive challenge for investigators. Due to the decentralized nature of data processing in the cloud, traditional approaches to evidence collection and recovery are no longer practical. This paper focuses on the technical aspects of digital forensics in distributed cloud environments. We contribute by assessing whether it is possible for the customer of cloud computing services to perform a traditional digital investigation from a technical point of view. Furthermore we discuss possible solutions and possible new methodologies helping customers to perform such investigations.

We would like to thank the reviewers for the helpful comments and Dennis Heinson (Center for Advanced Security Research Darmstadt - CASED) for the profound discussions regarding the legal aspects of cloud forensics.

I. INTRODUCTION
Although the cloud might appear attractive to small as well as to large companies, it does not come along without its own unique problems. Outsourcing sensitive corporate data into the cloud raises concerns regarding the privacy and security of data. Security policies, companies' main pillar concerning security, cannot be easily deployed into distributed, virtualized cloud environments. This situation is further complicated by the unknown physical location of the company's assets. Normally, if a security incident occurs, the corporate security team wants to be able to perform their own investigation without dependency on third parties. In the cloud, this is not possible anymore: the CSP obtains all the power over the environment and thus controls the sources of evidence. In the best case, a trusted third party acts as a trustee and guarantees the trustworthiness of the CSP. Furthermore, the implementation of the technical architecture and circumstances within cloud computing environments bias the way an investigation may be processed. In detail, evidence data has to be interpreted by an investigator in a proper manner, which is hardly possible due to the lack of circumstantial information. For auditors, this situation does not change: questions about who accessed specific data and information cannot be answered by the customers if no corresponding logs are available. With the increasing demand for using the power of the cloud for processing also sensitive information and data, enterprises face the issue of Data and Process Provenance in the cloud [10]. Digital provenance, meaning meta-data that describes the ancestry or history of a digital object, is a crucial feature for forensic investigations.
In combination with a suitable authentication scheme, it provides information about who created and who modified what kind of data in the cloud. These are crucial aspects for digital investigations in distributed environments such as the cloud. Unfortunately, the aspects of forensic investigations in distributed environment have so far been mostly neglected by the research community. Current discussion centers mostly around security, privacy and data protection issues [35], [9], [12]. The impact of forensic investigations on cloud environments was little noticed albeit mentioned by the authors of [1] in 2009: ”[...] to our knowledge, no research has been published on how cloud computing environments affect digital artifacts,and on acquisition logistics and legal issues related to cloud computing environments.” This statement is also confirmed by other authors [34], [36], [40] stressing that further research on incident handling, evidence tracking and accountability in cloud environments has to be done. At the same time, massive investments are being made in cloud technology. Combined with the fact that information technology increasingly transcendents peoples’ private and professional life, thus mirroring more and more of peoples’actions, it becomes apparent that evidence gathered from cloud environments will be of high significance to litigation or criminal proceedings in the future. Within this work, we focus the notion of cloud forensics by addressing the technical issues of forensics in all three major cloud service models and consider cross-disciplinary aspects. Moreover, we address the usability of various sources of evidence for investigative purposes and propose potential solutions to the issues from a practical standpoint. This work should be considered as a surveying discussion of an almost unexplored research area. The paper is organized as follows: We discuss the related work and the fundamental technical background information of digital forensics, cloud computing and the fault model in section II and III. In section IV, we focus on the technical issues of cloud forensics and discuss the potential sources and nature of digital evidence as well as investigations in XaaS environments including the cross-disciplinary aspects. We conclude in section V.II. RELATED WORKVarious works have been published in the field of cloud security and privacy [9], [35], [30] focussing on aspects for protecting data in multi-tenant, virtualized environments. Desired security characteristics for current cloud infrastructures mainly revolve around isolation of multi-tenant platforms [12], security of hypervisors in order to protect virtualized guest systems and secure network infrastructures [32]. Albeit digital provenance, describing the ancestry of digital objects, still remains a challenging issue for cloud environments, several works have already been published in this field [8], [10] contributing to the issues of cloud forensis. Within this context, cryptographic proofs for verifying data integrity mainly in cloud storage offers have been proposed,yet lacking of practical implementations [24], [37], [23]. Traditional computer forensics has already well researched methods for various fields of application [4], [5], [6], [11], [13]. Also the aspects of forensics in virtual systems have been addressed by several works [2], [3], [20] including the notionof virtual introspection [25]. 
In addition, the NIST already addressed Web Service Forensics [22] which has a huge impact on investigation processes in cloud computing environments. In contrast, the aspects of forensic investigations in cloud environments have mostly been neglected by both the industry and the research community. One of the first papers focusing on this topic was published by Wolthusen [40] after Bebee et al already introduced problems within cloud environments [1]. Wolthusen stressed that there is an inherent strong need for interdisciplinary work linking the requirements and concepts of evidence arising from the legal field to what can be feasibly reconstructed and inferred algorithmically or in an exploratory manner. In 2010, Grobauer et al [36] published a paper discussing the issues of incident response in cloud environments - unfortunately no specific issues and solutions of cloud forensics have been proposed which will be done within this work.III. TECHNICAL BACKGROUNDA. Traditional Digital ForensicsThe notion of Digital Forensics is widely known as the practice of identifying, extracting and considering evidence from digital media. Unfortunately, digital evidence is both fragile and volatile and therefore requires the attention of special personnel and methods in order to ensure that evidence data can be proper isolated and evaluated. Normally, the process of a digital investigation can be separated into three different steps each having its own specific purpose:1) In the Securing Phase, the major intention is the preservation of evidence for analysis. The data has to be collected in a manner that maximizes its integrity. This is normally done by a bitwise copy of the original media. As can be imagined, this represents a huge problem in the field of cloud computing where you never know exactly where your data is and additionallydo not have access to any physical hardware. However, the snapshot technology, discussed in section IV-B3, provides a powerful tool to freeze system states and thus makes digital investigations, at least in IaaS scenarios, theoretically possible.2) We refer to the Analyzing Phase as the stage in which the data is sifted and combined. It is in this phase that the data from multiple systems or sources is pulled together to create as complete a picture and event reconstruction as possible. Especially in distributed system infrastructures, this means that bits and pieces of data are pulled together for deciphering the real story of what happened and for providing a deeper look into the data.3) Finally, at the end of the examination and analysis of the data, the results of the previous phases will be reprocessed in the Presentation Phase. The report, created in this phase, is a compilation of all the documentation and evidence from the analysis stage. The main intention of such a report is that it contains all results, it is complete and clear to understand. Apparently, the success of these three steps strongly depends on the first stage. If it is not possible to secure the complete set of evidence data, no exhaustive analysis will be possible. However, in real world scenarios often only a subset of the evidence data can be secured by the investigator. In addition, an important definition in the general context of forensics is the notion of a Chain of Custody. This chain clarifies how and where evidence is stored and who takes possession of it. Especially for cases which are brought to court it is crucial that the chain of custody is preserved.B. 
Cloud ComputingAccording to the NIST [16], cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal CSP interaction. The new raw definition of cloud computing brought several new characteristics such as multi-tenancy, elasticity, pay-as-you-go and reliability. Within this work, the following three models are used: In the Infrastructure as a Service (IaaS) model, the customer is using the virtual machine provided bythe CSP for installing his own system on it. The system can be used like any other physical computer with a few limitations. However, the additive customer power over the system comes along with additional security obligations. Platform as a Service (PaaS) offerings provide the capability to deploy application packages created using the virtual development environment supported by the CSP. For the efficiency of software development process this service model can be propellent. In the Software as a Service (SaaS) model, the customer makes use of a service run by the CSP on a cloud infrastructure. In most of the cases this service can be accessed through an API for a thin client interface such as a web browser. Closed-source public SaaS offers such as Amazon S3 and GoogleMail can only be used in the public deployment model leading to further issues concerning security, privacy and the gathering of suitable evidences. Furthermore, two main deployment models, private and public cloud have to be distinguished. Common public clouds are made available to the general public. The corresponding infrastructure is owned by one organization acting as a CSP and offering services to its customers. In contrast, the private cloud is exclusively operated for an organization but may not provide the scalability and agility of public offers. The additional notions of community and hybrid cloud are not exclusively covered within this work. However, independently from the specific model used, the movement of applications and data to the cloud comes along with limited control for the customer about the application itself, the data pushed into the applications and also about the underlying technical infrastructure.C. Fault ModelBe it an account for a SaaS application, a development environment (PaaS) or a virtual image of an IaaS environment, systems in the cloud can be affected by inconsistencies. Hence, for both customer and CSP it is crucial to have the ability to assign faults to the causing party, even in the presence of Byzantine behavior [33]. Generally, inconsistencies can be caused by the following two reasons:1) Maliciously Intended FaultsInternal or external adversaries with specific malicious intentions can cause faults on cloud instances or applications. Economic rivals as well as former employees can be the reason for these faults and state a constant threat to customers and CSP. In this model, also a malicious CSP is included albeit he is assumed to be rare in real world scenarios. Additionally, from the technicalpoint of view, the movement of computing power to a virtualized, multi-tenant environment can pose further threads and risks to the systems. One reason for this is that if a single system or service in the cloud is compromised, all other guest systems and even the host system are at risk. 
Hence, besides the need for further security measures, precautions for potential forensic investigations have to be taken into consideration.2) Unintentional FaultsInconsistencies in technical systems or processes in the cloud do not have implicitly to be caused by malicious intent. Internal communication errors or human failures can lead to issues in the services offered to the costumer(i.e. loss or modification of data). Although these failures are not caused intentionally, both the CSP and the customer have a strong intention to discover the reasons and deploy corresponding fixes.IV. TECHNICAL ISSUESDigital investigations are about control of forensic evidence data. From the technical standpoint, this data can be available in three different states: at rest, in motion or in execution. Data at rest is represented by allocated disk space. Whether the data is stored in a database or in a specific file format, it allocates disk space. Furthermore, if a file is deleted, the disk space is de-allocated for the operating system but the data is still accessible since the disk space has not been re-allocated and overwritten. This fact is often exploited by investigators which explore these de-allocated disk space on harddisks. In case the data is in motion, data is transferred from one entity to another e.g. a typical file transfer over a network can be seen as a data in motion scenario. Several encapsulated protocols contain the data each leaving specific traces on systems and network devices which can in return be used by investigators. Data can be loaded into memory and executed as a process. In this case, the data is neither at rest or in motion but in execution. On the executing system, process information, machine instruction and allocated/de-allocated data can be analyzed by creating a snapshot of the current system state. In the following sections, we point out the potential sources for evidential data in cloud environments and discuss the technical issues of digital investigations in XaaS environmentsas well as suggest several solutions to these problems.A. Sources and Nature of EvidenceConcerning the technical aspects of forensic investigations, the amount of potential evidence available to the investigator strongly diverges between the different cloud service and deployment models. The virtual machine (VM),hosting in most of the cases the server application, provides several pieces of information that could be used by investigators. On the network level, network components can provide information about possible communication channels between different parties involved. The browser on the client, acting often as the user agent for communicating with the cloud, also contains a lot of information that could be used as evidence in a forensic investigation. Independently from the used model, the following three components could act as sources for potential evidential data.1) Virtual Cloud Instance: The VM within the cloud, where i.e. data is stored or processes are handled, contains potential evidence [2], [3]. In most of the cases, it is the place where an incident happened and hence provides a good starting point for a forensic investigation. The VM instance can be accessed by both, the CSP and the customer who is running the instance. Furthermore, virtual introspection techniques [25] provide access to the runtime state of the VM via the hypervisor and snapshot technology supplies a powerful technique for the customer to freeze specific states of the VM. 
Therefore, virtual instances can be still running during analysis which leads to the case of live investigations [41] or can be turned off leading to static image analysis. In SaaS and PaaS scenarios, the ability to access the virtual instance for gathering evidential information is highly limited or simply not possible.2) Network Layer: Traditional network forensics is knownas the analysis of network traffic logs for tracing events that have occurred in the past. Since the different ISO/OSI network layers provide several information on protocols and communication between instances within as well as with instances outside the cloud [4], [5], [6], network forensics is theoretically also feasible in cloud environments. However in practice, ordinary CSP currently do not provide any log data from the network components used by the customer’s instances or applications. For instance, in case of a malware infection of an IaaS VM, it will be difficult for the investigator to get any form of routing information and network log datain general which is crucial for further investigative steps. This situation gets even more complicated in case of PaaS or SaaS. So again, the situation of gathering forensic evidence is strongly affected by the support the investigator receives from the customer and the CSP.3) Client System: On the system layer of the client, it completely depends on the used model (IaaS, PaaS, SaaS) if and where potential evidence could be extracted. In most of the scenarios, the user agent (e.g. the web browser) onthe client system is the only application that communicates with the service in the cloud. This especially holds for SaaS applications which are used and controlled by the web browser. But also in IaaS scenarios, the administration interface is often controlled via the browser. Hence, in an exhaustive forensic investigation, the evidence data gathered from the browser environment [7] should not be omitted.a) Browser Forensics: Generally, the circumstances leading to an investigation have to be differentiated: In ordinary scenarios, the main goal of an investigation of the web browser is to determine if a user has been victim of a crime. In complex SaaS scenarios with high client-server interaction, this constitutes a difficult task. Additionally, customers strongly make use of third-party extensions [17] which can be abused for malicious purposes. Hence, the investigator might want to look for malicious extensions, searches performed, websites visited, files downloaded, information entered in forms or stored in local HTML5 stores, web-based email contents and persistent browser cookies for gathering potential evidence data. Within this context, it is inevitable to investigate the appearance of malicious JavaScript [18] leading to e.g. unintended AJAX requests and hence modified usage of administration interfaces. Generally, the web browser contains a lot of electronic evidence data that could be used to give an answer to both of the above questions - even if the private mode is switched on [19].B. Investigations in XaaS EnvironmentsTraditional digital forensic methodologies permit investigators to seize equipment and perform detailed analysis on the media and data recovered [11]. In a distributed infrastructure organization like the cloud computing environment, investigators are confronted with an entirely different situation. They have no longer the option of seizing physical data storage. 
Data and processes of the customer are dispensed over an undisclosed amount of virtual instances, applications and network elements. Hence, it is in question whether preliminary findings of the computer forensic community in the field of digital forensics apparently have to be revised and adapted to the new environment. Within this section, specific issues of investigations in SaaS, PaaS and IaaS environments will be discussed. In addition, cross-disciplinary issues which affect several environments uniformly, will be taken into consideration. We also suggest potential solutions to the mentioned problems.1) SaaS Environments: Especially in the SaaS model, the customer does not obtain any control of the underlying operating infrastructure such as network,servers, operating systems or the application that is used. This means that no deeper view into the system and its underlying infrastructure is provided to the customer. Only limited userspecific application configuration settings can be controlled contributing to the evidences which can be extracted fromthe client (see section IV-A3). In a lot of cases this urges the investigator to rely on high-level logs which are eventually provided by the CSP. Given the case that the CSP does not run any logging application, the customer has no opportunity to create any useful evidence through the installation of any toolkit or logging tool. These circumstances do not allow a valid forensic investigation and lead to the assumption that customers of SaaS offers do not have any chance to analyze potential incidences.a) Data Provenance: The notion of Digital Provenance is known as meta-data that describes the ancestry or history of digital objects. Secure provenance that records ownership and process history of data objects is vital to the success of data forensics in cloud environments, yet it is still a challenging issue today [8]. Albeit data provenance is of high significance also for IaaS and PaaS, it states a huge problem specifically for SaaS-based applications: Current global acting public SaaS CSP offer Single Sign-On (SSO) access control to the set of their services. Unfortunately in case of an account compromise, most of the CSP do not offer any possibility for the customer to figure out which data and information has been accessed by the adversary. For the victim, this situation can have tremendous impact: If sensitive data has been compromised, it is unclear which data has been leaked and which has not been accessed by the adversary. Additionally, data could be modified or deleted by an external adversary or even by the CSP e.g. due to storage reasons. The customer has no ability to proof otherwise. Secure provenance mechanisms for distributed environments can improve this situation but have not been practically implemented by CSP [10]. Suggested Solution: In private SaaS scenarios this situation is improved by the fact that the customer and the CSP are probably under the same authority. Hence, logging and provenance mechanisms could be implemented which contribute to potential investigations. Additionally, the exact location of the servers and the data is known at any time. Public SaaS CSP should offer additional interfaces for the purpose of compliance, forensics, operations and security matters to their customers. Through an API, the customers should have the ability to receive specific information suchas access, error and event logs that could improve their situation in case of an investigation. 
Furthermore, due to the limited ability of receiving forensicinformation from the server and proofing integrity of stored data in SaaS scenarios, the client has to contribute to this process. This could be achieved by implementing Proofs of Retrievability (POR) in which a verifier (client) is enabled to determine that a prover (server) possesses a file or data object and it can be retrieved unmodified [24]. Provable Data Possession (PDP) techniques [37] could be used to verify that an untrusted server possesses the original data without the need for the client to retrieve it. Although these cryptographic proofs have not been implemented by any CSP, the authors of [23] introduced a new data integrity verification mechanism for SaaS scenarios which could also be used for forensic purposes.2) PaaS Environments: One of the main advantages of the PaaS model is that the developed software application is under the control of the customer and except for some CSP, the source code of the application does not have to leave the local development environment. Given these circumstances, the customer obtains theoretically the power to dictate how the application interacts with other dependencies such as databases, storage entities etc. CSP normally claim this transfer is encrypted but this statement can hardly be verified by the customer. Since the customer has the ability to interact with the platform over a prepared API, system states and specific application logs can be extracted. However potential adversaries, which can compromise the application during runtime, should not be able to alter these log files afterwards. Suggested Solution:Depending on the runtime environment, logging mechanisms could be implemented which automatically sign and encrypt the log information before its transfer to a central logging server under the control of the customer. Additional signing and encrypting could prevent potential eavesdroppers from being able to view and alter log data information on the way to the logging server. Runtime compromise of an PaaS application by adversaries could be monitored by push-only mechanisms for log data presupposing that the needed information to detect such an attack are logged. Increasingly, CSP offering PaaS solutions give developers the ability to collect and store a variety of diagnostics data in a highly configurable way with the help of runtime feature sets [38].3) IaaS Environments: As expected, even virtual instances in the cloud get compromised by adversaries. Hence, the ability to determine how defenses in the virtual environment failed and to what extent the affected systems have been compromised is crucial not only for recovering from an incident. Alsoforensic investigations gain leverage from such information and contribute to resilience against future attacks on the systems. From the forensic point of view, IaaS instances do provide much more evidence data usable for potential forensics than PaaS and SaaS models do. This fact is caused throughthe ability of the customer to install and set up the image for forensic purposes before an incident occurs. Hence, as proposed for PaaS environments, log data and other forensic evidence information could be signed and encrypted before itis transferred to third-party hosts mitigating the chance that a maliciously motivated shutdown process destroys the volatile data. Although, IaaS environments provide plenty of potential evidence, it has to be emphasized that the customer VM is in the end still under the control of the CSP. 
He controls the hypervisor, which is e.g. responsible for enforcing hardware boundaries and routing hardware requests among different VMs. Hence, besides the security responsibilities of the hypervisor, he exerts tremendous control over how customers' VMs communicate with the hardware and can theoretically intervene in processes executed on the hosted virtual instance through virtual introspection [25]. This could also affect encryption or signing processes executed on the VM and therefore lead to the leakage of the secret key. Although this risk can be disregarded in most cases, the impact on the security of high-security environments is tremendous.
a) Snapshot Analysis: Traditional forensics expects target machines to be powered down to collect an image (dead virtual instance). This situation completely changed with the advent of snapshot technology, which is supported by all popular hypervisors such as Xen, VMware ESX and Hyper-V. A snapshot, also referred to as the forensic image of a VM, provides a powerful tool with which a virtual instance can be cloned by one click, including the running system's memory. Due to the invention of snapshot technology, systems hosting crucial business processes do not have to be powered down for forensic investigation purposes. The investigator simply creates and loads a snapshot of the target VM for analysis (live virtual instance). This behavior is especially important for scenarios in which downtime of a system is not feasible or practical due to existing SLAs. However, the information whether the machine is running or has been properly powered down is crucial [3] for the investigation. Live investigations of running virtual instances are becoming more common, providing evidence data that is not available on powered-down systems. The technique of live investigation ...
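The suggested solutions for the PaaS and IaaS environments above revolve around signing (and optionally encrypting) log records on the instance before they are pushed to a logging server under the customer's control. A minimal sketch of the signing step is given below; the record format, key handling and class name (SignedLogger) are illustrative assumptions rather than part of the paper, and a production setup would also protect the key itself and encrypt the transport.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Tags each log record with an HMAC-SHA256 value so that later tampering on the
// logging path or on the central log server can be detected by the customer.
public class SignedLogger {

    private final SecretKeySpec key;

    public SignedLogger(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    public String sign(String message) throws Exception {
        String record = Instant.now() + " " + message;
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        byte[] tag = mac.doFinal(record.getBytes(StandardCharsets.UTF_8));
        // The record plus its tag would be pushed (push-only) to the log server.
        return record + " hmac=" + Base64.getEncoder().encodeToString(tag);
    }

    public static void main(String[] args) throws Exception {
        SignedLogger logger = new SignedLogger("demo-secret-key".getBytes(StandardCharsets.UTF_8));
        System.out.println(logger.sign("admin-api: configuration changed by user 42"));
    }
}
```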
Graduation Design (Thesis) Foreign Literature Translation
College: School of Mechanical and Electronic Engineering
Major: Mechanical Design, Manufacturing and Automation
Name: Sun Mingming
Student ID: 070501504
Source of the original: The advantages of PLC control, filed under PLC Articles
Attachments: 1. Translation of the foreign material; 2. The original text (in the foreign language).
Attachment 1: Translation of the foreign material
The Advantages of PLC Control
Any control system goes through four stages from conception to a working plant, and a PLC system brings advantages at every stage.
The first stage is design: the plant's requirements are studied and a control strategy is devised. With a conventional scheme, the design must be essentially complete before the operating platform can be built.
A PLC system needs only a rough idea of the likely size of the machine and of the I/O requirement (how many inputs and outputs).
At this stage input and output cards are cheap, so a healthy spare capacity can be built in, which allows for overlooked items and for future expansion.
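The point about spare I/O capacity is simple arithmetic, as the sketch below shows; the channel counts, the 20% spare margin and the 16-channel card size are assumed figures for illustration, not values from the article.

```java
// Rough I/O sizing for a PLC: inflate the surveyed channel count by a spare
// margin, then round up to whole cards of a given channel count.
public class IoSizing {

    static int cardsNeeded(int requiredChannels, double spareMargin, int channelsPerCard) {
        int withSpare = (int) Math.ceil(requiredChannels * (1.0 + spareMargin));
        return (int) Math.ceil((double) withSpare / channelsPerCard);
    }

    public static void main(String[] args) {
        int inputs = 120, outputs = 80;   // assumed plant survey figures
        double spare = 0.20;              // 20% spare for overlooked items and expansion
        System.out.println("16-channel input cards:  " + cardsNeeded(inputs, spare, 16));
        System.out.println("16-channel output cards: " + cardsNeeded(outputs, spare, 16));
    }
}
```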
Next comes construction.
With a conventional scheme every job is a one-off, which inevitably causes delays and adds cost.
A PLC system is simply standard parts bolted together.
While this is being done, writing the PLC program can begin (or at least a detailed program specification can be written).
The next stage is installation, a tedious and expensive business as sensors, actuators and limit switches are fitted and cabled back to the main system.
A distributed PLC system, using serial links to pre-built and pre-tested interfaces, can simplify installation and brings enormous cost advantages.
Most of the PLC programming is completed during this stage.
Last comes commissioning, and this is where the real advantage of the PLC is found.
No plant works properly the first time.
Human nature being what it is, there are always some oversights.
Changes to a conventional system are time-consuming and expensive, whereas, provided the PLC designer has built in spare memory capacity, spare I/O and a few spare cores in the multicore cables, most changes can be made quickly and relatively cheaply.
A further advantage is that every change is recorded in the PLC program, so commissioning changes and corrections are not lost through going unrecorded, a problem that frequently occurs with conventional systems.
There is also an additional fifth stage, maintenance, which begins once the plant is working and has been handed over to production.
All plant has faults, and most equipment spends the greater part of its time in some fault mode.
Foreign literature translation
Source of the original: The Hadoop Distributed File System: Architecture and Design
Chinese title: Hadoop分布式文件系统：架构和设计 (The Hadoop Distributed File System: Architecture and Design)
Name: XXXX   Student ID: ************   April 8, 2013

English original
The Hadoop Distributed File System: Architecture and Design
Source: /docs/r0.18.3/hdfs_design.html

Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is /core/.

Assumptions and Goals

Hardware Failure
Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

Streaming Data Access
Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas has been traded to increase data throughput rates.

Large Data Sets
Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.

Simple Coherency Model
HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A Map/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.

"Moving Computation is Cheaper than Moving Data"
A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another.
This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.NameNode and DataNodesHDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocksare stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highly portable Java language means that HDFS can be deployed on a wide range ofmachines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.The File System NamespaceHDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.Data ReplicationHDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.The NameNode makes all decisions regarding replication of blocks. 
It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster.Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.Replica Placement: The First Baby StepsThe placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.The NameNode determines the rack id each DataNode belongs to via the process outlined in Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.The current, default replica placement policy described here is a work in progress. Replica SelectionTo minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request. If angg/ HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.SafemodeOn startup, the NameNode enters a special state called Safemode. Replication of data blocks does not occur when the NameNode is in the Safemode state. The NameNode receives Heartbeat and Blockreport messages from the DataNodes. A Blockreport contains the list of data blocks that a DataNode is hosting. 
Each block has a specified minimum number of replicas. A block is considered safely replicated when the minimum number of replicas of that data block has checked in with the NameNode. After a configurable percentage of safely replicated data blocks checks in with the NameNode (plus an additional 30 seconds), the NameNode exits the Safemode state. It then determines the list of data blocks (if any) that still have fewer than the specified number of replicas. The NameNode then replicates these blocks to other DataNodes.The Persistence of File System MetadataThe HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too.The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint. In the current implementation, a checkpoint only occurs when the NameNode starts up. Work is in progress to support periodic checkpointing in the near future.The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separatefile in its local file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge number of files in a single directory. When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files and sends this report to the NameNode: this is the Blockreport.The Communication ProtocolsAll HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.RobustnessThe primary objective of HDFS is to store data reliably even in the presence of failures. 
The three common types of failures are NameNode failures, DataNode failures and network partitions.Data Disk Failure, Heartbeats and Re-ReplicationEach DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.Cluster RebalancingThe HDFS architecture is compatible with data rebalancing schemes. A scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold. In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster. These types of data rebalancing schemes are not yet implemented.Data IntegrityIt is possible that a block of data fetched from a DataNode arrives corrupted. This corruption can occur because of faults in a storage device, network faults, or buggy software. The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents it verifies that the data it received from each DataNode matches the checksum stored in the associated checksum file. If not, then the client can opt to retrieve that block from another DataNode that has a replica of that block.Metadata Disk FailureThe FsImage and the EditLog are central data structures of HDFS. A corruption of these files can cause the HDFS instance to be non-functional. For this reason, the NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog. Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously. This synchronous updating of multiple copies of the FsImage and EditLog may degrade the rate of namespace transactions per second that a NameNode can support. However, this degradation is acceptable because even though HDFS applications are very data intensive in nature, they are not metadata intensive. When a NameNode restarts, it selects the latest consistent FsImage and EditLog to use.The NameNode machine is a single point of failure for an HDFS cluster. If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.SnapshotsSnapshots support storing a copy of data at a particular instant of time. One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time. HDFS does not currently support snapshots but will in a future release.Data OrganizationData BlocksHDFS is designed to support very large files. 
Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different DataNode.StagingA client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode. The client then tells the NameNode that the file is closed. At this point, the NameNode commits the file creation operation into a persistent store. If the NameNode dies before the file is closed, the file is lost.The above approach has been adopted after careful consideration of target applications that run on HDFS. These applications need streaming writes to files. If a client writes to a remote file directly without any client side buffering, the network speed and the congestion in the network impacts throughput considerably. This approach is not without precedent. Earlier distributed file systems, e.g. AFS, have used client side caching to improve performance. APOSIX requirement has been relaxed to achieve higher performance of data uploads.Replication PipeliningWhen a client is writing data to an HDFS file, its data is first written to a local file as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode. This list contains the DataNodes that will host a replica of that block. The client then flushes the data block to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), writes each portion to its local repository and transfers that portion to the second DataNode in the list. The second DataNode, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third DataNode. Finally, the third DataNode writes the data to its local repository. Thus, a DataNode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one DataNode to the next.AccessibilityHDFS can be accessed from applications in many different ways. Natively, HDFS provides a Java API for applications to use. A C language wrapper for this Java API is also available. In addition, an HTTP browser can also be used to browse the files of an HDFS instance. Work is in progress to expose HDFS through the WebDAV protocol.FS ShellHDFS allows user data to be organized in the form of files and directories. 
It provides a command-line interface called FS shell that lets a user interact with the data in HDFS. The syntax of this command set is similar to other shells (e.g. bash, csh) that users are already familiar with. Here are some sample action/command pairs:
FS shell is targeted for applications that need a scripting language to interact with the stored data.

DFSAdmin
The DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator. Here are some sample action/command pairs:

Browser Interface
A typical HDFS install configures a web server to expose the HDFS namespace through a configurable TCP port. This allows a user to navigate the HDFS namespace and view the contents of its files using a web browser.

Space Reclamation

File Deletes and Undeletes
When a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory. The file can be restored quickly as long as it remains in /trash. A file remains in /trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
A user can Undelete a file after deleting it as long as it remains in the /trash directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the /trash directory and retrieve the file. The /trash directory contains only the latest copy of the file that was deleted. The /trash directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface.

Decrease Replication Factor
When the replication factor of a file is reduced, the NameNode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the DataNode. The DataNode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the setReplication API call and the appearance of free space in the cluster.

Chinese translation (original address: /docs/r0.18.3/hdfs_design.html)
I. Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
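As noted under Accessibility, HDFS natively exposes a Java API, and the behaviour described above (per-file replication factor, write-once files, explicit deletes) is driven through it. The sketch below is only illustrative: the path and replication value are made up, the configuration is assumed to point at an existing cluster, and the /trash behaviour described earlier applies to the FS shell rather than to this direct API call.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Reads the cluster address from the Hadoop configuration on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/report.txt");

        // Write once; block placement and replication are decided by the NameNode.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello from the HDFS client API");
        }
        // Ask for three replicas of this particular file.
        fs.setReplication(file, (short) 3);

        // Read back; the client is directed to a nearby replica.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        // Direct API deletes bypass the FS shell's /trash handling.
        fs.delete(file, false);
        fs.close();
    }
}
```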
Distributed Control System (DCS) Design and Application Examples (English version with Chinese translation)
I. Introduction
Introduction: Distributed Control Systems (DCS) represent a pivotal technology in industrial automation, offering efficient and reliable control over complex processes across multiple locations. These systems decentralize control functions, enhancing system resilience and flexibility.
Introduction (translation): Distributed control systems (DCS) play a key role in industrial automation, providing effective and reliable control of complex processes across multiple sites. By decentralizing control functions, these systems improve resilience and flexibility.
II. DCS Design Principles
Design Fundamentals: The design of a DCS revolves around the principles of modularity and redundancy. It consists of controllers, sensors, actuators, and human-machine interfaces (HMIs), all interconnected through a communication network. Each component is designed to perform specific tasks while maintaining real-time data exchange for coordinated operation.
Design Fundamentals (translation): The design of a DCS centers on the principles of modularity and redundancy.
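To make the modularity and redundancy principle concrete, the sketch below models a controller module with a hot standby that takes over when the primary stops reporting healthy. It is an abstraction only: real DCS controllers implement redundancy in firmware over dedicated networks, and none of the names (ControlModule, RedundantController, SimulatedModule, tag TIC-101) come from the text.

```java
// One facade over a primary controller and its hot standby, so HMIs and
// higher layers always see a single, healthy control module.
interface ControlModule {
    boolean isHealthy();
    double readProcessValue(String tag);
}

class RedundantController implements ControlModule {
    private final ControlModule primary;
    private final ControlModule standby;

    RedundantController(ControlModule primary, ControlModule standby) {
        this.primary = primary;
        this.standby = standby;
    }

    @Override
    public boolean isHealthy() {
        return primary.isHealthy() || standby.isHealthy();
    }

    @Override
    public double readProcessValue(String tag) {
        // Fail over transparently to the standby when the primary is down.
        ControlModule active = primary.isHealthy() ? primary : standby;
        return active.readProcessValue(tag);
    }
}

class SimulatedModule implements ControlModule {
    private final boolean healthy;
    private final double value;

    SimulatedModule(boolean healthy, double value) {
        this.healthy = healthy;
        this.value = value;
    }

    public boolean isHealthy() { return healthy; }
    public double readProcessValue(String tag) { return value; }
}

public class RedundancyDemo {
    public static void main(String[] args) {
        ControlModule dcs = new RedundantController(
                new SimulatedModule(false, 0.0),    // primary has failed
                new SimulatedModule(true, 72.5));   // standby carries the load
        System.out.println("TIC-101 = " + dcs.readProcessValue("TIC-101"));
    }
}
```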
Impact of High Penetration of Distributed Generation on SystemDesign and Operations(1.Bartosz Wojszczyk1,Omar Al-Juburi2,Joy Wang3 Accenture,Raleigh 27601,U.S.;2.Accenture,San Francisco 94105,U.S.;3.Accenture,Shanghai 200020,China) ABSTRACT: This paper addresses the topic of massive utility-oriented deployment of Distributed Generation (DG) in power systems. High penetration of DG presents significant challenges to design/engineering practices as well as to the reliable operation of the power system. This paper examines the impact of large-scale DER implementation on system design, reliable operation and performance and includes practical examples from utility demonstration projects. It also presents a vision for the utility of the future and describes DG technologies being implemented by utilities.KEY WORDS: distributed energy resources ;distributed generation;power system design and operation0 IntroductionDistributed generation (DG) or decentralized generation is not a new industry concept. In 1882, Thomas Edison built his first commercial electric plant—“Pearl Street”. This power station provided 110V DC electricity to 59 customers in lower Manhattan. In 1887, there were 121 Edison power stations in the United States delivering DC electricity to customers. These first power plants were run on water or coal. Centralized power generation became possible when it was recognized that alternating current power could be transported at relatively low costs and reduce power losses across great distances by taking advantage of the ability to raise the voltage at the generation station and lower the voltage near customer loads. In addition, the concepts of improved system performance (system stability) and more effective generation asset utilization provided a platform for wide-area/global grid integration. In recent years, there has been a rapidly growing interest in wide deployment of DG. Commercially available technologies for DG are based on combustion engines, micro- and mini-gas turbines, wind turbines, fuel-cells, various photovoltaic (PV) solutions, low-head hydro units and geothermal systems.Deregulation of the electric utility industry (in some countries), environmental concerns associated with traditional fossil fuel generation power plants, volatility of electric energy costs, Federal and State regulatory support of “green” energy and rapid technological developments all support the proliferation of DG units in electric utility systems. The growing rate of DGdeployment suggests that alternative energy-based solutions play an increasingly important role in the smart grid and modern utility.Large-scale implementation of DG can lead to situations in which the distribution/medium voltage network evolves from a “passive” (local/limited automation, monitoring and control) system to one that actively (global/integrated, self-monitoring, semi-automated) responds to the various dynamics of the electric grid. This poses a challenge for design, operation and management of the power grid as the network no longer behaves as it once did. Consequently, the planning and operation of new systems must be approached somewhat differently with a greater amount of attention paid to global system challenges.The principal goal of this paper is to address the topic of high penetration of distributed generation and its impact on grid design and operations. 
The following sections describe a vision for the modern utility, DG technology landscape, and DG design/engineering challenges and highlights some of the utility DG demonstration projects.1 Vision for modern utilities1.1 Centralized vs. distributedThe bulk of electric power used worldwide is produced at central power plants, most of which utilize large fossil fuel combustion, hydro or nuclear reactors. A majority of these central stations have an output between 30MW (industrial plant) and 1.7GW. This makes them relatively large in terms of both physical size and facility requirements as compared with DG alternatives. In contrast, DG is:1)Installed at various locations (closer to the load) throughout the power system and mostly operated by independent power producers or consumers.2)Not centrally dispatched (although the development of “virtual” power plants, where many decentralized DG units operate as one single unit, may be an exception to this definition).3)Defined by power rating in a wide range from a few kW to tens of MW (in some countries MW limitation is defined by standards, e.g. US, IEEE 1547 defines DG up to 10MW –either as a single unit or aggregate capacity).4)Connected to the distribution/medium voltage network - which generally refers to the part of the network that has an operating voltage of 600V up to 110kV (depends on the utility/country).The main reasons why central, rather than distributed, generation still dominates current electricity production include economy of scale, fuel cost and availability, and lifetime. Increasing the size of a production unit decreases the cost per MW; however, the advantage of economy of scale is decreasing—technological advances in fuel conversion have improved the economy of small units. Fuel cost and availability is still another reason to keep building largepower plants. Additionally, with a lifetime of 25~50 years, large power plants will continue to remain the prime source of electricity for many years to come.The benefits of distributed generation include: higher efficiency; improved security of supply; improved demand-response capabilities; avoidance of overcapacity; better peak load management; reduction of grid losses; network infrastructure cost deferral; power quality support; reliability improvement; and environmental and aesthetic concerns (offers a wide range of alternatives to traditional power system design). DG offers extraordinary value because it provides a flexible range of combinations between cost and reliability. In addition, DG may eventually become a more desirable generation asset because it is “closer” to the customer and is more economical than central station generation and its associated transmission infrastructure. The disadvantages of DG are ownership and operation, fuel delivery (machine-based DG, remote locations), cost of connection, dispatchability and controllability (wind and solar).1.2 Development of “smart grid”In recent years, there has been a rapidly growing interest in what is called “Smart Grid –Digitized Grid –Grid of the Future”. The main drivers behind this market trend are grid performance, technology enhancement and stakeholders’ atten tion . 
The main vision behind this market trend is the use of enhanced power equipment/technologies, monitoring devices (sensors), digital and fully integrated communications, and embedded digital processing to make the power grid observable (able to measure the states of critical grid elements), controllable (able to affect the state of any critical grid element), automated (able to adapt and self-heal), and user-friendly (bi-directional utility–customer interaction). The Smart Grid concept should be viewed through the modern utility perspective of remaining profitable (good value to shareholders), continuing to grow revenue streams, providing superior customer service, investing in technologies, making product offerings cost effective and pain free for customers to participate and partnering with new players in the industry to provide more value to society. It is important to recognize that there is merit in the Smart Grid concept and should be viewed in light of it bringing evolutionary rather than revolutionary changes in the industry.In general, this market trend requires a new approach to system design, re-design and network integration and implementation needs. In addition, utilities will have to develop well-defined engineering and construction standards and operation and maintenance practices addressing high penetration levels of DG.2 DG technology landscapeDG systems can utilize either well-established conventional power generation technologies such as low/high temperature fuel cells, diesel, combustion turbines, combined cycle turbines,low-head hydro or other rotating machines, renewable energy technologies including PV, concentrated PV (CPV), solar concentrators, thin-film, solar thermal and wind/mini-wind turbines or technologies that are emerging on the market (e.g. tidal/wave, etc.). Each of the DG technologies has its own advantages and disadvantages which need to be taken into consideration during the selection process.3 DR interconnection requirementsDR interconnection design and engineering details depend on the specific installation size (kW vs. MW); however, the overall components of the installation should include the following:1)DG prime mover (or prime energy source) and its power converter.2)Interface/step-up transformer.3)Grounding (when needed—grounding type depends on utility specific system equirements).4)Microprocessor protective relays for:① Three-, single-phase fault detection and DG overload.② Islanding and abnormal system conditions detection.③ Voltage and current unbalances detection .④ Undesirable reverse power detection .⑤ Machine-based DG synchronization .5)Disconnect switches and/or switchgear(s).6)Metering, control and data logging equipment.7)Communication link(s) for transfer trip and dispatch control functions (when needed).4 Impact of DR integra tion and “penetration” levelIntegration of DG may have an impact on system performance. 
This impact can be assessed based on:1)Size and type of DG design: power converter type, unit rating, unit impedance, protective relay functions, interface transformer, grounding, etc.2)Type of DG prime mover: wind, PV, ICE, CT, etc.3)Intended DG operating mode(s): load shaving, base-load CHP, power export market, Volt-Var control, etc.4)Interaction with other DG(s) or load(s).5)Location in the system and the characteristics of the grid such as:① Network, auto-looped, radial, etc.② System impedance at connection point.③ Voltage control equipment types, locations and settings.④ Grounding design.⑤ Protection equipment types, locations, and settings.⑥ And other.DR system impact is also dependent on the “penetration” level of the DG connected to the grid. There are a number of factors that should be considered when evaluating the penetration level of DG in the system. Examples of DG penetration level factors include:1)DG as a percent of feeder or local interconnection point peak load (varies with location on the feeder).2)DG as a percent of substation peak load or substation capacity.3)DG as a percent of voltage drop capacity at the interconnection point (varies with location on the feeder).4)DG source fault current contribution as a percent of the utility source fault current (at various locations).4.1 DG impact on voltage regulationVoltage regulation, and in particular voltage rise effect, is a key factor that limits the amount (penetration level) of DG that can be connected to the system. Fig. 2 shows the first example of the network with a relatively large (MW size) DG interconnected at close proximity to the utility substation.Careful investigation of the voltage profile indicates that during heavy load conditions, with connected DG, voltage levels may drop below the acceptable/permissible by standards. The reason for this condition is that relatively large DG reduces the circuit current value seen by the Load Tap Changer (LTC) in the substation (DG current contribution). Since the LTC sees “less” current (representing a light load) than the actual value, it will lower the tap setting to avoid a “light load, high voltage” condition. This action makes the actual “heavy load, low voltage” condition even worse. As a general rule, if the DG contributes less than 20% of the load current, then the DG current contribution effect will be minor and can probably be ignored in most cases.However, if the real power (P) flow direction reverses, toward the substation (Fig. 4), the VR will operate in the reverse mode (primary control). Since the voltage at the substation is a stronger source than the voltage at the DG (cannot be lowered by VR), the VR will increase the number of taps on the secondary side; therefore, voltage on the secondary side increases dramatically.Bi-directional voltage regulators have several control modes for coordination with DG operation. Bi-directionality can be defined based on real (P) and/or reactive (Q) power flow. However, reactive power support (Q) from DG is generally prohibited by standards in many countries. Therefore, VR bi-directionality is set for co-generation modes (real current).4.2 DG impact on power qualityTwo aspects of power quality are usually considered to be important during evaluation of DG impact on system performance: voltage flicker conditions and harmonic distortion of the voltage. 
Depending on the particular circumstance, a DG can either decrease or increase the quality of the voltage received by other users of the distribution/medium voltage network. Power quality is an increasingly important issue and generation is generally subject to the same regulations as loads. The effect of increasing the grid fault current by adding generation often leads to improved power quality; however, it may also have a negative impact on other aspects of system performance (e.g. protection coordination). A notable exception is that a single large DG, or aggregate of small DG connected to a “weak” grid may lead to power quality problems during starting and stopping conditions or output fluctuations (both normal and abnormal). For certain types of DG, such as wind turbines or PV, current fluctuations are a routine part of operation due to varying wind or sunlight conditions.Other types of DG such as ICE or CT can also have fluctuations due to various factors (e.g. cylinders misfiring and pulsation torque - one misfiring in a 1800 rpm engine translates to 15 Hz pulsation frequency).Harmonics may cause interference with operation of some equipment including overheating/ de-rating of transformers, cables and motors leading to shorter life. In addition, they may interfere with some communication systems located in the close proximity of the grid. In extreme cases they can cause resonant over-voltages, blown fuses, failed equipment, etc. DG technologies have to comply with pre-specified by standards harmonic levels.In order to mitigate harmonic impact in the system the following can be implemented:1)Use an interface transformer with a delta winding or ungrounded winding to minimize injection of triplen harmonics.2)Use a grounding reactor in neutral to minimize triplen harmonic injection.3)Specify rotating generator with 2/3 winding pitch design.4)Apply filters or use phase canceling transformers.5)For inverters: specify PWM inverters with high switching frequency. Avoid line commutated inverters or low switching frequency PWM – otherwise more filters may be needed.6)Place DG at locations with high ratios of utility short circuit current to DG rating.A screening criterion to determine whether detailed studies are required (stiffness factor) to assess DG impact on power quality can be performed based on the ratio between the available utility system fault current at the point of DG connection and the DG’s full load rated output current.4.3 DG impact on ferroresonanceClassic ferroresonance conditions can happen with or without interconnected DG (e.g.resonance between transformer magnetization reactance and underground cable capacitance on an open phase). However, by adding DG to the system we can increase the such as: DG connected rated power is higher than the rated power of the connected load, presence of large capacitor banks (30% to 400% of unit rating), during DG formation on a non-grounded island.4.4 DG impact on system protectionSome DG will contribute current to a circuit current on the feeder. The current contribution will raise fault levels and in some cases may change fault current flow direction. The impact of DG fault current contributions on system protection coordination must be considered. 
The amount of current contribution, its duration and whether or not there are any protection coordination issues depends on:1)Size and location of DG on the feeder.2)Type of DG (inverter, synchronous machine, induction machine) and its impedance.3)DG protection equipment settings (how fast it trips).4)Impedance, protection and configuration of feeder.5)Type of DG grounding and interface transformer.Machine-based DG (IEC, CT, some micro turbines and wind turbines) injects fault current levels of 4-10 times their rated current with time contribution between 1/3 cycle to several cycles depending on the machine. Inverters contribute about 1-2 times their rated current to faults and can trip-off very quickly--many in less than 1 cycle under ideal conditions. Generally, if fault current levels are changed less than 5% by the DG, then it is unlikely that fault current contribution will have an impact on the existing system/equipment operation. Utilities must also consider interrupting capability of the equipment, e.g. circuit breakers, reclosers and fuses must have sufficient capacity to interrupt the combined DG and utility source fault levels.5 DG interconnection–utility demonstration project examples5.1 Utility 1: ground-fault current contribution for a synchronous DG and changing transformer configurationGround-fault current contribution for 100 kVA, 500kVA and 2MVA synchronous DG is being investigated on some rural feeders in the U.S. In addition, during investigation, the transformer configuration (Delta/Wye/Grounded Wye) on the DG side and utility side was changed.The DG fault current contribution changes in a range from less than 1% (for Delta/Delta transformer configuration) to approx. 30% for 2MVA DG with Delta\Grounded Wye transformer configuration. Slight changes of the fault current for the non-grounded utility-side are due to an increase in pre-fault voltage.5.2 Utility 2: customer-based reliability enhancement on rural feeders – planned islandingapplicationThe planned islanding application is being investigated on some rural feeders in Canada to improve the reliability of supply for rural communities, where the corresponding distribution substation is only supplied by a single high voltage (HV) line [4]. Customers on those feeders may experience sustained power outages for several hours a few times per year due to environmental and weather impacts that cause line break-downs and power outages. A local independent power producer (IPP) equipped with additional equipment is employed to supply the load downstream of the substation when the HV line is down or during maintenance periods of the substation. The IPP is paid an additional bonus if it can successfully serve the load during a power outage.5.3 Utility 3: peaking generation and DG for demand reduction applicationThis application addresses a utility approach toward peak shaving and demand reduction which is attractive to those LDC that purchase electricity from larger utility companies based on a particular rate structure. The cost of electricity for LDC is normally calculated based on energy rate (MWh), the maximum demand (MW) and surcharges due to exceeding the agreed upon maximum demand. The peak-time cost of electricity can be as high as 10~20 times the regular rates.In this case, LDC may install peaking units or have an agreement with specific customers in the LDC’s area that already have on-site backup generation units. 
The peaking units are operated only during peak-load time for a total of 100 to 200 hours per year based on 5~10 min dispatch commands. In return, the participating facilities are paid for the total power supplied during peak-demand periods at an agreed-upon rate that compensates for both generation/maintenance costs and plant upgrading costs in order to respond to utility dispatch commands.5.4 Utility 4: energy storage applications for firming up intermittency of distributed renewable generation (DRG)Medium to high penetration of renewable energy resources (RES) can cause large power fluctuations due to the variable and dynamic nature of the primary energy source, such as wind and solar photovoltaic generation. Power fluctuations may cause reverse power flow toward the main grid, especially during light load conditions of the feeder. Furthermore, due to inherent intermittent resource characteristics, the firm capacity of a large RES-based DG may be very low and ultimately the utility grid will still be the main provider of the spinning reserve capacity and emergency backup generation in the area. Deployment of distributed energy storage units, when adequately sized and properly co-located with RES integration, has been explored by several utility companies in the U.S. to firm up power fluctuations of the high penetration of renewableenergy (wind and solar) and to reduce adverse impacts on the main grid. Fig. 10 shows an application of energy storage that locally compensates the variation in power output of a large wind farm and averages out the power fluctuations. Hence, the power flow measured at the point of common coupling (PCC) can be controlled based on a pre-scheduled profile, below permissible demand capacity of the feeder.The controlled level of power flow at the PCC also drastically reduces the reserve capacity requirement for management of the load on this feeder since the energy storage unit can provide back-up generation in case of a sudden reduction in wind power production; therefore, it increases the load carrying capacity of the wind farm.6 ConclusionA growing number of electric utilities worldwide are seeking ways to provide excellent energy services and become more customer-focused, competitive, efficient, innovative, and environmentally responsible. Distributed Generation is becoming an important element of the electric utility’s Smart Grid portfolio in the 21st century. Present barriers to widespread implementation of DG are being reduced as technologies mature and financial incentives (including government- and- investor-supported funding) materialize. However, there are still technical challenges that need to be addressed and effectively overcome by utilities. Distributed Generation should become part of day-to-day planning, design and operation processes/practices. 
Special consideration should be given to the following:1)Transmission and distribution substation designs that are able to handle significant penetration of DG2)Equipment rating margins for fault level growth (due to added DG).3)Protective relays, and settings that can provide reliable and secure operation of the system with interconnected DG (that can handle multiple sources, reverse flow, variable fault levels, etc.).4)Feeder voltage regulation and voltage-drop design approaches that factor possible significant penetration of DG.5)Service restoration practices that reduce chance of interference of DG in the process and even take advantage of DG to enhance reliability where possible.6)Grounding practices and means to control DG induced ground fault over voltages.References[1] KEMA Consulting.Power Quality and Utilization Guide,Section 8 -Distributed Generation andRenewables[M/OL] .Leonardo Energy,Cooper Development Association.http://www.Copperinfo..[2] Willis H L,Scott W G.Distributed Power Generation:Planning and Evaluation[M].New York:Marcel Dekker,2000.[3] Wojszczyk B,Katiraei F.Distributed Energy Resources-Control Operation,and Utility Interconnection[C]//Seminar for Various North American Utilities,2007&2008.[4] Abbey C,Katiraei F,Brothers C,et al.Integration of distributed generation and wind energy in Canada[C]//IEEE PES GM,Montreal,2006:7.。
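To make the screening heuristics quoted in sections 4.1, 4.2 and 4.4 of the distributed-generation paper above concrete (the roughly 20% load-current rule for voltage regulation, the roughly 5% fault-level-change rule for protection, and the stiffness factor as the ratio of available utility fault current to DG rated current), here is a small illustrative Python calculation. The thresholds are taken from the text as rules of thumb; the function name and the example numbers are invented.

```python
def dg_screening(dg_current_a, feeder_load_current_a,
                 utility_fault_current_a, dg_fault_contribution_a):
    """Rule-of-thumb screening for a proposed DG interconnection,
    following the heuristics quoted in sections 4.1, 4.2 and 4.4."""
    load_share = dg_current_a / feeder_load_current_a
    fault_change = dg_fault_contribution_a / utility_fault_current_a
    # Stiffness factor: utility short-circuit current / DG rated output current.
    stiffness = utility_fault_current_a / dg_current_a

    return {
        "load_share_pct": 100 * load_share,
        "voltage_reg_concern": load_share > 0.20,        # DG above ~20% of load current
        "fault_change_pct": 100 * fault_change,
        "protection_study_needed": fault_change > 0.05,  # fault level changed by more than ~5%
        "stiffness_factor": stiffness,
    }

if __name__ == "__main__":
    # Hypothetical feeder: 400 A peak load, 8 kA available fault current,
    # and a 120 A rated DG that contributes 600 A to faults.
    result = dg_screening(dg_current_a=120, feeder_load_current_a=400,
                          utility_fault_current_a=8000, dg_fault_contribution_a=600)
    for key, value in result.items():
        print(f"{key}: {value}")
```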
An Introduction to Database Systems
A Database Management System (DBMS) consists of a collection of interrelated data and a set of programs to access those data. A database is a collection of data organized to serve many applications efficiently by centralizing the data and minimizing redundant data. The primary goal of a DBMS is to provide an environment that is both convenient and efficient to use in retrieving and storing database information.
Database systems are designed to manage large bodies of information.
The management of data involves both the definition of structures for the storage of information and the provision of mechanisms for the manipulation of the stored information; in addition, the system must avoid possible anomalous results.
The importance of information in most organizations, which determines the value of the database, has led to the development of a large body of concepts and techniques for the efficient management of data. The typical file-processing system is supported by a conventional operating system.
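To ground the definition above (structures for storing data plus mechanisms for manipulating the stored data), here is a minimal, self-contained Python sketch using the standard-library sqlite3 module; the table and column names are invented for illustration.

```python
import sqlite3

# In-memory database: "definition of structures" (DDL) plus
# "mechanisms for manipulation" (queries), as described above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define a storage structure.
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Manipulate the stored information.
cur.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Alice", "IT"), ("Bob", "Sales")],
)
conn.commit()

# Retrieve data through the same centralized interface every application shares.
for row in cur.execute("SELECT name, dept FROM employee ORDER BY name"):
    print(row)

conn.close()
```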
In combination with a suitable authentication scheme, digital provenance provides information about who created and who modified what kind of data in the cloud.
These are crucial aspects for digital investigations in distributed environments such as the cloud. Unfortunately, the aspects of forensic investigations in distributed environments have so far been mostly neglected by the research community. Current discussion centers mostly around security, privacy and data protection issues [35], [9], [12]. The impact of forensic investigations on cloud environments was little noticed albeit mentioned by the authors of [1] in 2009: "[...] to our knowledge, no research has been published on how cloud computing environments affect digital artifacts, and on acquisition logistics and legal issues related to cloud computing environments." This statement is also confirmed by other authors [34], [36], [40] stressing that further research on incident handling, evidence tracking and accountability in cloud environments has to be done. At the same time, combined with the fact that information technology increasingly transcends peoples' private and professional life, thus mirroring more and more of peoples' actions, it becomes apparent that evidence gathered from cloud environments will be of high significance to litigation or criminal proceedings in the future. Within this work, we focus the notion of cloud forensics by addressing the technical issues of forensics in all three major cloud service models and consider cross-disciplinary aspects. Moreover, we address the usability of various sources of evidence for investigative purposes and propose potential solutions to the issues from a practical standpoint. This work should be considered as a surveying discussion of an almost unexplored research area. The paper is organized as follows: We discuss the related work and the fundamental technical background information of digital forensics, cloud computing and the fault model in sections II and III. In section IV, we focus on the technical issues of cloud forensics and discuss the potential sources and nature of digital evidence as well as investigations in XaaS environments including the cross-disciplinary aspects. We conclude in section V.

II. RELATED WORK

Various works have been published in the field of cloud security and privacy [9], [35], [30] focussing on aspects for protecting data in multi-tenant, virtualized environments. Desired security characteristics for current cloud infrastructures mainly revolve around isolation of multi-tenant platforms [12], security of hypervisors in order to protect virtualized guest systems and secure network infrastructures [32]. Albeit digital provenance, describing the ancestry of digital objects, still remains a challenging issue for cloud environments, several works have already been published in this field [8], [10] contributing to the issues of cloud forensics. Within this context, cryptographic proofs for verifying data integrity mainly in cloud storage offers have been proposed, yet lacking practical implementations [24], [37], [23]. Traditional computer forensics has already well researched methods for various fields of application [4], [5], [6], [11], [13]. Also the aspects of forensics in virtual systems have been addressed by several works [2], [3], [20] including the notion of virtual introspection [25]. In addition, the NIST already addressed Web Service Forensics [22] which has a huge impact on investigation processes in cloud computing environments. In contrast, the aspects of forensic investigations in cloud environments have mostly been neglected by both the industry and the research community. One of the first papers focusing on this topic was published by Wolthusen [40] after Bebee et al. already introduced problems within cloud environments [1]. Wolthusen stressed that there is an inherent strong need for interdisciplinary work linking the requirements and concepts of evidence arising from the legal field to what can be feasibly reconstructed and inferred algorithmically or in an exploratory manner. In 2010, Grobauer et al. [36] published a paper discussing the issues of incident response in cloud environments - unfortunately no specific issues and solutions of cloud forensics have been proposed, which will be done within this work.

III. TECHNICAL BACKGROUND

A. Traditional Digital Forensics
The notion of Digital Forensics is widely known as the practice of identifying, extracting and considering evidence from digital media. Unfortunately, digital evidence is both fragile and volatile and therefore requires the attention of special personnel and methods in order to ensure that evidence data can be properly isolated and evaluated. Normally, the process of a digital investigation can be separated into three different steps, each having its own specific purpose:
1) In the Securing Phase, the major intention is the preservation of evidence for analysis. The data has to be collected in a manner that maximizes its integrity. This is normally done by a bitwise copy of the original media. As can be imagined, this represents a huge problem in the field of cloud computing where you never know exactly where your data is and additionally do not have access to any physical hardware. However, the snapshot technology, discussed in section IV-B3, provides a powerful tool to freeze system states and thus makes digital investigations, at least in IaaS scenarios, theoretically possible.
2) We refer to the Analyzing Phase as the stage in which the data is sifted and combined. It is in this phase that the data from multiple systems or sources is pulled together to create as complete a picture and event reconstruction as possible. Especially in distributed system infrastructures, this means that bits and pieces of data are pulled together for deciphering the real story of what happened and for providing a deeper look into the data.
3) Finally, at the end of the examination and analysis of the data, the results of the previous phases will be reprocessed in the Presentation Phase. The report, created in this phase, is a compilation of all the documentation and evidence from the analysis stage. The main intention of such a report is that it contains all results, is complete and is clear to understand.
Apparently, the success of these three steps strongly depends on the first stage. If it is not possible to secure the complete set of evidence data, no exhaustive analysis will be possible. However, in real world scenarios often only a subset of the evidence data can be secured by the investigator. In addition, an important definition in the general context of forensics is the notion of a Chain of Custody. This chain clarifies how and where evidence is stored and who takes possession of it. Especially for cases which are brought to court it is crucial that the chain of custody is preserved.

B. Cloud Computing
According to the NIST [16], cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal CSP interaction. The new raw definition of cloud computing brought several new characteristics such as multi-tenancy, elasticity, pay-as-you-go and reliability. Within this work, the following three models are used: In the Infrastructure as a Service (IaaS) model, the customer is using the virtual machine provided by the CSP for installing his own system on it. The system can be used like any other physical computer with a few limitations. However, the additive customer power over the system comes along with additional security obligations. Platform as a Service (PaaS) offerings provide the capability to deploy application packages created using the virtual development environment supported by the CSP. For the efficiency of the software development process this service model can be propellent. In the Software as a Service (SaaS) model, the customer makes use of a service run by the CSP on a cloud infrastructure. In most of the cases this service can be accessed through an API for a thin client interface such as a web browser. Closed-source public SaaS offers such as Amazon S3 and Google Mail can only be used in the public deployment model, leading to further issues concerning security, privacy and the gathering of suitable evidences. Furthermore, two main deployment models, private and public cloud, have to be distinguished. Common public clouds are made available to the general public. The corresponding infrastructure is owned by one organization acting as a CSP and offering services to its customers. In contrast, the private cloud is exclusively operated for an organization but may not provide the scalability and agility of public offers. The additional notions of community and hybrid cloud are not exclusively covered within this work. However, independently from the specific model used, the movement of applications and data to the cloud comes along with limited control for the customer about the application itself, the data pushed into the applications and also about the underlying technical infrastructure.

C. Fault Model
Be it an account for a SaaS application, a development environment (PaaS) or a virtual image of an IaaS environment, systems in the cloud can be affected by inconsistencies. Hence, for both customer and CSP it is crucial to have the ability to assign faults to the causing party, even in the presence of Byzantine behavior [33]. Generally, inconsistencies can be caused by the following two reasons:
1) Maliciously Intended Faults
Internal or external adversaries with specific malicious intentions can cause faults on cloud instances or applications. Economic rivals as well as former employees can be the reason for these faults and state a constant threat to customers and CSP. In this model, also a malicious CSP is included albeit he is assumed to be rare in real world scenarios. Additionally, from the technical point of view, the movement of computing power to a virtualized, multi-tenant environment can pose further threats and risks to the systems. One reason for this is that if a single system or service in the cloud is compromised, all other guest systems and even the host system are at risk.
Hence, besides the need for further security measures, precautions for potential forensic investigations have to be taken into consideration.
2) Unintentional Faults
Inconsistencies in technical systems or processes in the cloud do not implicitly have to be caused by malicious intent. Internal communication errors or human failures can lead to issues in the services offered to the customer (i.e. loss or modification of data). Although these failures are not caused intentionally, both the CSP and the customer have a strong intention to discover the reasons and deploy corresponding fixes.

IV. TECHNICAL ISSUES

Digital investigations are about control of forensic evidence data. From the technical standpoint, this data can be available in three different states: at rest, in motion or in execution. Data at rest is represented by allocated disk space. Whether the data is stored in a database or in a specific file format, it allocates disk space. Furthermore, if a file is deleted, the disk space is de-allocated for the operating system but the data is still accessible since the disk space has not been re-allocated and overwritten. This fact is often exploited by investigators who explore this de-allocated disk space on hard disks. In case the data is in motion, data is transferred from one entity to another, e.g. a typical file transfer over a network can be seen as a data in motion scenario. Several encapsulated protocols contain the data, each leaving specific traces on systems and network devices which can in return be used by investigators. Data can be loaded into memory and executed as a process. In this case, the data is neither at rest nor in motion but in execution. On the executing system, process information, machine instructions and allocated/de-allocated data can be analyzed by creating a snapshot of the current system state. In the following sections, we point out the potential sources for evidential data in cloud environments and discuss the technical issues of digital investigations in XaaS environments as well as suggest several solutions to these problems.

A. Sources and Nature of Evidence
Concerning the technical aspects of forensic investigations, the amount of potential evidence available to the investigator strongly diverges between the different cloud service and deployment models. The virtual machine (VM), hosting in most of the cases the server application, provides several pieces of information that could be used by investigators. On the network level, network components can provide information about possible communication channels between different parties involved. The browser on the client, acting often as the user agent for communicating with the cloud, also contains a lot of information that could be used as evidence in a forensic investigation. Independently from the used model, the following three components could act as sources for potential evidential data.
1) Virtual Cloud Instance: The VM within the cloud, where i.e. data is stored or processes are handled, contains potential evidence [2], [3]. In most of the cases, it is the place where an incident happened and hence provides a good starting point for a forensic investigation. The VM instance can be accessed by both the CSP and the customer who is running the instance. Furthermore, virtual introspection techniques [25] provide access to the runtime state of the VM via the hypervisor, and snapshot technology supplies a powerful technique for the customer to freeze specific states of the VM. Therefore, virtual instances can be still running during analysis, which leads to the case of live investigations [41], or can be turned off, leading to static image analysis. In SaaS and PaaS scenarios, the ability to access the virtual instance for gathering evidential information is highly limited or simply not possible.
2) Network Layer: Traditional network forensics is known as the analysis of network traffic logs for tracing events that have occurred in the past. Since the different ISO/OSI network layers provide several information on protocols and communication between instances within as well as with instances outside the cloud [4], [5], [6], network forensics is theoretically also feasible in cloud environments. However, in practice, ordinary CSP currently do not provide any log data from the network components used by the customer's instances or applications. For instance, in case of a malware infection of an IaaS VM, it will be difficult for the investigator to get any form of routing information and network log data in general, which is crucial for further investigative steps. This situation gets even more complicated in case of PaaS or SaaS. So again, the situation of gathering forensic evidence is strongly affected by the support the investigator receives from the customer and the CSP.
3) Client System: On the system layer of the client, it completely depends on the used model (IaaS, PaaS, SaaS) if and where potential evidence could be extracted. In most of the scenarios, the user agent (e.g. the web browser) on the client system is the only application that communicates with the service in the cloud. This especially holds for SaaS applications which are used and controlled by the web browser. But also in IaaS scenarios, the administration interface is often controlled via the browser. Hence, in an exhaustive forensic investigation, the evidence data gathered from the browser environment [7] should not be omitted.
a) Browser Forensics: Generally, the circumstances leading to an investigation have to be differentiated: In ordinary scenarios, the main goal of an investigation of the web browser is to determine if a user has been victim of a crime. In complex SaaS scenarios with high client-server interaction, this constitutes a difficult task. Additionally, customers strongly make use of third-party extensions [17] which can be abused for malicious purposes. Hence, the investigator might want to look for malicious extensions, searches performed, websites visited, files downloaded, information entered in forms or stored in local HTML5 stores, web-based email contents and persistent browser cookies for gathering potential evidence data. Within this context, it is inevitable to investigate the appearance of malicious JavaScript [18] leading to e.g. unintended AJAX requests and hence modified usage of administration interfaces. Generally, the web browser contains a lot of electronic evidence data that could be used to give an answer to both of the above questions - even if the private mode is switched on [19].

B. Investigations in XaaS Environments
Traditional digital forensic methodologies permit investigators to seize equipment and perform detailed analysis on the media and data recovered [11]. In a distributed infrastructure organization like the cloud computing environment, investigators are confronted with an entirely different situation. They no longer have the option of seizing physical data storage.
Data and processes of the customer are dispensed over an undisclosed amount of virtual instances, applications and network elements. Hence, it is in question whether preliminary findings of the computer forensic community in the field of digital forensics apparently have to be revised and adapted to the new environment. Within this section, specific issues of investigations in SaaS, PaaS and IaaS environments will be discussed. In addition, cross-disciplinary issues which affect several environments uniformly will be taken into consideration. We also suggest potential solutions to the mentioned problems.
1) SaaS Environments: Especially in the SaaS model, the customer does not obtain any control of the underlying operating infrastructure such as network, servers, operating systems or the application that is used. This means that no deeper view into the system and its underlying infrastructure is provided to the customer. Only limited user-specific application configuration settings can be controlled, contributing to the evidences which can be extracted from the client (see section IV-A3). In a lot of cases this urges the investigator to rely on high-level logs which are eventually provided by the CSP. Given the case that the CSP does not run any logging application, the customer has no opportunity to create any useful evidence through the installation of any toolkit or logging tool. These circumstances do not allow a valid forensic investigation and lead to the assumption that customers of SaaS offers do not have any chance to analyze potential incidences.
a) Data Provenance: The notion of Digital Provenance is known as meta-data that describes the ancestry or history of digital objects. Secure provenance that records ownership and process history of data objects is vital to the success of data forensics in cloud environments, yet it is still a challenging issue today [8]. Albeit data provenance is of high significance also for IaaS and PaaS, it states a huge problem specifically for SaaS-based applications: Current globally acting public SaaS CSP offer Single Sign-On (SSO) access control to the set of their services. Unfortunately, in case of an account compromise, most of the CSP do not offer any possibility for the customer to figure out which data and information has been accessed by the adversary. For the victim, this situation can have tremendous impact: If sensitive data has been compromised, it is unclear which data has been leaked and which has not been accessed by the adversary. Additionally, data could be modified or deleted by an external adversary or even by the CSP, e.g. due to storage reasons. The customer has no ability to prove otherwise. Secure provenance mechanisms for distributed environments can improve this situation but have not been practically implemented by CSP [10]. Suggested Solution: In private SaaS scenarios this situation is improved by the fact that the customer and the CSP are probably under the same authority. Hence, logging and provenance mechanisms could be implemented which contribute to potential investigations. Additionally, the exact location of the servers and the data is known at any time. Public SaaS CSP should offer additional interfaces for the purpose of compliance, forensics, operations and security matters to their customers. Through an API, the customers should have the ability to receive specific information such as access, error and event logs that could improve their situation in case of an investigation. Furthermore, due to the limited ability of receiving forensic information from the server and proofing integrity of stored data in SaaS scenarios, the client has to contribute to this process. This could be achieved by implementing Proofs of Retrievability (POR) in which a verifier (client) is enabled to determine that a prover (server) possesses a file or data object and it can be retrieved unmodified [24]. Provable Data Possession (PDP) techniques [37] could be used to verify that an untrusted server possesses the original data without the need for the client to retrieve it. Although these cryptographic proofs have not been implemented by any CSP, the authors of [23] introduced a new data integrity verification mechanism for SaaS scenarios which could also be used for forensic purposes.
2) PaaS Environments: One of the main advantages of the PaaS model is that the developed software application is under the control of the customer and, except for some CSP, the source code of the application does not have to leave the local development environment. Given these circumstances, the customer theoretically obtains the power to dictate how the application interacts with other dependencies such as databases, storage entities etc. CSP normally claim this transfer is encrypted but this statement can hardly be verified by the customer. Since the customer has the ability to interact with the platform over a prepared API, system states and specific application logs can be extracted. However, potential adversaries, which can compromise the application during runtime, should not be able to alter these log files afterwards. Suggested Solution: Depending on the runtime environment, logging mechanisms could be implemented which automatically sign and encrypt the log information before its transfer to a central logging server under the control of the customer. Additional signing and encrypting could prevent potential eavesdroppers from being able to view and alter log data information on the way to the logging server. Runtime compromise of a PaaS application by adversaries could be monitored by push-only mechanisms for log data, presupposing that the needed information to detect such an attack is logged. Increasingly, CSP offering PaaS solutions give developers the ability to collect and store a variety of diagnostics data in a highly configurable way with the help of runtime feature sets [38].
3) IaaS Environments: As expected, even virtual instances in the cloud get compromised by adversaries. Hence, the ability to determine how defenses in the virtual environment failed and to what extent the affected systems have been compromised is crucial not only for recovering from an incident. Also, forensic investigations gain leverage from such information and contribute to resilience against future attacks on the systems. From the forensic point of view, IaaS instances do provide much more evidence data usable for potential forensics than PaaS and SaaS models do. This fact is caused through the ability of the customer to install and set up the image for forensic purposes before an incident occurs. Hence, as proposed for PaaS environments, log data and other forensic evidence information could be signed and encrypted before it is transferred to third-party hosts, mitigating the chance that a maliciously motivated shutdown process destroys the volatile data. Although IaaS environments provide plenty of potential evidence, it has to be emphasized that the customer VM is in the end still under the control of the CSP. He controls the hypervisor which is e.g. responsible for enforcing hardware boundaries and routing hardware requests among different VM. Hence, besides the security responsibilities of the hypervisor, he exerts tremendous control over how the customer's VM communicate with the hardware and theoretically can intervene in executed processes on the hosted virtual instance through virtual introspection [25]. This could also affect encryption or signing processes executed on the VM and therefore lead to the leakage of the secret key. Although this risk can be disregarded in most of the cases, the impact on the security of high security environments is tremendous.
a) Snapshot Analysis: Traditional forensics expects target machines to be powered down to collect an image (dead virtual instance). This situation completely changed with the advent of the snapshot technology which is supported by all popular hypervisors such as Xen, VMware ESX and Hyper-V. A snapshot, also referred to as the forensic image of a VM, provides a powerful tool with which a virtual instance can be cloned by one click, including also the running system's memory. Due to the invention of the snapshot technology, systems hosting crucial business processes do not have to be powered down for forensic investigation purposes. The investigator simply creates and loads a snapshot of the target VM for analysis (live virtual instance). This behavior is especially important for scenarios in which a downtime of a system is not feasible or practical due to existing SLA. However, the information whether the machine is running or has been properly powered down is crucial [3] for the investigation. Live investigations of running virtual instances become more common, providing evidence data that is not available on powered down systems. The technique of live investigation
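The suggested solution for the PaaS and IaaS scenarios above, signing (and encrypting) log records before they leave the instance so that later tampering can be detected, can be sketched briefly. The following Python example is only an illustration of the general idea, not the paper's implementation; the key handling, record fields and transport are assumptions, and only integrity protection with an HMAC is shown, with confidentiality left to an encrypted channel to the customer-controlled logging server.

```python
import hashlib
import hmac
import json
import time

# Assumption: this key is provisioned out-of-band (e.g. injected at deploy time)
# and is not stored alongside the logs it protects.
SIGNING_KEY = b"replace-with-a-provisioned-secret"

def make_signed_record(event: str, detail: dict) -> str:
    """Build a log record and attach an HMAC so the central log server
    (and later an investigator) can detect after-the-fact tampering."""
    record = {"ts": time.time(), "event": event, "detail": detail}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return json.dumps(record, sort_keys=True)

def verify_record(line: str) -> bool:
    record = json.loads(line)
    sig = record.pop("sig")
    payload = json.dumps(record, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

if __name__ == "__main__":
    line = make_signed_record("login", {"user": "alice", "src": "203.0.113.7"})
    print(line)
    print("verified:", verify_record(line))
```

In line with the push-only approach described above, such records would be forwarded immediately, so that a later compromise of the instance cannot rewrite what has already been shipped.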
DCS Distributed Control System (Chinese-English bilingual foreign literature review, Chinese translation): DCS
DCS is the English abbreviation of Distributed Control System; in China's process-automation industry it is also known as a "collect-and-distribute" (集散) control system. The so-called distributed control system, also called a 集散 system in some references, is a new type of computer control system defined in contrast to the centralized control system, and it developed and evolved from the centralized control system. It is a multi-level computer system composed of a process-control level and a process-supervision level, linked together by a communication network. It integrates the "4C" technologies of computing, communication, display and control, and its basic ideas are distributed control, centralized operation, hierarchical management, flexible configuration and convenient engineering.
In terms of system functions, a DCS differs little from a centralized control system, but the way those functions are implemented is completely different.
First, consider the backbone of a DCS, the system network, which is the foundation and core of a DCS. Because the network plays a decisive role in the real-time behavior, reliability and expandability of the whole system, every vendor has put careful design effort into it. The DCS system network must satisfy real-time requirements, that is, information transfer must be completed within a deterministic time limit. "Deterministic" here means that, under any circumstances, the information transfer can be completed within this time limit, and the limit itself is determined by the real-time requirements of the controlled process. Therefore, the measure of system-network performance is not the raw network speed, i.e. the commonly quoted bits per second (bps), but the network's real-time behavior, namely within how much time the transfer of the required information can be guaranteed to complete. The system network must also be highly reliable: network communication must not be interrupted under any circumstances, so most vendors' DCS products adopt dual-bus, ring or dual-star network topologies. To meet expandability requirements, the maximum number of nodes that can be attached to the system network should be several times larger than the number of nodes actually in use. In this way, new nodes can be added at any time, and the network can also run at a relatively light communication load, which ensures the system's real-time behavior and reliability. During actual operation, individual nodes, especially operator stations, may join or leave the network at any time, so network reconfiguration happens frequently; such operations must never disturb normal system operation, and the system network therefore needs a strong online network-reconfiguration capability.
A Distributed Temperature Control System Based on Multi-Sensor Data Fusion
Abstract: Over the past few decades, temperature control systems have been widely applied. A general architecture for temperature control based on multi-sensor data fusion and CAN-bus control is proposed. A new estimation method based on multi-sensor data fusion is applied to the parameters of the distributed temperature control system. An important feature of the system is its generality: it is applicable to large-scale temperature control in many specific fields. Experimental results show that the system has high accuracy and reliability, good real-time performance and broad application prospects.
Keywords: distributed control system; CAN-bus control; intelligent CAN node; multi-sensor data fusion.
1 Introduction
Distributed temperature control systems are widely used in our daily life and production, including intelligent buildings, greenhouses, constant-temperature workshops, medium and large grain depots, warehouses and so on. Such control keeps the ambient temperature between two preset temperature limits.
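A minimal sketch of this kind of two-limit (on-off, or hysteresis) control is shown below in Python; the limits, the sensor interface and the heater interface are invented for illustration and are not taken from the paper.

```python
class HysteresisController:
    """Keep temperature between a low and a high preset limit by
    switching a heater on below t_low and off above t_high."""

    def __init__(self, t_low: float, t_high: float):
        assert t_low < t_high
        self.t_low = t_low
        self.t_high = t_high
        self.heater_on = False

    def update(self, temperature: float) -> bool:
        if temperature <= self.t_low:
            self.heater_on = True
        elif temperature >= self.t_high:
            self.heater_on = False
        # Between the two limits the previous state is kept (hysteresis).
        return self.heater_on

if __name__ == "__main__":
    ctrl = HysteresisController(t_low=18.0, t_high=22.0)
    for t in [17.5, 19.0, 21.0, 22.3, 20.0, 17.9]:
        print(f"T={t:>5.1f} °C -> heater {'ON' if ctrl.update(t) else 'OFF'}")
```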
In traditional temperature measurement systems, a single-chip microcontroller system based on temperature sensors is used to build an RS-485 local controller network. With the help of this network, centralized monitoring and control can be carried out. However, when the monitored area is more widely distributed and the transmission distance is longer, the disadvantages of an RS-485 bus control system become more prominent. In such cases the transmission and response speed become lower and the anti-interference capability becomes worse. Therefore, new communication methods should be sought to solve the problems caused by the RS-485 bus control system. Among the available communication methods, fieldbus control technology suited to industrial control systems allows us to break through the limits of traditional point-to-point communication and build a truly distributed-control, centrally-managed system, and CAN-bus control has clear advantages over an RS-485 bus control system, such as better error correction, improved real-time capability and low cost. At present, it is widely used to implement distributed measurement and wide-area control.
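As an illustration of how a temperature node might put a reading on a CAN bus, here is a short Python sketch using the third-party python-can package and a Linux SocketCAN interface; the interface name, message ID, scaling and payload layout are assumptions made for this example and are not defined by the paper.

```python
import struct
import can  # third-party package: pip install python-can

TEMP_MSG_ID = 0x120  # hypothetical arbitration ID for temperature frames

def send_temperature(bus: can.BusABC, node_id: int, temperature_c: float) -> None:
    # Pack node id (1 byte) and temperature in 0.01 °C units (signed 16-bit).
    payload = struct.pack(">Bh", node_id, int(round(temperature_c * 100)))
    msg = can.Message(arbitration_id=TEMP_MSG_ID, data=payload, is_extended_id=False)
    bus.send(msg)

def receive_temperature(bus: can.BusABC, timeout: float = 1.0):
    msg = bus.recv(timeout)
    if msg is None or msg.arbitration_id != TEMP_MSG_ID:
        return None
    node_id, raw = struct.unpack(">Bh", bytes(msg.data[:3]))
    return node_id, raw / 100.0

if __name__ == "__main__":
    # Requires a SocketCAN interface named can0 (a virtual vcan device works).
    # receive_own_messages lets this single-process demo see its own frame.
    with can.Bus(channel="can0", interface="socketcan",
                 receive_own_messages=True) as bus:
        send_temperature(bus, node_id=1, temperature_c=23.45)
        print(receive_temperature(bus))
```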
With the development of sensor technology, more and more systems adopt multi-sensor data fusion techniques to improve their performance. Multi-sensor data fusion is a paradigm for integrating data from multiple sources and synthesizing it into new information that is more powerful than the sum of its parts. Both now and in the future, low system cost and resource savings remain important indicators for sensor systems.
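One common, simple way to fuse several temperature readings is an inverse-variance weighted average, which gives more weight to the less noisy sensors. The sketch below is a generic illustration of that idea, not the specific estimation algorithm of the paper, and the variance figures are made up.

```python
def fuse_inverse_variance(readings, variances):
    """Fuse sensor readings by weighting each with 1/variance.
    Returns the fused estimate and its (reduced) variance."""
    if len(readings) != len(variances) or not readings:
        raise ValueError("need one variance per reading")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, readings)) / total
    return estimate, 1.0 / total

if __name__ == "__main__":
    # Three sensors measuring the same zone; the third is the noisiest.
    readings = [21.8, 22.1, 23.0]
    variances = [0.04, 0.05, 0.50]  # assumed sensor noise variances (°C squared)
    est, var = fuse_inverse_variance(readings, variances)
    print(f"fused estimate: {est:.2f} °C (variance {var:.3f})")
```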
2 Distributed architecture of the temperature control system
The distributed-architecture temperature control system is shown in Fig. 1. As can be seen, the system is composed of two kinds of modules: intelligent CAN nodes (two of them) and a main controller. Each module performs its part of the work within the distributed architecture.
毕业设计(论文)外文文献翻译文献、资料中文题目:在分布式环境中自动动态的基础设施部署的J2EE应用文献、资料英文题目:文献、资料来源:文献、资料发表(出版)日期:院(部):专业:班级:姓名:学号:指导教师:翻译日期: 2017.02.14英文原文Infrastructure for Automatic Dynamic DeploymentOf J2EE Application in Distributed EnvironmentsAbstract: in order to achieve such dynamic adaptation, we need an infrastructure for automating J2EE application deployment in such an environment. This need is quite evident to anyone who has ever tried deploying a J2EE application even on a single application server, which is a task that involves a great deal of configuration of both the system services and application components. Key words: j2ee; component; Distributed; Dynamic Deployment;1 IntroductionIn recent years, we have seen a significant growth in component-based enterprise application development. These applications are typically deployed on company Intranets or on the Internet and are characterized by high transaction volume, large numbers of users and wide area access. Traditionally they are deployed in a central location, using server clustering with load balancing (horizontal partitioning) to sustain user load. However, horizontal partitioning has been shown very efficient only in reducing application-related overheads of user-perceived response times, without having much effect on network-induced latencies. Vertical partitioning (e.g., running web tier and business tier in separate VMs) has been used for fault isolation and load balancing but it is sometimes impractical due to significant run-time overheads (even if one would keep the tiers on a fast local-area network) related to heavy use of remote invocations. Recent work [14] in the context of J2EE component based applications has shown viability of vertical partitioning in wide-area networks without incurring the aforementioned overheads. The key conclusions from that study can be summarized as follows:• Using properly designed applications, vertical distribution across wide-area networks improves user-perceived latencies.• Wide-area vertical layering requires replication of application components and maintaining consistency between replicas.• Additional replicas may be deployed dynamically to handle new requests.• Different replicas may, i n fact, be different implementations of the same componentbased on usage (read-only, read-write).• New request paths may reuse components from previously deployed paths. Applying intelligent monitoring [6] and AI planning [2, 12] techniques in conjunction with the conclusions of that study, we see a potential for dynamic adaptation in industry-standard J2EE component-based applications in wide area networks Through deployment of additional application components dynamically based on active monitoring. However, in order to achieve such dynamic adaptation, we need an infrastructure for automating J2EE application deployment in such an environment. This need is quite evident to anyone who has ever tried deploying a J2EE application even on a single application server, which is a task that involves a great deal of configuration of both the system services and application components. For example one has to set up JDBC data sources, messaging destinations and other resource adapters before application components can be configured and deployed. 
In a wide area deployment that spans multiple server nodes, this proves even more complex, since more system services that facilitate inter-node communications need to be configured and started and a variety of configuration data, like IP addresses, port numbers, JNDI names and others have to be consistently maintained in various configuration files on multiple nodes.This distributed deployment infrastructure must be able to:• address inter-component connectivity specification and define its effects on component configuration and deployment,• address application component dependencies on application server services, their configuration and deployment,• provide simple but expressive abstractions to control adaptation throug h dynamic deployment and undeployment of components,• enable reuse of services and components to maintain efficient use of network nodes’ resources,• provide these facilities without incurring significant additional design effort on behalf of application programmers.In this paper we propose the infrastructure for automatic dynamic deployment ofJ2EE applications, which addresses all of the aforementioned issues. The infrastructure defines architecture description languages (ADL) for component and link description and assembly. The Component Description Language is used to describe application components and links. It provides clear separation of application components from system components. A flexible type system is used to define compatibility of component ports and links. A declaration and expression language for configurable component properties allows for specification of inter-component dependencies and propagation of properties between components. The Component (Replica) Assembly Language allows for assembly of replicas of previously defined components into application paths byConnecting appropriate ports via link replicas and specifying the mapping of these component replicas onto target application server nodes. The Component Configuration Process evaluates an application path’s correctness, identifies the dependenciesof application components on system components, and configures component replicas for deployment. An attempt is made to match and reuse any previously deployed replicas in the new path based on their configurations. We implement the infrastructure as a part of the JBoss open source Java application server [11] and test it on severalSample J2EE applications – Java Pets tore [23], Rubies [20] and TPC-W-NYU [32]. The infrastructure impl ementation utilizes the JBoss’s extendable micro-kernel architecture, based on the JMX [27] specification. Componentized architecture of JBoss allows incremental service deployments depending on the needs of deployed applications. We believe that dynamic reconfiguration of application servers through dynamic deployment and undeployment of system services is essential to building a resource-efficient framework for dynamic distributed deployment of J2EE applications. The rest of the paper is organized as follows. Section 2 provides necessary background for understanding the specifics of the J2EE component technology which are relevant to this study. Section 3 gives a general description of the infrastructure architecture, while section 4 goes deeper in describing particularlyimportant and interesting internal mechanisms of the infrastructure. Section 5 describes the implementation of the framework, and related work is discussed in section 6.2 J2EE Background2.1 IntroductionComponent frameworks. 
A component framework is a middleware system that supports applications consisting of components conforming to certain standards. Application components are “plugged” into the component framework, which establishes their environmental conditions and regulates the interactions between them. This is usually done through containers, component holders, which also provide commonly required support for naming, security, transactions, and persistence. Component frameworks provide an integrated environment for component execution, as a result significantly reduce the effort .it takes to design, implement, deploy, and maintain applications. Current day industry component framework standards are represented by Object Management Group’s CORBA Component Model [18], Sun Microsystems’ Java 2 Platform Enterprise Edition (J2EE) [25] and Microsoft’s .NET [17], with J2EE being currently the most popular and widely used component framework in the enterprise arena.J2EE. Java 2 Platform Enterprise Edition (J2EE) [25] is a comprehensive standard for developing multi-tier enterprise Java applications. The J2EE specification among other things defines the following:• Component programming model,• Component contracts with the hosting server,• Services that the platform provides to these components,• Various human roles,• Compatibility test suites and compliance testing procedures.Among the list of services that a compliant application server must provide are messaging, transactions, naming and others that can be used by the application components. Application developed using J2EE adhere to the classical 3-Tier architectures –Presentation Tier, Business Tier, and Enterprise Information System(EIS) Tier (see Fig. 1). J2EE components belonging to each tier are developed adhering to theSpecific J2EE standards.1. Presentation or Web tier.This tier is actually subdivided into client and server sides. The client side hosts a web browser, applets and Java applications that communicate with the server side of presentation tier or the business tier. The server side hosts Java Servlet components [30], Java Server Pages (JSPs) [29] and static web content. These components are responsible for presenting business data to the end users. The data itself is typically acquired from the business tier and sometimes directly from the Enterprise Information System tier. The server side of the presentation tier is typically accessed through HTTP(S) protocol.2. Business or EJB tier.This tier consists of Enterprise Java Beans (EJBs) [24] that model the business logic of the enterprise application. These components provide persistence mechanisms and transactional support. The components in the EJB tier are invoked through remote invocations (RMI), in-JVM invocations or asynchronous message delivery, depending on the type of EJB component. The EJB specification defines several types of components. They differ in invocation style (synchronous vs. asynchronous, local vs. remote) and statefulness: completely stateless (e.g., Message-Driven Bean), stateful non-persistent(e.g., Stateful Session Bean), stateful persistent (e.g., Entity Bean). Synchronously invocable EJB components expose themselves through a special factory proxy object (an EJB Home object, which is specific to a given EJB), which is typically bound in JNDI by the deployer of the EJB. The EJB Home object allows creation or location of an EJBObject, which is a proxy to a particular instance of an EJB 1.3. 
Enterprise Information System (EIS) or Data tier. This tier refers to the enterprise information systems, like relational databases, ERP systems, messaging systems and the like. Business and presentation tier components access these systems to store and retrieve persistent enterprise data.
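The EJB Home pattern described above for the business tier can be sketched in Java as follows; the Cart and CartHome interfaces and the JNDI name "ejb/Cart" are hypothetical stand-ins for an EJB 2.x session bean bound by the deployer, not names taken from the paper.

import java.rmi.RemoteException;
import javax.ejb.CreateException;
import javax.ejb.EJBHome;
import javax.ejb.EJBObject;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;

// Hypothetical EJB 2.x remote interfaces for a shopping-cart session bean.
interface Cart extends EJBObject {
    void addItem(String itemId) throws RemoteException;
}

interface CartHome extends EJBHome {
    Cart create() throws CreateException, RemoteException;
}

public class CartClient {
    public static void main(String[] args) throws Exception {
        // The deployer binds the EJB Home under a JNDI name of its choosing;
        // "ejb/Cart" is an assumed example.
        InitialContext ctx = new InitialContext();
        Object ref = ctx.lookup("ejb/Cart");
        CartHome home = (CartHome) PortableRemoteObject.narrow(ref, CartHome.class);

        Cart cart = home.create();   // the Home acts as a factory for EJBObject proxies
        cart.addItem("EST-1");       // synchronous remote invocation on the bean instance
        cart.remove();               // release the bean instance
    }
}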
Ice Distributed ProgrammingInternet Communications Engine (Ice) is a modern distributed computing platform with support for C++, .NET, Java, Python, Objective-C, Ruby, PHP, and ActionScript. Ice is used in mission-critical projects by companies all over the world.Ice is easy to learn, yet provides a powerful network infrastructure and vast array of features for demanding technical applications.Ice is free software, available with full source, and released under the terms of GNU General Public License (GPL). Commercial licenses.An Overview of the Ice Platform:The Ice Approach: Flexible and SimpleIce provides a communication solution that is simple to understand and easy to program with. Yet, despite its simplicity, Ice is flexible enough to accommodate even the most demanding and mission-critical applications.Cutting-Edge TechnologyIce was designed and implemented by industry experts with many years of distributed computing experience. ZeroC's class-leading technology is flexible, easy to use, robust, and provides superior performance and scalability.Programming LanguagesIce allows you to write your distributed applications in C++, Java, C# (and other .NET languages, such as Visual Basic), Python, Ruby, PHP, and ActionScript. With Ice Touch, your application can include Objective-C components that run on the iPhone, iPad, and iPod touch, while Ice for Java can also be used to build Ice applications for Android. Ice-E allows you to deploy C++ components on resource-constrained devices running Gumsticks or Windows Mobile Professional. (Ice-E applications must be written in C++.) This makes Ice the platform of choice for heteregeneous distributed systems that span multiple operating systems and programming languages.APIsIce provides a set of APIs that emphasize simplicity and ease of use. All APIs are thread-safe and exception-safe, and the C++ APIs make it very difficult to leak or corrupt memory. This shortens development time, decreases testing effort, and reduces time to market.The APIs for the various programming languages (apart from a very small number of exceptions) are identical: for example, if you know the API for Java, you also know the API for C++ and C# (and any other supported programming language). For systems written in more than one language, this reduces the learning curve and allows reuse of design patterns and implementation techniques.Advanced TechnologyIce is much more than just a remote procedure call mechanism. For example, Ice supports synchronous as well as asynchronous calls, co-exists with firewalls due to its support for bidirectional connections, allows messages to be batched for efficiency, and permits sophisticated control of threads and resource allocation. See Ice Features for more detailed information.Fault Tolerance and Load BalancingIce allows you to create systems that are fault tolerant. Multiple instances of a server can be deployed on different machines, with transparent fail-over if a machine crashes or is disconnected from the network. This not only makes applications resilient against failures, but also increases performance because Ice allows you to balance the load of a distributed system over several servers. Once a system takes advantage of load balancing, it is easy to scale it to higher loads simply by adding more serversPerformance and ScalabilityIce was designed from the ground up for applications that require the utmost in performance and scalability. 
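Because the API is essentially the same in every supported language, a complete Ice client is only a few lines long. The Java sketch below assumes a Slice interface Printer in a module Demo, whose generated proxy classes are Demo.PrinterPrx and Demo.PrinterPrxHelper, and a server listening on the endpoint given in the proxy string; all of those names and the endpoint are illustrative.

// Assumes a Slice definition roughly like:
//   module Demo { interface Printer { void printString(string s); }; };
// compiled with slice2java so that Demo.PrinterPrx and Demo.PrinterPrxHelper exist.
public class IceClient {
    public static void main(String[] args) {
        Ice.Communicator ic = null;
        try {
            ic = Ice.Util.initialize(args);
            // Proxy string (object identity, transport, port) is an assumed example.
            Ice.ObjectPrx base = ic.stringToProxy("SimplePrinter:default -p 10000");
            Demo.PrinterPrx printer = Demo.PrinterPrxHelper.checkedCast(base);
            if (printer == null) {
                throw new Error("Invalid proxy");
            }
            printer.printString("Hello World!");   // synchronous remote invocation
        } finally {
            if (ic != null) {
                ic.destroy();   // release the communicator and its connections
            }
        }
    }
}

The proxy returned by checkedCast hides connection setup, marshaling, and transport details entirely.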
At the network level, Ice uses an efficient binary protocol thatminimizes bandwidth consumption. Ice uses little CPU and memory, and its highly efficient internal data structures do not impose arbitrary size limitations. This allows applications to scale to tens of thousands of clients with ease.Ice is fast. So fast that, as a rule, it imposes no discernable overhead on a distributed application. Data can be transmitted at whatever speed is supported by the network, so Ice does not create any performance bottleneck. Our Performance and Scalability white paper provides a comparison with other popular distributed computing solutions.Realistic Feature SetIce was developed to meet the demands of real-world distributed systems and it incorporates extensive customer feedback. Ice provides realistic features that are actually useful (as opposed to theorectical features that were considered possibly useful by a committee).ServicesFor realistic distributed systems, even the most sophisticated remote procedure call mechanism is useless without additional infrastructure. Ice provides a rich set of services, such as event distribution, firewall traversal with authentication and filtering, automatic persistence, automatic application deployment and monitoring, and automatic software distribution and patching. All services can be replicated for fault tolerance, so they do not introduce a single point of failure. Use of these services greatly reduces development time because they eliminate the need to create distribution infrastructure as part of application development. Our Ice Services page provides an overview of the different services.SecurityIce is inherently secure and suitable for use over insecure public networks. Traffic can be encrypted with SSL, and a variety of authentication mechanisms ensure that only authorized clients can access a system. Ice can work with existing firewalls: a single port is sufficient to provide secure access for an unlimited number of clients to an unlimited number of servers.Web IntegrationYou can use Ice to integrate a distributed system with the web. Ice for Java allows you to run an Ice client as an applet inside a browser, and Ice for PHP makes it easy to populate web content with data collected from back-end servers. In addition, Ice for Silverlight allows you to run Ice clients directly in a browser.First-Class Documentation and SupportZeroC prides itself on the industry-leading quality of its documentation and API reference. The documentation is arranged into topics by language mapping and programing task, so you can easily locate a section of interest. The indexed and searchable text is available online and as a download suitable for off-line viewing or printing. The documentation includes many code examples that illustrate how to implement different features, making it easy for programmers to learn implementation techniques and acquire know-how. Beyond tutorial and reference material, the documentation covers many non-trivial topics that are relevant for real-world application development.Language Support:The programming languages used in today's development projects are often determined by a number of factors, including application requirements, developer backgrounds, organizational policies, and compatibility with third-party tools. We designed Ice to be a practical distributed computing platform for real developers writing real applications. 
That goal drives everything we do at ZeroC, and it is the reason why Ice supports such a broad spectrum of programming languages. Whether your project uses one language or several, Ice will enable your components to communicate with each other naturally, efficiently and transparently.The Ice API was carefully designed to conform with the standards and practices of each programming language while maintaining a consistency that minimizes the learning curve of a developer working in multiple languages. As you will see from the sample code shown on the language pages, the Ice API is easy to learn and just as easy to use.Ice currently offers language mappings for the following programming languages:C++The Ice language mapping for C++ combines raw speed with an elegant design that enhances productivity and virtually eliminates memory leaks. See Ice for C++ for more information on the C++ mapping and a code example.JavaJava's portability and wealth of tools is an attractive complement to Ice, and support for Android allows Ice applications to run on devices in Google's ecosystem. See Ice for Java for more information on the Java mapping and a code example..NETCompatible with both Microsoft and Mono, developers can utilize Ice to integrate their .NET applications. Embedded and mobile devices that use the .NET Compact Framework can also employ Ice. See Ice for .NET for more information on the C# mapping and a code example.PythonIce for Python provides seamless access to Ice resources from this popular scripting language. See Ice for Python for more information on the Python mapping and a code example.PHPDynamic web applications can extend their reach to distributed objects using Ice for PHP. See Ice for PHP for more information on the PHP mapping and a code example.Objective-CWith support for the iPhone, iPod touch, Cocoa, and OS X, developers can use Ice Touch to integrate Apple's mobile devices. See Ice Touch for Objective-C for more information on the Objective-C mapping and a code example.Object landscapes and lifetimesTechnically, OOP is just about abstract data typing, inheritance, and polymorphism, but other issues can be at least as important. The remainder of this section will cover these issues.One of the most important factors is the way objects are created and destroyed. Where is the data for an object and how is the lifetime of the object controlled? There are different philosophies at work here. C++ takes the approach that control of efficiency is the most important issue, so it gives the programmer a choice. For maximum run-time speed, thestorage and lifetime can be determined while the program is being written, by placing the objects on the stack (these are sometimes called automatic or scoped variables) or in the static storage area. This places a priority on the speed of storage allocation and release, and control of these can be very valuable in some situations. However, you sacrifice flexibility because you must know the exact quantity, lifetime, and type of objects while you're writing the program. If you are trying to solve a more general problem such as computer-aided design, warehouse management, or air-traffic control, this is too restrictive.The second approach is to create objects dynamically in a pool of memory called the heap. In this approach, you don't know until run-time how many objects you need, what their lifetime is, or what their exact type is. Those are determined at the spur of the moment while the program is running. 
If you need a new object, you simply make it on the heap at the point that you need it. Because the storage is managed dynamically, at run-time, the amount of time required to allocate storage on the heap is significantly longer than the time to create storage on the stack. (Creating storage on the stack is often a single assembly instruction to move the stack pointer down, and another to move it back up.) The dynamic approach makes the generally logical assumption that objects tend to be complicated, so the extra overhead of finding storage and releasing that storage will not have an important impact on the creation of an object. In addition, the greater flexibility is essential to solve the general programming problem.Java uses the second approach, exclusively]. Every time you want to create an object, you use the new keyword to build a dynamic instance of that object.There's another issue, however, and that's the lifetime of an object. With languages that allow objects to be created on the stack, the compiler determines how long the object lasts and can automatically destroy it. However, if you create it on the heap the compiler has no knowledge of its lifetime. In a language like C++, you must determine programmatically when to destroy the object, which can lead to memory leaks if you don’t do it correctly (and this is a common problem in C++ programs). Java provides a feature called a garbage collector that automatically discovers when an object is no longer in use and destroys it. A garbage collector is much more convenient because it reduces the number of issues that youmust track and the code you must write. More important, the garbage collector provides a much higher level of insurance against the insidious problem of memory leaks (which has brought many a C++ project to its knees).附录2(中文翻译)Ice分布式程序设计互联网通讯引擎(Ice)是一个现代的分布式计算平台支持C++,.NET、Java、Python、objective - c, Ruby、PHP和 ActionScript。
Distributed Control System Design for HumanoidRobotsAbstract—A humanoid robot with multiple sensors needs a control system with severe requests like real-time control and friendly man-computer interaction. In this paper, we propose a distributed control system consisting of double PC control sub-system based on CAN bus for humanoid robot. The control system is divided into three layers. As the main control layer, double PC control architecture is formed with two sub-systems. Man-machine interface sub-system under Windows OS aims at controller debugging and information observation, meanwhile real-time motion control sub-system under RT-Linux OS is developed for real-time motion control realization. The communication layer based on CAN bus assures the reliable communication between real-time computer and joint controllers. The actuating layer is responsible for joint servo control and sensory data acquisition. At last, the experiment is implemented on newly-build humanoid robot MIH-2. Joint motor testing under Windows OS confirms the effectiveness of humanoid robot debugging and accuracy of data acquisition. Real-time control system testing based on RT-Linux OS demonstrates that the system can provide stationary time-lag ensuring the stability during the walking pattern control for humanoid robot. The experiment results show that the presented control system can meet all the requests during the humanoid robot walking control.Ⅰ. INTRODUCTIONWith the development and maturity of robot technology, humanoid robot with multi-sensors has gradually become another hotspot in this field [1]. Generally, a humanoid robot has more than thirty DOFs (degree of freedom) to be servo-actuated and should deal with multiple sensor information, which requires reliable and stable control system with high computing speed.Great breakthrough such as well-known humanoid robots P3 [2], ASIMO and HRPs has been achieved thanks to the efforts of many institutes and researchers [3]. Many of the robots adopt centralized control system that places a computer and an interface board which has A/D, D/A at the center. Controllers of the motor and sensors are all connected to the computer directly. System with this architecture has very high communication speed, which can be controlled by simple software. However, the wires connected to the central computer are too excessive to assure the reliability and stability during the control of walking pattern from the perspective of hardware. In terms of the humanoid robot requiring high real-time control, the redundant wires increase the susceptibility to noise and wiring that undermines the basic stationary time-lag for real-time control. In the case of adding a new servo module to the body, all wires need to be connected to the central computer which means the overall arrangement of existed wires should be changed [4]. In order to solve these problems, the concept of distributed control system with simple hardware connection has been brought forward [5].In this paper, we propose a distributed control system based on CAN bus and double PC control architecture to solve the problem of stability and reliability during real-time control of the humanoid robot. Through the double PC control design, performance of the whole system is also optimized. We adopt centralized management and distributed control model in our system that can be divided into 3 layers based on architecture and function. 
Main control layer is formed with double OS (operating system) to acquire the advantages of Windows OS and also ensure the stable real-time control. Thereinto, Windows OS aiming at controller debugging and information observation provides friendly man-machine interface and meanwhile RT-Linux OS realizes the real-time motion control. Actuating layer is integrated on the robot body to implement specific servo control, motor actuation and signal acquisition. Communication layer takes responsibility for data exchange between main control layer and actuating layer. Double PCs and every I/O node of controllers are all connected with CAN bus to satisfy high-speed and reliable communication. CAN bus consists of two wires and maintains 1Mbps speed under distance of 40m. These advantages make it possible for humanoid robot to simplify its overall arrangement of wire while keeping its original properties. Because of the whole design concept that regards every controller module as an I/O node, it is very easy to add a motor or sensor so long as the capacity of the network is sufficient.As a whole, the remainder of this paper is organized as follows. The overview of the wholecontrol system is described in Section Ⅱ. Every control layer is specified in this part. In Section Ⅲ, we present the concrete operating process of the most important layer—main control layer. Functions of Windows OS and RT-Linux OS are introduced in this part as well as their realization methods. The specific experiment implementation of joint motor testing under Windows OS and real-time control testing on RT-Linux OS is demonstrated in Section Ⅳ. Finally, the conclusion about the properties of the whole system is given in Section Ⅴ.Ⅱ. DISTRIBUTED CONTROL SYSTEMARCHITECTUREA.Overview of the Humanoid RobotThe proposed distributed control system based on CAN bus was implemented on our newly-built humanoid robot MIH-I. The humanoid robot shown in Fig.1 was equipped with stereo CCD cameras in the head, torque/force sensors at waist and feet, acceleration sensors and gyro sensors in chest and joint angle detection sensors in order to acquire environmental information and joint data. Its height and weight were 798 [mm] and 26 [kg] respectively. It consisted of 25 degree of freedoms (DOF). The configuration and joint angle constraints were shown in Table Ⅰ. MIH-I was developed as a platform for further research of distributed control system and algorithms for walking pattern control.(a)Photo of humanoid robot (b) Joint configuration modelFig.1 The humanoid robotB.Architecture of the Computer ControlSystemThe method of centralized management and distributed control was employed in our system. According to architecture and function of this system, it was divided into 3 layers. Main control layer was formed with double OS (operating system) to acquire the advantages of Windows OS and also ensured the stable real-time control. Thereinto, Windows OS aiming at controller debugging and information observation provided friendly man-machine interface and meanwhile RT-Linux OS realized the real-time motion control. Actuating layer was integrated on the robot body to implement specific servo control, motor actuation and signal acquisition. Communication layer took responsibility for data exchange between main control layer and actuating layer. CAN bus was connected between double PC and every I/O node of controllers to satisfy high-speed and reliable communication. The specific architecture of the computer control system was shown in Fig. 2. 
It realized synchronous joint motion while keeping the stability of the humanoid robot, and optimized the whole control system at the same time.
C. Structure of Main Control Layer
Two control PCs with different OS (operating system) were recruited in the main control layer. This layer produced the series of joint data and meanwhile coordinated every joint to complete the planned motion. The PC with RT-Linux OS was stable and reliable enough to complete the gait planning and walking pattern control. The other PC with Windows OS made it easy to observe the state of every working motor. Since Windows OS could support multi-media sensor functions, it was convenient for debugging the joint motors and checking the signals sent by motors and sensors. The friendly man-machine interface assured rapidity and accuracy when handling unexpected emergencies. A PCI CAN card with two isolated and independent channels was applied in this layer to enhance anti-interference capability. The SJA1000 chip was built into the CAN card as the CAN controller [6]. It provided bus arbitration and error detection with an automatic transmission repetition function to reduce the chance of data loss and ensure system reliability.
D. Structure of Communication Layer
The communication layer transferred information between the main control layer and the actuating layer. The main control PCs and all the servo controllers, acting as I/O nodes, were connected with each other on the CAN bus.

Table Ⅰ
(a) DEGREES OF FREEDOM OF THE ROBOT
Head: 1 DOF × 1 = 1 DOF
Arms, Shoulder: 3 DOF × 2 = 6 DOF
Arms, Hands: 3 DOF × 2 = 6 DOF
Legs, Hip: 3 DOF × 2 = 6 DOF
Legs, Knee: 1 DOF × 2 = 2 DOF
Legs, Ankle: 2 DOF × 2 = 4 DOF
(b) ANGLE RANGE CONSTRAINTS OF LEGS (degrees)
Hip: Yaw -40 ~ +40, Roll -30 ~ +30, Pitch -40 ~ +65
Knee: Pitch -100 ~ 0
Ankle: Pitch -55 ~ +63, Roll -30 ~ +30

Fig. 2 Humanoid robot motion control system structure diagram

The concrete implementation of the connection was as follows: the double PCs were mounted on the CAN bus through the PCI CAN card, and every servo controller was attached to the CAN bus through a CAN transceiver. The CAN bus was simply composed of twisted pairs and adopted short-frame transmission to achieve real-time messaging at high speed. The communication layer recorded every frame loss and offered an error handling function by locating the I/O node that was out of order [2]. It was essential to design the CAN communication protocol at the application level between the main control layer and the actuating layer. We resorted to a master-slave communication mode and bus arbitration in this protocol design to avoid communication conflicts. The two main control PCs acted as masters and every servo controller acted as a slave in this mode. The CAN network used the standard frame format for the humanoid robot, so the arbitration field consisted of the 11-bit identifier and the RTR bit. The 11-bit identifier was split into the three parts illustrated in Table Ⅱ.

Table Ⅱ ALLOCATION OF THE 11-BIT IDENTIFIER OF CAN FRAMES
ID bit 10: Command/Response flag (0/1)
ID bits 9-5: command code (0-31)
ID bits 4-0: slave node address (0-9)

Since bit 10 of the identifier had the top priority, it was used to indicate the sender of the CAN data. Bits 9 to 5 of the identifier were used to indicate the command frames sent to the servo controllers. Similarly, bits 4 to 0 were used to mark the ID of the I/O node that the master wanted to control or with which a slave needed to communicate.
E. Structure of Actuating Layer
The actuating layer took responsibility for joint servo control of the humanoid robot.
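Each joint controller is addressed on the bus through the identifier layout of Table Ⅱ, which can be packed and unpacked with a few bit operations. The Java sketch below is purely illustrative: the real masters and controllers are programmed in C/C++, and the constant names and example values here are assumptions.

// Packs and unpacks the 11-bit identifier of Table Ⅱ: bit 10 is the command/response
// flag, bits 9-5 carry the command code (0-31), bits 4-0 the slave node address (0-9).
public final class CanId {
    public static final int COMMAND = 0;
    public static final int RESPONSE = 1;

    public static int pack(int direction, int commandCode, int slaveAddress) {
        if (commandCode < 0 || commandCode > 31 || slaveAddress < 0 || slaveAddress > 9) {
            throw new IllegalArgumentException("field out of range");
        }
        return ((direction & 0x1) << 10) | ((commandCode & 0x1F) << 5) | (slaveAddress & 0x1F);
    }

    public static int direction(int id)    { return (id >> 10) & 0x1; }
    public static int commandCode(int id)  { return (id >> 5) & 0x1F; }
    public static int slaveAddress(int id) { return id & 0x1F; }

    public static void main(String[] args) {
        int id = pack(COMMAND, 3, 7);   // e.g. command code 3 addressed to joint controller 7
        System.out.printf("id=0x%03X dir=%d code=%d addr=%d%n",
                id, direction(id), commandCode(id), slaveAddress(id));
    }
}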
To achieve the severe request of real-time and stable circumstance, several intelligent modules were employed to realize the function of actuating layer [7]. Every intelligent module was independent and was formed with MCU (micro controller unique), the actuator and the corresponding sensors shown in Fig. 3. Intelligent module analyzed the joint motion data received from main control layer to drive the motors. On the other hand, intelligent module acquired the sensor data which was sent to be utilized by the master. Every intelligent module cooperated with the master to handle unexpected emergency during walking pattern control of the robot.Every joint was controlled by the corresponding intelligent module. DSP2812 was utilized as MCU because of its powerful function integrated inside. The PD control algorithm was uploaded inside MCU to compute data and control the actuator. The QEP (quadrature encoder pulse) circuit integrated in DSP received data from incremental encoder sensors to attain the relative position of the motor. D/A converter was applied to provide analog voltage as command signal.Fig. 3 Block diagram of intelligent moduleⅢ. IMPLEMENTA TION OF MAIN CONTROLLAYERA.Design of Control System Based onWindowsThe software design based on Windows was divided into three layers: denotation layer, datalayer and operation layer as shown in Fig. 4 [8]. Denotation layer was utilized as man-machine interface to set system parameters and working model. It monitored the statement of humanoid robot timely and supported dynamic display function of environmental information. Data layer accomplished the walking pattern planning via the analysis of the data which was reserved from sensors and the environment. Operation layer took the responsibility for command process, task process, track producing and coordination motion control.Overall, operation layer was the core of these three layers. It completed the management of dynamic environmental information and sent reference data of every joint to the robot. Task process module was the scheduling centre that assorted operation layer with other modules. It received instruction from command process module and environmental information from global information module. The history records in data layer could be available to task process module if necessary. Motion control module encapsulated the planned walking array as CAN data packet which was transferred on CAN devices. Sensor signal module acquired and processed information from gyro and other sensors then reflected the accurate result to the monitor.Fig. 4 Software framework of control system based onWindowsB.Design of Control System Based onRT-LinuxMain control system based on RT-Linux remedied the defect of Windows OS. Thanks to the RT-Linux, the request of hard real-time control and preemptive model was fulfilled. To assure the stability during walking, RT-Linux OS shortened the control period and kept stationary time lag. The response within limited time and the capability to handle multi tasks at the same time made the robot robust enough to adjust the surroundings in time.RT-Linux as a real-time kernel interfaced at the bottom of Linux OS and allowed Linux tasks to work with the lowest priority. In this way, RT-Linux amended the thread scheduling and interruption process according to Linux OS. 
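The servo law each intelligent module runs can be summarized as a discrete PD loop: the command is the position error times a proportional gain plus its rate of change times a derivative gain, clamped to the D/A output range. The Java sketch below only illustrates the computation; the actual firmware is written in C for the DSP2812, and the gains, the +/-10 V range, and the 1 ms period are assumed values.

// Illustrative discrete PD position loop of the kind each joint controller runs.
public final class PdJointController {
    private final double kp;   // proportional gain (assumed)
    private final double kd;   // derivative gain (assumed)
    private final double dt;   // control period in seconds (assumed)
    private double previousError;

    public PdJointController(double kp, double kd, double dt) {
        this.kp = kp;
        this.kd = kd;
        this.dt = dt;
    }

    /** Computes the command voltage handed to the D/A converter for one control cycle. */
    public double update(double referenceAngle, double measuredAngle) {
        double error = referenceAngle - measuredAngle;      // feedback from the QEP encoder
        double derivative = (error - previousError) / dt;
        previousError = error;
        double command = kp * error + kd * derivative;
        return Math.max(-10.0, Math.min(10.0, command));    // clamp to D/A range (assumed +/-10 V)
    }

    public static void main(String[] args) {
        PdJointController knee = new PdJointController(8.0, 0.15, 0.001);
        double angle = 0.0;
        for (int i = 0; i < 5; i++) {
            double u = knee.update(30.0, angle);            // track a 30 degree reference
            angle += 0.02 * u;                              // toy plant model for demonstration only
            System.out.printf("cycle %d: command=%.2f angle=%.2f%n", i, u, angle);
        }
    }
}

Running such a loop at a strictly fixed period is precisely what the RT-Linux side of the main control layer has to guarantee.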
Non-real time tasks worked under Linux OS to provide function of Internet communication and meanwhile real-time tasks worked under RT-Linux OS to complete assignment with severe limitation of time.RT-Linux real-time control system was formed with device driver module and real-time control module as shown in Fig. 5.Device driver module supplied the sockets between application programs and CAN bus devices to realize the basic transmission of CAN data. Real-time control module ran under RT-Linux kernel to carry out the major tasks such as transmission of joint angle data, receiving of sensor information and completion of algorithm.Fig. 5 Block diagram of real-time control system based onRT-LinuxⅣ. EXPERIMENTSThe proposed distributed control system based on humanoid robot was experimented on the robot which consisted of two parts: joint motor debugging and real-time system test.A.Joint Motor DebuggingTo acquire the information of every joint and perceive the surroundings, test based on sensors of vision, angle, electronic compass and encoder was employed in this part. The debugging man-machine interface was shown in Fig. 6 and Fig. 7.Every button of the joints corresponded to every joint motor as presented in the picture. Manipulator tested every joint motor ofthehumanoid robot respectively and the angle data of the joint was reflected on the monitor according to the planned data. All the joints could also cooperate to finish tasks regardless of the request of precision. The expected sensor data and CAN frames were chosen during the initialization stage of the system whereafter the expected data was processed and displayed by sensor signal module based on Windows. In addition, the sensor signal module handled information of vision and hearing to improve the multi-media function.Fig. 6 Operation interface of motion control moduleFig. 7 Display interface of sensor signal collection andprocessing moduleB.Real-time System TestTo compare the real-time accuracy under different OS platform, the CAN communication baud rate was set to 500 [kbps]. The property of walking control under Windows OS and RT-Linux OS was recorded while the control period was set to 10 [ms] as shown in Fig. 8.Fig. 8 Transmission time intervals of joint signals based on Windows or RT-Linux control systemEven though the control system based on Windows OS was more convenient in terms of joint motor debugging, it caused too much time lag out of the tolerance of the humanoid robot while the control period was 10 [ms]. When thecontrol period approached 40 [ms], the time interval of sent data was out of control with random time lag. On the contrary, we achieved stationary time lag about 0.12 [ms] which was accepted by humanoid robot while the control period was 10 [ms] and the time interval of sent data stayed the same even if the control period added up to 40 [ms].Ⅴ. CONCLUSIONThis paper provided a description of a distributed control system based on double PC control architecture for humanoid robot. The principal results of this paper were summarized as follows.1) A distributed control system with WindowsOS and RT-Linux OS was proposed and implemented. This design took advantage of the double PC control architecture to realize the friendly man-machine interface and met the requirements of communication speed, stability, real-time and reliability during the walking pattern control.2) The software design based on CAN bus fordata transmission between real-time system and non-real time system was utilized. 
Through the interaction of the three control layers, a stationary time lag satisfying the strict real-time requirement was achieved. The basic walking experiment confirmed the effectiveness of this software design.
3) The distributed control system, which treats every module as an I/O node, simplified the overall wiring arrangement and made it convenient to upgrade the architecture.
The results support our claim that Windows OS plus RT-Linux OS can balance information observation with real-time control for a humanoid robot. However, the far-reaching requirements on communication speed and safety remain a practical limitation. Further research will be devoted to removing this limitation.
REFERENCES
[1] Xie Tao, Xu Jianfeng, Zhou Yongxue, et al. History, current state, and prospect of study of humanoids [J]. Robot, 2002, 24(4): 367-374.
[2] Jung Y.K, Jungho L, et al. System design and dynamic walking of humanoid robot KHR-2 [C]. Barcelona, Spain: Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005.
[3] Gordon W, Damien K. Distributed control of gait for a humanoid robot [C]. Padua, Italy: Robot World Cup Soccer and Rescue Competitions and Conferences No. 7, 2004.
[4] Zhong Hua, Wu Zhenwei, Bu Chunguang. Research and implementation of a humanoid robot control system [J]. Robot, 2005, 27(5): 455-459.
[5] Yu Z.G, Huang Q, Li J.X, et al. Distributed control system for humanoid robot [C]. Harbin, China: Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, 2007.
[6] SJA1000 stand-alone CAN controller application note. Philips Semiconductors, 1997.
[7] Chen Jian, Lei Xusheng, Su Jianbo. Agents based on hierarchical control system for humanoid robots [J]. High Technology Letters, 2007, 17(6): 5862-5900.
[8] Bonasso R.P, Firby R.J, Gat E. A proven three-tiered architecture for programming autonomous robots [J]. Journal of Experimental and Theoretical Artificial Intelligence, 1997, 9(2): 703-711.
外文翻译原文来源The Hadoop Distributed File System: Architecture and Design 中文译文Hadoop分布式文件系统:架构和设计姓名 XXXX学号 200708202137英文原文The Hadoop Distributed File System: Architecture and DesignSource:/docs/r0.18.3/hdfs_design.html IntroductionThe Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed onlow-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop Core project. The project URL is/core/.Assumptions and GoalsHardware FailureHardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system’s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.Streaming Data AccessApplications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than lowlatency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas has been traded to increase data throughput rates.Large Data SetsApplications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.Simple Coherency ModelHDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. AMap/Reduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future.“Moving Computation is Cheaper than Moving Data”A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.Portability Across Heterogeneous Hardware and Software PlatformsHDFS has been designed to be easily portable from one platform to another. 
This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.NameNode and DataNodesHDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data tobe stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.The NameNode and DataNode are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the NameNode or the DataNode software. Usage of the highlyportable Java language means that HDFS can be deployed on a wide range of machines. A typical deployment has a dedicated machine that runs only the NameNode software. Each of the other machines in the cluster runs one instance of the DataNode software. The architecture does not preclude running multiple DataNodes on the same machine but in a real deployment that is rarely the case.The existence of a single NameNode in a cluster greatly simplifies the architecture of the system. The NameNode is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the NameNode.The File System NamespaceHDFS supports a traditional hierarchical file organization. A user or an application can create directories and store files inside these directories. The file system namespace hierarchy is similar to most other existing file systems; one can create and remove files, move a file from one directory to another, or rename a file. HDFS does not yet implement user quotas or access permissions. HDFS does not support hard links or soft links. However, the HDFS architecture does not preclude implementing these features.The NameNode maintains the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode. An application can specify the number of replicas of a file that should be maintained by HDFS. The number of copies of a file is called the replication factor of that file. This information is stored by the NameNode.Data ReplicationHDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.The NameNode makes all decisions regarding replication of blocks. 
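A client exercises these namespace operations and the per-file replication factor through the Hadoop Java API. The sketch below is a minimal example: the paths are invented, and the exact API surface varies slightly between Hadoop releases.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Client-side view of the namespace operations described above; the NameNode
// carries them out, the client only talks to the FileSystem abstraction.
public class HdfsNamespaceExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/example/reports");   // example path, not from the source
        fs.mkdirs(dir);                                  // namespace operation on the NameNode

        Path oldName = new Path(dir, "draft.txt");
        Path newName = new Path(dir, "final.txt");
        if (fs.exists(oldName)) {
            fs.rename(oldName, newName);                 // another pure-metadata operation
        }

        // Per-file replication factor, stored by the NameNode as file metadata.
        fs.setReplication(newName, (short) 3);

        fs.close();
    }
}

Each of these calls turns into a metadata operation carried out by the NameNode.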
It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster.Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.Replica Placement: The First Baby StepsThe placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.The NameNode determines the rack id each DataNode belongs to via the process outlined in Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.The current, default replica placement policy described here is a work in progress. Replica SelectionTo minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request. If angg/ HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.SafemodeOn startup, the NameNode enters a special state called Safemode. Replication of data blocks does not occur when the NameNode is in the Safemode state. The NameNode receives Heartbeat and Blockreport messages from the DataNodes. A Blockreport contains the list of data blocks that a DataNode is hosting. 
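The default three-replica placement policy described above can be summarized in a simplified Java sketch; the node and rack bookkeeping here is hypothetical and ignores the load and capacity checks a real NameNode performs.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Simplified illustration of the default three-replica rule described above: first
// replica on the writer's node, second on a different node in the same rack, third
// on a node in a different rack. The rack map and selection logic are hypothetical.
public class RackAwarePlacement {
    private final Map<String, List<String>> nodesByRack;   // rack id -> node names
    private final Random random = new Random();

    public RackAwarePlacement(Map<String, List<String>> nodesByRack) {
        this.nodesByRack = nodesByRack;
    }

    public List<String> chooseTargets(String localNode, String localRack) {
        List<String> targets = new ArrayList<>();
        targets.add(localNode);                                   // replica 1: local node

        for (String node : nodesByRack.get(localRack)) {          // replica 2: same rack
            if (!node.equals(localNode)) { targets.add(node); break; }
        }

        List<String> otherRacks = new ArrayList<>(nodesByRack.keySet());
        otherRacks.remove(localRack);
        if (!otherRacks.isEmpty()) {                              // replica 3: different rack
            List<String> remote = nodesByRack.get(otherRacks.get(random.nextInt(otherRacks.size())));
            targets.add(remote.get(random.nextInt(remote.size())));
        }
        return targets;
    }
}

In normal operation the NameNode then confirms, through the Blockreports it receives, that every block really has the required number of replicas placed this way.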
Each block has a specified minimum number of replicas. A block is considered safely replicated when the minimum number of replicas of that data block has checked in with the NameNode. After a configurable percentage of safely replicated data blocks checks in with the NameNode (plus an additional 30 seconds), the NameNode exits the Safemode state. It then determines the list of data blocks (if any) that still have fewer than the specified number of replicas. The NameNode then replicates these blocks to other DataNodes.The Persistence of File System MetadataThe HDFS namespace is stored by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS causes the NameNode to insert a record into the EditLog indicating this. Similarly, changing the replication factor of a file causes a new record to be inserted into the EditLog. The NameNode uses a file in its local host OS file system to store the EditLog. The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system too.The NameNode keeps an image of the entire file system namespace and file Blockmap in memory. This key metadata item is designed to be compact, such that a NameNode with 4 GB of RAM is plenty to support a huge number of files and directories. When the NameNode starts up, it reads the FsImage and EditLog from disk, applies all the transactions from the EditLog to the in-memory representation of the FsImage, and flushes out this new version into a new FsImage on disk. It can then truncate the old EditLog because its transactions have been applied to the persistent FsImage. This process is called a checkpoint. In the current implementation, a checkpoint only occurs when the NameNode starts up. Work is in progress to support periodic checkpointing in the near future.The DataNode stores HDFS data in files in its local file system. The DataNode has no knowledge about HDFS files. It stores each block of HDFS data in a separatefile in its local file system. The DataNode does not create all files in the same directory. Instead, it uses a heuristic to determine the optimal number of files per directory and creates subdirectories appropriately. It is not optimal to create all local files in the same directory because the local file system might not be able to efficiently support a huge number of files in a single directory. When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files and sends this report to the NameNode: this is the Blockreport.The Communication ProtocolsAll HDFS communication protocols are layered on top of the TCP/IP protocol. A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.RobustnessThe primary objective of HDFS is to store data reliably even in the presence of failures. 
The three common types of failures are NameNode failures, DataNode failures and network partitions.Data Disk Failure, Heartbeats and Re-ReplicationEach DataNode sends a Heartbeat message to the NameNode periodically. A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. The NameNode detects this condition by the absence of a Heartbeat message. The NameNode marks DataNodes without recent Heartbeats as dead and does not forward any new IO requests to them. Any data that was registered to a dead DataNode is not available to HDFS any more. DataNode death may cause the replication factor of some blocks to fall below their specified value. The NameNode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a DataNode may become unavailable, a replica may become corrupted, a hard disk on a DataNode may fail, or the replication factor of a file may be increased.Cluster RebalancingThe HDFS architecture is compatible with data rebalancing schemes. A scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold. In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster. These types of data rebalancing schemes are not yet implemented.Data IntegrityIt is possible that a block of data fetched from a DataNode arrives corrupted. This corruption can occur because of faults in a storage device, network faults, or buggy software. The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents it verifies that the data it received from each DataNode matches the checksum stored in the associated checksum file. If not, then the client can opt to retrieve that block from another DataNode that has a replica of that block.Metadata Disk FailureThe FsImage and the EditLog are central data structures of HDFS. A corruption of these files can cause the HDFS instance to be non-functional. For this reason, the NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog. Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously. This synchronous updating of multiple copies of the FsImage and EditLog may degrade the rate of namespace transactions per second that a NameNode can support. However, this degradation is acceptable because even though HDFS applications are very data intensive in nature, they are not metadata intensive. When a NameNode restarts, it selects the latest consistent FsImage and EditLog to use.The NameNode machine is a single point of failure for an HDFS cluster. If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.SnapshotsSnapshots support storing a copy of data at a particular instant of time. One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time. HDFS does not currently support snapshots but will in a future release.Data OrganizationData BlocksHDFS is designed to support very large files. 
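The client-side integrity check described above amounts to computing a checksum per chunk on write and recomputing it on read. The sketch below illustrates the idea with CRC32 over 64 KB chunks of a local file; the chunk size and checksum type are assumptions, not HDFS's actual parameters.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Per-chunk checksum idea behind HDFS data integrity: compute a checksum for every
// fixed-size chunk on write, recompute and compare on read.
public class ChunkChecksums {
    static final int CHUNK_SIZE = 64 * 1024;   // assumed chunk size for illustration

    static List<Long> checksums(Path file) throws IOException {
        List<Long> sums = new ArrayList<>();
        byte[] buf = new byte[CHUNK_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                CRC32 crc = new CRC32();
                crc.update(buf, 0, n);
                sums.add(crc.getValue());
            }
        }
        return sums;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        List<Long> expected = checksums(file);   // what a writer would store in a side file
        List<Long> observed = checksums(file);   // what a reader would recompute
        System.out.println("data intact: " + expected.equals(observed));
    }
}

HDFS performs this kind of verification at block granularity over files far larger than this toy example.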
Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different DataNode.StagingA client request to create a file does not reach the NameNode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode. The NameNode inserts the file name into the file system hierarchy and allocates a data block for it. The NameNode responds to the client request with the identity of the DataNode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified DataNode. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode. The client then tells the NameNode that the file is closed. At this point, the NameNode commits the file creation operation into a persistent store. If the NameNode dies before the file is closed, the file is lost.The above approach has been adopted after careful consideration of target applications that run on HDFS. These applications need streaming writes to files. If a client writes to a remote file directly without any client side buffering, the network speed and the congestion in the network impacts throughput considerably. This approach is not without precedent. Earlier distributed file systems, e.g. AFS, have used client side caching to improve performance. APOSIX requirement has been relaxed to achieve higher performance of data uploads.Replication PipeliningWhen a client is writing data to an HDFS file, its data is first written to a local file as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of DataNodes from the NameNode. This list contains the DataNodes that will host a replica of that block. The client then flushes the data block to the first DataNode. The first DataNode starts receiving the data in small portions (4 KB), writes each portion to its local repository and transfers that portion to the second DataNode in the list. The second DataNode, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third DataNode. Finally, the third DataNode writes the data to its local repository. Thus, a DataNode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one DataNode to the next.AccessibilityHDFS can be accessed from applications in many different ways. Natively, HDFS provides a Java API for applications to use. A C language wrapper for this Java API is also available. In addition, an HTTP browser can also be used to browse the files of an HDFS instance. Work is in progress to expose HDFS through the WebDAV protocol.FS ShellHDFS allows user data to be organized in the form of files and directories. 
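From the client's point of view, the staging and replication pipelining described above are invisible: a program simply writes to and reads from streams obtained through the native Java API. The sketch below is a minimal example with invented paths.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Write-once, read-many access through the HDFS Java API; staging and pipelining
// happen transparently underneath the output stream.
public class HdfsWriteRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/user/example/greeting.txt");   // example path

        FSDataOutputStream out = fs.create(path);   // data is staged, then pipelined
        out.writeUTF("hello, hdfs");                // to the chosen DataNodes
        out.close();

        FSDataInputStream in = fs.open(path);       // read back, ideally from a nearby replica
        System.out.println(in.readUTF());
        in.close();
        fs.close();
    }
}

Java is not the only way in, though: HDFS also serves non-programmatic users.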
It provides a commandline interface called FS shell that lets a user interact with the data in HDFS. The syntax of this command set is similar to other shells (e.g. bash, csh) that users are already familiar with. Here are some sample action/command pairs:FS shell is targeted for applications that need a scripting language to interact with the stored data.DFSAdminThe DFSAdmin command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator. Here are some sample action/command pairs:Browser InterfaceA typical HDFS install configures a web server to expose the HDFS namespace through a configurable TCP port. This allows a user to navigate the HDFS namespace and view the contents of its files using a web browser.Space ReclamationFile Deletes and UndeletesWhen a file is deleted by a user or an application, it is not immediately removed from HDFS. Instead, HDFS first renames it to a file in the /trash directory. The file can be restored quickly as long as it remains in /trash. A file remains in/trash for a configurable amount of time. After the expiry of its life in /trash, the NameNode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.A user can Undelete a file after deleting it as long as it remains in the /trash directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the /trash directory and retrieve the file. The /trash directory contains only the latest copy of the file that was deleted. The /trash directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. The current default policy is to delete files from /trash that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface.Decrease Replication FactorWhen the replication factor of a file is reduced, the NameNode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the DataNode. The DataNode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the setReplication API call and the appearance of free space in the cluster.中文译本原文地址:/docs/r0.18.3/hdfs_design.html一、引言Hadoop分布式文件系统(HDFS)被设计成适合运行在通用硬件(commodity hardware)上的分布式文件系统。