SEC: A Practical Secure Erasure Coding Scheme for Peer-to-Peer Storage System*

Jing Tian, Zhi Yang, Yafei Dai
CNDS Lab, Peking University
{tianjing, yangzhi, dyf}@

* This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2004CB318204 and the National Natural Science Foundation of China under Grant No. 90412008.

Abstract. Though conventional block ciphers are widely used in Peer-to-Peer storage systems, they are insufficient for ensuring the confidentiality of long-term archive data because of the inherent Peer-to-Peer storage vulnerability that data is stored on other, untrusted peers. There are two potential security weak spots: (1) sensitive data may be exposed from the encrypted data stored on an adversary peer when a weakness of the block cipher is discovered or the secret key leaks; (2) even when a user realizes the security threat, he/she cannot destroy the encrypted data stored on an adversary peer to minimize the loss. This paper proposes a novel secure erasure code (SEC) scheme to solve these security problems. The SEC scheme encodes the sensitive data into several fragments and then stores the fragments on different peers. In theory, SEC ensures unconditional confidentiality at the fragment level, meaning that an adversary cannot obtain any portion of the sensitive data from a locally stored encrypted fragment even if he/she has the secret key and infinite computational power. By leveraging the large scale of Peer-to-Peer systems, SEC further makes it infeasible for an adversary to collect enough fragments for decryption. SEC can be used alone or together with a block cipher to solve these potential security problems. The performance results show that the SEC scheme is practical for real-world applications.

1 Introduction

In recent years, Peer-to-Peer technology has been shown to be promising for constructing large-scale persistent storage, and many Peer-to-Peer storage systems have been proposed [1] [2] [3] [4] [5] [6]. Extensive research effort has focused on providing stable overlay networks and long-term data durability. However, little research has been reported on the confidentiality of the data.

In Peer-to-Peer storage systems, a user's data is stored on the disks of other, untrusted peers, and any peer can leave the system at any time. Consequently, the confidentiality and durability of the user's data face a great challenge. The conventional approach, used in several proposed systems [1] [2] [6], is to encrypt the user's data to ensure its confidentiality and then store the encrypted data using either a replication or an erasure code [7] strategy. However, we believe that block cipher encryption alone is insufficient because of the inherent Peer-to-Peer storage system vulnerability that data is stored on other, untrusted peers. The potential security weak spots are: (1) sensitive data may be exposed from the encrypted data stored on an adversary peer when a weakness of the block cipher is discovered or the secret key leaks; (2) even when a user realizes the security threat, the user cannot destroy the encrypted data stored on an adversary peer to minimize the loss. In other words, once the encrypted data is stored, the owner is put in a passive position in the face of various security threats. These problems set a great barrier to applications of Peer-to-Peer storage systems.
To solve these problems, this paper proposes a novel secure erasure code (SEC) scheme for Peer-to-Peer storage systems. The SEC scheme is in fact a customized Reed-Solomon erasure code; the customization provides strong data encryption as well as data redundancy at the same time. SEC first encrypts the data into several fragments and then distributes the fragments as independent objects to different peers via the underlying Peer-to-Peer storage protocol. The SEC scheme guarantees that, in the information-theoretic sense, fewer than a threshold number of fragments give no information about any portion of the encrypted data. This property prevents an adversary from decrypting the sensitive data from the locally stored fragment alone, even if a weakness of the cipher is discovered or the adversary obtains the secret key. Under the system assumptions detailed in Section 4, the SEC scheme further makes it infeasible for the adversary to collect enough fragments of one object for decryption from anywhere in the Peer-to-Peer system. Even if the adversary does collect enough fragments, the strong encryption of SEC still protects the confidentiality of the data. Thus, SEC provides three layers of security. SEC can be used alone as an encryption scheme, or together with a conventional block cipher, to solve the security problems in Peer-to-Peer storage systems.

The body of this paper is organized as follows. After introducing some preliminary knowledge in Section 2, we describe the whole SEC scheme in Section 3. In Section 4, the system model and a three-layer security analysis framework are presented, and the optimal parameters of SEC are suggested according to a detailed analysis of its security. The performance results, which demonstrate the practicality of the SEC scheme, are given in Section 5. In Section 6, we show the two application modes of SEC. After introducing related work in Section 7, we conclude the paper in Section 8.

2 Reed-Solomon Codes Preliminaries

As opposed to replication, erasure codes provide a more flexible redundancy scheme for tolerating storage component failures. Erasure codes divide a data object into m fragments and encode them into n fragments, where all fragments have the same size and n ≥ m, so the redundancy rate r is n/m. When decoding, erasure codes need only t (t ≥ m) fragments out of the n encoded fragments to reconstruct the data. The original data fragments can be viewed as a vector D = (D_1, D_2, ..., D_m), and the encoding function Enc maps the original data fragments onto the encoded fragments: Enc(D) = E, where E = (E_1, E_2, ..., E_n). The decoding function Dec maps a subset of E back to the original data: Dec(E_{i_1}, E_{i_2}, ..., E_{i_t}) = D.

Reed-Solomon codes are an important subclass of erasure codes. Reed-Solomon codes are linear codes, which means that the encoding function is linear. Consequently, the encoding function of a Reed-Solomon code can be described by the vector-matrix multiplication Enc(D) = DG of a generator matrix G and the original data fragment vector D, where G is an m × n matrix.

Reed-Solomon codes are also maximum distance separable (MDS) codes, where MDS means that any m encoded data fragments are sufficient to recover the data. It is easy to see that the MDS property requires every m × m square submatrix of G to be non-singular. Consequently, the decoding function can also be described by a vector-matrix multiplication, Dec(E') = E'G'^{-1}, where E' is an m-element subvector of E and G' is the m × m submatrix of G formed by the m corresponding columns.

To achieve low computational cost, designers usually employ systematic Reed-Solomon codes. A code is called systematic if and only if its generator matrix has the form (I_m | R), where I_m is an m × m identity matrix. Hence, the m original data fragments also appear among the encoded fragments.
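To make the matrix formulation concrete, the following minimal Python sketch (not part of the original paper, which contains no code) encodes m = 3 data symbols into n = 5 and decodes from an arbitrary choice of m survivors. The field GF(2^8), the reduction polynomial 0x11D, the Cauchy-style generator matrix (anticipating Definition 2 in Section 3.1), and all data values are assumptions chosen for illustration.

```python
L = 8          # symbol size in bits; GF(2^8) has 256 elements
POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1, a standard irreducible polynomial

def gf_mul(a, b):
    """Multiply in GF(2^8): carryless product reduced modulo POLY."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return p

def gf_inv(a):
    """Inverse of a nonzero element via a^(2^8 - 2), since a^255 = 1."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def matvec(v, M):
    """Row vector v (length m) times an m x n matrix M; XOR is field addition."""
    out = []
    for j in range(len(M[0])):
        s = 0
        for i in range(len(v)):
            s ^= gf_mul(v[i], M[i][j])
        out.append(s)
    return out

def gf_matinv(M):
    """Gauss-Jordan inversion of a square matrix over GF(2^8)."""
    m = len(M)
    A = [list(M[i]) + [int(i == j) for j in range(m)] for i in range(m)]
    for c in range(m):
        piv = next(r for r in range(c, m) if A[r][c])  # nonzero pivot (MDS => exists)
        A[c], A[piv] = A[piv], A[c]
        inv = gf_inv(A[c][c])
        A[c] = [gf_mul(inv, x) for x in A[c]]
        for r in range(m):
            if r != c and A[r][c]:
                f = A[r][c]
                A[r] = [x ^ gf_mul(f, y) for x, y in zip(A[r], A[c])]
    return [row[m:] for row in A]

# Encode m = 3 data symbols into n = 5 fragments, then decode from 3 survivors.
m, n = 3, 5
xs, ys = [0, 1, 2], [3, 4, 5, 6, 7]                 # pairwise distinct, xs disjoint from ys
G = [[gf_inv(x ^ y) for y in ys] for x in xs]       # m x n generator matrix
D = [0x12, 0x34, 0x56]                              # one vector of data symbols
E = matvec(D, G)                                    # Enc(D) = D * G: n encoded symbols
alive = [0, 2, 4]                                   # any m of the n encoded fragments
Gp = [[G[i][j] for j in alive] for i in range(m)]   # the m surviving columns of G
assert matvec([E[j] for j in alive], gf_matinv(Gp)) == D   # Dec(E') = E' * G'^(-1)
```

Note that this particular G is not systematic; a generator of the (I_m | R) form above would additionally let the data symbols appear verbatim among the fragments.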
3 The SEC Scheme

The SEC scheme is composed of two parts: the encryption part and the fragment naming part. The encryption part is in fact a customized version of a Reed-Solomon code, which encodes the original data into several encrypted fragments. The fragment naming part then assigns the encrypted fragments meaningless names and distributes them. Recall that a Reed-Solomon code needs $G'^{-1}$ to reconstruct the original data; hence, if we keep the generator matrix $G$ as the secret key, the adversary cannot recover the original data even after collecting enough encoded fragments. Based on this fact, SEC encrypts the data by creating a secret generator matrix $G_{SEC}$ from the user-specified key. The fragment naming part further makes it impractical for the adversary to collect enough fragments. In this section, we first show how to construct the required $G_{SEC}$, and then describe the fragment naming algorithm.

3.1 The Secure Generator Matrix

Reed-Solomon codes are defined over an infinite field in theory. However, the real domain and range of computation in a computer are binary words of a fixed length $L$. Hence, we must perform the Reed-Solomon computation over a Galois field with $2^L$ elements, denoted $GF(2^L)$ [8]. The elements of $GF(2^L)$ are the integers from zero to $2^L - 1$, each represented by an $L$-bit word. To ensure the correctness of our computation, $GF(2^L)$ should contain more than $n+m$ elements [9], namely $2^L > m+n$. In the following discussion, all arithmetic is over $GF(2^L)$; the detailed computation over a Galois field is beyond the scope of this paper. We define our $G_{SEC}$ over $GF(2^L)$ as follows.

Definition 1. An $m \times n$ matrix $G$ over $GF(2^L)$ is called a secure generator matrix of the SEC scheme if and only if $G$ is non-singular and it does not contain any singular $(m-1) \times (m-1)$ submatrix.

One important subclass of non-singular matrices over $GF(2^L)$ is the Cauchy matrices, and we construct our $G_{SEC}$ by customizing a Cauchy matrix. Here, we give the definition of a Cauchy matrix.

Definition 2. Let $\{x_1, \ldots, x_m\}$ and $\{y_1, \ldots, y_n\}$ be two sets of elements in $GF(2^L)$. Suppose the two sets satisfy the following constraints:

(1) $x_i \ne x_j$, $\forall i, j \in \{1, \ldots, m\}$, $i \ne j$
(2) $y_i \ne y_j$, $\forall i, j \in \{1, \ldots, n\}$, $i \ne j$
(3) $x_i \ne y_j$, $\forall i \in \{1, \ldots, m\}$, $\forall j \in \{1, \ldots, n\}$

Then the matrix

$\begin{pmatrix} \frac{1}{x_1+y_1} & \frac{1}{x_1+y_2} & \cdots & \frac{1}{x_1+y_n} \\ \frac{1}{x_2+y_1} & \frac{1}{x_2+y_2} & \cdots & \frac{1}{x_2+y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{x_m+y_1} & \frac{1}{x_m+y_2} & \cdots & \frac{1}{x_m+y_n} \end{pmatrix}$

is called a Cauchy matrix over $GF(2^L)$.

A Cauchy matrix has the following property, whose proof can be found in [10].

Theorem 1. Every square submatrix of a Cauchy matrix is non-singular.
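As a companion to Definition 2, the hedged Java sketch below builds a Cauchy matrix over $GF(2^8)$, reusing the tables from the previous sketch. The element sets xs and ys in main are arbitrary placeholders chosen only to satisfy the three distinctness constraints; a real deployment would derive them from the secret key as Section 3.2 describes.

    // Illustrative Cauchy matrix construction over GF(2^8); reuses Gf256Encode.
    public final class CauchyMatrix {
        // Multiplicative inverse via the log tables: a^{-1} = alpha^(255 - log a).
        static int gfInv(int a) {
            if (a == 0) throw new ArithmeticException("zero has no inverse");
            return Gf256Encode.EXP[255 - Gf256Encode.LOG[a]];
        }

        // xs and ys must be pairwise distinct and mutually disjoint (Definition 2).
        static int[][] cauchy(int[] xs, int[] ys) {
            int[][] g = new int[xs.length][ys.length];
            for (int i = 0; i < xs.length; i++)
                for (int j = 0; j < ys.length; j++)
                    g[i][j] = gfInv(xs[i] ^ ys[j]); // 1/(x_i + y_j); + is XOR
            return g;
        }

        public static void main(String[] args) {
            int[] xs = {1, 2, 3, 4};                // placeholder x-set (m = 4)
            int[] ys = {5, 6, 7, 8, 9, 10, 11, 12}; // placeholder y-set (n = 8)
            int[] data = {0x41, 0x42, 0x43, 0x44};
            int[] encoded = Gf256Encode.encode(data, cauchy(xs, ys)); // E = D * G
            System.out.println(java.util.Arrays.toString(encoded));
        }
    }

By Theorem 1, every square submatrix of the resulting matrix is invertible, which is exactly what the decoder and, as Section 4 shows, the security argument rely on.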
3.2 The Construction of G_SEC

$G_{SEC}$ requires every $(m-1) \times (m-1)$ square submatrix to be non-singular, and a Cauchy matrix satisfies this condition. However, it is impractical to keep such a big matrix $G_{SEC}$ as the secret key. Instead, we generate $G_{SEC}$ from a user-specified secret key. In this subsection, we introduce how to construct $G_{SEC}$ by customizing a Cauchy matrix according to a user-specified $l_k$-bit secret key $k_G$.

Since $G_{SEC}$ is an $m \times n$ matrix over $GF(2^L)$, we also need $m$ and $n$ as algorithm parameters besides $k_G$. Here, we require $l_k$ to be a multiple of $m+n$, and $L = l_k/(m+n)$. Subsequently, $k_G$ can be segmented into a vector of $m+n$ elements, where each element is a bit string of length $L$. Using a pseudo-random function, we derive a new vector of $m+n$ distinct elements from the prior vector. Finally, these $m+n$ elements are used to generate $G_{SEC}$. The pseudocode is shown in Fig. 1. In the key generation procedure, the parameters must satisfy the following two constraints:

$L > \log_2(m+n) = \log_2(m \times (1+r))$  (1)

$l_k = L \times (m+n) = L \times m \times (1+r)$  (2)

To encode the data, we segment the original data into vectors, each consisting of $m$ bit strings of length $L$. For each vector, the vector-matrix multiplication yields an encoded vector of size $n$. The $i$-th elements of all encoded vectors together form the $i$-th encoded fragment, so we obtain $n$ encoded fragments.

Fig. 1: The pseudocode to construct the G_SEC.

3.3 Fragments Naming and Placement

To keep the whole data object secure, we store each encoded fragment individually as a new object in the Peer-to-Peer storage system, making it impractical for the adversary to collect enough fragments. Here, we employ a strong one-way hash function $Hash$, e.g., MD5 or SHA-1, to hide the relationship between any two fragments of the same original data; we use SHA-1 in this paper. Besides, we also need a bit vector $V$ for each original data object and a separate key $k_N$. The filename $f_i$ of the $i$-th fragment is generated as follows for $i = 1, 2, \ldots, n$ ($||$ denotes bit concatenation):

$f_i = Hash(f_{i-1} \,||\, k_N \,||\, V), \quad f_0 = \text{""}$  (3)

$V$ can be a concatenation of some properties of the original data, e.g., the pathname and the last-modified time of a file. Though $k_N$ could be the same as the key $k_G$ for $G_{SEC}$, this is not recommended.

After being assigned meaningless names, all the fragments are distributed randomly or by their names, and stored on different peers throughout the whole Peer-to-Peer system using the underlying protocol.
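A hedged Java sketch of the naming chain in (3) follows. The paper fixes the hash function (SHA-1) and the chaining structure; the hexadecimal rendering of digests and the byte-level layout of the concatenation are illustrative assumptions of ours.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Illustrative fragment naming per equation (3): f_i = Hash(f_{i-1} || k_N || V).
    public final class FragmentNames {
        static String[] names(int n, byte[] kN, byte[] v) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            String[] out = new String[n];
            byte[] prev = new byte[0];   // f_0 is the empty string
            for (int i = 0; i < n; i++) {
                sha1.update(prev);       // f_{i-1}
                sha1.update(kN);         // separate naming key k_N
                sha1.update(v);          // per-object bit vector V
                byte[] digest = sha1.digest(); // digest() also resets the hash state
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) hex.append(String.format("%02x", b));
                out[i] = hex.toString(); // meaningless-looking fragment name
                prev = digest;           // chain into the next name
            }
            return out;
        }

        public static void main(String[] args) throws Exception {
            byte[] kN = "example-naming-key".getBytes(StandardCharsets.UTF_8);
            byte[] v = "/home/alice/report.pdf|2006-01-01".getBytes(StandardCharsets.UTF_8);
            for (String s : names(4, kN, v)) System.out.println(s);
        }
    }

Because each name depends on $k_N$, an observer who can enumerate stored object names still cannot link two fragments of the same object, which is what the distribution security layer below relies on.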
4 The Security of SEC

Unlike a block cipher, the SEC scheme provides data confidentiality at three layers. In this section, after presenting the system model, we give a three-layer security analysis framework for the SEC scheme, then a detailed analysis of the confidentiality based on this framework, and finally we suggest the optimal parameter selection according to the analysis.

4.1 System Model

In the following analysis, we distinguish two security levels: information-theoretic security and computational security. Information-theoretic security, also called "unconditional security", means that the ciphertext gives no information about the original data even if the adversary has infinite computational power. Computational security is a little weaker: it means that no polynomial-time algorithm can reconstruct the original data from the ciphertext.

Classifying adversaries by intent, there are two adversary models for Peer-to-Peer storage systems: the passive adversary and the active adversary. A passive adversary only tries to peek at the data stored on the disk of the local computer, and does not make an effort to compromise more computers for more information. In a Peer-to-Peer storage system, every user can be a passive adversary, since the data are all stored on peers. Therefore, passive adversaries are very dangerous even though they seem somewhat naïve. Clearly, the conventional block cipher scheme is only computationally secure in the face of a passive adversary. An active adversary has more powerful abilities: besides accessing the locally stored data, the active adversary may request other files in the system, or compromise other peers to learn the desired files. In particular, an active adversary tries to collect all the fragments of one data object in a system using the SEC scheme. SEC provides very strong computational security in the face of the active adversary.

In this paper, we make the following assumptions about the underlying Peer-to-Peer system: (1) it provides anonymous routing [12] [13]; (2) it is Byzantine-fault-tolerant [14] [15]; (3) it is resistant to the Sybil attack [16]. These assumptions concern hot issues of Peer-to-Peer security, which have been extensively studied.

4.2 Three-layer Security Framework

The SEC scheme provides the following three layers of security.

Fragment Security: Fragment security is the base layer, providing information-theoretic security at the fragment level. It ensures that fewer than $m$ fragments reveal nothing about any fragment of the original data. It has been proved in [17] that fewer than $m$ shares necessarily give some information about the secret when the size of each share is smaller than the secret. However, fewer than $m$ fragments can give no information about any single fragment of the original data, because the encoded fragments and the original fragments have the same size.

Distribution Security: The SEC scheme assigns the fragments different meaningless names, which makes it impractical for the adversary to find all the fragments of one data object throughout the large-scale Peer-to-Peer system.

Encryption Security: Even if an adversary collects the necessary $m$ fragments, he/she still has to find the secret matrix $G_{SEC}$ to decrypt the encoded fragments.

Fragment security is important because a single fragment is theoretically meaningless to the adversary. Therefore, with only the locally stored fragment, the adversary cannot obtain any portion of the sensitive data even if a weakness of the cipher is discovered or the adversary obtains the secret key. Thus, the passive adversary is no longer a threat to sensitive data under the SEC scheme, whereas data stored on a passive adversary is the core reason for the potential security problems of the conventional scheme. Fragment security then forces an active adversary to collect enough fragments, which leads to distribution security. Under the system assumptions, an adversary can only try all the candidate combinations of fragments in the whole system for decryption, which is shown to be impractical in the next subsection. Because the adversary cannot easily get all the fragments of the sensitive data, the user has a chance to remove the remaining fragments when a security threat arises. Finally, the strong encryption security of SEC protects the sensitive data even if the adversary collects enough fragments.

Without anonymous routing, the owner information of the fragments would be learned by adversaries, which would significantly weaken the distribution security. If there are Byzantine faults or Sybil attacks, the adversary can attract more upload traffic by attacking routing tables or by using multiple identifiers, and can then reduce the number of candidate fragment combinations to try by exploiting the timing information of concurrent uploads. This also weakens the distribution security.
Though we could store the encrypted fragments of a conventional block cipher by simple segmenting, that simple scheme provides neither fragment security nor distribution security. This is because the adversary can obtain a portion of the sensitive data by decrypting a single fragment, so it is not necessary for the adversary to collect all the fragments. Therefore, fragment security and distribution security are particular properties of the SEC scheme.

4.3 Analysis of Three-layer Security

With respect to the three-layer security framework, we prove the information-theoretic property of the fragment security layer, and give a quantitative analysis of the strength of the distribution security layer and the encryption security layer.

Fragment Security

To prove the information-theoretic property of fragment security, we first give the following lemma.

Lemma 2. Let $G$ be an $m \times m$ matrix. There is no zero element in the inverse matrix $G^{-1}$ if $G$ does not contain any singular $(m-1) \times (m-1)$ submatrix.

Proof: According to the computation of the inverse matrix, $G^{-1} = G^{*}/|G|$, where the entries of the adjugate $G^{*}$ are the $(m-1) \times (m-1)$ minors of $G$. If there is no singular $(m-1) \times (m-1)$ submatrix in $G$, there is no zero element in $G^{*}$, so there is no zero element in $G^{-1}$. Q.E.D. of Lemma 2.

Theorem 3. Let the $m \times n$ matrix $G$ be the secure generator matrix of SEC. The SEC scheme provides information-theoretic security at the fragment level, i.e., the adversary cannot obtain any information about any fragment of the original data from fewer than $m$ fragments, even when the adversary has obtained the matrix $G$.

Proof: Let $(D_1, D_2, \ldots, D_m)$ be the original data vector, and $(E_{i_1}, E_{i_2}, \ldots, E_{i_m})$ be any encoded fragment vector with $m$ elements. Let $G'$ be the $m \times m$ submatrix of $G_{SEC}$ with the columns $i_1, i_2, \ldots, i_m$. Therefore, we have $(D_1, D_2, \ldots, D_m) = (E_{i_1}, E_{i_2}, \ldots, E_{i_m}) G'^{-1}$. According to Lemma 2, $G'^{-1}$ does not have any zero element. As a result, every fragment $D_j$ of the original data must be computed from all $m$ encoded fragments, and fewer than $m$ encoded fragments give no information about any fragment of the original data in the information-theoretic sense. Q.E.D. of Theorem 3.

Distribution Security

Since SEC provides information-theoretic security at the fragment level, the active adversary will try to collect enough fragments throughout the whole system. However, meaningless fragment naming and random placement make it infeasible to collect the sufficient $m$ fragments. For an $(m, n)$ SEC scheme, if there are $N$ possible fragments in the system, there are $C(N,m)$ possible combinations of $m$ fragments while only $C(n,m)$ of them are valid. Thus, the number of trials for the adversary is

$\dfrac{C(N,m)}{C(n,m)} = \dfrac{C(N,m)}{C(m \times r,\, m)}$  (4)

If the adversary knows the $V$ in (3), then the number of trials is

$\min\left\{ \dfrac{C(N,m)}{C(n,m)},\ 2^{l_{k_N}} \right\}$  (5)

where $l_{k_N}$ is the length of $k_N$.

SEC's distribution security leverages the huge number of possible candidate fragments in a Peer-to-Peer system to keep the security of the data object high. From (4), we find that the distribution security strengthens dramatically with the number of candidates $N$, and that a larger redundancy rate results in weaker distribution security. In a real Peer-to-Peer storage system, we can improve the distribution security by increasing $m$, while $N$ is a relatively stable property of the system.
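To give a feel for the magnitudes in (4) and (5), the following hedged Java sketch evaluates the trial counts with exact big-integer arithmetic. The parameter values in main are arbitrary examples for illustration, not figures from the paper, and the final division truncates, which is adequate for an order-of-magnitude reading.

    import java.math.BigInteger;

    // Illustrative evaluation of the distribution-security trial counts (4) and (5).
    public final class DistributionSecurity {
        // C(n, m) computed exactly; each partial product is an integer binomial.
        static BigInteger choose(long n, long m) {
            BigInteger c = BigInteger.ONE;
            for (long i = 1; i <= m; i++)
                c = c.multiply(BigInteger.valueOf(n - m + i))
                     .divide(BigInteger.valueOf(i));
            return c;
        }

        public static void main(String[] args) {
            long bigN = 1_000_000L;        // N: candidate fragments in the system
            int m = 16, r = 2, n = m * r;  // n = m * r encoded fragments per object
            BigInteger trials = choose(bigN, m).divide(choose(n, m)); // equation (4)
            System.out.println("trials for unknown V ~ 2^" + (trials.bitLength() - 1));

            int lkN = 128;                 // l_{k_N}: length of the naming key
            BigInteger withV = trials.min(BigInteger.valueOf(2).pow(lkN)); // equation (5)
            System.out.println("trials for known V   ~ 2^" + (withV.bitLength() - 1));
        }
    }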
Encryption Security

If the worst case happens and the adversary collects the necessary $m$ fragments, SEC falls back on its third line of defense, the encryption security. Here, we give an analysis of the number of trials needed for a brute-force attack.

Lemma 4. Let $\{x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n\}$ and $\{x_1', x_2', \ldots, x_m', y_1', y_2', \ldots, y_n'\}$ be two sequences over $GF(2^L)$, where $m > 1$ and $n > 1$. If

$x_i + y_j = x_i' + y_j', \quad \forall i \in \{1, \ldots, m\}, \forall j \in \{1, \ldots, n\}$

then $x_i = x_i'$, $\forall i \in \{1, \ldots, m\}$, and $y_j = y_j'$, $\forall j \in \{1, \ldots, n\}$.

Proof: Because $m > 1$ and $n > 1$, the following linear equation system always exists ($a$, $b$, $c$ and $d$ are constants):

$\begin{cases} x_1 + y_1 = a = x_1' + y_1' \\ x_1 + y_2 = b = x_1' + y_2' \\ x_2 + y_1 = c = x_2' + y_1' \\ x_2 + y_2 = d = x_2' + y_2' \end{cases}$  (6)

It is clear that there is only one unique solution for both four-variable linear equation systems in (6). Hence, we get $x_1 = x_1'$ and $y_1 = y_1'$. Consequently, we get $y_j = y_j'$, $\forall j \in \{1, \ldots, n\}$, since $x_1 + y_j = x_1' + y_j'$ and $x_1 = x_1'$. For the same reason, we have $x_i = x_i'$, $\forall i \in \{1, \ldots, m\}$. Q.E.D. of Lemma 4.

From Lemma 4, we obtain the following theorem.

Theorem 5. A Cauchy matrix is determined by one and only one sequence $\{x_1, \ldots, x_m, y_1, \ldots, y_n\}$.

Suppose that the adversary has collected the $m$ encoded fragments $(E_{i_1}, E_{i_2}, \ldots, E_{i_m})$; then we have

$(D_1, \ldots, D_m) \begin{pmatrix} \frac{1}{x_1+y_{i_1}} & \cdots & \frac{1}{x_1+y_{i_m}} \\ \vdots & \ddots & \vdots \\ \frac{1}{x_m+y_{i_1}} & \cdots & \frac{1}{x_m+y_{i_m}} \end{pmatrix} = (E_{i_1}, \ldots, E_{i_m})$  (7)

where $(D_1, D_2, \ldots, D_m)$ is the original data vector. Theorem 5 implies that there is only one sequence $\{x_1, \ldots, x_m, y_{i_1}, \ldots, y_{i_m}\}$ that can generate the $m \times m$ matrix used in (7). As a result, the adversary can only guess that very sequence over $GF(2^L)$ for the decryption. Since there are $2 \times m$ distinct elements in the sequence, the adversary must try $P(2^L, 2m)$ times in the worst case. From (2) we obtain

$P(2^L, 2m) = P\!\left(2^{\frac{l_k}{(1+r) \times m}}, 2m\right)$  (8)

From (8), it is obvious that for a given $l_k$ and $r$, the strength is determined by $m$.

Fig. 2: Optimal m for different key lengths l_k and numbers of candidate fragments N with r=1, where L is 8 for all key lengths.

5 Performance Evaluation

Although the analysis has shown that the SEC scheme can provide strong confidentiality for Peer-to-Peer storage systems, it is important for a practical scheme to consume few of the end user's CPU cycles in a Peer-to-Peer environment. In this section, we report performance results of our implementation in the Java language. The experimental results are collected on a desktop computer with a Celeron(R) CPU running at 2.4 GHz and 512 MB of main memory, a typical desktop configuration.

Computation Overheads

There are some computation overheads before the main vector-matrix multiplication in the encoding and decoding phases of the SEC scheme. First of all, SEC generates the log and inverse-log tables for every element of $GF(2^L)$. Then SEC generates the encoding matrix from the specified key for encoding, or the inverse matrix for decoding. We present the performance for a 512-bit key in Table 1. It is clear that the computation overheads are negligible.

Table 1. The computation overheads, $l_k = 512$, $r = 1$, $m = 16$, $L = 16$

    log/inverse-log tables in GF(2^L)    31 ms
    encoding matrix from key             0.157 ms
    inverse matrix for decoding          0.047 ms

Encoding and Decoding Throughput

Now we focus on the main computation part of SEC: the vector-matrix multiplication over $GF(2^L)$. We collect the throughput for encoding a 100 MB data object using different $l_k$ and different $r$, and plot the results in Fig. 3. From the experimental results, we get

$\text{encoding throughput} \propto \dfrac{1+r}{l_k}$  (10)

which can also be deduced from [11]. The throughput of the decoding procedure is the same as that of the encoding procedure, since the vector-matrix computations are the same.
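For completeness, a rough sketch of how such a throughput figure can be collected is shown below; it times the $GF(2^8)$ encode kernel from the earlier sketches. The buffer size, parameters, and single-pass timing are our own simplifications, not the paper's benchmark harness, so absolute numbers will differ from Fig. 3.

    import java.util.Random;

    // Illustrative throughput measurement for the GF(2^8) encode kernel.
    public final class EncodeBenchmark {
        public static void main(String[] args) {
            int m = 4;                                   // symbols per input vector
            int[][] g = CauchyMatrix.cauchy(new int[]{1, 2, 3, 4},
                                            new int[]{5, 6, 7, 8, 9, 10, 11, 12});
            byte[] data = new byte[16 * 1024 * 1024];    // 16 MB of random input
            new Random(42).nextBytes(data);

            int[] vec = new int[m];
            long start = System.nanoTime();
            for (int off = 0; off + m <= data.length; off += m) {
                for (int i = 0; i < m; i++) vec[i] = data[off + i] & 0xFF;
                Gf256Encode.encode(vec, g);              // E = D * G for one vector
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("encode throughput: %.1f MB/s%n",
                              data.length / (1024.0 * 1024.0) / secs);
        }
    }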
From (10) we get the following implications. First, the encoding and decoding throughputs are both independent of the parameter $m$ when the key length is fixed; therefore, we can use the optimal parameter $m$ discussed in the prior section without influencing the throughput. Second, a larger $r$ results in better throughput, though it decreases the security strength. Third, we can use a shorter key to achieve better performance when SEC is used together with a block cipher and the block cipher provides strong enough encryption security.

Fig. 3 shows that the performance of SEC is good enough. When using a 128-bit key with $r=1$, SEC achieves a throughput of 20 MB/s. Put differently, the SEC scheme costs only 7% of the CPU cycles of a typical wide-area peer with 10 Mb/s bandwidth. If a short key and a large redundancy rate are used, the SEC scheme can even achieve a throughput of over 200 MB/s.

6 Application of SEC Scheme

Stand-alone Encryption

The SEC scheme provides information-theoretic security against the passive adversary and computational security against the active adversary. Therefore, SEC alone can be used as an encryption scheme while providing data redundancy as well. The encryption security of the SEC scheme can be very strong with $r=1$; e.g., the number of trials for a 128-bit key can be $2^{127.99}$. Though the encryption security drops when $r$ increases, the distribution security of SEC still keeps the overall security very strong. For instance, if the designer uses a SEC scheme with $r=2$ and $l_k \le 128$ in a system with $N = 10^4$, the upper bound of the encryption security is only $2^{88}$; however, when we use a 120-bit key with the parameter $m=8$, the overall number of trials for the active adversary can reach $2^{150.8}$.

The parameters of the SEC scheme can be adaptively configured to satisfy given requirements on performance, security, and redundancy. First, according to (10), SEC determines the upper bound of $l_k$ for the given $r$ and the performance criteria. Second, according to the estimated system parameter $N$, SEC looks for the optimal $m$ and $l_k$ that satisfy both (1) and (2) such that the overall security of (9) meets the designer's requirement. If the optimal $m$ is found, the performance, the redundancy, and the security requirements are all satisfied by the selected parameters.

When used in stand-alone mode, the vector-matrix multiplication may be vulnerable to known-plaintext attacks. To solve this problem, we can simply use the hash value of the secret key and the object identifier as the encryption key for each data object (see the sketch at the end of this section).

Working with Block Cipher

The SEC scheme can also be used together with a conventional block cipher, since block ciphers are known to have strong encryption security and high performance; for example, AES with a 128-bit key has a throughput of 61 MB/s [25]. When they are used together, the block cipher first encrypts the data object, and then SEC encrypts the encrypted data and distributes the fragments. As SEC is then mainly responsible for the fragment security and the distribution security, a long encryption key length $l_k$ is unnecessary, which results in a better SEC throughput. As a result, the overall performance is still good enough.
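As promised above, a hedged Java sketch of the per-object key derivation used against known-plaintext attacks follows. The choice of SHA-1, the rehash-to-extend loop, and all names are illustrative assumptions of ours; the paper fixes only the idea of hashing the secret key together with the object identifier.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.Arrays;

    // Illustrative per-object key derivation: k_obj = Hash(k_G || objectId),
    // extended to the l_k bits needed to build G_SEC for this object.
    public final class PerObjectKey {
        static byte[] deriveKey(byte[] masterKey, String objectId, int lkBits)
                throws Exception {
            MessageDigest h = MessageDigest.getInstance("SHA-1");
            byte[] out = new byte[(lkBits + 7) / 8];
            byte[] block = h.digest(concat(masterKey,
                    objectId.getBytes(StandardCharsets.UTF_8)));
            int filled = 0;
            while (filled < out.length) {   // re-hash to extend when l_k > 160 bits
                int take = Math.min(block.length, out.length - filled);
                System.arraycopy(block, 0, out, filled, take);
                filled += take;
                block = h.digest(block);
            }
            return out;
        }

        static byte[] concat(byte[] a, byte[] b) {
            byte[] r = Arrays.copyOf(a, a.length + b.length);
            System.arraycopy(b, 0, r, a.length, b.length);
            return r;
        }
    }

With this in place, identical plaintexts in different objects are encoded under different G_SEC matrices, so a known plaintext for one object reveals nothing about another.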
7 Related Works

Confidentiality is an essential issue for every Peer-to-Peer storage system. Most of the proposed systems, such as [1] [2] [3] [4] [5] and [6], simply employ a standard block cipher algorithm.

Shamir develops a secret sharing algorithm in [19], where fewer than a threshold number of fragments give no information about the data. Unfortunately, each share of the secret is the same size as the secret, so it is not space-efficient for a storage system. The IDA algorithm [20] is also a secret sharing algorithm, where each share is smaller than the secret; IDA can be applied to routing for parallel computers and to reliable storage. [21] proposes a scheme combining XOR secret sharing and replication to eliminate the key management problem in collaborative environments. PASIS [18] uses a threshold scheme for both confidentiality and durability, where threshold schemes include secret sharing schemes by definition. Unlike these threshold schemes, SEC can work alone as an encryption function because it encrypts data using the user-specified secret key. Moreover, with the fragment naming sub-algorithm, SEC is de-