毕业论文外文翻译-数据仓库
- 格式:doc
- 大小:62.50 KB
- 文档页数:13
英文翻译数据库管理系统的介绍Raghu Ramakrishnan数据库(database,有时被拼作data base)又称为电子数据库,是专门组织起来的一组数据或信息,其目的是为了便于计算机快速查询及检索。
数据库的结构是专门设计的,在各种数据处理操作命令的支持下,可以简化数据的存储、检索、修改和删除。
数据库可以存储在磁盘、磁带、光盘或其他辅助存储设备上。
数据库由一个或一套文件组成,其中的信息可以分解为记录,每一条记录又包含一个或多个字段(或称为域)。
字段是数据存取的基本单位。
数据库用于描述实体,其中的一个字段通常表示与实体的某一属性相关的信息。
通过关键字以及各种分类(排序)命令,用户可以对多条记录的字段进行查询,重新整理,分组或选择,以实体对某一类数据的检索,也可以生成报表。
所有数据库(除最简单的)中都有复杂的数据关系及其链接。
处理与创建,访问以及维护数据库记录有关的复杂任务的系统软件包叫做数据库管理系统(DBMS)。
DBMS软件包中的程序在数据库与其用户间建立接口。
(这些用户可以是应用程序员,管理员及其他需要信息的人员和各种操作系统程序)DBMS可组织、处理和表示从数据库中选出的数据元。
该功能使决策者能搜索、探查和查询数据库的内容,从而对正规报告中没有的,不再出现的且无法预料的问题做出回答。
这些问题最初可能是模糊的并且(或者)是定义不恰当的,但是人们可以浏览数据库直到获得所需的信息。
简言之,DBMS将“管理”存储的数据项和从公共数据库中汇集所需的数据项用以回答非程序员的询问。
DBMS由3个主要部分组成:(1)存储子系统,用来存储和检索文件中的数据;(2)建模和操作子系统,提供组织数据以及添加、删除、维护、更新数据的方法;(3)用户和DBMS之间的接口。
在提高数据库管理系统的价值和有效性方面正在展现以下一些重要发展趋势:1.管理人员需要最新的信息以做出有效的决策。
2.客户需要越来越复杂的信息服务以及更多的有关其订单,发票和账号的当前信息。
外文文献-中文翻译-数据库英文原文2:《DBA Survivor: Become a Rock Star DBA》by Thomas LaRock,Published By Apress.2010You know that a database is a collection of logically related data elements that may be structured in various ways lo meet the multiple processing and retrieval needs of organizations and individuals. There’s nothing new about databases—early ones were chiseled in stone, penned on scrolls, and written on index cards. But now databases are commonly recorded on magnetizable media, and computer programs are required to perform the necessary storage and retrieval operations.Yo u’ll see in the following pages that complex data relationships and linkages may be found in all but the simplest databases. The system software package that handles the difficult tasks associated with creating, accessing, and maintaining database records is called a database management system (DBMS) .The programs in a DBMS package establish an interface between the database itself and the users of the database. (These users may be applications programmers, managers and others with information needs, and various OS programs.)A DBMS can organize, process, and present selected data elements from the database. This capability enables decision makers to search, probe, and query database contents in order to extract answers to nonrecurring and unplanned questions (hat aren't available in regular reports. These questions might initially be vague and / or poorly defined, but peo ple can "browse” through the database until they have the needed information. Inshort, the DBMS will “m anage”the stored data items and assemble the needed items from the common database in response to the queries of those who aren’t10programmers. In a file-oriented system, users needing special information may communicate their needs to a programmer, who, when time permits, will write one or more programs to extract the data and prepare the information[4].The availability of a DBMS, however, offers users a much faster alternative communications path.If the DBMS provides a way to interactively and update the database, as well as interrogate it capability allows for managing personal data-Aces however, it does not automatically leave an audit trail of actions and docs not provide the kinds of control a necessary in a multiuser organization. These-controls arc only available when a set of application programs arc customized for each data entry and updating function.Software for personal computers which perform me of the DBMS functions have been very popular. Personal computers were intended for use by individuals for personal information storage and process- These machines have also been used extensively small enterprises, professionals like doctors, acrylics, engineers, lasers and so on .By the nature of intended usage, database systems on these machines except from several of the requirements of full doge database systems. Since data sharing is not tended, concurrent operations even less so. the fewer can be less complex. Security and integrity maintenance arc de-emphasized or absent. As data limes will be small, performance efficiency is also important. In fact, the only aspect of a database system that is important is data Independence. Data-dependence, as stated earlier, means that applicant programs and user queries need not recognizant physical organization of data on secondary storage. The importance of this aspect, particularly for the personal computer user, is that this greatly simplifies database usage. The user can store, access and manipulate data a( a high level (close to (he application) and be totally shielded from the10low level (close to the machine) details of data organization. We will not discuss details of specific PC DBMS software packages here. Let us summarize in the following the strengths and weaknesses of personal computer data-base software systems:The most obvious positive factor is the user friendliness of the software. A user with no prior computer background would be able to use the system to store personal and professional data, retrieve and perform relayed processing. The user should, of course, satiety himself about the quality of software and the freedom from errors (bugs) so that invest-merits in data arc protected.For the programmer implementing applications with them, the advantage lies in the support for applications development in terms of input screen generations, output report generation etc. offered by theses stems.The main negative point concerns absence of data protection features. Unless encrypted, data cane accessed by whoever has access to the machine Data can be destroyed through mistakes or malicious intent. The second weakness of many of the PC-based systems is that of performance. If data volumes grow up to a few thousands of records, performance could be a bottleneck.For organization where growth in data volumes is expected, availability of. the same or compatible software on large machines should be considered.This is one of the most common misconceptions about database management systems that are used in personal computers. Thoroughly comprehensive and sophisticated business systems can be developed in dBASE, Paradox and other DBMSs. However, they are created by experienced programmers using the DBMS's own programming language. Thai is not the same as users who create and manage personal10files that are not part of the mainstream company system.Transaction Management of DatabaseThe objective of long-duration transactions is to model long-duration, interactive Database access sessions in application environments. The fundamental assumption about short-duration of transactions that underlies the traditional model of transactions is inappropriate for long-duration transactions. The implementation of the traditional model of transactions may cause intolerably long waits when transactions aleph to acquire locks before accessing data, and may also cause a large amount of work to be lost when transactions are backed out in response to user-initiated aborts or system failure situations.The objective of a transaction model is to pro-vide a rigorous basis for automatically enforcing criterion for database consistency for a set of multiple concurrent read and write accesses to the database in the presence of potential system failure situations. The consistency criterion adopted for traditional transactions is the notion of scrializability. Scrializa-bility is enforced in conventional database systems through theuse of locking for automatic concurrency control, and logging for automatic recovery from system failure situations. A “transaction’’ that doesn't provide a basis for automatically enforcing data-base consistency is not really a transaction. To be sure, a long-duration transaction need not adopt seri-alizability as its consistency criterion. However, there must be some consistency criterion.Version System Management of DatabaseDespite a large number of proposals on version support in the context of computer aided design and software engineering, the absence of a consensus on version semantics10has been a key impediment to version support in database systems. Because of the differences between files and databases, it is intuitively clear that the model of versions in database systems cannot be as simple as that adopted in file systems to support software engineering.For data-bases, it may be necessary to manage not only versions of single objects (e.g. a software module, document, but also versions of a collection of objects (e.g. a compound document, a user manual, etc. and perhaps even versions of the schema of database (c.g. a table or a class, a collection of tables or classes).Broadly, there arc three directions of research and development in versioning. First is the notion of a parameterized versioning", that is, designing and implementing a versioning system whose behavior may be tailored by adjusting system parameters This may be the only viable approach, in view of the fact that there are various plausible choices for virtually every single aspect of versioning.The second is to revisit these plausible choices for every aspect of versioning, with the view to discardingsome of themes either impractical or flawed. The third is the investigation into the semantics and implementation of versioning collections of objects and of versioning the database.There is no consensus of the definition of the te rm “management information system”. Some writers prefer alternative terminology such as “information processing system”, "information and decision syste m, “organizational information syste m”, or simply “i nformat ion system” to refer to the computer-based information processing system which supports the operations, management, and decision-making functions of an organization. This text uses “MIS” because i t is descriptive and generally understood; it also frequently uses "information system”instead of ''MIS” t o refer to an organizational information system.10A definition of a management information system, as the term is generally understood, is an integrated, user-machine system for providing information 丨o support operations, management, and decision-making functions in an organization. The system utilizes computer hardware and software; manual procedures: models for analysis planning, control and decision making; and a database. The fact that it is an integrated system does not mean that it is a single, monolithic structure: rather, ii means that the parts fit into an overall design. The elements of the definition arc highlighted below: Computer-based user-machine system.Conceptually, a management information can exist without computer, but it is the power of the computer which makes MIS feasible. The question is not whether computers should be used in management information system, but the extent to whichinformation use should be computerized. The concept of a user-machine system implies that some (asks are best performed humans, while others are best done by machine. The user of an MIS is any person responsible for entering input da(a, instructing the system, or utilizing the information output of the system. For many problems, the user and the computer form a combined system with results obtained through a set of interactions between the computer and the user.User-machine interaction is facilitated by operation in which the user's input-output device (usually a visual display terminal) is connected lo the computer. The computer can be a personal computer serving only one user or a large computer that serves a number of users through terminals connected by communication lines. The user input-output device permits direct input of data and immediate output of results. For instance, a person using The computer interactively in financial planning poses 4t what10if* questions by entering input at the terminal keyboard; the results are displayed on the screen in a few second.The computer-based user-machine characteristics of an MIS affect the knowledge requirements of both system developer and system user, “computer-based” means that the designer of a management information system must have a knowledge of computers and of their use in processing. The “user-machine” concept means the system designer should also understand the capabilities of humans as system components (as information processors) and the behavior of humans as users of information.Information system applications should not require users Co be computer experts. However, users need to be able lo specify(heir information requirements; some understanding of computers, the nature of information, and its use in various management function aids users in this task.Management information system typically provide the basis for integration of organizational information processing. Individual applications within information systems arc developed for and by diverse sets of users. If there are no integrating processes and mechanisms, the individual applications may be inconsistent and incompatible. Data item may be specified differently and may not be compatible across applications that use the same data. There may be redundant development of separate applications when actually a single application could serve more than one need. A user wanting to perform analysis using data from two different applications may find the task very difficult and sometimes impossible.The first step in integration of information system applications is an overall information system plan. Even though application systems are implemented one at a10time, their design can be guided by the overall plan, which determines how they fit in with other functions. In essence, the information system is designed as a planed federation of small systems.Information system integration is also achieved through standards, guidelines, and procedures set by the MIS function. The enforcement of such standards and procedures permit diverse applications to share data, meet audit and control requirements, and be shares by multiple users. For instance, an application may be developed to run on a particular small computer. Standards for integration may dictate that theequipment selected be compatible with the centralized database. The trend in information system design is toward separate application processing form the data used to support it. The separate database is the mechanism by which data items are integrated across many applications and made consistently available to a variety of users. The need for a database in MIS is discussed below.The term “information” and “data” are frequently used interchangeably; However, information is generally defined as data that is meaningful or useful to The recipient. Data items are therefore the raw material for producing information.The underlying concept of a database is that data needs to be managed in order to be available for processing and have appropriate quality. This data management includes both software and organization. The software to create and manage a database is a database management system.When all access to any use of database is controlled through a database management system, all applications utilizing a particular data item access the same data item which is stored in only one place. A single updating of the data item updates it for10all uses. Integration through a database management system requires a central authority for the database. The data can be stored in one central computer or dispersed among several computers; the overriding requirement is that there be an organizational function to exercise control.It is usually insufficient for human recipients to receive only raw data or even summarized data. Data usually needs to be processed and presented in such a way that Che result is directed toward the decision to be made. To do this, processing of dataitems is based on a decision model.For example, an investment decision relative to new capital expenditures might be processed in terms of a capital expenditure decision model.Decision models can be used to support different stages in the decision-making process. “Intelligence’’ models can be used to search for problems and/or opportunities. Models can be used to identify and analyze possible solutions. Choice models such as optimization models maybe used to find the most desirable solution.In other words, multiple approaches are needed to meet a variety of decision situations. The following are examples and the type of model that might be included in an MIS to aid in analysis in support of decision-making; in a comprehensive information system, the decision maker has available a set of general models that can be applied to many analysis and decision situations plus a set of very specific models for unique decisions. Similar models are available tor planning and control. The set of models is the model base for the MIS.Models are generally most effective when the manager can use interactive dialog (o build a plan or to iterate through several decision choices under different conditions.10中文译文2:《数据库幸存者:成为一个摇滚名明星》众所周知,数据库是逻辑上相关的数据元的汇集.这些数据元可以按不同的结构组织起来,以满足单位和个人的多种处理和检索的需要。
Inventory managementInventory ControlOn the so-called "inventory control", many people will interpret it as a "storage management", which is actually a big distortion.The traditional narrow view, mainly for warehouse inventory control of materials for inventory, data processing, storage, distribution, etc., through the implementation of anti-corrosion, temperature and humidity control means, to make the custody of the physical inventory to maintain optimum purposes. This is just a form of inventory control, or can be defined as the physical inventory control. How, then, from a broad perspective to understand inventory control? Inventory control should be related to the company's financial and operational objectives, in particular operating cash flow by optimizing the entire demand and supply chain management processes (DSCM), a reasonable set of ERP control strategy, and supported by appropriate information processing tools, tools to achieved in ensuring the timely delivery of the premise, as far as possible to reduce inventory levels, reducing inventory and obsolescence, the risk of devaluation. In this sense, the physical inventory control to achieve financial goals is just a means to control the entire inventory or just a necessary part; from the perspective of organizational functions, physical inventory control, warehouse management is mainly the responsibility of The broad inventory control is the demand and supply chain management, and the whole company's responsibility.Why until now many people's understanding of inventory control, limited physical inventory control? The following two reasons can not be ignored: First, our enterprises do not attach importance to inventory control. Especially those who benefit relatively good business, as long as there is money on the few people to consider the problem of inventory turnover. Inventory control is simply interpreted as warehouse management, unless the time to spend money, it may have been to see the inventory problem, and see the results are often very simple procurement to buy more, or did not do warehouse departments .Second, ERP misleading. Invoicing software is simple audacity to call it ERP, companies on their so-called ERP can reduce the number of inventory, inventory control, seems to rely on their small software can get. Even as SAP, BAAN ERP world, the field of these big boys, but also their simple modules inside the warehouse management functionality is defined as "inventory management" or "inventory control." This makes the already not quite understand what our inventory control, but not sure what is inventory control.In fact, from the perspective of broadly understood, inventory control, should include the following:First, the fundamental purpose of inventory control. We know that the so-called world-class manufacturing, two key assessment indicators (KPI) is, customer satisfaction and inventory turns, inventory turns and this is actually the fundamental objective of inventory control.Second, inventory control means. Increase inventory turns, relying solely on the so-called physical inventory control is not enough, it should be the demand and supply chain management process flow of this large output, and this big warehouse management processes in addition to including this link, the more important The section also includes: forecasting and order processing, production planning and control, materials planning and purchasing control, inventory planning and forecasting in itself, as well as finished products, raw materials, distribution and delivery of the strategy, and even customs management processes. And with the demand and supply chain management processes throughout the process, it is the information flow and capital flow management. In other words, inventory itself is across the entire demand and supply management processes in all aspects of inventory control in order to achieve the fundamental purpose, it must control all aspects of inventory, rather than just manage the physical inventory at hand.Third, inventory control, organizational structure and assessment. Since inventory control is the demand and supply chain management processes, output, inventory control to achieve the fundamental purpose of this process must be compatible with a rational organizational structure. Until now, we can seethat many companies have only one purchasing department, purchasing department following pipe warehouse. This is far short of inventory control requirements. From the demand and supply chain management process analysis, we know that purchasing and warehouse management is the executive arm of the typical, and inventory control should focus on prevention, the executive branch is very difficult to "prevent inventory" for the simple reason that they assessment indicators in large part to ensure supply (production, customer). How the actual situation, a reasonable demand and supply chain management processes, and thus set the corresponding rational organizational structure and is a question many of our enterprises to exploreThe role of inventory controlInventory management is an important part of business management. In the production and operation activities, inventory management must ensure that both the production plant for raw materials, spare parts demand, but also directly affect the purchasing, sales of share, sales activities. To make an inventory of corporate liquidity, accelerate cash flow, the security of supply under the premise of minimizing Yaku funds, directly affects the operational efficiency. Ensure the production and operation needs of the premise, so keep inventories at a reasonable level; dynamic inventory control, timely, appropriate proposed order to avoid over storage or out of stock; reduce inventory footprint, lower total cost of inventory; control stock funds used to accelerate cash flow.Problems arising from excessive inventory: increased warehouse space and inventory storage costs, thereby increasing product costs; take a lot of liquidity, resulting in sluggish capital, not only increased the burden of payment of interest, etc., would affect the time value of money and opportunity income; finished products and raw materials caused by physical loss and intangible losses; a large number of enterprise resource idle, affecting their rational allocation and optimization; cover the production, operation of the whole process of the various contradictions and problems, is not conducive to improve the management level.Inventory is too small the resulting problems: service levels caused a decline in the profit impact of marketing and corporate reputation; production system caused by inadequate supply of raw materials or other materials, affecting the normal production process; to shorten lead times, increase the number of orders, so order (production) costs; affect the balance of production and assembly of complete sets.NotesInventory management should particularly consider the following two questions:First, according to sales plans, according to the planned production of the goods circulated in the market, we should consider where, how much storage.Second, starting from the level of service and economic benefits to determine how to ensure inventories and supplementary questions.The two problems with the inventory in the logistics process functions. In general, the inventory function:(1) to prevent interrupted. Received orders to shorten the delivery of goods from the time in order to ensure quality service, at the same time to prevent out of stock.(2) to ensure proper inventory levels, saving inventory costs.(3) to reduce logistics costs. Supplement with the appropriate time interval compatible with the reasonable demand of the cargo in order to reduce logistics costs, eliminate or avoid sales fluctuations.(4) ensure the production planning, smooth to eliminate or avoid sales fluctuations.(5) display function.(6) reserve. Mass storage when the price falls, reduce losses, to respond to disasters and other contingencies.About the warehouse (inventory) on what the question, we must consider the number and location. If the distribution center, it should be possible according to customer needs, set at an appropriate place; if it is stored incentral places to minimize the complementary principle to the distribution centers, there is no place certain requirements. When the stock base is established, will have to take into account are stored in various locations in what commodities.库存管理库存控制在谈到所谓“库存控制”的时候,很多人将其理解为“仓储管理”,这实际上是个很大的曲解。
附件1:外文资料翻译译文数据库简介1.数据库管理系统(DBMS)。
众所周知,数据库是逻辑上相关的数据元的集合。
这些数据元可以按不同的结构组织起来,以满足单位和个人的多种处理和检索的需要。
数据库本身不是什么新鲜事——早期的数据库记录在石头上或写在名册上,以及写入索引卡中。
而现在,数据库普遍记录再可磁化的介质上,并且需要用计算机程序来执行必需的存储和检索操作。
在后文中你将看到除了简单的以外,所有数据库中都有复杂的数据关系及其连接。
处理与创建、访问以及维护数据库记录有关的复杂任务的系统软件包叫做数据库管理系统(DBMS)。
DBMS软件包中的程序在数据库极其用户间建立了接口(这些用户可以是应用程序员、管理员以及其他需要信息和各种操作系统的人员)。
DBMS可组织、处理和显示从数据库中选择的数据元。
该功能使决策者可以搜索、试探和查询数据库的内容,从而对在正式报告中没有的、不再出现的且无计划的问题作出回答。
这些问题最初可能是模糊的并且是定义不清的,但是人们可以浏览数据库直到获得问题的答案。
也就是说DBMS将“管理”存储的数据项,并从公共数据库中汇集所需的数据项以回答那些非程序员的询问。
在面向文件的系统中,需要特定信息的用户可以将他们的要求传送给程序员。
该程序员在时间允许时,将编写一个或多个程序以提取数据和准备信息。
但是,使用DBMS可为用户提供一种更快的、用户可以选择的通信方式。
顺序的、直接的以及其它的文件处理方式常用于单个文件中数据的组织和构造,而DBMS能够访问和检索非关键记录字段的数据,即DBMS能够将几个大文件中逻辑相关的数据组织并连接在一起。
逻辑结构。
确定这些逻辑关系是数据管理者的任务,由数据定义语言完成。
DBMS 在存储、访问和检索操作过程中可选用以下逻辑结构技术:(1)表结构。
在该逻辑方式中,记录通过指针链接在一起。
指针是记录中的一个数据项,它指出另一个逻辑相关的记录的存储位置,例如,顾客主文件的记录将包含每个顾客的姓名和地址,而且该文件中的每个记录都由一个帐号标识。
中英文对照外文翻译文献(文档含英文原文和中文翻译)Database Security in a Web Environment IntroductionDatabases have been common in government departments and commercial enterprises for many years. Today, databases in any organization are increasingly opened up to a multiplicity of suppliers, customers, partners and employees - an idea that would have been unheard of a few years ago. Numerous applications and their associated data are now accessed by a variety of users requiring different levels of access via manifold devices and channels – often simultaneously. For example:• Online banks allow customers to perform a variety of banking operations - via the Internet and over the telephone – whilst maintaining the privacy of account data.• E-Commerce merchants and their Service Providers must store customer, order and payment data on their merchant server - and keep it secure.• HR departments allow employees to update their personal information –whilst protecting certain management information from unauthorized access.• The medical profession must protect the confidentiality of patient data –whilst allowing essential access for treatment.• Online brokerages need to be able to provide large numbers of simultaneous users with up-to-date and accurate financial information.This complex landscape leads to many new demands upon system security. The global growth of complex web-based infrastructures is driving a need for security solutions that provide mechanisms to segregate environments; perform integrity checking and maintenance; enable strong authentication andnon-repudiation; and provide for confidentiality. In turn, this necessitates comprehensive business and technical risk assessment to identify the threats,vulnerabilities and impacts, and from this define a security policy. This leads to security definitions throughout the infrastructure - operating system, database management system, middleware and network.Financial, personal and medical information systems and some areas of government have strict requirements for security and privacy. Inappropriate disclosure of sensitive information to the wrong parties can have severe social, legal and regulatory consequences. Failure to address the basics can result in substantial direct and consequential financial losses - witness the fraud losses through the compromise of several million credit card numbers in merchants’ databases [Occf], plus associated damage to brand-image and loss of consumer confidence.This article discusses some of the main issues in database and web server security, and also considers important architecture and design issues.A Simple ModelAt the simplest level, a web server system consists of front-end software and back-end databases with interface software linking the two. Normally, the front-end software will consist of server software and the network server operating system, and the back-end database will be a relational orobject-oriented database fulfilling a variety of functions, including recording transactions, maintaining accounts and inventory. The interface software typically consists of Common Gateway Interface (CGI) scripts used to receive information from forms on web sites to perform online searches and to update the database.Depending on the infrastructure, middleware may be present; in addition, security management subsystems (with session and user databases) that address the web server’s and related applications’ requirements for authentication, accesscontrol and authorization may be present. Communications between this subsystem and either the web server, middleware or database are via application program interfaces (APIs)..This simple model is depicted in Figure 1.Security can be provided by the following components:• Web server.• Middleware.• Operating system.. Figure 1: A Simple Model.• Database and Database Management System.• Security management subsystem.The security of such a system addressesAspects of authenticity, integrity and confidentiality and is dependent on the security of the individual components and their interactions. Some of the most common vulnerabilities arise from poor configuration, inadequate change control procedures and poor administration. However, even if these areas are properlyaddressed, vulnerabilities still arise. The appropriate combination of people, technology and processes holds the key to providing the required physical and logical security. Attention should additionally be paid to the security aspects of planning, architecture, design and implementation.In the following sections, we consider some of the main security issues associated with databases, database management systems, operating systems and web servers, as well as important architecture and design issues. Our treatment seeks only to outline the main issues and the interested reader should refer to the references for a more detailed description.Database SecurityDatabase management systems normally run on top of an operating system and provide the security associated with a database. Typical operating system security features include memory and file protection, resource access control and user authentication. Memory protection prevents the memory of one program interfering with that of another and limits access and use of the objects employing techniques such as memory segmentation. The operating system also protects access to other objects (such as instructions, input and output devices, files and passwords) by checking access with reference to access control lists. Security mechanisms in common operating systems vary tremendously and, for those that are lacking, there exists special-purpose security software that can be integrated with the existing environment. However, this can be an expensive, time-consuming task and integration difficulties may also adversely impact application behaviors.Most database management systems consist of a number of modules - including database querying and database and file management - along with authorization, concurrent access and database description tables. Thesemanagement systems also use a variety of languages: a data definition language supports the logical definition of the database; developers use a data manipulation language; and a query language is used by non-specialist end-users.Database management systems have many of the same security requirements as operating systems, but there are significant differences since the former are particularly susceptible to the threat of improper disclosure, modification of information and also denial of service. Some of the most important security requirements for database management systems are: • Multi-Level Access Control.• Confidentiality.• Reliability.• Integrity.• Recovery.These requirements, along with security models, are considered in the following sections.Multi-Level Access ControlIn a multi-application and multi-user environment, administrators, auditors, developers, managers and users – collectively called subjects - need access to database objects, such as tables, fields or records. Access control restricts the operations available to a subject with respect to particular objects and is enforced by the database management system. Mandatory access controls require that each controlled object in the database must be labeled with a security level, whereas discretionary access controls may be applied at the choice of a subject.Access control in database management systems is more complicated than in operating systems since, in the latter, all objects are unrelated whereas in a database the converse is true. Databases are also required to make accessdecisions based on a finer degree of subject and object granularity. In multi-level systems, access control can be enforced by the use of views - filtered subsets of the database - containing the precise information that a subject is authorized to see.A general principle of access control is that a subject with high level security should not be able to write to a lower level object, and this poses a problem for database management systems that must read all database objects and write new objects. One solution to this problem is to use a trusted database management system.ConfidentialitySome databases will inevitably contain what is considered confidential data. For example, it could be inherently sensitive or its source may be sensitive, or it may belong to a sensitive table, thus making it difficult to determine what is actually confidential. Disclosure is also difficult to define, as it can be direct, indirect, involve the disclosure of bounds or even mere existence.An inference problem exists in database management systems whereby users can infer sensitive information from relatively insensitive queries. A trivial example is a request for information about the average salary of an employee and the number of employees turns out to be just one, thus revealing the employee’s salary. However, much more sophisticated statistical inference attacks can also be mounted. This highlights the fact that, although the data itself may be properly controlled, confidential information may still leak out.Controls can take several forms: not divulging sensitive information to unauthorized parties (which depends on the respective subject and object security levels), logging what each user knows or masking response data. The first control can be implemented fairly easily, the second quickly becomesunmanageable for a large number of users and the third leads to imprecise responses, and also exemplifies the trade-off between precision and security. Polyinstantiation refers to multiple instances of a data object existing in the database and it can provide a partial solution to the inference problem whereby different data values are supplied, depending on the security level, in response to the same query. However, this makes consistency management more difficult.Another issue that arises is when the security level of an aggregate amount is different to that of its elements (a problem commonly referred to as aggregation). This can be addressed by defining appropriate access control using views.Reliability, Integrity and RecoveryArguably, the most important requirements for databases are to ensure that the database presents consistent information to queries and can recover from any failures. An important aspect of consistency is that transactions execute atomically; that is, they either execute completely or not at all.Concurrency control addresses the problem of allowing simultaneous programs access to a shared database, while avoiding incorrect behavior or interference. It is normally addressed by a scheduler that uses locking techniques to ensure that the transactions are serial sable and independent. A common technique used in commercial products is two-phase locking (or variations thereof) in which the database management system controls when transactions obtain and release their locks according to whether or not transaction processing has been completed. In a first phase, the database management system collects the necessary data for the update: in a second phase, it updates the database. This means that the database can recover from incomplete transactions by repeatingeither of the appropriate phases. This technique can also be used in a distributed database system using a distributed scheduler arrangement.System failures can arise from the operating system and may result in corrupted storage. The main copy of the database is used for recovery from failures and communicates with a cached version that is used as the working version. In association with the logs, this allows the database to recover to a very specific point in the event of a system failure, either by removing the effects of incomplete transactions or applying the effects of completed transactions. Instead of having to recover the entire database after a failure, recovery can be made more efficient by the use of check pointing. It is used during normal operations to write additional updated information - such as logs, before-images of incomplete transactions, after-images of completed transactions - to the main database which reduces the amount of work needed for recovery. Recovery from failures in distributed systems is more complicated, since a single logical action is executed at different physical sites and the prospect of partial failure arises.Logical integrity, at field level and for the entire database, is addressed by the use of monitors to check important items such as input ranges, states and transitions. Error-correcting and error-detecting codes are also used.Security ModelsVarious security models exist that address different aspects of security in operating systems and database management systems. For example, theBell-LaPadula model defines security in terms of mandatory access control and addresses confidentiality only. The Bell LaPadula models, and other models including the Biba model for integrity, are described more fully in [Cast95] and [Pfle89]. These models are implementation-independent and provide a powerfulinsight into the properties of secure systems, lead to design policies and principles, and some form the basis for security evaluation criteria.Web Server SecurityWeb servers are now one of the most common interfaces between users and back-end databases, and as such, their security becomes increasingly important. Exploitation of vulnerabilities in the web server can lead to unforeseen attacks on middleware and backend databases, bypassing any controls that may be in place. In this section, we focus on common web server vulnerabilities and how the authentication requirements of web servers and databases are met.In general, a web server platform should not be shared with other applications and should be the only machine allowed to access the database. Using a firewall can provide additional security - either between the web server and users or between the web server and back-end database - and often the web server is placed on a de-militarized zone (DMZ) of a firewall. While firewalls can be used to block certain incoming connections, they must allow HTTP (and HTTPS) connections through to the web server, and so attacks can still be launched via the ports associated with these connections.VulnerabilitiesVulnerabilities appear on a weekly basis and, here, we prefer to focus on some general issues rather than specific attacks. Common web server vulnerabilities include:• No policy exists.• The default configuration is on.• Reusable passwords appear in clear.• Unnecessary ports available for network services are not disabled.• New security holes are not tracked. Even if they are, well-known vulnerabilities are not always fixed as the source code patches are not applied by system administrator and old programs are not re-compiled or removed.• Security tools are not used to scan the network for weaknesses and changes or to detect intrusions.• Faulty and buggy software - for example, buffer overflow and stack smashingAttacks• Automatic directory listings - this is of particular concern for the interface software directories.• Server root files are generally visible or accessible.• Lack of logs and bac kups.• File access is often not explicitly configured by the system administrator according to the security policy. This applies to configuration, client, administration and log files, administration programs, and CGI program sources and executables. CGI scripts allow dynamic web pages and make program development (in, for example, Perl) easy and rapid. However, their successful exploitation may allow execution of malicious programs, launching ofdenial-of-service attacks and, ultimately, privilege escalation on a server.Web Server and Database AuthenticationWhile user, browser and web server authentication are relatively well understood [Garf97], [Ghos98] and [Tree98], the introduction of additional components, such as databases and middleware, raise a number of authentication issues. There are a variety of options for authentication in a simple model (Figure 1). Firstly, both the web server and database management system can individually authenticate a user. This option requires the user to authenticatetwice which may be unacceptable in certain applications, although a singlesign-on device (which aims to manage authentication in a user-transparent way) may help. Secondly, a common approach is for the database to automatically grant user access based on web server authentication. However, this option should only be used for accessing publicly available information. Finally, the database may grant user access employing the web server authentication credentials as a basis for its own user authentication, using security management subsystems (Figure 1). We consider this last option in more detail.Web-based communications use the stateless HTTP protocol with the implication that state, and hence authentication, is not preserved when browsing successive web pages. Cookies, or files placed on user’s machine by a web server, were developed as a means of addressing this issue and are often used to provide authentication. However, after initial authentication, there is typically no re authentication per page in the same realm, only the use of unencrypted cookies (sometimes in association with IP addresses). This approach provides limited security as both cookies and IP addresses can be tampered with or spoofed.A stronger authentication method, commonly used by commercial implementations, uses digitally signed cookies. This allows additional systems, such as databases, to use digitally signed cookie data, including a session ID, as a basis for authentication. When a user has been authenticated by a web server (using a password, for example), a session ID is assigned and is stored in a security management subsystem database. When a user subsequently requests information from a database, the database receives a copy of the session ID, the security management subsystem checks this session ID against its local copy and, if authentication is successful, user access is granted to the database.The session ID is typically transmitted in the clear between the web server and database, but may be protected by SSL or even by physical security measures. The communications between the browser and web servers, and the web servers and security management subsystem (and its databases), are normally protected by SSL and use a web server security API that is used to digitally sign and verify browser cookies. The communications between the back-end databases and security management subsystem (and its databases) are also normally protected by SSL and use a database security API that verifies session Ids originating from the database and provides additional user authorization credentials. The web server security API is generally proprietary while, for the database security API, many vendors have adopted standards such as the Generic Security Services API (GSS-API) or CORBA [RFC2078] and [Corba].Architecture and DesignSecurity requirements for designing, building and implementing databases are important so that the systems, as part of the overall infrastructure, meet their requirements in actual operation. The various security models provide an important insight into the design requirements for databases and their management systems.Secure Database Management System ArchitecturesIn multi-level database management systems, a variety of architectures are possible: trusted subject, integrity locked, kernels and replicated. Trusted subject is used by most of the leading database management system vendors and can be integrated in existing products. Basically, the trusted subject architecture allows users to access a database via an un trusted front-end, a trusted database management system and trusted operating system. The operating systemprovides physical access to the database and the database management system provides multilevel object protection.The other architectures - integrity locked, kernels and replicated - all vary in detail, but they use a trusted front-end and an un trusted database management system. For details of these architectures and research prototypes, the reader is referred to [Cast95]. Different architectures are suited to different environments: for example, the trusted subject architecture is less integrated with the underlying operating system and is best suited when a trusted path can be assured between applications and the database management system.Secure Database Management System DesignAs discussed above, there are several fundamental differences between operating system and database management system design, including object granularity, multiple data types, data correlations and multi-level transactions. Other differences include the fact that database management systems include both physical and logical objects and that the database lifecycle is normally longer.These differences must be reflected in the design requirements which include:• Access, flow and infer ence controls.• Access granularity and modes.• Dynamic authorization.• Multi-level protection.• Polyinstantiation.• Auditing.• Performance.These requirements should be considered alongside basic information integrity principles, such as:• Well-formed transactions - to ensure that transactions are correct and consistent.• Continuity of operation - to ensure that data can be properly recovered, depending on the extent of a disaster.• Authorization and role management – to ensure that distinct roles are defined and users are authorized.• Authenticated users - to ensure that users are authenticated.• Least privilege - to ensure that users have the minimal privilege necessary to perform their tasks.• Separation of duties - to ensure that no single individual has access to critical data.• Delegation of authority - to ensure that the database management system policies are flexible enough to meet the organization’s requirements.Of course, some of these requirements and principles are not met by the database management system, but by the operating system and also by organizational and procedural measures.Database Design MethodologyVarious approaches to design exist, but most contain the same main stages. The principle aim of a design methodology is to provide a robust, verifiable design process and also to separate policies from how policies are actually implemented. An important requirement during any design process is that different design aspects can be merged and this equally applies to security.A preliminary analysis should be conducted that addresses the system risks, environment, existing products and performance. Requirements should then beanalyzed with respect to the results of a risk assessment. Security policies should be developed that include specification of granularity, privileges and authority.These policies and requirements form the input to the conceptual design that concentrates on subjects, objects and access modes without considering implementation details. Its purpose is to express information and process flows in a complete and consistent way.The logical design takes into account the operating system and database management system that will be used and which of the security requirements can be provided by which mechanisms. The physical design considers the actual physical realization of the logical design and, indeed, may result in a revision of the conceptual and logical phases due to physical constraints.Security AssuranceOnce a product has been developed, its security assurance can be assessed by a number of methods including formal verification, validation, penetration testing and certification. For example, if a database is to be certified as TCSEC Class B1, then it must implement the Bell-LaPadula mandatory access control model in which each controlled object in the database must be labeled with a security level.Most of these methods can be costly and lengthy to perform and are typically specific to particular hardware and software configurations. However, the international Common Criteria certification scheme provides the added benefit of a mutual recognition arrangement, thus avoiding the prospect of multiple certifications in different countries.ConclusionThis article has considered some of the security principles that are associated with databases and how these apply in a web based environment. Ithas also focused on important architecture and design principles. These principles have focused mainly on the prevention, assurance and recovery aspects, but other aspects, such as detection, are equally important in formulating a total information protection strategy. For example, host-based intrusion detection systems as well as a robust and tested set of business recovery procedures should be considered.Any fit-for-purpose, secure e-business infrastructure should address all the above aspects: prevention, assurance, detection and recovery. Certain industries are now starting to specify their own set of global, secure e-business requirements. International card payment associations have recently started to require minimum information security standards from electronic commerce merchants handling credit card data, to help manage fraud losses and associated impacts such as brand-image damage and loss of consumer confidence.网络环境下的数据库安全简介数据库在政府部门和商业机构得到普遍应用已经很多年了。
附录附录A: 外文资料翻译-原文部分:CUSTOMER TARGETTINGThe earliest determinant of success in the development of a profitable card scheme will lie in the quality of applicants that are attracted by the marketing effort. Not only must there be sufficient creditworthy applicants to avoid fruitless and expensive application processing, but it is critical that the overall mix of new accounts meets the standard necessary to ensure ultimate profitability. For example, the marketing initiatives may attract sufficient volume of applicants that are assessed as above the scorecard cut-off, but the proportion of acceptances in the upper bands may be insufficient to deliver the level of profit and lesser bad debt required to achieve the financial objectives of the scheme.This chapter considers the range of data sources available to support the development of a credit card scheme and the tools that can be applied to maximize the flow of applications from the required categories.Data availabilityThe data that makes up the ingredients from which marketing campaigns can be constructed can come from many diverse sources. Typically, it will fall into four categories:1 the national or regional register of voters;2 the national or regional register of court judgments that records the outcomeof creditor-debtor legislation;3 any national or regional pooled information showing the credit history of clients of the participating lenders; and4 commercially compiled data including and culled from name and address lists, survey results and other market analysis data, e.g. neighborhoods and lifestyle categorization through geo-demographic information systems.The availability and quality of this data will vary from country to country and bureau to bureau.Availability is not only governed by the extent to which the responsible agency has undertaken to record it, but also by the feasibility of accessing the data and the extent (if any) to which local consumer legislation or other considerations (e.g. religious principles) will allow it to be used. Other limitations on the use of available data may lie in the simple impossibility or expense of accessing the information sources, perhaps because necessary consumer consent for divulgence has been withheld or because the records are not yet stored electronically.The local credit information bureaux will be able to provide guidance on all of these matters, as will many local trade or professional associations or the relevant government departments.Data segmentation and AnalysesThe following remarks deal with the ways in which lawfully obtained data may then be processed and analyzed in order to maximize its value as the basis of a marketing prospect list. Examples of the types and uses of data that will play a role in the credit decision area are discussed later in the chapter, within the context of application processing.The key categories into which prospects may be segmented include lifestyle, propensity to purchase specific products (financial or otherwise) and levels of risk. The leading international information bureaux will be able to provide segmentation systems that are able to correlate each of these data categories to provide meaningful prospect lists in rank order. Additionally, many bureaux will have the capability to further enhance the strength and value of the data. Through the selective purchasing of data from bona fide market sources, and by overlaying generic factors deduced from the analysis of the broad mass of industry information that routinely passes through their systems, the best international operators are now able to offer marketing and credit information support that can add significantly to the quality of new applicants.The importance of the role and standard of this data in influencing the quality of the target population for mailings, etc. should not be underestimated. Information that is dated or inaccurate may not only lead a marketer and the organization into embarrassment and damage their reputations, but it will also open the credit card scheme to applicants from outside either the target sector or ,worse still, applicants outside the lender’s view of an acceptable credit risk.From this, it follows that you should seek to use an information bureau whose business principles and operating practices comply with the highest levels of both competence and integrity.Developing the prospect databaseThis is the process by which the raw data streams are brought together and subjected to progressive refinement, with the output representing the refined base from which prospecting can begin in earnest. A wide experience-often across many different markets and countries-in the sourcing, handling and analysis of data inevitably improves the quality of the ideas and systems that a bureau can offer for the development of the prospect database.In summary, the typical shape of the service available from the very best bureaux will support a process that runs as follows:1.collect and consolidate all data to be screened for inclusion;2.merge the various streams;3.sort and classify the data by market and credit categories;4.screen the date using predetermined marketing and credit criteria; and5.consolidate and output the refined list.Bureaux will charge for the use of their expertise and systems.Therefore, consideration should be given to the volumes of data that are to be processed and the costs involved at each stage. The most cost-effective approach to constructing prospect databases only undertakes the lowest-cost screening process within the earlier stages. The more expensive screening processes are not employed until the mass of the data has been reduced by earlier filtering.It is impossible to be prescriptive about the range and levels of service that are available, but reference to one of the major bureaux operating in the region could certainly be a good starting point.Campaign Management and AnalysisAgain, this is an area where excellent support is available from the best-of-breed bureaux. They will provide both the operational support and software capabilities to mount, monitor and analyse your marketing campaign, should you so wish. Their depth of experience and capabilities in the credit sector will often open up income: cost possibilities from the solicitation exercise that would not otherwise be available to the new entrant.The First Important Applications of DBMS’sData items include names and addresses of customers, accounts, loans and their balance, and the connection between customers and their accounts and loans, e.g., who has signature authority over which accounts. Queries for account balances are common, but far more common are modifications representing a single payment from or deposit to an account.As with the airline reservation system, we expect that many tellers and customers (through ATM machines) will be querying and modifying the bank’s data at once. It is vital that simultaneous accesses to an account not cause the effect of an ATM transaction to be lost. Failures cannot be tolerated. For example, once the money has been ejected from an ATM machine ,the bank must record the debit, even if the power immediately fails. On the other hand, it is not permissible for the bank to record the debit and then not deliver the money because the power fails. The proper way to handle this operation is far from obvious and can be regarded as one of the significant achievements in DBMS architecture.Database system changed significantly. Codd proposed that database system should present the user with a view of data organized as tables called relations. Behindthe scenes, there might be a complex data structure that allowed rapid response to a variety of queries. But unlike the user of earlier database systems, the user of a relational system would not be concerned with storage structure. Queries could be expressed in a very high level language, which greatly increased the efficiency of database programmers. Relations are tables. Their columns are headed by attributes.Client –Server ArchitectureMany varieties of modern software use a client-server architecture, in which requests by one process (the client ) are sent to another process (the server) for execution. Database systems are no exception, and it is common to divide the work of the components shown into a server process and one or more client processes.In the simplest client/server architecture, the entire DBMS is a server, except for the query interfaces that the user and send queries or other commands across to the server. For example, relational systems generally use the SQL language for representing requests from the client to the server. The database server then sends the answer, in the form of a table or relation, back to client. The relationship between client and server can get more complex, especially when answers are extremely large. We shall have more to say about this matter in section 1.3.3. there is also a trend to put more work in the client, since the server will be a bottleneck if there are many simultaneous database users.附录B: 外文资料翻译-译文部分:客户目标:最早判断发展可收益卡的成功性是在于受市场影响的被吸引的申请人的质量。
外文资料原文Database Systems1.Introduction to Database SystemToday, more than at any previous time, the success of an organization depends on its ability to acquire accurate and timely data about its operation, to manage this data effectively, and to use it to analyze and guide its activities. Phrases such as the information superhighway have become ubiquitous, and information processing is a rapidly growing multibillion dollar industry .The amount of information available to us is literally exploding, and the value of data as an organizational asset is being widely recognized. This paradox drives the need for increasingly powerful and flexible data management systems .A database is a collection of data , typically describing the activities of one or more related organizations . For example , a university database might contain information about the following .●Entities such as students , faculty , courses , and classrooms .●Relationships between entities , such as students’enrollment in courses , faculty teaching courses , and the use of rooms for courses .A database management system , or DBMS , is software designed to assist in maintaining and utilizing large collections of data , and the need for such systems , as well as their use , is growing rapidly . The alternative to using a DBMS is to use ad hoc approaches that do notcarry over from one application to another , for example , to store the data in files and write application-specific code to manage it .The area of database management systems is a microcosm of computer science in general . The issues addressed and the techniques used span a wide spectrum , including languages , object-orientation and other programming paradigms , compilation , operating systems1concurrent programming , data structures , algorithms ,theory , parallel and distributed systems , user interfaces , expert systems and artificial intelligence , statistical techniques , and dynamic programming .Database management continues to gain importance as more and more data is brought on-line, and made ever more accessible through computer networking. Today the field is being driven by exciting visions such as multimedia databases, interactive video, digital libraries, a host of scientific projects such as the human genome mappin g effort and NASA’s Earth Observation System project, and the desire of companies to consolidate their decision-making processes and mine their data repositories for useful information about their business . Commercially , database management systems represent one of the largest and most vigorous market segments . Thus the study of database systems couldprove to be richly rewarding in more ways than one .2.Database consistsA database consists of a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields. Fields are the basic units of data storage, and each field typically contains information pertaining to one aspect or attribute of the entity described by the database. Using keywords and various sorting commands, users can rapidly search, rearrange, group, and select the fields in many records to retrieve or create reports on particular aggregates of data.Database records and files must be organized to allow retrieval of the information. Early systems were arranged sequentially (i.e., alphabetically, numerically, or chronologically); the development of direct-access storage devices made possible random access to data via indexes. Queries are the main way users retrieve database information. Typically, the user provides a string of characters, and the computersearches the database for a corresponding sequence and provides the source materials in which those characters appear. A user can request, for example, all records in which the content of the field for a person’s last name is the word Smith.In flat databases , records are organized according to a simplelist of entities; many simple databases for personal computers are flat in structure. The records in hierarchical databases are organized in a treelike structure, with each level of records branching off into a set of smaller categories. Unlike hierarchical databases, which provide single links between sets of records at different levels, network databases create multiple linkages between sets by placing links, orpointers, to one set of records in another; the speed and versatility of network databases have led to their wide use in business.Relational databases are used where associations among files or records cannot be expressed by links; a simple flat list becomes one table, or “relation”, and multiple relations can be mathematically associated to yield desired information. Object-oriented databases store and manipulate more complex data struct ures, called “objects”, which are organized into hierarchical classes that may inherit properties from classes higher in the chain; this database structure is the mostflexible and adaptable.3.Structure of the Relational databaseThe relational model is the basis for any relational database management system (RDBMS).A relational model has three core components: a collection of objects or relations, operators that act on the objects or relations, and data integrity methods. In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent.A relational database uses relations, or two-dimensional tables, to store the information needed to support a business.3.1.Tables, Row, and ColumnsA table in a relational database, alternatively known as a relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables.Note: Don't confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate two tables.A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of informationthat can be retrieved with the database's query language. For example, a table with information about employees might have a column calledLAST_NAME that contains all of the employees' last names. Data is retrieved from a table by filtering on both the row and the column.3.2.Primary Keys, Data types, and Foreign KeysRelation: A two-dimensional structure used to hold related information, also known as a table.Row: A group of one or more data elements in a database table that describes a person, place, or thing.Column: The component of a database table that contains all of the data of the same name and type across all rows.Primary Key: A column (or columns) in a table that makes the row in the table distinguishable from every other row in the same table.Data types: numeric values, character or alphabetic values, and date values.A foreign key enforces the concept of referential integrity in a relational database.Foreign Key: A column (or columns) in a table that draws its valuesfrom a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table. Referential Integrity A method employed by a relational database system that enforces one-to-many relationships between tables.3.3.Data ModelingIn this process, the developer conceptualizes and documents all the tables for the database. One of the common methods for modeling a database is called ERA, which stands for entities, relationships, and attributes. The database designer uses an application that can maintain entities, their attributes, and their relationships. In general, anentity corresponds to a table in the database, and the attributes of the entity correspond to columns of the table.Data Modeling: A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database.The data-modeling process involves defining the entities, defining the relationships between those entities, and then defining theattributes for each of the entities. Once a cycle is complete, it is repeated as many times as necessary to ensure that the designer is capturing what is important enough to go into the database. Let's take a closer look at each step in the data-modeling process.3.4. Defining the EntitiesFirst, the designer identifies all of the entities within the scope of the database application.The entities are the persons, places, or things that are important to the organization and need to be tracked in the database. Entitieswill most likely translate neatly to database tables.。
毕业设计(论文)外文参考资料及译文译文题目:学生姓名:学号:专业:所在学院:指导教师:职称:年月日1. Database management system1. Database management systemA Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and the use of a database. It allows organizations to place control of database development in the hands of database administrators (DBAs) and other specialists. A DBMS is a system software package that helps the use of integrated collection of data records and files known as databases. It allows different user application programs to easily access the same database. DBMSs may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way. Instead of having to write computer programs to extract information, user can ask simple questions in a query language. Thus, many DBMS packages provide Fourth-generation programming language (4GLs) and other application development features. It helps to specify the logical organization for a database and access and use the information within a database. It provides facilities for controlling data access, enforcing data integrity, managing concurrency, and restoring the database from backups. A DBMS also provides the ability to logically present database information to users.2. OverviewA DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from an application program and instructs the operating system to transfer the appropriate data. The queries and responses must be submitted and received according to a format that conforms to one or more applicable protocols. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.Database servers are computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.3. HistoryDatabases have been in use since the earliest days of electronic computing. Unlike modern systems which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to the custom databases in order to gain speed at the expense of flexibility. Originally DBMSs were found only in large organizations with the computer hardware needed to support large data sets.3.1 1960s Navigational DBMSAs computers grew in speed and capability, a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon there were a number of commercial products based on it available.The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the programto walk the entire data set and collect the matching results. There was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.IBM also had their own DBMS system in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IMS and IDMS, both CODASYL databases, as well as CINCOMs TOTAL database are classified as network databases.3.2 1970s Relational DBMSEdgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[1]In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list system would be very inefficient when storing "sparse" databases where some of the data for any one record could be left empty. The relational model solved this by splitting the data into a series of normalized tables, with optional elements being moved out of the main table to where they would take up room only if needed.For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.Codd's paper was picked up by two people at the Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using studentprogrammers to produce code. Beginning in 1973, INGRES delivered its first test products which were generally ready for widespread use in 1979. During this time, a number of people had moved "through" the group — perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL — QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.IBM itself did one test implementation of the relational model, PRTV, and a production one, Business System 12, both now discontinued. Honeywell did MRDS for Multics, and now there are two new implementations: Alphora Dataphor and Rel. All other DBMS implementations usually called relational are actually SQL DBMSs. In 1968, the University of Michigan began development of the Micro DBMS relational database management system. It was used to manage very large data sets by the US Department of Labor, the Environmental Protection Agency and researchers from University of Alberta, the University of Michigan and Wayne State University. It ran on mainframe computers using Michigan Terminal System. The system remained in production until 1996.3.3 End 1970s SQL DBMSIBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first version was ready in 1974/5, and work then started on multi-table systems in which the data could be split so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).Many of the people involved with INGRES became convinced of the future commercial success of such systems, and formed their own companies to commercialize the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually Ingres itself were all being sold as offshoots to the original INGRES product in the 1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus, INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-70s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented on most other DBMS.3.4 1980s Object Oriented DatabasesThe 1980s, along with a rise in object oriented programming, saw a growth in how data in various databases were handled. Programmers and designers began to treat the data in their databases as objects. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows for relationships between data to be relation to objects and their attributes and not to individual fields.Another big game changer for databases in the 1980s was the focus on increasing reliability and access speeds. In 1989, two professors from the University of Michigan at Madison, published an article at an ACM associated conference outlining their methods on increasing database performance. The idea was to replicate specific important, and often queried information, and store it in a smaller temporary database that linked these key features back to the main database. This meant that a query could search the smaller database much quicker, rather than search the entire dataset. This eventually leads to the practice of indexing, which is used by almost every operating system from Windows to the system that operates Apple iPod devices.4. DBMS building blocksA DBMS includes four main parts: modeling language, data structure, database query language, and transaction mechanisms:4.1 Components of DBMS∙DBMS Engine accepts logical request from the various other DBMS subsystems, converts them into physical equivalents, and actually accesses thedatabase and data dictionary as they exist on a storage device.∙Data Definition Subsystem helps user to create and maintain the data dictionary and define the structure of the files in a database.∙Data Manipulation Subsystem helps user to add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interfacebetween user and the information contained in a database. It allows user tospecify its logical information requirements.∙Application Generation Subsystem contains facilities to help users to develop transaction-intensive applications. It usually requires that userperform a detailed series of tasks to process a transaction. It facilitateseasy-to-use data entry screens, programming languages, and interfaces.∙Data Administration Subsystem helps users to manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.4.2 Modeling languageA data modeling language to define the schema of each database hosted in the DBMS, according to the DBMS database model. The four most common types of models are the:•hierarchical model,•network model,•relational model, and•object model.Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure dependson the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity API that supports a standard way for programmers to access the DBMS.Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users became aggravated with file processing because data is stored in many different files and each organized in a different way. Each file was specialized to be used with a specific application. Needless to say, file processing was bulky, costly and nonflexible when it came to supplying needed data accurately and promptly. Data redundancy is an issue with the file processing system because the independent data files produce duplicate data so when updates were needed each separate file would need to be updated. Another issue is the lack of data integration. The data is dependent on other data to organize and store it. Lastly, there was not any consistency or standardization of the data in a file processing system which makes maintenance difficult. For all these reasons, the database management approach was produced. Database management systems (DBMS) are designed to use one of five database structures to providesimplistic access to information stored in databases. The five database structures are hierarchical, network, relational, multidimensional and object-oriented models.The hierarchical structure was used in early mainframe DBMS. Records’ relationships form a treelike model. This structure is simple but nonflexible because the relationship is confined to a one-to-many relationship. IBM’s IMS system and the RDM Mobile are examples of a hierarchical database system with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for a mobile computer system. The hierarchical structure is used primary today for storing geographic information and file systems.The network structure consists of more complex relationships. Unlike the hierarchical structure, it can relate to many records and accesses them by following one of several paths. In other words, this structure allows for many-to-many relationships.The relational structure is the most commonly used today. It is used by mainframe, midrange and microcomputer systems. It uses two-dimensional rows and columns to store data. The tables of records can be connected by common key values. While working for IBM, E.F. Codd designed this structure in 1970. The model is not easy for the end user to run queries with because it may require a complex combination of many tables.The multidimensional structure is similar to the relational model. The dimensions of the cube looking model have data relating to elements in each cell. This structure gives a spreadsheet like view of data. This structure is easy to maintain because records are stored as fundamental attributes, the same way they’re viewed and the structure is easy to understand. Its high performance has made it the most popular database structure when it comes to enabling online analytical processing (OLAP).The object oriented structure has the ability to handle graphics, pictures, voice and text, types of data, without difficultly unlike the other database structures. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.4.3 Data structureData structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).4.4 Database query languageA database query language and report writer allows users to interactively interrogate the database, analyze its data and update it according to the users privileges on data. It also controls the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs are customized for each data entry and updating function.4.5 Transaction mechanismA database transaction mechanism ideally guarantees ACID properties in orderto ensure data integrity despite concurrent user accesses (concurrency control), and faults (fault tolerance). It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can help prevent duplicate records via unique index constraints; for example, no two customers with the same customer numbers (key fields) can be entered into the database. See ACID properties for more information (Redundancy avoidance).5. DBMS topics5.1 External, Logical and Internal viewA database management system provides the ability for many different users to share data and process resources. But as there can be many different users, there are many different database needs. The question now is: How can a single, unified database meet the differing requirement of so many users?A DBMS minimizes these problems by providing two views of the database data: an external view(or User view), logical view(or conceptual view)and physical(or internal) view. The user’s view, of a database program represents data in a format that is meaningful to a user and to the software programs that process those data. That is, the logical view tells the user, in user terms, what is in the database. The physicalview deals with the actual, physical arrangement and location of data in the direct access storage devices(DASDs). Database specialists use the physical view to make efficient use of storage and processing resources. With the logical view users can see data differently from how they are stored, and they do not want to know all the technical details of physical storage. After all, a business user is primarily interested in using the information, not in how it is stored.One strength of a DBMS is that while there is typically only one conceptual (or logical) and physical (or Internal) view of the data, there can be an endless number of different External views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the logical view refers to the way user views data, and the physical view to the way the data are physically stored and processed...5.2 DBMS features and capabilitiesAlternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.5.3 DBMS simple definitionData base management system is the system in which related data is stored in an "efficient" and "compact" manner. Efficient means that the data which is stored in the DBMS is accessed in very quick time and compact means that the data which is stored in DBMS covers very less space in computer's memory. In above definition the phrase "related data" is used which means that the data which is stored in DBMS is about some particular topic.Throughout recent history specialized databases have existed for scientific, geospatial, imaging, document storage and like uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well. However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.Thus, the DBMSs of today roll together frequently needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:5.3.1 Query abilityQuerying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the users privileges on data.5.3.2 Backup and replicationCopies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMS usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.5.3.2 Rule enforcementOften one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message. However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.5.3.4 SecurityOften it is desirable to limit who can see or change which attributes or groups of attributes. This may be managed directly by individual, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements.5.3.5 ComputationThere are common computations requested on attributes such as counting, summing, averaging, sorting, grouping, cross-referencing, etc. Rather than have each computer application implement these from scratch, they can rely on the DBMS to supply such calculations.5.3.6 Change and access loggingOften one wants to know who accessed what attributes, what was changed, and when it was changed. Logging services allow this by keeping a record of access occurrences and changes.5.3.7 Automated optimizationIf there are frequently occurring usage patterns or requests, some DBMS can adjust themselves to improve the speed of those interactions. In some cases the DBMS will merely provide tools to monitor performance, allowing a human expert to make the necessary adjustments after reviewing the statistics collected5.4 Meta-data repositoryMetadata is data describing data. For example, a listing that describes what attributes are allowed to be in data sets is called "meta-information". The meta-data is also known as data about data.5.5 Current trendsIn 1998, database management was in need of new style databases to solve current database management problems. Researchers realized that the old trends of database management were becoming too complex and there was a need for automated configuration and management. Surajit Chaudhuri, Gerhard Weikum and Michael Stonebraker, were the pioneers that dramatically affected the thought of database management systems. They believed that database management needed a more modular approach and that there are so many specifications needs for various users. Since this new development process of database management we currently have endless possibilities. Database management is no longer limited to “monolithic entities”. Many solutions have developed to satisfy individual needs of users. Development of numerous database options has created flexible solutions in database management.Today there are several ways database management has affected the technology world as we know it. Organizations demand for directory services has become an extreme necessity as organizations grow. Businesses are now able to use directory services that provided prompt searches for their company information. Mobile devices are not only able to store contact information of users but have grown to bigger capabilities. Mobile technology is able to cache large information that is used for computers and is able to display it on smaller devices. Web searches have even been affected with database management. Search engine queries are able to locate data。
DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latestmust-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective(e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information onwhich an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which informationfrom multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1.Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as, purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as follows.(1). Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.(5). Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics and so on.2.But, why have a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kindsof data, it is necessary to maintain separate databases.数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。
原文:Structure of the Relational database—《Database System Concepts》Part1: Relational Databases The relational model is the basis for any relational database management system (RDBMS).A relational model has three core components: a collection of obj ects or relations, operators that act on the objects or relations, and data integrity methods. In other words, it has a place to store the data, a way to create and retrieve the data, and a way to make sure that the data is logically consistent.A relational database uses relations, or two-dimensional tables, to store the information needed to support a business. Let's go over the basic components of a traditional relational database system and look at how a relational database is designed. Once you have a solid understanding of what rows, columns, tables, and relationships are, you'll be well on your way to leveraging the power of a relational database.Tables, Row, and ColumnsA table in a relational database, alternatively known as a relation, is a two-dimensional structure used to hold related information. A database consists of one or more related tables.Note: Don't confuse a relation with relationships. A relation is essentially a table, and a relationship is a way to correlate, join, or associate two tables.A row in a table is a collection or instance of one thing, such as one employee or one line item on an invoice. A column contains all the information of a single type, and the piece of data at the intersection of a row and a column, a field, is the smallest piece of information that can be retrieved with the database's query language. For example, a table with information about employees might have a column calledLAST_NAME that contains all of the employees' last names. Data is retrieved from a table by filtering on both the row and the column.Primary Keys, Datatypes, and Foreign KeysThe examples throughout this article will focus on the hypothetical work of Scott Smith, database developer and entrepreneur. He just started a new widget company and wants to implement a few of the basic business functions using the relational database to manage his Human Resources (HR) department.Relation: A two-dimensional structure used to hold related information, also known as a table.Note: Most of Scott's employees were hired away from one of his previous employers, some of whom have over 20 years of experience in the field. As a hiring incentive, Scott has agreed to keep the new employees' original hire date in the new database.Row:A group of one or more data elements in a database table that describes a person, place, or thing.Column:The component of a database table that contains all of the data of the same name and type across all rows.You'll learn about database design in the following sections, but let's assume for the moment that the majority of the database design is completed and some tables need to be implemented. Scott creates the EMP table to hold the basic employee information, and it looks something like this:Notice that some fields in the Commission (COMM) and Manager (MGR) columns do not contain a value; they are blank. A relational database can enforce the rule that fields in a column may or may not be empty. In this case, it makes sense for an employee who is not in the Sales department to have a blank Commission field. It also makes sense for the president of the company to have a blank Manager field, since that employee doesn't report to anyone.Field:The smallest piece of information that can be retrieved by the database query language. A field is found at the intersection of a row and a column in a database table.On the other hand, none of the fields in the Employee Number (EMPNO) column are blank. The company always wants to assign an employee number to an employee, and that number must be different for each employee. One of the features of a relational database is that it can ensure that a value is entered into this column and that it is unique. Th e EMPNO column, in this case, is the primary key of the table.Primary Key:A column (or columns) in a table that makes the row in the table distinguishable from every other row in the same table.Notice the different datatypes that are stored in the EMP ta ble: numeric values, character or alphabetic values, and date values.As you might suspect, the DEPTNO column contains the department number for the employee. But how do you know what department name is associated with what number? Scott created the DEPT table to hold the descriptions for the department codes in the EMP table.The DEPTNO column in the EMP table contains the same values as the DEPTNO column in the DEPT table. In this case, the DEPTNO column in the EMP table is considered a foreign key to the same column in the DEPT table.A foreign key enforces the concept of referential integrity in a relational database. The concept of referential integrity not only prevents an invalid department number from being inserted into the EMP table, but it also prevents a row in the DEPT table from being deleted if there are employees still assigned to that department.Foreign Key:A column (or columns) in a table that draws its values from a primary or unique key column in another table. A foreign key assists in ensuring the data integrity of a table. Referential Integrity A method employed by a relational database system that enforces one-to-many relationships between tables.Data ModelingBefore Scott created the actual tables in the database, he went through a design process known as data modeling. In this process, the developer conceptualizes and documents all the tables for the database. One of the common methods for mod eling a database is called ERA, which stands for entities, relationships, and attributes. The database designer uses an application that can maintain entities, their attributes, and their relationships. In general, an entity corresponds to a table in the database, and the attributes of the entity correspond to columns of the table.Data Modeling:A process of defining the entities, attributes, and relationships between the entities in preparation for creating the physical database.The data-modeling process involves defining the entities, defining the relationships between those entities, and then defining the attributes for each of the entities. Once a cycle is complete, it is repeated as many times as necessary to ensure that the designer is capturing what is important enough to go into the database. Let's take a closer look at each step in the data-modeling process.Defining the EntitiesFirst, the designer identifies all of the entities within the scope of the database application.The entities are the pers ons, places, or things that are important to the organization and need to be tracked in the database. Entities will most likely translate neatly to database tables. For example, for the first version of Scott's widget company database, he identifies four entities: employees, departments, salary grades, and bonuses. These will become the EMP, DEPT, SALGRADE, and BONUS tables.Defining the Relationships Between EntitiesOnce the entities are defined, the designer can proceed with defining how each of the entities is related. Often, the designer will pair each entity with every other entity and ask, "Is there a relationship between these two entities?" Some relationships are obvious; some are not.In the widget company database, there is most likely a relations hip between EMP and DEPT, but depending on the business rules, it is unlikely that the DEPT and SALGRADE entities are related. If the business rules were to restrict certain salary grades to certain departments, there would most likely be a new entity that defines the relationship between salary grades and departments. This entity wouldbe known as an associative or intersection table and would contain the valid combinations of salary grades and departments.Associative Table:A database table that stores th e valid combinations of rows from two other tables and usually enforces a business rule. An associative table resolves a many-to-many relationship.In general, there are three types of relationships in a relational database:One-to-many The most common type of relationship is one-to-many. This means that for each occurrence in a given entity, the parent entity, there may be one or more occurrences in a second entity, the child entity, to which it is related. For example, in the widget company database, the DEPT entity is a parent entity, and for each department, there could be one or more employees associated with that department. The relationship between DEPT and EMP is one-to-many.One-to-one In a one-to-one relationship, a row in a table is related to only one or none of the rows in a second table. This relationship type is often used for subtyping. For example, an EMPLOYEE table may hold the information common to all employees, while the FULLTIME, PARTTIME, and CONTRACTOR tables hold information unique to full-time employees, part-time employees, and contractors, respectively. These entities would be considered subtypes of an EMPLOYEE and maintain a one-to-one relationship with the EMPLOYEE table. These relationships are not as common as one-to-many relationships, because if one entity has an occurrence for a corresponding row in another entity, in most cases, the attributes from both entities should be in a single entity.Many-to-many In a many-to-many relationship, one row of a table may be related to man y rows of another table, and vice versa. Usually, when this relationship is implemented in the database, a third entity isdefined as an intersection table to contain the associations between the two entities in the relationship. For example, in a database used for school class enrollment, the STUDENT table has a many-to-many relationship with the CLASS table—one student may take one or more classes, and a given class may have one or more students. The intersection table STUDENT_CLASS would contain the comb inations of STUDENT and CLASS to track which students are in which classes.Once the designer has defined the entity relationships, the next step is to assign the attributes to each entity. This is physically implemented using columns, as shown here for th e SALGRADE table as derived from the salary grade entity.After the entities, relationships, and attributes have been defined, the designer may iterate the data modeling many more times. When reviewing relationships, new entities may be discovered. For exa mple, when discussing the widget inventory table and its relationship to a customer order, the need for a shipping restrictions table may arise.Once the design process is complete, the physical database tables may be created. Logical database design sessions should not involve physical implementation issues, but once the design has gone through an iteration or two, it's the DBA's job to bring the designers "down to earth." As a result, the design may need to be revisited to balance the ideal database implementation versus the realities of budgets andschedules.译文:关系数据库的结构—《数据库系统结构》第一章:关系数据库关系模型是任何关系数据库管理系统(RDBMS)的基础。
河北工程大学毕业论文(设计)英文参考文献原文复印件及译文数据仓库数据仓库为商务运作提供结构与工具,以便系统地组织、理解和使用数据进行决策。
大量组织机构已经发现,在当今这个充满竞争、快速发展的世界,数据仓库是一个有价值的工具。
在过去的几年中,许多公司已花费数百万美元,建立企业范围的数据仓库。
许多人感到,随着工业竞争的加剧,数据仓库成了必备的最新营销武器——通过更多地了解客户需求而保住客户的途径。
“那么”,你可能会充满神秘地问,“到底什么是数据仓库?”数据仓库已被多种方式定义,使得很难严格地定义它。
宽松地讲,数据仓库是一个数据库,它与组织机构的操作数据库分别维护。
数据仓库系统允许将各种应用系统集成在一起,为统一的历史数据分析提供坚实的平台,对信息处理提供支持。
按照W. H. Inmon,一位数据仓库系统构造方面的领头建筑师的说法,“数据仓库是一个面向主题的、集成的、时变的、非易失的数据集合,支持管理决策制定”。
这个简短、全面的定义指出了数据仓库的主要特征。
四个关键词,面向主题的、集成的、时变的、非易失的,将数据仓库与其它数据存储系统(如,关系数据库系统、事务处理系统、和文件系统)相区别。
让我们进一步看看这些关键特征。
(1) 面向主题的:数据仓库围绕一些主题,如顾客、供应商、产品和销售组织。
数据仓库关注决策者的数据建模与分析,而不是构造组织机构的日常操作和事务处理。
因此,数据仓库排除对于决策无用的数据,提供特定主题的简明视图。
(2) 集成的:通常,构造数据仓库是将多个异种数据源,如关系数据库、一般文件和联机事务处理记录,集成在一起。
使用数据清理和数据集成技术,确保命名约定、编码结构、属性度量的一致性等。
(3) 时变的:数据存储从历史的角度(例如,过去5-10 年)提供信息。
数据仓库中的关键结构,隐式或显式地包含时间元素。
(4) 非易失的:数据仓库总是物理地分离存放数据;这些数据源于操作环境下的应用数据。
由于这种分离,数据仓库不需要事务处理、恢复和并行控制机制。
通常,它只需要两种数据访问:数据的初始化装入和数据访问。
概言之,数据仓库是一种语义上一致的数据存储,它充当决策支持数据模型的物理实现,并存放企业决策所需信息。
数据仓库也常常被看作一种体系结构,通过将异种数据源中的数据集成在一起而构造,支持结构化和启发式查询、分析报告和决策制定。
“好”,你现在问,“那么,什么是建立数据仓库?”根据上面的讨论,我们把建立数据仓库看作构造和使用数据仓库的过程。
数据仓库的构造需要数据集成、数据清理、和数据统一。
利用数据仓库常常需要一些决策支持技术。
这使得“知识工人”(例如,经理、分析人员和主管)能够使用数据仓库,快捷、方便地得到数据的总体视图,根据数据仓库中的信息做出准确的决策。
有些作者使用术语“建立数据仓库”表示构造数据仓库的过程,而用术语“仓库DBMS”表示管理和使用数据仓库。
我们将不区分二者。
“组织机构如何使用数据仓库中的信息?”许多组织机构正在使用这些信息支持商务决策活动,包括:(1)、增加顾客关注,包括分析顾客购买模式(如,喜爱买什么、购买时间、预算周期、消费习惯);(2)、根据季度、年、地区的营销情况比较,重新配置产品和管理投资,调整生产策略;(3)、分析运作和查找利润源;(4)、管理顾客关系、进行环境调整、管理合股人的资产开销。
从异种数据库集成的角度看,数据仓库也是十分有用的。
许多组织收集了形形色色数据,并由多个异种的、自治的、分布的数据源维护大型数据库。
集成这些数据,并提供简便、有效的访问是非常希望的,并且也是一种挑战。
数据库工业界和研究界都正朝着实现这一目标竭尽全力。
对于异种数据库的集成,传统的数据库做法是:在多个异种数据库上,建立一个包装程序和一个集成程序(或仲裁程序)。
这方面的例子包括IBM 的数据连接程序和Informix的数据刀。
当一个查询提交客户站点,首先使用元数据字典对查询进行转换,将它转换成相应异种站点上的查询。
然后,将这些查询映射和发送到局部查询处理器。
由不同站点返回的结果被集成为全局回答。
这种查询驱动的方法需要复杂的信息过滤和集成处理,并且与局部数据源上的处理竞争资源。
这种方法是低效的,并且对于频繁的查询,特别是需要聚集操作的查询,开销很大。
对于异种数据库集成的传统方法,数据仓库提供了一个有趣的替代方案。
数据仓库使用更新驱动的方法,而不是查询驱动的方法。
这种方法将来自多个异种源的信息预先集成,并存储在数据仓库中,供直接查询和分析。
与联机事务处理数据库不同,数据仓库不包含最近的信息。
然而,数据仓库为集成的异种数据库系统带来了高性能,因为数据被拷贝、预处理、集成、注释、汇总,并重新组织到一个语义一致的数据存储中。
在数据仓库中进行的查询处理并不影响在局部源上进行的处理。
此外,数据仓库存储并集成历史信息,支持复杂的多维查询。
这样,建立数据仓库在工业界已非常流行。
1.操作数据库系统与数据仓库的区别由于大多数人都熟悉商品关系数据库系统,将数据仓库与之比较,就容易理解什么是数据仓库。
联机操作数据库系统的主要任务是执行联机事务和查询处理。
这种系统称为联机事务处理(OLTP)系统。
它们涵盖了一个组织的大部分日常操作,如购买、库存、制造、银行、工资、注册、记帐等。
另一方面,数据仓库系统在数据分析和决策方面为用户或“知识工人”提供服务。
这种系统可以用不同的格式组织和提供数据,以便满足不同用户的形形色色需求。
这种系统称为联机分析处理(OLAP)系统。
OLTP 和OLAP 的主要区别概述如下。
(1) 用户和系统的面向性:OLTP 是面向顾客的,用于办事员、客户、和信息技术专业人员的事务和查询处理。
OLAP 是面向市场的,用于知识工人(包括经理、主管、和分析人员)的数据分析。
(2) 数据内容:OLTP 系统管理当前数据。
通常,这种数据太琐碎,难以方便地用于决策。
OLAP 系统管理大量历史数据,提供汇总和聚集机制,并在不同的粒度级别上存储和管理信息。
这些特点使得数据容易用于见多识广的决策。
(3) 数据库设计:通常,OLTP 系统采用实体-联系(ER)模型和面向应用的数据库设计。
而OLAP 系统通常采用星形或雪花模型和面向主题的数据库设计。
(4) 视图:OLTP 系统主要关注一个企业或部门内部的当前数据,而不涉及历史数据或不同组织的数据。
相比之下,由于组织的变化,OLAP 系统常常跨越数据库模式的多个版本。
OLAP 系统也处理来自不同组织的信息,由多个数据存储集成的信息。
由于数据量巨大,OLAP 数据也存放在多个存储介质上。
(5)、访问模式:OLTP 系统的访问主要由短的、原子事务组成。
这种系统需要并行控制和恢复机制。
然而,对OLAP系统的访问大部分是只读操作(由于大部分数据仓库存放历史数据,而不是当前数据),尽管许多可能是复杂的查询。
OLTP 和OLAP 的其它区别包括数据库大小、操作的频繁程度、性能度量等。
2.但是,为什么需要一个分离的数据仓库“既然操作数据库存放了大量数据”,你注意到,“为什么不直接在这种数据库上进行联机分析处理,而是另外花费时间和资源去构造一个分离的数据仓库?”分离的主要原因是提高两个系统的性能。
操作数据库是为已知的任务和负载设计的,如使用主关键字索引和散列,检索特定的记录,和优化“罐装的”查询。
另一方面,数据仓库的查询通常是复杂的,涉及大量数据在汇总级的计算,可能需要特殊的数据组织、存取方法和基于多维视图的实现方法。
在操作数据库上处理OLAP 查询,可能会大大降低操作任务的性能。
此外,操作数据库支持多事务的并行处理,需要加锁和日志等并行控制和恢复机制,以确保一致性和事务的强健性。
通常,OLAP 查询只需要对数据记录进行只读访问,以进行汇总和聚集。
如果将并行控制和恢复机制用于这OLAP 操作,就会危害并行事务的运行,从而大大降低OLTP 系统的吞吐量。
最后,数据仓库与操作数据库分离是由于这两种系统中数据的结构、内容和用法都不相同。
决策支持需要历史数据,而操作数据库一般不维护历史数据。
在这种情况下,操作数据库中的数据尽管很丰富,但对于决策,常常还是远远不够的。
决策支持需要将来自异种源的数据统一(如,聚集和汇总),产生高质量的、纯净的和集成的数据。
相比之下,操作数据库只维护详细的原始数据(如事务),这些数据在进行分析之前需要统一。
由于两个系统提供很不相同的功能,需要不同类型的数据,因此需要维护分离的数据库。
Data warehousing provides architectures and tools for business executives to sy stematically organize, understand, and use their data to make strategic decisions. A lar ge number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people f eel that with competition mounting in every industry, data warehousing is the latest m ust-have marketing weapon —— a way to keep customers by learning more about the ir needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulat e a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse s ystems allow for the integration of a variety of application systems. They support info rmation processing by providing a solid platform of consolidated, historical data for a nalysis.According to W. H. Inmon, a leading architect in the construction of data wareho use systems, “a data warehouse is a subject-oriented, integrated, time-variant, and non volatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. T he four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distingui sh data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day o perations and transaction processing of an organization, a data warehouse focuses on t he modeling and analysis of data for decision makers. Hence, data warehouses typical ly provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple he terogeneous sources, such as relational databases, flat files, and on-line transaction rec ords. Data cleaning and data integration techniques are applied to ensure consistency i n naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical pers pective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data tra nsformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and co ncurrency control mechanisms. It usually requires only two operations in data accessi ng: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a p hysical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneou s sources to support structured and/or ad hoc queries, analytical reporting, and decisio n making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integratio n, data cleaning, and data consolidation. The utilization of a data warehouse often nec essitates a collection of decision support technologies. This allows “knowledge worke rs" (e.g., managers, analysts, and executives) to use the warehouse to quickly and con veniently obtain an overview of the data, and to make sound decisions based on infor mation in the warehouse. Some authors use the term “data warehousing" to refer onlyto the process of data warehouse construction, while the term warehouse DBMS is use d to refer to the management and utilization of data warehouses. We will not make thi s distinction here.“How are organizations using the information from data warehouses?" Many org anizations are using this information to support business decision making activities, in cluding:(1) increasing customer focus, which includes the analysis of customer buying pa tterns (such as buying preference, buying time, budget cycles, and appetites for spendi ng),(2) repositioning products and managing product portfolios by comparing the per formance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous dat abase integration. Many organizations typically collect diverse kinds of data and main tain large databases from multiple, heterogeneous, autonomous, and distributed infor mation sources. To integrate such data, and provide easy and efficient access to it is hi ghly desirable, yet challenging.Much effort has been spent in the database industry and research community tow ards achieving this goal.The traditional database approach to heterogeneous database integration is to buil d wrappers and integrators (or mediators) on top of multiple, heterogeneous databases . A variety of data joiner and data blade products belong to this category. When a quer y is posed to a client site, a metadata dictionary is used to translate the query into quer ies appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sit es are integrated into a global answer set. This query-driven approach requires comple x information filtering and integration processes, and competes for resources with pro cessing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach o f heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which informati on from multiple, heterogeneous sources is integrated in advance and stored in a ware house for direct querying and analysis. Unlike on-line transaction processing database s, data warehouses do not contain the most current information. However, a data ware house brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured int o one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store an d integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1. Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems .The major task of on-line operational database systems is to perform on-line trans action and query processing. These systems are called on-line transaction processing ( OLTP) systems. They cover most of the day-to-day operations of an organization, suc h as, purchasing, inventory, manufacturing, banking, payroll, registration, and account ing. Data warehouse systems, on the other hand, serve users or “knowledge workers" i n the role of data analysis and decision making. Such systems can organize and presen t data in various formats in order to accommodate the diverse needs of the different us ers. These systems are known as on-line analytical processing (OLAP) systems.The major distinguishing features between OLTP and OLAP are summarized as f ollows.(1). Users and system orientation: An OLTP system is customer-oriented and is u sed for transaction and query processing by clerks, clients, and information technolog y professionals. An OLAP system is market-oriented and is used for data analysis by k nowledge workers, including managers, executives, and analysts.(2). Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amou nts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the d ata easier for use in informed decision making.(3). Database design: An OLTP system usually adopts an entity-relationship (ER)data model and an application -oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.(4). View: An OLTP system focuses mainly on the current data within an enterpri se or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with informat ion that originates from different organizations, integrating information from many da ta stores. Because of their huge volume, OLAP data are stored on multiple storage me dia.(5). Access patterns: The access patterns of an OLTP system consist mainly of sh ort, atomic transactions. Such a system requires concurrency control and recovery me chanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although m any could be complex queries.Other features which distinguish between OLTP and OLAP systems include data base size, frequency of operations, and performance metrics and so on. 2. But, why ha ve a separate data warehouse?“Since operational databases store huge amounts of data", you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?"A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and w orkloads, such as indexing and hashing using primary keys, searching for particular re cords, and optimizing “canned" queries. On the other hand, data warehouse queries ar e often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementati on methods based on multidimensional views. Processing OLAP queries in operationa l databases would substantially degrade the performance of operational tasks.Moreover, an operational database supports the concurrent processing of several t ransactions. Concurrency control and recovery mechanisms, such as locking and loggi ng, are required to ensure the consistency and robustness of transactions. An OLAP qu ery often needs read-only access of data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reducethe throughput of an OLTP system.Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically mainta in historical data. In this context, the data in operational databases, though abundant, i s usually far from complete for decision making. Decision support requires consolidat ion (such as aggregation and summarization) of data from heterogeneous sources, resu lting in high quality, cleansed and integrated data. In contrast, operational databases c ontain only detailed raw data, such as transactions, which need to be consolidated bef ore analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.五分钟搞定5000字毕业论文外文翻译,你想要的工具都在这里!在科研过程中阅读翻译外文文献是一个非常重要的环节,许多领域高水平的文献都是外文文献,借鉴一些外文文献翻译的经验是非常必要的。