Chapter2-DataWarehouse
DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.

“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”

Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales.
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.

“OK,” you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term “warehouse DBMS” is used to refer to the management and utilization of data warehouses. We will not make this distinction here.

“How are organizations using the information from data warehouses?” Many organizations are using this information to support business decision making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic region, in order to fine-tune production strategies,
(3) analyzing operations and looking for sources of profit,
(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to them is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category.
When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.

1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems.
They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP from OLAP systems include database size, frequency of operations, and performance metrics.

2. But why have a separate data warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation.
Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high quality, cleansed and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
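To make the separation concrete, here is a minimal sketch of consolidating detailed operational rows into a physically separate warehouse store ahead of time, so that analysis never touches the operational database. It uses Python's standard sqlite3 module with two in-memory databases. The table names (orders, sales_by_quarter), the columns, and the sample rows are hypothetical and chosen only for illustration; a real warehouse load would also involve cleaning and integrating several sources.

```python
import sqlite3

# Hypothetical operational (OLTP) store: one detailed row per transaction.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, region TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (order_date, region, amount) VALUES (?, ?, ?)",
    [
        ("2023-01-15", "East", 120.0),
        ("2023-02-03", "East", 80.0),
        ("2023-02-20", "West", 200.0),
        ("2023-05-09", "East", 50.0),
        ("2023-06-30", "West", 150.0),
    ],
)

# Physically separate warehouse store holding consolidated, historical data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_by_quarter ("
    "year INTEGER, quarter INTEGER, region TEXT, "
    "total_amount REAL, order_count INTEGER, "
    "PRIMARY KEY (year, quarter, region))"
)


def load_warehouse(src, dst):
    """Aggregate detailed rows in advance and store the summary in the warehouse."""
    rows = src.execute(
        """
        SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS year,
               (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
               region,
               SUM(amount),
               COUNT(*)
        FROM orders
        GROUP BY year, quarter, region
        """
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO sales_by_quarter VALUES (?, ?, ?, ?, ?)", rows
    )
    dst.commit()


load_warehouse(oltp, warehouse)

# Analytical queries are read-only and run against the warehouse only.
summary = warehouse.execute(
    "SELECT year, quarter, region, total_amount, order_count "
    "FROM sales_by_quarter ORDER BY year, quarter, region"
).fetchall()
for row in summary:
    print(row)
```

The operational store is read once during the load. After that, summary queries run against the warehouse copy and do not compete with transaction processing, at the cost of the warehouse not reflecting orders placed after the last load.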
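Books on Data Warehousing - Reply
Here are several books on data warehousing that are influential in the field:
1. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling (《数据仓库工具箱:维度建模权威指南》)
Author: Ralph Kimball
This book is regarded as a classic in the data warehousing field. The author is a leading advocate of dimensional modeling, and the book explains the concepts, methods, and practical case studies of dimensional modeling in detail, which is very helpful for understanding and building data warehouses.
2. Data Warehouse Design and Implementation (《数据仓库设计与实现》)
Author: W.H. Inmon
W.H. Inmon is known as the “father of data warehousing.” This book is one of his representative works. It systematically introduces the design principles, architecture, and implementation process of data warehouses, and suits readers who want to study data warehousing comprehensively.
3. Big Data Technology Principles and Applications: From Data Acquisition to Business Intelligence (《大数据技术原理与应用——从数据获取到商业智能》)
Author: Geng Xin (耿新) et al.
This book combines theory and practice to explain data warehouse technology in a big data environment in accessible terms, including Hadoop, data warehouse models, and ETL, and discusses how to apply these technologies to business intelligence.
4. Data Warehousing and Business Intelligence: Concepts, Technologies, and Applications (《数据仓库与商务智能:概念、技术和应用》)
Authors: Ramesh Sharda, Dursun Delen, Efraim T. Turban
This book gives a comprehensive introduction to the basic concepts, key technologies, and application examples of data warehousing and business intelligence. It suits beginners as well as professionals who want a deeper understanding of how data warehouses are applied in business intelligence.
5. Data Lakes in Practice: Building an Enterprise Data Warehouse (《数据湖实战:构建企业级数据仓库》)
Author: Jarek Ratajski
With the development of big data technology, the data lake has become a new hot topic. This book mainly describes how to build enterprise data warehouses and data lakes on modern big data technologies such as Hadoop and Spark.
You can choose among the books above according to your needs and interests to better understand and master the knowledge and skills related to data warehousing.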
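Computer English (《计算机英语》): Reference Answers

Chapter 1

1.
(1) 中央处理器 (Central Processing Unit)
(2) 随机访问内存 (Random-access Memory)
(3) 美国国际商用机器公司 (International Business Machines)
(4) 集成电路 (Integrated Circuit)
(5) 大规模集成电路 (Large Scale Integration)
(6) 超大规模集成电路 (Very Large Scale Integration)
(7) 个人数字助理 (Personal Digital Assistant)
(8) 图形用户界面 (Graphical User Interface)

2.
(1) data
(2) software
(3) IC
(4) ENIAC
(5) supercomputer
(6) superconductivity

3.
(1) F (ENIAC is the second digital computer after the Atanasoff-Berry Computer)
(2) T
(3) F (Data is a collection of unorganized items)
(4) T
(5) T
(6) T

4.
(1) 人工智能 (artificial intelligence)
(2) 光计算机 (optical computer)
(3) 神经网络 (neural network)
(4) 操作系统 (operating system)
(5) 并行处理 (parallel processing)
(6) vacuum tube
(7) integrated circuit
(8) electrical resistance
(9) silicon chip
(10) minicomputer

5.
Data is a collection of unorganized items; data can include characters, numbers, images, and sounds. Computers manage data and process it to produce information. Data entered into a computer is called input, and the result of processing is called output. A computer can hold data and information for later use in a place called storage. The whole cycle of input, processing, output, and storage is called the information processing cycle. People who interact with a computer or use the information it produces are called users.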
1.
(1) 发光二极管 (Light-Emitting Diode)
(2) 静态随机存储器 (Static Random Access Memory)
(3) 只读存储器 (Read Only Memory)
(4) 运算器 (Arithmetic and Logic Unit)
(5) 阴极射线管 (Cathode Ray Tube)
(6) 视频显示单元 (Visual Display Unit)
(7) 可编程只读存储器 (Programmable Read Only Memory)
(8) 液晶显示屏 (Liquid Crystal Display)

2.
(1) CPU
(2) peripheral
(3) memory
(4) modem
(5) control unit
(6) byte

3.
(1) T
(2) T
(3) F (RAM is volatile memory because the information within the computer chips is erased as soon as the computer is powered off, whereas ROM is nonvolatile)
(4) T
(5) T
(6) F (Microphones and digital cameras are input devices)

4.
(1) 寄存器组 (register set)
(2) 主机 (host computer)
(3) 二进制的 (binary)
(4) 算法 (algorithm)
(5) 光盘 (optical disc)
(6) CD-RW
(7) logic operation
(8) barcode
(9) peripheral device
(10) volatile memory

5.
A computer's memory can be viewed as a series of cells in which numbers can be stored and retrieved.
To better support innovation and differentiation, you need the ability to bring together a “customer view” with a traditional “product view” and you need to give more users and processes on-demand access to accurate, in-context and actionable information.

Of course, the idea of more timely and widespread information access is great. But the technologist side of your brain is probably screaming, “Complexity!” And the business side is probably dubious, given the potential costs and risks. Both sides know that status quo data warehousing solutions and approaches will not support these seemingly conflicting needs. That’s why a new approach that employs more dynamic and balanced warehousing capabilities is required.

With IBM Balanced Warehouse offerings, IBM can help your company optimize warehousing performance with best practices to enable you to:
• Coordinate marketing plans across channels to position your company for growth.
• Manage inventory across channels and plan assortment based on marketplace needs.
• Tailor promotions to each customer segment.
• Enable staff with right-time views into inventory availability.

Watch ideas take flight with a flexible, manageable approach

IBM provides all of the software and hardware capabilities you need to deploy, maintain and evolve an enterprise-wide data warehouse through IBM Balanced Warehouse solutions. A robust combination of database, analytic and warehousing software, servers and storage components gives you the ability to analyze and act on large amounts of structured and unstructured information. Moreover, Balanced Warehouse solutions rely on industry open standards and nonproprietary hardware, so they’ll work with your existing systems and support easy redeployment as needed.

IBM Balanced Warehouse solutions are preconfigured using best practices and extensive certification to support the needs of enterprise environments, including the need to:
• Handle large data volumes. IBM uses a modular design that enables you to easily and cost-effectively scale units to support data growth.
• Maintain high availability. Balanced Warehouse solutions use IBM components selected for optimum price and performance, and include hardware component redundancy and a fault-tolerant design for robust availability.
• Work with comprehensive, integrated software. All of the software tools you need to get started (including information storage, management and delivery tools, and business analytics tools) come standard.

Given their advanced, integrated capabilities and performance attributes, IBM Balanced Warehouse solutions are an ideal foundation to support dynamic warehousing. This approach enables you to leverage immediate business insight across merchandising, supply chain, store and channel operations, rather than limiting you to providing only after-the-fact reports and analysis from data warehouses. So more people and processes have the information they need to create differentiated customer experiences that help improve customer satisfaction and loyalty.

The heart of dynamic warehousing: IBM DB2 Warehouse

Derive more value from information more quickly without adding IT staff.

Unlike most data warehousing and business intelligence solutions that are pieced together with components from multiple vendors, IBM DB2® Warehouse software, which is the heart of the IBM Balanced Warehouse solution, provides a complete, integrated and highly flexible and scalable data warehousing stack that works together from day one. It offers the tooling and infrastructure to simplify the design, deployment and maintenance of an enterprise data warehouse. And built-in retail data models (for example, models for customer centricity, merchandising management, store operations and product management, and supply chain management) and other industry-optimized mining tools and in-line analytics extend powerful warehousing capabilities to all frontline users.
Imagine what the IT department, decision makers and even store employees could do with a data warehouse that enables you to:
• Store more with less and improve query performance dramatically with the help of row compression tools, which can help reduce disk storage needs by 50 percent, and with materialized query tables and multidimensional clusters, which are designed to improve the performance of complex aggregate queries.
• Reduce investment risks with a modular, quality-tested solution that provides around-the-clock support from a single phone number and easy growth at a predictable cost.
• Provide users with visibility into operational and transactional data within the context of the applications they use every day, to support greater responsiveness to business needs.
• Exchange data in two directions to help ensure that the data warehouse is feeding accurate data to operational and transactional systems and business intelligence applications.
• Provide high performance for mixed workload query processing with the help of a shared-nothing architecture that can scale multiple workloads up and out without affecting performance.
• Unify business intelligence into a single solution with built-in analytic building blocks that help you extend analytics into applications.

Start seeing the advantage of a balanced warehouse

Based on IBM’s experience providing data warehousing to leading companies around the world, IBM has identified three strategic pillars for warehouse solutions that guide its solution design: Simplicity. Reliability and performance. And extended insight. As your data volumes and need for dynamic information grow, you can be confident that IBM solutions designed using these principles will help you optimize the value of your information.

Choose a solution that’s right for you

IBM understands what it takes to run a data warehouse in a retail enterprise.
To meet your company’s unique needs, IBM offers DB2 Warehouse in standalone solutions or as part of preconfigured, preintegrated, pretested and highly scalable IBM Balanced Warehouse solutions.

Access to accurate information across merchandising, supply chain, store and channel operations is the key to delivering a superior shopping experience, creating a demand-driven supply chain and driving operational excellence. DB2 Warehouse solutions offer targeted analysis for merchandising, supply chain, multichannel and store applications. And with prebuilt retail data models, a proven implementation methodology and embedded mining capabilities, you can potentially achieve a faster time to value from data warehousing efforts when you employ DB2 Warehouse. By helping you give more users and applications access to dynamic information, Balanced Warehouse solutions can help you unlock the value of all of your data. So you can drive greater efficiency, differentiation and customer loyalty.

For more information

To learn more about IBM Balanced Warehouse solutions and IBM DB2 Warehouse, and for help choosing the solution that’s right for you, contact your IBM sales representative or visit: /software/bi

© Copyright IBM Corporation 2007
IBM Corporation
Software Group
Route 100
Somers, NY 10589
U.S.A.
Produced in the United States of America
08-07
All Rights Reserved

DB2, IBM, the IBM logo and are trademarks of International Business Machines Corporation in the United States, other countries or both.

Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

The IBM home page on the Internet can be found at®.

IMB10923-USEN-00
Data Warehouse Model Design Process

Designing a data warehouse model is a crucial step in the process of building a robust and efficient data infrastructure. It involves structuring and organizing data in a way that facilitates easy access, retrieval, and analysis for decision-making. A well-designed data warehouse model should be able to integrate data from multiple sources, maintain data quality, and provide valuable insights for business operations.
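As one illustration of integrating multiple sources while maintaining data quality, the sketch below maps records from two differently shaped sources onto a single target shape with consistent naming, encoding, and units. Everything in it is hypothetical and chosen only for the example: the Sale record, the two source layouts (dictionary rows with amounts in cents, and semicolon-delimited flat-file lines with amounts in dollars), and the region codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """Unified target record: one naming convention, one encoding, one unit."""
    sale_id: str
    region: str            # canonical region name, or "Unknown"
    amount_dollars: float  # amounts are always stored in dollars


# Hypothetical mapping from source-specific region encodings to canonical names.
REGION_NAMES = {"e": "East", "east": "East", "w": "West", "west": "West"}


def normalize_region(raw) -> str:
    """Map a source-specific code to a canonical name; flag anything unrecognized."""
    return REGION_NAMES.get(str(raw).strip().lower(), "Unknown")


def from_source_a(row: dict) -> Sale:
    """Source A (assumed): relational rows with region codes and amounts in cents."""
    return Sale(
        sale_id=f"A-{row['sale_no']}",
        region=normalize_region(row["region_cd"]),
        amount_dollars=row["amount_cents"] / 100,
    )


def from_source_b(line: str) -> Sale:
    """Source B (assumed): flat-file lines of the form 'id;region;amount_in_dollars'."""
    sale_id, region, amount = line.split(";")
    return Sale(
        sale_id=f"B-{sale_id}",
        region=normalize_region(region),
        amount_dollars=float(amount),
    )


source_a = [{"sale_no": 17, "region_cd": "E", "amount_cents": 12050}]
source_b = ["88;west;99.90", "91;;15"]

integrated = [from_source_a(r) for r in source_a] + [from_source_b(l) for l in source_b]
for sale in integrated:
    print(sale)
```

Marking an unrecognized region as "Unknown" instead of guessing keeps the quality problem visible downstream. That is a design choice for this sketch, not a rule taken from the text.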
One of the key aspects of designing a data warehouse model is understanding the specific requirements of the organization and its stakeholders. This involves conducting thorough interviews and meetings with various departments and business users to gather requirements and ensure that the data warehouse model meets the needs of all stakeholders.
DATA WAREHOUSEData warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latestmust-have marketing weapon —— a way to keep customers by learning more about their needs.“So", you may ask, full of intrigue, “what exactly is a data warehouse?"Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short, but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.(1).Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. 
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.(3).Time-variant: Data are stored to provide information from a historical perspective(e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.(4)Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneoussources to support structured and/or ad hoc queries, analytical reporting, and decision making.“OK", you now ask, “what, then, is data warehousing?"Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. 
The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.“How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,(3) analyzing operations and looking for sources of profit,(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data, and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. 
When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, datawarehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.1.Differences between operational database systems and data warehousesSince most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. 
They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP from OLAP systems include database size, frequency of operations, performance metrics, and so on.

2. But, why have a separate data warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of several transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access of data records for summarization and aggregation.
Concurrency control and recovery mechanisms, if applied for such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, is usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high-quality, cleansed, and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.

Data Warehouse

Data warehousing provides business operations with structures and tools for systematically organizing, understanding, and using data to make decisions.
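The contrast drawn above between OLTP-style access (short lookups of particular records by primary key) and OLAP-style access (read-only aggregation over a star model) can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for both kinds of system, and the table names (`dim_time`, `dim_region`, `fact_sales`), the function names, and the sample rows are all hypothetical, not taken from the text.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables.

    All table names and rows are hypothetical and exist only for illustration.
    """
    conn.executescript("""
        CREATE TABLE dim_time   (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            sale_id   INTEGER PRIMARY KEY,
            time_id   INTEGER REFERENCES dim_time(time_id),
            region_id INTEGER REFERENCES dim_region(region_id),
            amount    REAL
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 2006, "Q1"), (2, 2006, "Q2")])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                     [(1, "North"), (2, "South")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 100.0), (2, 1, 2, 50.0),
                      (3, 2, 1, 70.0), (4, 2, 2, 30.0), (5, 2, 2, 20.0)])


def oltp_lookup(conn, sale_id):
    """OLTP-style access: fetch one particular record by its primary key."""
    row = conn.execute(
        "SELECT amount FROM fact_sales WHERE sale_id = ?", (sale_id,)
    ).fetchone()
    return row[0]


def olap_sales_by_quarter_and_region(conn):
    """OLAP-style access: read-only summarization across two dimensions."""
    return conn.execute("""
        SELECT t.quarter, r.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_time   AS t ON f.time_id = t.time_id
        JOIN dim_region AS r ON f.region_id = r.region_id
        GROUP BY t.quarter, r.name
        ORDER BY t.quarter, r.name
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)

print(oltp_lookup(conn, 3))
for quarter, region, total in olap_sales_by_quarter_and_region(conn):
    print(quarter, region, total)
```

The lookup touches a single row through the primary-key index, while the aggregate query scans and joins the whole fact table. On a real operational database, that second kind of workload is what competes with day-to-day transactions, which is one of the reasons given above for keeping the warehouse separate.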
Data warehousing
To support your business objectives

Rely on business insight, not just intuition. Guide business strategy more effectively with IBM DB2 Warehouse solutions for small data warehouse implementations.

Chances are your systems contain all of the hard evidence that business leaders need to understand trends and make informed decisions. The problem is you don’t have a way to easily and cost-efficiently compile and analyze the data, so your organization’s decision makers must rely heavily on experience and intuition. But what intuition suggests and what data shows are often surprisingly different.

Take the example of a small bank in the northwestern United States. Bank management assumed that wealthy clients who made large deposits were the bank’s most profitable clients. However, after implementing a solution that enabled bank leaders to analyze customer profitability, they realized that the opposite was true. Wealthy clients were savvy negotiators who typically negotiated all of the profits out of the instruments they purchased from the bank. In fact, smaller, more modest investors were more profitable; the bank just needed a strategy to attract more of them.

Fact or best guess? The decision-making disconnect

In recent years, small and large organizations alike have seen rapid increases in the amounts of company and customer data available to them.

For many companies, however, an inability to easily access and use this information strategically limits their ability to optimize operational efficiencies and differentiate themselves from the competition.
And given commoditization and intense competitive pressures, businesses need every advantage they can get to deliver higher service levels, increase efficiency and address regulatory requirements.

Transform information into a strategic asset with a data warehouse

In any industry, whether you work in a small business or in a large company, if you can’t compile and analyze historical data, you can’t separate facts from best guesses to maximize your competitive advantage. And the reason that many organizations can’t get to this information is a lack of integration among company data sources ranging from desktops to legacy systems, servers and intranets. While it may be easy to access current sales or financial data in a single system, pulling together different types of historical data from multiple systems to see whether you can find any business opportunities is another story.

Fortunately, there is an easy and cost-effective way to compile and analyze your data for strategic advantage: a data warehouse. In fact, a data warehouse, which is a central repository of information from the systems across your business, can help you improve decision making and give flight to new ideas across key strategic areas of your business, including:
• Sales analysis. Understand the regions and time periods in which products are selling, and identify the factors that contribute to wins and losses.
• Customer relationship management. Better understand who your customers are, what they want and what they’re buying so you can give them what they want.
• Resource planning. Identify cost-cutting opportunities and budget trends to support better investment decisions.

Make more informed decisions: IBM DB2 Warehouse software

An IBM DB2® Warehouse solution can provide a data warehouse that delivers an up-to-the-moment, single view of company-wide data without overstretching your IT team or bank account.
By integrating data sources ranging from spreadsheets to heterogeneous, siloed, legacy systems, DB2 Warehouse can help decision makers capitalize on an organization-wide view of the information. And with the help of online analytical processing (OLAP) and data mining capabilities from IBM Business Partners, the software can help you navigate and find hidden relationships in your data to spark innovative ideas and see new business opportunities.

DB2 Warehouse software includes a powerful graphical tool, called the SQW warehousing tool, for designing, deploying and loading the warehouse to support data mining and analytics activities. And an easy-to-use interface makes it possible for a wide range of employees to access the capabilities. Moreover, unlike some small-scale data warehouse solutions from third-party vendors that support only limited, difficult-to-scale solutions, flexibility is a key attribute of DB2 Warehouse software. DB2 Warehouse can support both operational and transactional workloads and prioritize different requests, organizations and users. The architecture also enables you to add more complex workloads, and it easily scales as your business requirements demand.

A vast network of IBM Business Partners underpins DB2 Warehouse solutions. Business Partners provide an added layer of local support as well as solutions that are proven to integrate easily with IBM technology.

Clear insight, costs and a growth path

Let’s face it, if you can’t deploy a data warehouse relatively quickly and maintain it with existing staff, then you probably don’t want one. What’s more, if it doesn’t start adding value soon after it’s in place, then it’s hard to justify the investment.
Whether it’s by providing new insight through summarized data or through the output of a Business Partner application analyzing the key metrics for your business, DB2 Warehouse software can begin delivering insight as soon as your data is loaded.

To help simplify the deployment of a data warehouse, IBM offers the IBM Balanced Warehouse solution. Specifically designed to jump-start smaller warehousing implementations, the IBM Warehouse C class provides out-of-the-box solutions that include preintegrated, preconfigured DB2 Warehouse software and IBM systems and storage technology that are pretested to support optimal performance. Based on nonproprietary, readily available hardware, IBM Balanced Warehouse solutions can be easily reused and redeployed depending on changing business needs. They’re competitively priced and simple to use, and they scale easily as your business grows, helping to reduce hidden costs related to training, maintenance and growth.

If you want to implement a data warehouse solution on your own hardware, you can also choose from three different competitively priced versions of the DB2 Warehouse software, based on the features that make sense for your business.

IBM’s data warehouse offerings are flexible and agile so you can implement a solution that supports your current business needs and scale it all the way up to hundreds of terabytes of data. Take comfort in knowing that you can transform your data into reliable, consistent business insight and easily grow your data warehouse if your requirements change.

For more information

To learn more about the IBM DB2 Warehouse or IBM Balanced Warehouse solutions that best meet your business needs, visit: /software/bi

© Copyright IBM Corporation 2007
IBM Corporation
Software Group
Route 100
Somers, NY 10589
U.S.A.
Produced in the United States of America
03-07
All Rights Reserved

DB2, IBM and the IBM logo are trademarks of International Business Machines Corporation in the United States, other countries or both.

Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this documentation, it is provided “as is” without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

The IBM home page on the Internet can be found at .

IMB10902-USEN-00
Warehouse (Yard) Management System (Chinese-English)

Chapter 1: Purpose

This system is established to strengthen the management of warehouse (yard) materials, to clarify the procedures and processes for receiving and issuing materials, and to ensure that the warehouse (yard) is orderly and safe.

Chapter 2: Duties of warehouse (yard) personnel

1. Keep the materials in the warehouse (yard) neatly stacked and clearly labeled, and keep the environment clean, hygienic, and safe.

2. Receiving: Check the model, specification, quantity, and packaging condition of incoming materials against the material and equipment procurement plan and the customs declaration documents. Once everything is verified, place the materials in storage and fill in the Unpacking Inspection Record and the Equipment and Material Receiving Ledger.

3. Issuing: Issue materials according to the Construction Work Ticket prepared by the work-area construction personnel, and fill in the Material and Equipment Requisition Form. On the 25th of each month, compile the totals and fill in the Material and Equipment Usage Ledger.