当前位置:文档之家› 数据挖掘论文中英文翻译

数据挖掘论文中英文翻译

数据挖掘论文中英文翻译
数据挖掘论文中英文翻译

数据挖掘教程

塞思保罗

杰米麦克伦南

唐昭辉

斯科特欧俉桑

摘要:微软的SQL Server ?2005年提供了一个综合完整的环境,用于创建和从事数据挖掘模型

工作。本教程使用如下四个实例:目标邮购,数据预测,购物篮,序列簇用来演示阐述如何使用挖掘

模型算法,挖掘模型浏览器,和数据挖掘工具,这些是包含在本次发布的SQL Server中。

在本文件所载的信息,代表了当前微软公司对于出版日期的讨论的看法。因为Microsoft必须响应不断变化的市场条件,它不应被解释为是一种代表微软的承诺,微软和Microsoft不能保证出版日期后提出的任何资料的准确性。

本白皮书仅供参考,对于本文件中的资讯,Microsoft不作任何担保,明示或暗指。

遵守所有适用的版权法是用户的责任。在没有版权的情况下,未经微软公司明确的书面许可,不得以任何形式或以任何方式(电子,机械,影印,录音或其他方式)或为任何目的而复制,储存或引入检索系统,或传输本文件任何一部分。

本文件中可能涉及到微软的专利,专利申请,商标,版权或其他知识产权事项。除明文规定外的任何书面许可协议,微软提供的这份文件没有给你任何许可这些专利,商标,版权或其他知识产权。

@ 2003年微软公司。保留所有权利。

微软既是一个注册商标又是微软公司在美国和/或其他国家的商标。

文中提到的公司和产品的名字可能是它们各自所有者的商标。

介绍

数据挖掘教程的目的是引导您通过微软SQL Server 2005创建数据挖掘模型。该数据挖掘算法和工具,在SQL Server 2005可以很容易地建立一个全面的解决方案的各种项目,包括购物篮分析,预测分析,和邮购分析。对这些解决方案的描述在教程中有更详细的解释。

SQL Server 2005最明显的部分是用来创建和处理数据挖掘模型的工作室。在线分析处理(OLAP )和数据挖掘工具被统一为两个工作环境:商业智能开发工作室和SQL Server 管理工作室。

通过商业智能开发工作室,您可以在与服务器断开连接的情况下建立一个服务项目分析。当项目已经准备就绪,您可以发布到服务器上。您也可以直接面向服务器工作。SQL Server 管理工作室的主要职能是管理服务器。之后将有针对每一个环境的详细说明。欲了解更多关于从两个环境中选择的信息,请参看SQL Server联机丛书中的“在SQL Server 工作室和商业智能开发工作室中选择”。

所有的数据挖掘工具中存在的数据挖掘编辑器。使用编辑器,您可以管理挖掘模型,创造新的模式,以期车型,比较模型,并建立预测的基础上现有的模式。

当您建立一个挖掘模型后,你会想要探索它,寻找有趣的模式和规则。编辑器中每个挖掘模型视图都被定制为由具体算法创建的探索模型。欲了解更多关于视图的信息,请参看SQL Server联机丛书中的“查看数据挖掘模型”。

您的项目往往会包含多个挖掘模型,所以才能使用的模式创建的预测,你要能够确定哪些模式是最准确的。出于这个原因,编辑包含一个模型比较工具挖掘精度的图表标签。使用此工具,您可以比较准确的预测模型和您确定最佳模式。

为了建立数据预期,你将使用一种DME语言,DMX扩展了传统的SQL语法,包含了一些创建修改和建立数据预期的命令,关于DMX的详细信息,请参考SQL BOL中的“Data Mining

Extensions (DMX) Reference”章节。因为建立一个数据预期可能比较复杂,所以数据挖掘编辑器包含了一个工具叫做“Prediction Query Builder”,该工具可以让你在一个图形化的界面下编辑DMX查询语句,你也可以在该工具中可以查看自动生成的DMX语句。

了解了前面介绍的实现数据挖掘的工具之外,同等重要的是了解数据挖掘模型的结构本身,建立一个数据模型的关键是数据挖掘算法,该算法在你操作的数据中寻找我们需要的部分,并且转换这些数据成为一个可操作的数据模型,SQL2005 包含9中数据模型算法:

?决策树

?簇

?传统贝叶斯

?序列簇

?时间系

?联结

?神经网络

?线性回归

?逻辑回归

组合的使用这9种数据算法,你能够创建适应大部分商业逻辑的数据挖掘解决方案,本教程将详细的介绍这些算法。

一些很重要的建立数据挖掘解决方案的步骤是用来整理准备那些用于建立数据模型的数据,SQL2005包含一个DTS的工作环境以及一些DTS的工具用于清理验证准备数据,关于DTS的更多信息请查看SQL BOL中的"DTS Data Mining Tasks and Transformations"章节。

为了阐述SQL2005中的数据挖掘特性,本教程使用了一个新的示例数据库

AdventureWorksDW ,该数据库包含在SQL2005中它提供OLAP以及数据挖掘的一些实例数据。为了使用这个数据库你需要在安装SQL的时候选择它。

Adventure 数据库

AdventureWorksDW 数据库是基于一个虚构的自行车制造公司而建立,公司的名称叫做“Adventure Works Cycles”(简称AW公司)。AW公司生产并向北美,欧洲和亚洲的商业市场销售金属和复合材料的自行车,主要的工作都在华盛顿Bothell完成,那里拥有500 员工,以及一些地区销售部门遍及各地。

AW公司通过INTERNET批发和零售他们的产品,本教程中的数据模型实例需要你使用这些网络销售数据作为数据模型。

关于AW公司数据库的更多信息,请参考SQL Server联机丛书中的如下章节:"Sample Databases and Business Scenarios"。

数据库详细信息

网络销售数据构架包含9242个客户的信息,这些客户分布在6个国家,并被合并为3个区域:?南美(83%)

?欧洲(12%)

?澳大利亚(7%)

该数据库包含三个财政年度的数据:2002年,2003年和2004年。

数据库中的产品根据子类别,型号和产品来分类。

商业智能开发工作室

商业智能开发工作室是一套用于创建商务智能项目的工具。由于商业智能开发工作室是创建于IDE环境中的,在该环境中,你可以在脱机状态下创建一个完整地解决方案。你可以想改多少数据挖掘对象就改多少,但是在你发布该项目前,这些改变将不会反映在服务器上。

在商业智能开发工作室下工作是有益的,理由如下:

?您具有强大的可定制的工具来配置商业智能开发工作室以满足您的需要。

?你可以将各种数据挖掘技术与SSAS项目集成,在同一个工具中完成一个全面的解决方案.

?强大的源码以及版本控制支持使你的团队可以协作的建立一个解决方案.

建立一个SSAS项目是所有商业智能项目的基础,一个SSAS项目独立的建立一个SSAS数据库用于集成多种技术,这个数据库作为数据挖掘模型以及OLAP等技术的基础。你可以使用商业智能建立和修改一个SSAS项目并部署这个项目到一个或多个SSAS服务

如果你在开发一个SSAS项目你也可以使用商业智能开发工作室直接连接数据库,这样你所作的改动可以立刻影响到数据库中。

SQL Server 管理工作室

SQL Server 管理工作室是一个与微软SQL Server协作的管理和脚本工具的集合。这个工作室与商业智能开发工作室的不同在于,你是在一个联机的环境下工作,一旦你保存工作,你的行为就被传送到服务器上。

在数据被清理并为数据挖掘准备好后,大多数和创建苏局挖掘解决方案相关联的工作都在商业智能开发工作室中工作。通过使用商业智能开发工作室,你可以利用迭代过程确定的给定情况下的最佳模式来发布和测试数据挖掘解决方案。一旦开发商对解决方案满意,就可以将其发布到分析服务服务器。

从这点来看,重点从SQL Server管理工作室的开发转移到了维护和应用。在SQL Server管理工作室中,您可以管理您的数据库和执行一些在商业智能开发工作室中的相同的职能,比如在挖掘模式中查看、创建预测。

数据转换服务

在SQL Server 2005中数据转换服务(DTS )包括抽取,转换和加载(简称ETL )工具。

这些工具可用于执行一些数据挖掘中最重要的任务,为数据模型的建立清理和准备数据。在数据挖掘,您通常可以执行重复数据转换清理数据,然后利用这些数据组成挖掘模型。利用DTS中的任务和转移,您可以把数据准备和模型建立结合为一个单一的DTS包。

DTS公司还提供了DTS设计器,以帮助您轻松地建立和运行的包含了所有的任务和转变的软件包。

利用DTS设计器,您可以将包发布到服务器上并定期的运行他们。这是非常有用例如,你每周收集数据资料,并向要每次自动执行相同的清洁转换工作。

你可以通过向商业智能开发式的解决方案中分别增加项目来将数据转换项目和分析服务项目结合起来工作,作为商务智能解决方案的一部分。

挖掘模式算法

数据挖掘算法是挖掘模型的创建的基础。SQL Server 2005中各种各样的算法可以让你执行多种类型的执行。欲了解更多有关算法及其参数调整的信息,请参看SQL Server联机丛书中的“数据挖掘算法”。

决策树

决策树算法支持分类与回归并且对预测模型也行之有效。利用该算法,你可以预测离散和连续这两个属性。

在建立模型时,该算法检查每个数据集的输入属性是怎样的影响预测属性的结果,以及使用最强的关系的输入属性制造了一系列的分裂,称为节点。随着新节点添加到模型中,树状结构开始形成。顶端节点树描述了大多数预测属性的统计分析。每个节点建立把预测属性比作投入的属性的分布情况上。如果输入的属性被视为导致预测属性有利于促成比另一个更好的状态,于是一个新的节点添加到模型。该模型继续增长,直到没有剩余的属性制造分裂提供了一个更好的预测在现有节点。该模型力图找到一个结合的属性和引起在预测属性不成比例分配的状态,因此,您可以预测预测属性的结果。

簇算法采用迭代技术组从包含相似特性的数据及中进行分类。利用这些组合,您可以探讨的数据,更多地了解存在的关系,这在理论上可能不容易通过偶然的观察获得。此外,您也可以从算法创建的簇建立预测模型。例如,考虑那些住在同一社区,驱动器相同的车,吃同样的食物,买了类似的版本的产品的那一个群体的人。这是一组数据。另一组可能包括去相同的餐厅,也有类似的薪金,休假和每年两次以外的地区的人。观测这些集合是如何的分布,可以更好地了解预测属性的结果是如何相互影响的。

传统贝叶斯

传统贝叶斯算法迅速的建立挖掘模型,可用来做分类和预测。它适合各个输入属性情况的可能情况,并考虑到每种预测属性的情况,以后可以在已知的输入属性的基础上来预测预测属性的结果。概率用来生成计算和储存加工过程中的立方体的模型。该算法只支持分立或离散属性,以及它认为所有输入的属性是独立的。传统贝叶斯算法产生一个简单的挖掘模型,可以被视为在数据挖掘过程中的一个起点。由于大多数的计算结果是立方体处理的过程中生成的,结果很快返回。这使得该模型成为探索数据和发现各种不同的输入属性在不同预测属性的情况下是如何分布的一个很好的选择。

时间系

时间系算法创建可以用来预测连续变量随着时间的推移从联机分析处理和关系数据源的模式,。例如,您可以使用时间系预测算法历史数据立方体的基础上来预测销售额和利润。

利用该算法,您可以选择一个或多个变量来预测,但他们必须是继续的。对每个模式您只能有一系列案例。一系列的案例等同于一系列位置,诸如寻求销售的长度的日期超过几个月或几年。

一个例子可能包含了一套变量(例如,销售不同的商店)。时间系算法可以在预测中使用跨变量。

例如,在一个商店的先售可能在预测另一个商店的当前销售时也有用。

联结

联结算法是专门设计用于市场篮子分析。该算法认为每个属性/值配对(如产品/自行车)作为一个项目。一个相集是在单一事务的项目上的一个组合。该算法通过扫描数据集试图找到往往出现在许多交易的项目集。出现在很多交易项面前的支持参数确定被认为是重要的。例如,频繁项目集可能包含(性别= “男性”,婚姻状况= “已婚”,年龄= “30-35 ”)。每个项目集包含项目的数量都有个大小。在这种情况下,大小是3 。

往往联结模式在包含嵌套表的数据集之后工作,如客户名单在一个嵌套的购买列表后。如果一个嵌套表中存在数据集,每个嵌套的建制(如在购买表的产品)被认为是一个项目。

算法同时找到项目集之间的联系。关联模型的规则看起来像A,B= >C (发生概率的联系),其中有A , B ,C都是频繁项目集。' = > …意味着C是通过A和B预测的。概率阈值是一个在被深思考虑的规则之前确定了最低概率参数。这些概率在数据挖掘文献中也被称为“信任”。

联结模式同样对交叉销售或协同过滤有用。例如,您可以使用联结模式在他们购物篮项目上来预测一个用户可能希望购买的产品。

序列簇

序列簇分析算法分析有关联导向的包含离散值系列的数据。通常串联的一连串属性拥有特定的命令(如点击路径)的一组事件。通过分析有关联的事物之间的情况的转变,该算法可以预测有关联的事务将来的情况。

序列簇算法是一种混合型的序列和聚类算法。该算法根据这些关系的相似性将有关系属性的的多重案例分组成片段。该算法的一个典型的使用情况是一个门户网站的网络客户分析。一个门户网站拥有一套附属领域,如新闻,天气,金钱,邮件,和体育。每个网站的客户通过在这些领域中网页点击的顺序联系起来。序列簇算法可以根据他们的导航模式将这些网页客户分组成差不多同质的团体。这些团体是视化的,提供了详细的了解客户如何使用该网站。

神经网络

在Microsoft SQL Server 2005分析服务中,神经网络算法通过构建多层感知神经元网络建立分类与回归挖掘模型。类似微软决策树算法的供应商,考虑到每个可预测属性的情况,该算法为马格可能输入属性的情况计算概率。该算法提供案例的过程,反复比较预测分类的情况和已知的实际分类的案件。

这些来自第一代的整套案件中从最初的分类错误,被反馈到网络,用来修改网络性能的下一代,等等。

以后您可以在输入属性的基础上使用这些概率来预测那些预测属性的结果。然而,该算法和决策树算法其中一个主要区别,是其学习的过程是朝着尽量减少错误的方向优化网络参数,而决策树算法的分裂规则,以求最大限度地发挥信息增益。该算法支持预测的离散和连续属性。

线性回归

线性回归算法是决策树算法的一种特殊的构造,获得了无效的分裂(整个回归公式是建立在一个单一根节点)。该算法支持预测连续属性。

逻辑回归

逻辑回归算法是神经网络算法的一种特殊的构造,得到了消除隐蔽层。该算法支持预测的离散和连续属性。

通过教程实践

在本教程你将在商业智能开发工作室中工作(所描绘图1 )。如需要更多关于商业智能开发工作室的消息,见“使用SQL Server Management Studio中”SQL Server在线联机丛书中。

图 1 商业智能工作室

该教程是分为三个部分:准备SQL Server数据库,编写分析服务数据库,建设并从事挖掘模型的工作。

数据库的准备

该AdventureWorksDW数据库,作为本教程的基础,与SQL Server 一起安装(但不是默认的,作为一个选项在安装时间),并已包含将用于建立挖掘模型的意见。如果没有在安装时安装,您可以在控制面板->添加/删除程序->微软SQL Server 2005选择“改变”按钮添加它。你可以根据在线图书和工作站组件样品查找AdventureWorksDW采样数据仓库。

准备分析服务数据库

在您开始创建和使用挖掘模型之前,您必须执行下列任务:

1.创建一个新的分析服务项目

2.创建一个数据源

3.创建一个数据源的视图

创建分析服务项目

每个分析服务项目为一个单一的分析服务数据库中的对象定义概要。分析服务数据库是由它包含的挖掘模型,OLAP的立方体,和供给对象所定义的。欲了解更多有关分析服务项目的信息,请参看SQL Server联机丛书中的“在商业智能开发工作室中创建分析服务项目”。

要创建一个分析服务项目:

1.打开商业智能开发工作室

2.在文件菜单中选择新建 项目

3.新项目的类型选择分析服务项目,并命名为AdventureWorks

4.单击确定

在商务智能开发工作室中打开新项目。

创建数据源

数据源是一个数据连接,它在您的项目中被保存和管理,并被发布到您的分析服务数据库中。它包含服务器名称和源数据所在的数据库,以及其他被需求的连接属性。

要创建数据源:

1.在解决方案资源管理器中右键单击该数据源工程项目,并选择新数据源

2.在欢迎页上,单击下一步

3.单击新建增加一个到AdventureWorksDW数据库的连接

4.弹出连接管理器对话框。在服务器名称下拉框中,选择服务器托管AdventureWorksDW (例如,

本地),导入您的证书,然后在选择数据库服务器上下拉框中选择AdventureWorksDW数据库。

5.单击确定以关闭连接管理器对话框

6.单击下一步

7.默认的数据源命名为探险工程数据仓库。单击完成

AdventureWorksDW作为新的数据源,出现在解决方案资源管理器中的数据源文件夹中。

英文原文

Data Mining Tutorial

Seth Paul

Jamie MacLennan

Zhaohui Tang

Scott Oveson

Abstract: Microsoft? SQL Se rver? 2005 provides an integrated environment for creating

and working with data mining models. This tutorial uses four scenarios,

targeted mailing, forecasting, market basket, and sequence clustering, to demonstrate

how to use the mining model algorithms, mining model viewers, and data mining

tools that are included in this release of SQL Server.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

2003 Microsoft Corporation. All rights reserved.

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owner

Introduction

The data mining tutorial is designed to walk you through the process of creating data mining models in Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.

The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio. Using Business Intelligence Development Studio, you can develop an Analysis Services project disconnected from the server. When the project is ready, you can deploy it to the server. You can also work directly against the server. The main function of SQL Server Management Studio is to manage the server. Each environment is described in more detail later in this introduction. For more information on choosing between the two environments, see "Choosing Between SQL Server Management Studio and Business Intelligence Development Studio" in SQL Server Books Online. All of the data mining tools exist in the data mining editor. Using the editor you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.

After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.

Often your project will contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool called the Mining Accuracy Chart tab. Using this tool you can compare the predictive accuracy of your models and determine the best model.

To create predictions, you will use the Data Mining Extensions (DMX) language. DMX extends SQL, containing commands to create, modify, and predict against mining models. For more information about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor contains a tool called Prediction Query Builder, which allows you to build queries using a graphical interface. You can also view the DMX code that is generated by the query builder.

Just as important as the tools that you use to work with and create data mining models are the mechanics by which they are created. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data that you pass it, and it translates them into a mining model — it is the engine behind the process. SQL Server 2005 includes nine algorithms:

?Microsoft Decision Trees

?Microsoft Clustering

?Microsoft Na?ve Bayes

?Microsoft Sequence Clustering

?Microsoft Time Series

?Microsoft Association

?Microsoft Neural Network

?Microsoft Linear Regression

?Microsoft Logistic Regression

Using a combination of these nine algorithms, you can create solutions to common business problems. These algorithms are described in more detail later in this tutorial.

Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data to be used to create the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, which contains tools that you can use to clean, validate, and prepare your data. For more information on using DTS in conjunction with a data mining solution, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.

In order to demonstrate the SQL Server data mining features, this tutorial uses a new sample database called AdventureWorksDW. The database is included with SQL Server 2005, and it supports OLAP and data mining functionality. In order to make the sample database available, you need to select the sample database at the installation time in the “Advanced” dialog for component selection.

The audience for this tutorial is business analysts, developers, and database administrators who have used data mining tools before and are familiar with data mining concepts. If you are new to data mining, download "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis Services"

(https://www.doczj.com/doc/848625885.html,/library/default.asp?url=/servers/books/sqlserver/mining.as p).

Adventure Works

AdventureWorksDW is based on a fictional bicycle manufacturing company

named Adventure Works Cycles. Adventure Works produces and distributes metal and composite bicycles to North American, European, and Asian commercial

markets. The base of operations is located in Bothell, Washington with 500

employees, and several regional sales teams are located throughout their market base.

Adventure Works sells products wholesale to specialty shops and to individuals through the Internet. For the data mining exercises, you will work with the

AdventureWorksDW Internet sales tables, which contain realistic patterns that work well for data mining exercises.

For more information on Adventure Works Cycles see "Sample Databases and

Business Scenarios" in SQL Server Books Online.

Database Details

The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:

?North America (83%)

?Europe (12%)

?Australia (7%)

The database contains data for three fiscal years: 2002, 2003, and 2004.

The products in the database are broken down by subcategory, model, and product. Business Intelligence Development Studio Business Intelligence Development Studio is a set of tools designed for creating business intelligence projects. Because Business Intelligence Development Studio was created as an IDE environment in which you can create a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until after you deploy the project.

Working in an IDE is beneficial for the following reasons:

?You have powerful customization tools available to configure Business Intelligence Development Studio to suit your needs.

?You can integrate your Analysis Services project with a variety of other business intelligence projects encapsulating your entire solution into a single view.

?Full source control integration enables your entire team to collaborate in creating a complete business intelligence solution.

The Analysis Services project is the entry point for a business intelligence solution.

An Analysis Services project encapsulates mining models and OLAP cubes, along with supplemental objects that make up the Analysis Services database. From Business Intelligence Development Studio, you can create and edit Analysis

Services objects within a project and deploy the project to the appropriate Analysis Services server or servers.

If you are working with an existing Analysis Services project, you can also use Business Intelligence Development Studio to work connected the server. In this way, changes are reflected directly on the server without having to deploy the solution.

SQL Server Management Studio

SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you are working in a connected environment where actions are propagated to the server as soon as you save your work.

After the data has been cleaned and prepared for data mining, most of the tasks associated with creating a data mining solution are performed within Business Intelligence Development Studio. Using the Business Intelligence Development Studio tools, you develop and test the data mining solution, using an iterative process to determine which models work best for a given situation. When the developer is satisfied with the solution, it is deployed to an Analysis Services server.

From this point, the focus shifts from development to maintenance and use, and thus SQL Server Management Studio. Using SQL Server Management Studio, you can administer your database and perform some of the same functions as in

Business Intelligence Development Studio, such as viewing, and creating

predictions from mining models.

Data Transformation Services

Data Transformation Services (DTS) comprises the Extract, Transform, and Load (ETL) tools in SQL Server 2005. These tools can be used to perform some of the most important tasks in data mining: cleaning and preparing the data for model creation. In data mining, you typically perform repetitive data transformations to clean the data before using the data to train a mining model. Using the tasks and transformations in DTS, you can combine data preparation and model creation into

a single DTS package.

DTS also provides DTS Designer to help you easily build and run packages

containing all of the tasks and transformations. Using DTS Designer, you can deploy the packages to a server and run them on a regularly scheduled basis. This is useful if, for example, you collect data weekly data and want to perform the same cleaning transformations each time in an automated fashion.

You can work with a Data Transformation project and an Analysis Services project together as part of a business intelligence solution, by adding each project to a solution in Business Intelligence Development Studio.

Mining Model Algorithms

Data mining algorithms are the foundation from which mining models are created.

The variety of algorithms included in SQL Server 2005 allows you to perform many types of analysis. For more specific information about the algorithms and how they can be adjusted using parameters, see "Data Mining Algorithms" in SQL Server Books Online.

Microsoft Decision Trees

The Microsoft Decision Trees algorithm supports both classification and regression and it works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.

In building a model, the algorithm examines how each input attribute in the dataset affects the result of the predicted attribute, and then it uses the input attributes with the strongest relationship to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predicted

attribute as compared to the input attributes. If an input attribute is seen to cause the predicted attribute to favor one state over another, a new node is added to the model. The model continues to grow until none of the remaining attributes create a split that provides an improved prediction over the existing node. The model seeks to find a combination of attributes and their states that creates a disproportionate distribution of states in the predicted attribute, therefore allowing you to predict the outcome of the predicted attribute.

Microsoft Clustering

The Microsoft Clustering algorithm uses iterative techniques to group records from

a dataset into clusters containing similar characteristics. Using these clusters, you

can explore the data, learning more about the relationships that exist, which may not be easy to derive logically through casual observation. Additionally, you can create predictions from the clustering model created by the algorithm. For example, consider a group of people who live in the same neighborhood, drive the same kind of car, eat the same kind of food, and buy a similar version of a product. This is a cluster of data. Another cluster may include people who go to the same restaurants, have similar salaries, and vacation twice a year outside the country. Observing how these clusters are distributed, you can better understand how the records in a dataset interact, as well as how that interaction affects the outcome of a predicted attribute.

Microsoft Na?ve Bayes

The Microsoft Na?ve Bayes algorithm quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The probabilities used to generate the model are calculated and stored during the processing of the cube. The algorithm supports only discrete or

discretized attributes, and it considers all input attributes to be independent. The Microsoft Na?ve Bayes algorithm produces a simple mining model that can be

considered a starting point in the data mining process. Because most of the

calculations used in creating the model are generated during cube processing, results are returned quickly. This makes the model a good option for exploring the data and for discovering how various input attributes are distributed in the different states of the predicted attribute.

Microsoft Time Series

The Microsoft Time Series algorithm creates models that can be used to predict continuous variables over time from both OLAP and relational data sources. For example, you can use the Microsoft Time Series algorithm to predict sales and profits based on the historical data in a cube.

Using the algorithm, you can choose one or more variables to predict, but they must be continuous. You can have only one case series for each model. The case series identifies the location in a series, such as the date when looking at sales over a length of several months or years.

A case may contain a set of variables (for example, sales at different stores). The

Microsoft Time Series algorithm can use cross-variable correlations in its predictions.

For example, prior sales at one store may be useful in predicting current sales at another store.

Microsoft Association

The Microsoft Association algorithm is specifically designed for use in market basket analyses. The algorithm considers each attribute/value pair (such as

product/bicycle) as an item. An itemset is a combination of items in a single

transaction. The algorithm scans through the dataset trying to find itemsets that tend to appear in many transactions. The SUPPORT parameter defines how many transactions the itemset must appear in before it is considered significant. For example, a frequent itemset may contain {Gender="Male", Marital Status =

"Married", Age="30-35"}. Each itemset has a size, which is number of items it contains. In this case, the size is 3.

Often association models work against datasets containing nested tables, such as a customer list followed by a nested purchases table. If a nested table exists in the dataset, each nested key (such as a product in the purchases table) is considered an item.

The Microsoft Association algorithm also finds rules associated with itemsets. A rule in an association model looks like A, B=>C (associated with a probability of

occurring), where A, B, C are all frequent itemsets. The '=>' implies that C is predicted by A and B. The probability threshold is a parameter that determines the minimum probability before a rule can be considered. The probability is also called "confidence" in data mining literature.

Association models are also useful for cross sell or collaborative filtering. For

example, you can use an association model to predict items a user may want to purchase based on other items in their basket.

Microsoft Sequence Clustering

The Microsoft Sequence Clustering algorithm analyzes sequence-oriented data that contains discrete-valued series. Usually the sequence attribute in the series holds a set of events with a specific order (such as a click path). By analyzing the transition between states of the sequence, the algorithm can predict future states in related sequences.

The Microsoft Sequence Clustering algorithm is a hybrid of sequence and clustering algorithms. The algorithm groups multiple cases with sequence attributes into segments based on similarities of these sequences. A typical usage scenario for this algorithm is Web customer analysis for a portal site. A portal Web site has a set of affiliated domains such as News, Weather, Money, Mail, and Sport. Each Web customer is associated with a sequence of Web clicks on these domains. The

Microsoft Sequence Clustering algorithm can group these Web customers into more-or-less homogenous groups based on their navigations patterns. These

groups can then be visualized, providing a detailed understanding of how customers are using the site.

Microsoft Neural Network

In Microsoft SQL Server 2005 Analysis Services, the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a multilayer perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases , iteratively comparing the predicted classification of the cases with the known actual classification of the cases. The errors from the initial classification of the first iteration of the entire set of cases is fed back into the network, and used to modify the network's performance for the next iteration, and so on. You can later use these probabilities to predict an outcome of the predicted attribute, based on the input attributes. One of the primary

differences between this algorithm and the Microsoft Decision Trees algorithm, however, is that its learning process is to optimize network parameters toward minimizing the error while the Microsoft Decision Trees algorithm splits rules in order to maximize information gain. The algorithm supports the prediction of both discrete and continuous attributes.

Microsoft Linear Regression

The Microsoft Linear Regression algorithm is a particular configuration of the Microsoft Decision Trees algorithm, obtained by disabling splits (the whole

regression formula is built in a single root node). The algorithm supports the

prediction of continuous attributes.

Microsoft Logistic Regression

The Microsoft Logistic Regression algorithm is a particular configuration of the Microsoft Neural Network algorithm, obtained by eliminating the hidden layer. The algorithm supports the prediction of both discrete and continuous attributes. Working Through the Tutorial

Throughout this tutorial you will work in Business Intelligence Development Studio (as depicted in Figure 1). For more information about working in Business

Intelligence Development Studio, see "Using SQL Server Management Studio" in SQL Server Books Online.

Figure 1 Business Intelligence Studio

The tutorial is broken up into three sections: Preparing the SQL Server Database, Preparing the Analysis Services Database, and Building and Working with the Mining Models.

Preparing the SQL Server Database

The AdventureWorksDW database, which is the basis for this tutorial, is installed with SQL Server (not by default, but as an option at installation time) and already contains views that will be used to create the mining models. If it was not installed at the installation time, you can add it by selecting Change button from Control Panel → Add/Remove Programs → Microsoft SQL Server 2005. Look for AdventureWorksDW Sample Data Warehouse under Books online and

Samples of Workstation Components.

Preparing the Analysis Services Database Before you begin to create and work with mining models, you must perform the following tasks:

8.Create a new Analysis Services project

9.Create a data source.

10.Create a data source view.

Creating an Analysis Services Project

Each Analysis Services project defines the schema for the objects in a single

Analysis Services database. The Analysis Services database is defined by the mining models, OLAP cubes, and supplemental objects that it contains. For more

information about Analysis Services projects, see "Creating an Analysis Services Project in Business Intelligence Development Studio" in SQL Server Books Online.

To create an Analysis Services project

1.Open Business Intelligence Development Studio.

2.Select New and Project from the File menu.

3.Select Analysis Services Project as the type for the new project and name it

AdventureWorks.

4.Click Ok.

The new project opens in Business Intelligence Development Studio. Creating a Data Source

A data source is a data connection that is saved and managed within your project

and deployed to your Analysis Services database. It contains the server name and database where your source data resides, as well as any other required connection properties.

To create a data source

1.Right-click the Data Source project item in Solution Explorer and select New

Data Source.

2.On the Welcome page, click Next.

3.Click New to add a connection to the AdventureWorksDW database.

4.The Connection Manager dialog box opens. In the Server name drop-down

box, select the server where AdventureWorksDW is hosted (for example,

localhost), enter your credentials, and then in the Select the database on the server drop-down box select the AdventureWorksDW database.

5.Click OK to close the Connection Manager dialog box.

6.Click Next.

7.By default the data source is named Adventure Works DW. Click Finish

The new data source, Adventure Works DW, appears in the Data Sources folder in Solution Explorer.

数据挖掘论文

数据挖掘课程论文 ——————数据挖掘技术及其应用的实现 数据挖掘技术及其应用的实现 摘要:随着网络、数据库技术的迅速发展以及数据库管理系统的广泛应用,人们积累的数据越来越多。数据挖掘(Data Mining)就是从大量的实际应用数据中提取隐含信息和知识,它利用了数据库、人工智能和数理统计等多方面的技术,是一类深层次的数据分析方法。本文介绍了数据库技术的现状、效据挖掘的方法以及它在Bayesian网建网技术中的应用:通过散据挖掘解决Bayesian网络建模过程中所遇到的具体问题,即如何从太规模效据库中寻找各变量之间的关系以及如何确定条件概率问题。 关键字:数据挖掘、知识获取、数据库、函数依赖、条件概率 一、引言: 数据是知识的源泉。但是,拥有大量的数据与拥有许多有用的知识完全是两回事。过去几年中,从数据库中发现知识这一领域发展的很快。广阔的市场和研究利益促使这一领域的飞速发展。计算机技术和数据收集技术的进步使人们可以从更加广泛的范围和几年前不可想象的速度收集和存储信息。收集数据是为了得到信息,然而大量的数据本身并不意味信息。尽管现代的数据库技术使我们很容易存储大量的数据流,但现在还没有一种成熟的技术帮助我们分析、理解并使数据以可理解的信息表示出来。在过去,我们常用的知识获取方法是由知识工程师把专家经验知识经过分析、筛选、比较、综合、再提取出知识和规则。然而,由于知识工程师所拥有知识的有局限性,所以对于获得知识的可信度就应该打个 折扣。目前,传统的知识获取技术面对巨型数据仓库无能为力,数据挖掘技术就应运而生。 数据的迅速增加与数据分析方法的滞后之间的矛盾越来越突出,人们希望在对已有的大量数据分析的基础上进行科学研究、商业决策或者企业管理,但是目前所拥有的数据分析工具很难对数据进行深层次的处理,使得人们只能望“数”兴叹。数据挖掘正是为了解决传统分析方法的不足,并针对大规模数据的分析处理而出现的。数据挖掘通过在大量数据的基础上对各种学习算法的训练,得到数据对象间的关系模式,这些模式反映了数据的内在特性,是对数据包含信息的更高层次的抽象[1]。目前,在需要处理大数据量的科研领域中,数据挖掘受到越来越多的关注,同时,在实际问题中,大量成功运用数据挖掘的实例说明了数据挖掘对科学研究具有很大的促进作用。数据挖掘可以帮助人们对大规模数据进行高效的分

中英文论文对照格式

英文论文APA格式 英文论文一些格式要求与国内期刊有所不同。从学术的角度讲,它更加严谨和科学,并且方便电子系统检索和存档。 版面格式

表格 表格的题目格式与正文相同,靠左边,位于表格的上部。题目前加Table后跟数字,表示此文的第几个表格。 表格主体居中,边框粗细采用0.5磅;表格内文字采用Times New Roman,10磅。 举例: Table 1. The capitals, assets and revenue in listed banks

图表和图片 图表和图片的题目格式与正文相同,位于图表和图片的下部。题目前加Figure 后跟数字,表示此文的第几个图表。图表及题目都居中。只允许使用黑白图片和表格。 举例: Figure 1. The Trend of Economic Development 注:Figure与Table都不要缩写。 引用格式与参考文献 1. 在论文中的引用采取插入作者、年份和页数方式,如"Doe (2001, p.10) reported that …" or "This在论文中的引用采取作者和年份插入方式,如"Doe (2001, p.10) reported that …" or "This problem has been studied previously (Smith, 1958, pp.20-25)。文中插入的引用应该与文末参考文献相对应。 举例:Frankly speaking, it is just a simulating one made by the government, or a fake competition, directly speaking. (Gao, 2003, p.220). 2. 在文末参考文献中,姓前名后,姓与名之间以逗号分隔;如有两个作者,以and连接;如有三个或三个以上作者,前面的作者以逗号分隔,最后一个作者以and连接。 3. 参考文献中各项目以“点”分隔,最后以“点”结束。 4. 文末参考文献请按照以下格式:

中英文对照版合同翻译样本

1.Sales Agreement The agreement, (is) made in Beijing this eighth day of August 1993 by ABC Trading Co., Ltd., a Chinese Corporation having its registered office at Beijing, the People’ Repubic of China (hereinafter called “Seller”) and International Tradi ng Co., Ltd., a New York Corporation having its registered office at New York, N.Y., U.S.A. (hereinafter called “Buyer”). 2.WITNESSETH WHEREAS, Seller is engaged in dealing of (product) and desires to sell (product)to Buyer, and WHEREAS, Buyer desires to purchase(product) from Sellers, Now, THEREFORE, it is agreed as follows: 3.Export Contract This Contract is entered into this 5th day of August 1993 between ABC and Trading Co., Ltd. (hereinafter called “Seller”) who agrees to sell, and XYZ Trading Co., Ltd. (hereinafter called “Buyer”) who agrees to buy the following goods on the following terms and condition. 4.Non-Governmental Trading Agreement No. __This Agreement was made on the_day of_19_, BETWEEN _ (hereinafter referred to as the Seller) as the one Side and _ (hereinafter referred to as the Buyer) as the one other Side. WHEREAS, the Seller has agreed to sell and the buyer has agreed to buy _ (hereinafter referred to as the Goods ) the quantity, specification, and price of which are provided in Schedule A. IT IS HEREBY AGREED AS FOLLOWS: 5.Contract For Joint-Operation Enterprise __ COMPANY LTD., a company duly organized under the Law of __ and having its registered office at (hereinafter called “Party A”) AND __ COMPANY LTD., a company duly organized under the Law of __ and having its registere d office at (hereinafter called “Party B”) Party A and Party B (hereinafter referred to as the “Parties”) agree to jointly form a Co-operation Venture Company (hereinafter referred to as the “CVC”) in accordance with “the Laws of the People’s Republic of C hina on Joint Ventures Using Chinese and Foreign Investment” and the “Regulations for the Implementation of the Laws of the People’s Republic of China on Joint Ventures Using Chinese and Foreign Investment” and other applicable laws and regulations. 6.MODEL CONTRACT Contract No. Date: Seller: Signed at: Address: Cable Address: Buyer: Address: Cable Address: The Seller and the Buyer have agreed to conclude the following transactions according to the terms and conditions stipulated below: https://www.doczj.com/doc/848625885.html, of Commodity: 2.Specifications: 3.Quantity: 4.Unit Price: 5.Total Price: U.S.$: 6.Packing: 7.Time of Shipment: days after receipt of L/C. 8.Loading Port & Destination Port: From via to . 9.Insurance:

文献翻译英文原文

https://www.doczj.com/doc/848625885.html,/finance/company/consumer.html Consumer finance company The consumer finance division of the SG group of France has become highly active within India. They plan to offer finance for vehicles and two-wheelers to consumers, aiming to provide close to Rs. 400 billion in India in the next few years of its operations. The SG group is also dealing in stock broking, asset management, investment banking, private banking, information technology and business processing. SG group has ventured into the rapidly growing consumer credit market in India, and have plans to construct a headquarters at Kolkata. The AIG Group has been approved by the RBI to set up a non-banking finance company (NBFC). AIG seeks to introduce its consumer finance and asset management businesses in India. AIG Capital India plans to emphasize credit cards, mortgage financing, consumer durable financing and personal loans. Leading Indian and international concerns like the HSBC, Deutsche Bank, Goldman Sachs, Barclays and HDFC Bank are also waiting to be approved by the Reserve Bank of India to initiate similar operations. AIG is presently involved in insurance and financial services in more than one hundred countries. The affiliates of the AIG Group also provide retirement and asset management services all over the world. Many international companies have been looking at NBFC business because of the growing consumer finance market. Unlike foreign banks, there are no strictures on branch openings for the NBFCs. GE Consumer Finance is a section of General Electric. It is responsible for looking after the retail finance operations. GE Consumer Finance also governs the GE Capital Asia. Outside the United States, GE Consumer Finance performs its operations under the GE Money brand. GE Consumer Finance currently offers financial services in more than fifty countries. The company deals in credit cards, personal finance, mortgages and automobile solutions. It has a client base of more than 118 million customers throughout the world

数据挖掘论文

数据仓库及其应用技术 摘要本文对于大量存在于计算机信息系统中的数据,通过数据仓库、联机处理技术和数据挖掘技术,对数据进行加工、分析、产生用于决策支持的信息,得以充分利用。 关键词数据仓库数据仓库应用 OLAP 联机分析处理 引言数据仓库技术是计算机数据库系统发展的新方向,近几年来已经在许多领域得到了应用。以数据仓库为基础的商业职能系统强大的功能在实际应用中能带来高利润的回报,所以近年来数据仓库在证券业、银行领域、税务领域、控制金融风险、保险、客户管理等众多领域得到了越来越广泛的应用。据调查,财富500 强企业中已经有85 %的企业建成或正在建立数据仓库。 数据仓库与Internet 一样,正在成为最快的IT 增长点。1996 年,全球企业在数据仓库上的投资达到16. 8 亿美元,并且以每年19. 1 %的速度增长。那么什么是数据仓库? 数据仓库有哪些特征和技术? 下面做一些简单的介绍。 一、数据仓库概念及特征 1、数据仓库概念。 数据仓库就是面向主题的、集成的、不可更新的(稳定的) 、随时间不断变化的数据集合。与其他数据库应用不同的是,数据仓库更像一种过程,即对分布在企业内部各处的业务数据的整合、加工和分析的过程,而不是一种可以购买的产品。 2、数据仓库的特征: ①面向主题。数据仓库中的数据是按照一定的主题域进行组织。主题是一个抽象的概念,是指用户使用数据仓库进行决策时所关心的重点方面,一个主题通常与多个操作型信息系统相关。 ②集成的。数据仓库中的数据是在对原有分散的数据库数据抽取、清理的基础上,经过系统加工、汇总和整理得到的,必须消除源数据中的不一致性,以保证数据仓库内的信息是关于整个企业的一致的全局信息。 ③相对稳定的。数据仓库的数据主要供企业决策分析之用,所涉及的数据操作主要是数据查询,一旦某个数据进入数据仓库以后,一般情况下将被长期保留,也就是数据仓库中一般有大量的查询操作,但修改和删除操作很少,通常只需要定期的加载、刷新。 ④反映历史变化。数据仓库中的数据通常包含历史信息,系统记录了企业从过去某一时点到目前各个阶段的信息,通过这些信息,可以对企业的发展历程和未来趋势做出定量分析和预测。 二、数据仓库的分析技术 1、OLAP 技术 1.1 OLAP (联机分析处理) 的概念。

论文中英文翻译

An Analysis of Cooperative Principles and Humorous Effects in Friend s 合作原则的分析和在朋友的幽默效应 Humor is a very intriguing and fascinating phenomenon of human society, which is multidimensional, complex and all pervasive. Therefore, many scholars and experts at all times and in all over the world have done profound research on humor. 幽默是人类社会的一个非常有趣和引人入胜的现象,这是多方面的,复杂和无孔不入的。所以,在任何时候,在世界各地的许多学者和专家总是对幽默进行深入的研究。 The significant functions of humor have aroused the interest of many scholars. About 2,000 years ago, people began the research on humor. However, the study of humor is not a simple task for the reason that it is an interdisciplinary science drawing upon a wide range of academic disciplines including biology, psychology, sociology, philosophy, geography, history, linguistics, literature, education, family science, and film studies and so on. Moreover, there are different reasons and purposes for humor. One may wish to be sociable, cope better, seem clever, solve problems, make a critical point, enhance therapy, or express something one could not otherwise express by means of humor. 显著幽默的功能已引起许多学者的兴趣。大约在2000年前,人们对幽默开始研究,然而,这项幽默的研究不是一个简单的任务,理由是它是一个跨学科的科学绘图在各种各样的学科,包括生物学、心理学、社会学、哲学、地理、历史、语言、文学、教育、家庭科学和电影研究等。此外,幽默有不同的原因和目的,人们可能希望有点大男子主义,随机应变,似乎是聪明,解决问题,使一个临界点,加强治疗,或表达的东西不能以其他方式表达幽默的方式。 Within the 20th century, linguistics has developed greatly in almost every area of the discipline from sounds, words and sentences to meaning and texts. Meanwhile, linguistic studies on humor have also extended considerably to social, cultural, and pragmatic concerns. One of the most noticeable achievements in linguistics over the

中英文翻译

附录A:英文原文 Website Design (Xiao Xiang Zhang) Abstract:Website construction is divided into several aspects: set theme site and collect data, establish the site directory structure, to set up the web page link relations, page design and web production, site testing, web site As well as the promotion and maintenance. Each step in the process of website construction are play an role instead. Keywords:the directory structure; Web site production; Site tests; Web site Based on the needs of the society now a days, this paper mainly introduces how to create a static web site. And in the process of establishing personal website gradually over the years at the university of understanding and strengthen the learning of all kinds of knowledge points about creating a website. In my personal understanding, to create a distinctive personal website, can have their own personality and characteristics. In this way can attract people. Only one goal, that is, do yourself. 1 the website construction of stages website localization is the ultimate goal of web site design and guidelines. Who established the theme of the site to solve design for, attract, communication. making web pages through the web page design of web site overall coordinated style. Website of colour fundamental key color, color schemes, page style and page composition and image, animation, repeated research. Convenient navigation design users easy access to the site of the resources. the release and maintenance of the site to complete the site test of all files and uploaded to the server, website promotion and publicity,

中英文文献翻译

毕业设计(论文)外文参考文献及译文 英文题目Component-based Safety Computer of Railway Signal Interlocking System 中文题目模块化安全铁路信号计算机联锁系统 学院自动化与电气工程学院 专业自动控制 姓名葛彦宁 学号 200808746 指导教师贺清 2012年5月30日

Component-based Safety Computer of Railway Signal Interlocking System 1 Introduction Signal Interlocking System is the critical equipment which can guarantee traffic safety and enhance operational efficiency in railway transportation. For a long time, the core control computer adopts in interlocking system is the special customized high-grade safety computer, for example, the SIMIS of Siemens, the EI32 of Nippon Signal, and so on. Along with the rapid development of electronic technology, the customized safety computer is facing severe challenges, for instance, the high development costs, poor usability, weak expansibility and slow technology update. To overcome the flaws of the high-grade special customized computer, the U.S. Department of Defense has put forward the concept:we should adopt commercial standards to replace military norms and standards for meeting consumers’demand [1]. In the meantime, there are several explorations and practices about adopting open system architecture in avionics. The United Stated and Europe have do much research about utilizing cost-effective fault-tolerant computer to replace the dedicated computer in aerospace and other safety-critical fields. In recent years, it is gradually becoming a new trend that the utilization of standardized components in aerospace, industry, transportation and other safety-critical fields. 2 Railways signal interlocking system 2.1 Functions of signal interlocking system The basic function of signal interlocking system is to protect train safety by controlling signal equipments, such as switch points, signals and track units in a station, and it handles routes via a certain interlocking regulation. Since the birth of the railway transportation, signal interlocking system has gone through manual signal, mechanical signal, relay-based interlocking, and the modern computer-based Interlocking System. 2.2 Architecture of signal interlocking system Generally, the Interlocking System has a hierarchical structure. According to the function of equipments, the system can be divided to the function of equipments; the system

总结报告-数据挖掘技术论文开题报告 精品

数据挖掘技术论文开题报告 毕业都是需要进行论文的写作,数据挖掘技术论文的开题报告怎么写?下面是数据挖 掘技术论文开题报告,欢迎阅读! 数据挖掘技术综述 数据挖掘(Data Mining)是一项较新的数据库技术,它基于由日常积累的大量数据所 构成的数据库,从中发现潜在的、有价值的信息——称为知识,用于支持决策。数据 挖掘是一项数据库应用技术,本文首先对数据挖掘进行概述,阐明什么是数据挖掘, 数据挖掘的技术是什么,然后介绍数据挖掘的常用技术,数据挖掘的主要过程, 如何 进行数据挖掘,主要应用领域以及国内外现状分析。 一. 研究背景及意义 近十几年来,随着数据库系统的广泛流行以及计算机技术的快速发展,人们利用信息 技术生产和搜集数据的能力大幅度提高。千万个数据库被用于商业管理、政府办公、 科学研究和工程开发等,特别是网络系统的流行,使得信息爆炸性增长。这一趋势将 持续发展下去。大量信息在给人们带来方便的同时也带来了一大堆的问题:第一是信 息过量,难以消化;第二是信息真假难以辨认;第三是信息安全难以保证;第四是信 息形式不一致,难以统一处理。面对这种状况,一个新的挑战被提出来:如何才能不 被信息的汪洋大海所淹没,从中及时发现有用的知识,提高信息利用率呢?这时出现 了新的技术——数据挖掘(Data Mining)技术便应用而生了。 面对海量的存储数据,如何从中发现有价值的信息或知识,成为一项非常艰巨的任务。数据挖掘就是为迎合这种要求而产生并迅速发展起来的。数据挖掘研究的目的主要是 发现知识、使数据可视化、纠正数据。 二. 概述 1,数据挖掘 数据挖掘(Data Mining)就是从大量的、不完全的、有噪声的、模糊的、随机的数据中,提取隐含在其中的、人们事先不知道的、但又是潜在有用的信息和知识的过程。这些 数据可以是结构化的,如关系数据库中的数据,也可以是半结构化的,如文本,图形, 图像数据,甚至是分布在网络上的异构型数据。发现知识的方法可以是数学的,也可 以是非数学的,可以是演绎的,也可以是归纳的。发现了的知识可以被用于信息管理、查询优化、决策支持、过程控制等,还可以进行 数据自身的维护。数据挖掘借助了多年来数理统计技术和人工智能以及知识工程等领 域的研究成果构建自己的理论体系,是一个交叉学科领域,可以集成数据数据库、人 工智能、数理统计、可视化、并行计算等技术。 2,数据挖掘技术

论文外文翻译

Analysis of the role of complaint management in the context of relationship marketing Author: Leticia Su′arez ′Alvarez, University of Oviedo, Spain Abstract This research aims to contribute to the relationship-marketing strategy by studying the role of complaint management in long-term relationships. Two factors distinguish it from other studies: it takes into account two types of customers, consumers and firms, and the result variable selected is the probability of ending an ongoing relationship. Two questionnaires were designed for every population. One of them was auto-administrated to a sample of consumers in the north of Spain, and the other one was sent to a representative sample of Spanish firms. The data analyses were conducted using structural equation modeling. The findings confirm the importance that theory accords to the relationship-marketing strategy, and also provide evidence for the importance of complaint management. Thus having a good complaint-handling system and trained and motivated staff who are fully committed to the firm’s objectives are fundamental requisites for firms to be able to build a stable customer portfolio. Keywords complaint management; relationship marketing; relationship termination; trust; satisfaction Introduction Nowadays, the main task for tourism firms is undoubtedly to deliver superior value to customers. One way that these firms can achieve part of this value is by maintaining quality relationships with their customers. In fact, it is well known that managing these relationships is critical for achieving corporate success. Thus the general aim of the present research is to analyze the most important factors that contribute to relationship stabilization between tourism firms and their customers. This research canters on retail travel agencies. We chose this particular type of tourism firm for two reasons. First, competition between retail travel agencies is becoming much more intense, fundamentally due to the advent of the Internet as an alternative distribution channel for tourism services (Wang & Cheung, 2004). The second reason is the current phenomenon of disintermediation, or the tendency of some tourism service providers to contact the end-customer directly. Because of these two developments, retail travel agencies urgently need to develop a strategy that allows them to maintain a stable portfolio of customers over time if they are to remain in the market for the long term. In order to achieve the proposed objective, we set out a causal model that incorporates a number of factors that can condition the future of the relationships between travel agencies and their customers. Specifically, we chose two variables that

中英文翻译混凝土配合比的选择英文原文

规范 5.2-混凝土配合比的选择 5.2.1-混凝土配比的确定有以下规定: (a)和易性和稠度使混凝土在浇筑时,易于成型和易于与钢筋粘结,不会离析或泌水。 (b)按第四章的要求,混凝土具有抵抗侵蚀的性能。 (c)符合5.6节中强度试验要求。 5.2.2-不同材料用在不同部分,起不同作用,要评测每一个组合。 5.2.3-混凝土配比要与5.3节或5.4节相一致,而且要满足适用性。 5.3-以现场试验和(或)试拌配料 注释 R5.2-混凝土配合比的选择 “普通混凝土,重混凝土,大体积混凝土配比选择标准”(ACI 211.1) 5.1给出了选择混凝土配比的细节规则。(提供了两种选择和调整普通混凝土配比的方法:估计重量和绝对体积方法。给出了两种方法的计算实例。按绝对体积方法配制的重混凝土的配比查看附录。)“结构用轻骨料混凝土配比选择标准”(ACI 211.1) 5.2给出了轻质混凝土选料的方法。(提供了选择和调整不同建筑等级轻骨料混凝土的配比方法。) R5.2.1-所用水灰比要足够低,或者轻骨料混凝土的抗压强度足够高,以满足强度标准(见5.3或5.4)和特殊环境(第四章)的要求。该条规范不包括极恶劣暴露环境的要求,如:酸性条件,高温条件,同时也没考虑美学效果,如表面装修。这些方面超出了该规范的范围,应包含在工程技术要求中。混凝土配比要满足该规范的最低要求,以及合同附加的条款。 R5.2.3-本规范强调混合物的现场试验或试验室试拌(见5.3)作为首选的混凝土配比方法。 R5.3-以现场试验和(或)试拌配料 在选择合适的混凝土混合料时,遵循以下三步。第一,确定样品标准偏差。第二,确定要求的混凝土平均抗压强度。第三,根据传统配制试验或合理的经验记录,选择满足该平均强度要求的混凝土配合比。Fig.R5.3是选择配料的流程图。

毕业论文英文参考文献与译文

Inventory management Inventory Control On the so-called "inventory control", many people will interpret it as a "storage management", which is actually a big distortion. The traditional narrow view, mainly for warehouse inventory control of materials for inventory, data processing, storage, distribution, etc., through the implementation of anti-corrosion, temperature and humidity control means, to make the custody of the physical inventory to maintain optimum purposes. This is just a form of inventory control, or can be defined as the physical inventory control. How, then, from a broad perspective to understand inventory control? Inventory control should be related to the company's financial and operational objectives, in particular operating cash flow by optimizing the entire demand and supply chain management processes (DSCM), a reasonable set of ERP control strategy, and supported by appropriate information processing tools, tools to achieved in ensuring the timely delivery of the premise, as far as possible to reduce inventory levels, reducing inventory and obsolescence, the risk of devaluation. In this sense, the physical inventory control to achieve financial goals is just a means to control the entire inventory or just a necessary part; from the perspective of organizational functions, physical inventory control, warehouse management is mainly the responsibility of The broad inventory control is the demand and supply chain management, and the whole company's responsibility. Why until now many people's understanding of inventory control, limited physical inventory control? The following two reasons can not be ignored: First, our enterprises do not attach importance to inventory control. Especially those who benefit relatively good business, as long as there is money on the few people to consider the problem of inventory turnover. Inventory control is simply interpreted as warehouse management, unless the time to spend money, it may have been to see the inventory problem, and see the results are often very simple procurement to buy more, or did not do warehouse departments . Second, ERP misleading. Invoicing software is simple audacity to call it ERP, companies on their so-called ERP can reduce the number of inventory, inventory control, seems to rely on their small software can get. Even as SAP, BAAN ERP world, the field of

相关主题
文本预览
相关文档 最新文档