数据挖掘在历史名胜古迹分类中的应用(IJISA-V8-N8-7)

格式：pdf
大小：467.70 KB
文档页数：8

下载文档原格式

数据挖掘技术在旅游业中的应用——以黄山市为例

ｓｏｓｈｓｐｐｒｕｅｅｔｖｌｇｄｔｂａｄｂｆｒ，ａａｙｅｎｏｒｃｅｅｏｇｎｌｓｉｃｔｎｏｖｒｅｓｔｕｓｒｖｌｉｐｔ．Ｔｉａｅｓｄｔｒｅｉａａａｏｒｅｏｅｎｌｚｄａｄｃｒｅｔｄｔｒｉａｃａｓａｉｆｏｅａｏｒｔｔｅｎｈａｎｈｉｌｉｆｏｓｉｓａ
旅游数据挖掘，分析修正我国旅游统计中入境游客的原有分类方法，出适合黄山市实际的分类方法，取更多有价值的旅游信提获
息。关键词：据挖掘；游；类分析；山市数旅聚黄
ＴｈｅＯｔｉｉｇＴｃｎｑｅｉｕｉｍ ——０ｅＣＳｆｕｎｓａｉｅＵｓｆＤａａＭｎｎｅｈｉｕｎＴｏｒｓｎｔＡＥｏａｇｈｎＣｔｈＨｙ
文献标识码：Ａ
基金项目：徽省自然科学基金项目（５４００）教育部 “ 安００５１３；十五 ” 划项目—— 旅游管理信息系统（教函【０２１）规高２０】７号
数据挖掘技术在旅游业中的应用一以山为黄市例
ＡｂｔａｔｈａｅｎｒｄｃｄｔｅｋｏｅｇｙｅｗｉｈｔｅＤＭｅｈｏｏｙｍａｉｃｖｒｉｈｏｒｓｓｒｃ：Ｔｅｐｐｒｉｔｏｕｅｈｎｗｌｄｅｔｐｈｃｈｔｃｎｌｇｙｄｓｏｅｎｔｅｔｕｍ．ＩｕｅｈｉｅｅｏｉｔｓｄｔｅｔｍｅｓｒｓｔｉｐｅｉｔｔｅｔｕｉｔｎｍｂｒａｄｉｃｍｅｏｕａｇｈｅｔｉｄｔｅｒｌｔｄｒｌｏｅｃｖｔｕｅｒｌｔｄｄｇｅｆｓｍｅｓｅｉｒｄｃｈｏｒｕｅｎｎｏｆＭｏｔＨｕｎ；ｔｎｕｉｚｈｅａｅｕｅｔｘａａｅｏｔｔａｅｅｒｅｏｏｃｎｃｓｌｅｈｅ

全面认识GIS在考古研究中的应用

全面认识GIS在考古研究中的应用王子荀200800080048山东大学历史文化学院08考古摘要：GIS在考古研究中的应用是现代考古学发展的必然趋势，科学、高效始终是考古研究所追求的目标。

在此，本文欲对GIS考古研究的发展历程、研究内容及实践、未来的发展趋势做一详细总结，以增强GIS在考古研究中应用的全面认识。

关键词：考古研究 GIS 发展历程内容及实践发展趋势全面认识考古学以古代人类的各种物质遗存为研究对象1，涉及人类生活的方方面面。

所以这种广泛的研究内容就要求研究手段的多样性，特别是自然科学和技术要求的提升在现代考古学显得愈加重要。

其中，GIS考古研究就是在这样的背景下勃然兴起的。

众所周知，考古工作是一个从古代遗存中发掘、整理、研究古代物质文化遗存的过程，所以，GIS的应用对处理考古研究中的海量信息具有十分重的作用，从而应用到了与考古相关的各个环节。

GIS（Geograhic Information System），即地理信息系统，是一项以计算机科学为基础的数据管理、分析研究的综合技术系统。

通过对空间数据的坐标、位置处理，可以实现数据的有效管理及运用，可以高效、科学的解决考古研究中的相关问题2。

GIS考古研究的产生于发展经历了一个从简单要复杂的过程，它今后的发展趋势也必将随着自然科学中GIS研究的深入而不断拓展，下面我们就其发展历程、应用实践和发展趋势进行全面记述。

一、GIS运用于考古研究的基础GIS来源自然科学研究领域，在利用到考古研究的过程中，并不是偶然的结果，GIS考古研究具有其特定的研究基础。

GIS基础的应用必然和现代考古研究实践在某些方面具有相应的切合点，从而才促使了两个学科的结合。

首先，GIS（地理信息系统，Geograhic Information System）属于空间信息系统。

它是上世纪六十年代开始发展起来的地理学研究技术系统，作为计算机技术、地理、遥感、测绘、统计、规划、管理学和制图学等学科交叉运用的产物，它代表了现代计算机科学和其他学科相互渗透的发展方向。

数据挖掘技术在旅游业中的应用

数据挖掘技术在旅游业中的应用【前言】数据挖掘技术是一种快速发展的计算机技术，在旅游业中的应用也越来越广泛。

本文就数据挖掘技术在旅游业中的应用进行探讨。

【数据挖掘技术概述】数据挖掘是从海量数据中找出有用的信息和知识的过程。

数据挖掘可以通过各种技术和方法获取、识别、分析和理解数据中的模式和关系。

数据挖掘技术包括多种算法、模型和工具，例如分类、聚类、预测、关联规则挖掘等。

【旅游业中数据挖掘技术的应用】1. 用户画像与定位用户画像与定位是指通过对旅游者的数据进行分析推断出其个性化需求和偏好。

数据挖掘技术可以帮助旅游从业者更好地了解旅游者，为旅游者提供更个性化的旅游产品和服务。

2. 市场分析与预测市场分析与预测是指对旅游市场进行深入分析和预测。

数据挖掘技术可以帮助旅游从业者对市场情况进行全面的了解，包括市场规模、市场口碑和竞争状况等。

旅游从业者可以根据数据挖掘的结果调整产品和服务，满足旅游者需求，提高市场竞争力。

3. 旅游线路推荐旅游线路推荐是指通过对旅游者历史活动记录和偏好进行分析，为旅游者推荐更符合其需求的旅游线路。

数据挖掘技术可以帮助旅游从业者迅速识别旅游者的偏好和需求，并为旅游者提供更好的体验和服务。

4. 景点评分景点评分是指基于旅游者对景点的评价和反馈，对景点进行评分和排名。

数据挖掘技术可以帮助旅游从业者快速获取旅游者的反馈信息，并根据其对景点的评价和反馈，调整景点的运营和服务，提高旅游者的满意度。

5. 旅游投诉分析旅游投诉分析是指对旅游者投诉进行数据挖掘和分析，从而全面了解旅游者的困难和问题。

数据挖掘技术可以帮助旅游从业者更好地了解旅游者对产品和服务的不满，以便及时做出改善和调整。

【结论】数据挖掘技术在旅游业中的应用越来越广泛，旅游从业者可以通过数据挖掘技术获取更多的信息和知识，为旅游者提供更好的体验和服务，从而提高竞争力和市场份额。

同时，企业也可以通过数据挖掘技术识别用户需求，提高用户黏性和满意度，形成巨大的商业价值。

关于档案管理方面的数据挖掘分析及应用探讨

关于档案管理方面的数据挖掘分析及应用探讨随着计算机技术的不断发展和进步，人们的生活水平质量也逐渐提高。

档案管理方面也开始应用计算机行业的数据挖掘技术，来提高档案管理效率，进而推动档案管理行业的发展。

本文将从数据挖掘概念及形式入手，分析并探讨数据挖掘技术在档案管理方面的应用。

标签：数据挖掘技术;档案管理;应用引言数据信息在人们的日常生活中扮演着重要的角色。

数据可以组成若干事件、物体，甚至能够组成整个社会。

其中，这些事件和物体之间也存在着错综复杂的关系，而数据挖掘技术便是要从所有数据中找到关系所在，并根据这些关系直接推断出来一些有价值且能够直接使用的信息，而非仅仅通过一些片面的数据信息进行定论。

目前，档案管理行业应适应社会发展，运用数据挖掘技术，使人们实时获取所需信息，提高办公效率。

本文主要对档案管理方面的数据挖掘技术的应用进行探讨。

一、数据挖掘技术的形式数据挖掘技术的形式分为描述型和预测型。

描述型是从现有的数据使用描述行为描述出存在的规则，进而发掘现有数据中更深层次的规律。

预测型是从现有的数据中总结出共同点，同时对未来即将发生的事件进行预测。

在数据挖掘技术的应用场景中，通常使用分类法、关联法和粗糙集法。

（一）分类法分类法是数据挖掘技术的核心。

分类的优劣不仅关系着数据不同属性的分析，而且会对数据质量产生较大的影响。

分类法的主要操作流程如下：首先，对数据库中现有的数据根据不同属性进行分类。

其次，对现有数据进行训练集和测试集的划分，保证训练量足够多，而测试量足够的少。

最后，对数据进行测试，再根据不同属性进行二次分类。

（二）关联法关联法在数据挖掘技术中不仅能够对现有数据的相关性进行详细的分析，而且能够精确描述出相关数据。

该方法主要流程如下：首先，对现有数据进行详细描述。

然后把属于同一属性的数据结合，并分析其相同点。

这种方法不仅提高了数据的准确性，而且提升了整体工作效率。

（三）粗糙集法粗糙集常用于研究不确定、不精确的知识。

数据挖掘技术在智慧旅游中的应用

数据挖掘技术在智慧旅游中的应用旅游，是人们重要的消费方式之一，也是一种享受生活的方式。

随着人们生活水平的提高，旅游行业也不断发展壮大。

但是，在旅游过程中，游客往往会面临种种问题和需求，这些问题和需求如何得以解决和满足，就需要智慧旅游技术的应用。

数据挖掘技术，是智慧旅游技术的重要支撑。

本文将从数据挖掘技术的角度，分析其在智慧旅游中的应用，以期为旅游行业的智能化发展提供有益的思路。

1. 智慧旅游的背景和意义现代旅游，已经不再是单纯的“去玩”了。

相反，现代旅游是一种带着目的的消费行为。

游客出行之前，会通过各种渠道了解目的地的景点、美食、住宿等方面的情况，同时也会了解当地的环境、交通等相关信息，以便做好旅游的准备。

智慧旅游的理念，就是为了满足游客对于这些需求的更完美诉求，提高整个旅游行业的服务质量和效率，从而为游客提供更加优质的旅游体验和感受。

2. 数据挖掘技术的原理和应用数据挖掘技术，是利用计算机自动分析数据的方法和技术。

它通过数据挖掘算法、模型、工具以及方法论等，自动发现数据中的规律和关系，从而提取有价值的信息。

有了这些信息，就可以进行数据分析和决策，从而支持和推进旅游行业的智慧化。

数据挖掘技术在智慧旅游中的应用，主要包括以下几个方面：2.1 游客的个性化推荐数据挖掘技术可以通过对游客的历史游览数据的分析，挖掘出游客的兴趣爱好和偏好，从而为游客提供更加个性化的推荐服务。

通过游客的历史游览数据，可以分析出游客喜欢哪些景点，喜欢哪类美食，喜欢怎样的住宿方式等等。

当游客再次出行时，就可以根据这些信息提供更加贴心的推荐服务，为游客提供更好的体验和感受。

2.2 旅游景点的智能推荐数据挖掘技术可以通过分析旅游景点的历史游客数据，挖掘出景点的热门时段、游客喜欢的景点以及游客的满意度等信息。

对于景区管理方来说，可以根据这些信息制定更加合理的景区规划、服务计划和营销策略。

对于游客来说，可以通过这些信息获得更加智能化的景点推荐，从而更好地规划自己的旅游路线。

数据挖掘在档案管理中的运用

数据挖掘在档案管理中的运用作者：于然来源：《办公室业务（上半月）》 2017年第3期【摘要】由于我国的信息技术迅速发展，传统档案管理的技术已经不能满足现代的信息需求，数据挖掘技术的应用为档案管理工作效率的提升带来便利。

本文通过说明数据挖掘技术的有关内容，阐明数据挖掘技术的相关知识，并对数据挖掘技术在档案管理工作中的实际运用来进行举例分析。

【关键词】数据挖掘技术；档案管理；分析运用由于信息技术的迅速发展，现代的档案管理模式与过去相比，也有了很大的变化，也让如今的档案管理模式有了新的挑战。

让人们对信息即时、大量地获取是目前档案管理工作和档案管理系统急切需要解决的问题。

一、数据挖掘概述（一）数据挖掘技术。

数据挖掘是指从大量的、不规则、乱序的数据中，进行分析归纳，得到隐藏的，未知的，但同时又含有较大价值的信息和知识。

它主要对确定目标的有关信息，使用自动化和统计学等方法对信息进行预测、偏差分析和关联分析等，从而得到合理的结论。

在档案管理中使用数据挖掘技术，能够充分地发挥档案管理的作用，从而达到良好的档案管理工作效果。

（二）数据挖掘技术分析。

数据挖掘技术分析的方法是多种多样的，其主要方法有以下几种：1.关联分析。

指从已经知道的信息数据中，找到多次展现的信息数据，由信息的说明特征，从而得到具有相同属性的事物特征。

2.分类分析。

利用信息数据的特征，归纳总结相关信息数据的数据库，建立所需要的数据模型，从而来识别一些未知的信息数据。

3.聚类分析。

通过在确定的数据中，找寻信息的价值联系，得到相应的管理方案。

4.序列分析。

通过分析信息的前后因果关系，从而判断信息之间可能出现的联系。

二、数据挖掘的重要性在进行现代档案信息处理时，传统的档案管理方法已经不能满足其管理的要求，数据挖掘技术在这方面确有着显著的优势。

首先，档案是较为重要的信息记录，甚至有些档案的重要性大到无价，因此对于此类的珍贵档案，相关的档案管理人员也是希望档案本身及其价值一直保持下去。

数据挖掘技术及其应用

数据挖掘技术及其应用摘要：随着网络、数据库技术的迅速发畏以及数据库管理系统的广泛应用，人们积累的数据越来越多。

数据挖掘(Data Mining)就是从大量的实际应用数据中提取隐含信息和知识，它利用了数据库、人工智能和数理统计等多方面的技术，是一类深层次的数据分析方法。

关键词：数据挖掘；知识；分析；市场营销；金融投资随着网络、数据库技术的迅速发展以及数据库管理系统的广泛应用，人们积累的数据越来越多。

由此，数据挖掘技术应运而生。

下面，本文对数据技术及其应用作一简单介绍。

一、数据挖掘定义数据挖掘(Data Mining)就是从大量的、不完全的、有噪声的、模糊的、随机的实际应用数据中，提取隐含在其中的、人们事先不知道的、但又是潜在有用的信息和知识的过程。

它是一种新的商业信息处理技术，其主要特点是对商业数据库中的大量业务数据进行抽取、转换、分析和其他模型化处理，从中提取辅助商业决策的关键性数据。

简而言之，数据挖掘其实是一类深层次的数据分析方法。

从这个角度数据挖掘也可以描述为：按企业制定的业务目标，对大量的企业数据进行探索和分析，揭示隐藏的、未知的或验证已知的规律性，并进一步将其模型化的先进有效的方法。

二、数据挖掘技术数据挖掘技术是人们长期对数据库技术进行研究和开发的结果，代写论文其中数据仓库技术的发展与数据挖掘有着密切的关系。

大部分情况下，数据挖掘都要先把数据从数据仓库中拿到数据挖掘库或数据集市中，因为数据仓库会对数据进行清理，并会解决数据的不一致问题，这会给数据挖掘带来很多好处。

此外数据挖掘还利用了人工智能(AI)和统计分析的进步所带来的好处，这两门学科都致力于模式发现和预测。

数据库、人工智能和数理统计是数据挖掘技术的三大支柱。

由于数据挖掘所发现的知识的不同，其所利用的技术也有所不同。

1．广义知识。

指类别特征的概括性描述知识。

根据数据的微观特性发现其表征的、带有普遍性的、较高层次概念的、中观和宏观的知识，反映同类事物的共同性质，是对数据的概括、精炼和抽象。

数据挖掘技术在旅游行业中的应用案例分析

数据挖掘技术在旅游行业中的应用案例分析随着互联网的高速发展和大数据时代的来临，旅游行业正逐渐开始关注和采用数据挖掘技术来优化运营和提升用户体验。

本文将通过分析几个旅游行业中的实际案例，探讨数据挖掘技术在旅游行业中的应用和价值。

1. 用户数据分析与个性化推荐一家在线旅游平台利用数据挖掘技术对大量用户行为数据进行分析，从而了解用户的偏好和需求。

通过分析用户在平台上的浏览记录、搜索关键字、点击率、订购行为等数据，平台能够准确地了解用户的旅游偏好，从而提供个性化的旅游推荐和优惠活动。

例如，平台分析用户的浏览记录和搜索关键字，发现某用户对海滩度假酒店比较感兴趣。

平台会根据这一信息，推荐给用户海滩度假酒店的相关产品和特价优惠，从而提高用户的购买意愿和满意度。

2. 航班延误预测与智能安排对于航空公司而言，航班延误对乘客和运营都是非常不利的。

通过利用数据挖掘技术，航空公司能够对多种因素进行分析和预测，从而及时预警和应对可能的航班延误情况。

航空公司可以通过分析天气、空域拥堵、机械故障等因素的历史数据，建立相应的预测模型。

这些模型可以根据实时数据和即将发生的天气等情况，提前预测航班延误的概率，并做出智能安排，如调整飞行路线、机组人员和备用飞机等，以最大限度地减少延误对乘客的影响。

3. 景点热度预测与游客管理对于旅游景点而言，了解游客的轨迹、行为和兴趣可以帮助景点管理者更好地规划和管理景区资源。

数据挖掘技术可以通过收集游客的手机信号、社交媒体信息等数据，分析游客的行为模式和兴趣偏好，从而预测景点的热度和拥堵情况。

通过分析游客的实时位置数据和游玩行为，景区管理者可以及时调度人员和资源，如增加导游数量、开设更多的停车位等，以提高景区的服务质量和游客满意度。

预测景点热度还可以帮助游客规划自己的行程，选择较空闲的时间段参观，从而避开拥堵和人流高峰。

4. 旅游市场分析与策略制定数据挖掘技术可以帮助旅游行业从大数据中发现市场趋势和消费者需求，从而制定更有效的营销策略。

数据挖掘技术在档案管理中的应用研究

数据挖掘技术在档案管理中的应用研究作者：邬萍来源：《中国管理信息化》2017年第08期[摘要]信息技术快速发展，数据挖掘技术的出现使信息管理逐渐实现智能化、信息化。

数据挖掘技术在档案管理中也发挥着至关重要的作用，其能够使档案管理的水平、整体效果得到提升，减轻档案管理人员的工作压力与负担。

基于此，本文对数据挖掘技术在档案管理中的应用价值以及具体情况进行了分析，以供大家参考。

[关键词]数据挖掘技术；档案管理；应用价值doi：10.3969/j.issn.1673 - 0194.2017.08.106[中图分类号]G270.7 [文献标识码]A [文章编号]1673-0194（2017）08-0-01社会不断发展，信息技术水平得到提升，且信息技术在档案管理中也发挥着越来越重要的作用。

传统的档案管理模式已经无法满足现在社会档案管理的实际需要，因此必须要有一种时效性、系统性和需求量都能满足需要的管理技术模式。

而数据挖掘技术的出现很好的改善了档案管理的情况，保证了档案资源得到合理的分类、收集与保留。

1 档案管理中应用数据挖掘技术的价值分析档案记录的内容都是极为重要的信息资料，且能够在一定程度上反映出档案管理人员的素质与智慧。

随着信息技术以及计算机水平的提高，数据挖掘技术逐渐被应用到档案信息管理中，使传统的档案管理模式发生了转变，并促进了档案管理水平的提升，因此在档案管理中应用数据挖掘技术具有重要的价值。

首先，其能够使档案管理更加安全。

档案资料中记录的都是比较珍贵、具有一定历史价值的资料，档案信息实体是其价值体现的重要载体。

作为档案管理人员应尽可能长时间的对档案信息资源进行保存，更好的体现和增加其价值，并且提高档案信息资源的使用频率。

但是这种情况下档案信息资料的保存管理工作就会遇到一定的难度，使用的次数越多，档案的寿命就会缩短。

且在档案管理工程中，保密性也是重要的工作内容，如果档案信息泄露，相关信息会暴露，工作人员的隐私权等也会受到不利的影响，使档案管理与使用之间存在着一定的问题。

数据挖掘技术在旅游行业中的应用研究

数据挖掘技术在旅游行业中的应用研究近年来，数据挖掘技术在各个领域得到了越来越广泛的应用，其中在旅游行业中的应用更是备受关注。

随着旅游业的快速发展，旅游数据量大幅增加，如何有效地挖掘和利用这些数据成为了旅游行业的一大挑战和机遇。

本文将探讨数据挖掘技术在旅游行业中的具体应用和研究。

一、数据挖掘技术简介数据挖掘技术是一种从大量数据中提取出有价值信息的过程。

通过对数据进行分析、模式识别和预测，可以帮助企业更好地理解市场趋势、客户需求和产品特点。

数据挖掘技术主要包括分类、聚类、关联规则挖掘和预测分析等方法。

在旅游行业中，数据挖掘技术可以帮助旅行社、酒店、景点等企业挖掘客户需求和行为模式，提高市场竞争力和服务质量。

下面将具体探讨几种主要的数据挖掘技术在旅游行业中的应用。

二、分类分析分类分析是一种通过构建分类器将数据样本分为不同类别的数据挖掘方法。

在旅游行业中，分类分析可以帮助企业对客户进行分类，了解客户的偏好和需求，从而有针对性地进行市场营销和产品推广。

以在线旅行社为例，通过对用户购买行为和偏好进行分类分析，可以发现不同类别用户的消费习惯和旅游偏好，进而为他们提供个性化的服务和产品推荐。

通过数据挖掘技术，旅行社可以更好地了解客户需求，提高用户满意度和购买转化率。

三、聚类分析聚类分析是一种将数据样本划分为多个相似的簇群的数据挖掘方法。

在旅游行业中，聚类分析可以帮助企业发现不同客户群体之间的相似性和差异性，为产品定价、市场定位和推广策略提供依据。

以酒店行业为例，通过对客户入住记录和评价数据进行聚类分析，可以发现不同类型客户的偏好和行为模式，根据不同客户群体的需求提供相应的服务和促销活动。

通过数据挖掘技术，酒店可以更好地满足客户需求，提高客户忠诚度和口碑效应。

四、关联规则挖掘关联规则挖掘是一种从数据集中发现事务项之间的关联性规则的数据挖掘方法。

在旅游行业中，关联规则挖掘可以帮助企业发现不同产品和服务之间的关联性，为交叉销售和推荐系统提供支持。

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

I.J. Intelligent Systems and Applications, 2016, 8, 58-65Published Online August 2016 in MECS (/)DOI: 10.5815/ijisa.2016.08.07Application of Data Mining in the Classification of Historical Monument PlacesSiddu P. AlgurDepartment of Computer Science, Rani Channamma University, Belagavi-591156, Karnataka, IndiaE-mail: siddu_p_algur@*Prashant BhatDepartment of Computer Science, Rani Channamma University, Belagavi-591156, Karnataka, IndiaE-mail: prashantrcu@P.G. Sunitha HiremathDept. of Information Science and Engg., BVB College of Engineering and Tech., Hubli, Karnataka, IndiaE-mail: pgshiremath64@Abstract—The economic development and promotion of a country or region is depends on several facts such as- tourism, industries, transport, technology, GDP etc. The Government of the country is responsible to facilitate the opportunities to develop tourism, technology, transport etc. In view of this, we look into the Department of Tourism to predict and classify the number of tourists visiting historical Indian monuments such as Taj- Mahal, Agra, and Ajanta etc.. The data set is obtained from the Indian Tourist Statistics which contains year wise stati stics of visitors to historical monuments places. A survey undertaken every year by the government is preprocessed to fill out the possible missing values, and normalize inconsistent data. Various classification techniques under Decision Tree approach such as- Random Tree, REPTree, Random Forest and J48 algorithms are applied to classify the historical monuments places. Performance evaluation measures of the classification models are analyzed and compared as a step in the process of knowledge discovery.Index Terms—Tourist Classification, Random Tree, Random Forest, J48 Algorithm, REP Tree.I.I NTRODUCTIONData Mining is a method of retrieving formerly unknown, suitable, potentional useful and unknown patterns from large data sets (Connolly, 1999). Nowadays the amount of data stored in Government databases is increasing rapidly. In order to get required benefits from such large data and to find hidden relationships between variables using different data mining techniques developed and used (Han and Kamber, 2006). There are increasing research interests in using data mining in different Government sectors such as Tourism, Health, Travels, Army, and Education etc.Tourism in India is economically significant and is increasing rapidly. The World Travel & Touris m Council measured that in the year 2012, the tourism produced 6.4 trillion or 6.6% of the nation's GDP [1]. It supported 39.5 million employments, 7.7% of its total employment. The sector is predicted to grow at an average annual rate of 7.9% till 2023 making India the third fastest growing tourism destination over the next decade. India has a large medical touris m sector which is expected to grow at an estimated rate of 30% annually to reach about ₹95 billion by 2015 [1].About 22.57 million tourists arrived in India in 2014, compared to 19.95 million in 2013. This ranks India as the 38th country in the world in terms of foreign tourist arrivals. Domestic tourist visits to all states and Union Territories numbered 1,036.35 million in 2012, an increase of 16.5% from 2011. In 2014, Tamil Nadu, Maharashtra and Uttar Pradesh were the most popular states for tourists. Chennai, Delhi, Mumbai and Agra have been the four most visited cities of India by foreign tourists during the year 2011. Worldwide, Chennai is ranked 38 by the number of foreign tourists, while Mumbai is ranked at 50, Delhi at 52 and Agra at 66 and Kolkata at 99. The Knowledge of approximate number of visitors to a place of historical monuments can help the Government to provide adequate resources. By looking into these statistics, to predict the number of visitors for the next year is helpful for the Government to provide adequate resources such as halting, waters, and some other tourist friendly resources in the historical monuments places to increase the number of visitors every year, thereby to get increase in the overall GDP of the country.In this work, a dataset is taken which contains the information of number of visitors to 55 historical monuments places from the year 2002 to 2013. The different data mining strategies such as preprocessing, building the classification model and testing the built models are effectively undertaken for the considered tourist statistical data to classify the historical monuments places based on their number of visitors. The rest of the paper is organized as follows. The section 2 providessome related works regarding Touris m Research, the section 3 provides proposed model which contains different data mining strategies to classify 55 historical monuments places, and the section 4 represents classification results and analysi s. Finally, the section 5 represents conclusion and future enhancements.II.R ELATED W ORKSThis section represents some related prior works on Tourism research based on Data Mining techniques. The authors [2] Jianhong et al.described how to design and implement the methods to identify the spatiotemporal movement patterns and across patterns between various categories of tourist and their spatio-temporal movement patterns. The frequent spatio-temporal movement sequence in the case study was extracted from the database. The major finding of [2] was to identify and study of spatio temporal pattern of visit of tourists in different day times for different places.The authors Rob Law, et.al, [3] have reviewed various tourist forecasting research papers and found that, the application of Data Mining techniques are well suited for the tourism field. The authors [3] also discussed research gap and suggested for future research work in tourism forecasting. The work of [3] made contribution for academic researchers, industrial practitioners, and official policy makers by drawing attention to the importance and necessity of integrating Data Mining and Tourism forecasting fields.The authors S. Cankurt and A. Subasi [4] proposed a machine learning model which is deterministic generation of auxiliary variables, and contains the seasonal, cyclic and trend components of the time series associated with tourism demand. To test the contribution of the deterministically generated auxiliary variables, the authors [4] have employed multilayer preceptor (MLP) regression, and support vector regression (SVR) models. These models are used to make multivariate tourism forecasting for Turkey respected to two data sets: raw data set and data set with deterministica lly generated auxiliary variables. The forecasting performances were compared with respect to these two data two sets. The entropy evaluation measures –relative absolute error (RA E) and root relative squared error (RRSE) of the proposed machine learning models have achieved significantly better forecasting accuracy.The authors Chang-Jui Lin, et.al,[5] have studied three types of forecast models and A RIMA, ANN, and MARS, were used for the analysis. The aim of [5] was to find out the most accurate model for forecasting tourism demand. The results of this study exposed that the MAPE of the ARIMA forecast model is less than the other two models. ARIMA model showed the better forecasting ability. The MAPE of MARS found the highest values indicating that its forecasting ability is the worst. The MAPE of ANN was between the other two models, indicating that its forecasting ability is normal.The authors Haiyan Song and Gang Li [6] have reviewed and studied on tourism demand modeling and forecasting since 2000. The key finding of their review was, the methods used in analyzing and forecasting the demand for touris m. In addition to the most popular time series and econometric models, a number of new techniques have emerged in the literature. However, as far as the forecasting accuracy is concerned, the study showed that, there is no single model that consistently outperforms other models in all situations. Also, this study identified some new research directions, which include improving the forecasting accuracy through forecast combination; integrating both qualitative and quantitative forecasting approaches, tourism cycles and seasonality analysis, events’ impact assessment and risk forecasting.The authors Chang-Jui Lin and Tian-Shyug Lee [7] have developed tourism demand econometric models based on the monthly data of visiting of tourists to Taiwan and had adopt Multivariate Adaptive Regression Splines (MARS), Artificial Neural Network (ANN) and Support Vector Regression (SVR), MARS, A NN and SVR to develop forecast models. The forecast results of built models were compared. The results showed that SVR model is the optimal model, with a mean error rate of 3.61%, ANN model is the sub-optimal model, with a mean error rate of 7.08%, and MARS is the worst model, with a mean error rate of 11.26%.In concerned with the research in tourism, the author Erika kulcsár [8] analyzed the measure between GDP dependent variable in the sector of hotels and restaurants and the some other independent variables such as- overnight stays in the establishments of touristic reception, arrivals in the establishments of touristic reception and investments in hotels and restaurants sector in the period of 1995-2007. By using the multiple regression analysis technique, the paper [8] found that investments and tourist arrivals are significant predictors for the GDP dependent variable. Based on these results, the author [8] found those components of the marketing mix, which would contribute to the positive development of tourist arrivals in the establishments of touristic reception.The authors Panayiotis G.Curtis and Dimitris X. Kokotos [9] surveyed the managements of the various hotels and analyzed the survey data with the use of the Decision Tree tool. The issue of competitiveness of the tourism product was assessed. The development of alternative forms of tourism was proposed as a means of improving competitiveness and restoring sustainability in the tourism sector.III.P ROPOSED M ETHODThis section describes the detailed method of the proposed work. The various steps in the proposed methodology are shown in Fig.1. The proposed system model consists of the following components: Data Collection Process, Data Refinement process, Prediction and Classification Process, and finally, Result Analysis and KDD process. The function of each component is described in the following subsections.3.1 Data Collection ProcessThe dataset of Historical Monuments are collected from the year 2002 to 2013 from Indian Historical Tourism Statistics. The dataset contains the information of 55 historical monument places such as Taj Mahal, Agra, Ajanta, Ellora, Elephanta etc., and the number of visitors of the years from 2002 to 2013. The schematic structure of the dataset is represented in Fig.2.Fig.1. Proposed System ModelFig.2. Schematic Structure of the DatasetIn Data Mining strategies the data preprocessing step decides the quality of the result. Hence effective preprocessing is needed to make the dataset consistent for the experiment. The Historical Monument datasetcontains missing values in the numeric attri bute …Visitors‟. The details of the missing values in the dataset are represented in the Table 1.Table 1. Missing values details in the datasetTo handle the missing values present in the dataset, we plot the normal distribution of the numeric attribute …Visitors‟ and is shown in Fig. 3.Fig.3. Normal distribution of …Visitors‟.From the Fig.3, it is observed that, the normal distribution is positively skewed. Since, the normal distribution is positively skewed, we should employ median of the numeric attribute to fill the missing values. The numeric attribute is sorted and found the value 119145 as median. The median value 119145 is replaced with the missing values. The data are then stored in CSV or ARFF file format for effective experimental process. The each Historical Monument place is assigned with one of the class label {Low, Medium, High} based on their respective visitors statistics from the year 2002 to 2013.Fig.4. Concept Hierarchy for attribute …Visitors‟The discretization method is used to assign conceptual class labels for Indian Historical Monument Places based on the number of visitors. Firstly, the raw values of numeric attribute …Visitors‟ is replaced by the interval labels 0-50000, 50000-1000000, and 1000000-N, where N>1000000 respectively. Then the conceptual labels …Low‟, …Medium‟ and …High‟ are corresponds to theVisitors0-49999 5000-9999991000000-N LowHighMediumPositively Skewedabove intervals respectively. The labels are recursively organized into higher-level concepts which results in a concept hierarchy for the numeric attribute …Visitors‟. The Fig.4 represents concept hierarchy for the numeric attribute …Visitors‟.The Fig.5 represents a part (structure) of refined dataset after preprocessing and class label assignment.Fig.5. Schematic structure part of the Historical Monument dataset afterpreprocessing3.2 Classification ProcessThe following four classification models are used in the proposed experiment.i) Random Tree (RT) Classification Modelii) REPTree Classification Modeliii) Random Forest Classification Modeliv) J48 Classification ModelThe functionality of each cluster model is discussed as follows.3.2.1 Random Tree Classification ModelThe Random Decision Tree algorithm builds several decision trees randomly.The Random Tree Classification Model is built using WEKA and a part of the built tree is extracted and represented in the form of classification rules as shown in Table 2. When constructing each tree, the algorithm picks a "remaining" feature randomly at each node expansion without any purity function check such as- gini index, information gain etc. A categorical feature such as …High‟ is considered "remaining" if the same categorical feature of …High‟ has not been chosen before in a specific decision path starting from the root of tree to the present node. . Once a categorical feature such as …High‟ is taken, it is useless to choose it once more on the same decision path because every pattern in the same path will have the same value (either High, Medium or Low). On the other hand, a continuous feature such as …Visitors‟ can be selected more than once in the same decision path. Each moment the continuous feature is selected, a random threshold is chosen.A tree stops growing any deeper if one of the following conditions is met:a)There are no more examples to split in the currentnode or a node becomes emptyb)The depth of tree goes beyond some limitations.3.2.2. REPTree Classification ModelThe WEKA supports REPTree Classification Models which is well known as fast decision tree learner. The REPTree classifier is constructed using WEKA and a part of the built tree is extracted and represented in the form of classification rules as shown in Table 3.The REPTree Classification Model builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with back fitting). The procedure to obtain information gain of each attribute is discussed in our previous work [10]. The REPTree Classification Model sorts values for numeric attributes only one time.3.2.3 Random Forest Classification ModelRandom Forest classification model constructs many classification trees. For the classification of a new instance from an input vector space (dataset), put the input vector down each of the trees in the forest. Each tree will gives a classification (votes). The forest chooses the classification having the majority votes among over all the trees generated the forest. The WEKA supports Random Forest classification technique and constructs a forest of random trees.3.2.4 J48 Classification ModelJ48 is bespoke version of C4.5 classification algorithm. The J48 algorithm generates a classification-decision tree for the Historical Monument data-set by recursive partitioning the tuples. The J48 Tree classifier is built using WEKA and a part of the built tree is e xtracted and represented in the form of classification rules as shown in Table 4. The decision tree is grown using depth-first strategy. The algorithm considers all the possible tests that can split the Historical Monument data set and selects a test that gives the best information gain. For each Historical Monument numeric attribute values, binary tests involving every distinct values of the attribute are considered. In order to gather the information gain of all these binary tests efficiently, the information gain of the binary partition point based on each distinct values are calculated and sub trees are formed accordingly. This process is repeated for each attributes considered for classification.3.3 Result Analysis and KDD processThe classification results of all the four built models will be analyzed effectively using different performance evaluation metrics such as- Correctly Classified instances, incorrectly classified instances, TP rate, FP rate, Precision, Recall and F-Score etc. The results will be compared as a step in the process of knowledge discovery.Table 2. Random Tree Classification RulesTable 2: Random Tree Classification Rules==========Name = Taj Mahal,Agra| Visitors < 1804647 : medium (1/0)| Visitors >= 1804647 : high (11/0)Name = Agra Fort| Visitors < 996229 : medium (3/0)| Visitors >= 996229 : high (9/0)Name = Fatehpur Sikri : medium (12/0)Name = Akbar's Tomb, Agra| Visitors < 316046 : medium (8/0)| Visitors >= 316046| | Visitors < 368261 : low (1/0)| | Visitors >= 368261 : medium (3/0)Name = Mariams Tomb Agra : low (12/0)Name = Itimad-ud-Daula Agra : medium (12/0)Name = Ram Bagh Agra| Visitors < 33482.5 : low (8/0)| Visitors >= 33482.5| | Visitors < 50022| | | Visitors < 38288 : medium (1/0)| | | Visitors >= 38288 : low (1/0)| | Visitors >= 50022 : medium (2/0)Name = Methab Bagh Agra| Visitors < 42988.5| | Visitors < 13864 : low (6/0)| | Visitors >= 13864| | | Visitors < 19172 : medium (1/0)| | | Visitors >= 19172 : low (2/0)| Visitors >= 42988.5 : medium (3/0)Name = Ajanta Caves : medium (12/0)Name = Ellora Caves| Visitors < 963136.5 : medium (9/0)| Visitors >= 963136.5 : high (3/0)Name = Elephanta Caves : medium (12/0)Name = Pandavlena Caves| Visitors < 52869.5 : low (4/0)| Visitors >= 52869.5 : medium (8/0)Name = Daulata Bad Fort| Visitors < 126218 : low (1/0)| Visitors >= 126218 : medium (11/0)Name = Bibi-Ka Maqbara| Visitors < 994804 : medium (8/0)| Visitors >= 994804 : high (4/0)Name = Aurangabad Caves : low (12/0)Name = Kanheri Caves-- - - - - - - - - - - - - - - -- - - - - --- - - - -- - - - - -- - - -- - - - - - -- --- -- - - -- - - - - - - -- - - - - - - -- -Table 3. REPTree Classification RulesTable 3: REPTree Classification Rules============Name = Taj Mahal,Agra : high (6/1) [6/0]Name = Agra Fort| Visitors < 996229 : medium (2/0) [1/0]| Visitors >= 996229 : high (8/0) [1/0]Name = Fatehpur Sikri : medium (7/0) [5/0]Name = Akbar's Tomb, Agra : medium (9/1) [3/0]Name = Mariams Tomb Agra : low (8/0) [4/0] Name = Itimad-ud-Daula Agra : medium (11/0) [1/0] Name = Ram Bagh Agra : low (4/1) [8/2]Name = Methab Bagh Agra : low (10/4) [2/0]Name = Ajanta Caves : medium (7/0) [5/0]Name = Ellora Caves| Visitors < 963136.5 : medium (4/0) [5/0]| Visitors >= 963136.5 : high (2/0) [1/0]Name = Elephanta Caves : medium (10/0) [2/0]Name = Pandavlena Caves : medium (9/4) [3/0]Name = Daulata Bad Fort : medium (9/1) [3/0]Name = Bibi-Ka Maqbara| Visitors < 994804 : medium (8/0) [0/0]| Visitors >= 994804 : high (3/0) [1/0]Name = Aurangabad Caves : low (4/0) [8/0]Name = Kanheri Caves| Visitors < 61664 : low (3/0) [1/0]| Visitors >= 61664 : medium (4/0) [4/1]Name = Karla Caves : medium (8/0) [4/0]Name = Raigad Fort : medium (7/0) [5/0]Name = Shaniwarwada : medium (5/0) [7/0]Name = Hampi : medium (9/0) [3/0]Name = Daria Daulat Bagh : medium (9/0) [3/0]Name = Keshwa Temple Somnathpura : medium (5/0) [7/0] Name = Tipu sultan Palace : medium (4/0) [8/0]Name = Chitradurga Fort : medium (6/0) [6/0]Name = Bellary Fort : low (9/0) [3/0]Name = Khajuraho Monuments : medium (8/0) [4/0] Name = Shahi Quila Burhanpur| Visitors < 44868.5 : low (4/0) [2/0]| Visitors >= 44868.5 : medium (6/0) [0/0]Name = Bagh Caves : low (7/0) [5/0]Name = Royal Complex Mandu : medium (9/0) [3/0] Name = Ranirupavathi Museum : medium (7/0) [4/0] Name = Hoshang Shahs Tomb| Visitors < 68404 : low (4/0) [1/0]| Visitors >= 68404 : medium (5/0) [1/0]Name = Stupa Sanchi Monument : medium (7/5) [5/1] Name = Bhojshala Dharmoula Mosque : low (8/0) [3/0] Name = Gwalior Musem : medium (8/1) [4/0]Name = Buddist Monuments Sanchi : medium (9/0) [3/0] -- - - - - - - - -- - - - -- - - - - -- - - - - - - - - --- - - - - - -- -- - - - - - - - - -- - - - - - - - - - --- - - - - - -- - -- - - - - - - - - - - - - - - - - - - -Table 4. J48 pruned tree Classification Rules Table 4: J48 pruned tree Classification Rules------------------Name = Taj Mahal,Agra: high (12.0/1.0)Name = Agra Fort| Visitors <= 989804: medium (3.0)| Visitors > 989804: high (9.0)Name = Fatehpur Sikri: medium (12.0)Name = Akbar's Tomb, Agra: medium (12.0/1.0)Name = Mariams Tomb Agra: low (12.0)Name = Itimad-ud-Daula Agra: medium (12.0)Name = Ram Bagh Agra| Visitors <= 33281: low (8.0)| Visitors > 33281: medium (4.0/1.0)Name = Methab Bagh Agra| Visitors <= 42925: low (9.0/1.0)| Visitors > 42925: medium (3.0)Name = Ajanta Caves: medium (12.0)Name = Ellora Caves| Visitors <= 955677: medium (9.0)| Visitors > 955677: high (3.0)Name = Elephanta Caves: medium (12.0)Name = Pandavlena Caves| Visitors <= 52621: low (4.0)| Visitors > 52621: medium (8.0)Name = Daulata Bad Fort: medium (12.0/1.0)Name = Bibi-Ka Maqbara| Visitors <= 989804: medium (8.0)| Visitors > 989804: high (4.0)Name = Aurangabad Caves: low (12.0)Name = Kanheri Caves| Visitors <= 65358: low (5.0)| Visitors > 65358: medium (7.0)Name = Karla Caves: medium (12.0)Name = Raigad Fort: medium (12.0)Name = Shaniwarwada: medium (12.0)Name = Hampi: medium (12.0)Name = Daria Daulat Bagh: medium (12.0)Name = Keshwa Temple Somnathpura: medium (12.0)Name = Tipu sultan Palace: medium (12.0)Name = Chitradurga Fort: medium (12.0)Name = Bellary Fort: low (12.0)Name = Khajuraho Monuments: medium (12.0)Name = Shahi Quila Burhanpur| Visitors <= 46240: low (6.0)| Visitors > 46240: medium (6.0)Name = Bagh Caves: low (12.0)Name = Royal Complex Mandu: medium (12.0)Name = Ranirupavathi Museum: medium (11.0)Name = Hoshang Shahs Tomb| Visitors <= 68337: low (5.0)| Visitors > 68337: medium (6.0)Name = Stupa Sanchi Monument : medium (12.0/6.0)Name = Bhojshala Dharmoula Mosque: low (11.0)-- - - - - -- - - - - -- - - - -- - - - - -- - - -- - --- - - -- - - - -- - - - - -- - -- - - - - - - - -- - - -IV.R ESULTS AND D ISCUSSIONSThe Random Tree, REPTree, J48 and Random Forest Classification Models are built using 10 cross validation folds. To test the considered classification models for the experiment, 656 Historical Monument instances are taken and preprocessed using WEKA tool. The Table 5 represents result obtained by the Random Tree, REPTree, J48 and Random Forest classification models.The results describes performance evaluation metrics such as- correctly classified instances, incorrectly classified instances, TP rate, FP rate, Precision, Recall, F-Score. Out of 656 test instances, 626 tuples are correctly classified and 30 instances are incorrectly classified by the Random Tree Classification Model with 10 cross validation fold. The performance of the Random Tree Classification Model is found good with 95% accuracy. Similarly, out of 656 test instances, 604 tuples are correctly classified and 52 instances are incorrectly classified by the REPTree Classification Model. The efficiency of the REPTree Classification Model is found 92% with 10 cross validation method.The same test set is given to the Random Forest Classification Model as an input. The Random Forest Classification Model performed well on the test instances and exhibits 95% efficiency. 627 instances are correctly classified and 29 instances are incorrectly classified by the Random Forest Classification Model.The performance of J48 Classification Model is studied and tested for the test dataset which was used to test the other classification models. The classification performance is found good with 95% accuracy. It is observed from the experimental result that, there are minor differences in the performance evaluation metrics of Random Tree, Random Forest and J48 Classification Models. But the the performance of REPTree Classification Model reveals less efficiency as compared to remaining Classification Models.To study TP rate and FP rate of all the four Classification Models in depth, the experimental result is represented in the form of confusion matrices. The Table 6 represents confusion matrices obtained by the results of Random Tree and REPTree Classification Models and the confusion matrices obtained by the results of Random Forest and J48 Classification Models. In the confusion matrices, the column …a‟ and row …a‟ corresponds to the class label …High‟ and representing the Historical Monument places which are having more than 10,00,000 visitors per year. The column …b‟ and row …b‟ corresponds to the class label …Medium‟ and representing the Historical Monument places which are having more than 50,000 and less than 10, 00,000 visitors per year and finally, column …c‟ and row …c‟ corresponds to the class label …Low‟ and representing the Historical Monument places which are having less than 50,000 visitors per year. There are 39 Historical Monument places which are recognized with the class label …High‟, 389 places which are recognized with the class label …Medium‟, and 228 places which are recognized with the class label …Low‟. By looking at the confusion matrices as shown in Table 5 and 6, all the four classification models have more error rate (FP rate) while predicting the class label …Medium‟. The reason could be- the class label …Medium‟ has large number of tuples/instance as compared to the remaining class labels.During the experiment, the Random Tree, Random Forest and J48 Classification Models reveal same characteristics to classify Historical Monument places. Three Historical Monument places which are belongs to the class label …High‟ were wrongly classifi ed and assigned to …Medium‟ class label by the Random Tree, Random Forest and J48 Classification Models whereas 9 Historical Monument places were wrongly classified as …Medium‟ by the REPTree Classification Model.While classifying the Historical Monument places which are belongs to the class label …Medium‟, 4 Historical Monument places were wrongly classified and assigned to the class label …High‟, by the Random Tree, Random Forest, J48 Classification Models and 7 Historical Monument places were wrongly classified and assigned to the class label …High‟, by the REPTree Classification Model. 11 Historical Monument places were misclassified as …Low‟ which are originally belongs to the class label …Medium‟ by the Random Tree, Random Forest, J48 Classification Models whereas 17 Historical Monument places were wrongly classified and assigned to the class label …Low‟, by the REPTree Classification Model.In the classification of Historical Monument places which are belongs to the class label …Low‟, 216 places were correctly classified and 12 places were wrongly classified as …Medium‟ by the Random Tree and J48 Classification Models. The Random Forest Classification Model reveal better performance than the Random Tree and J48 Classification Models; 217 Historical Monument places were correctly classified and 11 places were。