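Personalized Recommendation Using Sequential Pattern Trees and Content-Based Filtering: Translation of Foreign Literature for a Graduation Thesis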
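A Content-Filtering-Based Recommendation Algorithm for Scientific Literature
The content-filtering-based recommendation algorithm for scientific literature (abbreviated CF in this section) is a technique commonly used in intelligent information-acquisition systems. Its goal is to deliver meaningful, usable, and accurate recommendations.
I. What is the content-filtering-based recommendation algorithm for scientific literature?
The content-filtering-based recommendation algorithm (CF) is a technique for automatically recommending research literature. Its main idea is to use the keywords and content of each document to compare topical similarity, classify documents by content and field, and recommend research literature tailored to each user.
II. How the CF algorithm works
1. First, normalize the document information and extract features: keywords, title, authors, and the terms that appear in the content. Index the documents using term frequency and a text-similarity algorithm;
2. Enrich the index with additional features, such as expansion terms and word stems extracted from the text, to improve recommendation accuracy;
3. From the content features, compute the semantic space and vector representation of each document, using (for example) gensim;
4. From the keyword features, represent each document with a bag-of-words model;
5. From these representations, compute the cosine similarity between documents and assemble the document-to-document similarity matrix;
6. Use the similarity matrix to recommend documents to the user.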
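The steps above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the algorithm's reference implementation: the function names (`tfidf_vectors`, `cosine`, `similarity_matrix`, `most_similar`), the whitespace tokenizer, and the smoothed idf formula are all assumptions made for the example, and a real system would likely use a library such as gensim for the vector step.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real term extraction)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (tf times a smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all documents."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def most_similar(matrix, index, top_n=1):
    """Indices of the documents most similar to document `index`."""
    others = [j for j in range(len(matrix)) if j != index]
    others.sort(key=lambda j: matrix[index][j], reverse=True)
    return others[:top_n]


if __name__ == "__main__":
    docs = [
        "content based filtering for paper recommendation",
        "paper recommendation with content based filtering and tf idf",
        "grey model forecasting of consumer price index",
    ]
    matrix = similarity_matrix(tfidf_vectors(docs))
    print(most_similar(matrix, 0))
```

Sparse dictionaries keep the sketch dependency-free; with a large corpus, a matrix library and an inverted index would be the more usual choice.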
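III. Strengths and weaknesses of CF
(1) Strengths: CF weighs both the content features and the keyword features of a document, which makes it flexible. It can pick out the documents a user is most interested in from a large collection and thus give accurate recommendations;
(2) Weaknesses: CF measures similarity between documents with cosine similarity, so it detects only surface similarity and cannot capture the relationships between documents. Its recommendations are less satisfactory when documents are strongly related to one another.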
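Content-based Recommendations
Collaborative Filtering Recommendations (CF) are currently the most popular recommendation method and are heavily used in both research and industry.
However, systems actually deployed in industry rarely rely on CF alone; Content-based Recommendations (CB) are usually part of them as well.
CB is probably the earliest recommendation method to be used. Based on the products a user liked in the past (this article calls them items), it recommends products similar to those.
For example, a restaurant recommender can recommend barbecue restaurants to a user who has liked many barbecue restaurants before.
CB was first applied mainly in information retrieval systems, so many methods from information retrieval and information filtering can be used in CB.
The CB process generally consists of three steps:
1. Item Representation: extract features for each item (that is, the item's content) to represent it;
2. Profile Learning: use the features of the items a user liked (and disliked) in the past to learn the user's preference profile;
3. Recommendation Generation: compare the user profile from the previous step with the features of candidate items and recommend the set of most relevant items.
[3] gives a detailed flowchart of the three steps above (the first step corresponds to the Content Analyzer, the second to the Profile Learner, and the third to the Filtering Component).
An example illustrates the three steps.
For personalized reading, an item is an article.
Following the first step, we first extract attributes from the article content that represent each article.
A common approach is to represent an article by the words that appear in it, with the weight of each word usually computed with tf-idf from information retrieval.
For this article, for instance, the words "CB", "recommendation", and "preference" would get relatively high weights, while the word "barbecue" would get a low weight.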
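The second and third steps (Profile Learning and Recommendation Generation) can be illustrated with a small sketch. It assumes that item vectors have already been produced by the first step; the averaging rule in `learn_profile`, the function names, and the restaurant feature weights are hypothetical choices made for the example, not something prescribed by the article.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def learn_profile(liked_items):
    """Profile Learning: average the feature vectors of the liked items."""
    if not liked_items:
        return {}
    profile = {}
    for vector in liked_items:
        for feature, weight in vector.items():
            profile[feature] = profile.get(feature, 0.0) + weight
    return {f: w / len(liked_items) for f, w in profile.items()}


def recommend(profile, candidates, top_n=2):
    """Recommendation Generation: rank candidates by similarity to the profile."""
    scored = [(cosine(profile, vec), name) for name, vec in candidates.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical feature weights for restaurants the user liked before.
    liked = [
        {"barbecue": 0.9, "beer": 0.3},
        {"barbecue": 0.8, "spicy": 0.4},
    ]
    candidates = {
        "bbq_house": {"barbecue": 1.0, "beer": 0.2},
        "sushi_bar": {"sushi": 1.0, "sake": 0.5},
        "hotpot": {"spicy": 0.9, "beef": 0.4},
    }
    print(recommend(learn_profile(liked), candidates))
```

Averaging liked items is the simplest possible profile; using disliked items as negative evidence, as the second step allows, would need a different rule such as subtracting their vectors.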
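Research on Deep-Learning-Based Recommender Systems, Part 1
I. Introduction
With the rapid development of Internet technology and the arrival of the big-data era, information overload has become increasingly severe.
Recommender systems emerged to address this problem and have gradually become an important tool for information retrieval and personalized services.
Traditional recommender systems rely mainly on collaborative filtering, content filtering, and similar methods, but their accuracy and efficiency are both challenged when handling large-scale, high-dimensional data.
In recent years, the rise of deep learning has offered new ideas and methods for recommender-system research.
This paper studies deep-learning-based recommender systems and discusses their principles, methods, and practical applications.
II. Principles and methods of deep-learning recommender systems
1. Principles of deep learning
Deep learning is a branch of machine learning. It builds multi-layer neural networks that imitate the neural structure of the human brain in order to recognize and predict complex patterns.
In a recommender system, deep learning can analyze users' historical behavior and interests together with the attributes and content of items, learn the latent relationships between users and items, and thereby give users more accurate recommendations.
2. Methods of deep-learning recommender systems
(1) Collaborative-filtering-based deep-learning recommender systems
This approach combines the idea of collaborative filtering with deep learning, using neural networks to learn latent features of users and items for recommendation.
It includes collaborative filtering based on user behavior and collaborative filtering based on item attributes, among others.
(2) Content-based deep-learning recommender systems
This approach mainly uses deep learning to analyze the content of items and the interests of users, and recommends items that match the user's needs.
For example, convolutional neural networks (CNN) or recurrent neural networks (RNN) can be used for content analysis and feature extraction.
(3) Hybrid recommender systems
Hybrid recommender systems fuse several recommendation techniques to exploit the strengths of each.
In a deep-learning recommender system, collaborative-filtering-based and content-based methods can be mixed to improve the accuracy and diversity of recommendations.
III. Applications of deep-learning recommender systems
1. E-commerce
In e-commerce, a deep-learning recommender system can analyze a user's purchase history, browsing records, and search behavior to infer interests and needs, and then recommend matching products.
It can also recommend based on product attributes, price, and sales volume, raising conversion rates and revenue.
2. Video recommendation
In video recommendation, deep learning can analyze a user's viewing history and preferences together with the content of videos, and recommend videos that match the user's interests.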
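A Survey of Personalized Recommendation Algorithms
Authors: Sun Guanghao (孙光浩), Liu Danqing (刘丹青), Li Mengyun (李梦云)
Source: Software (《软件》), 2017, No. 07
Abstract: According to the existing literature, personalized recommendation algorithms fall into three classes: Content-based Recommendation, Collaborative Filtering based Recommendation, and Hybrid Recommendation.
Among them, collaborative-filtering-based recommendation has received the deepest and broadest study because it depends little on expert knowledge and can exploit collective intelligence. It can be divided further into several subclasses, chiefly User-based CF, Item-based CF, and Model-based CF.
Model-based recommendation is a collective name for a family of methods that learn a model from the system's existing data and users' historical behavior, and then use that model for preference modeling, prediction, and personalized recommendation. Depending on the application scenario and the data available, the model may be a matrix factorization model such as the commonly used singular value decomposition, or a machine-learning model such as a topic model, an artificial neural network, a probabilistic graphical model, combinatorial optimization, or even deep learning.
In the sections below, we describe the state of research on personalized recommender systems along these lines.
Keywords: recommendation algorithm; collaborative filtering; personalization
1 Research background
With the rapid development of the Internet, personalized recommender systems have become an indispensable core function of many network applications and influence everyday life in many ways: shopping recommendation engines on e-commerce sites suggest products a user may be interested in; friend recommendation in social networks finds potential friends to follow; video recommendation on video sites offers the videos a user is most likely to click; content recommendation on news portals offers the most informative news. Personalized recommendation is now one of the basic technologies that underpin Internet intelligence.
2 State of research in China and abroad
The rapid development of the Internet has started the process of moving human activity online, and more and more tasks that traditionally could be done only offline can now be done quickly and conveniently on the Internet.
E-commerce, now deeply embedded in daily life, is a typical example of this process. The spread of e-commerce sites such as Alibaba, JD.com, and Amazon lets people buy the goods they need without leaving home and choose among many more candidate products.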
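Research on a Deep-Learning-Based Personalized Recommendation Method for Academic Papers, Part 1
I. Introduction
With the rapid development of Internet technology, the digitization and networking of academic resources has become a trend.
The vast number of academic papers provides rich resources for research, but it also brings information overload to scholars.
How to filter out, from this mass of academic resources, the papers that match a user's interests and needs has become a pressing problem.
To this end, this paper proposes a deep-learning-based personalized recommendation method for academic papers, aiming to improve the accuracy and efficiency of paper recommendation.
II. Related work
In academic-paper recommendation, traditional methods are mainly based on collaborative filtering and content filtering.
However, these methods often fail to mine the deep information in academic papers, and they struggle with the cold-start problem for new users and new papers.
In recent years, the application of deep learning to recommender systems has become a research hotspot.
By learning deep features of the data, deep learning can improve the accuracy and personalization of recommendations.
This paper therefore applies deep learning to the study of personalized academic-paper recommendation.
III. Method
The proposed method consists of the following steps:
1. Data preparation: collect the metadata, author information, and keywords of academic papers to build a paper data set. Also collect user behavior data, such as browsing, clicking, and downloading, and user profile information, such as research field and interests.
2. Data preprocessing: clean, deduplicate, and format the collected data so that it is ready for training the deep-learning model.
3. Deep-learning model construction: use a deep-learning model such as a deep neural network (DNN) or a convolutional neural network (CNN) to perform semantic analysis and feature extraction on the text of the papers. Combine this with user behavior data and profile information to build a user interest model.
4. Recommendation algorithm design: design a personalized recommendation algorithm from the user interest model and the paper features. Content-based recommendation, collaborative-filtering recommendation, and similar methods can be combined with the output of the deep-learning model to generate a personalized recommendation list.
5. Evaluation of recommendation results: evaluate and optimize the results with metrics such as user satisfaction, recommendation precision, and recall.
IV. Experiments and analysis
Experiments were run on the paper data and user behavior data of an academic database.
The results show that the deep-learning-based personalized recommendation method for academic papers effectively improves the accuracy and personalization of recommendations.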
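Design and Implementation of a Personalized Recommender System Based on a Content-Based Recommendation Algorithm
As people depend more and more on the Internet, personalized recommender systems have become one of the most common features of major websites and apps.
Such a system uses a user's past browsing and search behavior, together with other relevant information, to provide content that matches the user's personal interests and needs.
Content-based recommendation algorithms are widely used in these systems.
This article discusses the design and implementation of a personalized recommender system based on a content-based recommendation algorithm.
I. Principles of content-based recommendation
A content-based recommendation algorithm recommends by using the content features of items.
It computes the similarity between items and extends a user's preference for known items to other, unknown items.
Its basic principles are as follows:
1. Item representation
Each item must be represented as a vector or a set of features so that the algorithm can use the distance or similarity between vectors to measure how similar two items are.
In a music recommender system, for example, a song can be represented by its title, its duration, its performer, and similar information.
2. Feature extraction
Representing items as vectors or feature sets requires feature extraction.
This process usually converts the content of an item into numeric form.
In a music recommender system, a song can be converted into a numeric representation such as its frequency-domain and time-domain information.
The process must be adapted to the type of item and the usage scenario.
3. Similarity computation
The similarity of items is obtained by computing the distance or similarity between their vectors.
With Euclidean distance, for example, the distance between two vectors is computed and items with a smaller distance are regarded as more similar.
4. Generating recommendations
Based on the similarity results, items that are highly similar to the user's viewing history are selected for recommendation.
The results are usually sorted by similarity from high to low, and a fixed number of items is then chosen to present to the user.
The items presented are thus filtered according to the user's past interests and interactions.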
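The similarity and ranking steps above can be sketched as follows. The sketch assumes that each song has already been turned into a numeric feature vector on a comparable scale; the catalog values, the function names, and the "distance to the nearest item in the history" rule are illustrative assumptions, not part of the article.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend_by_distance(history, catalog, top_n=2):
    """Rank unseen items by their distance to the closest item in the history.

    A smaller distance means a more similar item, so the list is sorted in
    ascending order of distance.
    """
    if not history:
        return []
    seen = set(history)
    scored = []
    for name, features in catalog.items():
        if name in seen:
            continue
        nearest = min(euclidean(features, catalog[h]) for h in history)
        scored.append((nearest, name))
    scored.sort()
    return [name for _, name in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical, already-normalized song features.
    catalog = {
        "song_a": (0.80, 0.10, 0.30),
        "song_b": (0.75, 0.15, 0.35),
        "song_c": (0.10, 0.90, 0.80),
        "song_d": (0.20, 0.85, 0.70),
    }
    print(recommend_by_distance(["song_a"], catalog))
```

Because Euclidean distance is sensitive to scale, features such as duration in seconds would normally be normalized before this step.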
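II. Designing a personalized recommender system based on content-based recommendation
The design of such a system usually involves the following steps:
1. Data collection
To build a personalized recommender system, the first step is to collect user behavior data and item data.
User behavior data usually includes browsing history, search queries, and purchase records; item data includes the attributes, descriptions, and tags of items.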
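Research on a Deep-Learning-Based Personalized Recommendation Method for Academic Papers, Part 1
I. Introduction
With the rapid growth of the Internet, the number of academic papers is increasing explosively, which poses a major challenge for researchers.
Quickly finding the literature one is interested in among this mass of papers has become a pressing problem.
Personalized recommender systems emerged in response and have become an effective way to solve it.
Traditional recommendation methods are mainly based on algorithms such as collaborative filtering and content filtering, but they have many shortcomings when handling complex data such as academic papers.
This paper therefore proposes a deep-learning-based personalized recommendation method for academic papers, with the aim of improving the accuracy and efficiency of the recommender system.
II. Research background and current state
Deep learning has achieved notable results in many fields, recommender systems in particular.
Academic papers are an important vehicle for spreading knowledge and for scholarly exchange, which makes their personalized recommendation especially important.
Traditional paper recommendation methods rely mainly on features such as citation relations, author relations, and keywords, but they often ignore user behavior and interests.
Deep-learning-based methods can analyze users' historical behavior and interests together with the semantic information of the literature to achieve more accurate personalized recommendation.
III. Methodology
The proposed method consists of the following steps:
1. Data preprocessing: collect the metadata, citation relations, and author relations of academic papers together with users' behavior and interest data, then clean, deduplicate, and normalize the data for later analysis.
2. Feature extraction: use deep learning to extract useful features from the raw data, such as the semantic information of the literature and the interests of users.
3. Model construction: build a deep-learning model, such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a deep neural network (DNN), to analyze user behavior and interests and the semantic information of the literature.
4. Training and optimization: train the model on a large amount of training data and optimize its performance by adjusting its parameters and structure.
5. Recommendation generation: generate personalized paper recommendations from the user's interests and historical behavior and from the trained model.
IV. Experiments and analysis
To verify the effectiveness of the proposed method, we ran a large number of experiments.
The experimental data comes from the academic papers of an academic database and from users' browsing and download behavior.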
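Personalized Page Recommendation Based on a Maximal Frequent Sequence Pattern Tree
Tan Xiaoqiu (谭小球); Yao Min (姚敏); Gu Shenming (顾沈明)
[Journal] Microelectronics & Computer (《微电子学与计算机》)
[Year (volume), issue] 2006 (23) 9
[Abstract] A page recommendation technique based on maximal frequent sequence patterns is proposed. Because it takes into account the order in which pages are visited within a user session, it is more accurate than recommendation techniques that ignore page order.
A tree structure is introduced that stores all maximal frequent sequences in compressed form. Sequences with the same prefix share tree nodes, which greatly reduces storage space.
The recommendation engine takes the most recently visited page subsequence from the user's active session and matches it against partial paths of the tree. It does not need to search the whole pattern base for identical or similar patterns, which speeds up pattern matching and better meets the real-time requirements of page recommendation.
Experiments show that the method is effective.
[Total pages] 4 (pp. 108-111)
[Keywords] maximal frequent sequence pattern; personalized recommendation; Web usage mining; page association rules
[Authors] Tan Xiaoqiu; Yao Min; Gu Shenming
[Affiliations] School of Information, Zhejiang Ocean University; College of Computer Science, Zhejiang University
[Language of text] Chinese
[Chinese Library Classification] TP31
[Related literature]
1. A personalized Web page recommendation model based on sequential patterns [J], Yi Ming (易明)
2. A maximal frequent pattern mining algorithm based on the frequent pattern tree [J], Miao Yuqing (缪裕青)
3. FP-MFIA: an improved maximal frequent itemset mining algorithm based on the frequent pattern tree [J], Yang Pengkun (杨鹏坤); Peng Hui (彭慧); Zhou Xiaofeng (周晓锋); Sun Yuqing (孙玉庆)
4. An update mining algorithm for maximal frequent itemsets based on an improved frequent pattern tree [J], Zhao Qunli (赵群礼); Guo Yutang (郭玉堂); Shi Junhua (史君华)
5. An update mining algorithm for maximal frequent itemsets based on an improved frequent pattern tree [J], Zhao Qunli (赵群礼); Guo Yutang (郭玉堂); Shi Junhua (史君华)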
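The abstract above describes a prefix-sharing tree of frequent sequences that is matched against the most recent pages of a session. The following is a rough sketch of that idea only, under several assumptions that do not come from the paper: the frequent sequences and their support counts are taken as already mined, every suffix of each sequence is also inserted so that a partial path can be matched by walking from the root, matching is contiguous, and all names (`Node`, `build_tree`, `recommend_next`) are hypothetical.

```python
class Node:
    """One tree node; sequences that share a prefix share these nodes."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_tree(patterns):
    """Build the tree from (sequence, support) pairs.

    Assumption of this sketch: each suffix of a sequence is inserted as well,
    so matching a partial path becomes a simple walk from the root.
    """
    root = Node()
    for sequence, support in patterns:
        for start in range(len(sequence)):
            node = root
            for page in sequence[start:]:
                node = node.children.setdefault(page, Node())
                node.count += support
    return root


def recommend_next(root, session, window=3, top_n=2):
    """Match the last `window` pages of the session, longest suffix first."""
    recent = session[-window:]
    for start in range(len(recent)):
        node = root
        for page in recent[start:]:
            node = node.children.get(page)
            if node is None:
                break
        else:
            # The whole suffix matched; recommend its most supported successors.
            if node.children:
                ranked = sorted(
                    node.children.items(),
                    key=lambda item: (-item[1].count, item[0]),
                )
                return [page for page, _ in ranked[:top_n]]
    return []


if __name__ == "__main__":
    patterns = [
        (["home", "news", "sports", "scores"], 5),
        (["home", "news", "finance"], 3),
        (["search", "news", "sports", "video"], 2),
    ]
    tree = build_tree(patterns)
    print(recommend_next(tree, ["search", "home", "news"]))
```

Only the tail of the session is consulted, which mirrors the abstract's point that the whole pattern base need not be searched.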
Chinese-English bilingual translation of foreign literature: "A ... based on ..." (title truncated in the source)
English: 3,890 words, 20,217 characters; Chinese: 6,398 characters.
A Novel Divide-and-Conquer Model for CPI Prediction Using ARIMA, Gray Model and BPNN

Abstract: This paper proposes a novel divide-and-conquer model for CPI prediction based on the existing compilation method of the Consumer Price Index (CPI) in China. The historical national CPI time series is first divided into eight sub-indexes: food; articles for smoking and drinking; clothing; household facilities, articles and maintenance services; health care and personal articles; transportation and communication; recreation, education and culture articles and services; and residence. Three models, namely the back propagation neural network (BPNN) model, the grey forecasting model (GM(1,1)) and the autoregressive integrated moving average (ARIMA) model, are established to predict each sub-index. Then the best prediction among the three models for each sub-index is identified. To further improve performance, the prediction method is modified for sub-CPIs whose forecasting results are not satisfactory. After improvement and error adjustment, we obtain the advanced predictions of the sub-CPIs. Finally, the best predictions of each sub-index are integrated to form the forecast of the national CPI. Empirical analysis demonstrates that the accuracy and stability of the proposed method are better than those of many commonly adopted forecasting methods, which indicates that the proposed method is an effective alternative for national CPI prediction in China.

1. Introduction

The Consumer Price Index (CPI) is a widely used measure of the cost of living. It not only affects government policy on money, fiscal matters, consumption, prices, wages and social security, but also relates closely to residents' daily life. As an indicator of inflation in the Chinese economy, the change of CPI undergoes intense scrutiny.
For instance, the People's Bank of China raised the deposit reserve ratio in January 2008, before the CPI of 2007 was announced, because it was estimated that the CPI in 2008 would increase significantly if no action was taken. Therefore, precisely forecasting the change of CPI is significant to many aspects of economics, such as fiscal policy, financial markets and productivity. Building a stable and accurate model to forecast the CPI is also of great significance for the public, policymakers and research scholars.

Previous studies have proposed many methods and models to predict economic time series or indexes such as CPI. Some of them make use of factors that influence the value of the index and forecast it by investigating the relationship between the data of those factors and the index. These forecasts are realized by models such as the vector autoregressive (VAR) model [1] and the genetic algorithms-support vector machine (GA-SVM) [2].

However, these factor-based methods, although effective to some extent, simply rely on the correlation between the value of the index and a limited number of exogenous variables (factors) and basically ignore the inherent rules of the variation of the time series. As a time series itself contains a significant amount of information [3], often more than a limited number of factors can provide, time series-based models are often more effective for prediction than factor-based models.

Various time series models have been proposed to find the inherent rules of the variation in a series. Many researchers have applied different time series models to forecasting the CPI and other time series data. For example, the ARIMA model once served as a practical method for predicting the CPI [4]. It was also applied to predict submicron particle concentrations from meteorological factors at a busy roadside in Hangzhou, China [5]. Moreover, the ARIMA model was adopted to analyse the trend of pre-monsoon rainfall data for western India [6].
Besides the ARIMA model, other models such as the neural network and the gray model are also widely used in prediction. Hwang used a neural network to forecast time series corresponding to ARMA(p, q) structures and found that BPNNs generally perform well and consistently when a particular noise level is considered during network training [7]. Aiken also used a neural network to predict the level of CPI and reached a high degree of accuracy [8]. Apart from neural network models, a seasonal discrete grey forecasting model for fashion retailing was proposed and was found practical for fashion retail sales forecasting with short historical data, and better than other state-of-the-art forecasting techniques [9]. Similarly, a discrete Grey Correlation Model was used in CPI prediction [10]. Ma et al. used a gray model optimized by a particle swarm optimization algorithm to forecast iron ore import and consumption of China [11]. Furthermore, to deal with nonlinear conditions, a modified Radial Basis Function (RBF) was proposed by researchers.

In this paper, we propose a new method called the "divide-and-conquer model" for the prediction of the CPI. We divide the total CPI into eight categories according to the CPI construction and then forecast the eight sub-CPIs using the GM(1,1) model, the ARIMA model and the BPNN. To further improve performance, we predict again, with new forecasting methods, the sub-CPIs whose forecasting results are not satisfactory. After improvement and error adjustment, we obtain the advanced predictions of the sub-CPIs. Finally, we obtain the total CPI prediction by integrating the best forecasting results of each sub-CPI.

The rest of this paper is organized as follows. In Section 2, we give a brief introduction to the three models mentioned above. The proposed model is then presented in Section 3.
In Section 4 we provide the forecasting results of our model, and in Section 5 we make special improvements by adjusting the forecasting methods of sub-CPIs whose predictions are not satisfactory. In Section 6 we give a detailed discussion and evaluation of the proposed model. Finally, the conclusion is summarized in Section 7.

2. Introduction to GM(1,1), ARIMA & BPNN

Introduction to GM(1,1)

The grey system theory was first presented by Deng in the 1980s. In the grey forecasting model, a time series can be predicted accurately even with a small sample by directly estimating the interrelation of the data. The GM(1,1) model is one widely adopted type of grey forecasting. It is a differential equation model of order 1 with 1 variable. The differential equation, written in its standard form with the parameters $a$ and $u$ that are estimated in Section 3.2, is:

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = u$$

where $x^{(1)}$ denotes the accumulated (cumulative-sum) series of the original data.

Introduction to ARIMA

The Autoregressive Integrated Moving Average (ARIMA) model was first put forward by Box and Jenkins in 1970. The model has been very successful because it takes full advantage of past and present time series data. An ARIMA model is usually written as ARIMA(p, d, q), where p refers to the order of the autoregressive part, while d and q refer to the integrated and moving average parts of the model respectively. When some of the three parameters are zero, the model reduces to an "AR", "MA" or "ARMA" model. When none of the three parameters is zero, the model is given, in its standard form, by:

$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t$$

where $L$ is the lag operator and $\varepsilon_t$ is the error term.

Introduction to BPNN

An Artificial Neural Network (ANN) is a mathematical and computational model that imitates the operation of the neural networks of the human brain. An ANN consists of several layers of neurons. Neurons of contiguous layers are connected with each other. The values of the connections between neurons are called "weights". The Back Propagation Neural Network (BPNN) is one of the most widely employed types of ANN. BPNN was put forward by Rumelhart and McClelland in 1985.
It is a common supervised learning network well suited for prediction. A BPNN consists of three parts: one input layer, several hidden layers and one output layer, as shown in Fig. 1. The learning process of a BPNN modifies the weights of the connections between neurons based on the deviation between the actual output and the target output until the overall error falls within the acceptable range.

Fig. 1. Back-propagation Neural Network

3. The Proposed Method

The framework of the dividing-integration model

The process of forecasting the national CPI using the dividing-integration model is shown in Fig. 2.

Fig. 2. The framework of the dividing-integration model

As can be seen from Fig. 2, the process of the proposed method can be divided into the following steps:

Step 1: Data collection. The monthly CPI data, including the total CPI and the eight sub-CPIs, are collected from the official website of China's State Statistics Bureau.

Step 2: Dividing the total CPI into eight sub-CPIs. In this step, the respective weight coefficients of the eight sub-CPIs in forming the total CPI are decided by consulting an authoritative source. The eight sub-CPIs are as follows: 1. Food CPI; 2. Articles for Smoking and Drinking CPI; 3. Clothing CPI; 4. Household Facilities, Articles and Maintenance Services CPI; 5. Health Care and Personal Articles CPI; 6. Transportation and Communication CPI; 7. Recreation, Education and Culture Articles and Services CPI; 8. Residence CPI. The weight coefficient of each sub-CPI is shown in Table 1.

Table 1. Weight coefficients of the 8 sub-CPIs in the total index

Note: The index number stands for the corresponding type of sub-CPI mentioned before. Other indexes appearing in this paper in this form have the same meaning.

So the decomposition formula is presented as follows:

$$TI = \sum_{i=1}^{8} w_i I_i$$

where $TI$ is the total index, $I_i$ ($i = 1, 2, \ldots, 8$) are the eight sub-CPIs, and $w_i$ is the weight coefficient of the $i$-th sub-CPI from Table 1.
To verify the formula, we substitute the historical numeric CPI and sub-CPI values obtained in Step 1 into the formula and find that the formula is accurate.

Step 3: The construction of the GM(1,1) model, the ARIMA(p, d, q) model and the BPNN model. The three models are established to predict the eight sub-CPIs respectively.

Step 4: Forecasting the eight sub-CPIs using the three models mentioned in Step 3 and choosing the best forecasting result for each sub-CPI based on the errors of the data obtained from the three models.

Step 5: Making special improvements by adjusting the forecasting methods of sub-CPIs whose predictions are not satisfactory, to obtain advanced predictions of the total CPI.

Step 6: Integrating the best forecasting results of the 8 sub-CPIs to form the prediction of the total CPI with the decomposition formula in Step 2.

In this way, the whole process of prediction by the dividing-integration model is accomplished.

3.2. The construction of the GM(1,1) model

The process of the GM(1,1) model is represented in the following steps:

Step 1: The original sequence (written here in the standard GM(1,1) notation):

$$x^{(0)} = \left(x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right)$$

Step 2: Estimate the parameters $a$ and $u$ using ordinary least squares (OLS).

Step 3: Solve the equation, which in the standard GM(1,1) model gives the time response:

$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{u}{a}\right)e^{-ak} + \frac{u}{a}$$

Step 4: Test the model using the variance ratio and the small error probability.

The construction of the ARIMA model

First, the ADF unit root test is used to test the stationarity of the time series. If the initial time series is not stationary, a differencing transformation of the data is necessary to make it stationary. Then the values of p and q are determined by observing the autocorrelation graph, the partial correlation graph and the R-squared value. After the model is built, an additional check should be made through hypothesis testing to guarantee that the residual error is white noise. Finally, the model is used to forecast the future trend of the variable.

The construction of the BPNN model

The first thing is to decide the basic structure of the BP neural network.
After experiments, we consider 3 input nodes and 1 output node to be the best for the BPNN model. This means we use the CPI data of three consecutive time points to forecast the CPI of the next time point.

The number of hidden layers and the number of hidden neurons should also be defined. Since single-hidden-layer BPNNs are very good at non-linear mapping, this structure is adopted in this paper. Based on the Kolmogorov theorem and testing results, we take 5 to be the best number of hidden neurons. Thus the 3-5-1 BPNN structure is determined.

As for the transfer function and training algorithm, we select 'tansig' as the transfer function for the middle layer, 'logsig' for the input layer and 'traingd' as the training algorithm. The selection is based on the actual performance of these functions, as there are no existing standards for deciding which ones are definitely better than others.

Finally, we set the number of training iterations to 35000 and the goal, or acceptable error, to 0.01.

4. Empirical Analysis

CPI data from Jan. 2012 to Mar. 2013 are used to build the three models and the data from Apr. 2013 to Sept. 2013 are used to test the accuracy and stability of these models. The MAPE (mean absolute percentage error) is adopted to evaluate the performance of the models. In its standard form, the MAPE is calculated by the equation:

$$MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{x_t - \hat{x}_t}{x_t}\right|$$

where $x_t$ is the actual value and $\hat{x}_t$ the forecast value at time $t$.

Data source

An appropriate empirical analysis based on the above discussion can be performed using suitably disaggregated data. We collect the monthly data of the sub-CPIs from the website of the National Bureau of Statistics of China.

Experimental results

We use MATLAB to build the GM(1,1) model and the BPNN model, and Eviews 6.0 to build the ARIMA model.
The relative prediction errors of the sub-CPIs are shown in Table 2.

Table 2. Error of Sub-CPIs of the 3 Models

From the table above, we find that the performance of the different models varies a lot, because the characteristics of the sub-CPIs are different. Some sub-CPIs, like the Food CPI, change drastically with time, while others, like the Clothing CPI, do not fluctuate much. We use different models to predict the sub-CPIs and combine them by equation 7:

$$Y = \sum_{i=1}^{8} w_i \hat{y}_i$$

where $Y$ refers to the predicted rate of the total CPI, $w_i$ is the weight of the $i$-th sub-CPI, which has already been shown in Table 1, and $\hat{y}_i$ is the predicted value of the $i$-th sub-CPI that has the minimum error among the three models mentioned above. The models chosen are shown in Table 3:

Table 3. The model used to forecast

After calculation, the error of the total CPI forecast by the dividing-integration model is 0.0034.

5. Model Improvement & Error Adjustment

As we can see from Table 3, the prediction errors of the sub-CPIs are mostly below 0.004 except for two sub-CPIs: the Food CPI, whose error reaches 0.0059, and the Transportation & Communication CPI, whose error is 0.0047.

To further improve our forecasting results, we reduce the prediction errors of these two sub-CPIs by adopting other forecasting methods or models to predict them. The specific methods are as follows.

Error adjustment of food CPI

In the previous prediction, we predicted the Food CPI using the BPNN model directly. However, the BPNN model is not sensitive enough to capture the variation in the values of the data. For instance, although the Food CPI varies a lot from month to month, its forecast values are nearly all around 103.5, which fails to give a meaningful prediction.

We ascribe this problem to the nature of the training data. As we can see from the original sub-CPI data on the website of the National Bureau of Statistics of China, nearly all values of the sub-CPIs are around 100.
As for the Food CPI, although it does have larger absolute variations than the others, its changes are still very small relative to the large magnitude of the data (100). Thus it is more difficult for the BPNN model to detect the rules of variation in the training data, and the forecasting results are marred.

Therefore, we use the first-order difference series of the Food CPI instead of the original series to magnify the relative variation of the series forecast by the BPNN. The training data and testing data are the same as in the previous prediction. The parameters and functions of the BPNN are decided automatically by the software, SPSS.

We make 100 tests and find that the average forecasting error of the Food CPI by this method is 0.0028. Part of the forecasting errors in our tests is shown in Table 4:

Table 4. The forecasting errors in BPNN test

Error adjustment of transportation & communication CPI

We use the Moving Average (MA) model to make a new prediction of the Transportation and Communication CPI because the curve of the series is quite smooth, with only a few fluctuations. We have the following equation(s):

[equation not preserved in the source]

where X1, X2, …, Xn is the time series of the Transportation and Communication CPI, the second symbol is the value of the moving average at time t, and the third is a free parameter which should be decided through experiment.

To get the optimal model, we vary the value of the free parameter from 0 to 1.
Finally, we find that when the value of a is 0.95, the forecasting error is the smallest, which is 0.0039.

The prediction outcomes are shown in Table 5:

Table 5. The Predicting Outcomes of MA model

Advanced results after adjustment to the models

After making some adjustments to our previous model, we obtain the advanced results shown in Table 6:

Table 6. The model used to forecast and the Relative Error

After calculation, the error of the total CPI forecast by the dividing-integration model is 0.2359.

6. Further Discussion

To validate the dividing-integration model proposed in this paper, we compare the results of our model with the forecasting results of models that do not adopt the dividing-integration method. For instance, we use the ARIMA model, the GM(1,1) model, the SARIMA model, the RBF neural network (RBFNN) model, the Verhulst model and the Vector Autoregression (VAR) model respectively to forecast the total CPI directly, without the process of decomposition and integration. The forecasting results are shown in Table 7.

From Table 7, we conclude that the introduction of the dividing-integration method enhances the accuracy of prediction to a great extent. The results of the model comparison indicate that the proposed method is not only novel but also valid and effective.

The strengths of the proposed forecasting model are obvious. Every sub-CPI time series has different fluctuation characteristics. Some, such as the Food CPI, are relatively volatile and fluctuate sharply, while others, such as the Clothing CPI, are relatively gentle and quiet. As a result, by dividing the total CPI into several sub-CPIs, we are able to make use of the characteristics of each sub-CPI series and choose the best forecasting model among several models for every sub-CPI's prediction.
Moreover, the overall prediction error is given by the following formula:

$$TE = \sum_{i=1}^{8} w_i e_i$$

where $TE$ refers to the overall prediction error of the total CPI, $w_i$ is the weight of the $i$-th sub-CPI shown in Table 1 and $e_i$ is the forecasting error of the corresponding sub-CPI.

In conclusion, the dividing-integration model aims at minimizing the overall prediction error by minimizing the forecasting errors of the sub-CPIs.

7. Conclusions and future work

This paper transforms the forecasting of the national CPI into the forecasting of 8 sub-CPIs. In the prediction of the 8 sub-CPIs, we adopt three widely used models: the GM(1,1) model, the ARIMA model and the BPNN model. Thus we can obtain the best forecasting results for each sub-CPI. Furthermore, we make special improvements by adjusting the forecasting methods of sub-CPIs whose predictions are not satisfactory, and obtain advanced predictions for them. Finally, the advanced predictions of the 8 sub-CPIs are integrated to form the forecast of the total CPI.

The proposed method also has several weaknesses and needs improving. First, the proposed model only uses the information of the CPI time series itself. If the model could make use of other information, such as that provided by factors with a great impact on the fluctuation of the sub-CPIs, we have every reason to believe that its accuracy and stability could be enhanced. For instance, the price of pork is a major factor in shaping the Food CPI. If this factor were taken into consideration in the prediction of the Food CPI, the forecasting results would probably improve to a great extent. Second, since these models forecast the future by looking at the past, they are not able to sense sudden or recent changes in the environment. So if the model could take web news or quick public reactions into account, it would react much faster to sudden incidents and affairs. Finally, the performance of the sub-CPI predictions can be higher.
In this paper we use GM(1,1), ARIMA and BPNN to forecast the sub-CPIs. Some new methods for prediction can be used. For instance, besides BPNN, there are other neural networks, such as the genetic algorithm neural network (GANN) and the wavelet neural network (WNN), which might perform better in the prediction of the sub-CPIs. Other methods such as the VAR model and the SARIMA model should also be taken into consideration so as to enhance the accuracy of prediction.

References

1. Wang W, Wang T, and Shi Y. Factor analysis on consumer price index rising in China from 2005 to 2008. Management and service science 2009; p. 1-4.
2. Qin F, Ma T, and Wang J. The CPI forecast based on GA-SVM. Information networking and automation 2010; p. 142-147.
3. George EPB, Gwilym MJ, and Gregory CR. Time series analysis: forecasting and control. 4th ed. Canada: Wiley; 2008.
4. Weng D. The consumer price index forecast based on ARIMA model. WASE International conference on information engineering 2010; p. 307-310.
5. Jian L, Zhao Y, Zhu YP, Zhang MB, Bertolatti D. An application of ARIMA model to predict submicron particle concentrations from meteorological factors at a busy roadside in Hangzhou, China. Science of total enviroment 2012;426:336-345.
6. Priya N, Ashoke B, Sumana S, Kamna S. Trend analysis and ARIMA modelling of pre-monsoon rainfall data for western India. Comptes rendus geoscience 2013;345:22-27.
7. Hwang HB. Insights into neural-network forecasting of time series corresponding to ARMA(p; q) structures. Omega 2001;29:273-289.
8. [author names garbled in the source] Using a neural network to forecast inflation. Industrial management & data systems 1999;7:296-301.
9. Min X, Wong WK. A seasonal discrete grey forecasting model for fashion retailing. Knowledge based systems 2014;57:119-126.
11. Weimin M, Xiaoxi Z, Miaomiao W. Forecasting iron ore import and consumption of China using grey model optimized by particle swarm optimization algorithm. Resources policy 2013;38:613-620.
12. Zhen D, and Feng S. A novel DGM (1, 1) model for consumer price index forecasting. Grey systems and intelligent services (GSIS) 2009; p. 303-307.
13. Yu W, and Xu D. Prediction and analysis of Chinese CPI based on RBF neural network. Information technology and applications 2009;3:530-533.
14. Zhang GP. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003;50:159-175.
15. Pai PF, Lin CS. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 2005;33(6):497-505.
16. Tseng FM, Yu HC, Tzeng GH. Combining neural network model with seasonal time series ARIMA model. Technological forecasting and social change 2002;69(1):71-87.
17. Cho MY, Hwang JC, Chen CS. Customer short term load forecasting by using ARIMA transfer function model. Energy management and power delivery, proceedings of EMPD'95. 1995 international conference on IEEE, 1995;1:317-322.

Translation: A Novel Divide-and-Conquer Model for CPI (Consumer Price Index) Prediction Based on ARIMA, the Gray Model and BPNN

Abstract: Using China's existing method of compiling the Consumer Price Index (CPI), this paper proposes a new divide-and-conquer model for CPI prediction.
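The divide-and-conquer procedure of the paper above (fit several models per sub-index, keep the one with the lowest error, then combine the chosen forecasts with the sub-index weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB or Eviews code: only a textbook GM(1,1) is implemented, a naive "repeat the last value" forecast stands in for the other models, and every function name and number is hypothetical.

```python
import math


def gm11_fit(series):
    """Estimate the GM(1,1) parameters (a, u) by ordinary least squares."""
    cumulative, running = [], 0.0
    for value in series:
        running += value
        cumulative.append(running)
    # Background values: means of neighbouring accumulated values.
    z = [0.5 * (cumulative[k] + cumulative[k - 1]) for k in range(1, len(series))]
    y = list(series[1:])
    z_mean = sum(z) / len(z)
    y_mean = sum(y) / len(y)
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    slope = sxy / sxx          # regression y = slope * z + intercept
    a = -slope                 # because x0(k) = -a * z(k) + u
    u = y_mean - slope * z_mean
    return a, u


def gm11_forecast(series, steps):
    """Forecast the next `steps` values after the end of `series`."""
    a, u = gm11_fit(series)
    if abs(a) < 1e-12:         # degenerate case: no growth or decay
        return [u] * steps

    def accumulated(k):
        return (series[0] - u / a) * math.exp(-a * k) + u / a

    n = len(series)
    return [accumulated(k) - accumulated(k - 1) for k in range(n, n + steps)]


def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction (0.01 means 1%)."""
    return sum(abs((x - f) / x) for x, f in zip(actual, forecast)) / len(actual)


def best_forecast(candidates, actual):
    """Return (model name, forecasts) of the candidate with the lowest MAPE."""
    name = min(candidates, key=lambda m: mape(actual, candidates[m]))
    return name, candidates[name]


def integrate(weights, sub_forecasts):
    """Weighted sum of sub-index forecasts, as in the decomposition formula."""
    return sum(w * f for w, f in zip(weights, sub_forecasts))


if __name__ == "__main__":
    history = [100.0 * 1.05 ** k for k in range(8)]   # made-up sub-index
    train, test = history[:6], history[6:]
    candidates = {
        "gm11": gm11_forecast(train, len(test)),
        "naive": [train[-1]] * len(test),
    }
    print(best_forecast(candidates, test)[0])
```

With real data, the candidate dictionary would hold the forecasts of each fitted model for one sub-index, and `integrate` would be called once per forecast month with the eight chosen sub-index forecasts.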
附录1 英文原文Personalized recommendation of learning material using sequential pattern mining and attribute based collaborative filteringAbstractMaterial recommender system is a significant part of e-learning systems for personalization and recommendation of appropriate materials to learners. However, in the existing recommendation algorithms, dynamic interests and multi-preference of learners and multidimensional-attribute of materials are not fully considered simul-taneously. Moreover, these algorithms can not effectively use the learner’s historical sequential patterns of material accessing in recommendation. For addressing these problems and improving the accuracy and quality of recommendation, a new material recommender system framework based on sequential pattern mining and multidimensional attribute-based collaborative filtering (CF) is proposed. In the sequential pattern based approach, modified Apriori and PrefixSpan algorithms are implemented to discover latent patterns in accessing of materials and use them for recommendation. Leaner Preference Tree (LPT) is introduced to take into account multidimensional-attribute of materials, and learners’ rating and model dynamic and multi-preference of learners in the multidimensional attribute-based CF ap-proach. Finally, the recommendation results of two approaches are combined using cascade, weighted and mixed methods. The proposed method outperforms the previous algorithms on the classification accuracy measures and the learner’s real learning preference can be satisfied accurately according to the real-time up dated contextual information.Keywords: Personalized recommendation ;Apriori algorithm;. Learning material ;e-learning ; Dynamic preference ; Multi-attributeWith growth of many online learning systems, a huge amount of e-learning materials have been generated which are highly heterogeneous and in various media formats (Chen et al. 2012). 
Therefore, in this situation, it is quite difficult to find suitable learning materials based on a learner's preference. The task of delivering personalized learning material is often framed as a recommendation task in which a system recommends items to an active user (Mobasher 2007). Recommender systems have therefore been used in e-learning environments to recommend useful materials to users. These systems address information overload and create a personal learning environment (PLE) for users. The motivation for any recommender system is to ensure an efficient use of available materials. Using this approach, we can improve a personal learning path according to pedagogical issues and available material.
In recent years, recommender systems have been deployed in more and more e-commerce entities to best express and accommodate customers' interests. According to the strategies applied, they can be divided into three major categories: content-based, collaborative, and hybrid recommendation (Adomavicius and Tuzhilin 2005). Content-based recommendation is derived from information retrieval. A content-based recommendation algorithm identifies and extracts features of items and users and then builds a matching model for them. Recommendations are made by comparing the user's preferences with the items' features. The main idea of collaborative filtering, on the other hand, is to group like-minded users together. These systems are also called clique-based systems. It is assumed that users who made similar choices before will make the same selections in the future. Collaborative recommender systems give users suggestions by observing the user's neighbors.
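The neighbor-based idea behind collaborative filtering can be illustrated with a minimal Python sketch. It is not the method of any system cited here: the learner names, material ids, ratings, and the helper functions `pearson` and `recommend` are all hypothetical, and Pearson correlation is assumed as the similarity measure only because it is a common choice.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the materials both learners have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to estimate a correlation
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # constant ratings carry no correlation information
    return num / (den_a * den_b)


def recommend(ratings, active, n_neighbors=2, top_k=3):
    """Rank materials the active learner has not rated by the
    similarity-weighted average rating of the most similar learners."""
    sims = [(pearson(ratings[active], r), u)
            for u, r in ratings.items() if u != active]
    # Keep only positively correlated learners, most similar first.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:n_neighbors]
    scores, weights = {}, {}
    for sim, user in neighbors:
        for item, rating in ratings[user].items():
            if item in ratings[active]:
                continue  # already known to the active learner
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_k]]


# Hypothetical rating data: learner -> {material id: rating on a 1-5 scale}.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m3": 2, "m5": 4},
}

print(recommend(ratings, "ann"))
```

In this toy data only "bob" correlates positively with "ann", so only materials rated by "bob" can be suggested. A real system would also have to handle sparse ratings, which this sketch ignores.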
Hybrid recommendation mechanisms attempt to overcome the limitations and drawbacks of the pure content-based and pure collaborative approaches by combining the two.
There are several drawbacks when applying existing recommendation algorithms directly to e-learning environments:
- Since the learning process is repeatable and periodic, there are some intrinsic orders of learning material in users' learning processes that reveal material access patterns. This information can reflect the learner's latent preference, but most existing recommendation systems do not use it. To implement sequential pattern based recommendation, new algorithms are presented in this research.
- Some traditional recommendation algorithms only use learners' ratings for recommendation and do not consider the attributes of learners and learning materials. To model the multi-preference of learners, this research takes into account the multidimensional attributes of materials and the learners' rating matrix in a unified model.
- Learners' preferences change dynamically. Therefore, to make good recommendations in time when learners' current interests are changing, a recommendation algorithm must trace learner behaviour to propose dynamic recommendations. Thus, this research implements a dynamic approach for producing recommendations in the multidimensional attribute-based CF.
To address these drawbacks, this paper proposes a new material recommender system framework and relevant recommendation algorithms for e-learning environments. First, in the multidimensional attribute-based CF recommendation approach, to reflect the learner's complete spectrum of interests, the Learner Preference Tree (LPT) is introduced to consider the multidimensional attributes of materials and the learner's ratings simultaneously. The Learner Preference Tree is built from the target learner's historical access records and the multidimensional attributes of materials.
Then, a new similarity measure is introduced that takes the information in the LPTs into account when calculating the similarity between learners. In the sequential pattern based recommendation approach, weighted association rules (Apriori algorithm) and the PrefixSpan algorithm are implemented to discover the latent patterns of accessed materials and give recommendations. The results of the two approaches are combined to create the final recommendations.
The rest of this paper begins with the literature survey. In the Literature survey section, previous related work on e-learning material recommender systems is discussed.
Learning materials have grown, both offline and online, in educational organizations, so it is difficult for learners to discover the most appropriate materials with keyword searching methods. The creation of technology for personalized lifelong learning has been recognized as a Grand Challenge Problem by peak research bodies (Kay 2008). Therefore, recommender systems have been used in e-learning environments to recommend useful materials to users. The first recommender system was developed in the mid-1990s (Felfernig et al. 2007). Many recommendation systems have been developed in fields such as movies, music, news, commerce and medicine, but few in education (Drachsler et al. 2007). An overview of the recommendation strategies and techniques, with their usefulness for material recommendation, is presented in Table 1. We briefly survey some important works and explain their drawbacks that can be addressed by our proposed approach.
Content based filtering. This technique suggests items similar to those each user liked in the past, based on an analysis of the content of the objects the user has evaluated (Lops et al. 2011). As an example of an e-learning application, Khribi et al.
(2009) used learners' recent navigation histories and the similarities and dissimilarities among the contents of the learning materials for online automatic recommendations. Clustering was proposed by Hammouda and Kamel (2006) to group learning documents based on their topics and similarities. In fact, the existing metrics in content based filtering only detect similarity between items that share the same attributes. Indeed, the basic process performed by a content-based recommender consists of matching the attributes of a user profile, in which preferences and interests are stored, with the attributes of a content object (item), in order to recommend new interesting items to the user (Lops et al. 2011). This causes overspecialized recommendations that only include items very similar to those the user already knows. To avoid the overspecialization of content-based methods, researchers proposed new personalization strategies, such as collaborative filtering and hybrid approaches mixing both techniques.
Collaborative filtering. The majority of researchers have used collaborative filtering based recommendation systems (Milicevic et al. 2010; Bobadilla et al. 2010). CF approaches used in e-learning environments focus on the correlations among users having similar interests (Marlin 2004; Sergio et al. 2005) and can be divided into three categories, shown in Table 1. The collaborative e-learning field is growing strongly (Tan et al. 2008; García et al. 2009; García et al. 2011; Wang and Liao 2011), making this area an important receiver of applications and generating numerous research papers. Collaborative filtering was used by Soonthornphisaj et al. (2006) to predict the most suitable materials for the learner. First, the weight between each user and the active learner is calculated by Pearson correlation. Then, the n users with the highest similarity to the active learner are selected as the neighborhood.
Finally, using the weight combination obtained from the neighborhood, the rating prediction is calculated. Bobadilla et al. (2009) used a new equation to incorporate the learners' scores obtained from a test into the collaborative filtering calculations for material prediction. Their experiment showed that the method obtained high item-prediction accuracy.
Since learning materials in the e-learning environment come in a variety of multimedia formats, including text, hypertext, image, video, audio and slides, it is difficult to calculate the content similarity of two items (Chen et al. 2012). In this sense, users' preference information is a good indication for recommendation. Therefore, CF is more suitable in e-learning systems, since it is completely independent of the intrinsic properties of the items being rated or recommended (Yu et al. 2011).
Regardless of its success in many application domains, collaborative filtering has two serious drawbacks. First, its applicability and quality are limited by the so-called sparsity problem, which occurs when the available data are insufficient for identifying similar users (Cotter and Smyth 2000). Much research has therefore been carried out to alleviate the sparsity problem using data mining techniques. For example, Romero et al. (2009) developed a specific Web mining tool for discovering suitable rules in a recommender engine. Their objective was to recommend to a student the most appropriate links/Web pages to visit next. Second, it requires knowing many user profiles in order to produce accurate recommendations for a given user. Therefore, in e-learning environments where the number of learners is low, the recommendation results are not sufficiently accurate.
Hybrids. To overcome the drawbacks of these strategies, researchers have used hybrid approaches for material recommendation. Combining several recommendation strategies can be expected to provide better results than either strategy alone.
As an example in the e-learning environment, Liang et al.
(2006) implemented a combination of content-based filtering and collaborative filtering to make personalized recommendations for a courseware selection module. The algorithm starts with user u entering some keywords on the portal of the courseware management system. Next, the courseware recommendation module finds, within the same user interest group as user u, the k courseware items with the same or similar keywords that others chose. García et al. (2009) applied association rule mining to discover interesting information in students' usage data in the form of IF-THEN recommendation rules, and then used a collaborative recommender system to share and score the recommendation rules obtained by teachers with similar profiles along with other experts in education.
An appropriate recommendation technique must be chosen according to pedagogical reasons. These pedagogical reasons are derived from specific demands of lifelong learning (Drachsler et al. 2007). One way to implement pedagogical decisions in a recommender system is to use a variety of recommendation techniques in a recommendation strategy. The decision to change from one recommendation technique to another can be made according to pedagogical reasons, derived from specific demands of lifelong learning (Drachsler et al. 2008). This paper uses two recommendation techniques based on explicit and implicit attributes of learners and materials.
The first technique integrates the multidimensional attributes of materials and the learner's rating information using the proposed learner preference tree. Our proposed framework can use this information simultaneously to model the adaptive multi-preference of the learner. Owing to this property, the system can improve the accuracy and diversity of recommendation. The second technique integrates information about the latent sequential patterns of materials accessed by learners.
Using this information and applying sequential pattern mining algorithms helps us to filter items according to common learning sequences.
In summary, in order to improve the efficiency of learning material recommendation, it is necessary to develop a framework that integrates the multidimensional attributes of materials, the learner's rating information and the latent patterns of material access. Most research uses only some of this information in the material recommendation process.
Appendix 2: Chinese original text
Personalized recommendation using sequential pattern trees and content-based filtering
Abstract
Content-based recommendation is a significant part of e-learning systems for the personalized recommendation of suitable materials to students.