Incremental Frequent Itemset Mining Based on the FIUT Structure
A Sliding-Window-Based Algorithm for Mining Frequent Itemsets over Data Streams
Kou Xiangxia; Ren Yonggong; Song Kuiyong
[Journal] Computer Applications and Software, 2013, 30(1), pp. 143-146 (4 pages)
[Abstract] Because data streams flow continuously, the knowledge they contain changes over time. Mining frequent itemsets from data streams is therefore both significant and challenging. This paper proposes FIUT-Stream, an algorithm for mining frequent itemsets over a sliding window of a data stream. FIUT-Stream processes the stream block by block, keeps a summary structure of the sliding-window data in memory, updates that structure dynamically as the window slides, and mines the frequent itemsets with the FIUT algorithm. Experiments show that the algorithm saves memory and obtains the frequent itemsets exactly.
[Affiliation] School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning 116029 (all three authors)
[Language] Chinese
[CLC number] TP301
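The abstract describes processing the stream in blocks, keeping an in-memory summary of the sliding window, and updating that summary as the window slides. The sketch below illustrates only that window bookkeeping, under assumptions of our own: the hypothetical class `SlidingWindowMiner` keeps one exact itemset counter per block (itemsets of up to a hypothetical `max_len` items), and mining simply filters the aggregated counts. FIUT-Stream's actual summary structure and its use of FIUT for the mining step are not reproduced here.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Hypothetical block-based sliding window with exact itemset counts.

    Each block is summarized by a Counter of every itemset (up to max_len
    items) contained in its transactions; the window sums those Counters.
    """

    def __init__(self, window_blocks, max_len=3):
        self.window_blocks = window_blocks  # number of blocks kept in the window
        self.max_len = max_len              # longest itemset that is tracked
        self.blocks = deque()               # (block summary, transaction count)
        self.total = Counter()              # aggregated counts over the window
        self.n_tx = 0                       # transactions currently in the window

    def _summarize(self, block):
        counts = Counter()
        for tx in block:
            items = sorted(set(tx))
            for k in range(1, min(self.max_len, len(items)) + 1):
                counts.update(combinations(items, k))
        return counts

    def add_block(self, block):
        summary = self._summarize(block)
        self.blocks.append((summary, len(block)))
        self.total.update(summary)
        self.n_tx += len(block)
        if len(self.blocks) > self.window_blocks:
            # The window slides: subtract the oldest block instead of rescanning.
            old_summary, old_n = self.blocks.popleft()
            self.total.subtract(old_summary)
            self.n_tx -= old_n

    def frequent_itemsets(self, min_sup):
        """Itemsets whose relative support in the current window is >= min_sup."""
        threshold = min_sup * self.n_tx
        return {s: c for s, c in self.total.items() if c > 0 and c >= threshold}
```

Evicting a block by subtraction keeps the window's supports exact without revisiting old data; the cost is memory proportional to the number of subsets tracked, which is why `max_len` is bounded in this sketch.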
Study on Efficient Algorithm of Frequent Item-set Mining
LIU Zhi-yi, CHANG Rui (Changzhou Institute of Technology)
Abstract: To further improve the scalability of frequent itemset mining, this paper studies the search space of frequent itemsets and the operations performed on the FP-tree. On this basis it proposes FPL-Growth, an efficient frequent itemset mining algorithm based on a frequent-pattern list. FPL-Growth narrows the search space by constructing candidate itemsets incrementally and applying the Apriori property, and it obtains the support counts of frequent itemsets quickly by intersection counting. Experiments confirm the algorithm's effectiveness.
Key words: frequent-pattern list; frequent itemset; data mining
CLC number: TP311  Document code: A  Article ID: 1008-0570(2012)10-0491-03
Introduction
Since Agrawal proposed the Apriori algorithm in 1994, frequent itemset mining algorithms have attracted considerable attention.
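The abstract names three ingredients: candidates built up one item at a time, pruning by the Apriori property, and support counts obtained by intersection. A minimal sketch, assuming a vertical layout in which each item maps to the set of IDs of the transactions that contain it. This is essentially an Eclat-style approach, swapped in because the paper's frequent-pattern list is not described in this excerpt; the function name `mine_by_intersection` is hypothetical.

```python
from itertools import combinations


def mine_by_intersection(transactions, min_count):
    """Level-wise mining with TID-set intersection (Eclat-style sketch).

    Assumption: the 'frequent-pattern list' is modelled as a sorted list of
    (itemset, tidset) pairs, not FPL-Growth's actual structure.
    Items must be mutually comparable (e.g. all strings).
    """
    # Build the vertical layout: item -> set of transaction ids.
    tidsets = {}
    for tid, tx in enumerate(transactions):
        for item in set(tx):
            tidsets.setdefault(item, set()).add(tid)

    # Frequent 1-itemsets, sorted so candidates grow in a fixed item order.
    level = sorted(
        (((item,), tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    result = {}
    while level:
        for itemset, tids in level:
            result[itemset] = len(tids)
        next_level = []
        for i, (x, x_tids) in enumerate(level):
            for y, y_tids in level[i + 1:]:
                if x[:-1] != y[:-1]:
                    continue  # join only itemsets sharing a (k-1)-item prefix
                cand = x + (y[-1],)  # incremental construction: add one item
                # Apriori property: every (k-1)-subset must already be frequent.
                if not all(s in result for s in combinations(cand, len(cand) - 1)):
                    continue
                tids = x_tids & y_tids  # support count by intersection
                if len(tids) >= min_count:
                    next_level.append((cand, tids))
        level = sorted(next_level, key=lambda pair: pair[0])
    return result
```

For example, with transactions `[["a","b","c"], ["a","b"], ["a","c"], ["b","c"], ["a","b","c"]]` and `min_count=3`, every single item and every pair is frequent, while `("a","b","c")` (support 2) is rejected after one intersection.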
An Optimized Frequent Itemset Mining Algorithm Based on Set Operations
LOU Lanfang, PAN Qingxian (School of Computer Science and Technology, Yantai University, Yantai, Shandong 264000)
E-mail: loulanfang@pqx@
CLC number: TP311.132  Document code: A
Abstract: Mining association rules is an important topic in data mining, and generating frequent (large) itemsets is a key step in it. This paper presents a data mining algorithm based on set operations and compares it with classical algorithms. The algorithm needs to scan the database only once. Experimental results indicate that it performs efficiently.
Keywords: data mining; association rules; frequent itemsets
1. Introduction
Data mining is also known as knowledge discovery in databases.
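The abstract's central claims are that supports come from set operations and that the database is scanned only once. A hedged illustration, assuming each item is encoded as a bitmap (one Python int per item, with bit t set when the item occurs in transaction t) so that set intersection becomes a bitwise AND. The helper names are hypothetical, and the paper's own set representation is not given in this excerpt.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """The single database scan: item -> int bitmap (bit t set if item in transaction t)."""
    bitmaps = {}
    for tid, tx in enumerate(transactions):
        for item in set(tx):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def popcount(x):
    return bin(x).count("1")


def support(itemset, bitmaps):
    """Support of a non-empty itemset: popcount of the AND (intersection) of its bitmaps."""
    mask = bitmaps.get(itemset[0], 0)
    for item in itemset[1:]:
        mask &= bitmaps.get(item, 0)
    return popcount(mask)


def frequent_itemsets(transactions, min_count, max_len=3):
    bitmaps = build_bitmaps(transactions)  # no further passes over `transactions`
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_count)
    result = {}
    for k in range(1, max_len + 1):
        level = {}
        for cand in combinations(items, k):
            s = support(cand, bitmaps)
            if s >= min_count:
                level[cand] = s
        if not level:
            break  # no frequent k-itemsets implies no frequent (k+1)-itemsets
        result.update(level)
    return result
```

Because `frequent_itemsets` enumerates combinations of frequent items directly, it is only practical for small item counts; it shows the one-scan, intersect-and-count idea rather than an efficient search strategy.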
An Effective Algorithm for Mining Probabilistic Frequent Itemsets from Uncertain Data
Authors: Liu Haoran, Liu Fang'ai, Li Xu, Wang Jiwei
Source: Journal of Computer Applications, 2015, No. 06
Abstract: Existing algorithms for mining probabilistic frequent itemsets build their trees by pattern growth, which creates a large number of tree nodes, occupies a lot of memory, and makes discovering probabilistic frequent itemsets inefficient. To address these problems, an improved uncertain frequent pattern growth algorithm, PUFPGrowth (Progressive Uncertain Frequent Pattern Growth), is proposed. The algorithm reads the uncertain transaction database one transaction at a time and builds a compact tree structure similar to the Frequent Pattern Tree (FPTree); at the same time it updates the dynamic arrays in the header table, which store the expected values of all itemsets that share the same tail node. Once all transactions have been inserted into the Progressive Uncertain Frequent Pattern Tree (PUFPTree), all probabilistic frequent itemsets are obtained by traversing these arrays. Experimental results and theoretical analysis show that PUFPGrowth finds probabilistic frequent itemsets effectively. Compared with the Uncertain Frequent pattern Growth (UFGrowth) algorithm and the Compressed Uncertain Frequent Pattern Mine (CUFPMine) algorithm, PUFPGrowth mines probabilistic frequent itemsets from uncertain data more efficiently and uses less memory.
Key words: data mining; uncertain data; possible world model; probabilistic frequent itemset; frequent pattern
CLC number: TP301.6  Document code: A
0 Introduction
With the rapid development of network technology, real-world network applications generate a great deal of uncertain data, for example data collected by sensors [1], geographic locations obtained through the Global Positioning System (GPS) [2], and product-browsing records on online stores.
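Under the possible world model, each item in a transaction carries an existential probability. Assuming items and transactions are independent (an assumption of this sketch, not stated in the excerpt), the expected support of an itemset X is esup(X) = Σ_t Π_{x∈X} p_t(x), and the probability that X's support reaches a threshold follows a Poisson-binomial distribution computable by dynamic programming. The function names are hypothetical; the excerpt only says that PUFPGrowth accumulates expected values in its header-table arrays, and it does not state which frequentness criterion is applied, so both quantities are shown.

```python
from math import prod


def itemset_probs(itemset, db):
    """Per-transaction probability that every item of `itemset` occurs.

    db is a list of dicts {item: existential probability}; items within a
    transaction are assumed independent (sketch assumption).
    """
    return [prod(tx.get(i, 0.0) for i in itemset) for tx in db]


def expected_support(itemset, db):
    """esup(X) = sum over transactions of the product of item probabilities."""
    return sum(itemset_probs(itemset, db))


def frequentness_probability(itemset, db, min_count):
    """P(support >= min_count), with transactions as independent Bernoulli trials."""
    dist = [1.0]  # dist[k] = P(support == k) over the transactions seen so far
    for p in itemset_probs(itemset, db):
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # itemset absent from this transaction
            new[k + 1] += q * p     # itemset present in this transaction
        dist = new
    return sum(dist[min_count:])
```

The dynamic program costs O(n²) for n transactions, whereas enumerating possible worlds directly is exponential; the expected support alone is a single linear pass, which is what makes it attractive to accumulate during tree construction.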
Incremental Frequent Itemset Mining Based on the FIUT Structure
Kou Xiangxia; Ren Yonggong; Song Kuiyong
[Journal] Computer Applications and Software, 2012, 29(7), pp. 105-108 (4 pages)
[Abstract] Incremental frequent itemset mining is a current research focus. The Pre-FUFP algorithm, which is based on FP-Growth, handles updates of frequent patterns effectively, but its mining process must traverse the FP-tree recursively, which makes it inefficient. This paper proposes the Pre-FIUT algorithm, which introduces the frequent items ultrametric tree (FIUT) structure to obtain frequent itemsets more efficiently. Built on the FIUT method, Pre-FIUT determines frequent itemsets by checking the support counts at the leaf nodes of the frequent items ultrametric trees, and it combines this with the concept of pre-large itemsets to mine frequent itemsets incrementally. Experimental results show that Pre-FIUT scans and updates data quickly, uses memory reasonably, and obtains the frequent itemsets exactly.
[Keywords] FIUT; data mining; frequent itemsets; pre-large itemsets; Pre-FIUT algorithm
[Affiliation] School of Computer and Information Technology, Liaoning Normal University, Dalian, Liaoning 116029 (all three authors)
[Language] Chinese
[CLC number] TP301
0 Introduction
Frequent itemset mining is the most basic and most important problem in association rule mining, and in recent years it has been a research focus in the data mining field.
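The abstract combines two ideas: reading supports off leaf counts, as FIUT does, and classifying itemsets as large or pre-large. A minimal sketch under stated assumptions: transactions pruned to their frequent items and counted per length k stand in for the leaves of the k-FIU-trees (the real FIUT builds and decomposes ultrametric trees, which is not reproduced here); supports are obtained by adding each leaf's count to all of its subsets; and the pre-large rules use an upper threshold S_u and a lower threshold S_l, with the safety bound f = floor((S_u - S_l) * d / (1 - S_u)) as commonly stated in the pre-large itemset literature (Hong et al.), not in this excerpt. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import floor


def build_leaves(transactions, min_count):
    """Simplified stand-in for FIUT's k-FIU-tree leaves (hypothetical helper).

    Each transaction is pruned to its frequent items; identical pruned
    transactions are counted and grouped by their length k.
    """
    item_counts = Counter(i for tx in transactions for i in set(tx))
    frequent = {i for i, c in item_counts.items() if c >= min_count}
    leaves = {}  # k -> Counter mapping a k-itemset to its leaf count
    for tx in transactions:
        pruned = tuple(sorted(set(tx) & frequent))
        if pruned:
            leaves.setdefault(len(pruned), Counter())[pruned] += 1
    return leaves


def supports_from_leaves(leaves):
    """Support of an itemset = sum of the counts of all leaves containing it."""
    support = Counter()
    for counter in leaves.values():
        for leaf, count in counter.items():
            for k in range(1, len(leaf) + 1):
                for sub in combinations(leaf, k):
                    support[sub] += count
    return support


def classify(support, n, s_upper, s_lower):
    """Pre-large rule: large if support/n >= S_u; pre-large if S_l <= support/n < S_u."""
    large, pre_large = {}, {}
    for itemset, count in support.items():
        ratio = count / n
        if ratio >= s_upper:
            large[itemset] = count
        elif ratio >= s_lower:
            pre_large[itemset] = count
    return large, pre_large


def safety_bound(n, s_upper, s_lower):
    """New transactions absorbable before the original n must be rescanned (assumed bound)."""
    return floor((s_upper - s_lower) * n / (1 - s_upper))
```

For pre-large maintenance, items would be pruned with the lower threshold (for example `min_count = ceil(s_lower * n)`) so that pre-large itemsets survive. Passing `fractions.Fraction` thresholds avoids floating-point rounding in the floor of `safety_bound`.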
A Fast Matrix-Based Algorithm for Mining Frequent Itemsets from Uncertain Data
Liu Zhiyi; Chang Rui
[Journal] Journal of Nanjing University of Science and Technology (Natural Science), 2015, vol. 34, no. 4, pp. 420-425 (6 pages)
[Abstract] In the CUF-growth algorithm, the estimated (upper-bound) expected support of an itemset is too large, and the mining process must repeatedly and recursively build conditional CUF-trees, which lowers mining efficiency. To address this, the UFIM-Matrix (Uncertain frequent itemset mining-matrix) algorithm is proposed. It does not build a tree; instead, it combines a new method for estimating the expected support of itemsets with a matrix structure to generate smaller candidate sets, which reduces computational cost to some extent and improves mining efficiency. Experimental results indicate that the new algorithm performs better.
[Keywords] uncertain data; frequent itemset; expected support; fast mining
[Affiliation] School of Computer Information Engineering, Changzhou Institute of Technology, Changzhou, Jiangsu 213002; Planning and Finance Office, Changzhou Institute of Technology, Changzhou, Jiangsu 213002
[Language] Chinese
[CLC number] TP311
As uncertain datasets continue to emerge, uncertain data mining has become a new research focus in the data mining field.
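The abstract contrasts tree-based CUF-growth with a matrix-based approach that estimates expected supports to shrink the candidate set. The sketch below is a stand-in under our own assumptions: transactions form a probability matrix (rows are transactions, columns are items), an item-by-item matrix of pairwise expected supports is computed once, and its smallest entry over a candidate's pairs serves as an upper bound on that candidate's expected support. The bound is valid because multiplying in more probabilities (each at most 1) never increases a product, but it is not the estimation method UFIM-Matrix proposes, which this excerpt does not describe. The function name `mine_uncertain` is hypothetical.

```python
from itertools import combinations
from math import prod


def mine_uncertain(prob, items, min_esup):
    """prob[t][j]: existential probability of items[j] in transaction t (0.0 if absent)."""
    m = len(items)

    def esup(cols):
        # Expected support: sum over rows of the product of the chosen columns.
        return sum(prod(row[j] for j in cols) for row in prob)

    # Item-by-item matrix: pair[a][b] = esup({a, b}); the diagonal holds single items.
    pair = [[esup((a,) if a == b else (a, b)) for b in range(m)] for a in range(m)]

    result = {}
    level = [(j,) for j in range(m) if pair[j][j] >= min_esup]
    for c in level:
        result[c] = pair[c[0]][c[0]]
    k = 1
    while level:
        k += 1
        alive = sorted({j for c in level for j in c})  # items in some frequent (k-1)-itemset
        next_level = []
        for cand in combinations(alive, k):
            # Cheap bound from the matrix: esup(X) <= esup({a, b}) for every pair in X.
            if min(pair[a][b] for a, b in combinations(cand, 2)) < min_esup:
                continue
            value = esup(cand)
            if value >= min_esup:
                result[cand] = value
                next_level.append(cand)
        level = next_level
    return {tuple(items[j] for j in c): v for c, v in result.items()}
```

Candidates that fail the pairwise bound are discarded without touching the transaction rows again, which is the kind of saving a tighter estimate is meant to enlarge.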