Mining frequent pattern tree in Web data

WANG Zi-qiang; FENG Bo-qin

Cite this article: WANG Zi-qiang, FENG Bo-qin. Mining frequent pattern tree in Web data[J]. Control Theory & Applications, 2005, 22(3): 429-433.

Received: 2003-09-26. Revised: 2004-06-07.

DOI: 10.7641/j.issn.1000-8152.2005.3.017

2005,22(3):429-433

Keywords: data mining; Web data; frequent pattern tree; ordered tree

Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2003AA1Z2610).

Authors	Affiliation
WANG Zi-qiang, FENG Bo-qin	Department of Computer Science, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China

Abstract

To mine frequent pattern trees efficiently from semi-structured Web data, the data are modeled as a labeled ordered tree, and an algorithm is proposed that finds all frequent pattern trees in an ordered data tree using the rightmost-path expansion technique. The algorithm starts from pattern trees that have a single node and generates new pattern trees only by attaching a new node to the rightmost path of an existing pattern. It also maintains a list of the occurrences of each pattern's rightmost leaf, so that support can be computed incrementally. Theoretical analysis and experimental results show that the algorithm is feasible, that its running time is linear in the total size of the maximal frequent patterns, and that it works efficiently in practice.
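The rightmost-path expansion idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the names `DataTree` and `mine` are hypothetical, a pattern is encoded as a preorder tuple of `(depth, label)` pairs, a match must preserve parent-child links and left-to-right sibling order, and support is counted as the number of distinct data-tree nodes that a pattern's root can be mapped to (the abstract does not define support).

```python
from collections import defaultdict


class DataTree:
    """Labeled ordered tree stored as parallel lists indexed by node id."""

    def __init__(self):
        self.label, self.parent, self.children = [], [], []

    def add(self, label, parent=None):
        node = len(self.label)
        self.label.append(label)
        self.parent.append(parent)
        self.children.append([])
        if parent is not None:
            self.children[parent].append(node)  # insertion order = sibling order
        return node

    def ancestor(self, node, k):
        """Return the k-th ancestor of node (k = 0 gives node itself)."""
        for _ in range(k):
            node = self.parent[node]
        return node


def mine(tree, min_support):
    """Map each frequent pattern to its support (assumed: root-occurrence count)."""
    frequent = {}
    by_label = defaultdict(list)
    for node, lab in enumerate(tree.label):
        by_label[lab].append(node)
    # Start from single-node patterns; each node is its own rightmost leaf.
    stack = [(((0, lab),), occ) for lab, occ in by_label.items()
             if len(occ) >= min_support]
    while stack:
        pattern, rmo = stack.pop()      # rmo: rightmost-leaf occurrences
        depth = pattern[-1][0]          # depth of the pattern's rightmost leaf
        frequent[pattern] = len({tree.ancestor(x, depth) for x in rmo})
        ext = defaultdict(dict)         # (new depth, label) -> ordered node set
        for x in rmo:
            for nd in range(1, depth + 2):
                if nd == depth + 1:     # new node is a child of the rightmost leaf
                    cands = tree.children[x]
                else:                   # new node is a younger sibling on the path
                    a = tree.ancestor(x, depth - nd)
                    sibs = tree.children[tree.parent[a]]
                    cands = sibs[sibs.index(a) + 1:]
                for y in cands:
                    ext[(nd, tree.label[y])][y] = None
        for (nd, lab), occ in ext.items():
            # The pattern root is the nd-th ancestor of the new rightmost leaf.
            roots = {tree.ancestor(y, nd) for y in occ}
            if len(roots) >= min_support:
                stack.append((pattern + ((nd, lab),), list(occ)))
    return frequent


if __name__ == "__main__":
    t = DataTree()
    root = t.add("R")
    for _ in range(2):                  # two identical subtrees A(B, C)
        a = t.add("A", root)
        t.add("B", a)
        t.add("C", a)
    for pattern, support in sorted(mine(t, 2).items()):
        print(pattern, support)
```

Only the occurrences of the rightmost leaf are carried from a pattern to its extensions; the images of the other rightmost-path nodes are recovered by walking up parent links, which is what lets support be updated step by step rather than by re-matching the whole pattern.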