Cite this article: WANG Zi-qiang, FENG Bo-qin. Mining frequent pattern tree in Web data[J]. Control Theory & Applications, 2005, 22(3): 429-433.
Mining frequent pattern tree in Web data
Received: 2003-09-26  Revised: 2004-06-07
DOI  10.7641/j.issn.1000-8152.2005.3.017
  2005,22(3):429-433
Keywords  data mining  Web data  frequent pattern tree  ordered tree
Funding  Supported by the National High Technology Research and Development Program of China ("863" Program) (2003AA1Z2610).
Authors and affiliation
WANG Zi-qiang, FENG Bo-qin  Department of Computer Science, Xi'an Jiaotong University, Xi'an, Shaanxi 710049
Abstract
      To mine frequent pattern trees efficiently from semi-structured Web data, the data are modeled as a labeled ordered tree, and an algorithm is proposed that finds all frequent pattern trees in such a tree using rightmost-path expansion. The algorithm starts from pattern trees that consist of a single node and generates new pattern trees only by attaching a node to the rightmost path. It also maintains a list of the occurrences of the rightmost leaf, so that support can be computed incrementally. Theoretical analysis and experimental results show that the algorithm is feasible and that its running time scales linearly with the total size of the maximal frequent patterns.
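
The rightmost-path expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define matching or support, so the sketch assumes that a pattern matches when labels, parent-child links and sibling order are preserved, and that support is the number of distinct data-tree nodes onto which the pattern root can be mapped, compared against an absolute threshold. The names `DataTree`, `expand` and `mine` are hypothetical.

```python
from collections import defaultdict


class DataTree:
    """Labeled ordered tree; nodes are numbered in preorder."""

    def __init__(self, nested):
        # nested = (label, [child, child, ...]) with children in document order
        self.label, self.parent, self.children = [], [], []
        self._add(nested, -1)

    def _add(self, node, parent):
        label, kids = node
        idx = len(self.label)
        self.label.append(label)
        self.parent.append(parent)
        self.children.append([])
        if parent >= 0:
            self.children[parent].append(idx)
        for kid in kids:
            self._add(kid, idx)

    def ancestor(self, v, k):
        """Return the k-th ancestor of node v (k = 0 gives v itself)."""
        for _ in range(k):
            v = self.parent[v]
        return v


def expand(tree, depth, rmo):
    """Attach one node to the rightmost path of a pattern.

    depth is the depth of the pattern's rightmost leaf and rmo is the set of
    data nodes that leaf can be mapped to. Returns a dict mapping
    (new_depth, label) to the rightmost-leaf occurrences of the new pattern.
    """
    candidates = defaultdict(set)
    for x in rmo:
        # New node as a child of the rightmost leaf.
        for y in tree.children[x]:
            candidates[(depth + 1, tree.label[y])].add(y)
        # New node as a younger sibling of a node on the rightmost path.
        node = x
        for d in range(depth, 0, -1):
            par = tree.parent[node]
            for y in tree.children[par]:
                if y > node:  # preorder numbering: larger index = younger sibling
                    candidates[(d, tree.label[y])].add(y)
            node = par
    return candidates


def mine(tree, min_support):
    """Return {pattern: support}; a pattern is a preorder tuple of (depth, label)."""
    frequent = {}
    seeds = defaultdict(set)
    for v, lab in enumerate(tree.label):
        seeds[((0, lab),)].add(v)
    stack = list(seeds.items())
    while stack:
        pattern, rmo = stack.pop()
        depth = pattern[-1][0]
        # Assumed support: distinct nodes the pattern root is mapped to.
        support = len({tree.ancestor(x, depth) for x in rmo})
        if support < min_support:
            continue  # infrequent patterns are not expanded further
        frequent[pattern] = support
        for node, occ in expand(tree, depth, rmo).items():
            stack.append((pattern + (node,), occ))
    return frequent


# Toy data tree: r(a(b, c), a(b, c))
demo = DataTree(("r", [("a", [("b", []), ("c", [])]),
                       ("a", [("b", []), ("c", [])])]))
frequent = mine(demo, min_support=2)
```

Because every pattern has exactly one predecessor (the pattern without its rightmost leaf), each candidate is generated once, and only the rightmost-leaf occurrences need to be carried from one pattern to the next. The sketch stores them in a set for simplicity rather than in an ordered list.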