A framework for efficient association rule mining in XML data

Ji Zhang, Han Liu, Tok Wang Ling, Robert M. Bruckner, A. Min Tjoa

Research output: Contribution to journal (Article)

9 Citations (Scopus)

Abstract

In this article, we propose a framework, called XAR-Miner, for efficiently mining association rules (ARs) from XML documents. In XAR-Miner, the raw data in an XML document is first preprocessed and transformed into either an Indexed XML Tree (IX-tree) or Multirelational Databases (Multi-DB), depending on the size of the document and the memory constraints of the system, to support efficient data selection and AR mining. Concepts relevant to the AR mining task are then generalized to produce generalized metapatterns. A metric is devised to measure the degree of concept generalization so that both undergeneralization and overgeneralization are avoided. The resulting generalized metapatterns are used to generate large ARs that meet the support and confidence levels. A greedy algorithm is also presented that integrates data selection with large itemset generation to make the AR mining process more efficient. The experiments show that, when a large number of AR mining tasks are performed on XML documents, XAR-Miner is more efficient than the state-of-the-art method of repeatedly scanning the XML documents for each mining task.
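
As a rough illustration of the support and confidence levels mentioned in the abstract, the sketch below mines two-item rules from a toy XML document. It is not XAR-Miner: it builds no IX-tree or Multi-DB, performs no concept generalization, and is closer to the baseline that rescans the data for each task. The document contents, the element names (`orders`, `order`, `item`), and the function names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical toy document; element names are illustrative only.
XML_DOC = """
<orders>
  <order><item>bread</item><item>milk</item></order>
  <order><item>bread</item><item>butter</item></order>
  <order><item>bread</item><item>milk</item><item>butter</item></order>
  <order><item>milk</item></order>
</orders>
"""


def load_transactions(xml_text):
    """Turn each <order> element into a set of item names."""
    root = ET.fromstring(xml_text.strip())
    return [
        frozenset(item.text for item in order.findall("item"))
        for order in root.findall("order")
    ]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs over item pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)), transactions)
        if pair_support < min_support:
            continue  # the pair is not a large itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


transactions = load_transactions(XML_DOC)
for lhs, rhs, s, c in mine_pair_rules(transactions, 0.5, 0.6):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Each call to `support` scans every transaction again, which is the kind of repeated work that the preprocessing and indexing described in the abstract are meant to avoid.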

Original language: English (US)
Pages (from-to): 19-40
Number of pages: 22
Journal: Journal of Database Management
Volume: 17
Issue number: 3
DOIs: https://doi.org/10.4018/jdm.2006070102
State: Published - Jan 1 2006

Fingerprint

Association rules
XML
Miners
Scanning
Data storage equipment
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • Association rule mining
  • Concept generalization
  • Data transformation and indexing
  • Metapatterns
  • XML data

Cite this

Zhang, J., Liu, H., Ling, T. W., Bruckner, R. M., & Min Tjoa, A. (2006). A framework for efficient association rule mining in XML data. Journal of Database Management, 17(3), 19-40. https://doi.org/10.4018/jdm.2006070102
Zhang, Ji ; Liu, Han ; Ling, Tok Wang ; Bruckner, Robert M. ; Min Tjoa, A. / A framework for efficient association rule mining in XML data. In: Journal of Database Management. 2006 ; Vol. 17, No. 3. pp. 19-40.
@article{ed1d6d60025a4773be410b32c8379018,
title = "A framework for efficient association rule mining in XML data",
keywords = "Association rule mining, Concept generalization, Data transformation and indexing, Metapatterns, XML data",
author = "Ji Zhang and Han Liu and Ling, {Tok Wang} and Bruckner, {Robert M.} and {Min Tjoa}, A.",
year = "2006",
month = "1",
day = "1",
doi = "10.4018/jdm.2006070102",
language = "English (US)",
volume = "17",
pages = "19--40",
journal = "Journal of Database Management",
issn = "1063-8016",
publisher = "IGI Publishing",
number = "3",

}

Zhang, J, Liu, H, Ling, TW, Bruckner, RM & Min Tjoa, A 2006, 'A framework for efficient association rule mining in XML data', Journal of Database Management, vol. 17, no. 3, pp. 19-40. https://doi.org/10.4018/jdm.2006070102

TY - JOUR

T1 - A framework for efficient association rule mining in XML data

AU - Zhang, Ji

AU - Liu, Han

AU - Ling, Tok Wang

AU - Bruckner, Robert M.

AU - Min Tjoa, A.

PY - 2006/1/1

Y1 - 2006/1/1

KW - Association rule mining

KW - Concept generalization

KW - Data transformation and indexing

KW - Metapatterns

KW - XML data

UR - http://www.scopus.com/inward/record.url?scp=33846569322&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846569322&partnerID=8YFLogxK

U2 - https://doi.org/10.4018/jdm.2006070102

DO - https://doi.org/10.4018/jdm.2006070102

M3 - Article

VL - 17

SP - 19

EP - 40

JO - Journal of Database Management

JF - Journal of Database Management

SN - 1063-8016

IS - 3

ER -