Wednesday, September 16, 2009

Data Mining Primitives: Introduction

Data Mining Primitives, Languages and System Architecture

1. Data Mining Primitives: These are designed to facilitate efficient and fruitful knowledge discovery.

A DMP includes:

a. Specification of the relevant portion of the DB.
b. The kind of knowledge to be mined.
c. Background knowledge useful in guiding the discovery process.
d. Interestingness measures for pattern evaluation.
e. How the discovered knowledge should be visualised.
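
The five parts (a to e) listed above can be pictured as the fields of a single task description. The following is only a minimal sketch in Python: the class name `MiningTask`, its field names, and all example values are hypothetical and do not come from any real DM system.

```python
from dataclasses import dataclass, field


@dataclass
class MiningTask:
    """Hypothetical container for the five primitives (a to e)."""
    relevant_data: str                               # a. portion of the DB to mine
    knowledge_kind: str                              # b. e.g. "association"
    background: dict = field(default_factory=dict)   # c. e.g. concept hierarchies
    thresholds: dict = field(default_factory=dict)   # d. interestingness thresholds
    display_as: str = "table"                        # e. presentation form

    def missing(self):
        """Return the names of required primitives that were left empty."""
        required = ("relevant_data", "knowledge_kind")
        return [name for name in required if not getattr(self, name)]


# Illustrative values only; none of them come from a real database.
task = MiningTask(
    relevant_data="purchases table, year 2009",
    knowledge_kind="association",
    background={"location": {"Delhi": "India"}},
    thresholds={"min_support": 0.05, "min_confidence": 0.7},
    display_as="rules",
)
print(task.missing())  # [] because both required primitives are set
```

In this sketch only the first two primitives are treated as mandatory; that is a design assumption, since a system could supply defaults for the others.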

Use of DMQL (Data Mining Query Language)

DMQL facilitates the DM system's communication with other information systems.

DM Primitives: What Defines a DM Task

A DM query is defined in terms of the following primitives:

1. Task-Relevant Data: This is the portion of the DB to be investigated.

2. Kind of Knowledge to be Mined: This specifies the data mining function to be performed, e.g. characterisation, discrimination, association, clustering or evolution analysis.

3. Background Knowledge: This is knowledge about the domain to be mined. It includes concept hierarchies, which allow the data to be mined at different levels of granularity. It can also be used to evaluate the discovered patterns according to their degree of expectedness or unexpectedness.

4. Interestingness Measures: These functions are used to separate uninteresting patterns from knowledge. They include support (the percentage of task-relevant tuples for which a pattern holds) and confidence (the degree of certainty of a rule).

5. Presentation and Visualisation of Discovered Patterns: The discovered patterns can be displayed as rules, tables, charts, graphs, decision trees and cubes.
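
Primitives 3 and 4 above can be made concrete with a small worked example. The sketch below, in Python, uses a made-up city-to-country concept hierarchy to move the data to a coarser level of granularity, and then computes support and confidence for the rule "laptop => mouse". The data, the hierarchy and all function names are assumptions for illustration only.

```python
# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}

# Toy task-relevant tuples: (city, set of items bought). All values are made up.
TRANSACTIONS = [
    ("Delhi", {"laptop", "mouse"}),
    ("Mumbai", {"laptop"}),
    ("Mumbai", {"laptop", "mouse"}),
    ("Toronto", {"mouse"}),
]


def roll_up(transactions, hierarchy):
    """Replace each city with its parent concept (its country)."""
    return [(hierarchy[city], items) for city, items in transactions]


def support(transactions, itemset):
    """Fraction of tuples that contain every item in `itemset`."""
    hits = sum(1 for _, items in transactions if itemset <= items)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one tuple.
    """
    both = support(transactions, antecedent | consequent)
    return both / support(transactions, antecedent)


# Mine at the country level instead of the city level.
country_level = roll_up(TRANSACTIONS, CITY_TO_COUNTRY)
india_only = [t for t in country_level if t[0] == "India"]

print(support(TRANSACTIONS, {"laptop", "mouse"}))       # 0.5 (2 of 4 tuples)
print(confidence(TRANSACTIONS, {"laptop"}, {"mouse"}))  # 2 of the 3 laptop tuples
print(confidence(india_only, {"laptop"}, {"mouse"}))    # same rule, India only
```

A pattern would then be kept as interesting only if both values meet user-chosen thresholds, for example a minimum support and a minimum confidence.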