Priyank Goyal's Data Mining Lectures: April 2009

Concept Description-2

Data Generalisation: It can be done in two ways:

a. Attribute Removal
b. Attribute Generalisation

a. Attribute Removal: The rule is: "If there is a large set of distinct values for an attribute of the initial working relation, but either:

(i) there is no concept hierarchy defined for the attribute, or

(ii) its higher-level concepts are expressed in terms of other attributes, then the attribute should be removed from the working relation."
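The removal rule can be sketched in Python as below. This is a minimal illustration under assumptions, not the lecture's own code: the relation is taken to be a list of dicts, and the cut-off for a "large" set of distinct values (`large_threshold`), the set of attributes that have a concept hierarchy, and the set of attributes whose higher-level concepts live in other attributes are all hypothetical inputs the analyst would supply.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def attributes_to_remove(
    relation: List[Row],
    large_threshold: int,           # hypothetical cut-off for "a large set of distinct values"
    has_hierarchy: Set[str],        # attributes with a concept hierarchy defined
    expressed_elsewhere: Set[str],  # attributes whose higher-level concepts are other attributes
) -> List[str]:
    """Return the attributes that the attribute-removal rule would drop."""
    if not relation:
        return []
    removed = []
    for attr in relation[0]:
        distinct = len({row[attr] for row in relation})
        # Many distinct values AND (no hierarchy OR hierarchy expressed by other attributes).
        if distinct > large_threshold and (
            attr not in has_hierarchy or attr in expressed_elsewhere
        ):
            removed.append(attr)
    return removed


students = [
    {"name": "Ann", "gender": "F", "major": "Physics"},
    {"name": "Bob", "gender": "M", "major": "Physics"},
    {"name": "Cai", "gender": "M", "major": "Chemistry"},
    {"name": "Dev", "gender": "F", "major": "Biology"},
]
print(attributes_to_remove(students, 2, {"major"}, set()))  # → ['name']
```

Here `name` has many distinct values and no hierarchy, so it is removed; `major` also has many values, but because a hierarchy exists for it, it is kept for attribute generalisation instead.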

b. Attribute Generalisation

If there is a large set of distinct values for an attribute in the initial working relation, and there exists a concept hierarchy for the attribute, then that concept hierarchy should be selected and applied to the attribute.
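A minimal sketch of applying a concept hierarchy, assuming the hierarchy is stored as a plain child-to-parent mapping. The `MAJOR_HIERARCHY` contents and the `generalise` helper are hypothetical, chosen only to illustrate one step up the hierarchy.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical one-level concept hierarchy for "major": value -> parent concept.
MAJOR_HIERARCHY: Dict[str, str] = {
    "Physics": "Science",
    "Chemistry": "Science",
    "Biology": "Science",
    "History": "Arts",
    "English": "Arts",
}


def generalise(relation: List[Row], attr: str, hierarchy: Dict[str, str]) -> List[Row]:
    """Replace each value of `attr` by its parent concept in `hierarchy`.

    Values with no parent in the hierarchy are left unchanged, and the
    input rows are not modified (new dicts are built).
    """
    return [{**row, attr: hierarchy.get(row[attr], row[attr])} for row in relation]


students = [
    {"gender": "F", "major": "Physics"},
    {"gender": "M", "major": "History"},
    {"gender": "M", "major": "Biology"},
]
print(generalise(students, "major", MAJOR_HIERARCHY))
```

After generalisation the three majors collapse to two higher-level concepts (`Science` and `Arts`), which is what makes tuples become identical and mergeable.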

3. Count and Aggregate Value Accumulation

Why needed
- To obtain a quantitative measure of generalisation.

How to Calculate

- A number called count is associated with each tuple in the initial working relation.
- Its value is initialised to 1
- Generalisation produces groups of identical tuples
- Such identical tuples are merged into one with their counts accumulated

E.g., suppose 52 data tuples are all generalised to the same tuple, say T. They are then merged into one tuple whose count is 52.
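The count accumulation described above can be sketched with Python's `collections.Counter`. This is an illustrative sketch, assuming generalised tuples are dicts of attribute values; only the count is accumulated here, though other aggregates could be accumulated the same way.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]


def merge_identical(relation: List[Row]) -> List[Dict[str, object]]:
    """Merge identical generalised tuples, accumulating their counts.

    Each input tuple implicitly starts with count = 1; identical tuples are
    collapsed into one whose count is the number of tuples merged.
    """
    # Sort items so that dicts with the same contents map to the same key.
    counts = Counter(tuple(sorted(row.items())) for row in relation)
    return [dict(key, count=n) for key, n in counts.items()]


# 52 tuples that all generalise to the same tuple T, plus 3 others.
generalised = [{"gender": "M", "major": "Science"}] * 52 + [{"gender": "F", "major": "Arts"}] * 3
print(merge_identical(generalised))
```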

Attribute Generalisation Control

It controls how high an attribute should be generalised, i.e. how far up its concept hierarchy it is taken.

Two Approaches to Generalisation Control

1. Attribute Generalisation Threshold Control (AGTC): It sets a generalisation threshold for each attribute.
- If the number of distinct values of an attribute is greater than the attribute threshold, further generalisation of that attribute should be performed.

2. Generalised Relation Threshold Control (GRTC): It sets a threshold for the generalised relation.

- If the number of distinct tuples in the generalised relation is greater than the threshold, further generalisation should be performed.

Here AGTC is applied first, then GRTC, to generalise the data.
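Below is one possible sketch of the two controls applied in that order (AGTC, then GRTC), followed by count accumulation. It is an illustration under stated assumptions, not the lecture's algorithm: hierarchies are hypothetical child-to-parent maps, an attribute that is over its threshold but cannot climb any higher is removed, and the GRTC choice of which attribute to climb next (the one with the most distinct values) is just one simple heuristic.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]
Hierarchy = Dict[str, str]  # child concept -> parent concept


def climb(relation: List[Row], attr: str, parent: Hierarchy) -> List[Row]:
    """Generalise `attr` one level up its concept hierarchy."""
    return [{**r, attr: parent.get(r[attr], r[attr])} for r in relation]


def n_distinct(relation: List[Row], attr: str) -> int:
    return len({r[attr] for r in relation})


def n_distinct_tuples(relation: List[Row]) -> int:
    return len({tuple(sorted(r.items())) for r in relation})


def threshold_control(relation: List[Row], hierarchies: Dict[str, Hierarchy],
                      attr_threshold: int, rel_threshold: int) -> List[Dict[str, object]]:
    if not relation:
        return []
    # AGTC: generalise each attribute until it has <= attr_threshold distinct values.
    for attr in list(relation[0]):
        parent = hierarchies.get(attr, {})
        while n_distinct(relation, attr) > attr_threshold:
            climbed = climb(relation, attr, parent)
            if climbed == relation:
                # No higher level exists: fall back to attribute removal.
                relation = [{k: v for k, v in r.items() if k != attr} for r in relation]
                break
            relation = climbed
    # GRTC: while too many distinct tuples remain, climb the attribute
    # (among those that can still climb) with the most distinct values.
    while n_distinct_tuples(relation) > rel_threshold:
        candidates = [a for a in relation[0]
                      if climb(relation, a, hierarchies.get(a, {})) != relation]
        if not candidates:
            break
        attr = max(candidates, key=lambda a: n_distinct(relation, a))
        relation = climb(relation, attr, hierarchies[attr])
    # Merge identical tuples, accumulating counts.
    counts = Counter(tuple(sorted(r.items())) for r in relation)
    return [dict(key, count=n) for key, n in counts.items()]


hierarchies = {
    "major": {"Physics": "Science", "Chemistry": "Science", "Biology": "Science",
              "History": "Arts", "English": "Arts", "Science": "ANY", "Arts": "ANY"},
    "city": {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA",
             "Boston": "USA", "Mumbai": "India", "Canada": "North America",
             "USA": "North America", "India": "Asia",
             "North America": "ANY", "Asia": "ANY"},
}
students = [
    {"name": "Ann", "gender": "F", "major": "Physics", "city": "Vancouver"},
    {"name": "Bob", "gender": "M", "major": "Chemistry", "city": "Toronto"},
    {"name": "Cai", "gender": "M", "major": "Biology", "city": "Seattle"},
    {"name": "Dev", "gender": "F", "major": "History", "city": "Boston"},
    {"name": "Eve", "gender": "F", "major": "English", "city": "Mumbai"},
    {"name": "Fay", "gender": "F", "major": "Physics", "city": "Toronto"},
]
print(threshold_control(students, hierarchies, attr_threshold=3, rel_threshold=4))
```

With these hypothetical thresholds, AGTC removes `name` and lifts `major` and `city` to Science/Arts and country level; GRTC then lifts `city` once more (to continent) because five distinct tuples still exceed the relation threshold of four.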

Data Mining Functionalities: Characterisation and Comparison

Data mining (DM) is of two kinds:
- Descriptive DM
- Predictive DM

Concept Description

It is the most basic form of descriptive DM. It describes a given set of task-relevant data in a concise and summarised manner, presenting interesting general properties of the data.

Concept description has two forms:
- Characterisation: It summarises and describes a collection of data, called the target class.
- Comparison: It summarises and distinguishes one collection of data, called the target class, from the other collection(s) of data, collectively called the contrasting class.
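To make the difference between the two forms concrete, here is a minimal sketch over already-generalised tuples. It simply uses shares and counts; the function names, the tuple layout `(major, country)`, and the graduate/undergraduate data are hypothetical and not taken from the lecture.

```python
from collections import Counter
from typing import Dict, List, Tuple

GenTuple = Tuple[str, ...]  # a generalised tuple, e.g. (major, country)


def characterise(target: List[GenTuple]) -> Dict[GenTuple, float]:
    """Summarise the target class: share of target tuples per generalised tuple."""
    if not target:
        return {}
    counts = Counter(target)
    return {t: n / len(target) for t, n in counts.items()}


def compare(target: List[GenTuple],
            contrasting: List[GenTuple]) -> Dict[GenTuple, Tuple[int, int]]:
    """Contrast the target class with the contrasting class:
    (count in target, count in contrasting) for every generalised tuple."""
    t_counts, c_counts = Counter(target), Counter(contrasting)
    # Counter returns 0 for tuples that never occur in a class.
    return {key: (t_counts[key], c_counts[key]) for key in t_counts.keys() | c_counts.keys()}


graduates = [("Science", "Canada")] * 3 + [("Arts", "USA")]       # target class
undergraduates = [("Science", "Canada")] + [("Arts", "USA")] * 2  # contrasting class
print(characterise(graduates))
print(compare(graduates, undergraduates))
```

Characterisation looks only at the target class, while comparison needs both classes side by side.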

Concept Characterisation

There are two approaches:
1. The data cube (OLAP) approach
2. The attribute-oriented induction (AOI) approach, which can be implemented using either
- a relational structure, or
- a data cube structure.

2. Attribute-Oriented Induction Approach

Techniques of the AOI Approach

a. Data Focussing:

It corresponds to the specification of the task-relevant data.

E.g.:

    use Big_University_DB
    mine characteristics as "Science_Students"
    in relevance to name, gender, major
    from student
    where status in "graduate"

The table so obtained is called the Initial Working Relation.
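The effect of that query can be mimicked in plain Python as a selection followed by a projection. This is only a stand-in sketch, not DMQL: the `student` rows, the extra `gpa` attribute, and the `initial_working_relation` helper are hypothetical, and `where_values` models the "where status in ..." clause as a set of allowed values.

```python
from typing import Dict, List, Sequence, Set

Row = Dict[str, str]


def initial_working_relation(table: List[Row], relevant: Sequence[str],
                             where_attr: str, where_values: Set[str]) -> List[Row]:
    """Select rows whose `where_attr` is in `where_values` ("where ... in ..."),
    then project them onto the relevant attributes ("in relevance to ...")."""
    return [
        {a: row[a] for a in relevant}
        for row in table
        if row[where_attr] in where_values
    ]


student = [
    {"name": "Ann", "gender": "F", "major": "Physics", "status": "graduate", "gpa": "3.7"},
    {"name": "Bob", "gender": "M", "major": "History", "status": "undergraduate", "gpa": "3.1"},
    {"name": "Cai", "gender": "M", "major": "Biology", "status": "graduate", "gpa": "3.4"},
]
print(initial_working_relation(student, ["name", "gender", "major"], "status", {"graduate"}))
```

Only the graduate rows survive, and only the three relevant attributes are kept; this result is what the later removal and generalisation steps would start from.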

Tuesday, April 28, 2009

Concept Description-2

Saturday, April 25, 2009

Data Mining Functionalities- 1
