Concept Description-2
Data Generalisation: It can be done in two ways
a. Attribute Removal
b. Attribute Generalisation
a. Attribute Removal: The rule is "If there is a large set of distinct values for an attribute of the initial working relation, but either
(i) there is no concept hierarchy defined for the attribute, or
(ii) its higher-level concepts are expressed in terms of other attributes,
then the attribute should be removed from the working relation."
b. Attribute Generalisation: The rule is "If there is a large set of distinct values for an attribute in the initial working relation, and there exists a concept hierarchy on the attribute, then that concept hierarchy should be selected and applied to the attribute."
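The two rules can be sketched in Python. This is only an illustration: the helper names (choose_action, generalise), the sample data, and the reading of "large set" as "more distinct values than a chosen threshold" are assumptions, not part of the original notes.

```python
def choose_action(values, threshold, hierarchy=None, covered_by_other_attribute=False):
    """Return 'keep', 'remove' or 'generalise' for one attribute."""
    distinct = len(set(values))
    if distinct <= threshold:
        return "keep"        # assumption: not a "large set" of distinct values
    if hierarchy is None or covered_by_other_attribute:
        return "remove"      # attribute removal rule, cases (i) and (ii)
    return "generalise"      # attribute generalisation rule


def generalise(values, hierarchy):
    """Replace each value by its parent concept (one level up the hierarchy)."""
    return [hierarchy.get(v, v) for v in values]


# Hypothetical sample data.
names = ["Asha", "Ben", "Chen", "Dina"]
majors = ["Physics", "Chemistry", "Maths", "Biology"]
major_hierarchy = {m: "Science" for m in majors}

print(choose_action(names, threshold=2))                    # no hierarchy for names
print(choose_action(majors, threshold=2, hierarchy=major_hierarchy))
print(generalise(majors, major_hierarchy))
```

Here name has many distinct values and no concept hierarchy, so it is removed, while major has a hierarchy and is generalised instead.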
3. Count and Aggregate Value Accumulation
Why needed
- To obtain a quantitative measure of generalisation.
How to Calculate
- A number called count is associated with each tuple in the initial working relation.
- Its value is initialised to 1
- Through generalisation there will be a group of identical tuples
- Such identical tuples are merged into one with their counts accumulated
e.g. Suppose 52 data tuples are all generalised to the same tuple, say T. They are merged into one tuple T whose count is 52.
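A minimal Python sketch of count accumulation follows. The function name merge_identical and the sample tuples are made up for illustration; each input tuple carries a count that starts at 1, as described above.

```python
from collections import Counter


def merge_identical(counted_tuples):
    """Merge identical tuples and add up their counts."""
    totals = Counter()
    for tup, count in counted_tuples:
        totals[tup] += count
    return dict(totals)


# Hypothetical generalised relation: every tuple starts with count 1.
T = ("Science", "M")
generalised = [(T, 1)] * 52 + [(("Arts", "F"), 1)] * 3

merged = merge_identical(generalised)
for tup, count in merged.items():
    print(tup, count)
```

The 52 copies of T collapse into a single tuple with count 52.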
Attribute Generalisation Control
It is the control of how high an attribute should be generalised.
Two Approaches to Generalisation Control
1. Attribute Generalisation Threshold Control (AGTC)
It sets a generalisation threshold for the attributes.
- If the number of distinct values of an attribute is greater than the attribute threshold, further attribute removal or generalisation should be performed.
2. Generalised Relation Threshold Control (GRTC)
It sets a threshold for the generalised relation.
- If the number of distinct tuples in the generalised relation is greater than the threshold, further generalisation should be performed.
Here AGTC is applied first, and then GRTC, to generalise the data.
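The two controls can be sketched as below. This is a simplified illustration under assumptions: the function names, the city hierarchy, and the thresholds are hypothetical, and the GRTC step only reports whether more generalisation is needed rather than choosing which attribute to generalise next.

```python
def generalise_attribute(values, hierarchy, attr_threshold):
    """AGTC: climb the hierarchy while distinct values exceed the threshold."""
    values = list(values)
    while len(set(values)) > attr_threshold:
        lifted = [hierarchy.get(v, v) for v in values]
        if lifted == values:   # top of the hierarchy reached, cannot go higher
            break
        values = lifted
    return values


def needs_more_generalisation(rows, relation_threshold):
    """GRTC: True if distinct tuples still exceed the relation threshold."""
    return len(set(rows)) > relation_threshold


# Hypothetical concept hierarchy: city -> country -> continent.
city_hierarchy = {
    "Pune": "India", "Delhi": "India", "Lyon": "France", "Paris": "France",
    "India": "Asia", "France": "Europe",
}
cities = ["Pune", "Delhi", "Lyon", "Paris"]
ages = ["young", "young", "old", "old"]

# First AGTC on the attribute, then the GRTC check on the whole relation.
general_cities = generalise_attribute(cities, city_hierarchy, attr_threshold=2)
rows = list(zip(general_cities, ages))
print(rows)
print(needs_more_generalisation(rows, relation_threshold=2))
```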
Tuesday, April 28, 2009
Saturday, April 25, 2009
Data Mining Functionalities- 1
Data Mining Functionalities- Characterisation and Comparison
DM
- Descriptive DM
- Predictive DM
Concept Description
It is the most basic form of descriptive DM
- It describes a given set of task relevant data.
- In a concise and summarative manner
- presenting interesting general properties of the data
Concept Description Has
- Characterisation: It summarises and describes a collection of data called the target class.
- Comparison: It summarises and distinguishes one collection of data, called the target class, from other collection(s) of data, collectively called the contrasting class(es).
Concept Characterisation
There are two approaches:
1. The data cube (OLAP) approach
2. The Attribute Oriented Induction (AOI) approach
The AOI approach can be implemented using
- relational structure
- data cube structure.
1.
2. Attribute oriented Induction Approach
Technique of AOI approach
a. Data Focussing:
It corresponds to the specification of the task-relevant data
eg.
"use"- Big University DB
"mine characteristics as "- Science Students
"in relevance to "- name, gender, major
"from" - student
"where"- status
"in"- graduate
The table so obtained is called the Initial Working Relation.
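As an illustration, data focussing can be sketched in Python as a selection followed by a projection. The sample rows, the phone column, and the function name initial_working_relation are hypothetical; only the clause values (name, gender, major, status in graduate) come from the example above.

```python
# Hypothetical rows standing in for the "student" table of Big University DB.
students = [
    {"name": "Asha", "gender": "F", "major": "Physics", "status": "graduate", "phone": "555-0101"},
    {"name": "Ben", "gender": "M", "major": "Chemistry", "status": "undergraduate", "phone": "555-0102"},
    {"name": "Chen", "gender": "M", "major": "Maths", "status": "graduate", "phone": "555-0103"},
]

RELEVANT = ("name", "gender", "major")   # the "in relevance to" clause


def initial_working_relation(rows, relevant, status):
    """Keep rows matching the "where" condition, projected on the relevant attributes."""
    return [tuple(row[a] for a in relevant) for row in rows if row["status"] == status]


iwr = initial_working_relation(students, RELEVANT, "graduate")
for row in iwr:
    print(row)
```

Attributes outside the "in relevance to" list (here phone) never enter the initial working relation.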