Tuesday, April 28, 2009

Concept Description-2

Data Generalisation: It can be done in two ways:

a. Attribute Removal
b. Attribute Generalisation

a. Attribute Removal: The rule is: "If there is a large set of distinct values for an attribute of the initial working relation, but either

(i) there is no concept hierarchy defined for the attribute, or

(ii) its higher-level concepts are expressed in terms of other attributes, then the attribute should be removed from the working relation."
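The removal rule can be sketched as a small check. This is only an illustration: the function name should_remove, the numeric threshold used to decide what counts as "a large set of distinct values", and the expressed_elsewhere flag are assumptions made for this example and are not part of the rule itself.

```python
def should_remove(values, hierarchy, threshold, expressed_elsewhere=False):
    """Return True when the attribute-removal rule applies.

    values              -- all values of the attribute in the working relation
    hierarchy           -- a concept hierarchy for the attribute, or None
    threshold           -- assumed cut-off for "a large set of distinct values"
    expressed_elsewhere -- True if the higher-level concepts are already
                           carried by other attributes
    """
    many_distinct = len(set(values)) > threshold
    no_hierarchy = hierarchy is None
    return many_distinct and (no_hierarchy or expressed_elsewhere)


# A name attribute: many distinct values and no concept hierarchy.
names = ["Asha", "Ravi", "Meena", "John", "Li"]
# A gender attribute: few distinct values, so it is kept.
genders = ["M", "F", "M", "F", "M"]

remove_name = should_remove(names, None, threshold=3)
remove_gender = should_remove(genders, None, threshold=3)
```

Here remove_name is True and remove_gender is False, so only the name attribute would be dropped from the working relation.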

b. Attribute Generalisation

If there is a large set of distinct values for an attribute in the initial working relation, and there exists a concept hierarchy on the attribute, then that concept hierarchy should be selected and applied to the attribute.
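As a rough sketch of applying a concept hierarchy, one level of the hierarchy can be stored as a child-to-parent mapping. The city and country values, the mapping CITY_TO_COUNTRY and the function generalise are made up for this illustration.

```python
# One level of a hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Delhi": "India",
    "Mumbai": "India",
    "Toronto": "Canada",
    "Vancouver": "Canada",
}


def generalise(rows, attribute, hierarchy):
    """Replace each value of `attribute` by its parent in `hierarchy`.

    Values that have no parent in the mapping are left unchanged.
    """
    return [
        {**row, attribute: hierarchy.get(row[attribute], row[attribute])}
        for row in rows
    ]


rows = [
    {"city": "Delhi", "major": "CS"},
    {"city": "Mumbai", "major": "CS"},
    {"city": "Toronto", "major": "Physics"},
]
generalised = generalise(rows, "city", CITY_TO_COUNTRY)
```

After this step the city attribute holds only two distinct values (India and Canada) instead of three, and the first two rows have become identical.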

3. Count and Aggregate Value Accumulation

Why needed
- To obtain a quantitative measure of generalisation.

How to Calculate

- A number called count is associated with each tuple in the initial working relation.
- Its value is initialised to 1
- After generalisation, some tuples become identical to one another
- Such identical tuples are merged into one with their counts accumulated

E.g. suppose 52 data tuples are all generalised to the same tuple, say T. These are then merged to form one tuple whose count is 52.
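The count accumulation described above can be sketched as follows. The tuple values and the function name accumulate_counts are assumptions chosen for this example.

```python
def accumulate_counts(tuples):
    """Merge identical tuples and add up their counts.

    Every tuple enters with an initial count of 1, so the count of a
    merged tuple is the number of original tuples that generalised to it.
    """
    counts = {}
    for t in tuples:
        counts[t] = counts.get(t, 0) + 1
    return counts


# 52 tuples that were all generalised to the same tuple T, plus 3 others.
T = ("India", "Science")
generalised = [T] * 52 + [("Canada", "Arts")] * 3

merged = accumulate_counts(generalised)
```

Here merged[T] is 52, matching the example, and the relation has shrunk from 55 tuples to 2.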

Attribute Generalisation Control

It controls how high up its concept hierarchy an attribute should be generalised.


Two Approaches to Generalisation Control

1. Attribute Generalisation Threshold Control (AGTC): It sets a generalisation threshold for the attributes.
- If the number of distinct values of an attribute is greater than the attribute threshold, further attribute removal or attribute generalisation should be performed.

2. Generalised Relation Threshold Control (GRTC)

It sets a threshold for the generalised relation.

- If the number of distinct tuples in the generalised relation is greater than the threshold, further generalisation should be performed.

Here AGTC is applied first, and then GRTC, to generalise the data.
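The two controls can be put together in a minimal sketch. The notes do not say which attribute to generalise further when the relation threshold is exceeded, so picking the attribute with the most distinct values is an assumption made here. The data, the threshold values and all function names are likewise hypothetical.

```python
from collections import Counter


def distinct(rows, attr):
    return len({row[attr] for row in rows})


def climb(rows, attr, mapping):
    """Generalise `attr` one level up using a child -> parent mapping."""
    return [{**row, attr: mapping.get(row[attr], row[attr])} for row in rows]


def drop(rows, attr):
    return [{k: v for k, v in row.items() if k != attr} for row in rows]


def merge(rows):
    """Merge identical tuples, accumulating their counts."""
    return Counter(tuple(sorted(row.items())) for row in rows)


def generalise_relation(rows, hierarchies, attr_threshold, relation_threshold):
    # Remaining hierarchy levels per attribute, lowest level first.
    levels = {attr: list(maps) for attr, maps in hierarchies.items()}

    # Step 1 (AGTC): generalise or remove each attribute over its threshold.
    for attr in list(rows[0].keys()):
        while distinct(rows, attr) > attr_threshold:
            if levels.get(attr):
                rows = climb(rows, attr, levels[attr].pop(0))
            else:
                rows = drop(rows, attr)  # nothing left to climb: remove it
                break

    # Step 2 (GRTC): keep generalising while there are too many tuples.
    while len(merge(rows)) > relation_threshold:
        candidates = [a for a in rows[0] if levels.get(a)]
        if not candidates:
            break  # no hierarchy levels left to apply
        # Assumption: generalise the attribute with the most distinct values.
        attr = max(candidates, key=lambda a: distinct(rows, a))
        rows = climb(rows, attr, levels[attr].pop(0))

    return merge(rows)


rows = [
    {"name": "A", "city": "Delhi", "major": "CS"},
    {"name": "B", "city": "Mumbai", "major": "CS"},
    {"name": "C", "city": "Toronto", "major": "Physics"},
    {"name": "D", "city": "Vancouver", "major": "Physics"},
    {"name": "E", "city": "Delhi", "major": "Maths"},
    {"name": "F", "city": "Toronto", "major": "Maths"},
]
hierarchies = {
    "city": [{"Delhi": "India", "Mumbai": "India",
              "Toronto": "Canada", "Vancouver": "Canada"}],
    "major": [{"CS": "Science", "Physics": "Science", "Maths": "Science"}],
}
result = generalise_relation(rows, hierarchies,
                             attr_threshold=3, relation_threshold=2)
```

In this run AGTC removes name (six distinct values, no hierarchy) and generalises city to country. That leaves four distinct tuples, which is above the relation threshold of 2, so GRTC generalises major as well. The final relation has two tuples, each with a count of 3.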


