Data mining is defined as the process of discovering patterns in data. The process must be automatic or (more usually) semiautomatic. The patterns discovered must be meaningful in that they lead to some advantage, usually an economic one.
Data mining has two major functions:
1. Classification:
- Classification maps data into predefined groups or classes.
- It is often referred to as supervised learning because the classes are determined before examining the data.
- Classification creates a function from training data. The training data consist of pairs of input objects and desired outputs. The output of the function can be a class label for the input object or, in the closely related task of regression, a continuous value.
- The task of classification is to predict the value of the function for any valid input object after having seen only a small number of training examples.
2. Clustering:
- Clustering is similar to classification except that the groups are not predefined, but rather defined by the data alone.
- Clustering is alternatively referred to as unsupervised learning or segmentation.
- It can be thought of as partitioning or segmenting the data into groups that might or might not be disjoint.
- Clustering is usually accomplished by measuring the similarity among the data items on predefined attributes. The most similar items are grouped into the same cluster.
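
The contrast between the two functions can be illustrated with a minimal sketch. It is not a definitive implementation: the notes above name no particular algorithm, so a nearest-centroid classifier and a basic k-means loop are assumed here as simple stand-ins. The sample points, the labels "low" and "high", and all function names are hypothetical. The classifier is given labelled pairs (predefined classes), while the clustering routine sees only unlabelled points and forms groups from distance-based similarity alone.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# --- Classification (supervised): classes are known before seeing new data ---

def train_classifier(examples):
    """Build a model from (input point, desired label) training pairs.

    The model is one centroid per predefined class label.
    """
    by_label = {}
    for point, label in examples:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}


def classify(model, point):
    """Predict the label whose class centroid is closest to the point."""
    return min(model, key=lambda label: math.dist(model[label], point))


# --- Clustering (unsupervised): groups are defined by the data alone ---

def k_means(points, k, iterations=10):
    """Partition points into k disjoint groups by Euclidean similarity.

    Assumption: the first k points serve as initial centres, which keeps
    the sketch deterministic but is not a robust initialisation.
    """
    centers = list(points[:k])
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centers[i], p))
            groups[nearest].append(p)
        # Recompute each centre; keep the old one if a group came out empty.
        centers = [centroid(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups


# Hypothetical labelled training data for classification.
training = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
            ((8.0, 8.0), "high"), ((9.0, 9.5), "high")]
model = train_classifier(training)
predicted = classify(model, (2.0, 1.5))

# Hypothetical unlabelled data for clustering.
unlabelled = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
              (9.0, 9.5), (1.2, 0.8), (8.5, 9.0)]
clusters = k_means(unlabelled, k=2)

print("predicted class:", predicted)
print("clusters:", clusters)
```

In this sketch the clusters are disjoint because each point is assigned to exactly one centre; methods that allow overlapping groups would need a different assignment step.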