A Genetic Algorithm Based Data Clustering Approach
Hiren Kumar Deva Sarma hiroo135@yahoo.co.uk
Information Technology, Sikkim Manipal University, Sikkim, India
Abstract
Data mining is the process of extracting potentially useful information from
raw, unanalyzed data. It is also referred to as knowledge discovery in large
databases, meaning the non-trivial extraction of implicit, previously
unknown, and potentially useful information, such as knowledge rules,
constraints, and regularities, from huge amounts of data in databases.
It is a search for hidden patterns that may exist in large databases.
Data clustering is not a new technique; it has been studied extensively
in the statistics, machine learning, and database communities with diverse
emphases. Clustering is a useful technique for discovering data distributions
and patterns in the underlying data. The goal of clustering is to discover
dense and sparse regions in a data set.
In essence, data clustering maximizes intra-class similarity and minimizes
inter-class similarity. Cluster analysis helps to construct meaningful
partitions of a large set of objects, following a divide-and-conquer
methodology that decomposes a large-scale system into smaller
components to simplify design and implementation. Genetic algorithms are
search algorithms based on the principles of natural genetics, i.e., on
operations that occur in nature. They combine a Darwinian "survival of the
fittest" approach with a structured, yet randomized, information exchange
procedure. Their advantage is that they can search large and complex spaces
efficiently and locate near-optimal solutions rapidly. This paper analyzes
some useful K-medoid clustering algorithms used in data mining. A GA-based
algorithm for data clustering is proposed. Possible future enhancements
are also mentioned.
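
As a rough illustration of how a genetic algorithm can drive K-medoid
clustering, the following minimal Python sketch may help. It is not the
algorithm proposed in this paper. The chromosome encoding (a list of k medoid
indices), the truncation selection, the pooled crossover, the single-gene
mutation, and every name and parameter value (ga_k_medoids, pop_size,
generations, mutation_rate, seed) are assumptions chosen only for illustration.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (lower is fitter)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def crossover(a, b, k, rng):
    """Information exchange: draw k distinct medoids from both parents' pool."""
    pool = sorted(set(a) | set(b))
    return rng.sample(pool, k)


def mutate(chrom, n, rate, rng):
    """With probability `rate`, swap one medoid for a random non-medoid point."""
    chrom = list(chrom)
    if rng.random() < rate:
        candidates = [i for i in range(n) if i not in chrom]
        if candidates:
            chrom[rng.randrange(len(chrom))] = rng.choice(candidates)
    return chrom


def ga_k_medoids(points, k, pop_size=20, generations=50,
                 mutation_rate=0.2, seed=0):
    """Illustrative GA search for k medoids; returns (medoid indices, cost)."""
    rng = random.Random(seed)
    n = len(points)
    # Each chromosome is a list of k distinct point indices used as medoids.
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        # "Survival of the fittest": keep the cheaper half of the population.
        population.sort(key=lambda c: total_cost(points, c))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, k, rng)
            children.append(mutate(child, n, mutation_rate, rng))
        population = survivors + children
    best = min(population, key=lambda c: total_cost(points, c))
    return sorted(best), total_cost(points, best)


if __name__ == "__main__":
    # Two well-separated groups of 2-D points (made-up data for the demo).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, cost = ga_k_medoids(data, k=2)
    print("medoid indices:", medoids, "total cost:", round(cost, 3))
```

Because the fitter half of each generation is carried over unchanged, the best
cost in this sketch never gets worse from one generation to the next. Other
selection schemes, such as roulette-wheel or tournament selection, would fit
the same structure.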