
A Genetic Algorithm Based Data Clustering Approach

Hiren Kumar Deva Sarma
hiroo135@yahoo.co.uk
Information Technology
Sikkim Manipal University, Sikkim, India
India

Abstract

Data mining is the process of extracting potentially useful information from unanalyzed data. It is also referred to as knowledge discovery in large databases: the non-trivial extraction of implicit, previously unknown, and potentially useful information, such as knowledge rules, constraints, and regularities, from the huge amounts of data held in databases. It is a search for hidden patterns that may exist in large databases. Data clustering is not a new technique; it has been studied extensively in the statistics, machine learning, and database communities, with diverse emphases. Clustering is a useful technique for discovering the data distribution and patterns in the underlying data. The goal of clustering is to discover dense and sparse regions in a data set.

Basically, the data clustering principle maximizes intra-class similarity and minimizes inter-class similarity. Cluster analysis helps to construct meaningful partitions of a large set of objects based on a divide-and-conquer methodology, which decomposes a large-scale system into smaller components to simplify design and implementation. Genetic algorithms (GAs) are search algorithms based on the principles of natural genetics, i.e., on operations that exist in nature. They combine a Darwinian "survival of the fittest" approach with a structured, yet randomized, information exchange procedure. Their advantage is that they can search large, complex spaces efficiently and locate near-optimal solutions rapidly. This paper analyzes some useful K-medoid clustering algorithms used in data mining. A GA-based algorithm for data clustering is proposed. Possible future enhancements are also mentioned.
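The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of how a genetic algorithm might be combined with K-medoid clustering; it should not be read as the author's method. It assumes that each chromosome is a set of k point indices acting as medoids, that fitness is the total Euclidean distance from every point to its nearest medoid, and that the fitter half of the population survives each generation. All function names and parameters (total_cost, crossover, mutate, ga_k_medoids, pop_size, mutation_rate) are hypothetical.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (lower is fitter)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def crossover(a, b, k, rng):
    """Assumed operator: draw k distinct medoid indices from both parents' genes."""
    pool = list(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))


def mutate(chrom, n, rate, rng):
    """Assumed operator: with probability `rate`, swap a medoid for an unused point."""
    genes = list(chrom)
    for i in range(len(genes)):
        if rng.random() < rate:
            unused = [j for j in range(n) if j not in genes]
            if unused:
                genes[i] = rng.choice(unused)
    return tuple(sorted(genes))


def ga_k_medoids(points, k, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Evolve sets of k medoid indices; return the best set found and its cost."""
    rng = random.Random(seed)
    n = len(points)
    # Initial population: random sets of k distinct point indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # "Survival of the fittest": keep the lower-cost half unchanged (elitism).
        population.sort(key=lambda c: total_cost(points, c))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, k, rng), n, mutation_rate, rng))
        population = survivors + children
    best = min(population, key=lambda c: total_cost(points, c))
    return best, total_cost(points, best)


if __name__ == "__main__":
    # Two well-separated groups of 2-D points, made up for illustration.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, cost = ga_k_medoids(data, k=2)
    print("medoid indices:", medoids, "total cost:", cost)
```

Because the survivors are carried over unchanged, the best cost in the population never increases from one generation to the next. A GA of this kind is a heuristic, so the result is a near-optimal set of medoids rather than a guaranteed optimum.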



 

© 2005 ATCM, Inc. © 2005 Any2Any Technologies, Ltd.