Performance of Genetic Algorithms for Data Classification

Stine, Matthew E (2001) Performance of Genetic Algorithms for Data Classification. Undergraduate thesis, under the direction of Dr. Dawn Wilkins from Computer and Information Science, University of Mississippi.




In today's world, the amount of raw data archived across multiple distinct domains is growing at an exponential rate. "Data Mining" is a continuously evolving family of processes by which individuals extract useful information from these data. Classification is one of these processes: the construction of descriptive models from labeled data objects, for the purpose of predicting the labels of objects whose labels are unknown. The construction of these models is often adversely affected by the presence of incorrect or outlier values within the data, a phenomenon known as noise. The original motivation of this research was to test the performance of the binary genetic algorithm, one of a multitude of algorithms used for model construction, on data with varying percentages of noise. However, in the course of experimentation, several issues arose concerning the effectiveness of the binary genetic algorithm as a classifier. Specifically, the chosen method for encoding classification hypotheses demonstrated limited scalability. Furthermore, the chosen method for encoding continuous and nominally valued data attributes was discovered to be unreasonably strict, leading to poor performance. Further research should be undertaken to investigate a more reasonable encoding method. However, the algorithm performed favorably on purely categorical data with a moderate number of small-domained dimensions. When varying percentages of noise were injected into these data, the algorithm exhibited a slow, steady decline in classification accuracy. These results lead to the conclusion that the binary genetic algorithm should not be discounted as a possible answer to the question of data classification, especially for data sets with the above characteristics, and that further research could reveal hypothesis encoding strategies that will result in improved scalability.
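The abstract does not say how noise was injected, so the following is only a minimal sketch of one common approach for purely categorical data: pick a fixed fraction of attribute cells at random and replace each with a different value from that attribute's domain. The function name `inject_noise`, its parameters, and the cell-level replacement scheme are all assumptions made for illustration, not the procedure used in the thesis.

```python
import random


def inject_noise(rows, domains, noise_rate, seed=0):
    """Return a copy of `rows` with a fraction of attribute values corrupted.

    Hypothetical helper, not taken from the thesis.
    rows       -- list of records, each a list of categorical values
    domains    -- domains[j] lists the allowed values of attribute j
    noise_rate -- fraction of cells (0.0 to 1.0) to corrupt
    """
    rng = random.Random(seed)
    noisy = [list(row) for row in rows]  # copy so the input is left untouched
    cells = [(i, j) for i in range(len(rows)) for j in range(len(domains))]
    n_corrupt = round(noise_rate * len(cells))
    for i, j in rng.sample(cells, n_corrupt):
        # Choose a replacement that differs from the current value.
        alternatives = [v for v in domains[j] if v != noisy[i][j]]
        if alternatives:
            noisy[i][j] = rng.choice(alternatives)
    return noisy


# Example: corrupt 25% of the cells in a small two-attribute data set.
domains = [["red", "green", "blue"], ["small", "large"]]
rows = [["red", "small"] for _ in range(10)]
noisy_rows = inject_noise(rows, domains, noise_rate=0.25)
```

Running a classifier on data sets produced with increasing `noise_rate` values would give the kind of accuracy-versus-noise curve the abstract describes, under the assumptions stated above.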

Item Type: Thesis (Undergraduate)
Creators: Stine, Matthew E
Student's Degree Program(s): B.S. in Computer Science
Thesis Advisor: Dr. Dawn Wilkins
Thesis Advisor's Department: Computer and Information Science
Institution: University of Mississippi
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: Mr. Matthew Stine
Date Deposited: 14 Jul 2014 15:09
Last Modified: 14 Jul 2014 15:17
