Abstract: We present an integrated tool for the preprocessing and analysis of genetic data through data mining. Our goal is to predict the functional behavior of proteins, a critical problem in functional genomics. In recent years, many computational approaches have been developed for identifying short amino-acid chains that are shared by families of related proteins. These chains are called motifs, and they are widely used to predict a protein's behavior, since that behavior depends on them. The idea of using data mining techniques stems from the sheer size of the problem. Since every protein contains a specific number of motifs, some stronger than others, identifying the properties of a protein requires examining a very large number of motif combinations. The presence or absence of stronger motifs affects the way in which a protein reacts. GenMiner is a preprocessing software tool that receives data from three major protein databases and transforms them into a form suitable for input to the WEKA data mining suite. A decision tree model was created from the derived training set, and an efficiency test was conducted. Finally, the model was applied to unknown proteins. Our experiments have shown that the decision tree model is an efficient and easy-to-implement solution for mining protein data; because it is highly parameterizable, it can be applied to a wide range of cases.
G. Hatzidamianos, S. Diplaris, I. N. Athanasiadis, P. A. Mitkas, GenMiner: A data mining tool for protein analysis, 9th Panhellenic Conference in Informatics, pp. 346-360, 2003, Greek Computer Society (EPY).
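
The abstract describes turning protein records into input for WEKA, with the presence or absence of motifs as the relevant signal. A minimal sketch of what such a transformation could look like is given below. It is not GenMiner's actual implementation: the function name `to_arff`, the one-binary-attribute-per-motif encoding, the motif identifiers, and the family labels are all assumptions made for illustration. Only the ARFF layout (`@relation`, `@attribute`, `@data`) is WEKA's standard text input format.

```python
def to_arff(proteins, relation="protein_motifs"):
    """Render (motif set, class label) pairs as ARFF text.

    Assumed encoding: one binary attribute per motif (1 = present,
    0 = absent), followed by a nominal class attribute.
    """
    # Collect every motif and class label seen in the input, in a stable order.
    motifs = sorted({m for found, _ in proteins for m in found})
    classes = sorted({label for _, label in proteins})

    lines = [f"@relation {relation}", ""]
    for motif in motifs:
        lines.append(f"@attribute {motif} {{0,1}}")
    lines.append(f"@attribute class {{{','.join(classes)}}}")
    lines += ["", "@data"]

    # One data row per protein: motif flags first, class label last.
    for found, label in proteins:
        flags = ["1" if motif in found else "0" for motif in motifs]
        lines.append(",".join(flags + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical toy records: motif identifiers and family labels are made up.
example = [
    ({"M001", "M002"}, "familyA"),
    ({"M002"}, "familyB"),
    ({"M001", "M003"}, "familyA"),
]
arff_text = to_arff(example)
print(arff_text)
```

The resulting text can be saved with an `.arff` extension and loaded into WEKA, where a decision tree learner can be trained on it. Whether motifs are better encoded as binary flags or as weighted strengths is a design choice that the abstract does not settle.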