The effects of training set size for performance of support vector machines and decision trees

Taskin Kavzoglu and Ismail Colkesen

Gebze Institute of Technology, Department of Geodetic and Photogrammetric Engineering, Cayirova Campus, 41400, Gebze-KOCAELI, TURKEY. (kavzoglu@gyte.edu.tr, icolkesen@gyte.edu.tr)

Abstract: Thematic maps representing the characteristics of the Earth’s surface have been widely used as a primary input in many land-related studies. Classification of remotely sensed images is an effective way to produce these maps. Selecting an appropriate number of training samples and a suitable classification method are essential to producing accurate thematic maps. Many classification algorithms have been developed in the literature, and their performance has been analyzed on different data sets. In this study, support vector machines (SVMs) and decision trees (DTs), two relatively new and widely used methods, were applied to produce a land use/land cover thematic map of the study area, which covers the center of Trabzon province in Turkey. Training data sets of various sizes were used to investigate the effect of training set size on classification accuracy. Variations in classification performance were analyzed using the overall classification accuracy and the Kappa coefficient derived from the error matrix. Furthermore, McNemar’s and z tests were employed to determine the statistical significance of differences in classifier performance depending on the training sample size. Results showed that the classification performance of SVMs and DTs improved up to a certain level
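The abstract evaluates the classifiers with overall accuracy and the Kappa coefficient derived from the error matrix, and compares them with McNemar's and z tests. The sketch below shows one common way to compute these quantities in plain Python; it is an illustration under stated assumptions, not the authors' implementation. Assumptions: rows of the error matrix are reference classes and columns are predicted classes; the Kappa variance uses a simplified large-sample approximation rather than the full delta-method formula; McNemar's statistic applies Edwards' continuity correction by default. All function names and the example matrices are hypothetical.

```python
import math


def overall_accuracy(cm):
    """Overall accuracy: correctly classified samples (the diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_with_variance(cm):
    """Return (Kappa, approximate variance) for a square error matrix.

    Assumption: rows are reference classes, columns are predicted classes.
    The variance uses the simplified approximation
    p_o * (1 - p_o) / (N * (1 - p_e) ** 2), not the full delta-method formula.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total  # observed agreement
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    k = (p_o - p_e) / (1 - p_e)
    var = p_o * (1 - p_o) / (total * (1 - p_e) ** 2)
    return k, var


def kappa_z_test(cm_a, cm_b):
    """z statistic for the difference between two Kappa values.

    Assumes the two error matrices come from independent test samples.
    """
    k_a, var_a = kappa_with_variance(cm_a)
    k_b, var_b = kappa_with_variance(cm_b)
    return abs(k_a - k_b) / math.sqrt(var_a + var_b)


def mcnemar_chi2(y_true, pred_a, pred_b, continuity=True):
    """McNemar's chi-square statistic (1 degree of freedom) for two
    classifiers evaluated on the same test samples."""
    # f12: A correct and B wrong; f21: A wrong and B correct.
    f12 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    f21 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if f12 + f21 == 0:
        return 0.0  # the classifiers never disagree on correctness
    diff = abs(f12 - f21)
    if continuity:
        diff = max(0, diff - 1)  # Edwards' continuity correction
    return diff ** 2 / (f12 + f21)


if __name__ == "__main__":
    # Hypothetical 3-class error matrices (rows: reference, columns: predicted).
    cm_svm = [[50, 3, 2], [4, 45, 6], [1, 5, 44]]
    cm_dt = [[46, 6, 3], [7, 40, 8], [3, 7, 40]]
    for name, cm in (("SVM", cm_svm), ("DT", cm_dt)):
        k, _ = kappa_with_variance(cm)
        print(f"{name}: OA = {overall_accuracy(cm):.4f}, Kappa = {k:.4f}")
    print(f"Kappa z = {kappa_z_test(cm_svm, cm_dt):.3f}")
    # Hypothetical per-sample labels on a shared test set.
    y_true = [0, 1, 0, 1, 0, 1]
    chi2 = mcnemar_chi2(y_true, [0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 0, 1])
    print(f"McNemar chi2 = {chi2:.3f}")
```

Under these assumptions, a McNemar statistic above 3.84 (the chi-square critical value for 1 degree of freedom) or a z value above 1.96 indicates a significant difference at the 95% confidence level. McNemar's test suits the usual case where both classifiers are evaluated on the same test samples, whereas this z test on Kappa assumes independent samples.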

Keywords: Support vector machines, decision trees, accuracy comparison, McNemar’s test.
