Minimizing Information Loss in Continuous Representations: A Fuzzy Classification Technique based on Principal Components Analysis

Barry J. Kronenfeld 1, Nathan D. Kronenfeld 2
1 Department of Geography, University at Buffalo
105 Wilkeson Quad, Amherst, NY 14261
Tel: (716) 645-2722    Fax: (716) 645-2329
2 Oakville, Ontario

The increasing use of fuzzy classification methods to generalize environmental data raises two persistent questions: how to determine class membership values, and how to interpret those values once they have been determined. This paper treats these two problems as complementary aspects of a single data reduction process. Within this process, a fuzzy classification technique based on Principal Components Analysis (PCA) is shown to minimize the amount of information lost through classification. The PCA-based technique is analogous to linear spectral unmixing models in remote sensing. It differs from algorithms such as fuzzy k-means in that it focuses primarily on preserving an accurate representation of the underlying attribute data, rather than on maximizing the internal consistency of classes. This focus on accuracy suggests that PCA-based fuzzy classification is appropriate for data modeling applications. However, further research is required to balance the goal of accuracy against the desire for simple (less fuzzy) representations.

Keywords: Data reduction, fuzzy classification, principal components analysis, linear unmixing models
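
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to read it, by analogy with linear unmixing. It assumes k classes are represented by k prototype points in the space of the first k-1 principal components, and that membership values are the affine weights (summing to 1) that reproduce each observation's principal component scores. The function name `pca_fuzzy_memberships`, the use of data rows as prototypes, and the random test data are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np

def pca_fuzzy_memberships(X, prototype_rows):
    """Sketch: memberships as affine weights over k prototypes in PCA space.

    X              : (n, p) attribute matrix
    prototype_rows : indices of k rows of X used as class prototypes
                     (a hypothetical choice; they must be affinely
                     independent in the first k-1 principal components)
    """
    X = np.asarray(X, dtype=float)
    k = len(prototype_rows)
    mean = X.mean(axis=0)

    # Principal axes from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    axes = Vt[:k - 1]                      # (k-1, p)
    scores = (X - mean) @ axes.T           # (n, k-1)
    proto_scores = scores[prototype_rows]  # (k, k-1)

    # For each observation, solve for weights m such that
    # m @ proto_scores equals its scores and sum(m) == 1.
    A = np.vstack([proto_scores.T, np.ones(k)])   # (k, k)
    B = np.vstack([scores.T, np.ones(len(X))])    # (k, n)
    memberships = np.linalg.solve(A, B).T         # (n, k)

    # Prototypes expressed back in the original attribute space.
    prototypes = mean + proto_scores @ axes       # (k, p)
    return memberships, prototypes

# Hypothetical data: 50 observations, 4 attributes, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
M, P = pca_fuzzy_memberships(X, [0, 1, 2])
X_hat = M @ P   # attributes reconstructed from memberships alone
```

Under these assumptions, `X_hat` coincides with the rank-(k-1) PCA reconstruction of `X`, which is the sense in which such a representation loses the least information in squared-error terms. The weights are not constrained to lie in [0, 1]; an observation outside the simplex spanned by the prototypes receives negative or greater-than-one values, which relates to the trade-off with simpler representations noted in the abstract.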

In: McRoberts, R. et al. (eds.). Proceedings of the joint meeting of The 6th International Symposium On Spatial Accuracy Assessment In Natural Resources and Environmental Sciences and The 15th Annual Conference of The International Environmetrics Society, June 28 – July 1, 2004, Portland, Maine, USA.
