Mining Spatial Association Rules with Geostatistics

Jiangping Chen 1, 2+ and Xiaojin Tan 3

1 School of Remote Sensing Information Engineering, Wuhan University, Wuhan, Hubei, PR China 430079
2 Department of Geography, University of Cambridge, Downing Place, Cambridge, UK, CB2 3EN
3 International School of Software, Wuhan University, Wuhan 430079, China

Abstract. In 1962, G. Matheron introduced the term geostatistics to describe a scientific approach to evaluating problems in geology and mining, from ore reserve estimation to grade control. Geostatistics provides statistical methods for describing spatial relationships among sample data and for applying this analysis to the prediction of spatial and temporal phenomena. These methods are used to explain spatial patterns and to interpolate values at unsampled locations. Geostatistics has traditionally been used in the geosciences: meteorology, mining, soil science, forestry, fisheries, remote sensing, and cartography. It was later applied successfully to economics, health, and other disciplines. There is currently a trend towards integrating the powerful methods of geostatistics into geographic information systems (GIS). This paper puts forward a new algorithm for mining association rules with geostatistics, applied to the analysis of an epidemic problem. A key feature of epidemic data is their location in a space-time continuum. Geostatistics is independent of the mean-variance relationship and can therefore be used to verify more traditional methods of evaluating inner spatial structure. During structural analysis, spatial autocorrelation can be analyzed using the covariance and the semivariogram. Following structural analysis, predictions at unsampled locations can be made using geostatistical methods such as kriging (i.e. multiple linear regression in a spatial context). Geostatistical analysis can interpret statistical distributions of data and also examine spatial relationships. It is capable of revealing how cohesion values vary over distance and of predicting areas of high and low cohesion values. Geostatistics software provides tools for capturing maximum information on a phenomenon from sparse, often biased, and often under-sampled data. Because it takes account of the autocorrelation among spatial data, geostatistics is well suited to spatial data mining.
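As an illustration of the structural analysis mentioned above, the following is a minimal sketch of an empirical semivariogram, which averages half the squared difference between sample values over pairs of locations grouped by separation distance. The function name, the distance bins, and the toy data are assumptions made for this example; they are not taken from the paper.

```python
from itertools import combinations
from math import dist


def empirical_semivariogram(coords, values, bin_edges):
    """Return gamma(h) per distance bin: half the mean squared difference
    between values at pairs of points whose separation falls in that bin.
    Bins with no pairs yield None. (Illustrative helper, not from the paper.)"""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        h = dist(coords[i], coords[j])  # Euclidean separation of the pair
        for b in range(n_bins):
            if bin_edges[b] <= h < bin_edges[b + 1]:
                sums[b] += (values[i] - values[j]) ** 2
                counts[b] += 1
                break
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


# Toy data (assumed): four samples on a line with a linear trend.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [0.0, 1.0, 2.0, 3.0]
gamma = empirical_semivariogram(coords, values, [0.5, 1.5, 2.5, 3.5])
print(gamma)
```

In this toy case the semivariance grows with distance, which is the kind of distance-dependent structure that a variogram model would be fitted to before kriging.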
In this paper, the first step is to use geostatistical methods such as kriging and the Spatial Autoregressive Model (SAR) to analyse and estimate the correlation between land use/cover change and hay fever incidence. A spatial autocorrelation model is then built and used to mine the spatial association rules. The spatial frequent items can be obtained from the autocorrelation model, which replaces the repeated scanning of the spatial database required by conventional spatial association rule mining. The results of the example show that the method is quicker and more efficient than the traditional data mining algorithm Apriori.
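For context on the baseline, the sketch below shows a minimal Apriori-style frequent itemset search that counts how many passes it makes over the transaction list. It illustrates only the repeated scanning attributed to the conventional approach, not the method proposed in the paper. The item names (such as `near_grassland`) and the support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return (frequent itemsets with their support, number of database scans).
    Plain level-wise Apriori: one full scan of the transactions per itemset size."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    scans = 0
    k = 1
    while candidates:
        scans += 1  # every level re-reads all transactions
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical spatial predicates per location (not data from the paper).
transactions = [
    {"near_grassland", "high_pollen", "hay_fever_high"},
    {"near_grassland", "high_pollen", "hay_fever_high"},
    {"near_grassland", "hay_fever_high"},
    {"urban", "hay_fever_low"},
]
frequent, scans = apriori(transactions, min_support=0.5)
print(scans, len(frequent))
```

The scan counter makes the cost visible: each increase in itemset size requires another full pass over the data, which is the overhead the abstract says the autocorrelation model avoids.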

Keywords: geographical information science, statistical analysis, spatial autocorrelation, geostatistics, spatial association rule

In: Wan, Y. et al. (eds) Proceedings of the 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, World Academic Union (Press).
