Abstract: |
We propose a bagging strategy based on random Voronoi tessellations for the exploration of high-dimensional spatial data, suitable for different purposes (e.g. classification, regression, ...). The analysis is based on local representatives computed from neighbouring data: thanks to spatial dependence, the representatives are expected to be less noisy and less correlated than the original data, and thus to provide better performance. Moreover, this permits the handling of high-dimensional datasets that would otherwise be intractable without an explicit model for spatial dependence. Given a set of complex data (e.g. functional data) indexed by the N sites of a high-dimensional spatial lattice, and having chosen a proper measure of the distance between sites (depending on the application), the algorithm repeats the following steps M times: (i) generate a random n-dimensional Voronoi tessellation of the lattice, with n << N; (ii) identify a local representative for each of the n elements of the tessellation; (iii) use the sample of local representatives to meet the final purpose of the analysis, via a proper statistical technique; (iv) assign the result obtained for each local representative to all the sites belonging to the corresponding element of the tessellation. The M results obtained at each site of the lattice are finally pooled together. The algorithm also makes it possible to compute a local measure of uncertainty at each site. We explore the link between our algorithm and the results of Penrose (2007) on the coverage property of Voronoi tessellations. The performance of the algorithm has been tested on synthetic functional data, in the context of unsupervised classification; simulation results clearly show the existence of a bias-variance trade-off with respect to the dimension of the Voronoi tessellation, and open up new theoretical perspectives. Applications of the algorithm to real data are also reported, with a special focus on a clustering problem concerning worldwide annual irradiance patterns. |
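
The four replicated steps and the final pooling can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the nuclei are drawn uniformly among the sites, the distance between sites is Euclidean, the local representative is the cell-wise mean curve, the pooling is a majority vote, and the local uncertainty is the entropy of the vote frequencies. The names `bagging_voronoi` and `analyse` are hypothetical, and the thresholding rule used in the demo is a deliberately trivial stand-in for the "proper statistical technique" (it also sidesteps the label-matching problem that a real clustering step would raise across replicates).

```python
import numpy as np


def bagging_voronoi(coords, data, n, M, analyse, n_classes, rng):
    """Sketch of the replicated procedure.

    coords : (N, d) array of distinct site coordinates
    data   : (N, T) array, one discretised curve per site
    analyse: callable mapping (n, T) representatives to n integer labels
    """
    N = coords.shape[0]
    votes = np.zeros((N, n_classes), dtype=int)
    for _ in range(M):
        # Step 1: random tessellation. Nuclei are n sites drawn without
        # replacement; each site joins its nearest nucleus (Euclidean
        # distance is an assumption, the abstract leaves it open).
        nuclei = coords[rng.choice(N, size=n, replace=False)]
        dist = np.linalg.norm(coords[:, None, :] - nuclei[None, :, :], axis=2)
        cell = dist.argmin(axis=1)
        # Step 2: local representative = mean curve of the cell (assumption).
        sums = np.zeros((n, data.shape[1]))
        np.add.at(sums, cell, data)
        counts = np.bincount(cell, minlength=n)  # >= 1: a nucleus is a site
        reps = sums / counts[:, None]
        # Step 3: analyse the n representatives only.
        labels = analyse(reps)
        # Step 4: propagate each cell's result to all of its sites.
        votes[np.arange(N), labels[cell]] += 1
    # Pooling: majority vote over the M replicates.
    freq = votes / M
    pooled = freq.argmax(axis=1)
    # Local uncertainty: entropy of the vote frequencies at each site.
    with np.errstate(divide="ignore", invalid="ignore"):
        entropy = -np.where(freq > 0, freq * np.log(freq), 0.0).sum(axis=1)
    return pooled, entropy


# Synthetic demo: 20 x 20 lattice, two spatial regions, noisy curves.
rng = np.random.default_rng(0)
grid = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
coords = np.stack(grid, axis=-1).reshape(-1, 2).astype(float)
truth = (coords[:, 0] >= 10).astype(int)
level = np.where(truth == 1, 1.0, -1.0)
data = level[:, None] + rng.normal(scale=2.0, size=(coords.shape[0], 10))


def analyse(reps):
    # Placeholder rule: sign of the average level of each representative.
    return (reps.mean(axis=1) > 0).astype(int)


pooled, entropy = bagging_voronoi(coords, data, n=20, M=50,
                                  analyse=analyse, n_classes=2, rng=rng)
accuracy = (pooled == truth).mean()
```

In this toy setting the entropy map would be expected to peak near the boundary between the two regions, where cells straddling both regions produce disagreeing votes. Varying `n` is a simple way to probe the bias-variance trade-off mentioned above: few large cells average out more noise but blur the boundary, while many small cells do the opposite.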