GLOBE: Global Collaboration Engine - Global Representativeness Analysis

Global Representativeness Analysis

A publication on global representativeness analysis is cited below, under Citation for Representativeness Analysis.

Step-by-step instructions for using Global Representativeness Analysis in GLOBE are also available.

Global Representativeness Analysis assesses the degree to which a given collection of study sites represents an unbiased sample of a specified global extent based on a set* of global variables selected by the user.

Purpose: To assess the degree to which a given collection of case studies (a sample) is an unbiased sample of a given extent of Earth’s land, and potentially to remediate bias in case study samples. Representativeness analysis is designed primarily to assist researchers conducting meta-studies of existing local or regional case studies, and can also be used to detect understudied areas and target them for sampling.

Example: A peer review of your meta-study of tropical deforestation criticizes it as geographically biased: “the case studies you chose are in the most accessible areas of the tropics and do not represent the true global patterns of tropical deforestation, which are most common in the more remote areas of these woodlands.” The editor requires you to provide evidence that this is not the case. A Representativeness Analysis can provide a robust statistical test of whether the case studies in your meta-study are significantly more common in the most accessible areas of the tropics than a random sample would be.

Why use it: To quantify and avoid bias in meta-studies. Case studies are not conducted at random across Earth’s land. As a result, a given sample of existing case studies can be highly biased, overrepresenting or underrepresenting more accessible areas, wealthy areas, high population areas, the temperate zone, etc. To assess this bias, representativeness analysis quantifies the degree to which a given sample of study sites (collection) resembles a random sample across a specified global extent based on its coverage of a global variable.

Results enable meta-study researchers to assess the biases and limits of their case study samples, and potentially to improve them by selecting more samples in underrepresented areas of the world. Additional capabilities include reweighting the sample to make it more representative (increasing the weights of underrepresented case studies and decreasing those of overrepresented ones), and searching for case studies specifically within underrepresented areas.
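As a rough illustration of reweighting, the Python sketch below bins a single global variable and gives each case study a weight equal to the extent's area share of its bin divided by the sample's share of that bin, then rescales the weights to average 1. The bin edges, the function names `bin_index` and `reweight`, and the use of equal-area grid cells to describe the extent are hypothetical; the weighting scheme GLOBE itself uses is not described on this page.

```python
def bin_index(v, bin_edges):
    """Index of the half-open bin [edge_i, edge_i+1) containing v, or None."""
    for i in range(len(bin_edges) - 1):
        if bin_edges[i] <= v < bin_edges[i + 1]:
            return i
    return None


def reweight(sample_values, extent_values, extent_areas, bin_edges):
    """Hypothetical per-site weights: extent share / sample share of each
    site's bin, rescaled to average 1. Assumes every site lies inside the edges."""
    n_bins = len(bin_edges) - 1
    # Share of the extent's area falling in each bin of the global variable.
    q = [0.0] * n_bins
    for v, a in zip(extent_values, extent_areas):
        i = bin_index(v, bin_edges)
        if i is not None:
            q[i] += a
    total_area = sum(q)
    q = [x / total_area for x in q]
    # Share of case studies falling in each bin.
    site_bins = [bin_index(v, bin_edges) for v in sample_values]
    p = [site_bins.count(i) / len(site_bins) for i in range(n_bins)]
    # Overrepresented bins (p > q) are down-weighted, underrepresented ones up-weighted.
    raw = [q[i] / p[i] for i in site_bins]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


# Hypothetical extent: 100 equal-area cells with variable values 0..99.
extent_values = list(range(100))
extent_areas = [1.0] * 100
edges = [0, 25, 50, 75, 100]
weights = reweight([1, 2, 3, 60], extent_values, extent_areas, edges)
# weights ≈ [0.667, 0.667, 0.667, 2.0]
```

Here the lone site in bin 2 gets three times the weight of each site in the overrepresented bin 0. Bins 1 and 3 contain no case studies at all, so no weighting can restore those parts of the distribution; those are the areas where searching for new case studies is needed instead.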

How it works: Representativeness analysis is based on the principle that an unbiased sample of study sites should cover the variation in a global variable to the same degree that a random sample of the same size would. To assess this, the frequency distribution of the global variable across the sample is compared with that of a large set of random samples drawn across the specified global extent. The degree to which a sample collection is representative of the extent is quantified by how similar its frequency distribution is to that of the entire specified global extent, using one of several f-divergence indicators.
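A minimal Python sketch of this comparison follows. It assumes the extent is described by grid cells, each with a value of the global variable and an area, and that the variable is binned with fixed edges. Total variation distance and Kullback-Leibler divergence stand in as two examples of f-divergences; the function names, binning, and example data are hypothetical, and GLOBE may bin the data and compute its indicators differently.

```python
import math


def histogram(values, bin_edges, weights=None):
    """Relative frequency of `values` in half-open bins [edge_i, edge_i+1).

    `weights` (e.g. grid-cell areas) default to 1; values outside the edges
    are ignored.
    """
    if weights is None:
        weights = [1.0] * len(values)
    totals = [0.0] * (len(bin_edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                totals[i] += w
                break
    s = sum(totals)
    return [t / s for t in totals]


def total_variation(p, q):
    """Total variation distance (f-divergence with f(t) = |t - 1| / 2); 0 means identical."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); infinite if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


# Hypothetical data: values of a global variable (e.g. travel time to the
# nearest city, in hours) at five equal-area grid cells and four case studies.
edges = [0, 10, 20, 30]
extent_values = [2, 5, 12, 25, 28]
extent_areas = [1.0] * 5
sample_values = [1, 3, 4, 15]

p = histogram(sample_values, edges)                # [0.75, 0.25, 0.0]
q = histogram(extent_values, edges, extent_areas)  # [0.4, 0.2, 0.4]
print(total_variation(p, q))  # about 0.4
print(kl_divergence(p, q))    # about 0.527
```

Lower values mean the case studies are spread across the variable more like the extent as a whole. The Kullback-Leibler divergence is asymmetric: in the other direction, D(q || p), it is infinite here because the sample has no sites in the 20 to 30 bin, which is one reason a bounded measure such as total variation can be easier to interpret.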

The probability that random samples of the same size attain a given level of representativeness can be compared against the representativeness of the collection: a collection that is about as representative as the median random sample is not significantly biased, while one that is less representative than most random samples shows bias. The degree to which the sample over- or underrepresents different areas is mapped, and the global extents of well-represented, underrepresented and overrepresented areas are calculated as a total area (km²) and as a percentage of the specified global extent.
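The comparison against random samples and the area summary can be sketched in Python as follows. This assumes random samples of the same size as the collection, drawn with probability proportional to grid-cell area, and total variation distance as the divergence. `random_sample_rank`, `area_summary`, and the ±25% tolerance for calling a bin well represented are hypothetical choices for illustration; GLOBE's actual sampling procedure, thresholds, and mapping may differ.

```python
import random


def histogram(values, bin_edges, weights=None):
    """Relative frequency of `values` in half-open bins [edge_i, edge_i+1)."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = [0.0] * (len(bin_edges) - 1)
    for v, w in zip(values, weights):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                totals[i] += w
                break
    s = sum(totals)
    return [t / s for t in totals]


def total_variation(p, q):
    """Total variation distance, one f-divergence; 0 means identical distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def random_sample_rank(sample_values, extent_values, extent_areas, bin_edges,
                       n_random=1000, seed=0):
    """Fraction of area-weighted random samples (same size as the collection)
    that diverge from the extent at least as much as the collection does.
    Values near 0 indicate bias; values around 0.5 or above mean the collection
    is about as representative as a typical random sample."""
    rng = random.Random(seed)
    q = histogram(extent_values, bin_edges, extent_areas)
    observed = total_variation(histogram(sample_values, bin_edges), q)
    as_divergent = 0
    for _ in range(n_random):
        draw = rng.choices(extent_values, weights=extent_areas, k=len(sample_values))
        if total_variation(histogram(draw, bin_edges), q) >= observed:
            as_divergent += 1
    return as_divergent / n_random


def area_summary(sample_values, extent_values, extent_areas, bin_edges, tol=0.25):
    """Area and percentage of the extent in under-, well- and over-represented
    bins. A bin is well represented if its share of case studies is within
    +/- tol (relative) of its share of the extent's area. Assumes positive areas."""
    p = histogram(sample_values, bin_edges)
    q = histogram(extent_values, bin_edges, extent_areas)
    totals = {"under": 0.0, "well": 0.0, "over": 0.0}
    for v, a in zip(extent_values, extent_areas):
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                ratio = p[i] / q[i]  # q[i] > 0 since this cell falls in bin i
                if ratio < 1 - tol:
                    totals["under"] += a
                elif ratio > 1 + tol:
                    totals["over"] += a
                else:
                    totals["well"] += a
                break
    extent_area = sum(extent_areas)
    return {k: (area, 100.0 * area / extent_area) for k, area in totals.items()}


# Hypothetical extent: 100 equal-area cells (1 km² each) with values 0..99.
extent_values = list(range(100))
extent_areas = [1.0] * 100
edges = [0, 25, 50, 75, 100]
clustered = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # all sites in the lowest bin

print(random_sample_rank(clustered, extent_values, extent_areas, edges))  # close to 0
print(area_summary(clustered, extent_values, extent_areas, edges))
# {'under': (75.0, 75.0), 'well': (0.0, 0.0), 'over': (25.0, 25.0)}
```

Classifying bins of the variable rather than individual sites means every grid cell inherits its bin's class, so the same ratios can be drawn directly as a map of over- and underrepresented areas.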

* At this time, Representativeness Analysis is limited to a single global variable. Multivariate Representativeness Analysis is planned for a future update.

Citation for Representativeness Analysis

Schmill, Matthew D., Gordon, Lindsey M., Magliocca, Nicholas R., Ellis, Erle C. and Oates, Tim. 2014. GLOBE: Analytics for Assessing Global Representativeness. In COM.Geo ’14: Proceedings of the 5th International Conference on Computing for Geospatial Research and Applications. Washington, DC.
