Quantifying the magnitude of environmental exposure misclassification when using imprecise address proxies in public health research
Introduction
Recent advances in the analytical capacity of desktop geographic information system (GIS) software, combined with the increasing availability of spatially-referenced health and environmental data in digital format, have created new opportunities for making breakthroughs in spatial epidemiology (Zandbergen, 2008). As digital mapping is an abstraction of reality, the spatial data used for visualizing and analyzing geographic phenomena will always be inaccurate to some degree. Such inaccuracies can be compounded when spatially aggregated units are used as locational proxies for mapping and analyzing spatial relationships, rather than more precise geographic locations. In environmental and public health research, it is common to use proxies for sample unit locations, such as centroids of postal/zip codes, census tracts, dissemination areas, blocks, or lots; however, it is very uncommon for studies to address, or even mention, the potential problems ensuing from the positional discrepancies associated with using imprecise address proxies. It is the responsibility of the researcher to identify, quantify, interpret, and attempt to reduce any errors associated with using particular spatial data and locational proxies, so that they do not interfere with any conclusions and recommendations to be made from the findings (Fotheringham, 1989, Anselin, 2006).
Researchers in spatial epidemiology have long been concerned about the absolute or relative spatial accuracy of the address points used to map sample populations or phenomena within a GIS (Goldberg, 2008). Numerous researchers have examined the ‘positional errors’ which occur when the address from a database is located on a digital map, but the point is not located at the true position of the address (Cayo and Talbot, 2003, Ward et al., 2005, Schootman et al., 2007, Strickland et al., 2007, Zandbergen and Green, 2007, Jacquez and Rommel, 2009). In many previous studies, positional errors are reported as Euclidean distance errors, or as errors in the X and Y dimensions. While much has been said about positional errors, much less has been said about how study results might be affected when researchers use spatially aggregated units (which themselves might be positionally accurate) as address proxies. Very few studies measure and compare the positional discrepancies between address proxies and the exact addresses they are used to represent (Bow et al., 2004).
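To make the two ways of reporting positional error concrete, the following is a minimal sketch, assuming both points are expressed in a projected coordinate system measured in metres. The function name `positional_error` and the sample coordinates are hypothetical and are not taken from the study.

```python
import math


def positional_error(geocoded, true_location):
    """Return (dx, dy, euclidean) for two (x, y) points.

    Assumes projected coordinates in metres; the result is in metres too.
    """
    dx = geocoded[0] - true_location[0]
    dy = geocoded[1] - true_location[1]
    # math.hypot gives the straight-line (Euclidean) distance.
    return dx, dy, math.hypot(dx, dy)


# Hypothetical example: a geocoded point 30 m east and 40 m north of the
# true address location.
dx, dy, dist = positional_error((1030.0, 2040.0), (1000.0, 2000.0))
print(dx, dy, dist)  # 30.0 40.0 50.0
```

Reporting the X and Y components separately preserves the direction of the error, whereas the Euclidean distance collapses it into a single magnitude.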
A major area of investigation in the fields of spatial epidemiology, health geography, and public health attempts to assess the levels of accessibility or ‘exposure’ of subject populations to elements in their local environments that are believed to be health-promoting or health-damaging, and are related to certain health-related behaviors or outcomes. Accessibility is typically measured in relation to the distance between subject populations and selected environmental features, and is often operationalized as a binary variable (i.e., accessible/inaccessible, exposed/not exposed) or a density variable (i.e., number of sites within, volume of contaminant within) in relation to an areal unit or ‘buffer’ of a certain threshold distance (radius) around the subject’s address. There is much variability, but unfortunately not much debate, regarding the particular threshold distances to be used in accessibility studies; however, most authors do attempt to justify their choice of threshold distances based on human behavior (e.g. ‘walking distance’) or on some characteristic of the contaminant source (e.g. 150 m from a roadway). The chosen accessibility thresholds also typically vary by study population (e.g. children vs. adults), setting (e.g. urban vs. rural), and health-related outcome (e.g. physical activity vs. asthma). In their study of the environmental influences on whether or not a child will walk or bike to school, for example, Larsen and colleagues (2009) justify the choice of a 1600 m neighborhood buffer based on the local school board cut-off distance for providing school bus service (see also Schlossberg et al., 2006, Muller et al., 2008, Brownson et al., 2009, Panter et al., 2009).
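The binary and density formulations described above can be sketched as follows. This is an illustration only, assuming straight-line (circular) buffers and projected coordinates in metres; the function names and the sample coordinates are hypothetical.

```python
import math


def distances_to_features(home, features):
    """Straight-line distance (m) from a home location to each feature."""
    return [math.dist(home, f) for f in features]


def is_accessible(home, features, threshold_m):
    """Binary measure: is at least one feature within the buffer radius?"""
    return any(d <= threshold_m for d in distances_to_features(home, features))


def feature_density(home, features, threshold_m):
    """Density measure: how many features fall within the buffer radius?"""
    return sum(d <= threshold_m for d in distances_to_features(home, features))


# Hypothetical home and park locations (projected coordinates, metres).
home = (0.0, 0.0)
parks = [(300.0, 300.0), (600.0, 0.0), (0.0, 450.0)]

print(is_accessible(home, parks, 500))    # True
print(feature_density(home, parks, 500))  # 2
```

Both measures depend on the same distance calculation, so any positional discrepancy in the home location propagates to both; the binary measure is sensitive only when the discrepancy moves a feature across the threshold.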
Studies which have focussed on access to neighborhood resources such as public parks and recreation spaces have utilized a variety of threshold distances, typically between 400 and 1600 m (compare Lee et al., 2007, Bjork et al., 2008, Tucker et al., 2008, Maroko et al., 2009); however, we submit that a threshold distance of 500 m is ideal, as it represents a short 5–7 min walk and is therefore easily accessible for populations of all ages (see Tucker et al., 2008, Sarmiento et al., 2010, Wolch et al., 2010). The 5–7 min walk zone, as represented by the 500 m buffer around a home or public school, is also a common distance used in studies exploring the relationship between access to junk food and obesity (see Austin et al., 2005, Morland and Evenson, 2009, Gilliland, 2010). Studies of ‘food deserts’ (disadvantaged areas with poor access to retailers of healthy and affordable food) and the potential impact of poor access to grocery stores on dietary habits and obesity have tended to focus on longer distances (800 m or greater), which vary according to urban vs. rural setting (see Wang et al., 2007, Larsen and Gilliland, 2008, Pearce et al., 2008, Sharkey, 2009, Sadler et al., 2011). For the purpose of this analysis, we focus on 1000 m, or the 10–15 min walk zone around a grocery store, as has been identified in previous studies of food deserts in Canadian cities (Apparicio et al., 2007, Larsen and Gilliland, 2008). Explorations of how distance from a patient’s home to emergency services available at hospitals is associated with increased risk of mortality are more likely to use much larger threshold distances than standard ‘walk zones’ (e.g. greater than 5 km) (see Jones et al., 1997, Cudnick et al., 2010, Nicholl et al., 2007, Acharya et al., 2011). Nicholl and colleagues (2007), for example, discovered that a 10 km increase in straight-line distance to hospital is associated with a 1% increase in mortality. As hospitals tend to be regional rather than neighborhood facilities, we will use the threshold distance of 10 km for our analyses.
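As a quick check on the walk zones quoted above, the walking speeds they imply can be worked out directly. This is a small sketch: the distances and walk times come from the text, while the function name `implied_speed_range` is hypothetical.

```python
def implied_speed_range(distance_m, min_minutes, max_minutes):
    """Return (slowest, fastest) walking speed in m/s implied by a walk zone."""
    slowest = distance_m / (max_minutes * 60)
    fastest = distance_m / (min_minutes * 60)
    return slowest, fastest


# Walk zones as stated in the text: (distance in m, min minutes, max minutes).
walk_zones = {
    "500 m buffer (5-7 min walk)": (500, 5, 7),
    "1000 m buffer (10-15 min walk)": (1000, 10, 15),
}

for label, (distance_m, lo, hi) in walk_zones.items():
    slowest, fastest = implied_speed_range(distance_m, lo, hi)
    print(f"{label}: {slowest:.2f} to {fastest:.2f} m/s")
```

The two zones imply broadly similar walking speeds (roughly 1.1 to 1.7 m/s), so the 500 m and 1000 m thresholds are consistent with one another as 'walking distance' measures.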
Rushton and colleagues (2006) have argued that when short distances between subject population and environmental features are associated with health effects in epidemiologic studies, the geocoding result must have a positional accuracy that is sufficient to resolve whether such effects are truly present. The purpose of this study is to quantify the magnitude of the positional discrepancies, in terms of distance errors and accessibility misclassification, that result from using several commonly used address proxies in public health research. Positional errors have been shown to vary greatly by setting (Bonner et al., 2003, Cayo and Talbot, 2003, Ward et al., 2005); therefore, we quantify errors by multiple neighborhood types: urban, suburban, small town, and rural. We also attempt to ascribe ‘meaning’ to these errors for spatial epidemiologic studies by examining errors in distance and accessibility misclassification with respect to several health-related features, including hospitals, public recreation facilities, schools, grocery stores, and junk food retailers.
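One way such a comparison could be set up is sketched below. It is not the authors' procedure: it assumes that 'distance error' means the absolute difference between the nearest-facility distance measured from the proxy and from the true location, that distances are straight-line, and that coordinates are projected in metres. All names and sample records are invented for illustration and are not study data.

```python
import math
from statistics import median


def nearest_distance(point, features):
    """Straight-line distance (m) from a point to its nearest feature."""
    return min(math.dist(point, f) for f in features)


def summarize_by_type(records, features, threshold_m):
    """Median distance error and misclassification rate per neighborhood type.

    Each record is a dict with keys 'type', 'true' and 'proxy', where
    'true' and 'proxy' are (x, y) tuples.
    """
    by_type = {}
    for r in records:
        d_true = nearest_distance(r["true"], features)
        d_proxy = nearest_distance(r["proxy"], features)
        entry = by_type.setdefault(r["type"], {"errors": [], "misclassified": 0})
        entry["errors"].append(abs(d_proxy - d_true))
        # Misclassified when proxy and true location fall on opposite
        # sides of the accessibility threshold.
        if (d_true <= threshold_m) != (d_proxy <= threshold_m):
            entry["misclassified"] += 1
    return {
        t: {
            "median_distance_error_m": median(e["errors"]),
            "misclassification_rate": e["misclassified"] / len(e["errors"]),
        }
        for t, e in by_type.items()
    }


# Invented records: one facility at the origin and a 500 m threshold.
facilities = [(0.0, 0.0)]
records = [
    {"type": "urban", "true": (0.0, 480.0), "proxy": (0.0, 490.0)},
    {"type": "urban", "true": (0.0, 495.0), "proxy": (0.0, 510.0)},
    {"type": "rural", "true": (0.0, 400.0), "proxy": (0.0, 900.0)},
    {"type": "rural", "true": (0.0, 2000.0), "proxy": (0.0, 2300.0)},
]
summary = summarize_by_type(records, facilities, 500)
print(summary)
```

In this toy data a small distance error near the threshold (the second urban record) changes the accessibility classification, while a much larger error far from the threshold (the second rural record) does not, which is why the two measures are worth reporting separately.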
Study area and data
The City of London (population 350,200) and Middlesex County (population 69,024) in Southwestern Ontario, Canada are ideal study areas for examining the geocoding errors in accessibility studies as they encompass a mix of urban, suburban, small town, and rural agricultural areas (Statistics Canada, 2011) (see Fig. 1). The study area was categorized into four neighborhood types as follows: (1) urban areas correspond to neighborhoods in the City of London built primarily before World War II; (2)
Magnitude of positional discrepancies
In almost every case, urban neighborhoods show the smallest median distance error for all address proxies, followed successively by suburban, small town, and rural areas (see Table 1). As expected, of the proxies we examined in relation to nearest distance to health-related facilities, lot centroids were the most accurate proxy for precise residential dwelling location, with the median positional discrepancy (50th percentile) between lot centroids and dwelling centroids equal to 6–9 m for locations in
Discussion
It is common in public health research to use spatially aggregated units as address proxies for the locations of subjects and facilities when more precise address information is unavailable. It is rare, however, for public health researchers to examine, or even mention, the potential distance and misclassification errors resulting from the positional discrepancies between the locations of imprecise address proxies and precise subject locations. It is inappropriate for researchers to ignore
References (46)
et al. Distance from home to hospital and thrombolytic utilization for acute ischemic stroke. J Stroke Cerebrovasc Dis (2011)
How (not) to lie with spatial statistics. Am J Prev Med (2006)
et al. Obesity prevalence and the local food environment. Health Place (2009)
et al. Travel-to-school mode choice modelling and patterns of school choice in urban areas. J Transport Geog (2008)
et al. Geocoding in Cancer Research. Am J Prev Med (2006)
et al. Positional accuracy and geographic bias of four methods of geocoding in epidemiologic research. Ann Epidemiol (2007)
Measuring potential access to food stores and food-service places in rural areas in the US. Am J Prev Med (2009)
A comparison of address point, parcel and street geocoding techniques. Comput Environ Urban Syst (2008)
et al. The case of Montréal’s missing food deserts: evaluation of accessibility to food supermarkets. Int J Health Geogr (2007)
et al. Comparing alternative approaches to measuring the geographical accessibility of urban health services: distance types and aggregation-error issues. Int J Health Geogr (2008)
Clustering of fast-food restaurants around schools: a novel application of spatial statistics to the study of food environments. Am J Public Health
Recreational values of the natural environment in relation to neighbourhood satisfaction, physical activity, obesity and wellbeing. J Epidemiol Community Health
Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology
Accuracy of city postal code coordinates as a proxy for location of residence. Int J Health Geogr
Measuring the built environment for physical activity: state of the science. Am J Prev Med
Positional error in automated geocoding of residential addresses. Int J Health Geogr
Parcels, buildings, address points, and health facilities GIS files [DVD]
A geospatial assessment of transport distance and survival to discharge in out of hospital cardiac arrest patients: Implications for resuscitation centers. Resuscitation
Scale-independent spatial analysis
Estimating the accuracy of geographical imputation. Int J Health Geogr