Field validation of secondary data sources for enumerating retail tobacco outlets in a state without tobacco outlet licensing
Introduction
Access to supermarkets, convenience stores, and recreational facilities has been associated with smoking (Henriksen et al., 2008), obesity (Lovasi et al., 2009), and physical activity (Gordon-Larsen et al., 2006) and may create an environment that either enhances or diminishes a resident's ability to make health-promoting choices. The number, types and locations of retail outlets are often proxies for access to tobacco products (Henriksen et al., 2008), food (Larson et al., 2009), or places to be physically active (Boone-Heinonen et al., 2010). For example, studies have found that youth living in communities with comparatively higher retail tobacco outlet density were more likely to use tobacco and that living near tobacco outlets made it more difficult for adults to quit smoking (Henriksen et al., 2008, Reitzel et al., 2011). Lower-income and racial/ethnic minority neighborhoods have disproportionately higher exposure to retail tobacco outlets (Hyland et al., 2003, Fakunle et al., 2010, Peterson et al., 2005, Rodriguez et al., 2013, Schneider et al., 2005), potentially contributing to higher tobacco use among these groups (Frieden, 2011). To further our understanding of how the tobacco retail environment influences tobacco use, valid data sources are needed to enumerate tobacco outlets and to accurately identify areas with increased exposure to tobacco products.
In the United States (US), tobacco retail licensing data are often used to calculate tobacco outlet density (Fakunle et al., 2010, Henriksen et al., 2008, Lipperman-Kreda et al., 2012, Hyland et al., 2003, Peterson et al., 2005, Schneider et al., 2005). Yet licensing lists may be unavailable to researchers, and 13 states do not require tobacco retailer licensing (CDC, 2012), making such estimation difficult. The quality of the sampling list used for US Food and Drug Administration (FDA) compliance checks or to enforce youth tobacco access laws determines whether and how many tobacco outlets will be missed. A state without a licensing list as a starting point may create a sampling frame from state or local business lists, statewide retail license/permit lists or statewide liquor license/permit lists.
Over the last decade, obesity researchers have increasingly relied on secondary data sources (e.g., ReferenceUSA or government food registries) to enumerate retail food and recreational environments. They have linked information on the location of food and activity resources to neighborhood characteristics to understand the impact on weight status and disparities in obesity faced by lower income, certain racial/ethnic groups and rural communities (Powell et al., 2011, Fleischhacker et al., 2011).
Although primary data collection is the most accurate approach (Hosler and Dharssi, 2010, Sharkey, 2009), it is resource intensive. “Ground-truthing,” or identifying outlets through a systematic field canvass of a targeted study area without using secondary data sources, may be feasible in small cities or counties, but is daunting in larger areas (Fleischhacker et al., 2013). Researchers or state-level staff may need to rely on secondary data sources to enumerate larger study areas. For example, tobacco outlets may be located anywhere on the over 40,000 miles of primary and secondary roads in Kentucky or Virginia, neither of which has tobacco retailer licensing. Commercial secondary data sources have several benefits compared with primary data collection: they can be searched by establishment type (e.g., convenience stores), provide telephone numbers and addresses to aid in the verification process, and are typically less expensive than primary data collection.
Since grocery and convenience stores also sell tobacco products (Hosler and Kammer, 2012), similar methods could potentially help identify tobacco outlets. While numerous studies have examined the validity of secondary data sources for enumerating food outlets (Fleischhacker et al., 2013), no studies, to our knowledge, have examined the validity of secondary data sources for enumerating tobacco outlets. One study in Chicago estimated tobacco outlet density by gathering primary data (Novak et al., 2006), and another identified 88% of the outlets on Washington State's licensing list using a secondary data source without conducting primary data collection (Rodriguez et al., 2013).
The purpose of this study is to provide evidence-informed guidance on whether secondary data sources are a reasonable alternative to primary data collection in order to enumerate the tobacco retail environment. A second purpose is to examine whether secondary data sources allocate outlets to the correct census tract, and to compare tobacco outlet density calculated by primary and secondary sources, particularly in jurisdictions that do not have a comprehensive list of tobacco outlets.
Study area
The study area, described previously (Rose et al., 2013), included three geographically diverse counties in North Carolina (NC), USA, a state without tobacco retail licensing. Buncombe County, which includes the Asheville, NC Metro Area, has a median household income of $44,190 and a population that is 6.4% African American, and it encompasses 656.7 square miles in Appalachia. Durham County is more urban and includes the Durham–Chapel Hill, NC Metro Area, has a median household income of $49,894, 38.0% of
Results
Primary data collection identified 662 tobacco outlets (Table 2). Teams added 73 of those outlets in the field because they were not identified by either secondary data source. Convenience stores with gas stations were the most common type of tobacco outlet (44.9%), followed by convenience stores (15.4%), supermarkets (15.1%), pharmacies (11.9%) and tobacco stores (5.7%).
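The two counts above are enough to work out how much of the field-verified total appeared on at least one secondary list. The sketch below is a minimal illustration, not the authors' analysis code: the function and variable names are hypothetical, and it assumes that every outlet not added by the field teams was present on at least one of the two lists.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of gold-standard outlets that a data source captured."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no gold-standard outlets to compare against")
    return true_positives / total


# Counts reported in the text above.
outlets_verified_in_field = 662
outlets_missed_by_both_lists = 73

# Assumption: every outlet that was not added in the field
# appeared on at least one of the two secondary lists.
outlets_on_either_list = outlets_verified_in_field - outlets_missed_by_both_lists

combined_sensitivity = sensitivity(outlets_on_either_list, outlets_missed_by_both_lists)
print(
    f"{outlets_on_either_list} of {outlets_verified_in_field} outlets "
    f"({combined_sensitivity:.1%}) appeared on at least one list"
)
```

Under that assumption, 589 of the 662 outlets, or about 89%, were captured by the combined lists.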
ReferenceUSA identified 971 probable tobacco outlets; 761 remained after cleaning the lists and applying exclusions (i.e.
Discussion
We examined the evidence for validity reported for two commercial secondary tobacco outlet data sources using primary data collection to ascertain their utility in identifying tobacco outlets in non-licensing states. Combined, ReferenceUSA and D&B identified nearly 90% of the 662 tobacco outlets in the study area. ReferenceUSA had a higher sensitivity than D&B at identifying both probable and actual tobacco outlets. In states without tobacco retail licensing, combining ReferenceUSA and D&B
Conclusions
To our knowledge, this is the first study to report evidence for validity of secondary data sources for identifying probable and actual tobacco outlets using primary data. Although ReferenceUSA and D&B undercounted the true number of tobacco outlets, combining the two secondary data sources resulted in the enumeration of nearly 90% of all tobacco outlets in the study area. Both lists were correlated with actual tobacco outlet density. In North Carolina and perhaps other non-licensing states,
Acknowledgments
Funding for this study was provided by the University Cancer Research Fund to UNC Lineberger Comprehensive Cancer Center at UNC Chapel Hill. Funding was also provided by a grant from the National Cancer Institute to Dr. Ribisl (1U01CA154281). Shyanika Rose received funding from the UNC Lineberger Comprehensive Cancer Center at UNC Chapel Hill (R25 CA57726). The funders had no involvement in the study design, collection, analysis, writing, or interpretation. Lisa Isgett geocoded addresses and
References (37)
- et al. Validation of a GIS facilities database: quantification and implications of error. Ann. Epidemiol. (2008)
- et al. Validity of secondary retail food outlet data: a systematic review. Am. J. Prev. Med. (2013)
- et al. Is adolescent smoking related to the density and proximity of tobacco outlets and retail cigarette advertising near schools? Prev. Med. (2008)
- et al. Identifying retail food stores to evaluate the food environment. Am. J. Prev. Med. (2010)
- et al. Local tobacco policy and tobacco outlet density: associations with youth smoking. J. Adolesc. Health (2012)
- et al. Business list vs ground observation for measuring a food environment: saving time or waste of time (or worse)? J. Acad. Nutr. Diet. (2013)
- et al. Field validation of secondary commercial data sources on the retail food outlet environment in the U.S. Health Place (2011)
- Measuring potential access to food stores and food-service places in rural areas in the U.S. Am. J. Prev. Med. (2009)
- (1991)
- et al. Built and socioeconomic environments: patterning and associations with physical activity in US adolescents. Int. J. Behav. Nutr. Phys. Act. (2010)
- State Tobacco Activities Tracking and Evaluation (STATE) System
- Secondary source validity for enumerating tobacco retailers in non-licensing states. Natl. Conf. Tob. Health
- The DUNS right quality process: the power behind quality information
- The importance of income in the link between tobacco outlet density and demographics at the tract level of analysis in New Jersey. J. Ethnicity Subst. Abuse
- A systematic review of fast food access studies. Obes. Rev.
- Evidence for validity of five secondary data sources for enumerating retail food outlets in seven American Indian Communities in North Carolina. Int. J. Behav. Nutr. Phys. Act.
- Inequality in the built environment underlies key health disparities in physical activity and obesity. Pediatrics