Elsevier

Health & Place

Volume 28, July 2014, Pages 38-44

Field validation of secondary data sources for enumerating retail tobacco outlets in a state without tobacco outlet licensing

https://doi.org/10.1016/j.healthplace.2014.03.006

Highlights

  • We conducted primary data collection to identify all tobacco retail outlets in three counties in a state without tobacco retail licensing.

  • Evidence for validity reported for two commercial data sources.

  • Nearly 90% of tobacco outlets were identified by combining secondary data sources.

  • More than 90% of the outlets identified by ReferenceUSA were geocoded to the correct census tract.

Abstract

Identifying tobacco retail outlets for U.S. FDA compliance checks or calculating tobacco outlet density is difficult in the 13 states without tobacco retail licensing or where licensing lists are unavailable for research. This study uses primary data collection to identify tobacco outlets in three counties in a non-licensing state and to validate two commercial secondary data sources. We calculated sensitivity and positive predictive values (PPV) to examine the evidence of validity for the two secondary data sources, and conducted a geospatial analysis to determine whether outlets were allocated to the correct census tract. ReferenceUSA had almost perfect sensitivity (0.82), while Dun & Bradstreet (D&B) had substantial sensitivity (0.69) for identifying tobacco outlets; combined, sensitivity improved to 0.89. D&B identified fewer "false positives," with a PPV of 0.82 compared with 0.71 for ReferenceUSA. More than 90% of the outlets identified by ReferenceUSA were geocoded to the correct census tract. Combining the two commercial data sources resulted in enumeration of nearly 90% of tobacco outlets in the three-county area. Commercial databases appear to provide a reasonably accurate way to identify tobacco outlets for enforcement operations and density estimation.
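Sensitivity and PPV follow their standard definitions: sensitivity is the share of field-verified outlets that a list contains, and PPV is the share of listed outlets that turn out to be tobacco outlets. The minimal Python sketch below computes both for two lists and for their union. It assumes listings have already been matched to field-verified outlets by a shared ID, and every outlet ID is hypothetical toy data, not study data.

```python
# Minimal sketch: sensitivity and positive predictive value (PPV) of secondary
# outlet lists against a field-verified ("ground-truth") set of outlets.
# Assumes listings were already matched to field records by a shared ID;
# all IDs are hypothetical toy data.


def sensitivity(truth: set, listed: set) -> float:
    """True positives / all field-verified outlets, i.e. TP / (TP + FN)."""
    return len(truth & listed) / len(truth)


def ppv(truth: set, listed: set) -> float:
    """True positives / all listed outlets, i.e. TP / (TP + FP)."""
    return len(truth & listed) / len(listed)


truth = {f"O{i}" for i in range(1, 11)}  # 10 outlets found by the field canvass
list_a = {"O1", "O2", "O3", "O4", "O5", "O6", "O7", "O8", "X1", "X2", "X3"}
list_b = {"O1", "O2", "O3", "O4", "O9", "X4"}  # "X" IDs are false positives

for name, listed in [("List A", list_a), ("List B", list_b),
                     ("Combined", list_a | list_b)]:
    print(f"{name}: sensitivity={sensitivity(truth, listed):.2f}, "
          f"PPV={ppv(truth, listed):.2f}")
```

With these toy lists the union raises sensitivity to 0.90 while PPV drops to about 0.69, illustrating why combining sources can improve coverage while also carrying forward the false positives of both lists.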

Introduction

Access to supermarkets, convenience stores, and recreational facilities has been associated with smoking (Henriksen et al., 2008), obesity (Lovasi et al., 2009), and physical activity (Gordon-Larsen et al., 2006) and may create an environment that either enhances or diminishes a resident's ability to make health-promoting choices. The number, types, and locations of retail outlets are often used as proxies for access to tobacco products (Henriksen et al., 2008), food (Larson et al., 2009), or places to be physically active (Boone-Heinonen et al., 2010). For example, studies have found that youth living in communities with comparatively higher retail tobacco outlet density were more likely to use tobacco, and that living near tobacco outlets made it more difficult for adults to quit smoking (Henriksen et al., 2008, Reitzel et al., 2011). Lower-income and racial/ethnic minority neighborhoods have disproportionately higher exposure to retail tobacco outlets (Hyland et al., 2003, Fakunle et al., 2010, Peterson et al., 2005, Rodriguez et al., 2013, Schneider et al., 2005), potentially contributing to higher tobacco use among these groups (Frieden, 2011). To further our understanding of how the tobacco retail environment influences tobacco use, valid data sources are needed to enumerate tobacco outlets and to accurately identify areas with increased exposure to tobacco products.

In the United States (US), tobacco retail licensing data are often used to calculate tobacco outlet density (Fakunle et al., 2010, Henriksen et al., 2008, Lipperman-Kreda et al., 2012, Hyland et al., 2003, Peterson et al., 2005, Schneider et al., 2005). Yet licensing lists may be unavailable to researchers, and 13 states do not require tobacco retailer licensing (CDC, 2012), making such estimation difficult. The quality of the sampling list used for US Food and Drug Administration (FDA) compliance checks or to enforce youth tobacco access laws determines whether, and how many, tobacco outlets will be missed. Without a licensing list as a starting point, a state may instead create a sampling frame from state or local business lists, statewide retail license/permit lists, or statewide liquor license/permit lists.

Over the last decade, obesity researchers have increasingly relied on secondary data sources (e.g., ReferenceUSA or government food registries) to enumerate retail food and recreational environments. They have linked information on the location of food and activity resources to neighborhood characteristics to understand the impact on weight status and on the disparities in obesity faced by lower-income communities, certain racial/ethnic groups, and rural communities (Powell et al., 2011, Fleischhacker et al., 2011).

Although primary data collection is the most accurate approach (Hosler and Dharssi, 2010, Sharkey, 2009), it is resource-intensive. “Ground-truthing,” or identifying outlets through a systematic field canvass of a targeted study area without using secondary data sources, may be feasible in small cities or counties but is daunting in larger areas (Fleischhacker et al., 2013). Researchers or state-level staff may need to rely on secondary data sources to enumerate larger study areas. For example, tobacco outlets may be located anywhere along the more than 40,000 miles of primary and secondary roads in Kentucky or Virginia, neither of which has tobacco retailer licensing. Commercial secondary data sources have several benefits compared with primary data collection: they can be searched by establishment type (e.g., convenience stores), provide telephone numbers and addresses to aid in the verification process, and are typically less expensive than primary data collection.
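Searching a commercial extract by establishment type can be sketched as a filter on industry codes followed by removal of duplicate listings. In this hypothetical Python sketch, the NAICS codes, record fields, and address-matching rule are assumptions chosen for illustration; the study's own search terms and list-cleaning rules are not given in this section.

```python
# Hypothetical sketch: build a "probable tobacco outlet" list from a commercial
# business extract by (1) keeping establishment types likely to sell tobacco and
# (2) dropping duplicate listings that share a normalized street address.
# The NAICS codes, record fields, and matching rule are illustrative assumptions.
import re

# 2012 NAICS codes for store types that commonly sell tobacco (assumed list).
PROBABLE_TOBACCO_NAICS = {
    "447110": "Gasoline stations with convenience stores",
    "445120": "Convenience stores",
    "445110": "Supermarkets and other grocery stores",
    "446110": "Pharmacies and drug stores",
    "453991": "Tobacco stores",
}


def normalize_address(addr: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for matching."""
    addr = re.sub(r"[^\w\s]", "", addr.lower())
    return " ".join(addr.split())


def probable_outlets(records: list) -> list:
    """Filter by establishment type, keeping the first listing per address."""
    seen = set()
    kept = []
    for rec in records:
        if rec["naics"] not in PROBABLE_TOBACCO_NAICS:
            continue  # establishment type unlikely to sell tobacco
        key = normalize_address(rec["address"])
        if key in seen:
            continue  # duplicate listing of an address already kept
        seen.add(key)
        kept.append(rec)
    return kept


# Toy records (hypothetical businesses).
records = [
    {"name": "Quick Stop #12", "naics": "447110", "address": "100 Main St."},
    {"name": "Quick Stop", "naics": "447110", "address": "100 main st"},
    {"name": "Corner Books", "naics": "451211", "address": "5 Elm Ave"},
    {"name": "Family Pharmacy", "naics": "446110", "address": "22 Oak Rd"},
]
print([r["name"] for r in probable_outlets(records)])
```

A store-type filter yields probable outlets only: some listed stores will not sell tobacco and some tobacco sellers will be missing, so a list built this way is a starting point for field verification rather than a final count.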

Since grocery and convenience stores also sell tobacco products (Hosler and Kammer, 2012), similar methods could potentially help identify tobacco outlets. While numerous studies have examined the validity of secondary data sources for enumerating food outlets (Fleischhacker et al., 2013), no studies, to our knowledge, have examined the validity of secondary data sources for enumerating tobacco outlets. One study in Chicago estimated tobacco outlet density by gathering primary data (Novak et al., 2006), and another identified 88% of the outlets on Washington State's licensing list using a secondary data source without conducting primary data collection (Rodriguez et al., 2013).

The purpose of this study is to provide evidence-informed guidance on whether secondary data sources are a reasonable alternative to primary data collection for enumerating the tobacco retail environment. A second purpose is to examine whether secondary data sources allocate outlets to the correct census tract and to compare tobacco outlet density calculated from primary and secondary sources, particularly in jurisdictions that do not have a comprehensive list of tobacco outlets.

Study area

The study area, described previously (Rose et al., 2013), included three geographically diverse counties in North Carolina (NC), USA, a state without tobacco retail licensing. Buncombe County, which includes the Asheville, NC Metro Area, has a median household income of $44,190 and a population that is 6.4% African American, and it encompasses 656.7 square miles in Appalachia. Durham County is more urban and includes the Durham–Chapel Hill, NC Metro Area, has a median household income of $49,894, 38.0% of

Results

Primary data collection identified 662 tobacco outlets (Table 2). Teams added 73 of those outlets in the field because they were not identified by either secondary data source. Convenience stores with gas stations were the most common type of tobacco outlet (44.9%), followed by convenience stores (15.4%), supermarkets (15.1%), pharmacies (11.9%) and tobacco stores (5.7%).

ReferenceUSA identified 971 probable tobacco outlets; 761 remained after cleaning the lists and applying exclusions (i.e.

Discussion

We used primary data collection to examine the evidence of validity for two commercial secondary data sources of tobacco outlets and to ascertain their utility in identifying tobacco outlets in non-licensing states. Combined, ReferenceUSA and D&B identified nearly 90% of the 662 tobacco outlets in the study area. ReferenceUSA had a higher sensitivity than D&B for identifying both probable and actual tobacco outlets. In states without tobacco retail licensing, combining ReferenceUSA and D&B

Conclusions

To our knowledge, this is the first study to report evidence for validity of secondary data sources for identifying probable and actual tobacco outlets using primary data. Although ReferenceUSA and D&B undercounted the true number of tobacco outlets, combining the two secondary data sources resulted in the enumeration of nearly 90% of all tobacco outlets in the study area. Both lists were correlated with actual tobacco outlet density. In North Carolina and perhaps other non-licensing states,

Acknowledgments

Funding for this study was provided by the University Cancer Research Fund to UNC Lineberger Comprehensive Cancer Center at UNC Chapel Hill. Funding was also provided by a grant from the National Cancer Institute to Dr. Ribisl (1U01CA154281). Shyanika Rose received funding from the UNC Lineberger Comprehensive Cancer Center at UNC Chapel Hill (R25 CA57726). The funders had no involvement in the study design, collection, analysis, writing, or interpretation. Lisa Isgett geocoded addresses and

References (37)

  • CDC

    State Tobacco Activities Tracking and Evaluation (STATE) System

    (2012)
  • H. D’Angelo et al.

    Secondary source validity for enumerating tobacco retailers in non-licensing states

    Natl. Conf. Tob. Health

    (2012)
  • Dun & Bradstreet

    The DUNS right quality process: the power behind quality information

    (2005)
  • D. Fakunle et al.

    The importance of income in the link between tobacco outlet density and demographics at the tract level of analysis in New Jersey

    J. Ethnicity Subst. Abuse

    (2010)
  • S.E. Fleischhacker et al.

    A systematic review of fast food access studies

    Obes. Rev.

    (2011)
  • S.E. Fleischhacker et al.

    Evidence for validity of five secondary data sources for enumerating retail food outlets in seven American Indian Communities in North Carolina

    Int. J. Behav. Nutr. Phys. Act.

    (2012)
  • T. Frieden

    Foreword: CDC Health Disparities and Inequalities Report – United States, 2011

    Morbidity and Mortality...

    (2011)
  • P. Gordon-Larsen et al.

    Inequality in the built environment underlies key health disparities in physical activity and obesity

    Pediatrics

    (2006)