Predictors of tobacco outlet density nationwide: a geographic analysis
  1. Daniel Rodriguez (1),
  2. Heather A Carlos (2),
  3. Anna M Adachi-Mejia (2,3),
  4. Ethan M Berke (2,4),
  5. James D Sargent (2,3)
  1. LaSalle University, Graduate Clinical Counseling Psychology, Philadelphia, Pennsylvania, USA
  2. Cancer Control Research Program, Norris Cotton Cancer Center, Lebanon, New Hampshire, USA
  3. Pediatrics, Dartmouth Medical School, Lebanon, New Hampshire, USA
  4. Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
Correspondence to Dr Daniel Rodriguez, 1900 West Olney Avenue, Philadelphia, PA 19141, USA; drodrig63{at}


Objective To elucidate how demographics of US Census tracts are related to tobacco outlet density (TOD).

Method The authors conducted a nationwide assessment of the association between socio-demographic US Census indicators and the density of tobacco outlets across all 64 909 census tracts in the continental USA. Retail tobacco outlet addresses were determined through North American Industry Classification System codes, and density per 1000 population was estimated for each census tract. Independent variables included urban/rural; proportion of the population that was black, Hispanic and women with low levels of education; proportion of families living in poverty and median household size.

Results In a multivariate analysis, there was a higher TOD per 1000 population in urban than in rural locations. Furthermore, higher TOD was associated with larger proportions of blacks, Hispanics, women with low levels of education and with smaller household size. Urban–rural differences in the relation between demographics and TOD were found in all socio-demographic categories, with the exception of poverty, but were particularly striking for Hispanics, for whom the relation with TOD was 10 times larger in urban compared with rural census tracts.

Conclusions The findings suggest that tobacco outlets are more concentrated in areas where people with higher risk for negative health outcomes reside. Future studies should examine the relation between TOD and smoking, smoking cessation, as well as disease rates.

  • Smoking
  • tobacco
  • density
  • disparities
  • GIS
  • high-risk populations

Almost 50 years after the 1964 Surgeon General's report implicating smoking as a cause of lung cancer, heart disease and chronic lung disease, smoking remains the leading preventable cause of death and disease in the USA, resulting in some 443 000 deaths annually.1 ,2 One contextual factor that may be related to an individual's propensity to smoke is access to tobacco, including the number of and proximity to stores that sell tobacco or ‘tobacco outlets’. Tobacco outlet density (TOD) is a community contextual health factor, which, along with air and water quality, the availability of parks and playgrounds for recreation or stores selling nutritious foods, could impact a community's health.3 Indeed, the results of several recent studies suggest a possible association between greater TOD and an increased likelihood of taking up smoking.4–6 Although the results of these early studies are mixed,7 ,8 and more research is clearly needed to clarify the extent of the relation, these findings have nevertheless prompted some researchers to suggest managing density of tobacco outlets as an important component of comprehensive tobacco control.9

Regional studies have found that TOD is higher in areas with lower median household incomes and those of minority race or ethnicity, particularly Hispanic and black.5 ,10–12 Coincidentally, one of the most striking findings in modern surveys of smokers is that smoking prevalence varies greatly across social class, making smoking predominantly a health risk factor today among the poor. For example, in 2007, the overall prevalence of smoking among US adults was 19.7%; however, prevalence varied markedly by education level13: ‘Adults who had a General Education Development (GED) diploma (44.0%) and those with 9–11 years of education (33.3%) had the highest prevalence of current smoking. Those who had an undergraduate or graduate degree had the lowest smoking prevalence (11.4% and 6.2%, respectively)’. Having lost the affluent segment, the poor segment is a clientele the tobacco industry can ill afford to lose.

If increased TOD in poor areas affects tobacco use there, this could exacerbate or maintain observed disparities in smoking prevalence. Increased TOD could affect tobacco use by, for example, lowering the cost of a tobacco purchase and increasing the propensity to purchase and use tobacco products.10 ,14 Higher TOD could also increase the initiation of new smokers by raising exposure to storefront advertising15 ,16 and increasing the visibility of individuals purchasing and consuming tobacco, thereby reinforcing a favourable subjective norm for smoking.17 ,18 This approach would be an effective marketing strategy as the perceived prevalence of smoking is associated with smoking, particularly in youth,19–23 the age group at greatest risk for smoking.24 ,25 Furthermore, higher retail tobacco density could derail quit attempts by increasing product visibility and encouraging impulse buying,18 ,26 and increasing the visibility and impact of promotional discounts for cigarettes.27 In this scenario, disproportionately high tobacco density in specific regions contributes to increased initiation, promotes higher daily use and limits cessation of tobacco use. This would logically lead to excess disease based on their greater neighbourhood access to tobacco.12

One problem with the evidence base supporting the above scenario is that most of the research in this area fails to account for population density. It is plausible that higher TOD in poor neighbourhoods is due to higher population density in poorer or minority neighbourhoods, which would logically support a greater area-based density of tobacco outlets to efficiently distribute product to the neighbourhood population. Another weakness in the extant research is the regional nature of the studies published to date, which limits generalisability.5 ,10–12 The results of these studies indicate that lower median household income and the proportion of blacks are associated with a higher density of tobacco outlets.5 ,10–12 However, the results are mixed for Hispanics, which could be a result of undercoverage of Hispanics in the regions studied. Finally, few studies have examined whether the relation between socioeconomic status and retail outlet density holds equally well for urban and rural areas. Because tobacco outlets in rural areas have greater geographic reach, they may serve both poor and more affluent tobacco users living there. Thus, we would expect to find less striking disparities in rural compared with urban communities; it is difficult to test this hypothesis in regional studies.

This study adds to the literature by assessing the association between socio-demographic indicators and the density of tobacco outlets across the 64 909 census tracts in the entire continental USA, using a retail outlet density measure that accounts for underlying population density. To assure that findings for poverty, blacks and Hispanics within a census tract were not spurious, we added measures of education and average household size to the model. The national sample allowed us to determine definitively whether the association with socio-demographic variables and TOD was modified by urban versus non-urban status. Finally, this study adds to the existing literature by validating commercially derived densities against densities derived from licensing data in one state, something that has not been done to our knowledge.



Tobacco outlet density

A national data set of tobacco outlets was created using North American Industry Classification System (NAICS) codes. The Office of Management and Budget developed NAICS for use by Federal Statistical Agencies, classifying all business establishments based on their primary activity. We obtained geocoded data from the NAICS Association ( for all likely points of sale for tobacco products, including establishments coded as tobacco stores, grocery stores, gas stations and convenience stores, identifying 306 695 addresses nationwide.

We produced a nationwide density surface for the tobacco outlets using adaptive bandwidth kernel density estimation28 ,29 using the LandScan Global Population Data Set, which estimates the underlying population based on satellite imagery.30 Adaptive bandwidth kernel density estimation allows the influence (the bandwidth) of each tobacco outlet to be limited to a surrounding population, or in our case, 1000 people. In choosing 1000 persons, we estimated that 1000 would be about the number of persons needed to support a small retailer. Our estimation is supported by the data, in that the median TOD in urban areas is close to 1.0 (it is 0.74). This mathematically constrains the influence of a single outlet to a small spatial area in regions where the population density is high (eg, urban areas), while in rural areas, the influence of the tobacco outlet is allowed to be larger geographically. For regions that are sparsely populated, we limited the influence of each outlet to a 25 km radius to prevent the density calculation from expanding to a spatially unreliable distance. We then used this density surface to calculate the mean TOD for each census tract by averaging the densities contained within each polygonal census tract. The resultant density unit, our criterion variable for this study, is measured in tobacco outlets per 1000 population. Data and maps for states and communities are available from the senior author on request.
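The adaptive bandwidth idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a uniform kernel (the paper does not state the kernel shape), planar coordinates in kilometres, and a hypothetical gridded population array standing in for LandScan. The function names `adaptive_radius`, `density_surface` and `tract_means` are invented for this sketch. How outlets with fewer than 1000 people inside the 25 km cap were handled is not stated in the text; here such an outlet is simply spread over whatever population the capped radius contains.

```python
import numpy as np

def adaptive_radius(outlet_xy, cell_xy, cell_pop, target_pop=1000.0, max_radius_km=25.0):
    """Smallest radius (km) around one outlet that encloses target_pop people, capped."""
    dist = np.hypot(cell_xy[:, 0] - outlet_xy[0], cell_xy[:, 1] - outlet_xy[1])
    order = np.argsort(dist)
    cum_pop = np.cumsum(cell_pop[order])
    # Index of the first cell at which the running population total reaches the target.
    idx = np.searchsorted(cum_pop, target_pop)
    if idx >= len(order):  # fewer than target_pop people on the whole grid
        return max_radius_km
    return min(float(dist[order][idx]), max_radius_km)

def density_surface(outlets_xy, cell_xy, cell_pop, target_pop=1000.0, max_radius_km=25.0):
    """Outlets per 1000 population at every grid cell (uniform kernel: an assumption)."""
    density = np.zeros(len(cell_xy))
    for outlet in outlets_xy:
        r = adaptive_radius(outlet, cell_xy, cell_pop, target_pop, max_radius_km)
        dist = np.hypot(cell_xy[:, 0] - outlet[0], cell_xy[:, 1] - outlet[1])
        inside = dist <= r
        pop_inside = cell_pop[inside].sum()
        if pop_inside > 0:
            # One outlet shared equally among the people inside its bandwidth.
            density[inside] += 1000.0 / pop_inside
    return density

def tract_means(density, tract_id):
    """Mean density of the cells in each tract (tract_id: contiguous integer label per cell)."""
    return np.bincount(tract_id, weights=density) / np.bincount(tract_id)

# Toy grid: 10 x 10 cells, 1 km apart, 100 people per cell (made-up numbers).
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
cell_xy = np.column_stack([xs.ravel(), ys.ravel()])
cell_pop = np.full(100, 100.0)
outlets_xy = np.array([[5.0, 5.0]])
surface = density_surface(outlets_xy, cell_xy, cell_pop)
```

With this construction each outlet contributes exactly one outlet's worth of mass to the surface, so the population-weighted sum of the surface recovers the number of outlets. In a dense area the radius is small; in a sparse area it grows until it reaches the cap.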

Validation of the dependent measure

Several studies have raised concerns about the accuracy of NAICS and other commercial data sets. For example, in a field study31 of food outlets in rural towns, researchers found that only 39% of outlets were accurately characterised by Google Earth and Yahoo Yellow Pages. Since no one has validated commercial data for tobacco outlets, we assessed the concordance between the densities calculated above and densities derived from licensing data for Washington State, obtaining addresses for all tobacco retailers (n=6386) from the Washington State Department of Health as of 1 February 2012. We successfully geocoded 97% of the licensees based on the street address (n=6196). Our national data set of tobacco retailers was from 2007 and contained 5623 outlets in Washington State. Apart from the 4+ year difference in compilation date, the difference in the number of outlets can be accounted for by the inclusion of vending machines in the license data set and by the exclusion of pharmacies and warehouse stores from the commercial data set. Additionally, the commercial data set included all stores of a particular category (eg, grocery or convenience store), some of which may not have sold cigarettes.

The question pertinent to the present study was not the level of one-to-one correspondence for tobacco outlets but the correspondence between census tract TOD. We used the same kernel density procedure to estimate TOD for each of the 1318 census tracts in Washington State with the license data (TODL) and compared these with densities from the commercial data set (TODC). The question we sought to answer was whether use of commercial sources introduced bias into our study. We first examined the distributions for each variable and found them to be broadly similar (median (IQR) for TODL 0.32 (0.13–1.92) and for TODC 0.30 (0.13–0.78)), with mean TOD being significantly higher with the license data set. A lowess smoothed plot of TODC against TODL was linear, and the least squares regression confirmed that TODL was about 6% higher than TODC (β=1.06, t=61, R²=0.74). We created a new variable (TODL−TODC) to represent possible bias and fit a least squares regression with urbanicity, per cent families in poverty, per cent Hispanic, per cent black and population of the census tract as independent variables; the model failed to reach statistical significance (F(7, 1310)=1.94, p=0.06), despite the large sample size. Finally, we fit the model included in the present manuscript for Washington State using first TODL then TODC and found that the conclusions for both models were the same. While not national in scope, this validation provides very strong evidence that, although commercial data sets may underestimate the number of tobacco outlets, these underestimates are not biased with respect to the variables of interest in our study.
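The two validation regressions can be illustrated with a short sketch on synthetic data. Nothing below uses the Washington State data: the tract values are randomly generated, the helper name `ols_summary` is invented here, and regressing TODL on TODC with an intercept is an assumption about how the reported slope was obtained. The only link to the paper is the bookkeeping: with 1318 tracts and the seven regressors implied by the reported F(7, 1310), the residual degrees of freedom are 1318 − 7 − 1 = 1310.

```python
import numpy as np

def ols_summary(X, y):
    """Least squares fit with intercept; returns coefficients, R^2, F and its degrees of freedom."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    k = X.shape[1]                 # number of regressors, excluding the intercept
    df_resid = len(y) - k - 1
    r2 = 1.0 - ss_res / ss_tot
    f_stat = ((ss_tot - ss_res) / k) / (ss_res / df_resid)
    return beta, r2, f_stat, (k, df_resid)

# Synthetic stand-ins for the tract-level densities (made-up values, not study data).
rng = np.random.default_rng(0)
n_tracts = 1318
tod_c = rng.lognormal(mean=-1.0, sigma=1.0, size=n_tracts)       # "commercial" density
tod_l = 1.06 * tod_c + rng.normal(0.0, 0.2, size=n_tracts)       # "license" density
covariates = rng.normal(size=(n_tracts, 7))                      # placeholder tract covariates

slope_fit = ols_summary(tod_c, tod_l)               # step 1: is the relation linear, with what slope?
bias_fit = ols_summary(covariates, tod_l - tod_c)   # step 2: is the difference predicted by covariates?
```

The second fit is the one that matters for bias: a small overall F means the tract covariates explain little of the license-minus-commercial difference.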

Socio-demographic variables

Predictor variables were based on US Census level data regarding the following socioeconomic and ethnic group indicators: proportion of black race, proportion of Hispanic ethnicity, proportion of families with income below the poverty level, proportion of women older than 25 years without a high school diploma or equivalent and average household size. These census data were extracted from the 2000 US Census for all census tracts within the continental USA.32 ,33 We selected these socio-demographic variables as they are known to be related to smoking.34 Furthermore, we assessed household size as it is an indicator of residential crowding, a health disparity indicator that is related to poverty and has become an increasingly important issue given the recent influx of immigrant families into the USA.35 ,36


We used the Rural-Urban Commuting Area (RUCA) classification system to derive a binary indicator of urban versus rural census tracts. The RUCA classification system uses commuter patterns to derive a four-tiered classification system of census tracts into Metropolitan (Urban), Micropolitan (large rural town), Small Rural Town and Isolated Rural census tracts. We assessed the socio-demographic variables at each level of the four-tiered RUCA classification system,37 noting that differences were most pronounced when comparing urban to the non-urban census tracts (ie, Urban versus Micropolitan, Small Rural Town and Isolated Rural). Thus, we determined that a binary urban versus non-urban variable would best represent differences in the relations between our socio-demographic variables and TOD.

Data analysis

We assessed the effects of the socio-demographic variables and the two-tier urban/non-urban binary variable on our outcome variable, log tobacco outlet density per 1000 population within a census tract, with a multiple regression analysis. Predictor variables were entered using a hierarchical forced-entry method in two blocks. In the first block, the binary urban/non-urban variable was entered with the quantitative socio-demographic variables: average household size and proportion of Hispanic, black, families living in poverty and women older than 25 years without a high school diploma. All model quantitative predictor variables were log transformed and centred about their means to make the relation with the dependent variable more linear and to aid in the interpretation of interaction effects once added. We had tested the relation of each predictor variable with our criterion variable for an R² improvement with a quadratic trend, and in no case did the quadratic trend improve substantively upon a simple linear trend. In the second block, the quantitative predictor variables along with the interaction of each quantitative variable with the urban/non-urban binary variable were entered into the model.
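The two-block procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the PASW analysis: the helper names `log_centre`, `fit` and `two_block_fit` are invented here, predictors are assumed strictly positive (the text does not say how zero proportions were treated before taking logs), and ordinary least squares via NumPy stands in for the statistical package.

```python
import numpy as np

def log_centre(x):
    """Natural log of a strictly positive variable, centred on its mean."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean()

def fit(X, y):
    """Least squares with intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return beta, r2

def two_block_fit(log_tod, urban, predictors):
    """Block 1: urban + log-centred predictors. Block 2: block 1 + urban x predictor terms."""
    urban = np.asarray(urban, dtype=float)            # 1 = urban, 0 = non-urban
    Z = np.column_stack([log_centre(p) for p in predictors])
    block1 = np.column_stack([urban, Z])
    block2 = np.column_stack([block1, Z * urban[:, None]])
    return fit(block1, log_tod), fit(block2, log_tod)
```

Coefficients come back in the order intercept, urban, each predictor, then each urban-by-predictor interaction. Centring the logged predictors means the urban coefficient is the urban versus non-urban difference at the mean of every predictor, which is what makes the main effects readable once interactions are added.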

The significant interactions were illustrated by plotting TOD on the y-axis and representing predictor variables on the x-axis at 1SD below and above the predictor variable's mean. Slope calculations were based on the multiple regression formula (y=b0+b1X+b2Y+b3XY), where Y is the moderator (urban=1, non-urban=0). Thus, for instance, with respect to non-urban census tracts (Y=0), the slope for log proportion of Hispanics (X) included the intercept term (b0) and the gradient (b1) for Hispanics, but neither the gradient for Y (b2) nor the interaction (XY) gradient (b3), as each was multiplied by zero (non-urban, Y=0). To calculate the slope for urban census tracts, we reverse coded the moderator (0=urban, 1=non-urban) so again our formula only included the intercept term and the gradient for Hispanics. All data analysis was conducted with PASW Statistics V.18.0.
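Written out, using the coefficient labels of the regression formula above (b2 on Y, b3 on XY), the simple slopes follow directly from substituting the two values of the moderator. The primed symbol Y' for the reverse-coded moderator is introduced here for illustration only.

```latex
\begin{aligned}
y &= b_0 + b_1 X + b_2 Y + b_3 XY \\[4pt]
\text{non-urban } (Y=0):\quad y &= b_0 + b_1 X
  &&\Rightarrow\ \text{slope} = b_1 \\
\text{urban } (Y=1):\quad y &= (b_0 + b_2) + (b_1 + b_3)\,X
  &&\Rightarrow\ \text{slope} = b_1 + b_3 \\[4pt]
\text{reverse coding } Y' = 1 - Y:\quad
y &= (b_0 + b_2) + (b_1 + b_3)\,X - b_2\,Y' - b_3\,XY'
\end{aligned}
```

After reverse coding, the coefficient on X in the refitted model is b1 + b3, so the urban slope and its standard error can be read off in the same way as the non-urban slope was in the original coding.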


Descriptive statistics

Descriptive statistics are presented in table 1. Medians and IQRs were used as summary statistics as the majority of variables were positively skewed and positively kurtotic. Of the nearly 65 000 census tracts in the continental USA, approximately 45 000 are categorised as urban within the two-tiered RUCA classification system. The median density of tobacco outlets per 1000 population in a census tract was higher for urban (median=0.74, IQR=1.46) than for non-urban (median=0.22, IQR=0.29) census tracts. All comparisons between urban and non-urban census tracts were statistically significant, p<0.0001, even when differences were small (eg, average household size). However, these significant relations are a likely result of the large sample size.

Table 1

Descriptive statistics

Multivariate model

The results of the multiple regression analysis are presented in table 2. In the first block (Model 1), all main effects were significant (p<0.0001). Of the six predictor variables in Model 1, five had positive effects on the outcome variable, and only average household size had a negative association. Compared with non-urban census tracts, urban census tracts were associated with a 32% increase in TOD per 1000 population. Among the continuous variables, proportion of Hispanic and proportion of families living in poverty had the largest association with TOD: a 1% increase in each measure was associated with a 0.91% and 0.83% increase in density, respectively. These findings are consistent with an argument that socio-demographic variables are risk factors for greater TOD per 1000 population. Unexpectedly, there was an inverse relation between average household size and TOD per 1000 population. Specifically, each 1% change in the average household size was associated with a 0.57% decline in TOD per 1000 population.
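The percentage statements for the continuous predictors follow from the log-log form of the model. The block below is a worked reading, assuming the quoted figures (0.91, 0.83 and −0.57) are the coefficients on the logged predictors; table 2 is not reproduced here, so that is an assumption, and no value is given for the urban coefficient for the same reason.

```latex
\begin{aligned}
\log(\mathrm{TOD}) &= \dots + b \,\log(x)
  \quad\Longrightarrow\quad \mathrm{TOD} \propto x^{\,b} \\[4pt]
\text{1\% increase in } x:\quad
\frac{\Delta \mathrm{TOD}}{\mathrm{TOD}} &= 1.01^{\,b} - 1 \approx \frac{b}{100} \\[4pt]
b = 0.91:&\quad 1.01^{0.91} - 1 \approx 0.0091 \\
b = 0.83:&\quad 1.01^{0.83} - 1 \approx 0.0083 \\
b = -0.57:&\quad 1.01^{-0.57} - 1 \approx -0.0057 \\[4pt]
\text{binary predictor (urban = 1)}:\quad
\frac{\Delta \mathrm{TOD}}{\mathrm{TOD}} &= e^{\,b_{\text{urban}}} - 1
\end{aligned}
```

For small changes the coefficient on a logged predictor is therefore an elasticity: the percentage change in TOD per 1% change in the predictor.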

Table 2

Results of the hierarchical multiple regression analysis

To explore whether the main effects were similar across urban and rural census tracts, we entered the five interaction terms in the second block (Model 2). Only the urban by poverty interaction was not significant (p=0.62), suggesting that there is no difference in the association between the proportion of families living in poverty within a census tract and TOD between the urban and non-urban census tracts. TOD per 1000 population was higher for urban than for non-urban census tracts, regardless of the level of the predictor variable (figure 1). The most striking differences were seen when comparing proportion of Hispanics in urban versus non-urban census tracts, with the slope for the effect of the proportion of Hispanics being much steeper in the urban (slope=1.12) than in the non-urban (slope=0.11) settings. For proportion of blacks, there was a different pattern, with a significantly steeper slope in the non-urban (slope=0.96) than in the urban (slope=0.53) census tracts. The association between proportion of women older than 25 years without a high school diploma and TOD per 1000 population was no longer significant once its interaction term was included in the model, suggesting that its effect on TOD was driven by urban as opposed to non-urban census tracts. Finally, for household size, as the average household size increased, the density of tobacco outlets decreased, but the decrease was steeper in the urban (slope=−0.70) than in non-urban (slope=−0.26) census tracts.
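The reported simple slopes can be compared with a few lines of arithmetic. The slope values are taken from the paragraph above; the dictionary layout and function name are illustrative, and the "implied interaction" is simply the urban slope minus the non-urban slope under the interaction model, not a coefficient quoted from table 2.

```python
# Simple slopes reported in the text, as (urban, non-urban) pairs.
slopes = {
    "hispanic": (1.12, 0.11),
    "black": (0.53, 0.96),
    "household_size": (-0.70, -0.26),
}

def urban_to_nonurban_ratio(urban_slope, nonurban_slope):
    """How many times steeper the urban slope is than the non-urban slope."""
    return urban_slope / nonurban_slope

ratios = {name: urban_to_nonurban_ratio(u, n) for name, (u, n) in slopes.items()}

# Under y = b0 + b1*X + b2*Y + b3*X*Y, the urban slope is b1 + b3 and the
# non-urban slope is b1, so their difference is the implied interaction b3.
implied_interaction = {name: u - n for name, (u, n) in slopes.items()}

for name in slopes:
    print(f"{name}: ratio={ratios[name]:.2f}, implied b3={implied_interaction[name]:.2f}")
```

The Hispanic ratio is where the abstract's "10 times larger in urban compared with rural census tracts" comes from, while a ratio below 1 for proportion black reflects the steeper non-urban slope.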

Figure 1

Urban versus rural differences in the effects of the socio-demographic variables on tobacco outlet density.


This study clearly indicates, in a national sample, that race, ethnic and socioeconomic factors are positively associated with TOD. Furthermore, the study demonstrates that these associations are not simply a function of higher population density for the tracts where poor and minority populations tend to live. These data do not permit us to understand why such disparities exist, in part, because we cannot ascertain whether higher TOD is a specific industry targeting strategy,12 contributing to more tobacco use among those living in poverty or whether individuals living in poverty require higher TOD to satisfy their higher demand for tobacco products. However, the study points out the need for longitudinal research studies to understand what factors dictate tobacco points of sale in low-income communities,38 and whether such points of sale affect smoking among the individuals that live there.

One novel finding in this study is the striking difference in how TOD is related to socio-demographic variables, particularly race/ethnicity, in urban versus rural settings. Interestingly, poverty seems to confer risk for higher retail TOD regardless of whether the community is urban or rural. In contrast, the relation of tobacco outlets to the proportion of Hispanics living in a census tract is much more pronounced in urban than in rural areas, whereas the opposite is seen when comparing urban and rural areas for the proportion of black population. For Hispanics, this difference may be the result of their increased population growth rate and a greater proportion of tobacco outlets being distributed to Hispanic urban centres to meet the demand of new smokers; from 2000 to 2006, Hispanics accounted for 50% of the growth in the US population.39 Although the rates of smoking are lower among Hispanics than black and white adults, Hispanic adolescents smoke at twice the rate of black youth (18% vs 9%).34 ,40 Thus, it is possible that tobacco companies target larger Hispanic urban communities as a source of potential new smokers. We see the opposite pattern for the proportion of blacks. Although there is a positive relation between the proportion of blacks in a census tract and the density of tobacco outlets in both urban and rural census tracts, the positive relation is more pronounced in rural than in urban census tracts. This finding may be the result of certain rural areas with disproportionately high rates of smoking (eg, the Black Belt counties of rural Alabama) that naturally result in a greater density of tobacco outlets to meet the increased need for cigarettes.41 For instance, the national average of regular smoking for black men was 24% across the USA, whereas the proportion of weekly smokers was 40% in the Black Belt sample. Although this is but one rural region of the USA, and the proportion of black regular smokers in the Black Belt region of Alabama is still lower than white smokers,42 it is plausible that larger percentages of smokers in various rural communities may affect the relation of race to density across rural regions. Therefore, such diverse effects suggest a need for more detailed study of the role of tobacco-related health disparities within urban and rural settings.

Another interesting finding is the negative relation between average household size and tobacco outlets in urban and non-urban census tracts. Although historically, household size has been considered an indicator of overcrowding and lower socioeconomic status,43 our findings suggest that in the case of tobacco outlets, these assumptions may not hold. Perhaps the increase in mother-only families along with the sharp decline in dual-parent families with children younger than 18 years over the past 4 decades can explain this finding, suggesting that smaller households would be related to less income.44 The disproportionately greater proportion of black and to some extent Hispanic children living in mother-only families44 bolsters this possibility as greater proportions of both race/ethnic groups are associated with an increased TOD. Finally, the increased immigration of Hispanics has resulted in a shift in the cultural acceptance of larger households, as there is a greater tendency for recent immigrant Hispanics to reside in a single household to reduce expenditures so as to support families abroad.43 This would mean that even as income increases, there is a reduced likelihood of changing living arrangements.

There are some limitations to these findings. We did not include all possible tobacco outlets due to practical considerations regarding use of NAICS codes to define likely points of tobacco sale. For instance, some department stores and pharmacies (eg, Wal-Mart, Costco, CVS) may be tobacco outlets.45 However, many stores in these classifications do not sell tobacco (eg, Sears, Dollar Stores, hospital pharmacies), and this is why we excluded them. Additionally, available data sets on retailers have their own limitations, in that their sensitivity to the actual retail environment is only on the order of 60%–70%.46 However, the validation procedure conducted in Washington State reassures that use of commercial data does not bias the results for the relationships we studied. Another limitation to this study is that we did not alter the bandwidth by store category, so the model assumes that a small convenience store selling tobacco serves the same population size (1000 persons) as a large supermarket. It is also possible that the relations found in this study are specific only to census tracts and that conclusions would differ with alternative areas (eg, cities or counties). Finally, this study cannot distinguish whether tobacco companies target poor communities or simply respond to higher demand there. This is the biggest limitation to the research on this topic and underlines the need for longitudinal research to assess how changes in community structure, demographics and smoking relate to changes in the retail environment (or vice versa). Additionally, there is a need for more research on whether retail tobacco density and proximity affects smoking onset, maintenance and cessation.

The results of this study clearly demonstrate that access to tobacco is most pronounced in areas highly populated with individuals already vulnerable to negative health outcomes based on socio-demographic indicators.47 These national findings raise concern that tobacco points of sale could promote or exacerbate disparities in smoking prevalence and its associated morbidity and mortality. However, further study is needed to determine if limits on retail outlet density should be incorporated into tobacco control policy.

What this paper adds

  • While research supports that TOD may be related to health disparity variables (eg, race, ethnicity, poverty), studies thus far have used regional data potentially limiting the generality of their findings.

  • This study extends the prior research by assessing the effects of health disparity variables on TOD nationally using all 64 909 census tracts in the continental USA and generates many important questions for further study.

  • This study also adds to the existing literature by validating commercially derived densities against densities derived from licensing data in one state, something that has not been done to our knowledge.



  • Funding This work was supported by the National Institutes of Health (CA077026). EB is supported by the National Institute on Aging 1K23AG036934.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement We are able to provide any detail needed to other researchers to explain the processes necessary to calculate tobacco outlet density. We are willing to share our determination of tobacco outlet density at the census tract level for a nominal processing fee. Our continuous density US map allows us to determine point densities for researchers interested in using this variable, again for a nominal processing fee. For other access to the data, prospective researchers should contact Dr James D Sargent.