

Population Assessment of Tobacco and Health (PATH) reliability and validity study: selected reliability and validity estimates
  1. Roger Tourangeau1,
  2. Ting Yan1,
  3. Hanyu Sun1,
  4. Andrew Hyland2,
  5. Cassandra A Stanton1,3
  1. Westat, Rockville, Maryland, USA
  2. Department of Health Behavior, Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
  3. Department of Oncology, Georgetown University Medical Center, Washington, DC, USA
  1. Correspondence to Dr Roger Tourangeau, Westat, Rockville, MD 20850, USA; RogerTourangeau{at}


Introduction This paper reports a study done to estimate the reliability and validity of answers to the Youth and Adult questionnaires of the Population Assessment of Tobacco and Health (PATH) Study.

Methods 407 adults and 117 youth respondents completed the wave 4 (2016–2017) PATH Study interview twice, 6–24 days apart. The reinterview data were used to estimate the reliability of answers to the questionnaire. Kappa statistics, gross discrepancy rates and correlations between answers to the initial interview and the reinterview were used to measure reliability. We examined every item in the questionnaire for which there were at least 100 observations. After the reinterview, most respondents provided a saliva sample that allowed us to assess the accuracy of their answers to the tobacco use questions.

Results There was generally a very high level of agreement between answers in the interview and reinterview. On the key current tobacco use items, the average kappa (the agreement rate adjusted for chance agreement) was 0.79 for adult respondents (age 18 or older). Youth respondents exhibited equally high levels of agreement across interviews. The items on current tobacco use also exhibited high levels of agreement with saliva test results (kappa=0.72). Rating scale items showed lower levels of exact agreement across interviews but the answers were generally within one scale point or category.

Conclusions The PATH Study questions were developed using a careful protocol and the results indicate the answers provide reliable and valid information about tobacco use.

  • nicotine
  • cotinine
  • non-cigarette tobacco products



The Family Smoking Prevention and Tobacco Control Act became law in 2009. It authorised the Food and Drug Administration (FDA) to regulate tobacco products. To fulfil this mandate, the FDA required a base of evidence to inform its decisions, monitor their implementation and assess their impact. The FDA partnered with the National Institute on Drug Abuse to create this base through the Population Assessment of Tobacco and Health (PATH) Study.

The PATH Study is a national longitudinal study that is following more than 40 000 members of the US household population ages 12 and older. It includes both tobacco users and non-users. The fourth wave of the study was completed in 2017. The study uses audio computer-administered self-interviews (ACASI) to collect information on use, attitudes and perceptions of tobacco products; knowledge of the products and their health consequences; tobacco use cessation attempts, outcomes and rates of relapse; uptake of new products; product and brand switching; use of two or more tobacco products; and health conditions, including those potentially related to tobacco use.

Relatively few national surveys assess the reliability and validity (RV) of answers to survey questions, although most survey researchers agree that valid and reliable data are central goals for questionnaire designers.1 One notable exception is the Current Population Survey (CPS), which has used reinterview data to estimate the reliability of answers to the CPS labour force questions on which the monthly unemployment figures in the USA are based.2 3 The CPS uses the most common method for assessing the reliability of survey questions: respondents are interviewed twice and answer the same questions a few weeks apart. Two basic measures are commonly used to assess the level of reliability: the gross discrepancy rate (GDR), which is the proportion of respondents who give different answers to the question in the two interviews, and Cohen’s kappa, the proportion of respondents giving the same answer in the two interviews, corrected for the agreement expected by chance. The GDR is therefore a measure of unreliability, whereas higher kappa values indicate greater reliability.
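As a minimal sketch of the two measures (our illustration, not the PATH-RV analysis code), the Python below computes the GDR and Cohen’s kappa for one item answered in both interviews. The answer lists are hypothetical, and chance agreement is taken from each interview’s marginal distribution of answers, as in the standard definition of Cohen’s kappa.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def reliability(first: Sequence[Hashable], second: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (GDR, Cohen's kappa) for one item answered in both interviews."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty answer lists")
    n = len(first)
    # GDR: share of respondents whose two answers differ (a measure of unreliability)
    gdr = sum(a != b for a, b in zip(first, second)) / n
    observed = 1 - gdr
    # Chance agreement: both interviews give the same category if answers were
    # independent, given each interview's marginal distribution
    m1, m2 = Counter(first), Counter(second)
    expected = sum(m1[c] * m2[c] for c in m1) / (n * n)
    if expected == 1:
        raise ValueError("kappa undefined: every answer falls in one category")
    kappa = (observed - expected) / (1 - expected)
    return gdr, kappa


# Hypothetical yes/no answers from 100 respondents
first = ["yes"] * 20 + ["no"] * 80
second = ["yes"] * 18 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 77
gdr, kappa = reliability(first, second)
print(f"GDR = {gdr:.2f}, kappa = {kappa:.2f}")  # GDR = 0.05, kappa = 0.85
```

Because observed agreement equals 1 − GDR, kappa can also be written as 1 − GDR/(1 − chance agreement), which makes explicit that the same GDR yields a lower kappa when chance agreement is high.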

Several studies have evaluated the reliability of survey data about substance use, including tobacco use. These include a series of studies investigating the reliability of answers to the Composite International Diagnostic Interview Substance Abuse Module.4–6 In addition, one study examined the reliability of items on cigarette use from the Youth Risk Behavior Survey Questionnaire, and another, the reliability of items (including items on smoking) from the Behavioral Risk Factor Surveillance System Survey.7 8 More recent examples include the 2006 National Survey on Drug Use and Health (NSDUH)9 and the 2012 National Epidemiologic Survey on Alcohol and Related Conditions-III (NESARC-III),10 both of which incorporated reinterview studies, as did the previous NESARC survey11 done in 2001–2002. Another study examined selected questions on the 2002–2003 Tobacco Use Supplement to the CPS.12 A final study examined the consistency of reports about the onset of tobacco and other substance use.13

Valuable though these studies are, they have various limitations. Most examine only a few tobacco use items and only four use nationally representative samples (the Johnson and Mott study, the 2006 NSDUH study, the NESARC-III and the CPS study). Few of them explore the correlates or causes of unreliability. The PATH RV Study (PATH-RV) used the method pioneered by Cottler and Compton5 to explore reasons for discrepant answers; in the reinterview, discrepant answers triggered questions designed to identify the reason for the discrepancy.

Besides examining the reliability of answers to the PATH Study questions, our study assessed the validity of answers to tobacco use questions. Several prior studies have also attempted to validate self-reports about tobacco use. Most of them compare self-reports with cotinine levels in blood,14 saliva15–17 or urine18 samples; some studies have used carbon monoxide levels in breath samples as the gold standard.19 These studies generally find relatively low levels of under-reporting of tobacco use,20 and several find evidence that light or infrequent users over-report use of tobacco. Three of these studies examine data from the National Health and Nutrition Examination Survey (NHANES),21–23 a large national survey that features a medical examination of the respondents. The most comprehensive of the studies based on NHANES found a false negative rate (the proportion of reported non-users who test positive for cotinine) of less than 2%21; other studies find false negative rates of 4%–5%,14–17 though one finds higher rates of under-reporting.24 Under-reporting is highest in certain subgroups, such as adolescents and people trying to quit.4 20

The PATH-RV Study used two gold standards for assessing validity. At the end of the reinterview, the field interviewers took saliva samples and also asked to photograph any tobacco products respondents had reported. We examined whether the results from the saliva samples and photographs were consistent with the survey reports.


The PATH-RV Study replicated the key systems and procedures of the main PATH Study, using the same instruments, software and interviewers to administer the questions.


The target population for the PATH-RV Study (like that for the main PATH Study) was the civilian household population in the USA age 12 and older. The samples for both studies were stratified, multistage area probability samples, with metropolitan areas and counties selected in the first stage, individual blocks or adjoining blocks selected in the second stage, individual addresses in the third stage and sample persons in the final stage. The PATH-RV Study sample was selected in a subsample of the first-stage sample areas for the PATH Study. Online supplementary appendix A provides the details of the PATH-RV sample design.

The PATH-RV Study sample included 9782 sample addresses, which were mailed a short screening questionnaire to identify adult (18 and older) tobacco users and non-users and youths (12–17 years old). A total of 2296 households returned questionnaires, and we selected 865 adults and 266 youths for the PATH-RV sample. In households where a youth was selected, we also randomly sampled one of the adults. In households with more than one youth, one of them was selected at random.

Data collection

Data collection took place in two phases. The first was the mail screening effort, which consisted of up to six mailings to sample addresses: an advance letter; an initial survey package with a cover letter, screening questionnaire and US$5 cash incentive (later reduced to US$2); a thank you/reminder postcard; a replacement survey package (with a replacement questionnaire); and a final reminder postcard (later increased to two reminders). We received 2296 completed screening questionnaires; at another 643 addresses, the mailings were returned as undeliverable. The overall response rate to the screening component of the study was 25.1% (American Association for Public Opinion Research response rate 3).25
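The reported screening rate can be reproduced with simple arithmetic if every address whose mailings were not returned as undeliverable is treated as eligible. That is a simplifying assumption for this sketch: AAPOR response rate 3 can also apply an estimated eligibility rate to cases of unknown eligibility, and the function name below is hypothetical.

```python
def screener_response_rate(completed: int, sampled: int, undeliverable: int) -> float:
    """Completed screeners divided by sampled addresses not returned as undeliverable.

    Assumption: every deliverable address is counted as eligible.
    """
    eligible = sampled - undeliverable
    return completed / eligible


# Counts reported in the text
rate = screener_response_rate(completed=2296, sampled=9782, undeliverable=643)
print(f"{rate:.1%}")  # 25.1%
```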

Westat field interviewers, all experienced PATH Study interviewers, carried out the field work. The PATH-RV Study training covered the components of the study that differed from the main PATH Study, such as the request for a reinterview. The training consisted of a home study lasting about 4 hours and a 1-hour group session via WebEx. A total of 68 interviewers conducted the PATH-RV field data collection.

The second phase consisted of an initial interview and a reinterview done 6–24 days later. Both were done using ACASI, the same mode used in the main PATH Study. In ACASI, the computer displays the questions to the respondents on screen and also plays them aloud to the respondents via earphones. As in the main PATH Study, a text-to-speech synthesised voice was used to generate the audio version of the questions. Both interviews used the main PATH Study wave 4 questionnaire. The median interview lengths for the initial interviews were 65.7 min for the adults and 41.6 min for the youths; the median reinterview lengths were 67.4 min for the adults and 38.8 min for youths.

As in the main PATH Study, adults were offered US$35 for completing each interview and youths, US$25. Both youths and adults were offered US$10 to provide saliva samples. Prior to contacting sample youths, interviewers first obtained parental consent.

With a few exceptions, the reinterview followed the same protocol as the initial interview. The reinterview questionnaire included some additional items after the regular PATH Study questions, among them questions asking the reasons for discrepant answers to selected items. The possible reasons for discrepancies included true change, misunderstanding the question, memory problems, inattention and reluctance to answer truthfully. These discrepancy probes were administered at the end of the reinterview; because the programme stored the respondent’s answers from the earlier interview, it could detect discrepancies in real time.
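The real-time discrepancy check could work along the following lines. This is a hedged sketch: the item identifiers in `PROBED_ITEMS` and the function `items_to_probe` are hypothetical, and only the five reason categories come from the text.

```python
from typing import Dict, List, Optional

# Hypothetical item identifiers; the real PATH-RV probe list is not given here.
PROBED_ITEMS = ("cig_current", "ecig_current", "cigar_ever")

# Reason categories named in the text, offered as answer options for each probe
REASONS = (
    "true change",
    "misunderstood the question",
    "memory problem",
    "inattention",
    "reluctant to answer truthfully",
)

Answers = Dict[str, Optional[str]]  # item id -> answer (None if skipped)


def items_to_probe(first: Answers, second: Answers) -> List[str]:
    """Probed items answered in both interviews whose answers differ."""
    return [
        item
        for item in PROBED_ITEMS
        if first.get(item) is not None
        and second.get(item) is not None
        and first[item] != second[item]
    ]


if __name__ == "__main__":
    initial = {"cig_current": "some days", "ecig_current": "not at all", "cigar_ever": "yes"}
    reinterview = {"cig_current": "every day", "ecig_current": "not at all"}
    for item in items_to_probe(initial, reinterview):
        print(f"Probe {item}: why did your answer change? Options: {', '.join(REASONS)}")
```

Items not answered in both interviews (here `cigar_ever`) are skipped rather than probed, since there is no pair of answers to compare.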

After the reinterview, respondents were asked to provide saliva samples. Respondents who agreed provided a saliva sample using the Alere iScreen screening device, which detects cotinine for up to 4 days after tobacco use. The device includes a saliva-absorbing sponge; respondents were to keep the sponge in their mouths until it had completely softened. The interviewer then placed a cap over the sponge, collapsing it, and initiated the test. Three results were possible: positive (cotinine present), negative or invalid (inconclusive results, reflecting insufficient saliva or other problems).i For adult respondents who reported using any tobacco products in the second interview, the interviewer also asked to photograph the products. For the 110 respondents who agreed, the interviewer photographed the product(s) using the computer’s camera. The photographs were subsequently coded for the product type (eg, cigarettes) and brand (eg, Newport); we compared these to the products and brands reported in the reinterview.

Table 1 shows the outcomes of the field work, including the number of sampled persons who completed the PATH-RV interviews and the number who provided saliva samples. Overall, 46.3% of the sample members completed the two interviews; 89.5% of those also provided a saliva sample.
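The overall completion figure follows from the sample sizes given earlier (865 adults and 266 youths selected; 407 adults and 117 youths completing both interviews). A minimal sketch, with hypothetical variable names:

```python
# Counts reported in the text
selected = {"adults": 865, "youths": 266}
completed_both = {"adults": 407, "youths": 117}

# Share of all selected sample members who completed both interviews
overall = sum(completed_both.values()) / sum(selected.values())
print(f"Completed both interviews: {overall:.1%}")  # 46.3%
```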

Table 1

Data collection results for the Population Assessment of Tobacco and Health reliability and validity Study, by subgroup

Comparison to main PATH Study

A central goal for the PATH-RV Study design was to reproduce the essential features of the main PATH Study data collection. We used the same questionnaire, mode of data collection, software to administer the questions and interviewers. Table 2 summarises the key features of the two studies.

Table 2

Comparison of design features of main Population Assessment of Tobacco and Health (PATH) Study and PATH-reliability and validity (RV) Study

Our analysis focuses on four questions: (1) How reliable are the PATH Study wave 4 data? (2) How do the reliability estimates for these questions compare to those for similar NSDUH questions? (3) When discrepancies over time were found, what were the reasons for them? and (4) Do the interview responses agree with the saliva test results and photograph data? The results here are unweighted, since we are not making population estimates.

Reliability of the interview data

We calculated reliability estimates for every item for which at least 100 respondents answered the question in both interviews: a total of 447 questions from the adult questionnaire and 229 from the youth questionnaire. Because of skip patterns in the questionnaire, some questions were administered to relatively few respondents.
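The item-selection rule (at least 100 respondents answering in both interviews) can be expressed as a filter over paired responses. A sketch, assuming a hypothetical layout in which each interview is stored as a mapping from respondent ID to that respondent’s answers, with `None` marking a question skipped by the routing:

```python
from typing import Dict, List, Optional

Answers = Dict[str, Optional[str]]  # item id -> answer, None if skipped by routing


def eligible_items(first: Dict[int, Answers], second: Dict[int, Answers],
                   min_pairs: int = 100) -> List[str]:
    """Items answered by at least `min_pairs` respondents in both interviews."""
    counts: Dict[str, int] = {}
    for rid, answers1 in first.items():
        answers2 = second.get(rid)
        if answers2 is None:
            continue  # respondent did not complete the reinterview
        for item, answer in answers1.items():
            if answer is not None and answers2.get(item) is not None:
                counts[item] = counts.get(item, 0) + 1
    return sorted(item for item, n in counts.items() if n >= min_pairs)


# Hypothetical data: q1 asked of everyone, q2 reached by only 50 respondents
first = {i: {"q1": "yes", "q2": "no" if i < 50 else None} for i in range(150)}
second = {i: {"q1": "no", "q2": "no"} for i in range(150)}
print(eligible_items(first, second))  # ['q1']
```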

Table 3 presents the results for nine questions asking about current use of various tobacco products. For the adults, the average kappa value is 0.72; for the youths, the average kappa is 0.79. According to Landis and Koch,26 this constitutes ‘substantial’ agreement. The kappa values are low for some products because very few respondents reported using them. For both age groups, the GDRs indicate that over 95% gave identical answers in both interviews. Results for past year use and for lifetime use are similarly high (see below).
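The point about rarely used products reflects a known property of kappa: when nearly everyone answers ‘no’ in both interviews, chance agreement is already high, so a few discrepancies pull kappa down sharply even though the GDR stays small. The counts below are hypothetical and chosen only to show the effect.

```python
from typing import Tuple


def kappa_2x2(both_yes: int, yes_no: int, no_yes: int, both_no: int) -> Tuple[float, float]:
    """GDR and Cohen's kappa from a 2x2 interview-by-reinterview table."""
    n = both_yes + yes_no + no_yes + both_no
    gdr = (yes_no + no_yes) / n
    p1 = (both_yes + yes_no) / n  # share answering "yes" in the first interview
    p2 = (both_yes + no_yes) / n  # share answering "yes" in the reinterview
    expected = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement
    return gdr, ((1 - gdr) - expected) / (1 - expected)


# Hypothetical counts: the same 4 discrepant answers out of 100, different prevalence
common = kappa_2x2(40, 2, 2, 56)  # product used by about 42% of respondents
rare = kappa_2x2(1, 2, 2, 95)     # product used by about 3% of respondents
print(f"common: GDR={common[0]:.2f}, kappa={common[1]:.2f}")  # GDR=0.04, kappa=0.92
print(f"rare:   GDR={rare[0]:.2f}, kappa={rare[1]:.2f}")      # GDR=0.04, kappa=0.31
```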

Table 3

Kappas and gross discrepancy rates (GDRs) for current tobacco use items in the PATH-RV Study

The reliability estimates are lower for the questions asking about the respondent’s awareness of various tobacco products (table 4), ranging from 0.32 to 0.64 for the adults and from 0.25 to 0.57 for the youth. In some cases, the answers may differ because respondents learnt about the product in the first interview. This appears to be the case for snus and dissolvable tobacco, where most of the discrepancies involve respondents reporting they were unaware of the product in the first interview but aware of it in the reinterview. Since both products are generally less well known than the others, it is plausible that some respondents first became aware of them during the first interview. For the adults, 40 of the 72 discrepant answers for snus and 39 of 46 discrepant answers for dissolvables follow this pattern; for the youth, 22 of the 33 discrepant answers for snus and 18 of 21 discrepant answers for dissolvables follow this pattern. The discrepant answers for the remaining tobacco products show no clear pattern.
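The pattern count is straightforward to automate. This is a hedged sketch, assuming awareness is recorded as ‘yes’/‘no’ answers (the exact response options are not given here); the second part simply re-expresses the counts in the text as proportions.

```python
from typing import Iterable, Tuple


def unaware_to_aware(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (discrepancies going from 'no' to 'yes', all discrepancies)
    for (first interview, reinterview) awareness answers."""
    discrepant = [(a, b) for a, b in pairs if a != b]
    learned = sum(1 for a, b in discrepant if (a, b) == ("no", "yes"))
    return learned, len(discrepant)


# Counts reported in the text: (unaware -> aware, all discrepant answers)
reported = {
    ("adults", "snus"): (40, 72),
    ("adults", "dissolvables"): (39, 46),
    ("youth", "snus"): (22, 33),
    ("youth", "dissolvables"): (18, 21),
}
for (group, product), (learned, total) in reported.items():
    print(f"{group:<7}{product:<13}{learned}/{total} = {learned / total:.0%}")
```

The loop prints 56%, 85%, 67% and 86%, so the unaware-to-aware pattern accounts for a majority of the discrepancies in all four cases.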

Table 4

Kappas and gross discrepancy rates (GDRs) for items about awareness of each tobacco product

Table 5 presents mean kappas and GDRs for all the questions we analysed, by four different response formats—dichotomous items (mostly yes-no questions), categorical items with more than two response options, ordinal rating scales and items asking for numerical responses. There were only five numerical items for the youth, and they are omitted from the table.

Table 5

Mean reliability estimates by age group and item type in the PATH-RV Study

For the categorical and ordinal questions, we examine kappas and GDRs based on exact agreement across interviews and also based on approximate agreement (answers within one category of the earlier response). We refer to the latter statistics as generalised kappas and GDRs. We also present mean weighted kappas, which take the size of the discrepancy into account. The mean kappas are in the 0.60s (‘substantial agreement’, according to Landis and Koch26), and when respondents gave answers that did not agree exactly, these were often within one response category, producing much higher generalised kappas and much lower generalised GDRs. For the numerical questions, we used the Pearson correlation between the answers in the two interviews as the measure of reliability.
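The exact, generalised and weighted statistics can be sketched as follows. The text does not give the formulas, so two choices here are assumptions: the generalised kappa treats answers no more than one category apart as agreement in both the observed and the chance-expected terms, and the weighted kappa uses linear disagreement weights. The ratings are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def _agreement(first: Sequence[int], second: Sequence[int],
               agrees: Callable[[int, int], bool]) -> Tuple[float, float]:
    """Observed and chance-expected agreement under a given agreement rule."""
    n = len(first)
    observed = sum(agrees(a, b) for a, b in zip(first, second)) / n
    m1, m2 = Counter(first), Counter(second)
    expected = sum(m1[i] * m2[j] for i in m1 for j in m2 if agrees(i, j)) / (n * n)
    return observed, expected


def kappa_within(first: Sequence[int], second: Sequence[int],
                 tolerance: int = 0) -> Tuple[float, float]:
    """(GDR, kappa), counting answers <= `tolerance` points apart as agreement.
    tolerance=0 gives the exact statistics; tolerance=1 the generalised ones."""
    observed, expected = _agreement(first, second, lambda a, b: abs(a - b) <= tolerance)
    if expected == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return 1 - observed, (observed - expected) / (1 - expected)


def linear_weighted_kappa(first: Sequence[int], second: Sequence[int]) -> float:
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    n = len(first)
    m1, m2 = Counter(first), Counter(second)
    observed = sum(abs(a - b) for a, b in zip(first, second)) / n
    expected = sum(m1[i] * m2[j] * abs(i - j) for i in m1 for j in m2) / (n * n)
    return 1 - observed / expected


# Hypothetical five-point ratings (1 = excellent ... 5 = poor)
first = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
second = [1, 2, 3, 4, 5, 2, 3, 4, 5, 4]
gdr, k = kappa_within(first, second)
print(f"exact:       GDR={gdr:.3f}, kappa={k:.3f}")  # GDR=0.500, kappa=0.375
gdr1, k1 = kappa_within(first, second, tolerance=1)
print(f"generalised: GDR={gdr1:.3f}, kappa={k1:.3f}")  # GDR=0.000, kappa=1.000
print(f"weighted kappa={linear_weighted_kappa(first, second):.3f}")  # 0.675
```

Here every disagreement is a single scale point, so the exact kappa is modest while the generalised GDR is 0 and the generalised kappa is 1, the same pattern described for the rating scale items.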

Comparison with NSDUH reliability estimates

How do the estimates from the PATH-RV Study compare with similar estimates from the 2006 NSDUH Reliability Study? As can be seen from table 6, they are quite similar. Although the question wordings are not necessarily identical in the two surveys, they are generally similar and both studies use ACASI as the mode of interviewing. The kappa values are close for the two surveys and there is no clear pattern of differences across the two—sometimes the PATH Study reliability is higher, sometimes the NSDUH reliability is higher and sometimes the two estimates are identical.

Table 6

Kappas from PATH-RV Study and 2006 NSDUH Reliability Study

Reasons for discrepancies

As already noted, when the answers in the two interviews were discrepant, they were often close—within a scale point. Table 7 illustrates this point, with reliability estimates for five ratings of the respondent’s health and quality of life. The first four of the variables were assessed on five-point scales, ranging from ‘excellent’ to ‘poor.’ The final variable was also assessed on a five-point scale that ranged from ‘extremely satisfied’ to ‘not at all satisfied’. The generalised kappas and GDRs indicate that there is almost perfect agreement across interviews when this more liberal standard of agreement is applied.

Table 7

Exact and approximate agreement for five rating scale items

The PATH-RV reinterview questionnaire also probed respondents about why they gave different answers in the two interviews. Items subject to discrepancy probes included lifetime tobacco use, current tobacco use, tobacco products and brands, and health and risks. Table 8 shows the distribution of responses across all the discrepancy probes. (Different items were probed for the youth and adults, so the results for the two groups are not comparable.) True change between the two interviews accounts for a substantial proportion of the discrepancies, more so for the youth respondents (44%) than for the adults (11%). Lapses of attention also account for many of the discrepancies. Respondents attributed very few of their discrepant answers (less than 2% in both samples) to discomfort in answering the questions truthfully.

Table 8

Reasons for discrepant answers for adults and youth

Validity of the answers

That the answers were consistent (or at least close) over time does not guarantee that they were accurate. Our study included two validity checks. First, we compared respondents’ answers to the tobacco use questions to the saliva test results. The survey results reflect respondents’ answers to questions about current use of several different types of tobacco products (eg, ‘Do you now smoke cigarettes every day, some days or not at all?’). Respondents were classified as current users if they reported using any tobacco product every day or some days. Table 9 shows that the survey data and saliva test results agree for 87.5% of the respondents (kappa=0.72). Of the 15 respondents who reported they were tobacco users in the second interview but tested negative, 12 were occasional users (saying they used one or more tobacco products ‘some days’).
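The classification and comparison can be sketched as below. Assumptions: answers to the ‘now use’ items are stored per product, saliva results are recorded as ‘positive’, ‘negative’ or ‘invalid’, and invalid tests are dropped before computing agreement (the text does not say how invalid results were handled). The records are hypothetical.

```python
from typing import Dict, List, Tuple

CURRENT_USE = {"every day", "some days"}


def is_current_user(now_use: Dict[str, str]) -> bool:
    """Current user if any product is used 'every day' or 'some days'."""
    return any(answer in CURRENT_USE for answer in now_use.values())


def saliva_agreement(records: List[Tuple[Dict[str, str], str]]) -> Tuple[float, float, int]:
    """Return (agreement rate, kappa, number of over-reporters).
    Each record is (answers to the 'now use' items, saliva result)."""
    # Assumption: invalid (inconclusive) tests are excluded
    pairs = [(is_current_user(now_use), result == "positive")
             for now_use, result in records if result != "invalid"]
    n = len(pairs)
    agree = sum(report == test for report, test in pairs) / n
    p_report = sum(report for report, _ in pairs) / n
    p_test = sum(test for _, test in pairs) / n
    expected = p_report * p_test + (1 - p_report) * (1 - p_test)
    kappa = (agree - expected) / (1 - expected)
    # Over-reporters: report current use but test negative for cotinine
    over_reporters = sum(report and not test for report, test in pairs)
    return agree, kappa, over_reporters


records = [
    ({"cigarettes": "every day"}, "positive"),
    ({"cigarettes": "not at all", "e-cigarettes": "some days"}, "positive"),
    ({"cigarettes": "not at all"}, "negative"),
    ({"cigarettes": "some days"}, "negative"),  # occasional user, tests negative
    ({"cigarettes": "not at all"}, "invalid"),  # excluded
    ({"cigarettes": "not at all"}, "negative"),
]
agree, kappa, over = saliva_agreement(records)
print(f"agreement={agree:.1%}, kappa={kappa:.2f}, over-reporters={over}")
```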

Table 9

Agreement between Population Assessment of Tobacco and Health reliability and validity survey answers and saliva test results

Second, the reports of the adult respondents about the types of tobacco products they were currently using (and which particular brands) were compared with the products identified by coders from the photographs interviewers took. At the product level, the photographs matched the survey reports 98.1% of the time. If the respondent reported using cigarettes, the photograph almost always showed cigarettes (in 85 out of 86 instances), and, if they reported using some other product, the photograph almost always showed that other product (19 out of 20 instances). There were more discrepancies at the level of brands. The brand reported by the respondents agreed with the brand identified in the photographs 87.5% of the time. (The survey item asking about brand only lists brands rather than variety or sub-brand. Thus, if a respondent reported ‘Marlboro’ and the photograph showed Marlboro Lite, it was considered a match.)
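The product-level figure can be recomputed from the counts above, and the brand rule (brand names only, ignoring variety) can be sketched as a simple name comparison. The function `brand_matches` and its prefix rule are hypothetical illustrations, not the coding procedure actually used.

```python
def brand_matches(reported: str, photographed: str) -> bool:
    """Hypothetical rule: the survey lists brands only, so a photographed
    variety such as 'Marlboro Lite' matches a reported 'Marlboro'."""
    rep = reported.strip().lower()
    pho = photographed.strip().lower()
    # Caveat: any photographed name starting with the reported name counts as a
    # match; a real coding scheme would check names against a brand list.
    return pho == rep or pho.startswith(rep + " ")


# Product-level agreement from the counts reported in the text
matched = 85 + 19  # cigarettes, other products
total = 86 + 20
print(f"{matched / total:.1%}")                    # 98.1%
print(brand_matches("Marlboro", "Marlboro Lite"))  # True
print(brand_matches("Newport", "Marlboro"))        # False
```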


The PATH-RV Study is one of several studies demonstrating that respondents give reliable answers even to potentially sensitive survey questions, such as questions about use of tobacco,7–9 13 alcohol10 11 and illicit drugs.9–11 Many of these studies use forms of self-administration, which promotes accurate responses to sensitive questions.27 Like the NSDUH, the main PATH Study collects data mainly via ACASI. It is reassuring that the PATH-RV Study and 2006 NSDUH Reliability Study, which used the same mode of data collection and similar questions, produce very similar reliability estimates (table 6).

Relatively few respondents reported being uncomfortable answering the questions truthfully (see table 8). Instead, discrepancies between the answers in interview and reinterview seemed to reflect several causes. First, respondents sometimes confessed to lapses in attention. They also reported problems in remembering and lack of familiarity with the products; memory problems may be especially important for questions on lifetime use of tobacco products, though one study found that such items did surprisingly well in the CPS Tobacco Use Supplement.12 Other discrepant answers seemed to reflect actual changes in behaviour between the two interviews. Even when answers did not agree exactly across the two interviews, they were often close, within one scale point.

We also assessed the validity of some answers. For almost 90% of the respondents, reported tobacco use was confirmed by a saliva test. There were discrepancies in both directions, including 15 apparent over-reporters (respondents who said they used tobacco but tested negative for cotinine). Twelve of these 15 involved occasional users, who may have used tobacco outside the 3–4-day window within which the test we used can detect tobacco use. A novel feature of this study was the use of photographs to validate the products and brands respondents reported. Because the data collection computers had built-in cameras, it was relatively easy for the interviewers to take pictures of the tobacco products respondents provided. Respondents almost always correctly identified the type of product they were using (98% agreement), but occasionally did not identify the right brand (87.5% agreement).

The PATH-RV Study sought to answer four questions: (1) How reliable are the PATH Study wave 4 interview data? (2) Are the PATH Study data as reliable as those for similar questions administered as part of NSDUH? (3) What were the reasons for discrepant answers in the two interviews? and (4) How accurate are the PATH Study data? Results reveal that the reliability estimates are quite comparable to those reported in earlier reliability studies, and the discrepancies that are found do not appear to reflect systematic flaws in the questions.

Still, the study has its limitations. For example, the window for detecting tobacco use with the Alere iScreen test does not precisely map onto self-reports of ‘every day’ or ‘some days’ use of tobacco. In addition, the cumulative response rate across the major phases of the study (screening questionnaire, initial interview, reinterview, saliva sample and photographs) is less than 10%. Thus, there is considerable room for non-response error in the study.

What this paper adds

  • Prior studies have examined the reliability of survey data on sensitive topics, including illicit drug use, and have shown that survey respondents give reliable answers.

  • There have been no comprehensive evaluations of the reliability and validity (RV) of answers to questions about tobacco use. This is important because there are many new types of tobacco products, some of them unfamiliar to many respondents.

  • The Population Assessment of Tobacco and Health Reliability and Validity (PATH-RV) Study documents that the data collected in the PATH Study are both reliable and valid.


We are grateful to Doug Williams, Tammy Cook, Vanessa Meldener and Victoria Vignare for their oversight of the data collection effort, to Martin Baer and Ed Dolbow for the work in programming the instrument, to Antonia Warren for help with coding the photos and conducting some analyses, to Mike Jones for selecting the PATH-RV Study sample and creating the PATH-RV Study sample weights, and to Charles Carusi and Wendy Kissin for their thoughtful comments on the paper.



  • i In a small pretest with 12 self-reported smokers and 8 self-reported non-smokers, the Alere iScreen test correctly classified all 20 volunteers. We collected a second saliva sample from the volunteers and sent it for laboratory testing to determine cotinine levels via the Salimetrics EIA assay. All 20 of the volunteers were classified in the same way by the Salimetrics EIA assay as by the iScreen test.

  • Contributors All of the authors helped design the study. RT took the lead in the write-up, with input from the other authors. TY and HS carried out the analyses.

  • Funding The work reported here was funded by a grant from the National Institute on Drug Abuse, National Institutes of Health (5R01DA040736-02 to RT).

  • Disclaimer The views and opinions expressed in this manuscript are those of the authors only and do not necessarily represent the views, official policy or position of the US Department of Health and Human Services or any of its affiliated institutions or agencies.

  • Competing interests None declared.

  • Patient consent Not required.

  • Ethics approval The Westat IRB approved this data collection protocol.

  • Provenance and peer review Not commissioned; externally peer reviewed.