Aims To use association rule mining methods to investigate prescribing of smoking cessation medication in UK primary care and to identify the characteristics of numerically important groups of patients who typically do, or do not, receive cessation therapy.
Design An association rule mining study using The Health Improvement Network Database.
Settings and participants 282 433 patients aged 16 years and over from 419 UK general practices, who were registered with the practice throughout 2008 and recorded as a current smoker during that year.
Outcome Prescription for any type of smoking cessation medication in 2008 (nicotine replacement therapy, bupropion or varenicline).
Variables Age, gender, lifestyle indicators and co-morbidity.
Results Of the current smokers, 37 731 (13.4%) were given prescriptions for smoking cessation treatment during 2008. Prescriptions were particularly likely to be given to women, those aged 31–60 years, and people with diagnoses of chronic obstructive pulmonary disease and depression. In contrast, patients with dementia, alcohol intake over recommended levels, atrial fibrillation or chronic kidney disease were extremely unlikely to be prescribed a smoking cessation medication. However, the largest group of patients who did not receive therapy was young and otherwise healthy individuals.
Conclusions This novel approach identified sizeable and easily definable groups of patients who are systematically failing to receive support for smoking cessation in primary care. Association rule mining can be used to identify key numerically important groups at high or low risk of receiving treatment and hence potentially to improve healthcare delivery.
- Smoking cessation prescription
- association rule mining
- primary care database
- health services
- primary healthcare
- public policy
- secondhand smoke
- environmental tobacco smoke
Data mining methods are widely used in the analysis of marketing and other commercial data1–4 and are increasingly used in healthcare research and management5–7 but to our knowledge have not, to date, been applied in tobacco control research. Data mining differs from traditional hypothesis-driven epidemiological approaches in that it sets out to identify new patterns and relationships between variables in large data sets without consideration of prior knowledge and to explore the effects of multiple combined exposures on outcomes. Data mining methods thus offer an alternative approach to the analysis of medical activity in large healthcare databases and in particular to assessing equality of healthcare provision.
Smoking is the largest preventable cause of death and disability8 and of social inequality in health,9 in the UK. Helping current smokers to stop smoking is crucial to preventing the 100 000 deaths caused by smoking in the UK each year,8 ,10 and offering behavioural support and cessation pharmacotherapy should be a core component of all healthcare contacts, including primary care consultations.11 ,12 However, survey data indicate that while most UK smokers want to quit smoking and about 25% attempt to do so each year,13 only about a quarter of quit attempts involve pharmacotherapy and in only about half of these cases are the medicines prescribed by a doctor.13 It is thus likely that repeated opportunities to initiate and support quit attempts in primary care patients are currently being missed.
We therefore set out to explore current prescribing patterns for the three main smoking cessation therapies (nicotine replacement therapy (NRT), varenicline and bupropion) in primary care to attempt to identify the characteristics of patients who tend to receive, or not receive, cessation therapy. To do so, in a novel approach, we have used association rule mining to analyse smoking cessation prescribing in a major national primary care data set, The Health Improvement Network (THIN) (http://www.thin-uk.com/).
Design, data and settings
We used anonymised data from the electronic primary care records of patients from 419 general practices across the UK, which contribute to the THIN Database. The data set represents approximately 6% of the UK population and has been shown to be reasonably nationally representative in terms of patient demographic characteristics, though possibly slightly less representative of the more deprived social groups and of young adults.14 For analysis, we selected all individuals aged 16 years or over, who had been registered with a THIN practice for the whole of 2008 and had any Read code recorded in 2008 indicating that they were a current smoker (using Read codes that have been defined in a previous study14). For each of these patients, we identified whether they had received any prescription for NRT, bupropion or varenicline during 2008, using drug codes from the British National Formulary. We also extracted data on age (categorised for analysis as under 30, 31–60 and 61 or over), gender, quintiles of Townsend score according to postcode and on whether patients were coded as heavy smokers (coded YES if >20 cigarettes/day), alcohol consumption over recommended levels (coded YES if >14 units/week for women and >21 units/week for men) and obesity (coded YES if body mass index >30). We also extracted data on medical conditions from the clinical disease indicators in the Quality and Outcomes Framework15 since these have been well recorded and are conditions for which there are health benefits from improved management, including smoking cessation support, in primary care. These include asthma, atrial fibrillation, cancer, coronary heart disease, chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), dementia, depression, diabetes, epilepsy, heart failure, hypertension, mental handicap, psychotic disorders, stroke and transient ischaemic attack, and thyroid disorders. Any diagnosis made before the end of 2008 was included.
Background to association rule mining
Association rules show conditions that occur frequently together in a given data set. An association rule is an implication expression of the form $X \Rightarrow Y$, where X is the subset of exposures, and Y is the subset of outcomes. X and Y are mutually exclusive. Association rules have to satisfy constraints on measures of significance and reliability, namely support and confidence. The support level of a rule is the percentage of instances (current smokers in this context) with both conditions X and Y, and the confidence level is the percentage of instances with conditions X and Y within those instances with condition X. Low support may indicate that a rule has simply occurred by chance or identifies a very small proportion with specified characteristics in the data set, and confidence is used to measure the reliability of the generated rule. In this study, we are looking for rules such as:
$\{\text{gender}=\text{male},\ \text{age}<30\} \Rightarrow \{\text{prescription}=\text{yes}\}$
If there are 5000 smokers in the data set, the left side of the rule shows that there are 4000 male patients younger than 30 (group X) and the right side that 3000 patients in group X received a prescription. Therefore, the confidence level for this rule is 75% (3000/4000), and the support level is 60% (3000/5000).
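The arithmetic in this worked example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function and argument names are our own and are not taken from WEKA or from the study code.

```python
def support_and_confidence(n_total, n_x, n_x_and_y):
    """Return (support, confidence) of the rule X => Y as fractions.

    n_total   -- number of instances in the data set
    n_x       -- instances matching the exposure set X
    n_x_and_y -- instances matching both X and the outcome Y
    """
    support = n_x_and_y / n_total      # share of all instances with X and Y
    confidence = n_x_and_y / n_x       # share of X instances that also have Y
    return support, confidence


# Counts from the worked example in the text.
support, confidence = support_and_confidence(n_total=5000, n_x=4000, n_x_and_y=3000)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
# prints: support = 60%, confidence = 75%
```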
In this work, we used the open-source data mining toolkit WEKA16 (http://www.cs.waikato.ac.nz/ml/weka/) to undertake the association rule mining work. The general Apriori algorithm was used to execute the rule mining task.16 The algorithm generates all the rules with support and confidence greater than the minimum settings and then lists a preset number of top rules ranked by confidence. In data mining, the generated rule is called ‘knowledge’ (or a hypothesis), which needs to be verified before it can be applied to real life.
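For readers unfamiliar with Apriori, the level-wise search it performs can be sketched in plain Python. This is a simplified teaching version, not the WEKA implementation used in the study. The item names and the six toy records are hypothetical, and restricting rules to a single outcome item is our own simplification.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count the transactions that contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules_for(frequent, consequent, min_confidence):
    """List (antecedent, support, confidence) for rules antecedent => consequent."""
    out = []
    for itemset, supp in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            conf = supp / frequent[antecedent]   # subsets of frequent sets are frequent
            if conf >= min_confidence:
                out.append((antecedent, supp, conf))
    return sorted(out, key=lambda r: r[2], reverse=True)


# Hypothetical toy records, one set of "field=value" items per patient.
transactions = [
    {"sex=F", "copd=yes", "rx=yes"},
    {"sex=F", "copd=yes", "rx=yes"},
    {"sex=F", "copd=yes", "rx=no"},
    {"sex=M", "copd=no", "rx=no"},
    {"sex=M", "copd=yes", "rx=no"},
    {"sex=F", "copd=no", "rx=no"},
]
frequent = apriori(transactions, min_support=0.3)
rules = rules_for(frequent, consequent="rx=yes", min_confidence=0.6)
for antecedent, supp, conf in rules:
    print(sorted(antecedent), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

On the toy records only one rule passes both thresholds: women with COPD, of whom two in three received a prescription.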
We used the association rule mining technique to generate rules for whether patients received, or did not receive, any prescription for a smoking cessation medication based on exposures that included demographic information, smoking and lifestyle indicators and medical conditions. Since we were interested in identifying numerically important exposure groups while also ensuring that each of the patient groups represented by the disease indicators had potential to be included in our rules, we set a minimum exposure group size of 500 rather than a support level. The generated rules are ordered by the confidence and the size of exposure group.
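In a cohort of 282 433 smokers, an exposure group of 500 corresponds to roughly 0.18% of patients. The filtering and ordering step described above might be sketched as follows. The `Rule` class, its field names and the counts are hypothetical, and we assume that the threshold is inclusive and that exposure group size is used as the tie-breaker after confidence.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    exposures: tuple   # e.g. ("gender=F", "copd=YES")
    group_size: int    # patients matching all exposures
    n_outcome: int     # of those, patients with the outcome of interest

    @property
    def confidence(self):
        return self.n_outcome / self.group_size


def rank_rules(rules, min_group_size=500):
    """Drop rules with small exposure groups, then order the remainder by
    confidence and, for ties, by exposure group size (both descending)."""
    kept = [r for r in rules if r.group_size >= min_group_size]
    return sorted(kept, key=lambda r: (r.confidence, r.group_size), reverse=True)


# Hypothetical candidate rules; none of these counts come from the study.
candidates = [
    Rule(("a",), group_size=400, n_outcome=300),    # high confidence, too small
    Rule(("b",), group_size=1000, n_outcome=350),
    Rule(("c",), group_size=2000, n_outcome=700),
    Rule(("d",), group_size=600, n_outcome=120),
]
ranked = rank_rules(candidates)
for r in ranked:
    print(r.exposures, r.group_size, f"{r.confidence:.1%}")
```

Rule `a` is excluded despite its high confidence, which illustrates why the size threshold favours numerically important groups over small groups with extreme prescribing rates.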
For comparison purposes and to identify the unique features of the results from association rule mining, we also applied the more traditional approach to this type of data and compared the characteristics of smokers who were prescribed smoking cessation treatments in the study period with smokers who were not prescribed treatment using cross-tabulation and multiple logistic regression in Stata V.11, presenting results in terms of ORs and 95% CIs, with adjustment for all other variables in the model. p Values were calculated using the likelihood ratio test.
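As a simplified illustration of the quantities reported by the conventional analysis, an unadjusted OR with a Wald-type 95% CI can be computed from a 2×2 table as below. This sketch is not the adjusted multiple logistic regression fitted in Stata, and the cell counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b -- exposed patients with / without a prescription
    c, d -- unexposed patients with / without a prescription
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
odds_ratio, lower, upper = odds_ratio_ci(a=140, b=860, c=130, d=870)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Unlike an association rule, this estimate describes one exposure at a time; combinations of exposures would need explicit interaction terms in the regression model.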
A total of 282 433 patients were recorded as current smokers in 2008 and included in the analysis; 46.7% were men, and approximately 45% were in the two most deprived quintiles of Townsend score (table 1). Of these, 37 731 (13.4%; 12.8% of men and 13.9% of women) received a prescription for one or more types of smoking cessation medication during 2008.
The top 20 rules for patients who received prescriptions are shown in table 2. Prescriptions were most common among female smokers aged 31–60 years who had ever been diagnosed as having COPD and depression, and these characteristics were shared by all the top 20 rules. The other common features in the rules for those receiving prescriptions include not drinking in excess of recommended levels (all top 10 rules), not being obese (all top 10 rules), not having hypertension (rule 1), thyroid disease (rule 2), psychotic disorders (rule 3), heart failure (rule 4), atrial fibrillation (rule 5), stroke (rule 6), dementia (rule 7) or mental handicap (rule 8). The top rule indicates that of those who were female, aged 31–60 years, with a diagnosis of COPD and depression who did not drink over the recommended level and were not obese and without hypertension, 35.7% received at least one smoking cessation treatment from their general practitioner (GP) during 2008. The combination of COPD and depression, no hypertension or psychotic disorder, female gender and age 31–60 years appeared in five of the top 20 rules.
When we applied the same approach to identify those who did not receive a smoking cessation medication, we initially generated a large number of rules of the form ‘one or more disease indicator=NO, age ≤30’, and their confidence levels were all very high (>97%, most of them 100%), indicating that young smokers in good health are very unlikely to receive a smoking cessation therapy. To explore prescribing in those with other co-morbidity, we therefore reset the constraints for rule generation such that our disease indicators were restricted to those that indicated presence, not absence, of disease. The top 20 rules for smokers who did not receive cessation therapy prescriptions are shown in table 3. The top two rules demonstrate that over 97.5% of smokers with dementia did not receive prescriptions, particularly those aged over 60 years. Weekly alcohol consumption over recommended levels was a very common feature of those who did not receive prescriptions, occurring in 14 of the top 20 rules with 92.4% of these patients not receiving a smoking cessation drug (rule 12). This was evident in all age groups (rules 4, 15 and 17) and both genders (rules 3 and 5). Other characteristics of those who did not receive prescriptions were the presence of hypertension, age over 60, atrial fibrillation and CKD.
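One way to impose the presence-only constraint is to drop disease indicators coded NO when each patient record is converted into items for rule mining. The sketch below illustrates the idea and is not the WEKA configuration used in the study. The field names, the three-disease subset and the example record are all hypothetical.

```python
# Hypothetical subset of the disease indicators, for illustration only.
DISEASE_FIELDS = {"dementia", "copd", "hypertension"}


def to_items(record, presence_only=True):
    """Convert a patient record into a set of 'field=value' items.

    With presence_only=True, disease indicators coded NO are omitted, so the
    generated rules can refer only to the presence of a disease.
    """
    items = set()
    for field, value in record.items():
        if presence_only and field in DISEASE_FIELDS and value == "NO":
            continue
        items.add(f"{field}={value}")
    return items


record = {"age": "<=30", "gender": "F", "dementia": "NO",
          "copd": "YES", "hypertension": "NO"}
print(sorted(to_items(record)))
print(sorted(to_items(record, presence_only=False)))
```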
The associations between patient characteristics and receipt of a prescription for a smoking cessation treatment from traditional analysis of ORs are shown in table 4. The likelihood of receiving a prescription is significantly increased in women, in those aged 31–60 years, in those with COPD, those with coronary heart disease, those defined as heavy smokers and those with depression, while those with high alcohol intake and those with dementia, hypertension, CKD or atrial fibrillation are less likely to receive a prescription. However, unlike the association rule mining method, the traditional analysis identified the relationship between each individual exposure and likelihood of receiving a prescription rather than identifying combinations of indicators associated with high or low risk of receiving a prescription.
In this study population in 2008, about 13.4% of smokers received a smoking cessation medication, a substantially higher proportion than that recorded previously at 6.4% over 2 years from 2001 to 200317 but nevertheless a small proportion of all smokers. Given that 63% of smokers report readiness to quit and 25% make a quit attempt each year,13 it is thus evident that opportunities to support these attempts with cessation pharmacotherapy are being missed. Association rule mining identified a number of groups who were far more likely to receive prescriptions—particularly women aged between 31 and 60 years with a diagnosis of COPD and depression—and groups that were far less likely to receive prescriptions—particularly healthy young people and those with co-morbidity, particularly dementia, high alcohol intake, atrial fibrillation and chronic renal disease. A conventional logistic regression analysis showed broadly consistent findings, but the difference between the two approaches evident from the results is that association rule mining identifies specific, clinically and demographically identifiable categories of patients that can easily be targeted in clinical practice.
Our study relied on identifying current smokers by their Read codes, which we have previously shown to accurately reflect the prevalence of smoking found in national surveys,14 and in which we have demonstrated that smoking status is now updated more regularly.18 We included those recorded as current smokers in 2008 to ensure we were accurately identifying current smokers but with the result that our study excluded smokers whose smoking status was not recorded that year; while this may mean that our study population comprises relatively more of those who have their smoking status updated regularly such as those with chronic conditions, any such bias is relatively unlikely to have affected the associations that we have seen with prescribing in our study group. The data set does not capture non-prescription use of NRT, which accounts for about half of all cessation medication use in practice,19 and does not establish whether prescribed smoking cessation medication was used as part of a quit attempt or indeed whether it was used at all. We have chosen to analyse data on prescriptions rather than brief advice or referrals to Stop Smoking Services since the smoking cessation prescribing data in THIN are valid,20 while data on brief advice may be less reliable,21 but this study does identify patients in whom GPs have intervened in relation to smoking by providing a smoking cessation medication prescription and the characteristics of those who tend not to be treated. In our study population, over two-thirds of those prescribed medication were prescribed NRT, and analysis of those who did and did not receive an NRT prescription resulted in the same common features as those prescribed for all medication (data not shown); we have not carried out a data mining analysis in those receiving bupropion or varenicline alone where numbers were smaller.
Some of the discrepancy between the data mining and OR analyses is attributable to our decision to present only those rules including more than 500 cases; the determination of the support level or, equivalently, the minimum exposure group size is a crucial step in association rule mining. We explored in sensitivity analysis (data not presented) the effect of varying the minimum exposure group size and even at 50 rather than 500, the conclusions with respect to the common features of those who receive or do not receive prescriptions were not appreciably changed, although different specific characteristics of exposure groups were identified if the support level was varied. Nevertheless, this highlights an important difference between the two approaches. Conventional OR analyses do not typically select exposures on the basis of prevalence, and although it would be feasible to adopt such an approach, the association rule approach provides a simple means of identifying groups that are important in numerical terms, rather than exposures that are important in terms of strength. The association rules approach also has the potential advantage of approaching data with no prior hypothesis or null hypothesis, in what is in effect a structured exploration intended to identify discriminators that are important in terms of the number of patients they identify.
A previous study by Wilson et al 17 reported on the prescribing of smoking cessation treatments in UK general practice and factors that predicted their use from 2001 to 2003. We examined data for 2008, after the new GP contract released in 2004 and the introduction of a new smoking cessation drug, varenicline, at the end of 2006 and the introduction of smoke-free policy in 2007, all of which may have contributed to the increase in prescribing and which might have been expected to also impact on the characteristics of those who received prescriptions. We found a number of features that were in common with previous reports, including the higher likelihood that women would be prescribed smoking cessation drugs by their GPs, as were people aged between 31 and 60 years.17 ,22 Whether this reflects increased contact between GPs and these categories of people is not clear, but if contact alone was an important determinant of prescribing, then prescribing would generally be expected to be increased in most groups with chronic disease.
This was not, however, the case. The association rule mining approach identified that those with some co-morbidities, particularly COPD and depression, were more likely to receive prescriptions, while those with other co-morbidities of similar importance (atrial fibrillation and chronic renal disease in our study) were not. There is no obvious explanation for these biases in prescribing, and it is unclear whether the absence of prescribing was attributable to the GP not offering treatment, the patient not accepting it, or both. Some doctors might regard cessation therapies to be relatively contraindicated by conditions such as hypertension or atrial fibrillation; patients with these conditions were less likely to receive medication in our study. However, while NRT is recommended to be used with caution in some conditions, for example, those with unstable hospitalised cardiovascular disease and in hepatic and renal impairment, most warnings associated with NRT also apply to continued cigarette smoking and the risks of continued smoking outweigh any risks of using nicotine preparations.23 There is also no obvious reason why prescribing should be biased in relation to dementia, obesity or alcohol intake.
That heavy smoking did not emerge from the association rule mining analysis as an important feature of those receiving prescriptions and, though significant, had a small effect size in the logistic regression analysis, was inconsistent with previous findings that cigarettes per day predict use of smoking cessation medications in those making a quit attempt.19 We also explored the effect of performing our analysis only in those recorded as heavy smokers but found that the patterns of those receiving or not receiving prescriptions were very similar to that seen in the whole patient population (data not shown). These findings may reflect the poor reliability of data on cigarette consumption collected in primary care, perhaps resulting from the fact that number of cigarettes per day is not required to be recorded under the Quality and Outcomes Framework. Overall, our results suggest that GPs do not use this information in their decision making.
Variations and inconsistencies are inevitable in clinical practice, and effective means of monitoring and correcting inappropriate biases in prescribing and delivery of other medical interventions are therefore important to ensure equitable delivery of services. This study illustrates how association rule mining approaches could be used in this context, namely by identifying groups that have been neglected or excluded from conventional service provision and providing an opportunity and means to set up a system whereby GPs are prompted to discuss a treatment or other type of care with patients in these groups. The strength of the association rule mining approach in relation to a more conventional logistic regression analysis is that it has identified sizeable groups that can easily be defined and identified for intervention at practice level, in real time to allow more focused and immediate correction of bias in service provision. Data mining methods thus provide a potentially very powerful means to monitor, intervene in and improve on the practice of care delivery at aggregate and individual practice level. Further work is now indicated to determine whether these findings can be applied to identify neglected groups in routine applications and thus improve on the delivery of this and other essential primary care interventions.
What this paper adds
This paper investigated the characteristics of patients who have and have not been provided with smoking cessation prescriptions in primary care. An association rule mining method has been applied to identify sizeable and easily definable groups of patients that have been excluded from service provision, which is novel in this context. Our findings suggest that smoking cessation treatments are still underprescribed among younger smokers and those with co-morbidity, particularly dementia, high alcohol intake, atrial fibrillation and chronic renal disease.
Competing interests None.
Ethics approval EPIC Scientific Review Committee.
Provenance and peer review Not commissioned; externally peer reviewed.