Machine learning applications in tobacco research: a scoping review
  1. Rui Fu [1]
  2. Anasua Kundu [2]
  3. Nicholas Mitsakakis [1,3]
  4. Tara Elton-Marshall [4]
  5. Wei Wang [5]
  6. Sean Hill [5]
  7. Susan J Bondy [5]
  8. Hayley Hamilton [5]
  9. Peter Selby [5]
  10. Robert Schwartz [2,4]
  11. Michael Oliver Chaiton [2,4]

  [1] Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  [2] Ontario Tobacco Research Unit, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  [3] Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  [4] Institute for Mental Health Policy Research, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
  [5] Centre for Addiction and Mental Health, Toronto, Ontario, Canada

Correspondence to Dr Michael Oliver Chaiton, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Michael.chaiton{at}


Objective Identify and review the body of tobacco research literature that self-identified as using machine learning (ML) in the analysis.

Data sources MEDLINE, EMBASE, PubMed, CINAHL Plus, APA PsycINFO and IEEE Xplore databases were searched up to September 2020. Studies were restricted to peer-reviewed, English-language journal articles, dissertations and conference papers comprising an empirical analysis in which ML was identified as the method used to examine the human experience of tobacco. Studies of genomics and diagnostic imaging were excluded.

Study selection Two reviewers independently screened the titles and abstracts. The reference lists of articles were also searched. In an iterative process, eligible studies were classified into domains based on their objectives and the types of data used in the analysis.

Data extraction Using data charting forms, two reviewers independently extracted data from all studies. A narrative synthesis method was used to describe the findings from each domain, including study design, objective, ML classes/algorithms, knowledge users and the presence of a data sharing statement. Publication trends were depicted visually.

Data synthesis 74 studies were grouped into four domains: ML-powered technology to assist smoking cessation (n=22); content analysis of tobacco on social media (n=32); smoker status classification from narrative clinical texts (n=6) and tobacco-related outcome prediction using administrative, survey or clinical trial data (n=14). Implications of these studies and future directions for ML researchers in tobacco control were discussed.

Conclusions ML represents a powerful tool that could advance the research and policy decision-making of tobacco control. Further opportunities should be explored.

  • health services
  • public policy
  • surveillance and monitoring



  • Twitter @drpselby

  • Contributors RF, AK and MOC designed the study. AK conducted the literature search. RF, AK and MOC conducted eligibility screening. RF and AK extracted the data. RF undertook data synthesis and led writing and revision of the manuscript. NM, TE-M, WW, SH, SJB, HH, PS, RS and MOC made substantial contributions to the preparation and revision of the manuscript. All authors read, critically revised and approved the final version of the manuscript before submission.

  • Funding This work was supported by the Canadian Institutes of Health Research Catalyst Grant #172898.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.