Objective Despite recent increases in little cigar and cigarillo (LCC) use—particularly among urban youth, African-Americans and Latinos—research on targeted strategies for marketing these products is sparse. Little is known about the amount or content of LCC messages users see or share on social media, a popular communication medium among youth and communities of colour.
Methods Keyword rules were used to collect tweets related to LCCs from the Twitter Firehose posted in October 2014 and March–April 2015. Tweets were coded for promotional content, brand references, co-use with marijuana and subculture references (eg, rap/hip-hop, celebrity endorsements) and were classified as commercial and ‘organic’/non-commercial using a combination of machine learning methods, keyword algorithms and human coding. Metadata associated with each tweet were used to categorise users as influencers (1000 and more followers) and regular users (under 1000 followers).
Results Keyword filters captured 4 372 293 LCC tweets. Analyses revealed that 17% of accounts posting about LCCs were influencers and 1% of accounts were overtly commercial. Influencers were more likely to mention LCC brands and post promotional messages. Approximately 83% of LCC tweets contained references to marijuana and 29% of tweets were memes. Tweets also contained references to rap/hip-hop lyrics and urban subculture.
Conclusions Twitter is a major information-sharing and marketing platform for LCCs. Co-use of tobacco and marijuana is common and normalised on Twitter. The presence and broad reach of LCC messages on social media warrants urgent need for surveillance and serious attention from public health professionals and policymakers. Future tobacco use prevention initiatives should be adapted to ensure that they are inclusive of LCC use.
- Advertising and Promotion
- Non-cigarette tobacco products
- Tobacco industry
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Little cigars and cigarillos (LCCs) are an understudied domain in tobacco control and are particularly interesting because of the strategic and targeted marketing used to promote these products to youth and communities of colour.1–8 Although smoking rates in the USA have decreased, the recent declines in cigarette consumption are offset by sharp increases in the consumption of other tobacco products including cigarillos.9 These products are increasingly and aggressively marketed on the internet (eg, through social media) and at the point of sale and may serve as a means to introduce youth to tobacco.8 ,10 By definition, cigarillos are slimmer versions of a large cigar and weigh 3–10 lb per 1000 cigars; little cigars weigh not more than 3 lb per 1000 cigars; they resemble cigarettes but are wrapped in tobacco leaf rather than paper.11 Cigar smoke contains the same toxic and carcinogenic constituents found in cigarette smoke and may cause oral, laryngeal, oesophageal and lung cancer, as well as heart disease and aortic aneurysm.11 Users often inhale LCC smoke and thus absorb it into their lungs and bloodstream.11
Although the US Food and Drug Administration's decision in 2016 extended regulatory authority to all tobacco products, including LCCs, these products are not currently subject to many of the regulations on cigarette sales and advertising.12 ,13 Unlike cigarettes, for example, cigars can still be sold in flavours and in packs of fewer than 20. This lack of regulation could provide an opportunity for the industry to market cigars more aggressively in the USA. Nationally, cigar smoking is the second most common form of tobacco use among youth.9 ,14 Use of these products is also disproportionately high among youth, young adults and people of colour.8 Furthermore, LCCs are often used as vehicles for marijuana consumption (a process called ‘blunting’ where the tobacco is hollowed out of a cigar and replaced with marijuana).15 Blunt smokers often identify themselves as marijuana users but not as tobacco users, which may have led to underestimates of population-level LCC consumption.15–17 Thus, in recent years, there has been increased scientific interest in the relationships between tobacco and marijuana use among youth and young adults, particularly in regard to the direction of uptake pathways. Since both substances are typically smoked, tobacco and marijuana use may support and reinforce each other.18 Recent evidence suggests the emergence of a reverse gateway mechanism, where marijuana use precedes tobacco smoking and can lead to nicotine dependence.19–22 Marijuana users may be an emerging target for tobacco industry marketing.23
Characterising the role of new media platforms in tobacco product marketing and counter marketing is critically important as these platforms largely remain under the radar of tobacco control policymakers and are not currently covered by the advertising restrictions that apply to outdoor and television advertising. In fact, social media have become a major marketing platform for tobacco products.24–26 New communication technologies offer alternative means for gathering and managing information, which are not present in traditional media and provide high brand visibility for tobacco products.27 The emergence of new technologies has resulted in profound changes in the media landscape and has led internet users to encounter a vast amount of online information exposure, including social exposure on social-networking sites, such as Twitter. Twitter is particularly important as this platform is disproportionately popular among hard-to-reach populations traditionally at risk for tobacco use, such as youth and communities of colour.28 Use of this platform is increasing. According to the 2015 Pew Research Center Report, 23% of online adults use Twitter, compared to 16% in 2012.28 In 2015, roughly 38% of all Twitter users used the site daily.28 Approximately 28% of Twitter users are black, 28% are Hispanic and 32% are aged 18–29, which is consistent with the fact that youth, African-Americans and Latinos in general use social media at higher rates than the general population in the USA. Twitter popularity among these groups is growing,28 and it has come to play a major role in the life experience of American youth and ethnic minorities.
Prior research has shown that there is an influx of tobacco and nicotine product promotion on social media, with ∼5 million messages about electronic cigarettes/vaping products posted on Twitter, by 1.2 million unique accounts over a 1 year period.29 Social exposure to this content contributes to normalisation and glamorisation of smoking and may influence the spread of smoking behaviours via individual's social networks (followers/friends and followers of followers/friends of friends).30 Social media use provides greater speed of information retrieval and higher level of media control, making it easier for consumers to actively search for, produce, block and retransmit tobacco-related information, including product marketing.27 ,31 Consequently, selective information exposure and transmission processes may allow social media users to establish an information filter ‘bubble’ in which tobacco use is portrayed as a normal acceptable behaviour and becomes part of shared, in-group experience.31 Thus, tobacco-related messages on social media may lead to tobacco use initiation through such mechanisms as social learning or modelling of behaviours32 and socialisation into peer groups.33–37 Indeed, portrayal of tobacco and alcohol misuse is becoming a common activity to network about on Twitter.25 ,29 ,38 ,39
In addition to product promotion, social-networking sites are also used by the tobacco industry and its allies to influence public opinion on tobacco control policy decisions.40 However, the methodological base for systematic audit of tobacco-related content on these sites is lacking. There is therefore an urgent need to develop communication theory and technology, as well as programming infrastructure, for active engagement with user-generated tobacco-related content on social media (ie, surveillance, labelling, filtering) to enable potential regulation of commercial advertising messages on these platforms. Discovering how LCCs are marketed online and on social media has important and direct relevance to potential FDA regulations for these products.
Our study fills these research gaps by using cutting-edge statistical and computational methodologies to analyse LCC-related Twitter posts. While a recent study by Step et al (2016) analysed a sample of 288 LCC-related tweets, to the best of our knowledge, there has been no prior comprehensive systematic research on the magnitude of LCC message exposure and sharing on Twitter or marketing strategies used to promote tobacco products on this social-networking site.41 For the purpose of this study, we collected data on the amount and variety of LCC-related information that smokers and non-smokers are exposed to and post on Twitter and conducted analyses to identify major sources and themes of LCC content. We used the message content and related metadata to investigate product preferences (eg, brand and flavour), behaviour (purchase and use context), social norms (eg, subculture frames, peer group references) and product marketing strategies.
Data acquisition and processing
The present study is based on tweets filtered by 70 keywords related to LCCs over the period of 3 months (October 2014, March–April 2015). We purchased LCC-related tweets from Gnip (http://www.gnip.com), the official Twitter data provider. A tweet was included in the data set if it matched one or more of the keyword rules (eg, brands: swisher OR swishers, swisha OR swishas, splitarillo OR splitarillos; product names, including slang terms: rello OR rellos, rillo OR rillos, blunt-‘james blunt’-‘emily blunt’-st_blunt-‘too blunt’-‘be blunt’) (see online supplementary appendix 1 for a complete list of search rules). Keyword rules were chosen based on the trends (eg, through use of http://www.topsy.com that showed the volume of relevant Tweets over the past 30 days as well as examples of actual Tweets containing the searched keyword), prior literature and research team expert consensus based on knowledge of LCC-related terminology and brands.42 We used Boolean rules rather than individual keywords to make our search filter more efficient, minimise the amount of irrelevant tweets captured and reduce the number of duplicates. The Gnip Historical Powertrack delivered a collection of posts (in .json format) containing one or more search terms; the resultant data were stored in a NoSQL database, MongoDB, and cleaned using python programming language to create analytic data.
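The Boolean rule logic described above (inclusion terms combined with negated phrases) can be illustrated with a minimal Python sketch. This is not Gnip PowerTrack syntax and not the authors' code; the helper names `matches_rule` and `is_captured` are hypothetical, and only three of the 70 rules are shown.

```python
import re

def matches_rule(text, include_terms, exclude_phrases=()):
    """Return True if any inclusion term appears as a whole token
    and none of the exclusion phrases appears anywhere in the text."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in exclude_phrases):
        return False
    # Underscore is kept inside tokens so that "st_blunt" is not read as "blunt".
    tokens = set(re.findall(r"[a-z0-9_]+", lowered))
    return any(term in tokens for term in include_terms)

# Illustrative subset of the rules quoted in the text (assumed structure).
RULES = [
    (("swisher", "swishers"), ()),
    (("rillo", "rillos"), ()),
    (("blunt",), ("james blunt", "emily blunt", "st_blunt", "too blunt", "be blunt")),
]

def is_captured(text):
    """A tweet is captured if it matches one or more rules."""
    return any(matches_rule(text, inc, exc) for inc, exc in RULES)
```

Handling exclusions inside the rule, rather than filtering afterwards, is what keeps phrases such as "to be blunt" from inflating the collected corpus.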
Training samples and machine learning classifiers
To assess whether the captured tweets were relevant to LCCs, accurately measure the volume of the social conversation about LCCs and determine trends, we estimated the retrieval precision (the proportion of the data relevant to the LCC topic) and retrieval recall (the amount of all relevant conversation captured) of the keyword filters used to gather data.43 For this purpose, we first built a machine learning classifier based on a human-coded training sample. Two coders rated a random sample of 5124 tweets (the sample was stratified by the search rule) as relevant or non-relevant to LCCs. The two coders achieved high agreement (α=0.95) on an overlap sample of 600 tweets. This human-coded sample was used to train the machine learning classifier to clean the entire corpus of tweets. Machine learning is a data-driven analytic approach in which computational systems develop algorithms based on a training set (a subset of the data) to predict outcomes in a separate, test data set.44 The goal of supervised learning is generalisation to unseen data,45 that is, developing a model that maps unseen observations to one of the human labels.46 If a model performs well in predicting outcomes for the test data set, it may predict well for the rest of the database. Hence, this approach allows the classification of large data sets to be reliably automated. After comparing several machine learning methods, including the Naïve Bayes algorithm, logistic regression and a linear support vector machine (SVM) classifier, linear SVM with L1-norm regularisation was selected due to its high performance. Ten-fold cross-validation was used to test the accuracy of the classifier.47 Classifier accuracy was 0.95, classifier recall (sensitivity) was 0.96 and classifier precision (positive predictive value) was 0.96 (F1=0.96).
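As a brief illustration of how the reported classifier metrics relate to one another, the sketch below computes accuracy, precision, recall and F1 from a confusion matrix. The counts are invented so that the output matches the figures reported above; they are not the study's actual confusion matrix, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)          # positive predictive value
    recall = tp / (tp + fn)             # sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts chosen only to reproduce the reported values.
metrics = classification_metrics(tp=96, fp=4, fn=4, tn=56)
```

With these assumed counts, precision, recall and F1 all come to 0.96 and accuracy to 0.95.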
The machine classifier's performance was further tested with additional human coding of 1040 tweets (test data set). To confirm that the good classifier performance was not a coincidence arising from the parameter set-up in classifier training but reflected a good fit to the whole population, we took an additional random sample of the raw data and checked the machine classifier results against human labels (95% accuracy). In addition to classifier precision and recall, we estimated retrieval precision and recall following the suggestions by Kim et al.43 Retrieval precision was approximated by classifier precision (96%). Computing retrieval recall requires including in the denominator the non-retrieved but relevant tweets, that is, LCC-relevant tweets that do not match the LCC keyword rules. To determine retrieval recall, we randomly sampled 4000 tweets that did not match the LCC keyword rules (ie, tweets relevant to other products) from our database, which contains over 21 million tobacco and smoking-related tweets, and found that 4% of these 4000 non-retrieved tweets were relevant to LCCs. Therefore, the retrieval recall of the keyword filter was estimated to be 87%, suggesting that our keyword filters retrieved about 87% of all relevant tweets in our larger tobacco-related corpus.
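The retrieval recall estimate can be reproduced approximately with the short sketch below. It assumes the rounded totals given in the text (about 21 million tweets in the larger corpus, about 4.5 million captured by the LCC filters, 4 372 293 of them classified as relevant); the exact inputs used by the authors are not stated, so the function and its arguments are illustrative only.

```python
def estimate_retrieval_recall(retrieved_relevant, non_retrieved_total, sample_relevant_rate):
    """Estimate recall = relevant retrieved / (relevant retrieved + relevant missed).

    The number of relevant tweets missed by the filter is extrapolated from the
    share of relevant tweets found in a random sample of non-retrieved tweets.
    """
    missed_relevant = non_retrieved_total * sample_relevant_rate
    return retrieved_relevant / (retrieved_relevant + missed_relevant)

# Assumed, rounded inputs taken from the text.
recall = estimate_retrieval_recall(
    retrieved_relevant=4_372_293,
    non_retrieved_total=21_000_000 - 4_500_000,
    sample_relevant_rate=0.04,
)
```

Under these assumptions the estimate rounds to 0.87, in line with the reported 87%.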
A similar iterative process of combining human coding and machine learning was used to classify all collected tweets based on the themes of interest to informing policy and public health, namely, commercial/promotional content and co-use of marijuana and tobacco content.
First, we classified the relevant tweets as either organic or commercial. Organic tweets were those deemed non-sponsored; they reflected individual opinions or experiences or linked to non-promotional content. Commercial tweets were defined by the presence of any of the following: branded promotional messages; URLs linking to commercial websites; usernames indicating affiliations with commercial sites; or a user's Twitter page consisting only of promotional tweets (ie, spammer accounts). Two human coders reviewed all tweets posted by a sample of 3000 Twitter accounts (intercoder reliability was high: α=0.93). Human codes were used to train a linear SVM machine classifier. Classifier accuracy was validated using 10-fold cross-validation. Classifier accuracy was 0.97, classifier recall (sensitivity) was 0.92 and precision (positive predictive value) was 0.92 (F1=0.92). The machine classifier was further tested with additional human coding of tweets posted by 343 accounts; 97% of the test data set was correctly classified.
In addition, we classified the relevant tweets as those referencing co-use of tobacco and marijuana and those referencing tobacco use only (ie, LCCs). Co-use tweets contained any reference to marijuana. Specifically, these posts were defined by the presence of any references to using LCCs for the purpose of making blunts (ie, hollowed-out cigars filled with marijuana leaf), any terms referring to marijuana strains, marijuana slang terms (such as loud, green, purp and mid) and any references to being under the influence (‘high’) due to marijuana use. Tobacco use only tweets contained references to LCC use exclusively. Two human coders rated a sample of 2670 tweets (intercoder reliability was high: α=0.95). Resultant codes were used to train the linear SVM classifier; 10-fold cross-validation was applied to assess classifier performance. Classifier accuracy was 0.98, classifier recall (sensitivity) was 0.99 and precision (positive predictive value) was 0.99 (F1=0.99). The machine classifier was further tested with additional human coding of 185 tweets; 98% of the test data set was correctly classified.
Metadata associated with each tweet were used to examine the characteristics of accounts tweeting about LCCs. Thus, such user-level information as the number of followers was analysed to measure the potential reach of collected LCC-related tweets. We defined potential reach as the total times tweets were posted or the sum of followers for all tweets.
Furthermore, we utilised account metadata to categorise users as influencers (1000 and more followers) and regular users (under 1000 followers). As tweets posted by influencers have greater potential reach compared to messages posted by regular users, we assessed whether there were substantive differences in LCC-related content posted by these groups.
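The follower-based grouping and the potential reach measure can be sketched as follows. The sketch assumes that potential reach is the sum, over all tweets, of the posting account's follower count, which is one reading of the definition above; the record layout (a `followers` field per tweet) and the function names are hypothetical.

```python
INFLUENCER_THRESHOLD = 1000  # followers; accounts at or above this are influencers

def user_group(followers):
    """Classify an account as 'influencer' or 'regular' by follower count."""
    return "influencer" if followers >= INFLUENCER_THRESHOLD else "regular"

def potential_reach(tweets):
    """Sum the author's follower count over every tweet."""
    return sum(t["followers"] for t in tweets)

def reach_by_group(tweets):
    """Split potential reach between influencers and regular users."""
    totals = {"influencer": 0, "regular": 0}
    for t in tweets:
        totals[user_group(t["followers"])] += t["followers"]
    return totals

# Hypothetical tweet metadata records for illustration.
sample = [{"followers": 1000}, {"followers": 999}, {"followers": 5000}, {"followers": 10}]
```

Because reach is summed per tweet, an account that posts several times contributes its follower count once for each post.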
Since we analysed the entire population of LCC-related tweets posted over the 3-month period rather than a sample, we directly interpret the proportions of tweets posted by the two groups of users without statistical hypothesis testing. Furthermore, with a data set of this size, virtually any null hypothesis would be rejected and the p value from any statistical test would be close to 0.
Keyword algorithms were used to search the tweets to assess the frequency with which they mentioned specific brands, price-related, music-themed promotion and other content related to marketing strategies targeting youth and vulnerable populations that may be of interest to informing policymaking, public health and communication research. More specifically, we used keyword strings to quantify the amount of content featuring promotional offers (ie, using strings including money, deal, %, $, save, promo, dollars, discount, coupon, code, price, cost), brand references, popular memes (ie, using strings such as ‘hits blunt’, ‘pass the twitter blunt’), lyrics and subculture references (eg, rap/hip-hop lyrics, celebrity). These themes were selected to help inform future policymaking and interventions to prevent LCC use among youth and other populations at risk.
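A keyword algorithm of the kind described above for promotional offers might look like the following sketch. It uses the promotional strings listed in the text; matching words on word boundaries is an assumption made here to avoid hits such as "costume" for "cost", and the source does not say whether the authors matched whole words or substrings. The function names are hypothetical.

```python
import re

# Promotional strings listed in the text, split into words and symbols.
PROMO_WORDS = ("money", "deal", "save", "promo", "dollars", "discount",
               "coupon", "code", "price", "cost")
PROMO_SYMBOLS = ("%", "$")

# Whole-word, case-insensitive match (an assumption, see lead-in).
_WORD_PATTERN = re.compile(r"\b(?:" + "|".join(PROMO_WORDS) + r")\b", re.IGNORECASE)

def has_promo_content(text):
    """Return True if the tweet contains any promotional word or symbol."""
    if any(symbol in text for symbol in PROMO_SYMBOLS):
        return True
    return _WORD_PATTERN.search(text) is not None

def promo_share(tweets):
    """Proportion of tweets flagged as containing promotional content."""
    if not tweets:
        return 0.0
    return sum(has_promo_content(t) for t in tweets) / len(tweets)
```

The same pattern extends to brand names or meme strings by swapping in a different keyword list.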
To conduct additional exploratory analyses of the meaningful trends in LCC-related conversation, we used latent Dirichlet allocation (LDA) topic modelling, a form of machine learning and a natural language processing tool for identifying patterns of themes or topics in a corpus of unlabelled documents.48 This is an unsupervised method for discovering topics occurring in documents, that is, tweets. A topic may be defined as a cluster of words that frequently appear together. The R package ‘mallet’ was used to generate the topics by LDA. The number of topics was set to 200, and retweets were excluded from the analysis because they may dominate the volume of posts and obscure the topic patterns. The R package ‘wordcloud’ was used to generate word clouds that visualise topics using the top 200 terms ranked by LDA-generated weights per topic. Given a topic, weights indicate the relative importance of terms; the higher the weight of a term, the more likely a document (ie, tweet) containing this term will belong to this topic. Within a word cloud, a larger font size indicates greater weight and the same colours indicate approximately the same weights. Therefore, word clouds provide a relative gauge of how important a word is within a given topic. This visualisation allows the reader to see the most important terms, as well as the less important ones, to aid interpretation. Following the automatic discovery of 200 topics, the research team reviewed the word clouds displaying the topics and assigned labels to each substantive topic based on its salient terms. After identifying meaningful patterns, the LCC-related topics were grouped into larger categories or archetypes.48–50
During the data collection period (October 2014, March–April 2015), our keyword filters captured over 4.5 million tweets, of which 4 372 293 were classified as LCC-relevant tweets. These tweets were posted by 1 849 322 individual Twitter users. Our analyses revealed that 1 836 557 accounts posting about LCCs (99%) were organic, and <1% (N=12 765) of accounts were obviously commercial. The overwhelming majority of LCC-related tweets (3 636 176 or 83%) contained references to marijuana. Figure 1 shows the frequency of LCC-related tweets over time for the month of April 2015 to illustrate temporal trends in posts. There was a sharp increase in LCC-related tweets on the 4/20 ‘smoking holiday’. An example of a tweet sent on this holiday came from Executive Branch, a cigar and cigarillo brand owned by Snoop Dogg (a famous rapper). Snoop Dogg retweeted the tweet promoting his cigarillo brand, as well as Snoopdogg rolling papers: ‘RT @ExecBranch: Only way to celebrate #snoop420 right is with some @snoopdogg rolling papers! Get yours now on [link redacted]’.
Almost one-third of all tweets about LCCs (29%) were memes. An internet meme can be defined as an element of a culture (ie, activity, concept, catchphrase or piece of media) which spreads, often as mimicry, from person to person via the internet.51–53 LCC-related memes were predominantly humorous tweets containing links to or embedding images, videos or vines referencing blunt smoking behaviour. One of the most popular memes was the ‘hits blunt’ tweet, which contains questions a smoker would ask after smoking a blunt. An example of a popular retweet featuring this meme was: ‘*Hits blunt* If I go see the Grand Canyon do I actually see the Grand Canyon? #PSAT [link redacted]’ (retweeted 4323 times). Other examples of popular retweets featuring blunt-related memes were:
When you’re so high you roll your homie up into the blunt [link redacted] (10406 retweets or RTs)
When you hit the blunt too hard [link redacted] (7468 RTs)
*Girl hits blunt once* Changes twitter name to Flower Child *wears huff socks**listens to Bob Marley* (5778 RTs)
Joints or Blunts? [link redacted] (6300 RTs)
Table 1 shows the characteristics of LCC-related tweets collected, presented in total and separately for influencers versus regular users. We found that ∼17% of accounts posting LCC content were influencers; these users had a potential reach of 1 868 926 085 impressions (or 79% of total potential reach). Influencers were ∼30% more likely to mention specific LCC brands and 33% more likely to post promotional messages. Regular users were more likely to retweet messages, to post marijuana-related content and to post tweets featuring memes.
Table 2 lists examples of popular influencer accounts whose tweets referencing LCCs were frequently retweeted, and it also includes examples of popular retweets of messages posted by these accounts.
Influencer users included three major groupings: rap or hip-hop celebrities such as Snoop Dogg, Wiz Khalifa, Drake; rap community accounts (eg, Rappers Said, Rapper Reactions); and marijuana user or ‘stoner’ communities (eg, Stoner Nation, Happy Campers, Weed Tweets, Marijuana Posts, High Ideas, Stoner Beauties, Intelligent Stoners, Life as a Stoner, Stoner Chicks, StonerXpress, Stuff Stoners Like, etc). These influencer accounts had a large number of followers (for instance, Stoner Nation/@TheStonerNation was followed by over 480 000 users) and posted content on tobacco and marijuana co-use (ie, blunt use). It is noteworthy, however, that most of these accounts did not feature any age restrictions or health warnings.
Figure 2 illustrates the results of the exploratory topic modelling analysis. As mentioned above, we set the number of topics to be 200. The resultant topics could be generally grouped into four major categories or archetypes: (1) product-related messages, including references to specific brands, flavours, cigarillos or blunt size, quality and burning speed; (2) marijuana references, including co-use of tobacco and marijuana; marijuana slang terms and strains; (3) smoking behaviour—purchase, intention to smoke and buy and (4) normative and cultural context references, including memes, rap/hip-hop lyrics, birthday, subcultures, work, school, celebrities, music, etc. Topic modelling captured a number of hip-hop lyrics quotes referencing LCCs, for example, lyrics by such rappers as Wiz Khalifa, Drake, Chief Keef, Tupac, were captured as individual topics. Although retweets were excluded from the analysis, memes represented a significant proportion of the topics (eg, ‘hits blunt’ and ‘when you hit the blunt too hard’ appeared among the 200 topics). This grouping appears to be a sizeable part of the corpus. We generated word clouds from the weights of top 200 terms within each LCC-related topic, and figure 2 shows sample word clouds for each of the four archetypes. Overall, the majority of LCC-related topics appeared to be organic as very few topics featured explicit promotional messages.
Summary and discussion
The role of social media platforms, such as Twitter, in tobacco control is an emerging area of research. Our work builds on a growing body of literature that uses mixed-method computational approaches to assess health trends from digital media24–26 ,38 ,39 ,54 and adds to prior literature by demonstrating that Twitter appears to be a major messaging and information-sharing social platform for LCCs. In fact, our study is the first to quantify the overall presence of LCC-relevant content on Twitter. The amount of tobacco-related messages has grown rapidly in recent years; for instance, estimates of e-cigarette-related content range from 1.7 million tweets over a 5-year period between 2008 and 201355 to nearly 5 million posts captured in 2013,29 suggesting a sharp increase in the popularity of e-cigarettes and marketing efforts. Our study provides further evidence that there is an influx of tobacco-related conversation on Twitter. Our keyword filters captured 4 372 293 LCC-related tweets, which is more than twice as many tweets over a 3-month period as the number of posts about e-cigarettes over a 5-year time frame described by Kim et al. The majority of the LCC-related conversation was organic and referenced LCC and marijuana co-use in the form of blunts. The ostensibly organic nature of the conversation and the large proportion of memes (nearly 30%) may indicate that open discussion of LCC use is a normalised, generally accepted and popular activity on Twitter. LCCs are becoming part of popular culture. Although the majority of LCC-related topics appeared to be organic, with few topics featuring overt commercial content, we found that influential users, such as rap/hip-hop celebrities, were more likely to mention specific LCC brands and post promotional messages about LCCs.
These findings indicate that prior strategies used to identify commercial content on social media may not be effective in capturing integrated industry promotional tactics, for example, across tobacco and music industries. Further, these findings suggest that tobacco promotion via brand references/product placement is blurring the boundary between rap music, the hip-hop culture and the tobacco industry, which also allows for active consumer engagement, for example, through event promotion, contests stimulating organic user-generated content. Such a focus on developing personal two-way relationships with consumers allows product promoters to integrate into organic user social networks and foster high-speed viral/word-of-mouth information dissemination.56–59 These findings are consistent with LCC and cigarette promotion strategies starting in the 1970s, such as integration with lifestyle trends, music-themed promotion and use of celebrities.60 ,61
We found that ∼83% of LCC-related tweets referenced marijuana co-use. This finding confirms that LCCs cannot be considered without the context of marijuana use and may be primarily used for rolling blunts. For instance, recent data indicate that up to 90% of marijuana users are concurrent tobacco smokers and that tobacco use among marijuana users may be seriously under-reported.19 ,62 Marijuana users are becoming an emerging target for the tobacco industry marketing and numerous accounts including celebrity accounts, such as Snoop Dogg, rapper and ‘stoner’ community accounts on Twitter post and retweet promotional content on tobacco. Further research is needed to understand the relationships between tobacco and marijuana promotion and use among at-risk populations and the direction of uptake pathways.
Our study has several limitations and raises questions for future research. As with any social media research, our conclusions depend on the validity of data collected through our search filters. However, we achieved a high level of accuracy based on testing classifier precision and recall, as well as the retrieval recall of the keyword filter. Our full list of keyword rules is disclosed in online supplementary appendix 1. Another potential limitation concerns the definition of influencer accounts. Our definition was based on the number of followers only; future studies should seek to further develop this definition and measure additional features that may affect a user's influence (eg, number and frequency of posted tweets, number of likes/retweets by followers). Furthermore, while media sources, including social-networking sites, are an important source of influence on tobacco users, individual users of LCCs and social media have complex motives and predispositions that may mediate or moderate behavioural outcomes.
Future research and health campaigns need to address cultural engagement, for example, music-themed promotion, and social media presence of the tobacco industry. Our findings have direct implications for future FDA regulations of LCCs and related products, particularly with respect to marketing restrictions on social media. There is an urgent need for surveillance, monitoring and regulation of social media content relevant to tobacco. Content of Twitter posts is currently not subject to any regulation in regard to health risk disclosure or restriction for underage users. New strategies are needed to protect youth and address the transformation of tobacco advertising into transcendental branding, where the boundaries between marketing and entertainment are indistinguishable. Traditional efforts to restrict the amount of tobacco advertising, its placement and content to protect vulnerable populations cannot address the industry's integrated marketing approaches.
What this paper adds
Despite recent increases in cigarillo use, research on targeted strategies for marketing these products is sparse. Little is known about the amount or content of little cigar and cigarillo (LCC) messages users see or share on social media.
This study reveals that Twitter is a major information-sharing and marketing platform for LCCs and co-use of tobacco and marijuana is common and normalised on Twitter.
Tobacco use prevention initiatives should be adapted to ensure they are inclusive of LCC use and social media.
Contributors GK, SE, HT, YK and YS together designed the study; YS and HT conducted cleaning and pre-processing of the data; HT, GK, YK and YS conducted data analysis; SE, GK and HT contributed to data interpretation; GK wrote the first draft; SE and YK revised the draft; the final version of the paper has been reviewed and approved by all four coauthors.
Funding This study was funded by National Cancer Institute (U01CA154254).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.