Awareness and use of Electronic Nicotine Delivery Systems (commonly known as electronic cigarettes or e-cigarettes) have increased rapidly among adults and youth in the past few years. Two recent studies found that ever use of e-cigarettes doubled among adults between 2010 and 2011 (from 3.3% to 6.2%),1 and among youth between 2011 and 2012 (from 3.3% to 6.8%).2 Coinciding with this increase is the rapid expansion of e-cigarette marketing on a variety of media channels, including the internet and traditional platforms (print, radio, event sponsorship and TV), some of which have long prohibited tobacco advertising.
Because e-cigarettes are not currently subject to the same marketing restrictions as traditional cigarettes, researchers and the public health community have expressed concern about the potential impact of rapidly expanding e-cigarette marketing on e-cigarette initiation and on transition to regular use of electronic and combustible cigarettes, especially given the limited scientific evidence on these products' long-term health impact, efficacy in smoking substitution or cessation,3 and potential role in combustible tobacco uptake. Understanding the scope and content of e-cigarette marketing across media platforms is therefore critically important. Social media could be used to promote e-cigarettes' role in smoking cessation despite limited scientific evidence supporting such claims, and there is evidence that such claims appear on social media.4 A better understanding of e-cigarette marketing on social media can help characterise the forces that propelled the exponential growth of e-cigarettes from 2010 to 2012. Isolating and analysing the impact of e-cigarette marketing on the microblogging platform Twitter is important because Twitter use is widespread and growing rapidly, particularly among young adults and minority populations.5
Examining e-cigarette marketing on social media platforms is also important from a US policy perspective. As a result of recent court rulings, the US Food and Drug Administration (FDA) cannot regulate e-cigarettes as drugs or medical devices unless they are marketed for therapeutic purposes.6 While the FDA can regulate e-cigarettes as tobacco products under the 2009 Family Smoking Prevention and Tobacco Control Act, the agency has yet to announce its specific plan for e-cigarette regulation in the USA. Ample evidence indicates that internet messaging has been used to bypass marketing restrictions on combustible tobacco.7,8 Understanding how e-cigarettes are marketed online and on social media is therefore directly relevant to potential FDA regulations for these products.
Our study addresses these research gaps by examining a snapshot of e-cigarette marketing on Twitter over a 2-month period (May–June 2012). Twitter is a microblogging website that forms a valuable repository of public information on consumer attitudes and behaviours. Twitter's popularity has increased rapidly since its founding in 2006; over 200 million monthly active users worldwide now produce more than 500 million tweets and 2.1 billion Twitter search engine queries daily.9 Twitter use is more common among young adults (30% of internet users aged 18–29 years used Twitter in 2013) and members of minority groups (27% of African American and 28% of Latino internet users used Twitter in 2013, compared with 14% of White internet users).5 Data generated through Twitter are not limited to the content of tweets themselves but also include metadata attached to each tweet, such as the time posted, and user-level information, including geo-identifiers (when enabled by the user), the number of followers and of accounts followed, and the user's Klout score (a measure of a user's influence on social media platforms).10 Tweet content and associated metadata can be leveraged to track communications by consumers and companies.
In this paper, we use a novel methodology to capture and analyse e-cigarette related tweets and metadata through the Twitter Firehose (Twitter's certified complete data stream), measuring the volume of Twitter communications related to e-cigarettes and quantifying the extent to which tweets about e-cigarettes include mentions of smoking cessation, safety and price promotions. In addition, we quantify the proportion of e-cigarette tweets tied to commercial interests and those reflecting putatively ‘organic’ points of view and experiences. We also estimate the influence of commercial and organic e-cigarette tweets by using user-level metadata including Klout score and number of followers.
Through a licensed Twitter data provider (Gnip; http://www.gnip.com), we obtained access (for an institutionally negotiated fee) to the Twitter Firehose. In contrast to the publicly available Twitter data stream (Twitter API), which provides approximately 1% of all real-time tweets, the Firehose provides real-time search access to 100% of all tweets, as well as metadata associated with each tweet.11
Tweets related to e-cigarettes were identified and collected from the Firehose if they contained one or more keywords of interest identified via expert consensus ('e-cigarette,' 'ecigarette,' 'e-cig' or 'ecig'). By 'expert consensus' we refer to review by our interdisciplinary team of collaborators, who have substantial expertise in health behaviour and public health policy research. Tweets that included other relevant keywords ('electronic,' 'blu,' 'njoy') together with the word 'cig' or 'cigarette' anywhere in the tweet text were also included. Blu and Njoy were selected because they were the top-selling e-cigarette brands in the USA according to Nielsen.12 Variations of the keyword 'vape' were not included in our search terms; we prioritised search terms with extremely high precision at the expense of recall of the full corpus of e-cigarette-related tweets.
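The collection rule above can be sketched as a filter function. This is a minimal illustration, not the Firehose/Gnip matching engine: the function name `matches_ecig_keywords`, whole-word case-insensitive matching, optional plural forms, and treating 'cig' as a substring (so that 'cigs' and 'cigarettes' also count) are all our assumptions.

```python
import re

# Primary keywords from the text. Whole-word, case-insensitive matching and the
# optional plural "s" are assumptions; the paper does not describe the matcher.
PRIMARY = re.compile(r"\b(?:e-cigarettes?|ecigarettes?|e-cigs?|ecigs?)\b", re.IGNORECASE)
# Secondary keywords that count only when 'cig'/'cigarette' also appears.
SECONDARY = re.compile(r"\b(?:electronic|blu|njoy)\b", re.IGNORECASE)
# Substring match so that 'cigs' and 'cigarettes' also qualify (assumption).
CIG = re.compile(r"cig", re.IGNORECASE)


def matches_ecig_keywords(text: str) -> bool:
    """Hypothetical re-implementation of the keyword collection rule."""
    if PRIMARY.search(text):
        return True
    # Brand or 'electronic' keywords require a co-occurring 'cig' string.
    return bool(SECONDARY.search(text) and CIG.search(text))
```

Under these assumptions a tweet such as "feeling blue, need a cigarette" is excluded, because 'blue' does not match the whole word 'blu', while "Give up smoking with electronic cigarettes?" is collected.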
Twitter is a global platform on which a tweet from any location can be viewed by users from any location and where geolocation often cannot be reliably determined; however, our data collection sought to gather only English-language tweets to focus the analyses.
A random sample of 500 collected tweets was reviewed to assess whether tweets were relevant to e-cigarettes (over 99%) and whether they were in English (99%).
An iterative process combining human coding and machine learning was used to classify all collected tweets as either organic or commercial. DiscoverText, cloud-based text analytics software, was used to collect, archive and machine-classify tweets; it requires no text preprocessing prior to coding. This is the first study to apply this novel methodology to analysing content related to e-cigarettes. Organic tweets were those deemed non-sponsored; they reflected individual opinions or experiences or linked to non-promotional content. Commercial tweets were defined by the presence of any of the following: branded promotional messages; URLs linking to commercial websites; usernames indicating affiliations with commercial sites; or a user's Twitter page consisting only of promotional tweets (ie, spammer accounts). For the purposes of this study, a tweet containing a commercial link was coded as 'commercial' regardless of whether it was posted by an ostensibly individual account; our definition of 'commercial' therefore includes many tweets posted by 'affiliate marketers.' Commercial websites included those that directly sold e-cigarette-related products as well as 'landing' or 'affiliate' websites for sales;13 these sites did not themselves sell e-cigarette-related products but promoted or reviewed these products and linked to retailer sites.
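The four commercial criteria can be expressed as a rule that flags a tweet as commercial if any one criterion holds. This is a hypothetical sketch, not the study's coding instrument (the criteria were applied by human coders and a classifier): the `Tweet` record, the domain and username lists are illustrative placeholders, and the promotional terms (coupon, discount, free trial, starter kit) are borrowed from the study's recoding keywords as a rough stand-in for 'branded promotional messages'.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Illustrative placeholders only; the study's actual lists are not published.
COMMERCIAL_DOMAINS = {"vaporgod.com", "purecigs.com"}               # hypothetical
PROMO_TERMS = ("coupon", "discount", "free trial", "starter kit")   # stand-in for promo messages
AFFILIATION_TERMS = ("llc", "shop", "store")                        # hypothetical username markers


@dataclass
class Tweet:
    text: str
    username: str
    urls: list = field(default_factory=list)
    account_only_promotional: bool = False  # judged from the user's timeline


def domain_of(url: str) -> str:
    """Host name of a URL, lower-cased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_commercial(t: Tweet) -> bool:
    """True if any commercial criterion holds; otherwise the tweet is 'organic'."""
    if any(term in t.text.lower() for term in PROMO_TERMS):
        return True
    if any(domain_of(u) in COMMERCIAL_DOMAINS for u in t.urls):
        return True
    if any(term in t.username.lower() for term in AFFILIATION_TERMS):
        return True
    return t.account_only_promotional
```

Because the criteria are joined by "any", a tweet from an ordinary-looking personal account is still flagged once it links to a listed commercial domain, mirroring how affiliate marketers were coded.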
Two human coders coded a random sample of 2000 collected tweets, using tweet content as well as the associated metadata to determine whether tweets were commercial or organic. The two coders achieved notably high agreement (κ=0.93) on an additional overlap sample of 500 tweets. The coded set of tweets was used to train a machine classifier using a Naïve Bayes algorithm; the machine classifier was applied to the full set of collected tweets. When we tested agreement between machine classification and a human coder in an additional sample of 500 tweets, we found that the machine classifier identified tweets as ‘commercial’ with high levels of accuracy but was less accurate in identifying ‘organic’ tweets; the machine algorithm misclassified commercial tweets as organic more often than the reverse. Rather than reviewing all classified tweets, we systematically reviewed and recoded select tweets based on relevant features. Since our definition of ‘commercial’ tweets includes those linking to commercial sites, we reviewed the most common URLs in the data set and, for websites deemed commercial, we recoded all tweets containing these URLs as ‘commercial.’ We also reviewed and recoded tweets where the text exactly matched other tweets coded as ‘commercial’ or included some common commercial keywords (eg, coupon, discount, free trial, starter kit). Finally, we searched account name fields to identify those with explicit e-cigarette sales affiliations (eg, VaporGod LLC), recoding all tweets posted by such accounts as commercial. Approximately 1500 tweets were recoded. Our final coding algorithm was highly consistent with human coders when tested on an additional random sample of 500 tweets (κ=0.88).
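The train-then-check loop described above can be illustrated with a textbook multinomial Naive Bayes classifier and Cohen's kappa. DiscoverText's internal implementation is not described in the paper, so this is a stand-in sketch: the tokeniser, add-one (Laplace) smoothing and the names `NaiveBayes` and `cohen_kappa` are our assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word-like tokens; keeps #, @, % and $ (assumed tokeniser)."""
    return re.findall(r"[a-z0-9#@%$]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus log likelihood of each known token.
            score = math.log(self.doc_counts[c] / self.n_docs)
            denom = self.totals[c] + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label lists (undefined if chance agreement is 1)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance from each rater's label frequencies, which matters when one class (here, 'commercial') dominates: two raters who both label nearly everything 'commercial' agree often by chance alone.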
A similar combination of human coding and machine classification was applied to distinguish tweets that mentioned smoking cessation from those that did not. Smoking cessation mentions included any references to the use of e-cigarettes in quitting or stopping smoking either in the tweet text, the URL or the username fields. For example, the following tweets were coded as cessation mentions: “Stop smoking with the electronic cigarette starter kit...” “Need to quit smoking #ecig,” and “Give up smoking with electronic cigarettes?” The machine classifier was trained using a set of 3500 tweets coded by two human coders with a high level of agreement (κ=0.90). After applying the machine classifier to the full set of collected tweets, we compared results with a human coder in an additional sample of 500 tweets, finding the machine classifier misclassified ‘no cessation’ tweets as ‘cessation’ more often than the reverse. To improve the coding algorithm, we systematically reviewed and recoded a subset of tweets based on presence or absence of common keywords related to cessation (variations of ‘quit*’ and ‘stop sm*’ in the tweet text, account name or full URL fields). We manually reviewed approximately 5000 tweets where machine classifier and keyword algorithms disagreed, recoding approximately 3500. Our final coding process was consistent with human coders (κ=0.87).
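The keyword check used to target manual review of cessation codes might look like the following sketch. Translating 'quit*' and 'stop sm*' into regular expressions, and the names `keyword_cessation` and `needs_review`, are our assumptions; each tweet is represented as a (text, account name, URL) tuple.

```python
import re

# 'quit*' and 'stop sm*' written as regular expressions (assumed translation).
# A prefix match on 'quit' also hits 'quite', one reason keyword hits still need review.
CESSATION = re.compile(r"\bquit|\bstop[\s_-]?sm", re.IGNORECASE)


def keyword_cessation(text: str, account_name: str, url: str) -> bool:
    """True if the tweet text, account name or URL contains a cessation keyword."""
    return any(CESSATION.search(f) for f in (text, account_name, url))


def needs_review(tweets, machine_labels):
    """Indices where the machine classifier and the keyword rule disagree."""
    return [
        i
        for i, (t, m) in enumerate(zip(tweets, machine_labels))
        if keyword_cessation(*t) != m
    ]
```

Only the disagreements are queued for a human, which is how a rule of this kind narrows manual review to a few thousand tweets instead of the full corpus.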
Among commercial tweets, various keywords were used to identify those with mentions of health effects (using letter string ‘health’) or safety (using the string ‘safe’) as well as mentions of pricing or discounts (using the strings: money, deal, %, $, save, promo, dollars, discount, coupon, code, price, cost).
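The mention flags for commercial tweets can be computed with plain substring matching on the letter strings listed above. This sketch assumes case-insensitive substring matching, which 'letter string' implies but the text does not spell out; such matching also fires on words like 'ideal' (via 'deal') or 'costume' (via 'cost'), so the flags can over-count. The function names are hypothetical.

```python
# Letter strings listed in the text; case-insensitive substring matching is assumed.
HEALTH = ("health",)
SAFETY = ("safe",)
PRICE = ("money", "deal", "%", "$", "save", "promo", "dollars",
         "discount", "coupon", "code", "price", "cost")


def mention_flags(text: str) -> dict:
    """Which mention categories a single tweet contains."""
    t = text.lower()
    return {
        "health": any(s in t for s in HEALTH),
        "safety": any(s in t for s in SAFETY),
        "price": any(s in t for s in PRICE),
    }


def shares(texts):
    """Proportion of tweets carrying each flag."""
    flags = [mention_flags(t) for t in texts]
    return {k: sum(f[k] for f in flags) / len(texts) for k in ("health", "safety", "price")}
```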
Analysis of account information
To measure the potential reach of collected e-cigarette tweets, user-level information was analysed, focusing on (A) number of followers and (B) Klout score, a proprietary measure of influence calculated using data from Twitter and other social networks (Klout scores range from 1 to 100; higher scores indicate greater influence).16 Klout scores were not available for 1889 users (8%), and follower data became available from Gnip only as of 28 May 2012; thus the total number of times tweets were posted to followers' feeds (the sum of follower counts across all tweets) is calculated for June 2012 only.
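Potential reach, as defined above, sums the poster's follower count over every tweet, so a prolific account contributes its follower count once per tweet. A minimal sketch, assuming a hypothetical `TweetRecord` that holds a class label and a follower count set to `None` where Gnip follower data were unavailable (such tweets are skipped, mirroring the June-only calculation):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TweetRecord:
    label: str                # 'commercial' or 'organic'
    followers: Optional[int]  # None where follower metadata was not yet available


def potential_exposures(tweets, label):
    """Sum of the poster's follower count over every tweet of one class."""
    return sum(t.followers for t in tweets
               if t.label == label and t.followers is not None)


def exposure_ratio(tweets):
    """How many times as often commercial tweets reached feeds as organic ones."""
    return potential_exposures(tweets, "commercial") / potential_exposures(tweets, "organic")
```

The commercial-versus-organic comparison reported in the Results is a ratio of two such sums; it counts feed postings, not distinct people who actually read a tweet.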
Between 1 May and 30 June 2012, our keywords identified a total of 73 672 tweets from the Twitter Firehose, almost all of which (99%) were relevant to e-cigarettes. These tweets were posted by 23 700 separate Twitter users. The average number of e-cigarette tweets per day was 1208 (SD=325), and the average number of e-cigarette tweets per user over the study period was 3 (SD=90.7).
For the 14 319 accounts for which follower counts were available, the average number of followers was 845 and the median was 53. Follower counts varied greatly (SD=7230): 13.4% of accounts had fewer than 10 followers and 11.1% had more than 1000.
Table 1 shows the characteristics of e-cigarette related tweets collected, presented in total and separately for organic versus commercial. Among collected tweets, 66 102 (89.6%) were classified as commercial. Compared with organic tweets, commercial tweets were more likely to include URLs (94% vs 11%, p<0.001) and to be retweets (19% vs 17%, p<0.001).
There were 17 936 Twitter users who posted commercial tweets and 6254 users who posted organic tweets; 490 users posted tweets that fell into both categories. The average number of e-cigarette tweets by those who posted commercial tweets was 3.7, three times that of those who posted organic tweets (1.2, p=0.002). Although those posting organic tweets had, on average, more followers than those posting commercial tweets (867 vs 841), this difference was not statistically significant (p=0.86). Average Klout scores among those posting organic e-cigarette tweets were twice as high as among those posting commercial e-cigarette tweets (30.6 vs 15.4, p<0.001).
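The user-level figures above are internally consistent, which can be checked with simple arithmetic. The counts come from the text; dividing each group's tweets by that group's users is our assumption about how the per-group averages were computed (the 490 users who posted both kinds of tweet appear in both denominators).

```python
# Counts reported in the Results (tweets collected 1 May to 30 June 2012, i.e. 61 days).
total_tweets, total_users, days = 73_672, 23_700, 61
commercial_tweets = 66_102
organic_tweets = total_tweets - commercial_tweets  # 7570
commercial_users, organic_users, both_groups = 17_936, 6_254, 490

# Inclusion-exclusion: users who posted both kinds of tweet are counted in both groups.
distinct_users = commercial_users + organic_users - both_groups

# Per-group averages, assuming each group's tweets are divided by that group's users.
tweets_per_commercial_user = commercial_tweets / commercial_users
tweets_per_organic_user = organic_tweets / organic_users
tweets_per_day = total_tweets / days

print(distinct_users, round(tweets_per_commercial_user, 1),
      round(tweets_per_organic_user, 1), round(tweets_per_day))
# prints: 23700 3.7 1.2 1208
```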
Of all collected e-cigarette tweets, 63 254 (86%) included URLs. Of these tweets, 62 406 (98.7%) were coded as commercial. URLs were predominantly for .com addresses (85.2%), followed by .org (4.8%) and .net (4.6%) addresses. The most-mentioned website was vaporgod.com (13 244 mentions), followed by purecigs.com (4546) and bestcelebrex.blogspot.com (3232).
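Domain and top-level-domain tallies like those above can be produced with the Python standard library. This sketch assumes URLs have already been expanded from link shorteners and include a scheme (otherwise `urlparse` returns an empty host); the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(urls, k=3):
    """The k most-mentioned hosts with their counts."""
    return Counter(host_of(u) for u in urls).most_common(k)


def tld_shares(urls):
    """Share of URLs by top-level domain ('com', 'org', 'net', ...)."""
    tlds = Counter(host_of(u).rsplit(".", 1)[-1] for u in urls)
    n = sum(tlds.values())
    return {t: c / n for t, c in tlds.items()}
```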
The total number of times these e-cigarette tweets were posted to followers' Twitter feeds (the sum of follower counts across all tweets) in June 2012 was nearly 4 million for organic tweets and 173 million for commercial tweets, implying that commercial e-cigarette tweets appeared in followers' feeds 45 times as often as organic tweets.
Overall, approximately 11% of e-cigarette tweets were found to contain references related to smoking cessation. Cessation mentions were more frequent among commercial (11%) than among organic tweets (9%, p<0.001).
Tweet activity was highly concentrated among the most active users, with the top three users producing 25% of e-cigarette tweets and the top 100 users producing 48% over the study period. Figure 1 depicts the distribution of tweets across users, where each dot represents the number of users who posted a given number of e-cigarette tweets. The figure shows that while approximately 10 000 users tweeted about e-cigarettes once, only one user tweeted about e-cigarettes over 10 000 times. Of the top 10 most active users, eight had smoking-related or e-cigarette-related usernames; over 99% of e-cigarette tweets by these 10 users were coded as commercial and 99.6% included URLs. The top 10 most active accounts had a higher-than-average number of followers (5601 vs 845, 0.7 SD above the mean) and an average Klout score 0.6 SD above the mean (26.7 vs 19.3).
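The concentration measures and the Figure 1 distribution can be computed from a list holding the posting account of each tweet. The function names are hypothetical; `users_by_tweet_count` returns the quantity Figure 1 is described as plotting, and `sd_above_mean` reproduces the standardised difference quoted for the top 10 accounts.

```python
from collections import Counter


def top_k_share(user_ids, k):
    """Share of all tweets posted by the k most active users.

    user_ids holds one entry per tweet (the posting account).
    """
    counts = Counter(user_ids)
    return sum(c for _, c in counts.most_common(k)) / len(user_ids)


def users_by_tweet_count(user_ids):
    """For each n, the number of users who posted exactly n tweets."""
    return Counter(Counter(user_ids).values())


def sd_above_mean(value, mean, sd):
    """Distance of a value above the mean, in standard deviations."""
    return (value - mean) / sd
```

With the reported follower statistics, (5601 - 845) / 7230 is about 0.66, which rounds to the 0.7 SD given above.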
Of the 23 700 Twitter users who posted about e-cigarettes, 589 (2% of users) had username fields containing the keywords or letter strings ‘vape,’ ‘vapor,’ ‘cig,’ ‘smok’ or ‘elec.’ These 589 users produced 30 046 tweets over the study period, or 40.7% of total tweets. Of these tweets, 29 898 (99.5%) were coded as commercial.
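The username screen above amounts to a substring test. A minimal sketch with hypothetical names; as with the other letter-string rules, 'elec' or 'smok' can also match unrelated names (for example, 'election' or an anti-smoking account), so flagged accounts are a proxy for industry affiliation, not proof of it.

```python
# Letter strings from the text, matched case-insensitively anywhere in the username.
USERNAME_STRINGS = ("vape", "vapor", "cig", "smok", "elec")


def industry_username(name: str) -> bool:
    n = name.lower()
    return any(s in n for s in USERNAME_STRINGS)


def flagged_summary(tweets):
    """tweets: (username, label) pairs.

    Returns (share of all tweets posted by flagged users,
             share of those tweets labelled 'commercial').
    """
    flagged = [label for user, label in tweets if industry_username(user)]
    share = len(flagged) / len(tweets)
    commercial = flagged.count("commercial") / len(flagged) if flagged else 0.0
    return share, commercial
```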
The extent of mentions related to e-cigarette health and safety, as well as price or discounts, is shown in table 2. Among commercial e-cigarette tweets, 2% contained descriptions or mentions related to health, and mentions of safety were present among approximately 1%. About a third of all commercial e-cigarette tweets contained price or discount mentions.
Summary and discussion
Whereas prior studies of tobacco-related content on social networks have noted the presence of e-cigarette promotion,4,14 our study is the first to quantify the overall presence of e-cigarette-relevant content on Twitter. Unlike previous studies,4,15 we analysed metadata associated with each tweet to better understand the potential reach of e-cigarette-related tweets across the platform. Our analyses revealed that advertising and promotion represent the overwhelming majority of Twitter content related to e-cigarettes. Over a 2-month period in 2012, we found that 90% of nearly 74 000 tweets related to e-cigarettes contained commercial content: either promotional messages or URLs linking to commercial websites promoting e-cigarette use. Commercial tweeting about e-cigarettes was largely driven by a small number of highly active users. The fact that 25% of all tweets examined were generated by just three users suggests an automated process; these users may be 'bots' set up by commercial bodies to tweet in bulk. The vast majority of commercial e-cigarette tweets contained links to websites and a third included price or discount appeals; more than 1 in 10 commercial tweets mentioned smoking cessation. The finding that e-cigarettes are extensively marketed on Twitter is consistent with the increasing levels of e-cigarette web searches and sales.12,16 Our findings are also consistent with those of Prochaska et al,4 who found that many Twitter accounts dedicated to smoking cessation included links to commercial e-cigarette sites. While only 11% of the e-cigarette tweets collected in this study mentioned smoking cessation, the high absolute number of such mentions is noteworthy.
The rapid growth in the popularity of e-cigarettes may in part reflect marketing and networking efforts through Twitter, the extent of which our findings demonstrate. Some researchers have hypothesised that, owing to the lack of regulatory standards, social media may play an increasing role in the diffusion of tobacco products and pro-smoking messages.7 Prior to the emergence of e-cigarette TV ads in late 2012, e-cigarettes were primarily marketed online;17 however, the extent and strategies of this online marketing were not well understood.
Our study offers strong evidence of e-cigarette marketing on Twitter, a non-trivial portion of which contains mentions of smoking cessation and price promotions. Price promotion and discounting significantly influence product uptake and consumption, an association strongly demonstrated for regular cigarettes.18 E-cigarette marketing on Twitter may have contributed to the rapid rise in e-cigarette popularity. Because of high levels of cross-platform interaction, our results may also have implications for the marketing of e-cigarettes on other social media platforms. Given the substantial youth presence on social media,5 the marketing of e-cigarettes on those platforms may entice non-smokers, youth in particular, to experiment with and initiate e-cigarette use.
These results have direct and important implications for future FDA regulations on e-cigarettes and related products, particularly with respect to marketing restrictions on social media. To the extent that e-cigarette safety and efficacy have not yet been fully studied, extensive marketing of the products on social media may carry public health risks. The recent court case decision6 was predicated on the company's assertion that e-cigarettes were not marketed as cessation devices, a claim contradicted by our findings. Learning the extent to which e-cigarette marketing appeals to new users who had not previously used tobacco—especially children and adolescents—and affects initiation is a stated FDA research priority.19 Given the pervasiveness of e-cigarette marketing on Twitter (and likely other social media platforms) and the substantial youth presence on those platforms, it is imperative for the FDA to closely monitor content and reach of such strategies and adopt appropriate social media marketing regulations for tobacco products, including e-cigarettes, that are consistent with the Family Smoking Prevention and Tobacco Control Act.
Our study has several limitations and raises questions for future research. First, our study is cross-sectional, examining e-cigarette-related tweets in a short 2-month window, and does not examine trends over time. Future studies can use our analyses as a starting point to examine trends in the volume of e-cigarette-related content on Twitter. Second, our data were collected just prior to the launch of major e-cigarette TV marketing, so we were unable to examine the relationships between televised e-cigarette ads and Twitter conversation. Future research should seek to clarify these relationships. Third, we may have overlooked important keywords, including emerging brands and variations of the slang term 'vape.' Pilot testing indicated that slang terms involving variations of 'vape' were less precise than the keywords we selected (eg, 'vapes' may refer to marijuana vaporisers). We focused on precision rather than recall because including less relevant keywords would have required extensive data preprocessing to eliminate irrelevant tweets before valid conclusions could be drawn from the content analysis. For future studies we will develop efficient methods for conducting such preprocessing. While our keywords demonstrated high precision, our analyses likely underestimate the total number of e-cigarette-related tweets. Fourth, our data do not allow us to map the interrelationships between Twitter users who post e-cigarette-related content. Future research might measure types of interactions between e-cigarette marketers and potential customers, including whether commercial and organic accounts are likely to follow, retweet or mention one another. We also cannot reliably distinguish whether an organic user tweeting commercial content was paid or encouraged to do so by a commercial entity. In addition, our estimates do not take into account content viewed by actively searching for e-cigarette-related keywords using the Twitter search interface, although by 2010 Twitter was the world's fastest growing search engine.9
Finally, more research is needed to better understand how social media marketing affects smokers’ and non-smokers’ beliefs and attitudes about e-cigarettes, and ultimately whether it influences product use. Future studies could examine the content of organic tweets about e-cigarettes to better understand tweeters’ assessments of products’ appeal, safety and smoking cessation application.
Given the absence of demographic or smoking status information about individual users, the external validity of inferences drawn from Twitter data depends partly on available information about user characteristics at the population level. Twitter use is reported to be particularly common among internet users who are young (30% of those aged 18–29 years), African American (27%) or Latino (28%).5 While Klout score is not universally accepted as the best measure of online influence, it is useful for characterising the potential reach and impact of Twitter messaging. It is generated by an undisclosed algorithm to capture influence, defined as the 'ability to drive people to action'; thus retweets and replies are likely weighted most heavily.10
Despite the limitations, this study contributes to the literature by beginning to characterise the presence and nature of e-cigarette marketing on social media and identifying potential directions for future research. This paper extends the line of research on tobacco-related information using 'big data' by adopting a standard statistical machine learning method that, until recently,14 had received little attention from tobacco control researchers. Research using Twitter data is relatively new, and standardised methodologies for classifying and analysing those data are needed. The study described here contributes to the testing and standardisation of novel methods for conducting such analyses.
What the paper adds
While the marketing and use of e-cigarettes have increased, little is known about how e-cigarettes are marketed and promoted on social media platforms.
This study reveals that Twitter appears to be an important marketing platform for e-cigarettes. E-cigarette marketing on Twitter may carry major public health risks. Continued surveillance of e-cigarette marketing on social media platforms is needed.
The authors thank Lisa Vera, Steve Binns and Huayi Li for their excellent research assistance.
Contributors SLE, GS, JH and RK together designed the study; GS and RK collected data; JH and RK conducted data analysis; SLE, GS, JH and RK contributed to data interpretation; RK and JH wrote the first draft; SLE, JH and RK revised the draft; the final version of the paper has been reviewed and approved by all four coauthors.
Funding This project is funded by a National Cancer Institute-funded grant (Grant No. 1U01CA154254), titled “Tobacco Control in a Rapidly Changing Media Environment” (Principal investigator: SLE). The National Cancer Institute did not play any role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the article for publication. The opinions expressed here are those of the authors, and do not necessarily reflect those of the sponsors.
Competing interests None.
Patient consent Not applicable.
Ethics approval This study was cleared for ethics by the Institutional Review Board at the University of Illinois at Chicago (USA).
Provenance and peer review Not commissioned; externally peer reviewed.