The most cited authors and papers in tobacco control
Correspondence to: Professor Simon Chapman, School of Public Health, Edward Ford Building A27, University of Sydney, NSW 2006, Australia
In this paper, we present the first attempt at determining which authors of research and commentary of direct relevance to tobacco control have the most cited publications in this field. We examine this from 1980 to 2004 and also for the past decade (1994–2004) in an effort to distinguish the 100 overall most cited authors for these periods. We have also provided a list of the 50 highest citation classics in tobacco control.
Citations are the most common way of measuring the impact of an article, and cumulatively of a researcher, in the scientific community.1–3 However, there can be important differences between the impact of a paper and its quality. The quality of a paper is essentially characterised by the notion of possible value, and this cannot be easily measured in an objective and quantitative way.4 Smith has suggested a range of other ways in which scientific output might be evaluated, but all of these outcomes are less easily measured than citations.5 While the use of databases for citation analysis has been widely criticised,6 they do offer the easiest approach for assembling “ball park” data, even though the databases do not distinguish between positive and negative citations. Authors generally cite exemplary or important articles. While controversial, badly flawed articles can also be highly cited in critical articles referring to them, it seldom happens that an author will repeatedly produce such highly cited flawed articles.
The field of tobacco control is very broad and includes basic science on toxicology, pharmacology, and genetics; clinical research reporting on the impact of tobacco use on individuals’ health; epidemiological research on large populations; studies of tobacco use; and matters relevant to policies and programmes designed to reduce tobacco use and the harm it causes. There is a huge body of epidemiological research, which examines multiple risk factors for disease, where tobacco use is only one variable examined. Such research can be vital to the evidence base that underlies the case for tobacco control and the content of public awareness campaigns. As will be seen, many authors of such studies have been ranked highly and dominate the lists. While some may question their inclusion arguing that multiple risk factor epidemiology is not “true” tobacco control research, we reject that argument on both substantive and practical grounds. Culling such articles manually would have taken hundreds of hours, generated understandable controversy, and resulted in many highly influential epidemiological studies being capriciously discarded for the dubious purpose of elevating some authors doing more work precisely focused around tobacco “control”.
Authors were identified using the Thomson Institute of Scientific Information (ISI) Web of Science (WoS) database. WoS includes original contributions, reviews, letters, editorials, text chapters, and published conference abstracts. None of these were excluded, on the rationale that any type of paper that was highly cited was of interest to the field. In August 2004, advanced WoS search queries combining topic and title fields were conducted to facilitate the construction of a shortlist of authors. The search string TS = (tobacco* OR smok* OR cigarette* OR nicotine* OR nonsmok*) OR TI = (tobacco* OR smok* OR cigarette* OR nicotine* OR nonsmok*)† was run for each year from 1980 to 2004. The search covered all articles indexed in the Science Citation Index Expanded, the Social Science Citation Index, and the Arts and Humanities Citation Index, and 106 789 records were downloaded.
As WoS has a download limit of 500 citations, the data were downloaded in batches of 500 for each year. The accumulated batches were then imported into two databases: HistCite Trial Version (HistCite (Vlad) Version: 2004.05.14) and Postgres SQL. HistCite is a system designed to assist in identifying the most cited papers retrieved in topical searches of WoS.7 HistCite can create tables by author, year, or citation frequency.7
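The batching step can be illustrated with a minimal sketch. It is not the procedure actually used in the study; the function name `batch_ranges` and the per-year record count of 1234 are hypothetical, and only the 500-record limit and the total of 106 789 records come from the text.

```python
from math import ceil

BATCH_LIMIT = 500  # WoS download limit stated in the text


def batch_ranges(n_records, limit=BATCH_LIMIT):
    """Return 1-based inclusive (start, end) record ranges of at most `limit` records."""
    return [
        (start, min(start + limit - 1, n_records))
        for start in range(1, n_records + 1, limit)
    ]


# Hypothetical year with 1234 matching records (illustrative figure only).
print(batch_ranges(1234))  # [(1, 500), (501, 1000), (1001, 1234)]

# Lower bound on the number of downloads for all 106 789 records;
# batching separately for each year can only increase this count.
print(ceil(106_789 / BATCH_LIMIT))  # 214
```

Because the downloads were split by year, the real number of batches would have been at least the lower bound printed above.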
The data were imported into HistCite in annual batches and a bibliography representing each year was saved. The bibliographies were then re-imported into HistCite and merged to a level at which the database would cope: 1980–1989, 1990–1996, 1997–1999, 2000–2001 and 2002–2004.
To generate the shortlist of authors for 1980–2004, the top 300 names by citation frequency from each batch were extracted, imported into Excel, merged, and re-sorted by citation frequency. Names and associated papers were reviewed by revisiting the complete dataset in the Postgres SQL database. Importantly, to appear on the shortlist of authors, a person needed to have at least 10 publications as first author.
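The shortlisting logic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' Excel and Postgres workflow: the function `build_shortlist`, the toy data, and the choice to sum an author's citations across batches when merging are all assumptions made for the example.

```python
from collections import Counter


def build_shortlist(batches, first_author_counts, top_n=300, min_first_author=10):
    """Merge the top `top_n` names from each batch and apply the first-author threshold.

    batches: list of dicts mapping author name -> citation count within one batch.
    first_author_counts: dict mapping author name -> number of first-author
        publications, taken from the complete dataset.
    """
    merged = Counter()
    for batch in batches:
        # Keep only the top names by citation frequency within this batch.
        for name, cites in Counter(batch).most_common(top_n):
            merged[name] += cites  # assumption: citations are summed on merge
    # Re-sort the merged names, then keep authors meeting the threshold.
    return [
        (name, cites)
        for name, cites in merged.most_common()
        if first_author_counts.get(name, 0) >= min_first_author
    ]


# Toy data with hypothetical authors A, B and C; top_n=2 keeps the example small.
batches = [{"A": 100, "B": 50, "C": 5}, {"A": 30, "C": 80}]
first_author_counts = {"A": 12, "B": 3, "C": 10}
print(build_shortlist(batches, first_author_counts, top_n=2))
# [('A', 130), ('C', 80)]
```

In the toy data, author B is dropped despite 50 citations because only 3 first-author publications are recorded, which mirrors the effect of the threshold of at least 10.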
The same procedure was carried out to compile a list of authors for the period 1994–2004.
Updated data for the authors on the shortlists were collected in February 2005. As the WoS database is updated on a weekly basis, the top 100 names for each time period were searched individually and combined with the following topic/title search using WoS data updated as at 12 February 2005:
TS = (TS = (tobacco* OR smok* OR cigarette* OR nicotine* OR nonsmok*) OR TI = (tobacco* OR smok* OR cigarette* OR nicotine* OR nonsmok*)) AND AU = ((AU = Surname Initial*))
The settings were: DocType = All document types; Language = All languages; Databases = SCI-EXPANDED, SSCI, A&HCI; Timespan = the specified time period, which was either 1994–2004 or 1980–2004.
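As an illustration of how such a per-author query could be assembled programmatically, the sketch below builds the topic/title clause and appends an author clause. The helper names and the author "Doe J" are hypothetical, and the output is a simplified form of the query quoted above (the nested field tags are not reproduced); it has not been checked against the WoS interface.

```python
TERMS = ["tobacco*", "smok*", "cigarette*", "nicotine*", "nonsmok*"]


def topic_title_query(terms=TERMS):
    """Build the combined topic (TS) and title (TI) clause from the search terms."""
    joined = " OR ".join(terms)
    return f"TS = ({joined}) OR TI = ({joined})"


def author_query(surname, initial, terms=TERMS):
    """Restrict the topic/title clause to one author, matched as 'Surname Initial*'."""
    return f"({topic_title_query(terms)}) AND AU = ({surname} {initial}*)"


# Hypothetical author used purely for illustration.
print(author_query("Doe", "J"))
```

Generating the string from one list of terms keeps the topic and title clauses identical for every author searched.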
The downloaded data were then imported into HistCite as well as an Endnote 7.0 Library for each dataset (1980–2004 and 1994–2004). HistCite was used to compute the preliminary data and the results were exported into two Excel workbooks (1980–2004 and 1994–2004). Each name was cross checked using the data in the Endnote Library to detect any anomalies and also to confirm that each author had published ⩾ 10 papers as first author.
The top 50 articles were selected from the initial August 2004 data download. Papers that had more than 150 citations (957 in total) were exported from the Postgres SQL database and imported into an Excel spreadsheet for sorting and analysis. Papers were selected if they were on tobacco use prevention and control as well as epidemiological papers regarded as seminal in establishing the harms of tobacco use.
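The citation threshold used to draw up the candidate pool can be expressed as a short sketch. The record structure, the function name `candidate_papers`, and the toy titles are hypothetical; the final choice of 50 papers rested on a judgement of relevance that the code does not attempt to reproduce.

```python
def candidate_papers(papers, min_citations=150):
    """Keep papers with more than `min_citations` citations, most cited first."""
    kept = [p for p in papers if p["citations"] > min_citations]
    return sorted(kept, key=lambda p: p["citations"], reverse=True)


# Toy records (hypothetical titles and counts).
papers = [
    {"title": "Paper X", "citations": 151},
    {"title": "Paper Y", "citations": 150},  # excluded: not more than 150
    {"title": "Paper Z", "citations": 900},
]
for paper in candidate_papers(papers):
    print(paper["title"], paper["citations"])
# Paper Z 900
# Paper X 151
```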
Our final analysis included 9745 papers for the period 1980–2004 and 7644 papers for the period 1994–2004. Table 1 shows the 100 most cited individuals 1980–2004 by total citations. The honour of the most cited author goes to Meir Stampfer from the Harvard School of Public Health. The person who has the highest mean number of citations is Paul Ridker, Eugene Braunwald Professor of Medicine at Harvard Medical School. Over 70% of authors are from North America, with the remainder from Europe. There is a higher percentage of male authors (84%) than female authors (16%). Forty seven per cent of the authors conduct epidemiological research, followed by 16% in the field of psychology, and 12% each in medicine and neuropsychopharmacology.
Table 2 lists the 100 most cited individuals 1994–2004 by total citations and the honour of the most cited author goes to an Australian, Graham Colditz, Professor in the Department of Epidemiology, Channing Laboratory, Harvard School of Public Health. The person with the highest mean number of citations is again Paul Ridker. Sixty four per cent of authors are from North America, 34% from Europe, and 2% from Asia. Female author representation is 22% and male authorship is 78%. Fifty per cent of the authors conduct epidemiological research, and other predominant fields include branches of medicine (17%) and psychology (13%).
There were 53 authors who appeared on both lists.
The papers that were selected for inclusion in the top 50 (table 3) were published from 1980 to 1999, with 52% published from 1990. Eighty per cent of the papers selected for inclusion were published in journals with ISI impact factors ranging from 2.352 (European Journal of Pharmacology) to 34.833 (New England Journal of Medicine); 12% of the papers were published in JAMA, followed by 10% in both the Lancet (impact factor 18.316) and the New England Journal of Medicine (impact factor 34.833).
Both lists of highly cited authors are heavily dominated by “big” epidemiology papers, many of them multiple risk factor studies where tobacco was one (important) variable being examined. These studies have often been the bedrock on which tobacco control policy rests and so are deservedly important. More practically, they are frequently cited in the introductory sections of papers, where authors run through a summary of why smoking is a health problem. Similar observations apply to another heavily cited category: nicotine pharmacology.
There are several reasons to limit interpretations of the results presented. Firstly, we recognise that use of WoS for citation analysis has been widely criticised,6 and as WoS does not cover all papers published, the results should only be considered as “ball park” figures. Secondly, as a research study it is not precisely reproducible because: (1) WoS is a dynamic database, updated on a weekly basis; (2) authors selected for inclusion may differ between reviewers; and (3) it is impossible to ensure that all papers for an author have been detected, owing to potential errors in the data and the difficulty of finding all works by authors who publish under two or more names. Thirdly, the selection criterion for including authors (⩾ 10 publications as first author) may have excluded authors who had highly cited papers. We imposed this threshold to reduce the number of individuals who assisted in the preparation of frequently cited papers but did not make the most significant contribution. Fourthly, as the preliminary data involved over 100 000 records and were merged in batches, there is a possibility that an author may have been excluded from the final list of names because of computation errors. Further limitations include the assumption that many authors try to support the interpretation of their own results when attempting to convince or persuade readers.8 Biased citing practices, such as self citation and bias towards citing English language publications, are recognised as potential problems.4 A study of citation counts in the area of addiction found that the number of citations received by papers appeared to reflect the geographic region of the study (the huge dominance of American authors citing American journals) rather than necessarily their importance as agreed by peers.9 These criticisms and observations should be taken into consideration when interpreting the results.
† A list of exclusion keywords was run in conjunction with the topic search to primarily exclude results concerned with tobacco agronomy.
Competing interests: none declared