The true percentage of scientific articles that will never be cited

Every so often I see a message circulating on social media in which someone claims that ‘90% of scientific articles will never be cited’. While I do not agree with the notion that a publication’s value is solely determined by the number of citations it receives, such statements can be damaging to science, because they suggest to the public that most research is useless, with the underlying implication that many researchers therefore waste public money.

Although the statement has been traced back to an overeager editor of a non-scientific journal, it is actually fairly easy to examine the validity of this claim with a website called Scimago Journal and Country Rank (SJR). This excellent website includes citation information categorized by country and subject area based on data taken from Elsevier’s Scopus. Because SJR categorizes the information by country or subject area, a country has to be selected to obtain the numbers necessary for examining the claim. For the first analysis, I have selected the United States, but using other countries yields similar outcomes.

When reading the statement, it is unclear how ‘never’ and ‘articles’ are defined. Never is a very long time period. Although SJR gives citation information for the period between 1996 and 2015, I have taken articles that were published in 2005 as the reference point for this analysis. It seems unlikely that many articles that have not been cited at all in 10 years will suddenly be cited.

Besides defining ‘never’, it is similarly important to define ‘articles’. Some scientific documents, such as editorials, errata and lists of reviewers, are not intended to be cited and should therefore not be included in the analysis. SJR helpfully distinguishes citable and non-citable documents. The website informs us that, for the year 2005, researchers from the United States published 443188 citable and 42704 non-citable documents, so non-citable documents made up 8.79% of the total.

Furthermore, the website informs us that, of the documents that had been published in 2005, 379366 had been cited but 106526 remained uncited after 10 years. However, the number of uncited documents includes the non-citable documents. Assuming that none of the non-citable documents had been cited, only 63822 of the citable documents that had been published in 2005 remained uncited after 10 years. By dividing the number of uncited citable documents (63822) by the number of citable documents (443188), one finds that only 14.4% had not been cited.
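For readers who want to check the arithmetic, here is a minimal sketch in Python. The four counts are the SJR figures quoted above; the variable names are my own, and the subtraction rests on the same assumption as the text, namely that every non-citable document is among the uncited ones.

```python
# SJR figures quoted above for US documents published in 2005.
citable = 443_188
non_citable = 42_704
cited = 379_366
uncited = 106_526

# Sanity check: cited plus uncited should equal citable plus non-citable.
assert cited + uncited == citable + non_citable

# Share of all documents that are non-citable.
non_citable_share = non_citable / (citable + non_citable)

# Assumption (as in the text): all non-citable documents remained uncited.
uncited_citable = uncited - non_citable
uncited_share = uncited_citable / citable

print(f"{non_citable_share:.2%}")  # 8.79%
print(uncited_citable)             # 63822
print(f"{uncited_share:.1%}")      # 14.4%
```

The same four numbers for any other country can be dropped into the sketch to repeat the check.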

The analysis shows that the statement that ‘90% of articles will never be cited’ is simply not true. According to information taken from Scimago Journal and Country Rank, only 14.4% of citable documents (written in 2005 by researchers from the United States and published in journals indexed by Scopus) had not been cited after 10 years.

This analysis can be conducted for every other country. Most countries have rates that are similarly low to that of the United States (UK: 9.0%; Germany: 19.8%; France: 18.0%; Canada: 11.6%; Italy: 14.1%; India: 17.3%; Spain: 14.0%). Some countries have higher proportions (China: 31.9%; Japan: 23.2%), but none of them comes anywhere near the rate mentioned in the claim. Together, these 10 countries represent about 69% of all the documents published in 2005, and only 17.7% of the citable articles from these countries had not been cited after 10 years.

It is important to note that these analyses only include articles published in journals that are indexed by Scopus. Publications in journals that are not indexed by Scopus are probably less likely to be cited.

While it is fair to assess the impact of research, unfounded statements implying that many academics do not conduct valuable research can damage science. Politicians may be swayed by public opinion to decrease research funding and they may feel comfortable ignoring the opinions of experts when making policy decisions. It is therefore important that myths, such as the claim that 90% of articles are never cited, are dispelled.


More references, more citations?

A good friend of mine recently complained online that no one would read her new paper. Friends immediately responded that they were highly interested in her work. As online posts are wont to do, comments quickly escalated, with one commenter suggesting that they cite each other’s work. While this last comment was made in jest and the friend did not seriously suggest setting up a citation ring, it made me wonder whether the number of citations is indeed related to the number of references. Do some researchers cite other researchers simply because those researchers have cited them?

Like so many things, this question has already been examined. Webster, Jonason and Schember (2009) took 562 articles published in Ethology and Sociobiology (1979-1996) and its successor Evolution and Human Behavior (1997-2002) and compared the number of references of each article to the number of citations. Because the distributions were skewed (i.e., the medians were different from the means), they applied log transformations. Webster et al. (2009) found a surprisingly strong correlation of .44, which suggests that articles with more references indeed receive more citations.

The findings of Webster et al. (2009) bothered me more than they should have. Could it really be that some groups of researchers cite each other frequently? There must be another explanation. One issue that was unclear to me is the extent to which the analysis of Webster et al. (2009) included editorials, errata, commentaries, replies, letters to the editor, and book reviews. These kinds of publications tend to have few references and are seldom cited. As such, they represent outliers and their inclusion could have increased the correlation considerably.

To exclude the possible influence of those editorials and such, I quickly downloaded articles from Thomson’s Web of Science that were published in Memory, Applied Cognitive Psychology and Memory & Cognition in 2004, 2005 and 2006. These years were selected because any citation that is made for purely reciprocal reasons is likely to have been made within 10 years or so. Editorials and such were omitted from the subsequent analyses.

For each of the nine data sets (three journals by three years), I calculated one correlation between the number of references and the number of citations. Like Webster et al. (2009), I applied log transformations to account for the skewed nature of the data. The nine correlations are: r(63) = .275, p = .029; r(75) = .109, p = .351; r(76) = .305, p = .007; r(75) = .191, p = .101; r(73) = .479, p < .001; r(84) = .256, p = .019; r(119) = .142, p = .125; r(125) = .265, p = .003; and r(149) = .239, p = .003, respectively. Averaged across the nine data sets (M = .246), the correlation appears to be weaker than the correlation of Webster et al. (2009). Furthermore, for each journal, one of the three correlations was not significant, suggesting that the effect is not very robust.
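To make the procedure concrete, here is a minimal sketch in Python of a correlation between log-transformed reference and citation counts. It is not the script behind the numbers above: the function name and the example counts are made up for illustration, and using log(1 + x) so that articles with zero citations stay defined is my own assumption, since the text does not say how zeros were handled.

```python
import math

def log_pearson(references, citations):
    """Pearson r between log-transformed reference and citation counts."""
    # Assumption: log1p, i.e. log(1 + x), keeps zero counts defined.
    x = [math.log1p(v) for v in references]
    y = [math.log1p(v) for v in citations]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    r = cov / math.sqrt(var_x * var_y)
    return r, n - 2  # r and its degrees of freedom, as reported in r(df)

# Made-up counts for illustration only; not the data analysed in this post.
refs = [12, 25, 31, 40, 58, 76]
cites = [3, 0, 14, 9, 30, 22]
r, df = log_pearson(refs, cites)
print(f"r({df}) = {r:.3f}")
```

Running the function once per journal and year would give nine values of the kind listed above.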

Although most of these correlations are lower than the correlation that was found by Webster et al. (2009), the average of the nine correlations seems to suggest that it might indeed be worthwhile to add a few references. However, supplemental regression analyses indicate that a study will receive 1 extra citation after a 10-year period for roughly every 4 additional references (B = 0.282). In other words, the effect seems to be there, but it does not seem to be large or robust. Furthermore, there are at least three other explanations that might account for the relation between references and citations beyond purely reciprocal reasons.

First, whereas I omitted editorials and such from the analyses, I did not account for brief reports and reviews. Brief reports (or rapid communications), regardless of whether the journal has such a category, tend to be smaller in scope and report preliminary results. If the results are promising, then a larger study is sure to follow. Brief reports therefore tend to have fewer references and to receive fewer citations too. Reviews, on the other hand, are supposed to provide an overview of the literature and therefore include many references. They are also known to receive many citations.

Second, it is possible that the relation between references and citations reflects differences in the interest in the topics. There are few previous studies to which an article about a niche topic can refer. Similarly, there will be few subsequent studies that can cite the article. However, when the topic is popular, there are many previous studies to which an article can refer and there will be many studies which can cite the article.

Third, it is also possible that the relation between references and citations reflects the quality of the articles. A high-quality study has a complete literature review, offers support for its assumptions, makes informed decisions about the design of the study, and puts its results into context. A study which addresses these issues is likely to have more references than a study which ignores them. It is also likely to receive more citations.

Oddly enough, I find the size of the relation between references and citations reassuring. The effect is sufficiently small to exclude the existence of extensive citation rings or a widespread culture of reciprocal citations, at least in cognitive psychology. Moreover, the relation can be explained by benign factors, such as the type, the topic and the quality of articles. As a struggling academic, I find it strangely comforting that there does not appear to be a shortcut to success.

Webster, G. D., Jonason, P. K., & Schember, T. O. (2009). Hot topics and popular papers in evolutionary psychology: Analyses of title words and citation counts. Evolutionary Psychology, 7, 348-362.
