- Where does the data come from?
- What do PubMed, DOI and API mean?
- Where does the citation data come from?
- Geolocation data
- Why don't some statistics use data from all publications?
- What are the circles with numbers in them?
- What's the difference between "All keywords" and "Major keywords (MeSH)"?
- Why doesn't my paper show up when I click on a keyword which describes it?
- Why are some publications in a different year to what I would expect?
- What is a word cloud?
- I want to know more!
- How was this project funded?
Where does the data come from?
We mostly use metadata here. You can think of the journal articles themselves as the data, and the information about them (e.g. author lists, keywords and citations) as the metadata. We get this metadata from the PubMed and DOI APIs.
What do PubMed, DOI and API mean?
PubMed is an online search engine which contains journal articles on life sciences and biomedical topics. DOI is short for Digital Object Identifier and is the de facto way of referring to journal articles. An example DOI is https://doi.org/10.12688/f1000research.25484.2, and http://doi.org is in charge of maintaining the official list of them. An API is an Application Programming Interface, which is a fancy way of saying that computer programs can send data to, and receive data from, a service. So here we send requests to PubMed and doi.org and receive metadata back.
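As a rough illustration of what talking to these APIs can look like, the sketch below builds (but does not send) one request for each service. It assumes DOI content negotiation with a CSL JSON `Accept` header and the NCBI E-utilities `efetch` endpoint; the function names are made up for this example, and this is not necessarily how this project makes its requests.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def doi_metadata_request(doi: str) -> Request:
    """Build a request asking doi.org for metadata about a DOI.

    Assumption: the resolver honours content negotiation, so the Accept
    header asks for machine-readable CSL JSON rather than a web page.
    """
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": "application/vnd.citationstyles.csl+json"})


def pubmed_metadata_url(pmid: str) -> str:
    """Build an E-utilities efetch URL for one PubMed ID (returns XML)."""
    params = urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + params


# Nothing is sent over the network here; we only construct the requests.
request = doi_metadata_request("10.12688/f1000research.25484.2")
print(request.full_url)
print(pubmed_metadata_url("12345678"))  # placeholder PubMed ID
```

Passing `request` to `urllib.request.urlopen` would actually contact the service and return the metadata.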
Where does the citation data come from?
Citation data is retrieved from the Scopus API provided by Elsevier. This is cached locally and updated regularly. You can see the last time the citation data was updated in the last updated note at the bottom of the page.
Geolocation data
We use the first author's postal address and email address to assign a journal article to a location. Sometimes this information does not exist in the metadata, which is why not all journal articles are plotted on the maps. Occasionally a journal article indicates that the authors wish a different author to be considered the lead author. We cannot process this information, so the first author is always used.
The coordinate location of institutions is retrieved from the free open data project Wikidata.
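The first-author rule can be sketched as follows. This is a simplified illustration, not the project's actual code: the article layout, the `institution` field and the `INSTITUTION_COORDS` cache are all hypothetical, and the coordinates are placeholders rather than real Wikidata values.

```python
from typing import Optional, Tuple

# Hypothetical local cache of institution coordinates (latitude, longitude),
# standing in for values looked up from Wikidata in advance.
INSTITUTION_COORDS = {
    "Example University": (51.5, -2.6),  # placeholder, not a real lookup
}


def locate_article(article: dict) -> Optional[Tuple[float, float]]:
    """Return coordinates for an article, or None if it cannot be placed.

    Only the first author is considered, even when a later author has a
    known institution.
    """
    authors = article.get("authors") or []
    if not authors:
        return None
    institution = authors[0].get("institution")
    return INSTITUTION_COORDS.get(institution)


placed = {"authors": [{"institution": "Example University"}]}
unplaced = {"authors": [{"institution": None}, {"institution": "Example University"}]}
print(locate_article(placed))
print(locate_article(unplaced))
```

The second article stays off the map because its first author has no usable institution, even though the second author does.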
Why don't some statistics use data from all publications?
Metadata can be missing because some journal articles are very old and the metadata about them just doesn't exist, or because a recent journal article may not have its metadata available yet. There are many different ways to track journal articles, and prior to the introduction of DOIs in 2000 there was no standard method. This means some metadata on old journal articles could have been lost or never recorded.
The metadata used for the statistics is gathered from databases which only collect data from particular journals (no one has 100% of journal articles in their database). This means some statistics shown here are under-reported (e.g. citation counts and their derivatives) or may be missing entirely (e.g. journal articles with no location assigned to them).
What are the circles with numbers in them?
Journal article citations are one way to track how an article is being used. Another way is to consider a wider set of metrics, such as whether it is mentioned in news articles or on social media. This is what https://www.altmetric.com does, and by hovering your mouse over one of the circles (or clicking on it) you can get an overview of where this article is being talked about.
What's the difference between "All keywords" and "Major keywords (MeSH)"?
The All keywords page shows author-defined keywords and MeSH terms. MeSH (Medical Subject Headings) is a commonly used hierarchical controlled vocabulary. This means that there is a list of well-defined terms which are allowed to be used (controlled vocabulary), and that they are related to one another (hierarchical), e.g. "finger" belongs to "hand". So the Major keywords are the broader areas, and All keywords will be more fine-grained. MeSH terms are applied retrospectively to a subset of all journal articles.
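To make "hierarchical controlled vocabulary" concrete, here is a toy sketch. The three terms and their parent links are invented for illustration and are not the real MeSH tree; `PARENT`, `is_allowed` and `ancestors` are hypothetical names.

```python
# Toy vocabulary: each allowed term maps to its parent (None at the top).
# This is illustrative only and does not reproduce the real MeSH hierarchy.
PARENT = {
    "body regions": None,
    "hand": "body regions",
    "finger": "hand",
}


def is_allowed(term: str) -> bool:
    """Controlled: only terms in the list may be used."""
    return term.lower() in PARENT


def ancestors(term: str) -> list:
    """Hierarchical: walk from a term up through its broader terms."""
    chain = []
    parent = PARENT[term.lower()]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


print(is_allowed("finger"), is_allowed("digit"))
print(ancestors("finger"))
```

A free-text keyword such as "digit" is rejected because it is not in the list, while "finger" can be traced up to its broader terms.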
Why doesn't my paper show up when I click on a keyword which describes it?
Authors can often add whatever they want as keywords when they publish a journal article. This may result in different articles using slightly different words for the same thing, so your paper may be listed under a different keyword from the one you clicked on. Sometimes the author-defined keywords are not present in the metadata at all.
Why are some publications in a different year to what I would expect?
How do you define when a journal article is published? Is it when it is accepted, or when a pre-print is available, or when it is made available on a journal's website, or when it appears in a journal's printed volume, or something else? Different journals prefer different definitions. This will likely mean that some journal articles appear later here than you may expect them to.
What is a word cloud?
A word cloud is a way of showing how often each word appears in a list of words. The more frequently a word appears, the bigger it is drawn.
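The idea can be sketched in a few lines. This is a minimal illustration that assumes a simple linear scale between a smallest and largest font size; the function name and the size limits are made up here and are not taken from this project's code.

```python
from collections import Counter


def word_sizes(words, min_size=10, max_size=40):
    """Map each word to a font size that grows with how often it appears."""
    counts = Counter(word.lower() for word in words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, count in counts.items():
        if hi == lo:
            # Every word is equally frequent, so draw them all the same size.
            sizes[word] = float(max_size)
        else:
            # Linear scale: rarest word gets min_size, commonest gets max_size.
            sizes[word] = min_size + (max_size - min_size) * (count - lo) / (hi - lo)
    return sizes


print(word_sizes(["cohort", "Cohort", "cohort", "asthma"]))
```

Here "cohort" appears three times and "asthma" once, so "cohort" is drawn at the largest size and "asthma" at the smallest.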
I want to know more!
- The source code is at https://github.com/OllyButters/puma. You can download and run this yourself!
- Some documentation is at https://github.com/OllyButters/puma/wiki
- There is even a paper written about it: https://f1000research.com/articles/9-1095/v2
- You can talk to us at: @DrOllyButters, @DrBeccaWilson and @_hugh_garner_
How was this project funded?
This project has been funded by:
- CLOSER, whose mission is to maximise the use, value and impact of longitudinal studies. CLOSER is funded by the Economic and Social Research Council (ESRC) and Medical Research Council (MRC) (grant reference: ES/K000357/1).
- Becca Wilson is a UKRI Innovation Fellow with HDR UK [MR/S003959/1].
- The Nuffield Foundation research placement program.
- The Wellcome Trust and Medical Research Council (grant number 108439/Z/15/Z).
- The European Union’s Horizon 2020 research and innovation programme under grant agreement No 824989 and the Canadian Institutes of Health Research (CIHR).
- The National Institute for Health Research Applied Research Collaboration.