
Open Science Indicators FAQs

Thank you for using the dashboard. We hope the information below will help you understand and reuse the data. If your question is not answered here please get in touch with us using this form.

What are Open Science Indicators?

Open Science Indicators are a tool developed by PLOS in collaboration with DataSeer to measure open science behaviors in a standardized way.

What indicators are measured in the OSI dashboard prototype?

The dashboard measures five open science practices associated with published open access research articles:

  • Research data sharing
  • Code sharing
  • Preprint posting
  • Protocol sharing
  • Study registration

Definitions and measures of these indicators are summarized in the table below and follow the PLOS OSI dataset definitions, principles and measures.

Definitions table

Research data sharing
Definition: Any research data supporting the results of a published article are findable and accessible.
Indicator measures:
  • Data shared in a repository: Research data generated by the article have been shared in a commonly used data repository, identified by the repository name and/or a persistent identifier for one or more datasets
  • Data shared elsewhere online: Research data generated by the article have been shared in an online location where the repository name was not detected and/or a persistent identifier for a shared dataset was not identified
  • Data shared as supporting information with an article: Data generated by the research article have been shared in the supporting information file(s) with the article
  • Data generated, no sharing found: Research data have been generated by the research article but none of these data have been found
  • Data not generated: The article did not generate any research data

Code sharing
Definition: Any code (programming statements created with a text editor or a visual programming tool) used to produce the results of a published article is findable and accessible.
Indicator measures:
  • Code shared online: Code generated by the article has been shared in a repository or other online location
  • Code shared in supporting information: Code generated by the article has been shared in the supporting information file(s) with the article
  • Code generated, no sharing found: Code has been generated by the research article but none of the code has been found
  • No code generated: The article did not generate any code

Preprint posting
Definition: An early version of a scholarly research article (a preprint) is findable and accessible online and was posted before the associated article’s publication date.
Indicator measures:
  • Matching preprint identified: A preprint matching the research article has been found in a preprint server
  • No preprint matched: No preprint matching the research article has been found

Protocol sharing
Definition: Detailed or “step-by-step” instructions for carrying out a research procedure for the research article are findable and accessible online.
Indicator measures:
  • Protocol shared in a repository: One or more protocols used in the research article have been found in an online repository
  • Protocol shared in a separate article: One or more protocols used in the research article have been published as their own journal article or book chapter
  • Protocol shared in supporting information: A protocol used in the article has been shared in the supporting information file(s) with the article
  • No protocols identified: The article did not share any protocols

Study registration
Definition: Elements of the research plan, such as research questions or hypotheses, study design, or data analysis approach, are findable and accessible online.
Indicator measures:
  • Study registration shared: A study registration identifier and/or a link to a study registration were found in the article. PLOS OSI measures study registration in 31 databases covering all types of research studies, including clinical trials, systematic reviews, animal studies, and other study designs
  • No study registrations identified: The article did not share any study registration identifiers

More information on how PLOS OSIs are defined is available in the PLOS OSI definitions and measures paper, and in the detailed methods documentation accompanying the public data release on Figshare.


What content is analyzed in the PLOS OSI dashboard?

The PLOS OSI dashboard prototype currently presents an analysis of the PLOS OSI dataset covering 2018 to 2024. This includes all research articles published in PLOS journals over that period (approximately 134,000 articles), plus a comparator set of non-PLOS content from PubMed Central (approximately 27,000 articles). The non-PLOS comparator set is topic-matched to the PLOS content using the MeSH (Medical Subject Headings) terms assigned to articles in PubMed by the US National Library of Medicine, ensuring it is a representative comparison group. Using the dashboard, you can view results for the entire corpus of PLOS and non-PLOS content, or compare PLOS to other publishers using the Publisher filter. We would love to include a larger comparator corpus, but for open science monitoring to be sustainable at a larger scale, we need other publishers to share similar indicators for the content they publish.
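The topic matching described above can be sketched as a similarity search over each article's set of MeSH terms. The Jaccard measure and the sample records below are illustrative assumptions for demonstration, not the actual PLOS/DataSeer matching procedure.

```python
# Illustrative sketch: topic-match a comparator article to a PLOS
# article by overlap of their MeSH term sets (Jaccard similarity).
# The matching criterion and the sample records are assumptions.

def jaccard(a: set, b: set) -> float:
    """Proportion of shared MeSH terms between two articles."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match(plos_terms: set, candidates: dict) -> str:
    """Return the candidate article ID whose MeSH terms overlap most."""
    return max(candidates, key=lambda cid: jaccard(plos_terms, candidates[cid]))

plos_article = {"Humans", "Neoplasms", "Genomics"}
comparators = {
    "pmc1": {"Humans", "Neoplasms", "Mutation"},
    "pmc2": {"Plants", "Photosynthesis"},
}
print(best_match(plos_article, comparators))  # pmc1 shares the most terms
```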

How are OSIs produced?

After creating the PLOS OSI definitions and measures, PLOS first partnered with DataSeer in 2022 to create the PLOS OSI dataset. The five indicators are extracted using a combination of natural language processing (NLP) and large language model (LLM) artificial intelligence tools applied to the full text of published research articles and to other open data sources.

This includes, but is not limited to, analyzing Data Availability Statements and Supporting Information file titles and captions to identify whether data and code have been generated and shared, by looking for keywords and their proximity to other keywords. Information such as the persistent identifier (PID, e.g. a Digital Object Identifier, DOI) or accession numbers for shared data, and the repository name, is extracted and included in the OSI dataset.
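The keyword-proximity idea can be sketched as below. The keyword lists, the token window size, and the sample statement are assumptions for illustration, not the actual DataSeer rules.

```python
import re

# Illustrative sketch of keyword-proximity detection in a Data
# Availability Statement: flag possible data sharing when a "data"
# keyword appears within a few tokens of a repository/deposit keyword.
# Keyword lists and window size are assumptions, not DataSeer's rules.

DATA_KW = {"data", "dataset", "datasets"}
SHARE_KW = {"deposited", "available", "repository", "figshare", "dryad", "zenodo"}

def mentions_data_sharing(text: str, window: int = 5) -> bool:
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    data_pos = [i for i, t in enumerate(tokens) if t in DATA_KW]
    share_pos = [i for i, t in enumerate(tokens) if t in SHARE_KW]
    return any(abs(i - j) <= window for i in data_pos for j in share_pos)

# Hypothetical statement; the DOI is invented for the example.
das = "All data are available from the Dryad repository (doi:10.5061/dryad.example)."
print(mentions_data_sharing(das))  # True
```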

Preprints are detected using an LLM pipeline that searches the DataCite and Crossref databases for documents labelled as preprints that have similar titles and author lists to the published article.
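A simplified, non-LLM version of that matching step can be sketched as comparing titles and author lists directly; the similarity thresholds and sample records below are assumptions, and the production pipeline instead uses an LLM over DataCite and Crossref records.

```python
from difflib import SequenceMatcher

# Illustrative sketch of preprint matching: compare a candidate
# preprint's title and author list to the published article.
# Thresholds and records are invented for demonstration.

def title_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_matching_preprint(article: dict, candidate: dict,
                         title_thresh: float = 0.9,
                         author_overlap: float = 0.5) -> bool:
    shared = set(article["authors"]) & set(candidate["authors"])
    overlap = len(shared) / max(len(article["authors"]), 1)
    return (title_similarity(article["title"], candidate["title"]) >= title_thresh
            and overlap >= author_overlap)

article = {"title": "Genomic diversity of wild barley",
           "authors": ["A. Khan", "B. Ortiz", "C. Wei"]}
preprint = {"title": "Genomic diversity of wild barley",
            "authors": ["A. Khan", "B. Ortiz"]}
print(is_matching_preprint(article, preprint))  # True
```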

Indicators for protocols and study registration are detected by searching known databases/registries of these outputs and matching entries to the article based on factors such as authors and PIDs, supplemented by regular expressions and keyword searches of the full text to identify whether the article mentions a protocol or registration.
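The regular-expression part of that detection can be sketched as below for two common registry formats (ClinicalTrials.gov NCT numbers and PROSPERO CRD records); the full PLOS OSI pipeline covers 31 registries and combines this with author and PID matching.

```python
import re

# Illustrative sketch: detect study-registration identifiers in full
# text with regular expressions. Only two example registry formats are
# shown; the patterns are simplified assumptions.

REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "PROSPERO": re.compile(r"\bCRD\d{11}\b"),
}

def find_registrations(text: str) -> dict:
    """Map registry name -> list of identifiers found in the text."""
    return {name: pat.findall(text)
            for name, pat in REGISTRY_PATTERNS.items()
            if pat.findall(text)}

sample = "The trial was registered at ClinicalTrials.gov (NCT01234567)."
print(find_registrations(sample))  # {'ClinicalTrials.gov': ['NCT01234567']}
```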

In addition to these OSIs, metadata are extracted from article XML to aid comparison and analysis of the data. The dataset is further enhanced using standardized metadata from OpenAlex to capture all affiliation information from articles – not just corresponding authors. This provides additional detail on geography, research organization and funder.
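Extracting affiliation information for every author can be sketched as below. The field names follow the shape of the public OpenAlex work schema (`authorships` with nested `institutions`), but the inline record itself is invented, and this is not the actual enrichment code.

```python
# Illustrative sketch: pull (institution, country) pairs for all
# authors from a work record shaped like OpenAlex's "authorships"
# field. The record below is invented for demonstration.

work = {
    "authorships": [
        {"institutions": [{"display_name": "Example University",
                           "country_code": "DE"}]},
        {"institutions": [{"display_name": "Sample Institute",
                           "country_code": "BR"}]},
    ]
}

def affiliations(work: dict) -> list:
    """Return (institution, country) pairs for every author,
    not just the corresponding author."""
    return [(inst["display_name"], inst["country_code"])
            for auth in work.get("authorships", [])
            for inst in auth.get("institutions", [])]

print(affiliations(work))
# [('Example University', 'DE'), ('Sample Institute', 'BR')]
```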


How is the PLOS OSI dashboard created?

The dashboard prototype is created in Looker Studio, a product from Google that can create visualizations of datasets hosted in Google BigQuery (GBQ) or in Google Sheets. The current dashboard runs from a version of the PLOS OSI dataset hosted in PLOS’ GBQ database.

The PLOS OSI dataset has been enhanced with information on research organization, research fields, author country, and research funders from the OpenAlex database to enable users to compare articles in the dashboard in more detailed ways. The specific dataset underlying the dashboard and the queries used to create the dashboard in Looker Studio are available openly here.

How can the OSI dashboard be used and reused?

We encourage the dashboard and underlying data and methods to be reused as much as possible. The PLOS OSI dashboard enables its users to:

  • Understand what open science practices can be measured in publications at a broad scale
  • Explore how trends in practices are changing over time – at a global level or for an entity or group of interest
  • Compare how open science practices differ between countries, research organisations, funders, and/or research fields
  • Share insights about open science indicators with colleagues and peers

If you have feedback, questions or would like help reusing the dashboard or underlying datasets, please use this form.

How should I cite the dashboard or OSI data?

To cite the dashboard, please cite as:
To cite the data and / or queries underlying the dashboard, please cite as:
To cite the PLOS OSI dataset and/or methods, please cite as:

How should the OSI dashboard not be used?

In line with the PLOS Open Science Indicators Principles and the Open Science Monitoring Initiatives (OSMI) Principles, the dashboard should not be used to rank research organizations, countries, research fields, journals, publishers, or individuals. Comparisons of entities, with the appropriate context, can however be valuable. To promote responsible use, comparisons of funders, institutions, and countries are limited to five comparable entities. All entities can be compared to the average of the entire dataset. PLOS OSIs and the dashboard are also not designed as auditing or compliance-enforcement tools for individual organizations, articles, or researchers.

How accurate are the results?

For all indicators in the OSIs, PLOS and DataSeer have aimed for a minimum accuracy of at least 85%. For each indicator, a range of accuracy metrics (accuracy, sensitivity, specificity, precision, F-score) is given in the full documentation of the public data on Figshare. Below are the accuracy rates for each of the five indicators in the dashboard for both the PLOS articles and the non-PLOS comparator set. The accuracy rate is calculated by randomly selecting 100-200 articles from each corpus and checking them by hand to identify false positives and false negatives.

Indicator            Accuracy (PLOS articles)   Accuracy (non-PLOS comparator articles)
Data sharing         85%                        81%
Code sharing         97%                        94%
Preprint posting     97%                        97%
Protocol sharing     94%                        97%
Study registration   99%                        99%
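The accuracy metrics named above can all be computed from the confusion counts of a hand-checked sample (true/false positives and negatives). The counts in this sketch are invented, not the actual validation data.

```python
# Illustrative sketch: compute the reported accuracy metrics from the
# confusion counts of a hand-checked sample. Counts are invented.

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall / true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f_score": f_score}

# e.g. a 100-article sample with 40 true positives, 5 false positives,
# 45 true negatives and 10 false negatives:
print(round(metrics(40, 5, 45, 10)["accuracy"], 2))  # 0.85
```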

Where can I find the underlying data, methods and code?

In line with the PLOS Open Science Indicators Principles and the Open Science Monitoring Initiatives (OSMI) principles, the data, code, and methods underlying PLOS OSIs are openly available. The PLOS OSI dataset is openly available on Figshare along with detailed methods and accuracy information. A large proportion of the code and software used to produce PLOS OSIs, in particular those relating to data and code sharing, is available under an open source license here.

The specific dataset underlying the dashboard along with the queries used to create the dashboard in Looker Studio are available openly here. To enable PLOS to learn quickly about the potential value and impact of sharing a public OSI dashboard, the prototype has been created using Google Looker Studio. Depending on what we learn from sharing this prototype dashboard, we will explore longer-term solutions which may use different technologies, in particular those that are fully open source.