Open Science Indicators FAQs
Thank you for using the dashboard. We hope the information below will help you understand and reuse the data. If your question is not answered here, please get in touch with us using this form.
What are Open Science Indicators?
Open Science Indicators are a tool developed by PLOS in collaboration with DataSeer to measure open science behaviors in a standardized way.
What indicators are measured in the OSI dashboard prototype?
The dashboard measures five open science practices associated with published open access research articles:
- Research data sharing
- Code sharing
- Preprint posting
- Protocol sharing
- Study registration
Definitions and measures of these indicators are summarized in the table below and follow the PLOS OSI dataset definitions, principles and measures.
Definitions table

| OSI | Definition | Indicator measures |
| --- | --- | --- |
| Research data sharing | Any research data supporting the results of a published article are findable and accessible | |
| Code sharing | Any code (programming statements created with a text editor or a visual programming tool) used to produce the results of a published article is findable and accessible | |
| Preprint posting | An early version of a scholarly research article (a preprint) is findable and accessible online and posted before the associated article’s publication date | |
| Protocol sharing | Detailed or “step-by-step” instructions for carrying out a research procedure for the research article are findable and accessible online | |
| Study registration | Elements of the research plan, such as research questions or hypotheses, study design, or data analysis approach, are findable and accessible online | |
More information on how PLOS OSIs are defined can be found in the PLOS OSI definitions and measures paper, and in the detailed methods documentation accompanying the public data release in Figshare.
What content is analyzed in the PLOS OSI dashboard?
The PLOS OSI dashboard prototype currently presents an analysis of the PLOS OSI dataset covering 2018 to 2024. This includes all research articles published in PLOS journals in that period (approximately 134,000 articles), plus a comparator set of non-PLOS content from PubMed Central (approximately 27,000 articles). The non-PLOS comparator set is topic-matched to the PLOS content using MeSH (Medical Subject Headings) terms, which are assigned to articles listed in PubMed by the US National Library of Medicine, providing a representative comparison group. Using the dashboard, you can view results for the entire corpus of PLOS and non-PLOS content, or compare PLOS to other publishers using the Publisher filter. We would love to include a larger comparator corpus, but for open science monitoring to be sustainable at larger scale, we need other publishers to share similar indicators for the content they publish.
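As a rough illustration of topic matching, one simple approach is to score candidate comparator articles by the overlap of their MeSH term sets. The sketch below uses Jaccard similarity with invented article IDs and terms; it is not the actual matching procedure used to build the dataset, which is described in the methods documentation.

```python
# Illustrative sketch: topic-matching comparator articles by MeSH term
# overlap. All IDs and terms are invented; the real PLOS/DataSeer matching
# procedure may differ (see the Figshare methods documentation).

def jaccard(a: set, b: set) -> float:
    """Overlap between two MeSH term sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a or b else 0.0

def best_match(plos_terms: set, candidates: dict) -> str:
    """Pick the candidate article whose MeSH terms overlap most."""
    return max(candidates, key=lambda pmid: jaccard(plos_terms, candidates[pmid]))

plos_article = {"Humans", "Neoplasms", "Genomics"}
candidates = {
    "PMC111": {"Humans", "Neoplasms", "Proteomics"},
    "PMC222": {"Mice", "Behavior", "Memory"},
}
print(best_match(plos_article, candidates))  # PMC111
```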
How are OSIs produced?
After developing the PLOS OSI definitions and measures, PLOS partnered with DataSeer to create the first PLOS OSI dataset in 2022. The five indicators of open science practice (OSIs) are extracted using a combination of natural language processing (NLP) and large language model (LLM) artificial intelligence tools applied to the full text of published research articles and to other open data sources.
This includes, but is not limited to, analyzing Data Availability Statements and Supporting Information file titles and captions to identify whether data and code have been generated and shared, by looking for keywords and their proximity to other keywords. Information such as the persistent identifier (PID, e.g. a Digital Object Identifier, DOI) or accession number for shared data, along with the repository name, is extracted and included in the OSI dataset.
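To illustrate the kind of keyword-and-proximity check described above, here is a minimal sketch. The keyword lists, proximity window, and function name are invented for illustration and are far simpler than the production NLP/LLM pipeline.

```python
import re

# Illustrative sketch: flagging likely data sharing in a Data Availability
# Statement by looking for a sharing keyword near a repository keyword.
# Keyword lists and the proximity window are assumptions, not the
# production pipeline's configuration.

SHARING = {"deposited", "available", "shared"}
REPOS = {"figshare", "zenodo", "dryad", "genbank"}

def likely_data_sharing(statement: str, window: int = 10) -> bool:
    """True if a sharing keyword occurs within `window` words of a repository name."""
    words = re.findall(r"[a-z0-9]+", statement.lower())
    for i, w in enumerate(words):
        if w in SHARING:
            nearby = words[max(0, i - window): i + window + 1]
            if REPOS & set(nearby):
                return True
    return False

print(likely_data_sharing("All sequence data are deposited in GenBank under accession MN908947."))  # True
print(likely_data_sharing("Data are available upon reasonable request."))  # False
```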
Preprints are detected using an LLM pipeline that searches the DataCite and Crossref databases for documents labelled as preprints that have similar titles and author lists to the published article.
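A simplified sketch of title-and-author matching (not the actual LLM pipeline) might look like the following; the similarity thresholds and example records are invented.

```python
from difflib import SequenceMatcher

# Illustrative sketch: matching a published article to a candidate preprint
# record by title similarity and author-surname overlap. Thresholds and
# records are invented; the real pipeline applies an LLM over DataCite and
# Crossref records.

def title_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_probable_preprint(article: dict, candidate: dict,
                         t_title: float = 0.9, t_authors: float = 0.5) -> bool:
    shared = set(article["authors"]) & set(candidate["authors"])
    author_overlap = len(shared) / len(set(article["authors"]))
    return (title_similarity(article["title"], candidate["title"]) >= t_title
            and author_overlap >= t_authors)

article = {"title": "Genomic signatures of adaptation in alpine plants",
           "authors": ["Lee", "Okafor", "Martin"]}
preprint = {"title": "Genomic signatures of adaptation in alpine plants",
            "authors": ["Lee", "Okafor"]}
print(is_probable_preprint(article, preprint))  # True
```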
Indicators for protocols and study registrations are detected by searching known databases and registries of these outputs, and are matched to the article based on factors such as authors and PIDs; regular expressions and keyword searches of the full text help identify whether the article mentions a protocol or registration.
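As an illustration of how registry identifiers can be spotted with regular expressions, the sketch below matches two common identifier formats (ClinicalTrials.gov NCT numbers and PROSPERO CRD numbers). The pattern set is deliberately minimal and is not the production matching logic.

```python
import re

# Illustrative sketch: spotting study-registration identifiers in full text
# with regular expressions. Only two registry formats are covered here; the
# production system matches against known registries more thoroughly.

REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "PROSPERO": re.compile(r"\bCRD\d{11}\b"),
}

def find_registrations(text: str) -> dict:
    """Map registry name -> list of identifiers found in the text."""
    return {name: pat.findall(text)
            for name, pat in REGISTRY_PATTERNS.items()
            if pat.findall(text)}

text = "The trial was registered at ClinicalTrials.gov (NCT01234567)."
print(find_registrations(text))  # {'ClinicalTrials.gov': ['NCT01234567']}
```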
In addition to these OSIs, metadata are extracted from article XML to aid comparison and analysis of the data. The dataset is further enhanced using standardized metadata from OpenAlex to capture all affiliation information from articles, not just corresponding authors. This provides additional detail on geography, research organization, and funder.
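To illustrate metadata extraction from article XML, here is a minimal sketch over an invented JATS-style fragment; real article XML is considerably richer, and the element names here are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch: pulling basic metadata out of a simplified JATS-like
# article XML fragment. The fragment and field selection are invented for
# illustration.

xml = """<article>
  <front><article-meta>
    <article-id pub-id-type="doi">10.1371/journal.pone.0000001</article-id>
    <title-group><article-title>Example article</article-title></title-group>
    <pub-date><year>2023</year></pub-date>
  </article-meta></front>
</article>"""

root = ET.fromstring(xml)
meta = {
    "doi": root.findtext(".//article-id[@pub-id-type='doi']"),
    "title": root.findtext(".//article-title"),
    "year": root.findtext(".//pub-date/year"),
}
print(meta)
```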

How is the PLOS OSI dashboard created?
The dashboard prototype is created in Looker Studio, a Google product that creates visualizations of datasets hosted in Google BigQuery (GBQ) or Google Sheets. The current dashboard runs from a version of the PLOS OSI dataset hosted in PLOS’ GBQ database.
The PLOS OSI dataset has been enhanced with information on research organization, research fields, author country, and research funders from the OpenAlex database to enable users to compare articles in the dashboard in more detailed ways. The specific dataset underlying the dashboard and the queries used to create the dashboard in Looker Studio are available openly here.
How can the OSI dashboard be used and reused?
We encourage the dashboard and underlying data and methods to be reused as much as possible. The PLOS OSI dashboard enables its users to:
- Understand what open science practices can be measured in publications at a broad scale
- Explore how trends in practices are changing over time – at a global level or for an entity or group of interest
- Compare how open science practices differ between countries, research organizations, funders, and/or research fields
- Share insights about open science indicators with colleagues and peers
If you have feedback, questions or would like help reusing the dashboard or underlying datasets, please use this form.
How should I cite the dashboard or OSI data?
To cite the dashboard, please use:
- Public Library of Science. PLOS Open Science Indicators Dashboard Prototype [Internet]. Google Looker Studio; 2026 Jan 21 [insert accessed date]. Available from: https://lookerstudio.google.com/reporting/2b34c431-c3dd-4eed-ac69-0c8856bd8af7/page/GAiiF
To cite the data and/or queries underlying the dashboard, please use:
- Public Library of Science (2026). PLOS Open Science Indicators public dashboard data and queries. Public Library of Science. Dataset. https://doi.org/10.6084/m9.figshare.31078504
To cite the PLOS OSI dataset and/or methods, please use:
- Public Library of Science (2022). PLOS Open Science Indicators. Public Library of Science. Dataset. https://doi.org/10.6084/m9.figshare.21687686.v10
How should the OSI dashboard not be used?
In line with the PLOS Open Science Indicators Principles and the Open Science Monitoring Initiative (OSMI) principles, the dashboard should not be used to rank research organizations, countries, research fields, journals, publishers, or individuals. Comparisons between entities can, however, be valuable with appropriate context. To promote responsible use, comparisons of funders, institutions, and countries are limited to five comparable entities; all entities can be compared to the average of the entire dataset. PLOS OSIs and the dashboard are also not designed as auditing or compliance enforcement tools for individual entities, articles, or individuals.
How accurate are the results?
For all OSI indicators, PLOS and DataSeer have aimed for a minimum accuracy of 85%. For each indicator, a range of accuracy metrics (accuracy, sensitivity, specificity, precision, F-score) is given in the full documentation of the public data in Figshare. Below are the accuracy rates for each of the five indicators in the dashboard, for both the PLOS articles and the non-PLOS comparator set. The accuracy rate is calculated by randomly selecting 100-200 articles from each corpus and checking them by hand to identify false positives and false negatives.
| Indicator | Accuracy assessment - PLOS articles | Accuracy assessment - non-PLOS comparator articles |
| --- | --- | --- |
| Data sharing | 85% | 81% |
| Code sharing | 97% | 94% |
| Preprint posting | 97% | 97% |
| Protocol sharing | 94% | 97% |
| Study registration | 99% | 99% |
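For reference, the accuracy metrics named above (accuracy, sensitivity, specificity, precision, F-score) can all be derived from the counts of true and false positives and negatives in a hand-checked sample. The sketch below uses invented counts purely to show the arithmetic.

```python
# Illustrative sketch: computing the standard accuracy metrics from
# hand-checked counts of true/false positives and negatives. The example
# counts are invented, not taken from the PLOS OSI validation.

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # a.k.a. recall
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }

# e.g. 100 hand-checked articles: 40 true positives, 5 false positives,
# 45 true negatives, 10 false negatives -> accuracy 85%
print(metrics(40, 5, 45, 10))
```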
Where can I find the underlying data, methods and code?
In line with the PLOS Open Science Indicators Principles and the Open Science Monitoring Initiative (OSMI) principles, the data, code, and methods underlying PLOS OSIs are openly available. The PLOS OSI dataset is openly available in Figshare along with detailed methods and accuracy information. A large proportion of the code and software used to produce PLOS OSIs, in particular the components relating to data and code sharing, is available under an open source license here.
The specific dataset underlying the dashboard along with the queries used to create the dashboard in Looker Studio are available openly here. To enable PLOS to learn quickly about the potential value and impact of sharing a public OSI dashboard, the prototype has been created using Google Looker Studio. Depending on what we learn from sharing this prototype dashboard, we will explore longer-term solutions which may use different technologies, in particular those that are fully open source.