Data Documentation

Documenting your data

Comprehensive data documentation is part of best practice in creating, organising and managing research data. It is a key component of making your research reproducible, i.e., showing how you got from capturing the data through to your results. This is invaluable if you need to re-run any part of the work, gives you confidence in your results, and allows you to publish a dataset as a science output. To keep documentation accurate, start at the outset of a project and continue throughout the research process.

Documentation of data should explain their lineage and provenance: how the data were created or where they were acquired from, their content and structure, and any manipulations or alterations that have taken place. It ensures that the data can be understood during the research project, that researchers continue to understand the data in the longer term, and that re-users can interpret the data and use them appropriately. Good documentation is vital for successful data preservation and sharing, and it is the basis for the metadata describing each published dataset.

What metadata should be included for useful data documentation?

The metadata you should include for your dataset will be field-specific, but generally all data documentation should include:

the context of data collection: project history, aims, objectives and hypotheses
data collection methods: data collection protocol, sampling design, instruments, hardware and software used, data scale and resolution, temporal coverage and geographic coverage
dataset structure: data files/tables, cases, relationships between files/tables
data sources used
data validation, checking, proofing, cleaning and other quality assurance procedures carried out
modifications made to data over time since their original creation and identification of different versions of datasets
within-project access: read/edit permissions, IPR (especially where project partners are involved), data confidentiality/sensitivity
public data sharing arrangements (authors, IPR, embargo periods, etc.)
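
As a rough illustration, the checklist above could be kept as a machine-readable record alongside the data and checked for gaps. The sketch below is one possible layout, not a prescribed format: the field names, the `missing_fields` helper and all example values are hypothetical, and your field's own metadata standard should take precedence where one exists.

```python
import json

# Hypothetical field names, one per item in the checklist above.
REQUIRED_FIELDS = [
    "context",
    "collection_methods",
    "structure",
    "sources",
    "quality_assurance",
    "version_history",
    "project_access",
    "public_sharing",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative values only; this is not a real dataset.
record = {
    "context": "Example project: history, aims, objectives and hypotheses.",
    "collection_methods": {
        "protocol": "Example sampling protocol",
        "instruments": ["example sensor"],
        "temporal_coverage": "2020-01-01/2020-12-31",
        "geographic_coverage": "Example region",
    },
    "structure": "Two CSV files linked by site_id",
    "sources": ["Field measurements"],
    "quality_assurance": "Range checks applied; duplicate rows removed",
    "version_history": [{"version": "1.0", "change": "Initial release"}],
    "project_access": "",  # deliberately left empty to show the check
    "public_sharing": {"licence": "example licence", "embargo": None},
}

# Report which checklist items still need to be written up.
print(missing_fields(record))

# Serialise the record so it can be stored next to the data files.
metadata_json = json.dumps(record, indent=2)
```

Running a check like this at regular intervals during a project is one way to keep documentation going from the start rather than reconstructing it at publication time.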

At the data level, datasets should also be documented with:

names, labels and descriptions for variables, records and their values
explanation of codes and classification schemes used
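
Data-level documentation of this kind is often kept as a data dictionary (codebook). The following is a minimal sketch, assuming a small table with hypothetical variables `site_id`, `habitat` and `temp_c`; the `undocumented` helper is also hypothetical and simply flags variables or coded values that the dictionary does not explain.

```python
# Hypothetical data dictionary: one entry per variable, with a label,
# a description, and (for coded variables) the meaning of each code.
data_dictionary = {
    "site_id": {
        "label": "Site identifier",
        "description": "Unique code assigned to each sampling site",
    },
    "habitat": {
        "label": "Habitat type",
        "description": "Dominant habitat recorded at the site",
        "codes": {"F": "forest", "G": "grassland", "W": "wetland"},
    },
    "temp_c": {
        "label": "Air temperature",
        "description": "Air temperature at time of sampling",
        "unit": "degrees Celsius",
    },
}


def undocumented(rows, dictionary):
    """Return (variable, value) pairs that the dictionary does not explain.

    A pair (variable, None) means the variable has no entry at all;
    a pair (variable, value) means a coded variable holds an unknown code.
    """
    problems = set()
    for row in rows:
        for variable, value in row.items():
            entry = dictionary.get(variable)
            if entry is None:
                problems.add((variable, None))
            elif "codes" in entry and value not in entry["codes"]:
                problems.add((variable, value))
    return sorted(problems, key=str)


# Illustrative rows: "X" is an unexplained habitat code and "obs" is an
# undocumented variable.
rows = [
    {"site_id": "S01", "habitat": "F", "temp_c": 12.4},
    {"site_id": "S02", "habitat": "X", "temp_c": 9.8, "obs": "JB"},
]

for variable, value in undocumented(rows, data_dictionary):
    print(variable, value)
```

Keeping the dictionary in the same repository or folder as the data makes it easier to update both together when variables or codes change.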

How does creating quality metadata benefit me?

Having comprehensive metadata with your dataset is a key part of following the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. It helps users to:

Understand the context of your dataset
Be confident in the reliability and quality of your data
Efficiently access and re-use your data
Easily find your data
Ensure they are following any use constraints or specific licence conditions
Keep up to date with different versions of your dataset

All of these make it easier for other groups to access, re-use and cite your data, opening up more opportunities for collaboration and new science!
