Data Quality/Quality Control
Research data are the raw fuel of science
Quality control of data is an integral part of all research and takes place at several stages: during data collection, during data entry or digitisation, and during data checking. Clear roles and responsibilities for data quality assurance should be assigned at every stage of research.
Data quality control procedures are put in place to ensure data consistency within each dataset and, where applicable, between different datasets. The aim is to make the quality of the data, including any errors, apparent to users so that they have the information they need to assess whether the data are suitable for their task. Quality control measures, if carried out well, maintain common standards and give users confidence in the reliability of the data.
Data quality control procedures can include:
- Automated checks
- Scientific checks such as looking for unexpected anomalies
- Providing documentation
- Detecting errors made during collection, transfer or reformatting
- Detecting missing values or information
- Detecting duplicates
- Attaching a quality flag to each numerical value, so that problems are recorded without modifying the observed data points
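Several of the checks listed above can be automated. The following is a minimal sketch in Python, assuming a hypothetical list of records with `site`, `date` and `ph` fields. The field names, the plausible range and the flag codes `OK`, `MISSING`, `DUPLICATE` and `SUSPECT` are illustrative assumptions, not part of any standard. Each record receives a flag, and the observed value is left unchanged.

```python
from collections import Counter

# Hypothetical plausible range for the illustrative "ph" field.
PH_RANGE = (0.0, 14.0)


def flag_records(records):
    """Return a copy of each record with an added 'qc_flag' field.

    The observed values are never modified; problems are only flagged.
    """
    # Count how often each (site, date) key occurs to detect duplicates.
    key_counts = Counter((r["site"], r["date"]) for r in records)

    flagged = []
    for r in records:
        value = r["ph"]
        if value is None:
            flag = "MISSING"
        elif key_counts[(r["site"], r["date"])] > 1:
            flag = "DUPLICATE"
        elif not PH_RANGE[0] <= value <= PH_RANGE[1]:
            flag = "SUSPECT"
        else:
            flag = "OK"
        # Build a new dict so the input record stays untouched.
        flagged.append({**r, "qc_flag": flag})
    return flagged


# Made-up example data.
records = [
    {"site": "A", "date": "2020-01-01", "ph": 5.6},
    {"site": "A", "date": "2020-01-02", "ph": None},
    {"site": "B", "date": "2020-01-01", "ph": 15.2},
    {"site": "C", "date": "2020-01-01", "ph": 6.1},
    {"site": "C", "date": "2020-01-01", "ph": 6.1},
]

for row in flag_records(records):
    print(row["site"], row["date"], row["ph"], row["qc_flag"])
```

Flagged records still need a person to decide whether a suspect value is an error or a genuine anomaly; the script only makes such cases visible.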
During data collection, researchers must ensure that the data recorded reflect the actual facts, responses, observations or events.
If data are collected with instruments:
- calibration of instruments is essential to check the precision, accuracy, bias and/or scale of measurement
- data are validated by checking for equipment errors as well as digitisation errors
- data may be verified by checking the truth of the record with an expert or by taking multiple measurements, observations or samples
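As a rough illustration of the instrument checks above, the sketch below compares repeated readings of a reference standard with its known value to estimate bias (accuracy) and spread (precision). The reference value, the readings and the tolerance are made-up numbers, and `check_calibration` is a hypothetical helper rather than an established procedure.

```python
from statistics import mean, stdev


def check_calibration(readings, reference, tolerance):
    """Summarise repeated readings of a standard with a known value.

    bias      = mean reading minus the reference value (accuracy)
    precision = sample standard deviation of the readings (repeatability)
    """
    bias = mean(readings) - reference
    precision = stdev(readings)
    return {
        "bias": bias,
        "precision": precision,
        "within_tolerance": abs(bias) <= tolerance,
    }


# Illustrative values only: five readings of a pH 7.00 buffer solution.
readings = [7.02, 7.05, 7.03, 7.04, 7.06]
result = check_calibration(readings, reference=7.00, tolerance=0.05)
print(result)
```

Taking several readings serves both purposes mentioned above: the mean exposes a systematic offset, and the spread shows how repeatable the measurement is.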
Standardised methods and protocols can be used for capturing observations, alongside recording forms with clear instructions.
Flags
Use data flags to qualify the data. Such additional information describes the quality of the measured data during the collection period. Definitions of flag codes should be included in the companion data documents.
An example set of data quality flags:
Flag | Meaning |
V | Data accuracy: acceptable response, valid data |
M | Data accuracy: No value reported (data are missing) |
Z | Data accuracy: Zero value reported |
E | Data accuracy: value exceeded DQO (data quality objective) by a factor of 2 |
X | Data accuracy: value exceeded DQO by more than a factor of 2 |
* | Lab incident: catastrophic laboratory failure |
C | Data accuracy: measured vs. calculated conductivity exceeds allowed tolerance |
I | Data accuracy: Ion balance exceeds allowed tolerance |
D | Data accuracy: value is at the laboratory's reported detection limit |
LD | Data accuracy: Less than detection limit (reported in data field) |
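Because flag definitions should travel with the data, it can help to keep them in machine-readable form and to check that every flag used in a dataset is documented. The sketch below assumes the flag codes from the example table above; `summarise_flags` and the sample list `observed_flags` are hypothetical names used for illustration.

```python
from collections import Counter

# Flag codes and meanings copied from the example table above.
FLAG_DEFINITIONS = {
    "V": "Acceptable response, valid data",
    "M": "No value reported (data are missing)",
    "Z": "Zero value reported",
    "E": "Value exceeded DQO by a factor of 2",
    "X": "Value exceeded DQO by more than a factor of 2",
    "*": "Lab incident: catastrophic laboratory failure",
    "C": "Measured vs. calculated conductivity exceeds allowed tolerance",
    "I": "Ion balance exceeds allowed tolerance",
    "D": "Value is at the laboratory's reported detection limit",
    "LD": "Less than detection limit (reported in data field)",
}


def summarise_flags(flags):
    """Count each flag and list codes that have no documented meaning."""
    counts = Counter(flags)
    undefined = sorted(code for code in counts if code not in FLAG_DEFINITIONS)
    return counts, undefined


# Made-up flags; "Q" is deliberately absent from the definitions.
observed_flags = ["V", "V", "M", "LD", "V", "Q"]
counts, undefined = summarise_flags(observed_flags)

for code, n in counts.items():
    meaning = FLAG_DEFINITIONS.get(code, "UNDEFINED FLAG")
    print(f"{code}: {n} ({meaning})")
print("Undefined flags:", undefined)
```

An undefined code such as `Q` here signals either a data entry error or a gap in the companion documentation.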