Sensitive Data in Environmental Datasets
Personal, sensitive and confidential data
Research data may contain personal, sensitive or confidential information and there are obligations (some legal) on researchers, projects and the NERC Data Centres to ensure these data are properly managed both during the research and beyond.
In environmental research, such data commonly fall into the following categories:
- Personal - Data from which a person could be identified. Covered by UK GDPR legislation.
- Sensitive - 1) personal data that fall within the special categories of personal data, covered by UK GDPR legislation; 2) informally, non-personal information that there are concerns about publicising, e.g. the existence or location of a protected site or species. See 'Confidential'.
- Confidential - non-personal information whose release may compromise legal protection, terms of access consent or continued scientific monitoring, e.g. information from which a site or species location could be identified.
Personal Data
Personal data are data about a living individual from which that individual can be identified. They are covered by UK GDPR legislation (Data Protection Act 2018), which also defines a special category of personal data that requires even greater protection.
Before you collect or acquire any personal data, you must plan how you will comply with UK GDPR legislation (Data Protection Act 2018) when you collect, store, analyse and share the data during your research, and what will happen to the data at the end of the research. You need to determine which of the defined roles you have under UK GDPR legislation - a 'Data Controller' or a 'Data Processor'. You should also identify whether any of your personal data fall under the 'special category', as these have greater protection under UK legislation. Guidance is available from the NERC Data Centres.
After a project ends, personal data should not be stored for longer than the business need for retaining them requires. Any data of long-term re-use value should be archived with an appropriate long-term data centre. Data or supporting documentation containing personal information that allows an individual to be identified cannot be deposited with the NERC Data Centres unless they are first anonymised. Data that cannot be anonymised can instead be offered to a specialist data centre that accepts personal data, e.g., The UK Data Service.
How to identify personal data?
Personal information can be either quantitative or qualitative.
Quantitative personal data
Some forms of quantitative personal data are easily identified, for example addresses, names or e-mail addresses of individuals that are not in the public domain. Care is needed with other forms or combinations of quantitative data. For example, while it may not be possible to identify a living individual from a Post Code in an urban area, it may be possible in rural or sparsely populated areas where Post Codes cover a smaller number of properties. Check carefully that individuals cannot be identified from combinations of data in a dataset. For example, it may be possible to identify an individual from a combination of Post Code and age. Quantitative personal information can also take the form of a grid reference: individuals may not want, for example, a sample point on their land to be identifiable.
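The combination check described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete disclosure-risk assessment: the function name `find_small_groups`, the field names, the threshold `k` and the sample records are all hypothetical, and a suitable threshold would need to be agreed for each dataset.

```python
from collections import Counter


def find_small_groups(records, quasi_identifiers, k=5):
    """Return combinations of field values shared by fewer than k records.

    A combination held by only one or two records is a sign that the
    individuals concerned might be identifiable from those fields together.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up records for illustration only; the Post Codes are placeholders.
records = [
    {"postcode": "ZZ1 1AA", "age": 34},
    {"postcode": "ZZ1 1AA", "age": 34},
    {"postcode": "ZZ1 1AA", "age": 71},
    {"postcode": "ZZ9 9ZZ", "age": 34},
]

# Flag any Post Code and age combination that occurs fewer than 2 times.
risky = find_small_groups(records, ["postcode", "age"], k=2)
for combo, n in risky.items():
    print(combo, "appears in", n, "record(s)")
```

In this made-up example neither the Post Code nor the age is unique on its own, yet two combinations of the two occur only once, which is the situation the paragraph above warns about.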
Qualitative personal data
Qualitative personal data could be in the form of transcripts. For example, 'Mr. Jones said that the organisation responsible was not doing a good job.' Even if names are removed from a transcript, care is needed that the remaining text does not identify an individual, for example by recounting that person's history.
UK GDPR legislation: if you are collecting or acquiring any personal data you must comply with UK GDPR legislation (Data Protection Act 2018) for how you will collect, store, analyse and share the data during your research and what will happen at the end of the research. You need to determine which of the defined roles you have under UK GDPR legislation - a 'Data Controller' or a 'Data Processor'. Guidance is available from the NERC Data Centres.
Data security: If a project intends to collect personal data, or will be using third party data that contain personal or sensitive information, then careful consideration needs to be given to the security of such data. It is recommended that advice is sought on the most suitable way of storing the data, for example creating folders with access limited to only those who will be using the data.
Consent: Consent must be obtained from individuals before collecting any personal data. They should be made aware of the intended use, who is responsible for the data, who will access them, what will happen to any analysis/reports and what will happen in the long term.
If personal data from which living individuals can be identified are to be ingested into a long-term data centre, e.g., the NERC Data Centres, there must be evidence that the participants concerned have consented to the data being shared publicly. Where participants have given written consent, a copy of the consent form should form part of the supporting material. Where consent was given verbally, a statement in the supporting documentation that all participants gave verbal consent will be considered.
Confidential/sensitive data
Other types of non-personal sensitive data are those that contain information that must not be made publicly available, for either business reasons (science integrity, long-term viability of research, etc.) or security reasons. This includes third party data sets. However, it may be possible to make a 'de-sensitised' version of the data publicly accessible.
For example, a project may be collecting stream water samples for chemical analysis at a number of sites, the locations of which have been recorded as GB National Grid co-ordinates. The landowners may not want the locations of sample points to be publicly available and could be concerned about investigation if the chemical data showed any issues with the stream water chemistry. In such cases, ways of anonymising the data would have to be discussed with the project. One option would be to report the stream location to the nearest square kilometre. The project will also need to decide whether the exact locations of the sample sites should be kept in a secure location in case they are needed in future.
However, if the level of anonymity applied is too great then the usefulness of the data resource could be impacted.
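The option of reporting locations to the square kilometre can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the co-ordinates are made up, and locations are truncated to the 1 km grid square that contains them (the usual grid reference convention) rather than rounded to the nearest kilometre. Whether 1 km is coarse enough for a given site still has to be agreed with the project.

```python
def coarsen_to_km_square(easting, northing):
    """Return the south-west corner (in metres) of the 1 km square containing a point.

    Assumes numeric GB National Grid eastings and northings in metres.
    """
    return (int(easting) // 1000 * 1000, int(northing) // 1000 * 1000)


def coarsen_grid_reference(grid_ref):
    """Truncate a lettered grid reference to four figures, i.e. a 1 km square.

    For example 'NY 12345 67890' becomes 'NY1267'.
    """
    compact = grid_ref.replace(" ", "").upper()
    letters, digits = compact[:2], compact[2:]
    if (
        not letters.isalpha()
        or not digits.isdigit()
        or len(digits) % 2
        or len(digits) < 4
    ):
        raise ValueError(f"Unrecognised grid reference: {grid_ref!r}")
    half = len(digits) // 2
    # Keep the two leading (kilometre) digits of the easting and northing parts.
    return letters + digits[:half][:2] + digits[half:][:2]


# Made-up sample point for illustration only.
print(coarsen_to_km_square(312345, 567890))
print(coarsen_grid_reference("NY 12345 67890"))
```

The same functions could be adapted to a coarser grid, such as 10 km squares, where 1 km would still allow a site to be identified, at the cost of making the data less useful.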
Storing personal and other sensitive data in the long-term
If personal or sensitive data have not been archived in an appropriate long-term data repository and need to be retained, it is essential that proper plans are in place to safeguard the security of the data in future, and that these plans comply with Data Centre procedures.
Plans should consider and set out:
- where these data will be securely stored
- who will have edit/read access to the space
- who will be responsible for these data
- where the documentation will be kept that states why the data are sensitive and cannot be made freely accessible (ideally citing one of the legitimate exemptions from access under the UK Environmental Information Regulations (EIR))
- how long the data will be retained before being disposed of. UK GDPR legislation states that personal data should not be retained for longer than there is a business need for doing so
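One way to keep such a plan checkable is to record it in a structured form. The sketch below is illustrative only: the `RetentionPlan` class, its field names and the example values are all hypothetical and do not represent a NERC Data Centre template.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class RetentionPlan:
    """Hypothetical record of the items a retention plan should set out."""

    storage_location: str                # where the data are securely stored
    access_holders: list                 # who has edit/read access
    responsible_person: str              # who is responsible for the data
    sensitivity_statement_location: str  # where the reason for restriction is documented
    dispose_by: date                     # when the data are due for disposal

    def missing_items(self):
        """Return the names of any items left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_overdue(self, today):
        """Return True once the planned disposal date has passed."""
        return today > self.dispose_by


# Example values are placeholders, not recommendations.
plan = RetentionPlan(
    storage_location="restricted project folder",
    access_holders=["project data manager"],
    responsible_person="",
    sensitivity_statement_location="README in the restricted folder",
    dispose_by=date(2030, 1, 1),
)
print(plan.missing_items())
print(plan.is_overdue(date(2031, 6, 1)))
```

Here the empty `responsible_person` field is reported as missing, and the plan is flagged as overdue once the chosen disposal date has passed.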
Where can personal data be deposited to a long-term data centre?
This will depend on two factors: the level of anonymity that can be applied to the data and the wishes of the project. Fully anonymised socio-economic data from NERC grants can be deposited with The UK Data Service, the UK's largest collection of social, economic and population data resources. There is an understanding between NERC and the NERC Data Centres that any socio-economic data resulting from NERC grants can be offered for deposit with the UK Data Service, though the UK Data Service reserves the right to decline them.
Personal data can be removed by the process of anonymisation or de-identification. However, careful consideration needs to be given to the level of anonymity that can be applied to a data resource. If the level of anonymity applied is too great, then the usefulness of the data resource could be impacted.
If the data from a project cannot be fully anonymised to remove all personal information, or the project does not want to remove personal information, then the data can still be offered for deposit with the UK Data Service.