Data integration

Data integration is a set of processes that combine data from disparate sources into a single, meaningful dataset that can support new analysis.

The ability to integrate data appropriately relies fundamentally on understanding all the data to be integrated, in terms of:

  • what the data values are, the methods used in their creation, and any subsequent transformation 
  • the quality control and validation applied to those data 

Documentation and metadata describing the data are therefore essential in order to integrate datasets. 
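
As an illustration of why metadata matters, the following minimal Python sketch combines two temperature records only after checking what their metadata says about the variable, its units, and whether quality control was applied. The dataset layout, the field names (`variable`, `units`, `quality_controlled`) and the helper functions are all hypothetical, chosen for this example rather than taken from any particular metadata standard.

```python
# Hypothetical datasets: each carries metadata describing its values.
site_a = {
    "metadata": {"variable": "air_temperature", "units": "degC",
                 "quality_controlled": True},
    "values": [12.5, 13.1, 11.8],
}
site_b = {
    "metadata": {"variable": "air_temperature", "units": "K",
                 "quality_controlled": True},
    "values": [285.65, 286.25],
}


def to_celsius(dataset):
    """Return the values in degrees Celsius, based on the recorded units."""
    units = dataset["metadata"]["units"]
    if units == "degC":
        return list(dataset["values"])
    if units == "K":
        return [round(v - 273.15, 2) for v in dataset["values"]]
    # Without known units the values cannot be safely combined.
    raise ValueError(f"Unknown units: {units!r}")


def integrate(*datasets):
    """Combine datasets only if the metadata shows they are compatible."""
    variables = {d["metadata"]["variable"] for d in datasets}
    if len(variables) != 1:
        raise ValueError(f"Datasets describe different variables: {variables}")
    if not all(d["metadata"].get("quality_controlled") for d in datasets):
        raise ValueError("Not all datasets have been quality controlled")
    combined = []
    for d in datasets:
        combined.extend(to_celsius(d))
    return combined


combined = integrate(site_a, site_b)
```

Without the `units` entry, the Kelvin and Celsius values would be merged as if they were directly comparable, which is the kind of error that good documentation helps prevent.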

To give your data the best chance of being integrated with other data:

  • store data in, or transform to, well-known and preferably open formats usable in common software
  • use relevant standards for metadata
  • use community-agreed schemas, controlled vocabularies, keywords, thesauri or ontologies where possible
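
The points above can be sketched in a few lines of Python. This example assumes two sources that record the same measurements under different local column names, maps both onto one shared set of terms, and writes the result as CSV, a widely supported open format. The `VOCABULARY` mapping, the column names and the sample rows are invented for illustration; in practice the agreed terms would come from a vocabulary used by your community.

```python
import csv
import io

# Hypothetical mapping from local column names to community-agreed terms.
VOCABULARY = {
    "temp": "air_temperature",
    "Temperature (C)": "air_temperature",
    "date": "time",
    "Date/Time": "time",
}


def harmonise(rows):
    """Rename each row's keys to the agreed terms; fail on unknown names."""
    harmonised = []
    for row in rows:
        renamed = {}
        for key, value in row.items():
            if key not in VOCABULARY:
                raise KeyError(f"No agreed term for column {key!r}")
            renamed[VOCABULARY[key]] = value
        harmonised.append(renamed)
    return harmonised


def to_csv(rows):
    """Serialise rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two sources describing the same kind of observation differently.
source_1 = [{"date": "2020-01-01", "temp": 4.2}]
source_2 = [{"Date/Time": "2020-01-02", "Temperature (C)": 5.0}]

combined = harmonise(source_1) + harmonise(source_2)
csv_text = to_csv(combined)
```

Raising an error for an unmapped column, rather than passing it through, is a deliberate choice here: it forces each new term to be reviewed against the vocabulary before the data are combined.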

Common issues 

Data integration is especially challenging for environmental data because metadata standards are not always agreed upon and there are many different data types produced in these fields. If you need assistance with metadata standards, please contact the NERC Data Centres. 
