File Naming

Structuring your files

Having a consistent and logical folder structure is essential for every project, and especially when working as a group. Best practice is for the structure to be agreed and adopted by all participants, which helps keep it coherent and understandable. Users must be able to navigate easily through folders and files, so keep the hierarchy logical, neither too deep nor too shallow. A good project structure typically has three or four levels, although this can vary with the complexity of the project. Folder levels can be organised by activity, data type, content type or any other category that suits the project.

Before a project starts, plan how to organise the project files and folders. This helps to ensure that the correct files can be located, identified and retrieved from the filing system quickly and accurately.

Once the management plan for data files and associated documentation has been designed and agreed, you should also plan how to manage versions (version control).

Best practice is to:

Uniquely identify files, preferably using a systematic naming convention (see below).
Clearly record the version and status of each file, e.g., draft, interim, final, internal. Where a version number applies, it should always appear in the file name so that the most recent version can be easily identified and retrieved.
Record the changes made to a file when a new version is created, e.g., in a document metadata section.
Record relationships between items. In many cases the information in one file is supported by information held in other files, e.g., the code and the data file it runs against, a data file and the documentation or metadata that relate to it, or multiple related tables.
Track the location of all files if they are stored in several places.
Regularly synchronise files in different locations, e.g., using OneDrive.
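The advice on putting version numbers in file names can be illustrated with a minimal Python sketch. It assumes a hypothetical convention in which every name ends in _v plus a zero-padded number (e.g., report_v002.csv) and that all the files passed in are versions of the same document; the pattern would need adapting to your own convention.

```python
import re

# Hypothetical convention: <name>_v<zero-padded number>.<extension>, e.g. report_v002.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)\.(?P<ext>\w+)$")


def latest_version(filenames):
    """Return the name with the highest version number, or None if none match.

    Assumes every matching file is a version of the same document.
    """
    best_name, best_version = None, -1
    for name in filenames:
        match = VERSION_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the convention
        version = int(match.group("version"))
        if version > best_version:
            best_name, best_version = name, version
    return best_name


files = ["report_v001.csv", "report_v010.csv", "report_v002.csv", "notes.txt"]
print(latest_version(files))  # report_v010.csv
print(sorted(files))  # zero-padding keeps the versions in order when sorted by name
```

Comparing the parsed numbers rather than the raw strings keeps the result correct even if padding is inconsistent, while zero-padding also means a plain sort by name lists the versions in order.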

Data file naming best practices

There are no perfect folder and file naming conventions, but there are some best practice rules that can help guide you.

If you have multiple related files, name them consistently using a relevant naming convention.
Try to keep file names short but descriptive (fewer than 25 characters), and bear in mind that on Windows systems the maximum full path, including the file name, is 259 characters.
Do not use spaces or special characters (e.g., * : \ / < > | " ? [ ] ; = + & £ $).
Underscores and hyphens are acceptable, but it is best not to start or end a file name with these characters.
Use capitals (CamelCase) and underscores instead of spaces or slashes, but bear in mind that all-lower-case names are less dependent on software and platform, because some systems treat upper- and lower-case letters as different and others do not.
Use a year-first date format such as YYYYMMDD or YYMM. Putting the year first helps files sort into chronological order.
Start your file name with the most important parameter. If the file will be searched for and retrieved by date, the date element should appear first. If the file will be retrieved according to another descriptor, that element should appear first.
If using a sequential numbering system, estimate how many files you might accumulate over the entire project and use an appropriate number of leading zeros (e.g., 001, 002 and so on if you expect up to 999 files). This keeps files in sequential order when they are sorted by name.
Follow similar logic for folder structures and names. Bear in mind that file names need to remain meaningful if files are moved or shared outside their original folder, so consider how much information is held only in folder names.
As with file naming, it is important to develop a folder structure that suits your project and data. For example, if you have hundreds of images collected over several years from different locations, you may want to organise them first by year, then month, then location. Alternatively, you could organise them entirely by date and include the location in the file name.
Check with the data centre where your data will be deposited, as it may have specific file naming recommendations.
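Several of the rules above (year-first dates, leading zeros, no spaces or special characters, short names) can be checked automatically. The Python sketch below is one possible approach rather than a standard tool: build_name and check_name are hypothetical helpers, the site code SiteA is invented, and the date_site_sequence pattern is only an example.

```python
from datetime import date

# Spaces and special characters that the guidance above recommends avoiding.
FORBIDDEN = set(' *:\\/<>|"?[];=+&£$')
MAX_LENGTH = 25  # names should be shorter than this


def build_name(when, site, seq, ext, digits=3):
    """Example pattern only: YYYYMMDD date first, then a site code, then a zero-padded number."""
    return f"{when:%Y%m%d}_{site}_{seq:0{digits}d}.{ext}"


def check_name(name):
    """Return a list of problems found in a file name; an empty list means none were found."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append(f"name is {len(name)} characters long")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("contains spaces or special characters: " + " ".join(repr(c) for c in bad))
    stem = name.rsplit(".", 1)[0]  # ignore the extension
    if stem[:1] in ("_", "-") or stem[-1:] in ("_", "-"):
        problems.append("starts or ends with an underscore or hyphen")
    return problems


names = [
    build_name(date(2023, 4, 6), "SiteA", 10, "csv"),
    build_name(date(2022, 12, 31), "SiteA", 2, "csv"),
    build_name(date(2023, 4, 6), "SiteA", 1, "csv"),
]
print(sorted(names))
# ['20221231_SiteA_002.csv', '20230406_SiteA_001.csv', '20230406_SiteA_010.csv']
print(sorted(["file_1.csv", "file_10.csv", "file_2.csv"]))
# ['file_1.csv', 'file_10.csv', 'file_2.csv']  (without padding, 10 sorts before 2)
print(check_name("Location data from the UK Monitoring Scheme 1980.csv"))
```

Because the date comes first and the sequence number is padded, a plain alphabetical sort already gives chronological and sequential order; the last call reports both the excessive length and the spaces in the poorly named example file.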

Metadata in file names

If your file naming convention includes important metadata within the file name (e.g., site codes or collection dates), will that information be available elsewhere or only in the file name? File names are very useful as metadata for people involved in the project, but to computers they are just identifiers. To prevent information being lost when files are renamed, metadata should always be recorded somewhere else as well, not only in the file name.

Also note that if the metadata changes, having embedded it in file names may mean renaming files during the project, which may affect any references to those files.

If you are working with sensitive data, do not include personal or identifying information in your file names. It may be useful to include an element in the file name which indicates that the file contents contain identifying information.

You should keep a text file (README) describing each file stored in the folder. README files also help others understand the context of the files and subfolders stored in a given location. A README is usually the first thing someone opens to get an overview of the data, so include any relevant information that users need to know before working with the files.
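A README is best written by hand, but a skeleton listing the files can be a useful starting point. The Python sketch below assumes a plain-text file called README.txt (the name and layout are illustrative choices, not a standard); it lists every file in the folder with a placeholder description and refuses to overwrite an existing README.

```python
import tempfile
from pathlib import Path


def write_readme_skeleton(folder, summary="TODO: describe what this folder contains and why."):
    """Create README.txt in folder, listing each file with a placeholder description."""
    folder = Path(folder)
    lines = [f"README for folder: {folder.name}", "", summary, "", "Files:"]
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name != "README.txt":
            lines.append(f"- {path.name}: TODO describe contents, origin and abbreviations used")
    readme = folder / "README.txt"
    # Mode "x" raises FileExistsError rather than overwriting a README someone has written.
    with readme.open("x", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return readme


# Usage example in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("20230406_SiteA_001.csv", "20230406_SiteA_002.csv"):
        (Path(tmp) / name).write_text("", encoding="utf-8")
    print(write_readme_skeleton(tmp).read_text(encoding="utf-8"))
```

The placeholders are deliberately marked TODO so that it is obvious which descriptions still need to be filled in before the data are shared.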

Project planning

When working in collaboration with others, it is important that everyone follows the same file naming convention. In the project's Data Management Plan (DMP), you can describe your folder and file naming plan for data-related files. Consider defining a set of templates to remind all staff contributing material to be consistent, e.g., instrument_location_yyyymmdd[_extra].ext
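A template like this can also be enforced in code. The Python sketch below is one hypothetical way to build and check names against instrument_location_yyyymmdd[_extra].ext; it assumes each element contains only letters and digits, and the codes logger and site01 are invented for illustration.

```python
import re
from datetime import date, datetime

# Pattern for the example template instrument_location_yyyymmdd[_extra].ext.
# Assumption: each element contains only letters and digits.
TEMPLATE = re.compile(
    r"^(?P<instrument>[A-Za-z0-9]+)_(?P<location>[A-Za-z0-9]+)_(?P<date>\d{8})"
    r"(?:_(?P<extra>[A-Za-z0-9]+))?\.(?P<ext>[A-Za-z0-9]+)$"
)


def make_name(instrument, location, when, ext, extra=None):
    """Assemble a file name that follows the template."""
    parts = [instrument, location, when.strftime("%Y%m%d")]
    if extra:
        parts.append(extra)
    return "_".join(parts) + "." + ext


def parse_name(name):
    """Split a conforming file name into its elements; raise ValueError otherwise."""
    match = TEMPLATE.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow instrument_location_yyyymmdd[_extra].ext")
    fields = match.groupdict()
    # strptime also rejects impossible dates such as 20230230
    fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    return fields


name = make_name("logger", "site01", date(2023, 4, 6), "csv", extra="raw")
print(name)  # logger_site01_20230406_raw.csv
print(parse_name(name))
```

Because parse_name raises an error for non-conforming names, it could be run over a folder to spot files that do not follow the agreed template.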

You will also need to provide an explanation of the file naming convention when you deposit data files in a repository for long-term storage and sharing. If you plan on using abbreviations in any of your file name elements, it is essential to document these.

Examples

1486Xiuytr.csv

This file name tells you nothing about the data it contains. It may mean something to you now, but it will not help others, or you in the future, to identify it.

Location data from the UK Monitoring Scheme 1980.csv

This file name is very long and contains spaces. Computer systems limit the length of file names (including the full path), and spaces often cause problems when files are handled in code. Keeping names short and free of spaces from the start avoids many of these issues.

1980UKMonitoringLocationData.csv

This is descriptive, short and contains no spaces or special characters.

Tham_AWS_Gwy_1980_04_06.dat

Tham_AWS_Gwy_1980_05_04.dat

File names show site, instrument and download date.

Futureproofing file paths

Hard-coding the full path and file name of your files in code, scripts or models can cause problems. If a file is moved to a new location, or a folder is renamed, the code will need to be edited. This is particularly error-prone if the file location appears more than once in the code or script.

Plan ahead by storing the path in a variable (so the full path is set only once) or by using reference files that set the file paths when a script is executed. Alternatively, consider using a dialogue that prompts the user for the file location, if that is helpful.
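A minimal Python sketch of the reference-file approach: the data folder is recorded once, in a hypothetical JSON file named project_paths.json with a data_dir entry, and the script falls back to prompting the user (here with a plain input() prompt rather than a graphical dialogue) if that file is missing. The file name, key and folder path are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical reference file that records where the project data live, e.g.
# {"data_dir": "/shared/monitoring/data"}
REFERENCE_FILE = Path("project_paths.json")


def load_data_dir(reference_file=REFERENCE_FILE):
    """Read the data folder from the reference file, or ask the user if it is missing."""
    reference_file = Path(reference_file)
    if reference_file.exists():
        settings = json.loads(reference_file.read_text(encoding="utf-8"))
        return Path(settings["data_dir"])
    # Fallback: prompt for the location instead of hard-coding it in the script.
    return Path(input("Folder containing the project data: ").strip())


def data_file(name, data_dir):
    """Join a file name onto the data folder, so the folder is only defined once."""
    return Path(data_dir) / name


# Usage example with a throwaway reference file.
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp) / "project_paths.json"
    reference.write_text(json.dumps({"data_dir": "/shared/monitoring/data"}), encoding="utf-8")
    DATA_DIR = load_data_dir(reference)  # set once...
    print(data_file("logger_site01_20230406.csv", DATA_DIR))  # ...reused wherever needed
```

If the data move, only the reference file needs updating; the script itself stays unchanged.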