
Data Management

Tips on Managing Your Data

Each data set in each discipline is different and unique -- but that doesn't mean you can't manage your data!

Essentially, data management is about creating a system that works for you and your lab, and is in line with dominant practices in your discipline. Data management tends to be established around four main pillars:

  • Data Organization
  • Data Documentation
  • Sharing Your Data
  • Storage & Back-up

Data Organization

One tip is to think about the end point of your research proposal, and develop your data management plan based on the product you expect to deliver. It doesn't have to be exactly right -- a guess and a plan are much better than nothing at all!

You should have a top-level directory or folder whose name includes:

  • Project title
  • Date

Within your top-level folder, follow a clear and documented naming convention. You might need to divide your files into sub-directories, or you could have different versions of your dataset, or data related to each person in the group.

Some naming convention best practices include:

  • Avoid using blank spaces or other special characters (e.g., $ % & * #) which can easily become corrupted or misinterpreted by a machine
  • Use CamelCase or an underscore if you want to create space or ease of reading without including special characters.
  • Include the title of the project the data set relates to (e.g., lifeStories_subjectInterviews_201410)
  • Include the date the dataset was created in the name of your file, and change this date each time the data set is manipulated. This is also referred to as versioning, and can help you keep track of changes to your files
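
The naming tips above can be sketched in a few lines of Python. This is a minimal illustration, not a required tool: the helper names `to_camel_case` and `build_name` are hypothetical, and the `project_content_YYYYMM` pattern is assumed from the example name given above.

```python
import re
from datetime import date

# Only letters, digits, underscores, and hyphens are treated as safe here.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def to_camel_case(text):
    """Join the words of a phrase into lowerCamelCase, dropping other characters."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not words:
        raise ValueError(f"no usable characters in {text!r}")
    first, rest = words[0], words[1:]
    return first[0].lower() + first[1:] + "".join(w[0].upper() + w[1:] for w in rest)


def build_name(project, content, when, extension=""):
    """Build a name of the form project_content_YYYYMM (hypothetical helper)."""
    parts = [to_camel_case(project), to_camel_case(content), when.strftime("%Y%m")]
    name = "_".join(parts)
    if not SAFE_NAME.match(name):
        raise ValueError(f"unsafe file name: {name!r}")
    return name + extension


# Prints: lifeStories_subjectInterviews_201410
print(build_name("life stories", "subject interviews", date(2014, 10, 1)))
```

Passing a new date each time the data set changes gives you a simple versioned series of file names that sorts in chronological order.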

Data Documentation

Documenting your data is the practice of contextualizing your data, ideally at the time of creation. In libraries or in repositories, you might encounter the term metadata -- this is essentially the same concept, with some fixed rules and fields applied. Some disciplines have recommended metadata standards -- you should follow those as closely as possible, where available.

Data documentation helps other users (human and computer!) understand your data. By outlining how your data was created, the context for the data, the structure of the data (including your filing and naming system), defining terms, abbreviations, or acronyms, and more, you'll help make sure your data is accessible and reusable -- main principles of a Data Management Plan.

While what information you capture and how you capture it is dependent upon your discipline, you should try to document:

Title.
   The name of the dataset, and its associated research project.

Creator.
   The names and addresses of the organization or people who created the data.

Identifier.
   The number used to identify the data. This could be an internal accession number or something more formal.

Dates.
   Important dates associated with the data. This can include the project start date and end date, modification dates, and the time period covered by the data.

Subject keywords.
   The keywords or phrases describing your data.

Funders.
   Organizations or agencies who funded the research.

Rights.
   Any known intellectual property rights held for the data.

Language.
   Languages of the content of the resource, when applicable.

Location.
   When data has a physical location, record information about its spatial coverage in a way consistent with your discipline.

Methodology.
   How this data was generated. Include information on equipment or software used, experimental protocol, or other information you might include in a lab notebook.
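
One way to keep this documentation with the data is a small machine-readable file stored in the same folder. The sketch below assumes a JSON file and field names derived from the list above; it does not follow any particular metadata standard, the helper names are hypothetical, and the example values are placeholders. Language and location are treated as optional because they apply only to some data.

```python
import json

# Field names are assumptions based on the list above, not a formal standard.
REQUIRED_FIELDS = [
    "title", "creator", "identifier", "dates",
    "subject_keywords", "funders", "rights", "methodology",
]
OPTIONAL_FIELDS = ["language", "location"]  # record these when applicable


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def write_record(record, path):
    """Save the record as JSON, refusing records with missing required fields."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing documentation fields: " + ", ".join(missing))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, ensure_ascii=False)


# Placeholder values only; a real record would describe your own data.
example = {
    "title": "lifeStories_subjectInterviews_201410",
    "creator": "Example Lab (placeholder)",
}
print(missing_fields(example))
```

Running a check like `missing_fields` when the data is created makes gaps visible early, while the context is still fresh.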

Sharing Your Data

Sharing your data is of great benefit to you and your community!

Try to look at the data repository you've decided to use to share your data well before you begin collecting data. This way, you'll be able to create a dataset that is easily uploadable, and already contains all the information you'll need.

Also be sure to define all the terms and abbreviations you use in your data set. This will make your data easier for others to understand, and will remove any ambiguity around your terms.

In general, the following guidelines apply to sharing your data in a repository, and in making it archivable for long-term preservation:

Use a file format that is:

  • Nonproprietary
  • An open, documented standard
  • Uncompressed
  • Unencrypted
  • Commonly used by your research community

Preferred file formats include:

  • Image: JPG, JPEG 2000, PNG, TIFF
  • Text: plain text (TXT), XML, PDF/A
  • Audio: AIFF, WAVE
  • Containers: TAR, GZIP, ZIP
  • Video: MPEG-4 (.mp4)
  • Spreadsheets: CSV (comma separated values)

See the National Archives' page on Format Guidance for more information, as well as more guidance on other formats.

Storage & Back-up

You'll need to do a little homework around data storage and backup before your project gets started! The following guidelines provide some best practices.

After you set up your backup system, try accessing your data from time to time to make sure there are no errors or malfunctions in your system.

Back up your data:

  • Have three copies (e.g., original internal; local external; remote external)
  • Copies should be geographically distributed, ideally in areas with enough distance that there is only a remote possibility that they will be impacted by the same disaster
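
Checking your copies from time to time can be partly automated. The sketch below is one possible approach, assuming each backup copy is reachable as an ordinary folder: it compares SHA-256 checksums of every file in the original against the copy. The function names are hypothetical, and a remote or tape copy may need a different check.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(original_dir, copy_dir):
    """List files that are missing from, or differ in, a backup copy."""
    original_dir, copy_dir = Path(original_dir), Path(copy_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        target = copy_dir / relative
        if not target.is_file():
            problems.append((relative.as_posix(), "missing"))
        elif sha256_of(source) != sha256_of(target):
            problems.append((relative.as_posix(), "differs"))
    return problems
```

An empty list from `compare_copies` means every file in the original was found in the copy with identical contents. It does not report extra files that exist only in the copy.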

Storage options:

  • Personal computer hard drive, external hard drive, university servers
  • Box @ UMass Amherst
  • Tape backup system

Avoid using:

Security:

  • Unencrypted storage is ideal so that you and others can easily read your data, but if encryption is required because of the sensitive nature of your data:
    • Keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file.
    • Don't rely on third-party encryption alone
  • Uncompressed storage is ideal, but if you must compress your data to save space, keep your first and second copies untouched, and compress only your third backup copy.

More questions?

You are welcome to contact us, or you can read more at the UK Data Archive.

But what if...??

What if you already have a large data project, or a legacy project that you'd like organized?

Get in touch with the Data Working Group, and we'll work with you on figuring out how to plan for retroactive data management. 

Policy on Data Retention

Per UMass Amherst's Policy on Data Ownership, Retention, and Access, data should be retained for at least three years after its creation. If data is created as part of a sponsored research project, then the data needs to be retained for at least three years after the final report is submitted, or the ending date of the project, whichever is later.

If anyone is using your data, it needs to be retained as long as it is in use. Data will not be discarded or destroyed when not in use.
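
The three-year rule above comes down to simple date arithmetic. The following sketch is one reading of that rule, not an official interpretation of the policy: the function names are hypothetical, and it does not cover the requirement to keep data for as long as it is in use.

```python
from datetime import date


def add_years(day, years):
    """Add whole years to a date, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def earliest_discard_date(created, final_report=None, project_end=None, years=3):
    """Return the date on which the minimum retention period ends.

    For sponsored projects, pass the final report date and the project end date;
    the later of the two starts the clock. Otherwise the creation date is used.
    """
    milestones = [d for d in (final_report, project_end) if d is not None]
    start = max(milestones) if milestones else created
    return add_years(start, years)


# Data created on 1 October 2014 with no sponsor: prints 2017-10-01
print(earliest_discard_date(date(2014, 10, 1)))
```

Treat the result as a lower bound only, and confirm it against the policy text and any sponsor requirements.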


© 2016 University of Massachusetts Amherst.