Each data set in each discipline is different and unique -- but that doesn't mean you can't manage your data!
Essentially, data management is about creating a system that works for you and your lab and that is in line with the dominant practices in your discipline. Data management tends to be established around four main pillars:
One tip is to think about the end product of your research proposal, and develop your data management plan based on that forecasted product. It doesn't have to be exactly right -- a guess and a plan are much better than nothing at all!
You should have a top-level directory or folder, which should include:
Within your top-level folder, follow a clear and documented naming convention. You might need to divide your files into sub-directories, or you could have different versions of your dataset, or data related to each person in the group.
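As an illustration of a documented naming convention, here is a minimal Python sketch that builds file names from fixed, ordered parts. The pattern itself (date, project, description, version) and the function name `build_filename` are assumptions made for this example, not a convention prescribed by this guide; adapt the parts to whatever your group agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Assemble a file name as YYYYMMDD_project_description_vNN.ext.

    The pattern is a hypothetical convention chosen for illustration.
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{created:%Y%m%d}_{slug(project)}_{slug(description)}"
            f"_v{version:02d}.{extension.lstrip('.')}")


example = build_filename("Soil Survey", "plot A raw counts",
                         date(2024, 3, 7), 2, "csv")
print(example)  # 20240307_soil-survey_plot-a-raw-counts_v02.csv
```

Generating names from one function, rather than typing them by hand, keeps every file in the folder consistent and makes the convention easy to write down for new group members.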
Some naming convention best practices include:
Documenting your data is the practice of contextualizing your data, ideally at the time of creation. In libraries or in repositories, you might encounter the term metadata -- this is essentially the same concept, with some fixed rules and fields applied. Some disciplines have recommended metadata standards -- you should follow those as closely as possible, where available.
Data documentation helps other users (human and computer!) understand your data. By describing how your data was created, its context, and its structure (including your filing and naming system), and by defining terms, abbreviations, acronyms, and more, you'll help make sure your data is accessible and reusable -- two main principles of a Data Management Plan.
What information you capture, and how you capture it, depends on your discipline, but you should try to document:
Sharing your data is of great benefit to you and your community!
Try to look at the data repository you've decided to use to share your data well before you begin collecting data. This way, you'll be able to create a dataset that is easily uploadable, and already contains all the information you'll need.
Also be sure to define all the terms and abbreviations you use in your data set. This will make your data easier for others to understand, and will remove any ambiguity around your terms.
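One way to record those definitions is a small data dictionary stored next to the data. The sketch below writes one as CSV and reports any column that has no definition yet. The column names, the four dictionary fields, and the function names are all hypothetical; check whether your repository or discipline expects a particular metadata standard instead.

```python
import csv
import io

# Hypothetical entries; replace them with the terms used in your own data set.
DATA_DICTIONARY = [
    {"name": "site_id", "definition": "Identifier of the sampling site",
     "units": "", "allowed_values": "text"},
    {"name": "temp_c", "definition": "Water temperature at collection",
     "units": "degrees Celsius", "allowed_values": "-5 to 40"},
]

FIELDNAMES = ["name", "definition", "units", "allowed_values"]


def write_data_dictionary(entries, handle):
    """Write the dictionary entries to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)


def undefined_columns(data_columns, entries):
    """Return the data columns that have no entry in the dictionary."""
    defined = {entry["name"] for entry in entries}
    return [column for column in data_columns if column not in defined]


buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
print(undefined_columns(["site_id", "temp_c", "ph"], DATA_DICTIONARY))  # ['ph']
```

In practice you would pass a file opened with `open("data_dictionary.csv", "w", newline="")` instead of the in-memory buffer used here.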
In general, the following guidelines apply to sharing your data in a repository, and to making it archivable for long-term preservation:
Use a file format that is:
Preferred file formats include:
See the National Archives' page on Format Guidance for more information, including guidance on other formats.
You'll need to do a little homework around data storage and backup before your project gets started! The following guidelines provide some best practices.
After you set up your backup system, try accessing your data from time to time to make sure there are no errors or malfunctions in your system.
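A periodic check like this can be partly automated by comparing checksums of the working files against their backup copies. The following is a minimal sketch using Python's standard library; the function names are illustrative, and it assumes the backup mirrors the folder layout of the original directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(original_dir: Path, backup_dir: Path) -> list[str]:
    """List files that are missing from the backup or differ from the original.

    Assumes the backup uses the same relative paths as the original folder.
    """
    problems = []
    for original in sorted(p for p in original_dir.rglob("*") if p.is_file()):
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling `find_backup_problems(Path("project_data"), Path("backup/project_data"))` (both paths are placeholders) returns an empty list when every file has an identical copy. A matching checksum only shows that the copy is readable and identical to the original today, so it complements rather than replaces opening the files from time to time.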
Back up your data:
- CDs or DVDs -- life expectancy is 2 to 5 years
- Third-party cloud storage -- cloud storage has security issues and legal risks, which are most relevant for confidential information
What if you already have a large data project, or a legacy project that you'd like organized?
Get in touch with the Data Working Group, and we'll work with you to plan for retroactive data management.
Per UMass Amherst's Policy on Data Ownership, Retention, and Access, data should be retained for at least three years after its creation. If data is created as part of a sponsored research project, then the data needs to be retained for at least three years after the final report is submitted, or the ending date of the project, whichever is later.
If anyone is using your data, it needs to be retained as long as it is in use. Data will not be discarded or destroyed when not in use.
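The date arithmetic behind the three-year rule can be sketched as follows. This is only an illustration of the rule as summarized above, not an interpretation of the policy itself: the function names are hypothetical, and the sketch does not model the requirement to keep data for as long as it is in use.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 3  # minimum retention period stated in the text above


def add_years(day: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def earliest_disposal_date(created: date,
                           final_report: Optional[date] = None,
                           project_end: Optional[date] = None) -> date:
    """Return the earliest date on which the minimum retention period ends.

    Without sponsored-project dates, the period starts at creation. Otherwise
    it starts at the later of the final report and the project end date.
    """
    sponsored_dates = [d for d in (final_report, project_end) if d is not None]
    start = max(sponsored_dates) if sponsored_dates else created
    return add_years(start, RETENTION_YEARS)


print(earliest_disposal_date(date(2020, 5, 1)))  # 2023-05-01
print(earliest_disposal_date(date(2020, 5, 1),
                             final_report=date(2022, 9, 30),
                             project_end=date(2022, 6, 30)))  # 2025-09-30
```

Treat the result as a lower bound: data that someone is still using should be kept beyond this date.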