The University of Massachusetts Amherst

Managing Your Data

Take care of the products of your research -- the tips here will help keep your work available long into the future!

Plan for future use

Envisioning future use helps broaden the reach and impact of your data, but it takes planning at the beginning of your project.

This planning will help you identify potential areas of friction when trying to share your data (e.g., proprietary formats, unexpected costs related to sharing), and help you think critically about the future of your data.

In the tabs below, learn more about:

  • Broadening your audience, by determining the who, what, when, where, why, and how of sharing your data.
  • Considering future applications of your data, to help you critically assess what data is of future use. 
  • Licensing your data and research products, to ensure that others understand how you want your data used, and under what circumstances.

Share your data as broadly as possible. 

“Share your data” -- these three little words belie a host of questions and processes. When you break down your own sharing into who, what, when, where, why, and how, you can answer many of the questions your intended repository or future collaborators will ask of you.



Questions to think about: 

  • Who:
    • Who generated your data? Include graduate students, postdocs, undergraduate students, technical staff, etc.
    • Who will be able to use your data? The public, only other scholars, only certain PIs, only certain individuals with particular funding streams?
      • If you need to apply restrictions to access, explain why.
  • What:
    • What are you sharing? Describe the data you are sharing. 
    • What formats will your data take? Describe the file formats. Try to use open, non-proprietary, stable file formats.
    • What supplemental material is needed to help others understand your data?
      • Include a readme file, data dictionary, and a description of your file naming convention and file hierarchy. 
      • You may also need to include code or other supplemental processes. 
    • What restrictions do you need to consider ahead of sharing your data?
      • If you are unable to openly share your data, state why. 
      • Consider sharing de-identified data, or group-level data instead of individual- or participant-level data.
  • When:
    • When was your data generated?
      • This could be the time period in which data was collected, or the time period you used as part of your parameters. 
    • When will you make your data available to others for re-use? This is often at the time of first article publication, but may depend on your current status (e.g., writing a dissertation) or your discipline (e.g., some fields release data immediately after it is cleaned).
  • Where: 
    • Where will you share your data? A corporate repository? A discipline-specific repository? A repository at your institution? Your website? Something else?
  • Why:
    • Why was this data generated? Refer to your larger projects, grants, papers, and other research products that link to your dataset and help provide the bigger picture of your work.
  • How:
    • How was this data produced? Document information on:
      • Instruments used
      • Software used
      • Methods used - including for data capture, pre-processing, post-processing, cleaning data, or whatever is relevant in your circumstance. 
    • How is this data interpreted?
      • Include software necessary to interpret your data.
    • How will you help others understand this data?
      • Including a readme file, a data dictionary, and a description of your file naming convention and file hierarchy can help others quickly understand the data you have generated. (Not to mention, it can help you interpret your data at a future point in time!)
  • Miscellaneous:
    • Include any other concepts, restrictions, or considerations you have made ahead of sharing your data. This might include referencing other policies that impact how your data is shared, other policies that affect data management or data security, or expectations in your field. 
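
One way to get started on a data dictionary is to draft it from the data file itself. The following is a minimal Python sketch, not a prescribed method: it assumes your data is in CSV form, and the sample columns (site_id, collected_on, temperature_c, notes) and the function names are hypothetical. It lists each column with a guessed type and an example value, and leaves the description for you to fill in.

```python
import csv
import io

# Hypothetical sample data, used only to illustrate the idea.
SAMPLE = """site_id,collected_on,temperature_c,notes
A1,2021-06-01,18.5,clear
A2,2021-06-02,19,
"""


def guess_type(values):
    """Guess 'integer', 'number', or 'text' from a list of string values."""
    if not values:
        return "unknown"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text):
    """Return one draft entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        # Ignore empty cells when guessing the type and picking an example.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        entries.append({
            "column": column,
            "type": guess_type(values),
            "example": values[0] if values else "",
            "description": "TODO: describe this column and its units",
        })
    return entries


if __name__ == "__main__":
    for entry in draft_data_dictionary(SAMPLE):
        print(entry)
```

The guessed types are only a starting point. Dates, coded values, and units still need a human-written description before the dictionary is useful to others.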

Activity:

Map out the relationship of the research lifecycle to the data lifecycle. How does your data map to your research? 

Stuck? Take a peek at this diagram, or dive into Twitter to look for examples of the #datalifecycle and the #researchlifecycle.

 



Assess future uses of your data. 

How might your data be used in the future? Funders are wary of funding studies that duplicate effort, so articulating that your data is both novel and reusable will strengthen your proposal. Alternatively, noting that data exists elsewhere and that you will be building on it can demonstrate how deeply you understand the field, or how connected you are to current research.

Describing potential future uses of your work can be a useful exercise to do ahead of writing a grant proposal. You can start to see long-term and future implications of your work.

Furthermore, you can start to determine what data you must share and curate for long-term protection and access, and what data is supplemental or ephemeral. 


Questions to think about: 

  • Who might reuse the data? 
    • Is your data extremely specialized, so only a handful of individuals will be able to understand it and use it?
    • How might you improve reuse outside of your field?
  • How might the data be reused?
    • Can your data be combined with other data? 
    • How have you made your data interoperable? 
    • Are there opportunities for meta-analysis?
  • What data has long-term use and impact?
    • Determine what data could be of future use and interest. See the activity or the selected works for more detail.
    • Data underlying your publications are important to protect for the long term, especially as retractions can occur due to inability to locate source data, even for articles published 20 years ago.
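
Sharing a common identifier across files is one simple way to make data combinable. As a minimal sketch in Python, assuming two CSV files that share a hypothetical site_id column (the sample values and the function name join_on_key are illustrative only), the two tables can be joined like this:

```python
import csv
import io

# Hypothetical sample data: measurements and site descriptions
# that share the "site_id" column.
OBSERVATIONS = "site_id,temperature_c\nA1,18.5\nA2,19\nA3,17.2\n"
SITES = "site_id,region\nA1,north\nA2,south\n"


def join_on_key(left_csv, right_csv, key):
    """Inner-join two CSV texts on a shared column; return a list of dicts.

    Assumes each key value appears at most once in the right-hand table.
    """
    right_rows = {row[key]: row for row in csv.DictReader(io.StringIO(right_csv))}
    joined = []
    for row in csv.DictReader(io.StringIO(left_csv)):
        match = right_rows.get(row[key])
        if match is not None:
            # Keep only rows that have a counterpart in both tables.
            joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    for row in join_on_key(OBSERVATIONS, SITES, "site_id"):
        print(row)
```

Rows without a match in both tables are dropped here (site A3 in the sample). Whether to drop or keep unmatched rows is a choice worth documenting in your readme.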

Activity: 

Identify the future uses of your data. Use our checklist to help you think about future uses of your data. 



Determine what license best fits your needs.

Licensing data and other products of your research helps others understand what permissions you grant for redistributing and reusing your data. Licenses reduce uncertainty and ambiguity, and tell users up front whether their intended use is permitted.

Note that facts are not copyrightable, so copyright law does not protect the facts themselves.




Questions to think about: 

  • What do you want to license: code, scripts, computer programs, drawings, images, audio files, video files, spreadsheets, etc.?
  • Do you want to receive attribution for your data?
  • Do you want others to be able to reuse your data?
  • Do you want others to be able to build upon your data?
