The University of Massachusetts Amherst

Managing Your Data

Take care of the products of your research -- the tips here will help your work be available long into the future!

Use Stable File Formats

Use file formats that are open, standard, and well documented.

Stable file formats are highly unlikely to become obsolete or orphaned, or to end up dependent on abandonware: software or hardware that is no longer maintained by its creator.

The table below outlines data types and stable, preferred file format examples.

Data type        Preferred file format examples
Containers       TAR, GZIP, ZIP
Databases        XML, CSV
Geospatial       SHP, DBF, GeoTIFF, NetCDF
Moving images    MOV, MPEG, AVI, MXF
Sounds           WAVE, AIFF, MP3, MXF
Statistics       ASCII, DTA, POR, SAS, SAV
Still images     TIFF, JPEG2000, PDF, PNG, GIF, BMP
Tabular data     CSV
Text             XML, PDF/A, HTML, ASCII, UTF-8
Web archive      WARC

Using stable file formats makes your data more replicable, easier to combine with other datasets, and much more likely to remain accessible in the future. Stable file formats have a long history of access and use. Some file formats even predate personal computers: data in the form of comma-separated values was supported as early as 1972 (as “list-directed input/output”).
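
As a minimal sketch of saving tabular data in a stable format, the Python example below writes a small table to a UTF-8 encoded CSV file using only the standard library, then reads it back. The function names, file name, and sample rows are hypothetical and chosen only for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header and rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back into a list of rows (each a list of strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical sample data standing in for an export from other software.
    header = ["site", "date", "temperature_c"]
    rows = [["A1", "2020-01-15", 3.2], ["B7", "2020-01-16", 4.8]]
    out = Path(tempfile.gettempdir()) / "measurements.csv"
    write_csv(out, header, rows)
    print(read_csv(out))
```

Note that CSV stores every value as text, so numbers come back as strings when the file is read. Recording column types and units in your data documentation helps future users interpret them.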

Stable file formats are those that are: 

  • Non-proprietary. Non-proprietary file formats are usable on many different operating systems and operating system versions, and are not tied to a specific software package or manufacturer. When working with proprietary software, you may need to export your data into a stable file format.
  • Uncompressed. Lossy compression algorithms modify your data to make files smaller by discarding ‘nonessential’ information. The resulting lower-quality images or sounds could affect how your data is analyzed and the results of your work. Working with a raw format but saving and sharing only the compressed version could mean that your work is no longer reproducible.
  • Unencrypted. Encryption algorithms can change and keys can be lost, either of which can leave your data unreadable.
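
The difference between keeping and discarding information can be sketched in a few lines of Python. In the example below, a gzip round trip returns the original bytes exactly, while rounding the values (a simple stand-in for lossy compression, not a real image or audio codec) discards detail that cannot be recovered. The function names and sample readings are hypothetical.

```python
import gzip


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with gzip; the result equals the input."""
    return gzip.decompress(gzip.compress(data))


def lossy_round(values, digits=1):
    """Stand-in for lossy compression: keep only `digits` decimal places."""
    return [round(v, digits) for v in values]


if __name__ == "__main__":
    # Hypothetical measurements used only for illustration.
    readings = [3.14159, 2.71828, 1.41421]
    raw = ",".join(str(v) for v in readings).encode("utf-8")

    # Lossless: every byte survives the round trip.
    print(lossless_round_trip(raw) == raw)

    # Lossy: the discarded decimal places are gone for good.
    print(lossy_round(readings))
```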

Resources and tools

Questions To Think About

  • Are the file formats you use standard in your community?
    • Could you use a more open file format and still be in line with the files your community uses?
  • Do you need a particular piece of software to read and use the data file? 
    • If so, can you save your file in a stable file format?
    • Document the software package, version, and operating system used in your readme file or data documentation, even if you share the data in a stable file format.
  • Do you have multiple files saved as part of your dataset (e.g., GIS often requires a package of several files to make a dataset viewable)?
    • Document the structure of your data in your readme file or data documentation.
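
One possible way to start that documentation is sketched below in Python, using only the standard library. It collects the software name and version, the operating system, and a listing of the files in a data folder into plain text that could be pasted into a readme. The function names, the "ExampleGIS" software name, and the sample files are hypothetical, and the layout is just one possibility rather than a required readme format.

```python
import platform
import tempfile
from pathlib import Path


def describe_environment(software, version):
    """Return readme lines recording the software and system used."""
    return [
        f"Software: {software} {version}",
        f"Python: {platform.python_version()}",
        f"Operating system: {platform.system()} {platform.release()}",
    ]


def list_files(data_dir):
    """Return one line per file: relative path and size in bytes."""
    root = Path(data_dir)
    lines = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            lines.append(f"{rel} ({path.stat().st_size} bytes)")
    return lines


def build_readme(data_dir, software, version):
    """Combine environment details and the file listing into readme text."""
    sections = [
        "SOFTWARE",
        *describe_environment(software, version),
        "",
        "FILES",
        *list_files(data_dir),
    ]
    return "\n".join(sections) + "\n"


if __name__ == "__main__":
    # Build a throwaway folder with placeholder files for the demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("roads.shp", "roads.dbf"):
            (Path(tmp) / name).write_bytes(b"placeholder")
        print(build_readme(tmp, "ExampleGIS", "1.0"))
```

The generated text is only a starting point: a short description of what each file contains and how the files relate to one another still has to be written by hand.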

Further Reading