Managing Your Data
Use Stable File Formats
Use file formats that are open, standard, and well documented.
Stable file formats are unlikely to become obsolete or orphaned, or to depend on abandonware (software or hardware that is no longer maintained by its creator).
The table below outlines data types and stable, preferred file format examples.
Data type | Preferred file format examples |
---|---|
Containers | TAR, GZIP, ZIP |
Databases | XML, CSV |
Geospatial | SHP, DBF, GeoTIFF, NetCDF |
Moving images | MOV, MPEG, AVI, MXF |
Sounds | WAVE, AIFF, MP3, MXF |
Statistics | ASCII, DTA, POR, SAS, SAV |
Still images | TIFF, JPEG2000, PDF, PNG, GIF, BMP |
Tabular data | CSV |
Text | XML, PDF/A, HTML, ASCII, UTF-8 |
Web archive | WARC |
Using stable file formats makes your data more replicable, easier to combine with other datasets, and much more likely to remain accessible in the future. Stable file formats have a long history of access and use. Some even predate personal computers: data in the form of comma-separated values was supported as early as 1972 (as “list-directed input/output”).
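As a small illustration of why plain-text formats age well, the sketch below writes a few records to CSV and reads them back using only Python's standard library. The file name `measurements.csv`, the column names, and the helper functions `write_csv` and `read_csv` are hypothetical, chosen only for this example.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical example records; the column names are illustrative only.
rows = [
    {"site": "A", "date": "2024-05-01", "temperature_c": "12.5"},
    {"site": "B", "date": "2024-05-01", "temperature_c": "14.0"},
]


def write_csv(path, records):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def read_csv(path):
    """Read a UTF-8 CSV file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Write to a temporary folder so the example does not touch real data.
out_path = Path(tempfile.mkdtemp()) / "measurements.csv"
write_csv(out_path, rows)
round_trip = read_csv(out_path)

# The records read back match the records written.
print(round_trip == rows)
```

Because the file is plain text with a header row, it can also be opened in a text editor or spreadsheet program, with no dependence on the script that produced it.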
Stable file formats are those that are:
- Non-proprietary. Non-proprietary file formats are usable across many different operating systems and operating system versions, and are not tied to a specific software product or manufacturer. When working with proprietary software, you may need to export your data into a stable file format.
- Uncompressed. Lossy compression algorithms make files smaller by discarding ‘nonessential’ information. The resulting lower-quality images or sounds could affect how your data is analyzed and the results of your work. If you work with a raw format but save and share only the compressed version, your work may no longer be reproducible.
- Unencrypted. Encryption algorithms can change and keys can be lost, rendering your data unreadable.
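The compression concern applies to lossy methods, which discard information, rather than lossless ones, which only change how the bytes are stored. The sketch below contrasts the two under simplifying assumptions: `gzip` stands in for lossless compression, and rounding numbers to two decimal places is a simulated stand-in for lossy compression. It is not how image or audio codecs actually work, and the sample values are made up.

```python
import gzip

# Hypothetical measurements, stored as text the way a CSV row might be.
original = "0.123456789,0.987654321,0.555555555\n".encode("utf-8")

# Lossless: gzip changes how the data is stored, not its content.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)
lossless_ok = restored == original

# Lossy (simulated): rounding discards digits that cannot be recovered.
values = [float(v) for v in original.decode("utf-8").strip().split(",")]
rounded = [round(v, 2) for v in values]
lossy_ok = rounded == values

print(lossless_ok, lossy_ok)  # prints: True False
```

If only the rounded values were saved and shared, an analysis that depended on the discarded digits could not be reproduced from them.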
Resources and tools
- FileInfo.com: Browse file types and see which programs open each file extension.
- PRONOM: Technical registry from the UK National Archives. Search file format types, risks for data types, and migration pathways for some file types.
Questions To Think About
- Are the file formats you use standard in your community?
- Could you use a more open file format and still be in line with the files your community uses?
- Do you need a particular piece of software to read and use the data file?
  - If so, can you save your file in a stable file format?
  - Document the software package, version, and operating system used in your readme file or data documentation, even if you share the data in a stable file format.
- Do you have multiple files saved as part of your data file? (For example, GIS often requires a package of several files to make a dataset viewable.)
  - Document the structure of your data in your readme file or data documentation.
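One way to act on the documentation points above is to generate part of the readme automatically. The following is a minimal sketch, not a prescribed workflow: it records the operating system and Python version (replace these with whatever analysis software you actually used) and lists every file in a dataset folder. The function names, the section headings, and the `parcels.*` file names are all hypothetical.

```python
import platform
import tempfile
from pathlib import Path


def describe_environment():
    """Return the operating system and Python version as readme lines."""
    return [
        f"Operating system: {platform.system()} {platform.release()}",
        f"Python version: {platform.python_version()}",
    ]


def describe_files(data_dir):
    """List every file under data_dir with its relative path and size."""
    data_dir = Path(data_dir)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(data_dir).as_posix()
        lines.append(f"- {rel} ({path.stat().st_size} bytes)")
    return lines


def build_readme(data_dir):
    """Combine the environment and file listing into plain readme text."""
    sections = [
        "Software environment",
        *describe_environment(),
        "",
        "Files in this dataset",
        *describe_files(data_dir),
    ]
    return "\n".join(sections) + "\n"


# Demo with empty placeholder files that mimic a multi-file GIS dataset.
demo_dir = Path(tempfile.mkdtemp())
for name in ["parcels.shp", "parcels.dbf", "parcels.shx"]:
    (demo_dir / name).write_bytes(b"")

readme_text = build_readme(demo_dir)
print(readme_text)
```

The output is plain text, so it can be pasted into a readme file and then extended by hand with a description of what each file contains.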
Further Reading
- Library of Congress (2019). Sustainability of digital formats: Planning for Library of Congress collections. Retrieved 2019-10-17 from https://www.loc.gov/preservation/digital/formats/
- FileInfo.com (2011). What is the difference between binary and text files? Retrieved 2019-10-18 from https://fileinfo.com/help/binary_vs_text_files
- Last Updated: Oct 21, 2024 1:56 PM
- URL: https://guides.library.umass.edu/data