The University of Massachusetts Amherst

Data Mining and Text Mining at UMass

Guide to data mining and text mining resources

What are data and text mining?

Data mining and, more specifically, text mining are research techniques that use computational analysis to uncover patterns in large data sets. "Text mining" is data mining applied to large text-based data sets.

These techniques are useful in numerous scholarly fields, from the humanities (where they are sometimes viewed as among the tools of "digital humanities" scholars) to the sciences, where useful data can be "mined" from large non-text data sets and from text databases of the published literature.
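As a rough illustration of what "uncovering patterns" in text can mean, the sketch below counts the most frequent terms across a small collection of documents. It is a minimal example only: the three-sentence corpus, the stopword list, and the function names `tokenize` and `term_frequencies` are all hypothetical, and a real project would work with thousands of documents and more careful text processing.

```python
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a large text collection.
documents = [
    "Whales are mammals. Whales live in the sea.",
    "The sea is wide and the sea is deep.",
    "Mammals breathe air.",
]

# Assumed, deliberately tiny stopword list (common words to ignore).
STOPWORDS = {"the", "is", "and", "in", "are", "a", "of"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each non-stopword term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


# Show the three most frequent terms in the corpus.
print(term_frequencies(documents).most_common(3))
# prints [('sea', 3), ('whales', 2), ('mammals', 2)]
```

Term counting is only the simplest kind of pattern. The same loop-over-documents structure underlies more elaborate analyses, such as tracking how a term's frequency changes over time.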

UMass Resources

The UMass Libraries are developing resources to help faculty and students engage in text and data mining. In addition to the resources listed below, which affirmatively permit data mining, the Libraries can also negotiate assistance for individual projects. Please contact us if you have suggestions for additional resources, or have research projects with which we can help you.

  • HathiTrust - UMass Amherst is a partner in HathiTrust, a research university collaboration to archive and share digitized collections.  (See press release, Dec. 6, 2013.) HathiTrust makes available multiple collections of works for research purposes, including the public domain works digitized by Google for its Google Books project.  The University Libraries have signed an agreement with Google to permit research in the Google-digitized corpus, as well as in other HathiTrust collections.  See for more information about the process of establishing research access.  Contact your reference librarian to initiate a project. 

  • Elsevier - UMass Amherst subscribes to numerous Elsevier resources, and Elsevier permits text mining of subscribed content on ScienceDirect for noncommercial purposes, via the ScienceDirect APIs.  See for more information about Elsevier's ScienceDirect resources.  Contact your reference librarian for assistance.