This chapter evaluates a variety of strategies for computational text analysis, otherwise known as "text as data" (TAD). We begin by providing general guidance on constructing a corpus (a collection of documents) relevant to a research question, followed by an explanation of the steps needed to prepare raw text for computational models. We then describe three TAD modeling strategies. First, we discuss dictionary methods, which use the frequency and tone of words to measure documents on a quantity of interest or to classify them into a particular category. Second, we review supervised classification, where a machine learning algorithm assigns documents to predetermined categories. Finally, we address unsupervised techniques, which seek to discover new ways of organizing texts that are theoretically useful but perhaps understudied or previously unknown.
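To make the first of these strategies concrete, the following is a minimal sketch of a dictionary method, not the chapter's own implementation. The word lists `POSITIVE` and `NEGATIVE`, the function names, and the scoring rule (positive minus negative matches, divided by document length) are all illustrative assumptions; applied work would typically rely on a validated lexicon suited to the research question.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries, chosen only for illustration.
POSITIVE = {"good", "great", "support", "benefit"}
NEGATIVE = {"bad", "crisis", "oppose", "harm"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tone_score(text):
    """Return (positive - negative matches) / total tokens; 0.0 if empty."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return (pos - neg) / total


def classify(text, threshold=0.0):
    """Map the tone score to a category label (assumed cutoff rule)."""
    score = tone_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for doc in ["Good policy, great benefit", "A bad crisis", ""]:
        print(repr(doc), tone_score(doc), classify(doc))
```

The same function serves both uses named above: `tone_score` measures a document on a continuous quantity, while `classify` turns that measurement into a category.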