This chapter evaluates a variety of strategies for computational text analysis, otherwise known as "text as data" (TAD). We begin by providing general guidance on constructing a corpus (a collection of documents) relevant to a research question, followed by an explanation of the steps needed to prepare raw text for a range of computational models. We then describe three TAD modeling strategies. First, we discuss dictionary methods, which leverage the frequency and tone of words to measure documents on a quantity of interest and/or classify them into a particular category. Second, we review supervised classification, in which a machine learning algorithm assigns documents to predetermined categories. Finally, we address unsupervised techniques, which seek to discover new ways of organizing texts that are theoretically useful but perhaps understudied or previously unknown.
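
As a minimal sketch of how a dictionary method can work, the following Python tokenizes each document, counts matches against a positive and a negative word list, and classifies the document by the sign of its net score. The word lists `POSITIVE` and `NEGATIVE`, the helper names, and the scoring rule (positive hits minus negative hits, divided by the number of tokens) are illustrative assumptions, not the chapter's own lexicon or procedure.

```python
import re
from collections import Counter

# Hypothetical tone dictionary: these word lists are illustrative only,
# not taken from any published lexicon or from the chapter.
POSITIVE = {"growth", "gain", "support", "success"}
NEGATIVE = {"crisis", "loss", "oppose", "failure"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_score(text: str) -> float:
    """Net tone: (positive hits - negative hits) / total tokens."""
    tokens = preprocess(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)  # missing words count as 0
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / len(tokens)


def classify(text: str) -> str:
    """Assign a document to a category from the sign of its score."""
    score = dictionary_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


docs = [
    "Strong growth and broad support signal success.",
    "The crisis deepened as losses mounted and failure loomed.",
    "The committee met on Tuesday.",
]
for d in docs:
    print(f"{dictionary_score(d):+.3f}  {classify(d)}  {d}")
```

In the second document, "losses" does not match the dictionary entry "loss", so only two negative words are counted. Exact-match dictionaries are sensitive to preprocessing choices such as stemming or lemmatization, which is one reason the text-preparation steps matter before any model is applied.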
