Inspired by the work of Laura Aull, I’ve been introducing students to corpora as a way to find information about writing in various disciplines (they can, for example, easily check whether professional writers in the sciences use first-person pronouns) and sometimes just for the kind of curious poking around that is so much fun to do in large databases. Aull found, in an early study based on data drawn from corpora, that professional writers use more “hedges” (that is, qualifiers like “most” or “some”) in their writing than do first-year students, while first-year students use more “boosters” (words like “very”). I asked students to compare their own use of hedges and boosters in a sample of their writing and then to use the Corpus of Contemporary American English to see what professional writers in other fields do. My students responded well to using the corpus and really liked analyzing their own writing for specific usages. They said doing so helped them feel they had “more control” over their writing.
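For students who would like to automate the first half of that exercise, counting hedges and boosters in their own writing, here is a minimal Python sketch. It rests on several assumptions of mine: the two word lists are short illustrative samples rather than Aull's actual categories, the function names are hypothetical, and simple word matching ignores context (it cannot tell a hedging "may" from the month of May).

```python
import re
from collections import Counter

# Illustrative word lists only (an assumption, not Aull's categories).
HEDGES = {"most", "some", "perhaps", "may", "might", "often"}
BOOSTERS = {"very", "clearly", "definitely", "always", "extremely"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal
    apostrophes and hyphens (e.g. "don't", "first-year")."""
    return re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())


def hedge_booster_rates(text):
    """Count hedges and boosters in a writing sample and report
    raw counts plus rates per 1,000 words."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    hedges = sum(counts[w] for w in HEDGES)
    boosters = sum(counts[w] for w in BOOSTERS)

    def per_thousand(n):
        # Normalizing lets students compare samples of different lengths.
        return 1000 * n / total if total else 0.0

    return {
        "words": total,
        "hedges": hedges,
        "boosters": boosters,
        "hedges_per_1000": per_thousand(hedges),
        "boosters_per_1000": per_thousand(boosters),
    }


if __name__ == "__main__":
    sample = "Most writers may revise. This is very clearly true."
    print(hedge_booster_rates(sample))
```

Students can paste in a paragraph of their own, extend the two lists with words they notice in their drafts, and then compare their rates with what they find in a corpus.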
Last week, I learned of a brand-new corpus, the Coronavirus Corpus, and took a look to see how students might use it for research. Here’s how the website describes the corpus:
The corpus (which was first released in May 2020) is currently about 279 million words in size, and it continues to grow by 3-4 million words each day. At this rate, it may be 500-600 million words in size by August 2020.
The Coronavirus Corpus allows you to see the frequency of words and phrases in 10-day increments since Jan 2020, such as social distancing, flatten the curve, WORK * home, Zoom, Wuhan, hoard*, toilet paper, curbside, pandemic, reopen, defy. . . .
The corpus also allows you to see the patterns in which a word occurs, as with stay-at-home, social, economic, or hoard*.
You can also compare between different time periods, to see how our view of things have changed over time. (And you can even compare between the 20 countries in the corpus). Interesting comparisons over time might include phrases like social * or economic * that were more common in Jan/Feb than in Apr/May, or words near BAN or OBEY that were more common in Apr-May than in Jan-Feb.
I was able to dig around in this corpus to find out how many times the word “liberate” appears and in what contexts, to see how often the word “suicide” appears and whether its use increases over time, and so on. I think your students might also enjoy working with the corpus. They could, for instance, search the database for collocates (words that appear near each other): first to see how often “Clorox” or “disinfectant” appears, and then to see how often each appears near the words “virus” or “COVID-19.” Or they might search for “masks” and then see how often that word appears near “resistance.” Using corpora in these ways can help students identify what Kenneth Burke called “terministic screens,” that is, the way words work together to form a network (or screen) that creates or reinforces certain impressions or ideas. See Burke’s discussion of terministic screens in Language as Symbolic Action; I often start my classes with a discussion of this concept!
Students taking their writing courses from home could find this new corpus—and others, like Brigham Young University’s Corpus of Global Web-Based English or the Corpus of Contemporary American English—useful for projects they may be working on, or for inspiring them to pursue other research questions. Learning to use available corpora can play a role in any rhetorical analysis they may be doing now or in the future.
If you use corpora in your teaching, I’d love to hear from you! In the meantime, I’ll go back to washing my hands. . .
Image Credit: Pixabay Image 943739 by TBIT, used under the Pixabay License