Using ChatGPT in the U.S. History survey

Like many of my colleagues, I responded to the rise of ChatGPT with panic. But my blood pressure recovered after reading a few articles (especially this one, but I’ll also point out this excellent later piece). I began to wonder if there was an alternative to my peers’ abstinence-only approach. The following is my experiment with having students use ChatGPT in a U.S. history course.

For the past few years, my survey course has asked 300 students to write essays on some of the most important questions in early American historiography. For example: How radical was the American Revolution? What is the most important factor that shaped the lives of enslaved people in colonial America? Why did the Spanish, Dutch, French, and English treat Native Americans differently? Did the Consumer Revolution do more to promote unity or division?

This summer, I used ChatGPT to generate a dozen essays using my typical questions. I then graded the results. The essays mustered numerous, usually correct, but often vague pieces of evidence. The prose was almost always well-organized and largely free of the errors endemic to undergraduate writing. But the essays almost always lacked a strong, explicit argument, instead opting for mushy bothsidesism. When it came time to award a letter grade based on my usual standards, the ChatGPT essays ranged from a D to a high B. The majority earned a C. Which, as the adage goes, gets degrees… And for me, this was a problem.

But what to do? The siren song of scantrons beckoned, but I’ve long resisted that temptation. Writing helps strengthen your thinking, organize your ideas, enrich your understanding, and open career opportunities. (My understanding of the value of writing in education comes from my undergraduate courses as an education major, where I was influenced by the rather old but, I think, still relevant literature on “writing to learn.” See this, this, this, and this.) In my experience, the best way to grow as a writer is to practice. So, no. I would not abandon the essay.

Many of the most thoughtful instructors I know have joined the ranks of Luddite holdouts by switching their assessments to in-class formats (again, let me bump this piece). This was, in fact, my first solution. But I wondered if there was a way to make this a teachable moment about the content of my course, the importance of writing, and the nature of the new technology. Could we together critically interrogate ChatGPT’s abilities and limitations? I forced myself to imagine a world five or ten years from now, when ChatGPT and its successors have dramatically improved. What then? What if, as one of my colleagues says, it proves to be “a calculator for writing?”

So, I gave my students two options for their midterm exam. The first was to take an in-person essay exam. The second was to grade essays written by ChatGPT. I hoped most students would choose the second option, but I wanted to create space for students who (understandably) reported feeling uncomfortable with using this morally dubious tool or who were otherwise nervous about being part of an experiment. I also wanted a fallback in case the tool collapsed. Sure enough, only a dozen or so students opted for the in-person exam. The rest welcomed the chance to engage ChatGPT critically or simply preferred an out-of-class assessment.

At the same moment that the in-person exam began, all of the students received an email with six of the big discussion questions that framed my lectures. From those six questions, they selected the three they thought they could best answer. Then, for each one, they asked ChatGPT to “Write a 1,500-word essay that answers the question [insert question text].” They pasted the result into this worksheet, and on the next page, they evaluated the essays by answering the following six questions:

1   Did the essay answer the question with a clear argument?

Historians make arguments about the past. Did ChatGPT make a clear argument? If so, what is it? If not, what’s the closest it got to an argument that answers the question? (Also, yikes: an essay without an argument is a bad essay…) Your answer to this question should be just a few sentences.

2   What evidence did ChatGPT come up with to defend its argument?

Historians have to defend their arguments with evidence. Make a list of all of the evidence ChatGPT gave to defend its argument. This is probably the longest answer you will give; the exact length depends on what ChatGPT gave you. We will grade you here on whether you were able to identify all of the evidence given to you in this essay.

3   Which of this evidence was given to you in the lectures?

Go back through the list above and note everything that was also mentioned in our class lectures. If you want to comment on items in the list, please do (that’s a great way to show off), but it isn’t required. The point here is to evaluate your ability to recall and apply what you heard in the lectures.

4   Did ChatGPT say anything that contradicts what you have learned?

List anything in the essay that seems wrong based on what you heard in the lectures. Maybe this list will be long; maybe there will be nothing.

5   What evidence did we discuss in the lecture that ChatGPT failed to mention?

I bet there are things that we talked about in class that ChatGPT didn’t mention. Make a list of them. Again, feel free to comment if you’d like, but it’s most important that you identify everything we talked about in the lectures that’s relevant to the question and that ChatGPT left out.

6   What are the best- and worst-written sentences in the essay?

Is ChatGPT a good writer? Quote what you think are the best-written and worst-written sentences in the essay, and then, in a sentence or two for each, explain what makes ChatGPT’s sentence so good or so bad.


The students had 48 hours to complete this assignment. They were allowed, and in fact encouraged, to use their notes. My TAs and I have now nearly completed grading these exams. So far, I’m pleased with the results. As in any large class, the quality of submissions has been mixed, but many students seem to have taken the task seriously. Most did an admirable job of assessing the argument (or lack thereof) and the evidence. Too many students, however, just offered a list of the evidence provided and missing; this is my fault. I need to be more explicit in my instructions that I want their answers in paragraph form. I do want them to practice writing, after all.

In two optional debriefing sessions, several dozen students spoke supportively of the assignment. One thanked me for “the chance to try out the tool in a way that didn’t feel dirty.” Most students walked away frustrated with ChatGPT. The most common complaint was that “it won’t make a real argument,” or, in the words of one student, “ChatGPT is a coward.” Surprisingly, the students reported feeling less inclined to ever use the tool again. This was even after I told them that I felt inclined to continue experimenting with it; I recently used it to make a first draft of a conference CFP (which I had to heavily edit, partly because, like most humanities conferences, we would be unable to pay the $1,000 stipend promised by the ChatGPT draft). The students, however, remain convinced that the tool is incapable of writing the kind of essays that they think are valid. Perhaps the most interesting comment came from a student who noted that Grammarly (an AI tool designed to give feedback on how students can improve their writing) was highly critical of the essay generated by ChatGPT. On the basis of feedback from one AI writing tool, this student determined that AI text generation was simply not up to snuff.

I hope this assignment gave students a different perspective on writing that might change the way they think about producing their own essays. When I asked the students, they were unsure. But I have reason to think it might.[1] My hope is that evaluating writing will help them identify what makes a quality essay and accordingly improve the quality of their primary-source-driven final exam essay on our course’s big question “Did technology do more to expand or contract freedom in early American history?” Stay tuned for an update on that.



[1] Like most historians, I require my graduate students to read monographs and articles and then come to class prepared to discuss the work’s argument, historiographical intervention, evidence, methodology, organizational logic, and strengths/weaknesses. Years ago I created a worksheet for that purpose. I found that the structured task of identifying these elements of a book or article taught students how to read like historians (for another, likely better, way to pursue this task, see this resource from Caleb McDaniel). It also helped students become aware of the components of a good work of scholarship. At the end of the semester, I have students fill out the same worksheet for their own work and the work of their peers. I remain convinced that one of the best ways to improve as a writer is to improve as an evaluator of writing. Nothing has done more to improve my writing than my experience as an editor. (Please don’t use this shoddy blog post as evidence to evaluate this claim.)


One thought on “Using ChatGPT in the U.S. History survey”

  1. Hi Ben–
    Thanks for this. We need to think more about these tools.

    As a rhetorician, I separate writing-to-learn from learning to write. I use writing-to-learn in very low-stakes but research-oriented weekly Slack chats (involving sources ChatGPT doesn’t manage well) throughout the semester. I teach writing in classes where my instruction can begin with drafts and continue through final edits. So I have given up the essay exam altogether, and “the essay” as a genre in many classes, in favor of other types of arguments. The merits of that are debatable for some.

    ChatGPT is a tricky tool. One needs to be an expert prompter in order to get expert writing. Will we ever be in the position of training prompters? Probably hype.

    As someone interested in language, I am interested in how LLM-generated text will be identifiable and whether we will be able to determine authorship. It seems generated text and images are already mingling with our authored work. How will that change what it means to use language in any process? Matthew Kirschenbaum’s “Textpocalypse” is a bit too theoretical but an interesting read.

    As far as ChatGPT in the classroom goes, I have seen people use it successfully to demonstrate the difference between analysis and argument. However, intentional use of the tool is time-consuming. Reports back regarding the benefits of doing so, like yours, are quite helpful.
