What If Robots Start Taking Your Exams?

Stephen L. Carter / Bloomberg

What if students could sleep in on the day of the final examination and let a computer program take the test for them? Let’s imagine a hypothetical adaptive software program (call it NeverTest) capable of taking examinations and performing at a level consistent with the student’s ability.

Were such a system implemented, teachers could grade the tests without making nervous students sit through them.

Yes, it’s a thought experiment — but bear with me. The student would work with NeverTest on assignments during the term, and when it came time to measure performance, the program, not the student, would take the examination, making the mistakes its algorithm predicted the student would make.

All of the accuracy without any of the anxiety.

The notion is hardly outrageous. Increasing evidence suggests that the use of learning analytics during the term can accurately predict final grades.

Surely it’s a small step to have the software take the exam and make the predicted mistakes. All that’s needed is a large enough data set; the more assignments, the more accurate the predictions.

Does the proposal make you uneasy? Two possible reasons come to mind: First, the students might cheat on the assignments the algorithm uses to predict exam errors; and, second, the software might err in its predictions.

Let’s take these concerns in order.

Sadly, college students do cheat. A lot. What has come to be called “contract cheating,” in which a third party completes the student’s work, is on the upswing around the world.

In some studies, more than 15 per cent of students admit to having cheated at least once. A non-trivial number of cheaters probably lie on surveys about cheating, so the proportion is likely higher still.

Half the enrolment

Elite universities are as vulnerable as anywhere else. In 2015, the provost of Stanford issued an open letter about “an unusually high number of troubling allegations of academic dishonesty,” including one “that may involve as many as 20 per cent of the students in one large introductory course.”

In 2012, Harvard investigated charges that some 125 students in a single course, half the enrolment, had colluded on assignments.

One might therefore imagine that in a course using my hypothetical NeverTest software, a large number of students would break the rules.

If the point of the adaptive software is to evaluate student strengths and weaknesses throughout the term in preparation for the final examination, a student could simply hire someone smarter or harder-working or less anxious, then let NeverTest evaluate her strengths and weaknesses instead.

But this concern matters only if net cheating would increase under NeverTest. I’d suggest, to the contrary, that net cheating might actually fall. It’s one thing to pay a substitute to sit for a final examination, a single concrete event; it’s something else to pay a substitute to complete all the other assignments during the term.

Besides, there are plenty of ways to ensure that the right person is sitting in front of the computer, including some that students experience as intrusive. That experience could be avoided by making NeverTest optional.

Only those who preferred not to sit for the final examination would use the adaptive software during the term. But even if the early users were mainly students with unusual levels of examination anxiety, it’s easy to imagine that the software might swiftly come to be the default.

The larger concern, surely, is that the NeverTest algorithm could be wrong. It might predict inaccurately the errors the students would have made had they sat for the final examination themselves.

Every teacher has known students who struggle throughout the term only to blossom unexpectedly at the end. NeverTest wouldn’t capture the result of the hard work and determination that carry these students successfully through their difficulties.

Preference for rewarding

But I wonder whether the harm suffered by the student who outperforms on final exams might be balanced by the harm avoided by the student who underperforms. A preference for rewarding the student who peaks late over the student who peaks early might represent nothing more than status quo bias.

In any case, the incidence of both errors might be reduced by helping my hypothetical software make better evaluations of student ability. In computer science courses, performance on early assignments turns out to be a significant predictor of the final grade. It’s easy to imagine this result replicated in other STEM courses, and perhaps in economics or foreign languages, all fields where constant homework yields constant feedback.

All of this, as I said, is simply a thought experiment. But I wouldn’t be surprised if it’s also the future. So, to my friends in Silicon Valley: I’ll be sitting by the phone.

(1) And, yes, as the academic arms race continues, it’s easy to find advice online about how to get around the anti-cheating software. (I do not mean to suggest that any of the advice will actually work.)

(2) We know that student internet use during class has a negative effect on examination performance.

(3) For a contra result, see here.

(4) The evidence is unclear on whether frequency of tests improves final examination performance, but here we are trying only to improve the ability to predict performance.