quick to stress, these tools do not decide that a student has cheated; they simply identify patterns that, the vendors argue, are statistically correlated with cheating, leaving schools to investigate and decide whether an event the software flags is indeed cheating.
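To make concrete the kind of pattern-matching these vendors describe, consider a deliberately simplified sketch in Python. The signal names and thresholds below are hypothetical illustrations, not drawn from any actual product:

```python
# A toy illustration of threshold-based review flagging. All signals
# and cutoffs are hypothetical; real products are far more elaborate.
from dataclasses import dataclass

@dataclass
class ExamSession:
    seconds_gaze_off_screen: float  # total time gaze left the screen
    face_lost_events: int           # frames where face detection failed
    other_voices_detected: bool     # audio picked up a second voice

def review_flags(session: ExamSession) -> list[str]:
    """Return the reasons a session would be queued for human review."""
    flags = []
    if session.seconds_gaze_off_screen > 30:
        flags.append("prolonged gaze away from screen")
    if session.face_lost_events > 5:
        flags.append("face repeatedly not detected")
    if session.other_voices_detected:
        flags.append("additional voice in the room")
    return flags  # the school, not the software, decides what these mean
```

Even in this toy version, the cutoffs encode assumptions about what "normal" test-taking looks like, and the problems described below follow directly from those assumptions.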
These systems are problematic for many reasons, as popular
press accounts and social media complaints document. Students,
especially those in shared living situations, often cannot create the
kind of silent and visually static environments that such proctoring
systems expect; family members and roommates may enter the
exam scene for reasons unrelated to cheating. Students who habitually look up or away while thinking may be flagged as potential cheaters more often than students who train themselves to stare at the camera.
stare at the camera. Because proctoring systems often do not allow
the use of virtual backgrounds, students are forced to reveal their
home environments to the camera and, potentially, to a professor investigating a possible instance of cheating. Though universities
and companies stress that such data is anonymized and only used in
the aggregate to improve an algorithm’s accuracy, students using
these systems are effectively forced to submit their keystroke,
mouse, audio, and video data to machine learning datasets. There is
usually no way to opt out of remote proctoring and still take a test.
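The question of differential treatment that the next paragraphs take up is, at its most mechanical level, a comparison of error rates across groups of students. As a rough illustration only, using hypothetical counts rather than our audit's actual data, such a comparison might look like this:

```python
# A minimal sketch of the arithmetic behind a differential-error-rate
# claim: compare flag (or detection-failure) rates between two groups
# with a two-sided two-proportion z-test. Counts are hypothetical.
from math import sqrt
from statistics import NormalDist

def two_proportion_test(flagged_a: int, total_a: int,
                        flagged_b: int, total_b: int):
    p_a, p_b = flagged_a / total_a, flagged_b / total_b
    pooled = (flagged_a + flagged_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Hypothetical: 90 of 600 sessions flagged in one group versus
# 45 of 600 in another.
rate_a, rate_b, z, p = two_proportion_test(90, 600, 45, 600)
print(f"{rate_a:.1%} vs {rate_b:.1%}, z = {z:.2f}, p = {p:.4f}")
```

A statistically significant gap of this kind is what it means, concretely, for a system to treat one group of students differently from another.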
The question that our task force was asked to consider was
whether the facial detection system that our university’s remote
proctoring system used to track students’ head movements and eye
gazes systematically treated students of color differently from other
students. Though it was not the software we used, ExamSoft was publicly criticized for telling students of color to take extra steps to make sure they were properly illuminated: the company told them to front-light themselves and to hold their heads especially still so that their exams would not be flagged for review. We
knew that many remote proctoring systems used similar facial
recognition systems (competitors often use the same off-the-shelf
datasets, computational models, and pattern-matching algorithms)
and, indeed, we confirmed that our vendor's remote proctoring system had a higher error rate for dark-skinned students than for light-skinned students. The vendor similarly suggested that students of color front-light themselves and take special care to minimize head movements. Our algorithm was systematically treating our students