AI-detection software isn't the solution to classroom cheating—assessment has to shift
Two years since the release of ChatGPT, teachers and institutions are still struggling with generative artificial intelligence (AI).
Some have banned AI outright. Others have embraced it, or have called for teachers to adapt their practice.
The result is a patchwork of responses, leaving many kindergarten to Grade 12 and post-secondary teachers to make decisions about AI use that may conflict with the teacher next door, institutional policies, or current research on what AI can and cannot do.
One response has been to adopt AI detection tools, which rely on algorithms to try to identify how a specific text was generated.
AI detection tools have obvious appeal. But they're an imperfect solution at best, and they do nothing to address the core validity problem of designing assessments where we can be confident in what students know and can do.
Teachers using AI detectors
One study, based on a survey of teachers in the United States, reported that 68% of teachers use AI detectors.
This practice has also found its way into some Canadian schools and post-secondary institutions.
AI detectors vary in their methods. Two common approaches are to check for qualities described as "burstiness," referring to alternating short and long sentences (the way humans tend to write), and complexity (or "perplexity"). If an assignment does not have the typical markers of human-generated text, the software may flag it as AI-generated, prompting the teacher to begin an investigation for academic misconduct.
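As an illustration of the general idea (and not any vendor's actual algorithm), here is a minimal sketch of a "burstiness"-style heuristic: it scores how much sentence lengths vary, on the theory that human prose mixes short and long sentences. Real detectors rely on far more sophisticated, model-based measures such as token-level perplexity.

```python
import statistics

def burstiness_score(text: str) -> float:
    """Naive burstiness proxy: variability of sentence lengths.

    Human prose tends to alternate short and long sentences, so a
    higher coefficient of variation suggests more "bursty" writing.
    This toy heuristic is illustrative only.
    """
    # Crude sentence split on terminal punctuation
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)

uniform = "The cat sat down. The dog sat down. The bird sat down."
varied = "Stop. The storm rolled in fast and flattened the wheat across the valley. We ran."
print(f"uniform-looking text:    {burstiness_score(uniform):.2f}")  # low score
print(f"varied, human-like text: {burstiness_score(varied):.2f}")   # higher score
```

A toy score like this makes the weakness obvious: a student who simply varies sentence length, or edits AI output lightly, shifts the number, which is partly why accuracy varies so much in the studies below.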
To its credit, AI detection software is more reliable than human detection. Repeated studies show humans, experts and novices alike, are incapable of reliably distinguishing AI-generated text from human writing.
Accuracy of detectors varies
While some detectors perform poorly or inconsistently, others seem to be more successful. However, what success rates should really signal for educators is questionable.
Turnitin boasts that its AI detector has a 99% success rate vis-à-vis its false positive rate (that is, the number of human-generated submissions the tool incorrectly flags as AI-generated). This accuracy has been challenged by a recent study that found Turnitin detected only a portion of the AI-generated text it was shown.
The same study suggested how different factors could shape accuracy results. For example, GPTZero's accuracy declined, especially if students edit the output an AI tool generates. Yet a different study of the same detector reported widely varying results (for example, ranges of between 23% and 82% accuracy or between 74% and 100% accuracy).
Considering numbers in context
The value of a percentage depends on its context. In most courses, being correct 99% of the time is exceptional. It's above the most common threshold for statistical confidence, which is often set at 95%.
But a 99% success rate would be atrocious in air travel. There, with roughly 100,000 commercial flights taking off worldwide each day, a 99% success rate would mean on the order of 1,000 failed flights daily. That level of failure would be unacceptable.
To suggest what this could look like: at an institution like mine, the University of Winnipeg, roughly 10,000 students submit multiple assignments (we could ballpark five, for argument's sake) for around five courses every year.
That would be about 250,000 assignments every year. Even a 99% success rate means roughly 2,500 failures. That's 2,500 false positives where students did not use ChatGPT or other tools, but the AI detection software flags them for possible use of AI, potentially initiating hours of investigative work for teachers and administrators alongside stress for students who may be wrongly accused.
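To make the arithmetic explicit, here is a minimal sketch; the enrollment and assignment counts are the ballpark assumptions from above, not official figures.

```python
# Back-of-the-envelope false-positive arithmetic (ballpark figures)
students = 10_000               # assumed enrollment
assignments_per_course = 5      # ballpark, per the example above
courses_per_year = 5            # ballpark, per the example above
false_positive_rate = 0.01      # the flip side of a "99% success rate"

total_assignments = students * assignments_per_course * courses_per_year
expected_false_flags = total_assignments * false_positive_rate

print(f"Assignments per year: {total_assignments:,}")        # 250,000
print(f"Expected false flags: {expected_false_flags:,.0f}")  # 2,500
```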
Time wasted investigating false positives
While AI detection software merely flags possible problems, we've already seen that humans are unreliable detectors. We cannot tell which flagged assignments are among these 2,500 false positives, meaning cheaters will still slip through the cracks while precious teacher time is wasted investigating innocent students who did nothing wrong.
This is not a new problem. Ubiquitous AI has merely shone a spotlight on a long-standing weakness in how we assess.
When students can plagiarize, hire contract cheaters, rely on ChatGPT or have their friend or sister write the paper, relying on take-home assessments written outside class time without any teacher oversight is indefensible. I cannot presume that such forms of assessment represent the student's learning, because I cannot reliably discern if the student actually wrote them.
Need to change assessment
The solution to taller cheating ladders is not taller walls. The solution is to change how we are assessing, something assessment experts have been advocating for years.
Just as we don't spend thousands of dollars on "did-their-sister-write-this" detectors, schools should not rest easy simply because AI detection companies have a product to sell. If educators want to make valid inferences about what students know and can do, assessment practices are needed that emphasize the process of learning (like drafts, works-in-progress and repeated observations of student learning).
These need to be rooted in cultures that center academic integrity as a shared responsibility of students, teachers and system leaders, not just a mantra of "don't cheat and if we catch you we will punish you."
Let's spend less on flawed detection tools and more on supporting teachers to improve assessment across the board.
Provided by The Conversation
This article is republished from The Conversation under a Creative Commons license. Read the original article.