AI in assessment: ethics and practice
GPTZero's accuracy in detecting AI text is 54-87%, depending on the test (Source: Stanford HAI, 2023). In practice this means: if you accuse 10 students of using AI, roughly 1 to 5 of them could be wrongly accused.
This is an important number. Not because AI in assessment is bad, but because many teachers don't know it when making decisions.
AI in assessment is a tool. But like any tool, you need to know what it does and doesn't do.
What AI CAN do in assessment
1. Initial screening - saves about 50% of reading time
30 essays. AI reads through and says: "These 15 are clearly strong. These 10 are average. These 5 need attention."
You no longer read all 30 essays with the same eye. You read 5 carefully, 10 quickly, 15 routinely. That's roughly 50% time savings - without quality dropping.
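The time budget above can be sketched in a few lines. The per-tier reading times (minutes) are my assumptions, chosen to mirror the roughly 50% saving described here; they are not a standard.

```python
# Illustrative sketch: per-tier minutes are assumptions, not a standard.
MINUTES = {"needs_attention": 20, "average": 8, "routine": 3}

def reading_budget(tier_counts, flat_minutes=15):
    """Compare triaged reading time against a flat per-essay time budget."""
    triaged = sum(MINUTES[tier] * n for tier, n in tier_counts.items())
    flat = flat_minutes * sum(tier_counts.values())
    return triaged, flat

# 30 essays sorted as in the example: 5 need attention, 10 average, 15 strong
triaged, flat = reading_budget({"needs_attention": 5, "average": 10, "routine": 15})
print(f"Triaged: {triaged} min vs flat: {flat} min ({1 - triaged / flat:.0%} saved)")
```

Note that the AI only orders the pile; every essay still gets read by you.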
2. Grammar and style - speed without errors
AI detects:
- Typos (near-perfect on common errors)
- Repetitions ("this is this, which is this")
- Confusing sentences (long run-ons)
AI DOESN'T detect:
- Whether the argument is logical
- Whether the examples are relevant
- Whether the text is original thinking
3. Feedback drafts - a foundation, not the end
Prompt: "Read this essay. Give 3 points: 1) what's good, 2) what to improve, 3) a question to prompt further thinking."
AI gives draft. You review, adjust, add context.
| AI without adjustment | AI + your adjustment |
|---|---|
| "Structure is clear" | "Structure is clear - especially good use of concrete example in paragraph 2" |
| "Add sources" | "Add sources - try finding one study that supports your claim" |
| "What's the main argument?" | "What's the main argument? I see 2 different ones - which are you actually claiming?" |
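To keep the three-point structure consistent across a whole class, the prompt above can be wrapped in a small template. This is a hypothetical helper (the function name and template constant are mine); the wording simply restates the prompt from the text.

```python
# Hypothetical helper: wraps the three-point feedback prompt from the text.
FEEDBACK_PROMPT = (
    "Read this essay. Give 3 points: "
    "1) what's good, 2) what to improve, "
    "3) a question to prompt further thinking.\n\n"
    "Essay:\n{essay}"
)

def build_feedback_prompt(essay_text: str) -> str:
    """Fill the template; the result goes to whatever AI tool you use."""
    return FEEDBACK_PROMPT.format(essay=essay_text.strip())

prompt = build_feedback_prompt("Climate policy essay text goes here...")
```

Whatever the tool returns is still only a draft: you review, adjust, and add context, as in the table above.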
What AI CAN'T do in assessment
1. See context
AI doesn't know:
- That this student made tremendous progress
- That last week was hard for the student
- That this is written in a second language
- That the student took a risk trying a new style
Context is your job.
2. Measure effort
AI measures the result, not the process. But sometimes a 6/10 piece of work is significantly more valuable than a 9/10 - because someone tried something new.
3. Reliably detect AI text
GPTZero, Turnitin AI, Originality.ai - they all make mistakes. The Stanford HAI study showed:
- False positives (accusing the innocent): 5-15%
- False negatives (missing real AI text): 20-40%
This means: use detection as a signal, not as evidence.
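A quick calculation shows why. The error rates come from the Stanford HAI figures above; the 20% base rate of actual AI use in a class of 100 is an assumption for the example.

```python
# Illustrative only: error rates from the Stanford HAI figures cited above;
# the 20% base rate of actual AI use is an assumption for this example.
def flagged_breakdown(n_students, base_rate, false_pos_rate, false_neg_rate):
    """Return (true positives, false positives) among detector flags."""
    users = n_students * base_rate
    non_users = n_students - users
    true_pos = users * (1 - false_neg_rate)   # real AI text, correctly flagged
    false_pos = non_users * false_pos_rate    # honest work, wrongly flagged
    return true_pos, false_pos

tp, fp = flagged_breakdown(100, 0.20, 0.10, 0.30)
innocent_share = fp / (tp + fp)
print(f"Flagged: {tp + fp:.0f}, of whom {fp:.0f} are innocent ({innocent_share:.0%})")
```

With these numbers, 22 students get flagged and 8 of them (over a third) did nothing wrong. That is why a flag can open a conversation, but never close one.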
Ethical principles
1. Transparency
If you use AI for drafting feedback - say so.
"I used AI for initial feedback draft. Then I read through and adjusted."
This isn't weakness. It's honest.
2. Final decision is human
AI gives input. You decide. If a student challenges the grade, you must be able to explain why you decided that way.
"AI said so" is not an explanation.
3. Data protection
Before pasting a student essay into ChatGPT, ask:
- Do you have permission?
- Where does the data go?
- Is it used for training?
GDPR applies in Estonian schools. Student text is personal data. Data security in AI projects covers this in detail.
Practical decision guide
| Use | Allowed? | Notes |
|---|---|---|
| Grammar check | Yes | Grammarly, LanguageTool are OK |
| Feedback draft | Yes | But review and adjust |
| Initial sorting | Yes | But don't trust blindly |
| AI text detection | Carefully | Not as evidence, only as a signal |
| Automatic grade | No | Grade is human decision |
Practical example
25 essays. Teacher uses AI:
Without AI:
- 15 min per essay = 6.25 hours
- Feedback: 2-3 sentences (tired)

With AI:
- 6 min per essay = 2.5 hours
- Feedback: 3-5 sentences (fresh)
Result: 3+ hours saved AND better feedback.
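As a sanity check, the arithmetic above works out. The per-essay minutes are taken straight from the example:

```python
essays = 25

# Minutes per essay from the example above, converted to hours
without_ai = essays * 15 / 60
with_ai = essays * 6 / 60
saved = without_ai - with_ai
print(f"Without AI: {without_ai} h, with AI: {with_ai} h, saved: {saved} h")
```

That is 3.75 hours back per batch of essays.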
Summary
AI in assessment is a tool - not a judge.
- Use for screening and feedback drafts
- Don't use for automatic grading
- Don't trust AI text detection blindly (54-87% accuracy!)
- Be transparent - students must know
- Final decision is always yours
Bad use causes harm. Good use saves hours and improves feedback quality.