AI in assessment: ethics and practice
GPTZero's accuracy in detecting AI text is 54-87%, depending on the test (Source: Stanford HAI, 2023). In practice this means: if you accuse 10 students of using AI, roughly 1 to 5 of them could be wrongly accused.
This is an important number. Not because AI in assessment is bad, but because many teachers don't know it when making decisions.
AI in assessment is a tool. But like any tool, you need to know what it does and doesn't do.
What AI CAN do in assessment
1. Initial screening - saves about 50% of reading time
30 essays. AI reads through and says: "These 15 are clearly strong. These 10 are average. These 5 need attention."
You no longer read all 30 essays with the same eye. You read 5 carefully, 10 quickly, 15 routinely. That's roughly 50% time savings - without quality dropping.
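The time budget above can be sketched in a few lines. The per-tier reading times (minutes) are my assumptions, chosen to mirror the roughly 50% saving described here; they are not a standard.

```python
# Illustrative sketch: per-tier minutes are assumptions, not a standard.
MINUTES = {"needs_attention": 20, "average": 8, "routine": 3}

def reading_budget(tier_counts, flat_minutes=15):
    """Compare triaged reading time against a flat per-essay time budget."""
    triaged = sum(MINUTES[tier] * n for tier, n in tier_counts.items())
    flat = flat_minutes * sum(tier_counts.values())
    return triaged, flat

# 30 essays sorted as in the example: 5 need attention, 10 average, 15 strong
triaged, flat = reading_budget({"needs_attention": 5, "average": 10, "routine": 15})
print(f"Triaged: {triaged} min vs flat: {flat} min ({1 - triaged / flat:.0%} saved)")
```

Note that the AI only orders the pile; every essay still gets read by you.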
2. Grammar and style - speed without errors
AI detects:
- Typos (near-perfect on common errors)
- Repetitions ("this is this, which is this")
- Confusing sentences (long run-ons)
AI DOESN'T detect:
- Whether the argument is logical
- Whether the examples are relevant
- Whether the text is original thinking
3. Feedback drafts - a foundation, not the end
Prompt: "Read this essay. Give 3 points: 1) what's good, 2) what to improve, 3) a question to prompt further thinking."
AI gives draft. You review, adjust, add context.
| AI without adjustment | AI + your adjustment |
|---|---|
| "Structure is clear" | "Structure is clear - especially good use of concrete example in paragraph 2" |
| "Add sources" | "Add sources - try finding one study that supports your claim" |
| "What's the main argument?" | "What's the main argument? I see 2 different ones - which are you actually claiming?" |
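To keep the three-point structure consistent across a whole class, the prompt above can be wrapped in a small template. This is a hypothetical helper (the function name and template constant are mine); the wording simply restates the prompt from the text.

```python
# Hypothetical helper: wraps the three-point feedback prompt from the text.
FEEDBACK_PROMPT = (
    "Read this essay. Give 3 points: "
    "1) what's good, 2) what to improve, "
    "3) a question to prompt further thinking.\n\n"
    "Essay:\n{essay}"
)

def build_feedback_prompt(essay_text: str) -> str:
    """Fill the template; the result goes to whatever AI tool you use."""
    return FEEDBACK_PROMPT.format(essay=essay_text.strip())

prompt = build_feedback_prompt("Climate policy essay text goes here...")
```

Whatever the tool returns is still only a draft: you review, adjust, and add context, as in the table above.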
What AI CAN'T do in assessment
1. See context
AI doesn't know:
- That this student made tremendous progress
- That last week was hard for the student
- That this is written in a second language
- That the student took a risk trying a new style
Context is your job.
2. Measure effort
AI measures the result, not the process. But sometimes a 6/10 piece of work is significantly more valuable than a 9/10 - because someone tried something new.
3. Reliably detect AI text
GPTZero, Turnitin AI, Originality.ai - they all make mistakes. The Stanford HAI study showed:
- False positives (accusing the innocent): 5-15%
- False negatives (missing real AI text): 20-40%
This means: use detection as a signal, not as evidence.
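A quick calculation shows why. The error rates come from the Stanford HAI figures above; the 20% base rate of actual AI use in a class of 100 is an assumption for the example.

```python
# Illustrative only: error rates from the Stanford HAI figures cited above;
# the 20% base rate of actual AI use is an assumption for this example.
def flagged_breakdown(n_students, base_rate, false_pos_rate, false_neg_rate):
    """Return (true positives, false positives) among detector flags."""
    users = n_students * base_rate
    non_users = n_students - users
    true_pos = users * (1 - false_neg_rate)   # real AI text, correctly flagged
    false_pos = non_users * false_pos_rate    # honest work, wrongly flagged
    return true_pos, false_pos

tp, fp = flagged_breakdown(100, 0.20, 0.10, 0.30)
innocent_share = fp / (tp + fp)
print(f"Flagged: {tp + fp:.0f}, of whom {fp:.0f} are innocent ({innocent_share:.0%})")
```

With these numbers, 22 students get flagged and 8 of them (over a third) did nothing wrong. That is why a flag can open a conversation, but never close one.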
Ethical principles
1. Transparency
If you use AI for drafting feedback - say so.
"I used AI for initial feedback draft. Then I read through and adjusted."
This isn't weakness. It's honest.
2. Final decision is human
AI gives input. You decide. If a student challenges the grade, you must be able to explain why you decided that way.
"AI said so" is not an explanation.
3. Data protection
Before pasting a student essay into ChatGPT, ask:
- Do you have permission?
- Where does the data go?
- Is it used for training?
GDPR applies in Estonian schools. Student text is personal data. Data security in AI projects covers this in detail.
Practical decision guide
| Use | Allowed? | Notes |
|---|---|---|
| Grammar check | Yes | Grammarly, LanguageTool are OK |
| Feedback draft | Yes | But review and adjust |
| Initial sorting | Yes | But don't trust blindly |
| AI text detection | Carefully | Not as evidence, only as a signal |
| Automatic grade | No | Grade is human decision |
Practical example
25 essays. Teacher uses AI:
Without AI:
- 15 min per essay = 6.25 hours
- Feedback: 2-3 sentences (tired)

With AI:
- 6 min per essay = 2.5 hours
- Feedback: 3-5 sentences (fresh)
Result: 3+ hours saved AND better feedback.
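As a sanity check, the arithmetic above works out. The per-essay minutes are taken straight from the example:

```python
essays = 25

# Minutes per essay from the example above, converted to hours
without_ai = essays * 15 / 60
with_ai = essays * 6 / 60
saved = without_ai - with_ai
print(f"Without AI: {without_ai} h, with AI: {with_ai} h, saved: {saved} h")
```

That is 3.75 hours back per batch of essays.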
Summary
AI in assessment is a tool - not a judge.
- Use for screening and feedback drafts
- Don't use for automatic grading
- Don't trust AI text detection blindly (54-87% accuracy!)
- Be transparent - students must know
- Final decision is always yours
Bad use causes harm. Good use saves hours and improves feedback quality.