Data security in AI projects

August 1, 2025
LCR Consulting

According to the IBM Cost of a Data Breach Report (2023), 83% of companies have experienced at least one data breach, and the average breach costs $4.45 million.

AI projects increase this risk. Not because AI is inherently insecure, but because you're sending data to places you don't control.

If you paste a customer email into ChatGPT, it goes to OpenAI servers in the US. If you paste a doctor's letter there, that's health data leaving the EU.

This isn't always bad. But you need to know what you're doing.

Three questions before starting an AI project

1. Where does my data go?

Most AI services store data in the US. GDPR allows this only if an "adequate level of protection" is ensured, and you need to verify that for each provider yourself.

Service | Server location | DPA available?
OpenAI (ChatGPT) | USA | Yes
Anthropic (Claude) | USA | Yes
Google (Gemini) | USA/EU | Yes
Microsoft (Copilot) | USA/EU | Yes

DPA = Data Processing Agreement. This is a contract that governs how the provider processes your data.

2. Is my data used for training?

Some services use your input data for model training. This means your data could, in theory, surface in someone else's response.

Most services let you opt out:
- OpenAI API: off by default
- OpenAI ChatGPT: you must disable it in settings
- Claude: off by default
- Google: depends on account type

Check before using.

3. Do I have a plan if something goes wrong?

GDPR requires you to notify the supervisory authority of a data breach within 72 hours. You need:
- A process for detecting breaches
- A responsible person
- A notification template
- Contact details for your data protection authority

Practical steps

1. Use the API, not the web interface

Through the API:
- You can log what data you send
- Data isn't used for training (with most services)
- You can add your own security rules

Through the web interface:
- You don't know exactly what data is sent
- Data may be used for training
- You have no log
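One practical benefit of the API route is that logging can live in a single place. Here is a minimal sketch in Python, where `send_fn` stands in for your real API client call (no specific SDK is assumed):

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai-requests")

def send_with_log(prompt: str, send_fn) -> str:
    """Log every outgoing AI request before handing it to the client.

    `send_fn` is a placeholder for your actual API call; it takes the
    prompt text and returns the response text.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "chars_sent": len(prompt),
        "prompt": prompt,  # in production, consider storing a redacted copy
    }
    log.info("ai_request %s", json.dumps(entry))
    response = send_fn(prompt)
    log.info("ai_response chars_received=%d", len(response))
    return response

# Usage with a stub in place of a real API client:
reply = send_with_log("Summarize this email: ...", lambda p: "Summary: ...")
```

Routing every call through one wrapper like this is what gives you the audit trail the web interface can't.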

2. Anonymize before sending

Before sending data to an AI service:
- Remove names
- Remove contact info (email, phone)
- Remove ID numbers (personal ID, contract number)
- Replace specific numbers with approximations

This can often be automated with regex patterns.
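A minimal sketch of such a regex-based anonymizer in Python. The patterns are deliberately naive placeholders; a real project needs patterns tuned to its own data (name lists, national ID formats, and so on):

```python
import re

# Illustrative patterns only; tune these to your own data.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[CONTACT]"),  # email addresses
    (re.compile(r"\+?\d[\d\s-]{7,}\d"), "[CONTACT]"),            # phone numbers
    (re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"), "[NAME]"),      # naive "First Last"
]

def anonymize(text: str) -> str:
    """Replace likely personal data with placeholder tokens."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(anonymize("Please email Anna Kowalski at anna.k@example.com."))
# → Please email [NAME] at [CONTACT].
```

Note the limits: a capitalized-words pattern will both miss names and produce false positives, so for sensitive data a dedicated PII-detection step is worth considering on top of regexes.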

3. Document data flows

Create a simple document:

Phase | Description
Source | Where does the data come from?
Processing | What's done before sending to AI?
AI service | Where is it sent? Under which DPA?
Storage | Does the AI service store data? For how long?
Return | What's done with the response?
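The phases above can also be captured as structured data, one record per AI integration, so the documentation lives next to the code. A minimal sketch with illustrative field names and example values (not a standard schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class DataFlow:
    """One record per AI integration; fields mirror the table above."""
    source: str             # where the data comes from
    processing: str         # what happens before sending to AI
    ai_service: str         # where it's sent, and under which DPA
    storage: str            # whether and how long the service stores data
    response_handling: str  # what's done with the response

flow = DataFlow(
    source="Customer support inbox",
    processing="Automatic anonymization (names and contacts removed)",
    ai_service="OpenAI API, DPA signed",  # example values throughout
    storage="Per provider retention terms; verify and record them here",
    response_handling="Summary stored in CRM; raw response discarded",
)
print(asdict(flow))
```

Keeping these records in version control means a GDPR audit question becomes a lookup rather than an investigation.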

This document helps both with GDPR audits and your own clear thinking.

Practical example

One client wanted to use AI for summarizing customer emails.

Original plan: Copy emails to ChatGPT.

Problem: Customer emails contain names, contact details, sometimes health info.

Solution:
1. Used the OpenAI API (data not used for training)
2. Built an automatic anonymizer (replaces names with [NAME], contact details with [CONTACT])
3. Logged all requests
4. Documented the data flow and added a DPA to the client contract

Result: Same functionality, GDPR-compliant, auditable.

Common mistakes

1. "It's just a test"

Even in testing, you're usually working with real data, and real data is real data. GDPR applies to tests too.

2. "I trust this service"

Trust isn't the same as security. You need to know what the service does with your data, not just believe the provider is decent.

3. "Nobody has ever asked about this"

Until someone asks. And then you have 72 hours to respond.

Summary

In AI projects, security is the foundation, not an add-on.

Before starting:
1. Know where your data goes
2. Check whether it's used for training
3. Anonymize sensitive data
4. Document data flows
5. Have a breach plan

If you feel uncertain, involve someone who can help. It's cheaper than a breach.