How does tokenization work to protect sensitive data?
So tokenization basically takes an input, let's say an ID like 123-456-789, stores the real ID in database1, then replaces it with a systematic token like 000-000-001 and stores that token in database2?
But how is the data actually secured if the original still exists in database1?
2 Replies
In a nutshell, the real database containing your 123-456-789 entries is a highly privileged database which very few people have access to. Because it's a single asset which can be protected and because access to it is tightly controlled, it is a more tractable problem to secure it.
The "token" (000-000-001) is given out to callers who aren't necessarily trusted with the sensitive data. Maybe the code hasn't been vetted, maybe they work for a different company, maybe more people have access to the database, whatever.
The general idea is that if the database containing the tokenized data is compromised, it's not the end of the world. The token by itself is a placeholder. It's not really used for all that much. (This is scenario-specific, obviously.)
Contrast this against a system where a whole bunch of people all have access to the secret data, and where this is spread across multiple different databases. Securing this would be nightmarish.
tl;dr: Tokenization narrows the number of places where the real secret appears, and it narrows the number of entities which have access to the real secret. This means that the overall attack surface is much reduced compared to a design where the secret is plastered across many different databases accessible by many different entities.
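To make the split concrete, here is a minimal Python sketch of the idea, not a production design. Everything in it is an assumption for illustration: the `TokenVault` class, the `caller_is_privileged` flag (a stand-in for real access control), and the in-memory dicts standing in for database1 and database2. It also uses random tokens rather than sequential ones like 000-000-001, so that a token reveals nothing about the original or about how many entries exist.

```python
import secrets


class TokenVault:
    """Hypothetical stand-in for the tightly controlled database (database1)."""

    def __init__(self):
        self._token_to_secret = {}
        self._secret_to_token = {}

    def tokenize(self, secret: str) -> str:
        """Return a placeholder token for the secret, creating one if needed."""
        if secret in self._secret_to_token:
            return self._secret_to_token[secret]
        # Random token: it has no mathematical relationship to the secret.
        token = secrets.token_hex(8)
        while token in self._token_to_secret:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._token_to_secret[token] = secret
        self._secret_to_token[secret] = token
        return token

    def detokenize(self, token: str, caller_is_privileged: bool) -> str:
        """Return the real value, but only to a privileged caller."""
        # Assumption: a real system would check credentials, not a boolean.
        if not caller_is_privileged:
            raise PermissionError("caller may not see the real value")
        return self._token_to_secret[token]


vault = TokenVault()
token = vault.tokenize("123-456-789")

# Stand-in for database2: less trusted systems only ever store the token.
orders_db = {token: {"item": "widget", "quantity": 2}}
```

If `orders_db` leaks, the attacker gets tokens and order data but not 123-456-789. Recovering the real ID still requires getting through whatever guards the vault, which is the one place left to defend.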
Hmmmm, this makes sense 🤔
Thank you