What is Tokenization and how does it work?

Welcome and Happy New Year. Over the next several months, I will be writing about the partnership between RSA and First Data, the service that came out of that partnership (First Data’s TransArmor solution), the technologies that we are using, and the impact that the service will have on credit card processing in the enterprise. Leading up to the launch of the solution, I thought it appropriate to begin with an explanation of the technologies being used, especially since multiple vendors are using the same terminology in different ways, which has created a lot of confusion in the market.

So let’s get started.

The first technology to discuss is tokenization. Tokenization, in its simplest form, is another way of saying ‘data substitution’. It is the act of using a substitute value, or ‘token’, which has no inherent value, in the place of data that does have value. That way, if the system using tokens is compromised, it is the tokens that are taken, not the actual valuable data.

This works within an enterprise because the actual content of the data field isn’t really that important to support most internal business processes. The enterprise can then use the token internally in lieu of the original data, and can translate the token back into the original data for necessary external interactions.

For example, instead of keeping a credit card number (which has value to a bad guy) in a database, I keep the token (which has no external value) in the credit card number field. I use the token to support internal business processes, because my internal reporting and analytics don’t need an actual card number; they just need a constant value to operate against. The only time the actual card number is needed is in external communication with the cardholder or to monetize it in an interaction with the credit card processor. In either case, I would turn the token back into the original value for that external communication.
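To make the internal-use point concrete, here is a minimal Python sketch of a report that runs entirely on tokens. The record layout, the `spend_per_card` function, and the second token value are made up for illustration; the point is only that grouping and totaling need a value that stays constant for each card, not the card number itself.

```python
from collections import defaultdict

# Hypothetical transaction records: the card-number field holds a token,
# never the real card number.
transactions = [
    {"card": "9483 7266 3928 9819", "amount": 25.00},
    {"card": "1836 5520 7741 0093", "amount": 12.50},
    {"card": "9483 7266 3928 9819", "amount": 40.00},
]


def spend_per_card(records):
    """Total spend per card, keyed by token.

    This works because each card always maps to the same token, so the
    token is a stable stand-in for grouping and reporting.
    """
    totals = defaultdict(float)
    for record in records:
        totals[record["card"]] += record["amount"]
    return dict(totals)


print(spend_per_card(transactions))
# {'9483 7266 3928 9819': 65.0, '1836 5520 7741 0093': 12.5}
```

No step in this report ever needs to detokenize; only an external interaction, such as submitting a charge to the processor, would.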

So far, this sounds an awful lot like encryption, doesn’t it? Well, while they are similar in description, they operate in a very different manner.
Encryption works by taking the original data and performing a mathematical operation on it that results in what is essentially gibberish. To retrieve the original data, you reverse that mathematical operation and turn the gibberish back into good data. The output of an encryption operation is typically larger than the original data. Encrypted data can also, at least in theory, be attacked through cryptanalysis, and a larger sample of encrypted data gives an attacker more to work with. Finally, encryption relies on keys, which must be safeguarded; if a key is compromised, so is the data it protects.

As an example, if you applied encryption to a credit card number it might look something like this:

Card Number            Encrypted Card Number
5647 8377 8388 2299    Ojr73h3d^&hh#&HFH&##ED*HD#*

Notice that the encrypted value is both longer than the original and structurally different from it.
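The encrypt-and-reverse round trip described above can be sketched in Python. This is a toy cipher written only to illustrate the idea of a key-dependent, reversible transformation whose output is longer than its input; it is not production cryptography, and real systems use a vetted algorithm such as AES from an established library. The names `toy_encrypt` and `toy_decrypt` are hypothetical.

```python
import base64
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: str) -> str:
    """TOY ONLY, not secure: XOR the data with a key-derived byte stream."""
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(16)  # fresh random value for every encryption
    stream = _keystream(key, nonce, len(data))
    cipher = bytes(d ^ s for d, s in zip(data, stream))
    # The nonce travels with the ciphertext; base64 makes it storable as text.
    return base64.b64encode(nonce + cipher).decode("ascii")


def toy_decrypt(key: bytes, encoded: str) -> str:
    """Reverse toy_encrypt: the same key and nonce regenerate the same stream."""
    raw = base64.b64decode(encoded)
    nonce, cipher = raw[:16], raw[16:]
    stream = _keystream(key, nonce, len(cipher))
    return bytes(c ^ s for c, s in zip(cipher, stream)).decode("utf-8")


key = secrets.token_bytes(32)
card = "5647 8377 8388 2299"
encrypted = toy_encrypt(key, card)
print(len(card), len(encrypted))  # 19 48
assert toy_decrypt(key, encrypted) == card
```

The 19-character card number comes back as 48 characters because the 16-byte nonce is stored alongside the ciphertext and the result is base64-encoded. Anyone holding `key` can reverse the operation, which is why the keys themselves must be protected.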

Tokenization works by taking the original data value and generating a substitute value, usually with a random number generator. The mapping between the original data and the token is maintained in a secure database.

Card Number            Tokenized Card Number
5647 8377 8388 2299    9483 7266 3928 9819

Notice that the token is structurally similar to the original data, in both length and characters.
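The generate-and-store approach can be sketched as a small in-memory vault in Python. `TokenVault` is a hypothetical class standing in for the secure mapping database; it uses the standard `secrets` module as the random number generator and keeps spaces in place so the token matches the original's length and layout.

```python
import secrets


class TokenVault:
    """In-memory stand-in for the secure database that maps tokens to card numbers."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    @staticmethod
    def _random_token(value: str) -> str:
        # Swap every digit for a random one and keep the spaces, so the token
        # has the same length and layout as the original value.
        return "".join(
            str(secrets.randbelow(10)) if ch.isdigit() else ch for ch in value
        )

    def tokenize(self, value: str) -> str:
        # A card that already has a token keeps it (a constant stand-in).
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = self._random_token(value)
        # Retry if the draw collides with an existing token or the value itself.
        while token in self._token_to_value or token == value:
            token = self._random_token(value)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is a lookup; nothing about the value is computed from the token.
        return self._token_to_value[token]


vault = TokenVault()
card = "5647 8377 8388 2299"
token = vault.tokenize(card)
print(token)  # a random value shaped like "NNNN NNNN NNNN NNNN"
assert vault.detokenize(token) == card
```

A real vault would persist and protect this mapping rather than hold it in Python dictionaries.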

If you want to reverse the process and retrieve the original data, you submit the token to the database and the original data is returned. Because there is no mathematical relationship between the original data and the token, it is impossible to derive the original data from a token, no matter how large a sample of tokens you have. And because the token is randomly generated, you can customize its format, which gives you greater flexibility. This matters for databases, where encrypted data often will not fit into a field that was originally designed for cleartext, because the encrypted value is larger than the cleartext value. With tokenization, you can create a token that is the same length as the original value, so back-end systems do not need to be modified to accept it.
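Because tokens are generated rather than computed from the data, their format is whatever the generator is told to produce. A minimal Python sketch, assuming a template convention invented for this example ('N' for a random digit, 'A' for a random uppercase letter, anything else copied as-is) and a hypothetical 19-character legacy column:

```python
import secrets
import string


def token_from_template(template: str) -> str:
    """Generate a random token shaped by a template.

    'N' becomes a random digit, 'A' a random uppercase letter, and any other
    character (spaces, dashes) is copied through unchanged.
    """
    out = []
    for ch in template:
        if ch == "N":
            out.append(secrets.choice(string.digits))
        elif ch == "A":
            out.append(secrets.choice(string.ascii_uppercase))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical legacy column sized for a cleartext card number with spaces.
CARD_FIELD_WIDTH = 19

card_token = token_from_template("NNNN NNNN NNNN NNNN")
assert len(card_token) <= CARD_FIELD_WIDTH  # fits without a schema change

# The same generator can emit a different shape if another system needs one.
order_token = token_from_template("AA-NNNNNN")
```

This simple scheme cannot express a literal 'N' or 'A' in the output; a fuller design would need an escape character or a richer format description.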

Obviously, with tokenization, it is imperative to protect the database that contains the mappings between the original data and the tokens. Also, the less often tokens must be converted back into the original data, the more value tokenization delivers.
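One way to picture both points is a vault that refuses detokenization to callers without a business need and records every request, so the number of conversions back to card numbers stays visible. This Python sketch is hypothetical throughout: `GuardedVault`, the caller names, and the in-memory audit list stand in for whatever authentication, storage, and logging a real deployment would use, and the tokens here are plain random hex strings rather than format-preserving ones.

```python
import secrets
from datetime import datetime, timezone


class GuardedVault:
    """Token vault that only lets approved callers detokenize, and logs every attempt."""

    def __init__(self, allowed_callers):
        self._token_to_value = {}
        self._value_to_token = {}
        self._allowed = set(allowed_callers)
        self.audit_log = []  # (timestamp, caller, granted) tuples

    def tokenize(self, value: str) -> str:
        # Same value, same token, so internal processes see a constant stand-in.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = secrets.token_hex(8)
        while token in self._token_to_value:  # avoid reusing an existing token
            token = secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, caller: str) -> str:
        granted = caller in self._allowed
        # Log every attempt, including refusals.
        self.audit_log.append((datetime.now(timezone.utc), caller, granted))
        if not granted:
            raise PermissionError(f"{caller!r} is not allowed to detokenize")
        return self._token_to_value[token]


vault = GuardedVault(allowed_callers={"payment-gateway"})
token = vault.tokenize("5647 8377 8388 2299")
card = vault.detokenize(token, caller="payment-gateway")  # allowed
try:
    vault.detokenize(token, caller="reporting")  # internal reporting has no need
except PermissionError:
    pass
print(len(vault.audit_log))  # 2
```

Keeping the allow-list short and watching the audit log are two simple ways to keep detokenization rare.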

That’s a quick overview of what tokenization is, and how it works. In the future, I will discuss the different types of tokenization that are appearing in the market and what those different approaches mean to your enterprise.

Comments

Re: What is Tokenization and how does it work?
Nice post, thanks.
- d0s

Question
In tokenization, how are mappings between original data and tokens protected in the database? You’re still storing them.
- Virgil

2 Responses to “What is Tokenization and how does it work?”

  1. Jane says:

    How does tokenization protect the data during cross-domain communication, given that only one of the domains has adopted tokenization?

  2. Hannah says:

    This is a GREAT article! I was just hired at a company, and they wanted me to learn about tokenization versus end-to-end encryption, but nobody could answer my questions! With this article, I feel like I actually UNDERSTAND how tokenization works, and WHY it’s different from encryption. Thank you so very much for this!!
