Empty Column: Encryption and query performance

I am trying to implement encryption but have run into some serious performance issues. I am required to encrypt the SSN in our database. In and of itself, this is not a problem. The problem comes in because there is also a need to be able to query the table based on the SSN. Since the SSN is encrypted, the query basically performs an index scan, decrypting each value as it goes along. As a result, the query for one record out of 10 million records in the table takes three minutes. It needs to occur immediately.

If I could encrypt my SSN parameter and then compare it to the encrypted value in the column, it would work fine. Unfortunately, everytime you encrypt a particular value, the resultant encrypted value is different. Hence, I have to decrypt the column to match my parameter, instead of encrypting th parameter to match the column.

Does anyone have any suggestions to help alleviate this problem?

Thanks,

See the following links - they discuss this issue in detail:

http://blogs.msdn.com/lcris/archive/2005/12/22/506931.aspx

http://blogs.msdn.com/raulga/archive/2006/03/11/549754.aspx

Thanks
Laurentiu

|||

Laurentiu,

Thanks for the links. They supplied the information that I was afraid I'd find. In the example of using the MAC key, is there a benefit of having one column containing the encrypted SSN, if another column contains the MAC of the plaintext? That is, if the MAC of the plaintext is secure enough to be able to use for indexing, then are things more secure by separately encrypting the SSN with a symmetric key?

Two ideas that were proposed for our situation were:

1.) Encrypt the SSN and also store, let's say the last four of the SSN in plaintext. Our last four field would be included in the query condition. That would, in effect, reduce the amount of the index searched by a factor of 10000 (e.g search and decrypt only 1000 of the 10 million records)

2.) Create a separate table containing the plaintext SSN (indexed) and the encrypted value of the PK to the customer table. The SSN would be plaintext, but would not be linkable to any particular customer. Searches by SSN would use this table to retrieve the one row and then decrypt the encrypted value of the PK to link to the customer table. I'm not sure I like this idea, as the SSN are all visible, regardless of whether they can be linked to a particular person.

What are your thoughts on either of these ideas?

Thanks,

Larry

|||

The HMAC cannot be used to retrieve the original data, hence the encrypted value needs to be stored as well. The HMAC is only used for the search, to leverage the index for retrieving the row. Once the row is retrieved, the data is extracted from the encrypted column value.

I would avoid both ideas, as both are disclosing information about your customers' SSNs. The solution we recommend is to index an HMAC of the SSN, instead of indexing portions of the unencrypted SSN. There is no significant overhead compared to the other two solutions, so why not implement this approach?

Thanks
Laurentiu

|||

Again, thanks for your reply. Your answer makes sense to me. I, too, was not real keen on leaving SSN (or parts of it) in plaintext, but I was running short on other ideas and short on time. I hope you'll forgive my ignorance when it comes to HMAC, actually when it comes to encryption in general. When I looked at Raul's sample code for implementing this solution, I'll have to admit I was rather intimidated by it. I wish I had more time to really digest the whole thing. I will re-read his blog and try to digest it completely. Are there any other places where I might be able to get a crash course on HMAC's?

Thanks,

Larry

|||One other question I have is regarding replication. This table is one that I am replicating to another database. If I create these same certificates and keys on both database, can I still search from database B on an indexed value that was created on database A?|||

Laurentiu,

The example you pointed me to in Raul's blog implements the encryption and the generation of the MAC in a trigger. All of our data access logic (INSERTs and UPDATEs) currently resides in stored procedures. What are benefits or drawbacks to implementing the encryption logic in triggers as opposed to the stored procedures that we already have?

|||

You can generate the HMAC in procedures as well. Raul's example shows how you can compute the HMACs without changing existing stored procedures - this is why he's doing the HMAC computations in a trigger. If changing the stored procedures is not an issue for you, then you can move the HMAC computation in their code.

Also, for HMAC generation, you can find out more by looking at a cryptography book or at online resources. The algorithm that Raul uses is to concatenate the original value (the sensitive data) with a secret value (@.key) and then to compute a SHA1 hash of the resulting concatenation - that is the HMAC of the original value:

SELECT @.RetVal = HashBytes( N'SHA1', convert(varbinary(8000), @.Message) + @.Key )

Thanks

Laurentiu

Thursday, March 22, 2012

Encryption and query performance

No comments:

Post a Comment

Empty Column

Blog Archive

About Me