Thursday, March 29, 2012
Endian problems with converting to varbinary
format to a relational database structure. However, for migration reasons we
need to be able to export the relational data to the binary file structure
for processing. I have accomplished this with a not-so-lightweight procedure
that basically steps through data, pulls out a variable, converts it to
binary, and appends it to an Image column. At the end of the process I use
textcopy within the TSQL code to write the binary file out. However, when
processing the data in a VB6 utility, reading an integer from the file
becomes impossible. 1's become 256, 2's become 512, 4's become 1024, and
larger values are crazy. Stepping through the original files, we notice that
a one would be stored as 0x10, a 2 as 0x20. Stepping through the file that
SQL Server wrote, we noticed that a one was 0x01, a two was 0x02, etc.
I understand why the old files work, as that is in little endian format,
native to windows. However, I don't understand why SQL server is converting
to a binary structure that looks to be big endian. Below is an example of my
conversion, please let me know what I'm doing wrong and/or how I can fix it
to make it work correctly (tempBlob.MainBlob is the Image that I'm updating)
.
Thanks!
Code:
DECLARE @.bin2 binary(2)
DECLARE @.PixelsAcrossDetector SMALLINT
SELECT @.PixelsAcrossDetector = (SELECT PixelsAcross FROM Strips WHERE GUID =
@.StripGUID)
SELECT @.bin2 = (SELECT CONVERT(BINARY(2), @.PixelsAcrossDetector))
UPDATETEXT tempBlob.MainBlob @.Pointer NULL 0 @.bin2> Stepping through the original files, we notice that
> a one would be stored as 0x10, a 2 as 0x20. Stepping through the file that
> SQL Server wrote, we noticed that a one was 0x01, a two was 0x02, etc.
> I understand why the old files work, as that is in little endian format,
> native to windows. However, I don't understand why SQL server is
> converting
> to a binary structure that looks to be big endian.
Consider the following:
DECLARE
@.PixelsAcrossDetector smallint,
@.bin2 binary(2)
SET @.PixelsAcrossDetector = 1
SET @.bin2 = CONVERT(BINARY(2), @.PixelsAcrossDetector)
SELECT @.bin2
The result is a fixed-length *binary string* with a value of 0x0001 rather
than the little endian smallint value 0x0100 you want to serialize. One
method to address the problem is to swap the bytes like the example below.
It's ugly but Transact-SQL isn't really designed to create binary records.
DECLARE
@.PixelsAcrossDetector smallint,
@.bin2 binary(2),
@.Byte0 binary(1),
@.Byte1 binary(1)
SET @.PixelsAcrossDetector = 1
SET @.bin2 = CONVERT(BINARY(2), @.PixelsAcrossDetector)
SELECT @.Byte0 = SUBSTRING(@.bin2, 2, 1)
SELECT @.Byte1 = SUBSTRING(@.bin2, 1, 1)
SELECT @.Bin2 = @.Byte0 + @.Byte1
SELECT @.Bin2
Hope this helps.
Dan Guzman
SQL Server MVP
"David Harris" <DavidHarris@.discussions.microsoft.com> wrote in message
news:35FE8697-EE20-4641-9042-3F569D5BCDC9@.microsoft.com...
> We are in the process of migrating from a DOS based binary file storage
> format to a relational database structure. However, for migration reasons
> we
> need to be able to export the relational data to the binary file structure
> for processing. I have accomplished this with a not-so-lightweight
> procedure
> that basically steps through data, pulls out a variable, converts it to
> binary, and appends it to an Image column. At the end of the process I use
> textcopy within the TSQL code to write the binary file out. However, when
> processing the data in a VB6 utility, reading an integer from the file
> becomes impossible. 1's become 256, 2's become 512, 4's become 1024, and
> larger values are crazy. Stepping through the original files, we notice
> that
> a one would be stored as 0x10, a 2 as 0x20. Stepping through the file that
> SQL Server wrote, we noticed that a one was 0x01, a two was 0x02, etc.
> I understand why the old files work, as that is in little endian format,
> native to windows. However, I don't understand why SQL server is
> converting
> to a binary structure that looks to be big endian. Below is an example of
> my
> conversion, please let me know what I'm doing wrong and/or how I can fix
> it
> to make it work correctly (tempBlob.MainBlob is the Image that I'm
> updating).
> Thanks!
> Code:
> DECLARE @.bin2 binary(2)
> DECLARE @.PixelsAcrossDetector SMALLINT
> SELECT @.PixelsAcrossDetector = (SELECT PixelsAcross FROM Strips WHERE GUID
> =
> @.StripGUID)
> SELECT @.bin2 = (SELECT CONVERT(BINARY(2), @.PixelsAcrossDetector))
> UPDATETEXT tempBlob.MainBlob @.Pointer NULL 0 @.bin2|||>> We are in the process of migrating from a DOS based binary file storage f
ormat to a relational database structure. <<
Now you know why BIT is not part of the SQL Standards. SQL is meant to
be high-level, abstract language; use files and machine-specific
assembly language for that level of programming.
Right tool for the job; no kludges.|||"--CELKO--" Wrote:
> Now you know why BIT is not part of the SQL Standards.
> SQL is meant to be high-level, abstract language; use
> files and machine-specific assembly language for that
> level of programming.
One of my joys is that this is only interim, until we can rewrite our
processing software to make database calls insted of reading the binary
files. But unfortunately that will be a ways off for now, and we have to
kludge it up right now. :( But good design is right around the corner... :)
"Dan Guzman" wrote:
> The result is a fixed-length *binary string* with a value of 0x0001 rather
> than the little endian smallint value 0x0100 you want to serialize. One
> method to address the problem is to swap the bytes like the example below.
> It's ugly but Transact-SQL isn't really designed to create binary records.
Yeah, I was hoping it was a setting or a separate convert function. :) But
this seems like the best solution, will just have to abstract it to simplify
things. Appreciate the help. :)
Will the swap method also work with doubles/floats, just by scaling the
variables to 8 instead of 2, and completely reversing it? Or must the double
s
have a separate format?|||> Will the swap method also work with doubles/floats, just by scaling the
> variables to 8 instead of 2, and completely reversing it? Or must the
> doubles
> have a separate format?
It gets especially nasty when you deal with single and double data types.
Not only do you need to address the big/little endian issue, you'll need to
create the exponent correctly. It's likely to be easier and faster to write
a custom application program than to do this in Transact-SQL. I don't
always agree with Joe but, IMHO, this is beyond an acceptable temporary
kludge.
Hope this helps.
Dan Guzman
SQL Server MVP
"David Harris" <DavidHarris@.discussions.microsoft.com> wrote in message
news:219FAFF8-546B-47B1-B744-02BFD0391989@.microsoft.com...
> "--CELKO--" Wrote:
> One of my joys is that this is only interim, until we can rewrite our
> processing software to make database calls insted of reading the binary
> files. But unfortunately that will be a ways off for now, and we have to
> kludge it up right now. :( But good design is right around the corner...
> :)
> "Dan Guzman" wrote:
> Yeah, I was hoping it was a setting or a separate convert function. :) But
> this seems like the best solution, will just have to abstract it to
> simplify
> things. Appreciate the help. :)
> Will the swap method also work with doubles/floats, just by scaling the
> variables to 8 instead of 2, and completely reversing it? Or must the
> doubles
> have a separate format?|||>> Now you know why BIT is not part of the SQL Standards.
It was a bit suprising that SQL:99 bought it up & is already deprecated in
SQL:2003
Anith|||"Dan Guzman" wrote:
> It gets especially nasty when you deal with single and double data types.
> Not only do you need to address the big/little endian issue, you'll need t
o
> create the exponent correctly. It's likely to be easier and faster to wri
te
> a custom application program than to do this in Transact-SQL. I don't
> always agree with Joe but, IMHO, this is beyond an acceptable temporary
> kludge.
Agreed. Too bad everything isn't an integer. :) I did manage to create a few
procedures to abstract a good part of the process into TSQL, which cuts the
time at least in half from doing everything from C#. Appreciate the help,
even though it's not exactly what I wanted to hear... But this is interim,
and we'll take just about anything right now. :)
Tuesday, March 27, 2012
Encryption Varbinary Length
Using the new encryption included in SQL Server 2005, what is a good way to determine what length I should use for the column?
For example, I am encrypting a column, its maxlength is about 30 characters, but when encrypted, the encrypted value extends from between 50 and no more than 68 characters-
So if I had a column with a max of 500 or so characters, how could I know what varbinary length I should set it to if I were to encrypt it, without actually finding the highest value I could possibly fit into the field?
Is it good practice to just make it a varbinary(max) field?
-rob
My recommendation would be to create a dummy value of the maximum possible size (for example, on a varchar(500), create a 500 characters strings) and encrypt it with the same key and parameters you would normally use. That way you know you have reserved enough space to accept the maximum plaintext value your application accepts.
Example:
CREATE SYMMETRIC KEY key_demo WITH ALGORITHM = AES_256
ENCRYPTION BY PASSWORD = 'My 53Cr3+ p@.zzw0rD'
go
OPEN SYMMETRIC KEY key_demo
DECRYPTION BY PASSWORD = 'My 53Cr3+ p@.zzw0rD'
go
DECLARE @.MaxPlaintext nvarchar(500)
DECLARE @.Ciphertext varbinary(8000)
SET @.MaxPlaintext = replicate (N'a', 500)
-- Notice that I am using Unicode and the length is in bytes = 1000
PRINT datalength( @.MaxPlaintext )
-- Using the basic encryption parameters
SET @.Ciphertext = EncryptByKey( key_guid('key_demo'), @.MaxPlaintext )
-- Length using default parameters
PRINT datalength( @.Ciphertext )
-- Using the optional parameters
SET @.Ciphertext = EncryptByKey( key_guid('key_demo'), @.MaxPlaintext, 1, 'Dummy value' )
-- Length using optional parameters
PRINT datalength( @.Ciphertext )
go
I wrote an article (SQL Server 2005 Encryption – Encryption and data length limitations, http://blogs.msdn.com/yukondoit/archive/2005/11/24/496521.aspx) that explains a little bit more about the encryption length limitations in SQL Server 2005. In that article I have a formula that may also help, but I would recommend using the method shown above instead of the formula in the article.
I hope I was able to help you. Please let us know if you have any further questions or feedback.
Thanks,
-Raul Garcia
SDE/T
SQL Server Engine
|||This example will work for my needs - I should be able to get away with that as it is not often that a user will use the full length of a column. Thanks for validating.|||Raul, thanks for the solution on encrypting large amount of data. We have tried your approach on some BLOBs (e.g. files stored as binaries). The performance is very dismal though it works. Downloading a 3MB file that's encrypted took more than 45 seconds (the same file without encryption can be downloaded from the same webpage in 5 seconds). Any suggestions? I understand that SQL Server is not an efficient way to store files to begin with but we have to use it due to data center restrictions.|||Hey Bob,
A lot of factors will impact the performance. Given the dramatic difference in times, I'm wondering if this is actually caused by the query you are using the retrieve the data? Are you only decrypting the blob or are other columns in the table encrypted as well?
I will run some tests on this, but if it is possible can you share the queries you are using with us? Or at least sample queries which approximate how you are retrieving the data?
Thanks,
Sung
|||Sung:
There is only one field in the DB that's encrypted, the one containing the BLOB. The query is pretty much a copy of what Raul posted on his blog:
--
<edited to remove the source code sample>
Unfortunately the work around I give is not very efficient with data that is larger than a few segments.
The main problem with this workaround (in terms of efficiency) is that the overhead SQL Server needs to pay besides the direct encryption/decryption (lookup for key in key ring, initialize key in cryptographic provider, verify inner header for consistency, etc.) will hit you multiple times for each blob. In the sample you mentioned of a 3MB file, you will be affected by this overhead more than 380 times.
Given the length and nature of the data you are manipulating, I would suggest using a CLR UDF instead and encrypt using CLR cryptographic APIs. The performance should be much better and also much more efficient in terms of space needed (not to mention a much cleaner code than my workaround).
I hope this information will help. Thanks a lot for your feedback.
-Raul Garcia
SDE/T
SQL Server Engine
|||Raul, thanks for the reply. would a CLR UDF approach still use the SQL 2005's key/certificate infrstructure or it's completely .NET solution with separate keys? I have done file encryptions in .NET using X509 cert and symmetric keys, which are pretty easy to implement, but I'd like to leverage the keys created in SQL 2005 so the key management is consistent with other encrypted data in storage.
In addition, if you could show me some examples of using CLR UDF for SQL encryption that'd be great.
Thanks again
|||Actually I was thinking of doing the whole key management and encryption directly in CLR due to the huge overhead you would have to pay for using the SQL Server infrastructure for large objects.
Another solution I can think of using SQL Server key management infrastructure would be to split the original plaintext in multiple rows and leverage on multiple client connections to encrypt/decrypt in parallel. For example: Create a table that will store the ciphertext as fragments in multiple rows (i.e. file_id, file_fragment_id used as primary key); your client can split the plaintext in segments (you could use a fixed length for the segments to make the file SEEK easier, let’s say 7K per segment) and using multiple connections send in parallel encrypt and insert requests to SQL Server (the fragment_id will be relative to the offset of the plaintext). As the fragments will be encrypted independently, they can also be decrypted independently and assembled again in the client application after decryption.
Hope this idea will work for you, or at least give you some other ideas.
-Raul Garcia
SDE/T
SQL Server Engine