Wednesday, March 7, 2012

Encoding errors

Somewhat complex:
I'm getting the following error in a sp (SQL Server 2000):
"Server: Msg 6603, Level 16, State 1, Procedure sp_xml_preparedocument, Line
8
XML parsing error: Switch from current encoding to specified encoding not
supported."
When a ntext xml string has the encoding set to "utf-8", but not utf-16. So
it would seem
obvious that the document stored in the database is in utf-16...?
However, when the data is written to the database, the XmlDocument class in
.NET (1.1)
is convinced the document is utf-8 (looking in the VS debugger). The call to
the sp in
C# uses:
parameter = command.Parameters.Add("@.Body", SqlDbType.NText);
parameter.Direction = ParameterDirection.Input;
parameter.Value = xd.OuterXml;
where xd is the XmlDocument. I can only conclude that somewhere the xml is
being converted
into a .NET string (and hence utf-16), I would guess at .OuterXml. But I
can't see a way
round this. At some point I need to pass the parameter value!
Any help would be appreciated.
Rgds
Peter JohnstonThe problem here is that both UTF-16 and UTF-8 are encodings for UCS-4
Unicode characters but they are quite different beasts at the byte level.
SQL Server's ntext type is for historical reasons supporting UCS-2 Unicode
and for our XML parser thus should only be used to contain UCS-2 or UTF-16
encoded XML (UTF-16 uses two 2-byte code points, the so called surrogate
pairs, to represent the Unicode characters that go beyond UCS-2).
UTF-8 is a variable-length multi-byte encoding (1 to 4 bytes) of UCS-4 and
thus conflicts with the two-byte representation of ntext. The server-side
XML parser does not allow switching to UTF-8 while it parses ntext
characters since it could lead to data corruption.
Unfortunately, SQL Server does not really support a UFT-8 code page for
text/varchar either. I think that the windows-1256 encoding is mostly UTF-8
code point safe though (meaning, it does not change the code points to ?).
If you need to load data in SQL Server 2000 that contains UTF-8 encoded
data, you have the following options:
1. Change the encoding of the document to UTF-16 before passing it to the
server as ntext parameter (both the native and managed client XML libraries
have mechanisms to set the encoding)
2. If you know that you can just drop the encoding indicator, since you are
only loading ASCII anyway, do so. But be careful - even some 127-255 range
characters can get corrupted this way.
3. Load the data with the UTF-8 encoding into a text that has a collation
that implies Windows-1256.
In SQL Server 2005, we have added the ability to load the data into the XML
datatype either as XML or as varbinary() in which cases UTF-8 will work
without problems.
Best regards
Michael
"Peter Johnston" <nospam@.nospam.com> wrote in message
news:usP7DnCwFHA.2076@.TK2MSFTNGP14.phx.gbl...
> Somewhat complex:
> I'm getting the following error in a sp (SQL Server 2000):
> "Server: Msg 6603, Level 16, State 1, Procedure sp_xml_preparedocument,
> Line 8
> XML parsing error: Switch from current encoding to specified encoding not
> supported."
> When a ntext xml string has the encoding set to "utf-8", but not utf-16.
> So it would seem
> obvious that the document stored in the database is in utf-16...?
> However, when the data is written to the database, the XmlDocument class
> in .NET (1.1)
> is convinced the document is utf-8 (looking in the VS debugger). The call
> to the sp in
> C# uses:
> parameter = command.Parameters.Add("@.Body", SqlDbType.NText);
> parameter.Direction = ParameterDirection.Input;
> parameter.Value = xd.OuterXml;
> where xd is the XmlDocument. I can only conclude that somewhere the xml is
> being converted
> into a .NET string (and hence utf-16), I would guess at .OuterXml. But I
> can't see a way
> round this. At some point I need to pass the parameter value!
> Any help would be appreciated.
> Rgds
> Peter Johnston
>

No comments:

Post a Comment