
[ale] OT: Illegal XML Characters



I am not an XML expert, but might it be that the supplier is using UTF-8 
without declaring it?  I'm reasonably sure that XML supports a way to 
declare the character set in use.  If that is the case, then it would 
be simple enough to add the declaration yourself.  
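
As a rough sketch of "adding the declaration yourself", something like 
the Python below might do. It assumes the supplier's data is really in a 
single-byte character set such as ISO-8859-1; that is only a guess based 
on the b7 error, and nothing in this thread confirms it. The function 
name add_declaration and the sample bytes are made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_declaration(raw: bytes, encoding: str = "ISO-8859-1") -> bytes:
    """Prepend an XML declaration naming `encoding` if the data has none.

    The default encoding is an assumption, not something the supplier
    has confirmed.
    """
    if raw.lstrip().startswith(b"<?xml"):
        return raw  # already has a declaration; leave it untouched
    header = b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>\n'
    return header + raw


# Hypothetical sample containing the bytes b7 and C0 from the report.
raw = b"<item>A\xb7B \xc0</item>"

# Without a declaration the parser assumes UTF-8 and rejects the bytes.
try:
    ET.fromstring(raw)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False

# With the declaration added, the same bytes are read as ISO-8859-1.
root = ET.fromstring(add_declaration(raw))
text = root.text
```

This only papers over the problem on the receiving side. The supplier 
would still need to declare the character set they actually use.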

Of course, that doesn't change the fact that the XML is still 
(technically) invalid.  To answer your question more directly, check 
what the default character set is when none is declared.
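
For what it's worth, XML 1.0 has a parser assume UTF-8 when there is no 
encoding declaration and no byte order mark. Under UTF-8, 80 and b7 are 
continuation bytes and cannot start a character, and C0 is never legal 
at all. The Python sketch below shows one way to demonstrate that to the 
client. The helper names utf8_role and first_invalid_offset are made up 
for illustration, as is the sample data.

```python
def utf8_role(byte: int) -> str:
    """Describe what a single byte value can mean inside UTF-8 text."""
    if byte <= 0x7F:
        return "ASCII character"
    if byte <= 0xBF:
        return "continuation byte (cannot start a character)"
    if byte in (0xC0, 0xC1):
        return "never valid (would start an overlong encoding)"
    if byte <= 0xDF:
        return "start of a 2-byte sequence"
    if byte <= 0xEF:
        return "start of a 3-byte sequence"
    if byte <= 0xF4:
        return "start of a 4-byte sequence"
    return "never valid (beyond the Unicode range)"


def first_invalid_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The three bytes from the original report.
for value in (0xC0, 0x80, 0xB7):
    print(f"0x{value:02X}: {utf8_role(value)}")

# Hypothetical sample: a lone b7 byte in otherwise plain ASCII XML.
sample = b"<a>x\xb7</a>"
offset = first_invalid_offset(sample)
```

Running first_invalid_offset over a file from the supplier would give a 
byte offset to quote back to them when reporting the problem.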

David Corbin


Mike Millson wrote:

>I am in a situation where I am having to parse the xml that another company
>is passing to a client of mine. The data often contains illegal characters.
>The most usual culprits are hex C0, 80, and b7. Having these bytes in the
>xml stream causes my parser to die. I have run the xml through an
>independent xml validator on the web, and the validator says the xml is
>bad. I have forgotten the error with C0 and 80. The message for b7 is
>"Error: Input error: Illegal UTF-8 start byte <0xb7>."
>
>I need to be able to clearly explain to the client that it is the other
>company passing invalid xml, not my parsing that is at issue, and they
>should validate their xml before sending it out the door. I'm going through
>the W3C documentation and haven't found anything to clearly explain (at least to
>me) why the above characters are not legal.
>
>Anyone have any ideas how to explain why C0, 80, and b7 are not valid xml
>characters?
>
>Thank you,
>Mike
>
>
>---
>This message has been sent through the ALE general discussion list.
>See http://www.ale.org/mailing-lists.shtml for more info. Problems should be 
>sent to listmaster at ale dot org.
>
>



