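The W3C has defined a set of characters that are illegal in XML. The allowed and discouraged character ranges are listed in the "charsets" sections of the XML 1.0 and XML 1.1 specifications; both URLs are given in the code comments below.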
Here is a function to remove these characters from a specified XML file:
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace XMLUtils
{
    class Standards
    {
        /// <summary>
        /// Strips hexadecimal references ("#x...") to characters that the XML
        /// specification excludes or discourages.
        /// Refer to http://www.w3.org/TR/xml11/#charsets for XML 1.1
        /// Refer to http://www.w3.org/TR/2006/REC-xml-20060816/#charsets for XML 1.0
        /// </summary>
        /// <param name="filePath">Full path to the file</param>
        /// <param name="XMLVersion">XML specification to use. Can be "1.0" or "1.1"</param>
        public static void StripIllegalXMLChars(string filePath, string XMLVersion)
        {
            // Read the whole file; filePath must be a path, not the XML text itself.
            string tmpContents = File.ReadAllText(filePath, Encoding.UTF8);
            string pattern;
            switch (XMLVersion)
            {
                case "1.0":
                    pattern = @"#x((10?|[2-9A-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]|9[0-9A-F])";
                    break;
                case "1.1":
                    pattern = @"#x((10?|[2-9A-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])";
                    break;
                default:
                    throw new ArgumentException("Error: Invalid XML Version!", "XMLVersion");
            }

            // The patterns match the literal text "#x" followed by hex digits;
            // raw characters in the file are not examined.
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            if (regex.IsMatch(tmpContents))
            {
                // Rewrite the file only when something was actually removed.
                tmpContents = regex.Replace(tmpContents, String.Empty);
                File.WriteAllText(filePath, tmpContents, Encoding.UTF8);
            }
        }
    }
}
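The character ranges from the XML 1.0 specification can also be checked directly on each character of a string, instead of matching text patterns. The following is only a minimal sketch and not the code from this post: the class name `XmlCharFilter` and its two methods are hypothetical names chosen for illustration, and it assumes XML 1.0 rules and an in-memory string rather than a file.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the XMLUtils class in this post.
static class XmlCharFilter
{
    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsLegalXml10(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns a copy of text with every character outside the Char production removed.
    public static string Strip(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // .NET strings are UTF-16: a well-formed surrogate pair encodes one
            // code point in #x10000-#x10FFFF, which XML 1.0 allows, so keep both halves.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            else if (IsLegalXml10(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates, #xFFFE, #xFFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

For example, `XmlCharFilter.Strip("a\u0001b")` returns `"ab"`, while tabs and line breaks are left in place. To clean a file with it, the text could be read with `File.ReadAllText`, passed through `Strip`, and written back with `File.WriteAllText`, as the function above does.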
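Here is the PHP version:
function strip_invalid_xml_chars2( $in )
{
    $out = "";
    $length = strlen($in);
    for ( $i = 0; $i < $length; $i++ )
    {
        // ord() yields one byte (0-255), so the input is checked byte by byte;
        // the ranges above 0xFF mirror the XML 1.0 Char production.
        $current = ord($in[$i]);
        if ( ($current == 0x9) || ($current == 0xA) || ($current == 0xD) ||
            (($current >= 0x20) && ($current <= 0xD7FF)) ||
            (($current >= 0xE000) && ($current <= 0xFFFD)) ||
            (($current >= 0x10000) && ($current <= 0x10FFFF)) )
        {
            $out .= chr($current);
        }
        else
        {
            // Illegal characters are replaced with a space rather than removed.
            $out .= " ";
        }
    }
    return $out;
}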
Hi Ramesh,
I just want the code for trimming non-printable characters.
The one placed on this page runs beyond the margins.
I would appreciate it if you could send it to my mail ID mentioned above.
Thanks,
Nagesh
Hi Nagesh,
I have sent the code via email as requested.
Cheers,
Balaji
Hi Ramesh,
This class is just what I’ve been looking for. Would you mind mailing it to me as well?
Thanks,
Ryan
Hi Ryan,
I have mailed across the code.
Cheers,
Balaji
Hi, Ryan ……
Any chance at this late date you could mail me a copy of this code as well?
And thanks very much for providing it.
Hi, Balaji ……
Any chance at this late date you could mail me a copy of this code as well?
And thanks very much for providing it.
Hi Bud,
Please check your mail.
Cheers,
Balaji
Hi,
Could you email me the code please.
Hi,
Could you email this code please.
Got it, Balaji — thanks very much for sharing it.
Bud
Hello, this is great.
Can you send it to me as well? Thank you.
Sent!
Hi, would it be possible to get the code as well? Thanks very much.
Darwin,
Please check your mail.
Cheers!
Hi Balaji,
Could you please email the code as I am unable to view it due to margin issues on the webpage? I would really appreciate that.
Thanks again
Hi!
I tried implementing your function, but now I get an error saying “Illegal characters in path”. Any suggestions please?
Many thanks,
TS
Hello Balaji,
I just came across your implementation. Would it be possible to email me your solution? Thanks very much.
-Minhas