@cfalzone
Created June 11, 2014 13:48
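/**
 * Cleans a block of text for XML by removing characters that are not legal in XML 1.0.
 * This was needed because some hidden characters commonly used in Japanese text can
 * foul up the push publisher, which parses with SAXParser.
 *
 * @param input The input to be cleaned
 * @return The cleaned input
 */
public String cleanForXML(String input) {
    // XMLChar is assumed to be the Xerces utility class (e.g. org.apache.xerces.util.XMLChar)
    StringBuilder sb = new StringBuilder(input.length());
    // Walk by code point, not by char, so a supplementary character (surrogate pair) is tested as a whole
    for (int i = 0; i < input.length(); ) {
        int cp = input.codePointAt(i);
        if (XMLChar.isValid(cp)) {
            sb.appendCodePoint(cp);
        }
        i += Character.charCount(cp);
    }
    // Found a couple that this was not replacing that SAXParser was complaining about.
    // Allowed set follows the XML 1.0 Char production: tab, LF, CR, U+0020-U+D7FF,
    // U+E000-U+FFFD and U+10000-U+10FFFF. Code points above U+FFFF need the \x{...}
    // form, because \u takes exactly four hex digits in a Java regex.
    String invalidXmlPattern = "[^"
            + "\\u0009\\u000A\\u000D"
            + "\\u0020-\\uD7FF"
            + "\\uE000-\\uFFFD"
            + "\\x{10000}-\\x{10FFFF}"
            + "]+";
    return sb.toString().replaceAll(invalidXmlPattern, "");
}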
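For trying the regex half of this method without Xerces on the classpath, here is a minimal standalone sketch. The class name `XmlCleanDemo` and the method name `stripInvalidXmlChars` are made up for illustration and are not part of the gist. It assumes Java 7 or later (for the `\x{...}` regex escape) and that the XML 1.0 `Char` production is the right definition of "valid".

```java
import java.util.regex.Pattern;

public class XmlCleanDemo {
    // Matches runs of code points outside the XML 1.0 Char production.
    // Compiled once, since String.replaceAll would recompile it on every call.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");

    // Hypothetical helper: returns the input with every invalid code point removed.
    public static String stripInvalidXmlChars(String input) {
        return INVALID_XML_CHARS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Japanese text with a stray backspace (U+0008) and vertical tab (U+000B)
        String dirty = "\u65E5\u672C\u8A9E\u0008\u30C6\u30AD\u30B9\u30C8\u000B";
        String clean = stripInvalidXmlChars(dirty);
        // The two control characters are gone; the seven visible characters remain
        System.out.println(clean.length()); // prints 7
    }
}
```

Java's regex engine matches a correctly paired surrogate as a single code point, so characters above U+FFFF pass through intact, while an unpaired surrogate falls outside every allowed range and is dropped.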