@cfalzone
Created June 11, 2014 15:27
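/**
 * Removes characters that are not legal in XML 1.0 from a block of text.
 * This was needed because some hidden characters commonly found in Japanese text
 * can break the push publisher, which uses SAXParser.
 *
 * XMLChar is assumed to be the Xerces utility class (for example
 * org.apache.xerces.util.XMLChar); adjust the import to match your classpath.
 *
 * @param input The text to be cleaned
 * @return The input with all invalid XML characters removed
 */
public String cleanForXML(String input) {
    StringBuilder sb = new StringBuilder(input.length());
    // Walk the string by code point rather than by char: a lone surrogate half is
    // never a valid XML character, so a char-by-char check would also drop valid
    // supplementary characters (U+10000 and above).
    for (int i = 0; i < input.length(); ) {
        int cp = input.codePointAt(i);
        if (XMLChar.isValid(cp)) {
            sb.appendCodePoint(cp);
        }
        i += Character.charCount(cp);
    }
    // SAXParser still complained about a couple of characters that survived the
    // check above, so strip anything outside the XML 1.0 Char ranges as a second pass.
    String invalidXmlPattern = "[^"
            + "\\u0009\\r\\n"
            + "\\u0020-\\uD7FF"
            + "\\uE000-\\uFFFD"
            + "\\ud800\\udc00-\\udbff\\udfff"
            + "]";
    return sb.toString().replaceAll(invalidXmlPattern, "");
}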
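
The method above depends on an `XMLChar` class being on the classpath. Where that is not available, the same filtering can be sketched with the JDK alone by checking each code point against the ranges of the XML 1.0 `Char` production. The class name `XmlTextCleaner` and its methods are hypothetical and not part of the original gist, and returning `null` for a `null` input is an assumption about the desired contract.

```java
/** Hypothetical JDK-only helper; not part of the original gist. */
final class XmlTextCleaner {
    private XmlTextCleaner() {}

    /** Returns true if the code point falls in one of the XML 1.0 Char ranges. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes every code point that is not a legal XML 1.0 character. */
    static String clean(String input) {
        if (input == null) {
            return null; // assumption: pass null through rather than throw
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins a valid surrogate pair into one code point;
            // an unpaired surrogate is returned as-is and rejected by isXmlChar.
            int cp = input.codePointAt(i);
            if (isXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

A call such as `XmlTextCleaner.clean(text)` could stand in for `cleanForXML(text)` in code that cannot pull in Xerces. Because it does a single pass over code points, it needs no regular-expression second pass.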