@ramsey
Created November 16, 2011 20:04
xmlentities() implemented in PHP (provides functionality similar to htmlentities())
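<?php
function xmlentities($s)
{
    // Build the entity-to-numeric-reference table once and reuse it across calls.
    static $patterns = null;
    static $replacements = null;
    static $translation = null;
    if ($translation === null) {
        $translation = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
        foreach ($translation as $k => $v) {
            $patterns[] = "/$v/";
            // mb_ord() returns the full code point; ord() would read only the first UTF-8 byte.
            $replacements[] = '&#' . mb_ord($k, 'UTF-8') . ';';
        }
    }
    // Encode to named HTML entities, then swap each one for its numeric form, which XML accepts.
    return preg_replace($patterns, $replacements, htmlentities($s, ENT_QUOTES, 'UTF-8'));
}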

wez commented Nov 18, 2011

Just happened to notice this; since it's likely that you're calling this more than once per page load, consider caching the table building:

    function xmlentities($s) {
        static $patterns = null;
        static $reps = null;
        static $tbl = null;
        if ($tbl === null) {
            $tbl = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
            foreach ($tbl as $k => $v) {
                $patterns[] = "/$v/";
                $reps[] = '&#' . ord($k) . ';';
            }
        }
        return preg_replace($patterns, $reps, htmlentities($s, ENT_QUOTES, 'UTF-8'));
    }

Also, this seems rather expensive: it adds 101 regex replacements per call.


ramsey commented Nov 18, 2011

Thanks for the pointers, Wez!

I hadn't thought about the table needing to be built each time, but I was aware of the number of regex replaces per call. This was some source I had sitting around on my blog from a long time ago, and I was just moving it here in preparation for updating my blog.

I'll update the function to include the static variables to save on table creation, but do you have any recommendations on making this a less expensive call?


wez commented Nov 19, 2011

I'd try to avoid needing to do this in the first place; htmlspecialchars encodes just the characters that are special to XML and HTML. If you have UTF-8 text, I would just declare the XML doc with the right charset/encoding attribute in the <?xml tag and then use the natural UTF-8 text.
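
A minimal sketch of the suggestion above, for illustration only. It assumes PHP 5.4 or later (for the `ENT_XML1` flag) and input that is already valid UTF-8. The helper name `xml_escape` and the sample `<item>` element are hypothetical and not part of the gist.

```php
<?php
// Hypothetical helper (not from the gist): escape text for an XML document
// that declares UTF-8 as its encoding.
function xml_escape(string $s): string
{
    // htmlspecialchars() touches only &, <, >, " and '.
    // ENT_XML1 makes the single quote come out as &apos;, which XML 1.0 defines.
    // Non-ASCII characters pass through unchanged as UTF-8 bytes.
    return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

$title = 'Café & "Crème" <brûlée>';

// The encoding is declared once, in the XML declaration, so the accented
// characters need no entity encoding at all.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<item title="' . xml_escape($title) . '">' . xml_escape($title) . '</item>' . "\n";

echo $xml;
```

This replaces the per-call table lookup and the chain of regex replacements with a single built-in call. The trade-off is that the output is no longer pure ASCII, so it only works where the consumer honors the declared encoding.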