@mattpass
Created June 22, 2012 07:00
PHP var security
Looking for functions to clean vars for different situations.
Can you improve on these, returning strings, urls and numbers:
function strClean($var) {
    // returns the string with every character that has an HTML entity equivalent converted to that entity
    return htmlentities($var, ENT_QUOTES, "UTF-8");
}
function urlClean($var) {
    // returns a-z A-Z 0-9 / - . _ chars only
    return preg_replace('/[^a-zA-Z0-9\/\-\._]/si','',$var);
}
function numClean($var) {
    // returns the value as a float if it is numeric, otherwise false
    return is_numeric($var) ? floatval($var) : false;
}
@jedisct1

Depends on the context you want to render to.

In order to render plaintext between properly balanced tags, htmlspecialchars() is all you need. This is not enough for JavaScript or for attributes inside a tag, though. JavaScript has some funny constraints. For example, the Unicode characters U+2028 and U+2029 are valid unescaped in JSON strings but not in JavaScript string literals (this only changed with ES2019).
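
As a rough illustration of escaping per output context, here is a minimal sketch with one helper per context. The names `escapeHtmlText` and `escapeForJs` are made up for this example, and it assumes PHP 7.3+ (for `JSON_THROW_ON_ERROR`):

```php
<?php
// Hypothetical helpers, one per output context.

// For plain text placed between tags.
function escapeHtmlText(string $value): string {
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// For a string literal embedded in an inline <script> block.
// json_encode() emits non-ASCII characters (including U+2028/U+2029) as \uXXXX
// escapes by default, and the JSON_HEX_* flags also escape <, >, &, ' and ".
function escapeForJs(string $value): string {
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}
```

Usage would look something like `<p><?= escapeHtmlText($name) ?></p>` in markup and `var name = <?= escapeForJs($name) ?>;` inside a script block. Other contexts (URLs in attributes, CSS, event handlers) would need their own treatment.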

When you get a user-supplied string, check that it is a valid UTF-8 sequence with mb_check_encoding($value, 'UTF-8').
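
A minimal sketch of that check, assuming the mbstring extension is loaded. The wrapper name `requireUtf8` is hypothetical, and throwing is just one possible way to reject bad input:

```php
<?php
// Hypothetical helper: reject input that is not well-formed UTF-8.
// Requires the mbstring extension.
function requireUtf8(string $value): string {
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}
```

Running this before any escaping means the later functions never see a broken byte sequence.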

What is urlClean supposed to do? Don't strip out characters, just properly encode them with urlencode() for query strings (both for variable names and values) and rawurlencode() for other components. Or use http_build_query().
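
Something like the following shows encoding instead of stripping. It is only a sketch: the helper name `buildItemUrl` and the `/items/` path are invented for the example.

```php
<?php
// Hypothetical helper: build a relative URL from one path segment and
// a set of query parameters, encoding each component instead of stripping.
function buildItemUrl(string $segment, array $params): string {
    // rawurlencode() for a path component: space becomes %20, / becomes %2F.
    $url = '/items/' . rawurlencode($segment);
    if ($params !== []) {
        // http_build_query() encodes both names and values.
        $url .= '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
    }
    return $url;
}

$url = buildItemUrl('a b/c', ['q' => 'x&y=z', 'page' => 2]);
// $url is '/items/a%20b%2Fc?q=x%26y%3Dz&page=2'
```

When the resulting URL is printed into an HTML attribute it still needs htmlspecialchars() on top, since the `&` separator is special in HTML.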

intClean(): do you really want to return 0 if something is not a number? Do you want to accept 42FOOBAR as a valid number? If your application is just supposed to be fed with numbers that are actually numbers, you can test intval($x) === $x and just throw an exception or die() if the test fails. Don't be liberal if your application constructs the links.
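
One caveat with the strict test: GET and POST values arrive as strings, so `intval($x) === $x` only holds when `$x` is already an int. A possible variant of the same fail-hard idea, using `filter_var()` with `FILTER_VALIDATE_INT` (the name `requireInt` is hypothetical):

```php
<?php
// Hypothetical helper: accept only values that are entirely an integer,
// and fail loudly otherwise instead of silently returning 0.
function requireInt($value): int {
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int === false) {
        throw new InvalidArgumentException('Not an integer');
    }
    return $int;
}
```

With this, `requireInt('42')` gives the int 42, while `'42FOOBAR'` and `'4.2'` are rejected with an exception.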

How are you rendering the content? Are you using some template engine like mustache or smarty? Or just mixing php and html code?

@mattpass
Author

@jedisct1 Many thanks for the advice and sorry I didn't provide context.

The functions will sanitise user input from GET & POST, which is then either rendered to screen or used in an SQL command, though we'll use PDO so those values should be safe (?) anyway. The aim is to prevent user hack attempts & bot attacks, especially SQL injection, XSS and remote file ref injection. I'm not using a template engine; it's a PHP & HTML mix, and I want to call the appropriate function to get back a clean value.
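
On the PDO question: bound parameters in a prepared statement keep the value out of the SQL text, which covers the injection side, but they do nothing for XSS when the value is rendered later. A minimal sketch, assuming a `users` table with `id` and `name` columns (both the table and the function names are made up for this example):

```php
<?php
// Hypothetical lookup against an assumed `users` table.
// The value is bound as a parameter and never concatenated into the SQL string.
function findUserByName(PDO $pdo, string $name): ?array {
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Output escaping is still needed when the stored value is rendered as HTML.
function renderUserName(array $user): string {
    return htmlspecialchars($user['name'], ENT_QUOTES, 'UTF-8');
}
```

So the input is stored as-is through the bound parameter, and escaping happens at output time for whichever context the value lands in.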

I'm using htmlentities in strClean, as it does the same as htmlspecialchars but also converts every other character that has an HTML entity equivalent. As far as I know, htmlspecialchars doesn't protect against \ or 0x attacks. I believe htmlentities extends the idea of htmlspecialchars to all characters with HTML entity equivalents, converting them or killing them.
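
For comparison, a small sketch that runs one input through both functions (the helper name `compareEscaping` is made up). Both escape `<`, `>`, `&` and quotes identically, htmlentities additionally converts characters such as é, and neither one touches a backslash:

```php
<?php
// Hypothetical helper: run the same input through both functions for comparison.
function compareEscaping(string $input): array {
    return [
        'htmlspecialchars' => htmlspecialchars($input, ENT_QUOTES, 'UTF-8'),
        'htmlentities'     => htmlentities($input, ENT_QUOTES, 'UTF-8'),
    ];
}

// Input is the text: <b>café</b> \x41
$result = compareEscaping("<b>caf\u{e9}</b> \\x41");
// $result['htmlspecialchars'] is '&lt;b&gt;café&lt;/b&gt; \x41'
// $result['htmlentities']     is '&lt;b&gt;caf&eacute;&lt;/b&gt; \x41'
```

When the page is served as UTF-8, the two outputs render the same in a browser.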

urlClean is supposed to only allow safe chars in URLs, including the query string. Need something better here I think as / and & can form dangerous chars.

No, you're right, I don't want to return 0 in numClean on failures, so I'm now conditionally returning false. I changed it from using intval to *1 so it now covers decimals too.

@maettig

maettig commented Jun 22, 2012

To clean a URL I would use parse_url() and work with the returned array. To clean a number I always use (int)$var; and (float)$var;. You can use intval($var); and floatval($var); if you want. Also there is an is_numeric() function so your code may look like this:

function numClean($var) {
    return is_numeric($var) ? floatval($var) : false;
}

@mattpass
Author

@maettig Thanks. Used that as a better solution to my $var*1 === $var ? $var : false; effort. Will look at parse_url to get the various elems and then sanitise the values as @jedisct1 suggested.
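
A rough sketch of how that parse_url() plan could look: split the URL, allow only known schemes, then re-encode the path and query. The function name `rebuildUrl` and the http/https allow-list are assumptions for this example. It drops user info and the fragment, and does not validate the host beyond requiring one.

```php
<?php
// Hypothetical helper: parse a URL, keep only allowed schemes, and rebuild it
// with the path segments and query parameters re-encoded.
function rebuildUrl(string $url, array $allowedSchemes = ['http', 'https']): ?string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // malformed, relative, or host-less (e.g. javascript:)
    }
    $scheme = strtolower($parts['scheme']);
    if (!in_array($scheme, $allowedSchemes, true)) {
        return null;
    }
    // Decode then re-encode each path segment so it ends up encoded exactly once.
    $segments = array_map(function ($segment) {
        return rawurlencode(rawurldecode($segment));
    }, explode('/', $parts['path'] ?? '/'));

    // parse_str() decodes the query; note that it rewrites dots and spaces
    // in parameter names to underscores.
    $query = [];
    parse_str($parts['query'] ?? '', $query);

    $out = $scheme . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $out .= ':' . $parts['port'];
    }
    $out .= implode('/', $segments);
    if ($query !== []) {
        $out .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
    }
    return $out;
}
```

With this, `rebuildUrl('javascript:alert(1)')` and `rebuildUrl('ftp://example.com/file')` both give null, while an already-encoded http(s) URL comes back unchanged.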
