Looking for functions to clean vars for different situations.
Can you improve on these, returning strings, urls and numbers:
function strClean($var) {
    // returns converted entities where there are HTML entity equivalents
    return htmlentities($var, ENT_QUOTES, "UTF-8");
}
function urlClean($var) {
    // returns a-z A-Z 0-9 / - . _ chars only
    return preg_replace('/[^a-zA-Z0-9\/\-\._]/si','',$var);
}
function numClean($var) {
    // returns a number (whole or decimal) as a float, or false if not numeric
    return is_numeric($var) ? floatval($var) : false;
}
@jedisct1 Many thanks for the advice and sorry I didn't provide context.
Usage will be sanitising user input via GET & POST and either rendered to screen or used in a SQL command, though we'll use PDO so they're safe (?) anyway. Aiming to prevent user hack attempts & bot attacks, especially SQL injection, XSS and remote file ref injection. Am not using a template, it's a PHP & HTML mix and want to call the appropriate function to get back a clean value.
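On the PDO point, a minimal sketch of a prepared statement, assuming an in-memory SQLite database and a made-up `users` table (neither is from the post above, and `findUserByName` is a hypothetical helper). The bound value travels separately from the SQL text, so the query path does not depend on any string cleaning:

```php
<?php
// Assumption: in-memory SQLite and a hypothetical `users` table, for illustration only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice')");

function findUserByName(PDO $pdo, string $name): array {
    // The named placeholder keeps user input out of the SQL text entirely.
    $stmt = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// A typical injection attempt is compared as a literal string.
$rowsHostile = findUserByName($pdo, "alice' OR '1'='1");
$rowsNormal  = findUserByName($pdo, 'alice');
```

This only covers values. Table or column names cannot be bound this way and would still need a whitelist.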
Am using htmlentities in strClean, as it's the same as htmlspecialchars but also covers HTML equivalents. htmlspecialchars doesn't protect against \ or 0x attacks. I believe htmlentities extends the idea of htmlspecialchars: it converts all chars that have HTML equivalents, or kills them.
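For comparison, a small sketch of what the two functions do to the same input (the sample string is made up). Both escape the same markup characters; htmlentities additionally converts characters such as é to named entities, and neither touches a backslash or a literal `0x`:

```php
<?php
// Hypothetical sample input: an accented char, markup, a backslash and "0x41".
$input = "caf\u{E9} <b onclick='x'> \\ 0x41";

$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// The only difference between the two results is the encoding of the accented char.
$sameApartFromAccent = (str_replace("\u{E9}", '&eacute;', $special) === $entities);
```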
urlClean is supposed to only allow safe chars in URLs, including the query string. Need something better here I think as / and & can form dangerous chars.
No, you're right, I don't want to return 0 in numClean on failures, so I'm now conditionally returning false. I changed it from being intval to *1 so it now covers decimals too.
To clean a URL I would use parse_url() and work with the returned array. To clean a number I always use (int)$var and (float)$var. You can use intval($var) and floatval($var) if you want. There is also an is_numeric() function, so your code may look like this:
function numClean($var) {
    return is_numeric($var) ? floatval($var) : false;
}
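The parse_url() suggestion could look something like the sketch below. The helper name `urlParts` and the http/https-only policy are assumptions for illustration, not something from this thread:

```php
<?php
// Hypothetical helper: returns the parsed parts of an http(s) URL, or false.
function urlParts(string $url) {
    $parts = parse_url($url);
    // parse_url() returns false for seriously malformed URLs.
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    // Assumed policy: only http and https schemes are acceptable.
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    return $parts;
}

$ok  = urlParts('https://example.com/path/page.php?id=42');
$bad = urlParts('javascript:alert(1)');   // has a scheme but no host, so rejected
```

Note that parse_url() only splits the string; it does not validate or encode the individual components.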
Depends on the context you want to render to.
In order to render plaintext between properly balanced tags, htmlspecialchars() is all you need. This is not enough for JavaScript or attributes inside a tag, though. JavaScript has some funny constraints. For example, Unicode chars U+2028 and U+2029 are valid JSON but not valid JS.
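A rough sketch of one escaping helper per output context. `escHtml` and `escJs` are hypothetical names, and using json_encode() for the JavaScript context is one common approach rather than the only one:

```php
<?php
// Hypothetical helper for text between tags and for quoted attribute values.
function escHtml(string $s): string {
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper for a value dropped into a <script> block.
function escJs($value): string {
    // The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes.
    // By default json_encode() also escapes non-ASCII characters,
    // which includes U+2028 and U+2029.
    return json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

$html = escHtml('<b>"hi"</b>');
$js   = escJs("</script>\u{2028}");
```

The JavaScript helper returns a complete quoted literal, so it would be used as `var x = <?= escJs($value) ?>;` without adding quotes around it.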
When you get a user-supplied string, check that it is a valid UTF-8 sequence with mb_check_encoding($value, 'UTF-8').
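As a sketch, that check could be wrapped like this. `requireUtf8` is a hypothetical name, rejecting with an exception is an assumed policy, and the mbstring extension must be loaded:

```php
<?php
// Hypothetical helper; requires the mbstring extension.
function requireUtf8(string $value): string {
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $value;
}

// Valid UTF-8 passes through unchanged.
$good = requireUtf8("caf\u{E9}");

// A lone Latin-1 byte (0xE9) is not a valid UTF-8 sequence.
$rejected = false;
try {
    requireUtf8("caf\xE9");
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```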
What is urlClean supposed to do? Don't strip out characters, just properly encode them with urlencode() for query strings (both for variable names and values) and rawurlencode() for other components. Or use http_build_query().
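To illustrate the difference between the three functions, a short sketch with a made-up search term and a hypothetical example.com URL:

```php
<?php
$term = 'fish & chips/extra';

// Whole query string: http_build_query() encodes both names and values.
$query = http_build_query(['q' => $term, 'page' => 2]);

// Single query-string value: urlencode() turns spaces into "+".
$value = urlencode($term);

// Path segment: rawurlencode() turns spaces into "%20".
$segment = rawurlencode($term);

// Hypothetical URL, for illustration only.
$link = 'https://example.com/search/' . $segment . '?' . $query;
```

When the link is then printed into HTML, it still needs htmlspecialchars() on top, since `&` in the query string is significant in markup.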
numClean(): do you really want to return 0 if something is not a number? Do you want to accept 42FOOBAR as a valid number? If your application is only supposed to be fed numbers that are actually numbers, you can test (string) intval($x) === $x (request values arrive as strings) and just throw an exception or die() if the test fails. Don't be liberal if your application constructs the links.
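A strict variant along those lines might look like this. It is only a sketch: `intStrict` is a hypothetical name, and it assumes the value arrives as a string, as GET and POST values do, so the round trip is compared as strings:

```php
<?php
// Hypothetical strict variant: accept only canonical integer strings.
function intStrict(string $x): int {
    // Round-trip check: "42" and "-5" survive the cast unchanged,
    // while "42FOOBAR", "4.2", "007" and "" do not.
    if ((string)(int)$x !== $x) {
        throw new InvalidArgumentException('Not an integer: ' . $x);
    }
    return (int)$x;
}

$answer = intStrict('42');

$threw = false;
try {
    intStrict('42FOOBAR');
} catch (InvalidArgumentException $e) {
    $threw = true;
}
```

This is deliberately unforgiving: leading zeros, whitespace and decimals are all rejected instead of being silently trimmed.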
How are you rendering the content? Are you using some template engine like Mustache or Smarty? Or just mixing PHP and HTML code?