@alpenzoo
Created July 5, 2024 14:54
Sanitize user input for a Latin1-encoded PostgreSQL database by removing Unicode characters that Latin1 cannot represent.
<?php
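/**
 * Convert a UTF-8 string to ISO-8859-1 (Latin1), dropping characters that
 * Latin1 cannot represent.
 *
 * Returns the converted string (as ISO-8859-1 bytes, not UTF-8) together
 * with the list of characters that were dropped.
 */
function sanitizeLatin1($input) {
    $output = '';
    $unsupportedChars = [];
    // Compute the length once instead of on every loop iteration
    $length = mb_strlen($input, 'UTF-8');
    // Loop through each character in the input string
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($input, $i, 1, 'UTF-8');
        // Convert the character to Latin1 encoding; @ suppresses the notice
        // iconv raises for characters it cannot convert
        $latin1Char = @iconv('UTF-8', 'ISO-8859-1//IGNORE', $char);
        // If conversion is successful, append to output string
        if ($latin1Char !== false && strlen($latin1Char) > 0) {
            $output .= $latin1Char;
        } else {
            // If conversion fails, add to unsupported characters list
            $unsupportedChars[] = $char;
        }
    }
    return [
        'sanitized' => $output,
        'unsupported' => $unsupportedChars
    ];
}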
// Example usage
$input = "This is a test string with unsupported chars: 😀, 你好, привет";
$result = sanitizeLatin1($input);
echo "Sanitized string: " . $result['sanitized'] . "\n";
echo "Unsupported characters: " . implode(', ', $result['unsupported']) . "\n";
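
The per-character iconv check above could also be done by code point, since ISO-8859-1 maps its 256 byte values directly onto U+0000 through U+00FF. The following is only a sketch of that alternative, not the gist's method: the function name `sanitizeLatin1ByCodePoint` and its return keys are hypothetical, and it assumes valid UTF-8 input, the mbstring extension, and PHP 7.4 or later (for `mb_str_split`).

```php
<?php
// Hypothetical variant of sanitizeLatin1(); name and return keys are illustrative.
function sanitizeLatin1ByCodePoint(string $input): array
{
    $kept = '';
    $unsupported = [];

    // mb_str_split() walks the string once instead of calling mb_substr() per index.
    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $codePoint = mb_ord($char, 'UTF-8');
        // A character fits in ISO-8859-1 only if its code point is at most U+00FF.
        if ($codePoint !== false && $codePoint <= 0xFF) {
            $kept .= $char;
        } else {
            // Using the character as an array key de-duplicates repeats.
            $unsupported[$char] = true;
        }
    }

    return [
        // Still UTF-8: safe to echo on a UTF-8 terminal or page.
        'sanitized_utf8'   => $kept,
        // ISO-8859-1 bytes: the form a Latin1 client connection would send.
        'sanitized_latin1' => mb_convert_encoding($kept, 'ISO-8859-1', 'UTF-8'),
        'unsupported'      => array_keys($unsupported),
    ];
}

$result = sanitizeLatin1ByCodePoint("Café 😀 你好");
echo "Sanitized string: " . $result['sanitized_utf8'] . "\n";
echo "Unsupported characters: " . implode(', ', $result['unsupported']) . "\n";
```

Returning both encodings is a design choice for the sketch: the UTF-8 copy is for display, while the Latin1 copy is what would be written if the connection's client encoding is LATIN1.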