Last active
March 2, 2023 06:48
Sanitize a string using [unicode_set](https://hex.pm/packages/unicode_set)
defmodule Sanitize do
  # Unicode sets are defined at https://unicode-org.github.io/icu/userguide/strings/unicodeset.html
  require Unicode.Set

  # Defines a guard that is the intersection of alphanumerics and the latin script plus the
  # space and underscore characters. Note that the set is resolved at compile time into an
  # integer expression and is therefore acceptably performant at runtime.
  defguard latin_alphanum(c) when Unicode.Set.match?(c, "[[:Alnum:]&[:script=Latin:][_\\ ]]")

  def sanitize_string(<<"">>), do: ""

  def sanitize_string(<<c::utf8, rest::binary>>) when latin_alphanum(c),
    do: <<c::utf8, sanitize_string(rest)::binary>>

  def sanitize_string(<<_c::utf8, rest::binary>>), do: sanitize_string(rest)
end
Example
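The module above walks the binary one UTF-8 codepoint at a time and keeps only the codepoints accepted by the guard. The following is a minimal sketch of that same recursion with no dependencies, so it can be run with plain `elixir`. The module name `AsciiSanitize` and the guard `ascii_alphanum/1` are hypothetical stand-ins and are not part of the gist; the guard accepts ASCII letters, digits, underscore and space only, which is a much narrower set than the `unicode_set` expression. The final clause that drops invalid UTF-8 bytes is also an assumption added for the sketch; the original module has no such clause.

```elixir
defmodule AsciiSanitize do
  # Hypothetical ASCII-only stand-in for the Unicode.Set-based guard.
  # Accepts a-z, A-Z, 0-9, underscore and space.
  defguard ascii_alphanum(c)
           when c in ?a..?z or c in ?A..?Z or c in ?0..?9 or c == ?_ or c == ?\s

  # Base case: nothing left to process.
  def sanitize_string(<<>>), do: ""

  # Keep the codepoint when the guard accepts it.
  def sanitize_string(<<c::utf8, rest::binary>>) when ascii_alphanum(c),
    do: <<c::utf8, sanitize_string(rest)::binary>>

  # Drop any other valid codepoint.
  def sanitize_string(<<_c::utf8, rest::binary>>), do: sanitize_string(rest)

  # Assumption for this sketch: drop a byte that is not valid UTF-8
  # instead of raising FunctionClauseError.
  def sanitize_string(<<_byte, rest::binary>>), do: sanitize_string(rest)
end

IO.inspect(AsciiSanitize.sanitize_string("Héllo, wörld_1!"))
# prints "Hllo wrld_1"
```

With the gist's own module, the call has the same shape: `Sanitize.sanitize_string("some input")`. Which characters survive depends on how `unicode_set` resolves the set expression at compile time.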