@liukun
Last active November 11, 2024 09:02
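-- Percent-encode a single character as %XX (uppercase hex).
local char_to_hex = function(c)
  return string.format("%%%02X", string.byte(c))
end

-- Encode a string for use in a URL: everything except alphanumerics is
-- percent-encoded, and spaces become "+".
local function urlencode(url)
  if url == nil then
    return
  end
  -- Normalize line breaks to CRLF without doubling an existing CR.
  url = url:gsub("\r?\n", "\r\n")
  url = url:gsub("([^%w ])", char_to_hex)
  url = url:gsub(" ", "+")
  return url
end

-- Convert a two-digit hex string back into the character it encodes.
local hex_to_char = function(x)
  return string.char(tonumber(x, 16))
end

-- Reverse of urlencode: "+" becomes a space, then %XX sequences are decoded.
local urldecode = function(url)
  if url == nil then
    return
  end
  -- "+" is a magic character in Lua patterns, so it is escaped to match literally.
  url = url:gsub("%+", " ")
  url = url:gsub("%%(%x%x)", hex_to_char)
  return url
end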
-- ref: https://gist.github.com/ignisdesign/4323051
-- ref: http://stackoverflow.com/questions/20282054/how-to-urldecode-a-request-uri-string-in-lua
-- to encode table as parameters, see https://github.com/stuartpb/tvtropes-lua/blob/master/urlencode.lua
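
The last comment above points elsewhere for encoding a table as parameters. As a rough illustration only, here is a minimal sketch of how that could look on top of urlencode. The helper name encode_params is hypothetical and not part of this gist, and the sketch assumes a flat table with string keys; keys are sorted only to make the output deterministic.

```lua
-- Copies of the gist's helpers so this block runs on its own.
local function char_to_hex(c)
  return string.format("%%%02X", string.byte(c))
end

local function urlencode(s)
  s = s:gsub("\n", "\r\n")
  s = s:gsub("([^%w ])", char_to_hex)
  s = s:gsub(" ", "+")
  return s
end

-- encode_params is a hypothetical helper (an assumption, not from the gist).
-- It expects a flat table whose keys are strings.
local function encode_params(params)
  local keys = {}
  for k in pairs(params) do
    keys[#keys + 1] = k
  end
  table.sort(keys) -- pairs() has no defined order, so sort for stable output

  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = urlencode(k) .. "=" .. urlencode(tostring(params[k]))
  end
  return table.concat(parts, "&")
end

print(encode_params({ q = "lua url encode", page = 2 }))
-- prints: page=2&q=lua+url+encode
```

Nested tables and repeated keys are deliberately out of scope here; the linked urlencode.lua is the place to look for a fuller treatment.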
@DanielVukelich

The pattern passed to gsub in urlencode, "([^%w ])", is not the right one for URL-safe characters. This code will unnecessarily percent-encode the characters . _ - ~ even though they are already safe (unreserved) per https://tools.ietf.org/html/rfc3986#section-2.3. The correct one would be:
str = string.gsub(str, "([^%w _ %- . ~])", char_to_hex)

@deltanedas

you forgot to escape some patterns too bro
str = string.gsub(str, "([^%w _%%%-%.~])", char_to_hex)
If it isn't alphanumeric, a space, an underscore, a % sign, a -, a . or a ~, it will be encoded.
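
To see how the three patterns in this thread differ, a small comparison can help. This is only a sketch: the table keys and the encode_with helper are made-up names for illustration, and only the percent-encoding step is applied (no space-to-plus conversion).

```lua
local function char_to_hex(c)
  return string.format("%%%02X", string.byte(c))
end

-- Patterns quoted from the gist and from the two comments above.
local patterns = {
  original   = "([^%w ])",
  vukelich   = "([^%w _ %- . ~])",
  deltanedas = "([^%w _%%%-%.~])",
}

-- encode_with is a hypothetical helper: apply one pattern to a string.
local function encode_with(pattern, s)
  -- Parentheses drop gsub's second return value (the match count).
  return (s:gsub(pattern, char_to_hex))
end

local sample = "a-b_c.d~e 50%"
for _, name in ipairs({ "original", "vukelich", "deltanedas" }) do
  print(name, encode_with(patterns[name], sample))
end
-- original    a%2Db%5Fc%2Ed%7Ee 50%25
-- vukelich    a-b_c.d~e 50%25
-- deltanedas  a-b_c.d~e 50%
```

Note that the last pattern lists % as safe, so a literal percent sign passes through unencoded. A decoder may then treat it as the start of an escape sequence.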

@kids0407

Thanks for the gist. URLs can also be decoded/encoded online using the UrlTools.org tool.

@kqvanity

@DanielVukelich
Which option is the most resilient: yours, OP's, or the one by @deltanedas?

@liukun
Author

liukun commented Sep 14, 2024

@kqvanity I think you can just try some cases yourself. There are many preferences when doing the encoding. I opted for a conservative approach that prioritizes a broader range of characters for percent-encoding. This choice ensures compatibility across various systems and contexts, even if it means encoding characters that RFC 3986, section 2.3, lists as safe (such as ., _, -, ~).

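For environments that require strict adherence to the RFC, the pattern can be adjusted along the lines suggested above to exclude these characters from encoding. The % sign itself must stay out of the safe set, so that literal percent signs are still encoded:
url = url:gsub("([^%w _%-%.~])", char_to_hex)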
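
In the spirit of trying some cases yourself, here is a minimal round-trip sketch for an RFC-style variant. The name urlencode_unreserved is hypothetical, the safe set is an assumption (alphanumerics plus - . _ ~, with % still encoded), and the newline-to-CRLF step is left out so that decoding returns exactly the input.

```lua
local function char_to_hex(c)
  return string.format("%%%02X", string.byte(c))
end

local function hex_to_char(x)
  return string.char(tonumber(x, 16))
end

-- urlencode_unreserved is a hypothetical name, not part of the gist.
-- Assumption: "%" is NOT treated as safe, so literal percent signs get encoded.
local function urlencode_unreserved(s)
  s = s:gsub("([^%w _%-%.~])", char_to_hex)
  s = s:gsub(" ", "+")
  return s
end

local function urldecode(s)
  s = s:gsub("%+", " ")
  s = s:gsub("%%(%x%x)", hex_to_char)
  return s
end

-- A few cases to try; each one should survive encode followed by decode.
local cases = { "file-name_v1.0~bak", "100% sure", "a+b=c&d" }
for _, input in ipairs(cases) do
  local encoded = urlencode_unreserved(input)
  assert(urldecode(encoded) == input, "round trip failed for: " .. input)
  print(input, "->", encoded)
end
```

If % were added to the safe set, an input such as "100%25" would come back from urldecode as "100%", which is why this sketch keeps encoding it.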
