Last active
December 14, 2016 16:59
Regex for fixing improperly formatted URIs
/*
 * Snippet to percent-encode invalid characters in improperly formatted URIs
 *
 * RFC 3986 allows only the following characters in a URI ('%' is additionally
 * permitted as the start of a percent-encoded octet):
 * ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=
 */
import java.net.URLEncoder

// Match any character outside that set, or a '%' that is not followed by two hex digits.
// Note: URLEncoder performs form encoding, so a space becomes "+" rather than "%20".
"""[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]|%(?![0-9a-fA-F]{2})""".r.
  replaceAllIn("http://testingurl.com/?foo%20bar baz", m => URLEncoder.encode(m.group(0), "UTF-8"))
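One possible way to reuse the snippet is to wrap the pattern in a small function. The following is a sketch under stated assumptions, not part of the original gist: the names `UriFixer` and `fixUri` are hypothetical, the backtick is left out of the allowed set because RFC 3986 does not list it, and a negative lookahead is used so that a `%` not followed by two hexadecimal digits is encoded on its own.

```scala
import java.net.URLEncoder

// UriFixer and fixUri are hypothetical names used for illustration only.
object UriFixer {
  // Any character outside the RFC 3986 set (with '%' excluded from the class),
  // or a '%' that does not start a valid percent-encoded octet.
  private val Invalid =
    """[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]|%(?![0-9a-fA-F]{2})""".r

  // Percent-encode every invalid match; everything else is left untouched.
  def fixUri(uri: String): String =
    Invalid.replaceAllIn(uri, m => URLEncoder.encode(m.group(0), "UTF-8"))
}

// Example calls:
UriFixer.fixUri("http://testingurl.com/?foo%20bar baz") // "http://testingurl.com/?foo%20bar+baz"
UriFixer.fixUri("http://example.com/a|b")               // "http://example.com/a%7Cb"
UriFixer.fixUri("http://example.com/?q=100%")           // "http://example.com/?q=100%25"
```

Because `URLEncoder` implements form encoding, a space is emitted as `+`. That reads as a space only inside a query string, so a caller who needs `%20` in path segments would have to post-process the result.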