http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding
https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third
Part | Data |
---|---|
Scheme | https |
User | bob |
Password | bobby |
Host address | www.lunatech.com |
Port | 8080 |
Path | /file |
Path parameter | p=1 |
Query parameter | q=2 |
Fragment | third |
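As a rough illustration of the table above, the sketch below pulls the same parts out of the example URL with the JDK's `java.net.URI`. The class name `UrlPartsDemo` and the `describe` helper are made up for this note. One caveat: `URI` reports user and password together as user info, and leaves the path parameter inside the path, so those two rows would still need manual splitting.

```java
import java.net.URI;

final class UrlPartsDemo {
    /** Returns the raw (still percent-encoded) components of a URL, one per line. */
    static String describe(String url) {
        URI uri = URI.create(url);
        return String.join("\n",
            "scheme=" + uri.getScheme(),
            "userInfo=" + uri.getRawUserInfo(),   // user and password, joined by ":"
            "host=" + uri.getHost(),
            "port=" + uri.getPort(),
            "path=" + uri.getRawPath(),           // path parameters are not split off
            "query=" + uri.getRawQuery(),
            "fragment=" + uri.getRawFragment());
    }

    public static void main(String[] args) {
        System.out.println(describe("https://bob:bobby@www.lunatech.com:8080/file;p=1?q=2#third"));
    }
}
```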
An HTTP URL is a URL with the http or https scheme.
"/photos/egypt/cairo/first.jpg" has four path segments: "photos", "egypt", "cairo" and "first.jpg"
Each path segment can have optional path parameters (also known as matrix parameters), e.g. /photos;px=1;py=2/egypt/..
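A minimal sketch of that structure, assuming the path is still in its raw (encoded) form: split on "/" to get the segments, then on ";" to separate each segment's name from its matrix parameters. `PathSegmentsDemo` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSegmentsDemo {
    /** Splits a raw path into segment names, dropping any matrix parameters. */
    static List<String> segmentNames(String rawPath) {
        List<String> names = new ArrayList<>();
        for (String segment : rawPath.split("/")) {
            if (segment.isEmpty()) continue;      // the leading "/" yields an empty element
            int semi = segment.indexOf(';');
            names.add(semi < 0 ? segment : segment.substring(0, semi));
        }
        return names;
    }

    /** Returns the matrix parameters of one raw segment, e.g. "photos;px=1;py=2". */
    static List<String> matrixParams(String rawSegment) {
        String[] parts = rawSegment.split(";");
        List<String> params = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {  // parts[0] is the segment name
            params.add(parts[i]);
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(segmentNames("/photos;px=1;py=2/egypt/cairo/first.jpg"));
        System.out.println(matrixParams("photos;px=1;py=2"));
    }
}
```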
Reserved characters such as "://", "/", "?" and "&" must be URL-encoded when they appear as data rather than as delimiters, e.g. http://example.com/xyz?.jpg needs to be encoded to http://example.com/xyz%3F.jpg
ASCII characters do not need to be escaped, except for the reserved characters.
For non-ASCII characters, we must know which character encoding was used to encode them.
The latest version of the URI standard defines that new URI schemes and host names use UTF-8, but what about the path?
In a path segment, a space is encoded to %20, while "+" can be left unencoded.
In the query part, a space can be encoded to either "+" (for backwards compatibility) or %20, while a literal "+" is encoded to %2B.
For example, "blue+light blue" used as both the path segment and the query: http://example.com/blue+light%20blue?blue%2Blight+blue
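The path half of that example can be reproduced with a small percent-encoder. This is a sketch only, and it assumes UTF-8 for non-ASCII input. `PercentEncodingDemo` and `encodePathSegment` are hypothetical names, not the `URLUtils` helper used later in these notes. It leaves the unreserved characters and `:@!$&'()*+,;=` untouched and escapes every other byte.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodingDemo {
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    // Extra characters that may stay unescaped inside a single path segment.
    private static final String PATH_SEGMENT_EXTRA = ":@!$&'()*+,;=";

    /** Percent-encodes every UTF-8 byte that is not explicitly allowed. */
    static String encode(String s, String extraAllowed) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean ascii = b >= 0;               // negative bytes belong to multi-byte sequences
            if (ascii && (UNRESERVED.indexOf(c) >= 0 || extraAllowed.indexOf(c) >= 0)) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    static String encodePathSegment(String segment) {
        return encode(segment, PATH_SEGMENT_EXTRA);
    }

    public static void main(String[] args) {
        System.out.println(encodePathSegment("blue+light blue")); // blue+light%20blue
        System.out.println(encodePathSegment("a/b?c"));           // a%2Fb%3Fc
    }
}
```

Passing a different `extraAllowed` set is one way to reuse the same loop for the other URL parts, since each part allows a different set of characters.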
- "?" is allowed unescaped anywhere within a query part,
- "/" is allowed unescaped anywhere within a query part,
- "=" is allowed unescaped anywhere within a path parameter or query parameter value, and within a path segment,
- ":@-._~!$&'()*+,;=" are allowed unescaped anywhere within a path segment part,
- "/?:@-._~!$&'()*+,;=" are allowed unescaped anywhere within a fragment part.
Analysis of reserved chars and URL parts has to be done before URL-decoding.
The implication is that URL-rewriting filters should NEVER decode a URL before attempting to match it if reserved chars are allowed to be URL-encoded.
String pathSegment = "a/b?c";
// BAD: yields http://example.com/a/b?c, where "/" and "?" are misread as delimiters
String badUrl = "http://example.com/" + pathSegment;
// GOOD: yields http://example.com/a%2Fb%3Fc
String goodUrl = "http://example.com/" + URLUtils.encodePathSegment(pathSegment);
// BAD
// "http://example.com/?query=a&b==c" is not what we want
// "http://example.com/?query=a%26b==c" is what we want
String value = "a&b==c";
String url = "http://example.com/?query=" + value;
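A possible GOOD counterpart for the query case, sketched with the JDK's `java.net.URLEncoder` instead of any `URLUtils` method (the `Charset` overload assumes Java 10 or later). `QueryValueDemo` and `buildUrl` are names invented here. `URLEncoder` follows form-encoding rules, so it also escapes "=" and writes spaces as "+". That is stricter than the a%26b==c form above, but it decodes to the same value.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryValueDemo {
    /** Builds a URL whose single query parameter carries an arbitrary value. */
    static String buildUrl(String value) {
        // Encode the value on its own, before it is glued into the URL.
        return "http://example.com/?query=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildUrl("a&b==c");
        System.out.println(url);                  // http://example.com/?query=a%26b%3D%3Dc
        String rawValue = url.substring(url.indexOf('=') + 1);
        System.out.println(URLDecoder.decode(rawValue, StandardCharsets.UTF_8)); // a&b==c
    }
}
```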
Parsing a URL should happen before URL-decoding, but getPath() returns the already-decoded path, so splitting its result parses after decoding.
URI uri = new URI("http://example.com/a%2Fb%3Fc");
// BAD
for(String pathSegment : uri.getPath().split("/"))
System.err.println(pathSegment);
// GOOD
for(String pathSegment : uri.getRawPath().split("/"))
System.err.println(URLUtils.decodePathSegment(pathSegment));
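To see the difference concretely, here is a self-contained sketch using only JDK classes. `RawPathDemo` and `decodePathSegment` are stand-ins written for this note, not the `URLUtils` API. Because `URLDecoder` treats "+" as a space (form-encoding rules), the stand-in protects literal "+" characters before decoding.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RawPathDemo {
    /** Decodes one raw path segment; "+" stays a literal plus sign. */
    static String decodePathSegment(String raw) {
        return URLDecoder.decode(raw.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    /** Splits the raw path first, then decodes each segment. */
    static List<String> segments(URI uri) {
        List<String> result = new ArrayList<>();
        for (String raw : uri.getRawPath().split("/")) {
            if (!raw.isEmpty()) result.add(decodePathSegment(raw));
        }
        return result;
    }

    /** Splits the already-decoded path, so an encoded "/" becomes a separator. */
    static List<String> naiveSegments(URI uri) {
        List<String> result = new ArrayList<>();
        for (String part : uri.getPath().split("/")) {
            if (!part.isEmpty()) result.add(part);
        }
        return result;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/a%2Fb%3Fc");
        System.out.println(naiveSegments(uri));   // [a, b?c]
        System.out.println(segments(uri));        // [a/b?c]
    }
}
```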
// In HTML
// BAD
var url = "#{vl:encodeURL(contextPath + "/view/" + resource.name)}";
// GOOD
var url = "#{contextPath}/view/#{vl:encodeURLPathSegment(resource.name)}";