@vladak
Last active April 1, 2025 07:15
java URL constructor deprecation

context: JDK 21 support: oracle/opengrok#4459

existing PR from Lubos contains some of the changes: oracle/opengrok#4570

Specifically, all the URL() constructors are deprecated in Java 21 (the deprecation was introduced in Java 20): https://docs.oracle.com/en/java/javase/21/docs/api//java.base/java/net/URL.html. The only way to get a URL object is to create a URI object first and then convert it with the toURL() method (the URL.of() static method requires a URI argument as well).
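
A minimal sketch of that route, assuming the input string is already percent-encoded. The class and method names (UrlFromString, toUrl) are made up for illustration and are not OpenGrok code:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlFromString {
    // Hypothetical helper: parse the string as a URI first, then convert to URL.
    // This avoids the deprecated URL(String) constructor.
    static URL toUrl(String encoded) {
        try {
            return new URI(encoded).toURL();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // URISyntaxException: illegal (unencoded) characters in the string
            // MalformedURLException: no protocol handler for the scheme
            // IllegalArgumentException: the URI is not absolute
            throw new IllegalArgumentException("not a valid absolute URL: " + encoded, e);
        }
    }
}
```

With this sketch, UrlFromString.toUrl("http://example.com/bug?foo%20") succeeds, while the same string with a raw trailing space is rejected.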

Fun fact: a URL can establish a connection to the target using the openConnection() method (even via a proxy).

My note: oracle/opengrok#4570 (comment):

I took a longer look at the departure from the URL() constructors yesterday. The URL class by itself does not support any encoding/decoding; it merely breaks down the URL into pieces. The encoding is then supplied by the URI class.

The problem is that URI(String) accepts only already encoded URIs and refuses any non-compliant characters. I think the way to go is to propagate the URL object all the way where possible, i.e. change the String url parameter type of the various linkify() and buildLink() methods to URL url and avoid passing the url in the map of attributes as String - pass it as an extra argument with the URL type.

It is still not clear to me where the unescaped characters could originate from.

The URI(String) constructor requires that the URI in the string is already encoded:

jshell> URI uri = new URI("http://example.com/bug?foo ");
|  Exception java.net.URISyntaxException: Illegal character in query at index 26: http://example.com/bug?foo 
|        at URI$Parser.fail (URI.java:2976)
|        at URI$Parser.checkChars (URI.java:3147)
|        at URI$Parser.parseHierarchical (URI.java:3235)
|        at URI$Parser.parse (URI.java:3177)
|        at URI.<init> (URI.java:623)
|        at (#24:1)

jshell> URI uri = new URI("http://example.com/bug?foo%20");
uri ==> http://example.com/bug?foo%20

However, when using one of the hierarchical URI constructors, the individual parts can contain other characters and the URI class will encode them:

jshell> URI uri = new URI("http", "example.com", "/bug", "foo ", null);
uri ==> http://example.com/bug?foo%20

The documentation states this behavior:

The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.
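
One consequence of the percent character always being quoted is that already encoded input gets encoded twice. A small sketch to illustrate, where the class and method names are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PercentQuoting {
    // Builds a URI with the five-argument constructor:
    // URI(scheme, authority, path, query, fragment).
    static URI fromParts(String query) {
        try {
            return new URI("http", "example.com", "/bug", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

PercentQuoting.fromParts("foo ") yields http://example.com/bug?foo%20, whereas PercentQuoting.fromParts("foo%20") yields http://example.com/bug?foo%2520. The multi-argument constructors are therefore only suitable for components that are known to be unencoded.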

The encoding is preserved when converting URI to URL:

jshell> URI uri = new URI("http", "example.com", "/bug", "foo ", null);
uri ==> http://example.com/bug?foo%20

jshell> URL url = uri.toURL();
url ==> http://example.com/bug?foo%20

jshell> url.toString()
$3 ==> "http://example.com/bug?foo%20"
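
The difference between the two classes can also be seen from their getters. The following sketch (hypothetical class name) collects the encoded and decoded views of the same query; URI offers both, while URL only hands back what it stores:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class RawVersusDecoded {
    // Returns { URI raw query, URI decoded query, URL query } for the example above.
    static String[] queryViews() {
        try {
            URI uri = new URI("http", "example.com", "/bug", "foo ", null);
            URL url = uri.toURL();
            return new String[] {
                uri.getRawQuery(), // encoded form kept by the URI
                uri.getQuery(),    // decoded by the URI class
                url.getQuery()     // URL performs no decoding
            };
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The three values are "foo%20", "foo " and "foo%20", respectively.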

The problem is that in OpenGrok the URL usually arrives as a String and only rarely in pieces. These are the places:

  • history.jsp: Util.linkifyPattern(cout, bugPattern, "$1", Util.completeUrl(bugPage + "$1", request));
    • the bugPage and bugPattern come from the configuration
  • history.jsp: Util.linkifyPattern(cout, reviewPattern, "$1", Util.completeUrl(reviewPage + "$1", request));
    • the reviewPage and reviewPattern come from the configuration
  • repository.tag: Util.linkify(ObjectUtils.defaultIfNull(Util.redactUrl(repositoryInfo.parent), "N/A"))}
    • this displays the parent/origin of a repository, comes from the repository (git remote equivalent for Git, hg paths default for Mercurial, etc.)

The Util.encodeURL() method as it is currently coded basically abuses the URL/URI relationship to provide an encoded URL:

    /**
     * Encode URL.
     *
     * @param urlStr string URL
     * @return the encoded URL
     * @throws URISyntaxException URI syntax
     * @throws MalformedURLException URL malformed
     */
    public static String encodeURL(String urlStr) throws URISyntaxException, MalformedURLException {
        URL url = new URL(urlStr);
        URI constructed = new URI(url.getProtocol(), url.getUserInfo(),
                url.getHost(), url.getPort(),
                url.getPath(), url.getQuery(), url.getRef());
        return constructed.toString();
    }
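
For comparison, here is a sketch of what splitting the string without the URL class could look like. It is an assumption of mine, not OpenGrok code: it uses the component-splitting regular expression from RFC 3986, Appendix B, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EncodeWithoutUrl {
    // Regular expression from RFC 3986, Appendix B:
    // group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    static String encode(String urlStr) throws URISyntaxException {
        Matcher m = URI_PARTS.matcher(urlStr);
        if (!m.matches()) {
            throw new URISyntaxException(urlStr, "cannot split into components");
        }
        // Groups that did not participate in the match are null, which the
        // constructor treats as an undefined component.
        URI constructed = new URI(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9));
        return constructed.toString();
    }
}
```

EncodeWithoutUrl.encode("http://example.com/foo \"") gives http://example.com/foo%20%22. The sketch inherits the double-encoding behavior of the multi-argument constructors, so input that is already encoded (e.g. ?foo%20) comes out as ?foo%2520.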

Encoding the whole string with URLEncoder does not work, because it performs HTML form encoding and therefore also escapes the scheme and path delimiters:

jshell> URLEncoder.encode("http://example.com/foo \"")
$22 ==> "http%3A%2F%2Fexample.com%2Ffoo+%22"
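
URLEncoder is meant for individual form values, so it can still be useful for a single query parameter value. A sketch under the assumption that the base string is already valid and ends with something like "?id=" (class name, method name and the parameter are hypothetical):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryValueEncoding {
    // Encodes only the value and appends it to an already valid base.
    static URI bugLink(String base, String bugId) {
        String encoded = URLEncoder.encode(bugId, StandardCharsets.UTF_8);
        return URI.create(base + encoded);
    }
}
```

QueryValueEncoding.bugLink("http://example.com/bug?id=", "foo \"") produces http://example.com/bug?id=foo+%22. Note that the space becomes "+", which is only appropriate in the query part, not in the path.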

The conclusion is that the responsibility for URL encoding needs to be pushed out of the OpenGrok code, unless we reimplement the breakdown of a String into URL components. For the bug/review pages it falls onto those who set these in the configuration, and for the repositories onto those who configure them.

For the latter I did a quick check:

  • Mercurial seems to encode the URLs:
$ grep -A 2 '\[paths\]' ~/.hgrc
[paths]
default = http://example.com:443/foo"bar
$ hg paths default
http://example.com:443/foo%22bar
  • Git does not (at least the Git CLI which is different to JGit used in OpenGrok):
/tmp/gitrepo (master)$ git remote add foo 'http://example.com:80/foo"bar'
/tmp/gitrepo (master)$ git remote -v
foo	http://example.com:80/foo"bar (fetch)
foo	http://example.com:80/foo"bar (push)

The way it works in linkifyPattern() is that it constructs the link first, e.g. <a href="http://example.com/bug$1" rel="noreferrer" target="_blank">$1</a>, and then uses Pattern substitution so that $1 gets replaced with the first group of the matched regular expression. The trouble with this approach is that the "$1" string is attached to the URL and set in the title before the matched text can be properly encoded. The $ character is allowed within a URI, so it works; however, this feels like it works by accident rather than by design.
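
To make the difference concrete, here is a sketch of both variants. It is not the actual linkifyPattern() code: the class, the method names and the assumption that the URL prefix ends in a query parameter are mine, and HTML escaping of the link text is left out for brevity.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkifySketch {
    // Template approach as described above: "$1" is placed into the link first
    // and substituted by the regex engine afterwards, unencoded.
    static String withTemplate(String text, Pattern pattern, String urlPrefix) {
        String link = "<a href=\"" + urlPrefix + "$1\" rel=\"noreferrer\" target=\"_blank\">$1</a>";
        return pattern.matcher(text).replaceAll(link);
    }

    // Alternative: encode the matched group first, then build the link per match.
    static String withEncodedGroup(String text, Pattern pattern, String urlPrefix) {
        return pattern.matcher(text).replaceAll(match -> {
            String group = match.group(1);
            String encoded = URLEncoder.encode(group, StandardCharsets.UTF_8);
            String link = "<a href=\"" + urlPrefix + encoded
                    + "\" rel=\"noreferrer\" target=\"_blank\">" + group + "</a>";
            // replaceAll() still interprets '$' and '\' in the returned string,
            // so the finished link has to be quoted.
            return Matcher.quoteReplacement(link);
        });
    }
}
```

For purely numeric bug IDs both variants produce the same link. They differ once the matched group contains characters that need encoding, e.g. a space ends up verbatim in the href with the template variant and as "+" with the per-match variant.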
