@hugmanrique
Last active October 26, 2020 05:34
Paper PR about RegionFiles

Currently, the RegionFile class (line 52) extends the file if its size is not a multiple of 4 KB, but does so in an inefficient way:

for(int i = 0; i < (this.file.length() & 0xfff); i++) {
        this.file.write(0);
}

That loop can issue up to 4095 single-byte writes, each of which extends the file, when the whole extension could be done in one call with the following code:

long fileLength = this.file.length();

// Check if the file needs to be extended (low 12 bits set means it is not a multiple of 4 KB)
if ((fileLength & 0xFFFL) != 0) {

  // Extend the file to the next multiple of 4 KB in a single call
  this.file.setLength((fileLength | 0xFFFL) + 1);
}
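To make the mask arithmetic concrete, here is a minimal, self-contained sketch. The class name `RegionFilePadding` and its helpers are hypothetical and are not part of the Paper or Minecraft sources. It assumes a 4 KB sector size, so the mask is 0xFFF (4096 - 1), the value with the low 12 bits set.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper class, for illustration only.
final class RegionFilePadding {
    // Assumed sector size of 4 KB; the mask selects the low 12 bits.
    static final long SECTOR_SIZE = 4096L;
    static final long SECTOR_MASK = SECTOR_SIZE - 1; // 0xFFF

    // Rounds length up to the next multiple of 4 KB; aligned lengths are returned unchanged.
    static long alignedLength(long length) {
        return (length & SECTOR_MASK) == 0 ? length : (length | SECTOR_MASK) + 1;
    }

    // Extends the file with a single setLength call if it is not sector-aligned.
    static void padToSector(RandomAccessFile file) throws IOException {
        long length = file.length();
        long aligned = alignedLength(length);
        if (aligned != length) {
            file.setLength(aligned);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("region", ".mca");
        tmp.deleteOnExit();
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            file.write(new byte[5000]);        // unaligned length: 5000 bytes
            padToSector(file);
            System.out.println(file.length()); // prints 8192
        }
    }
}
```

Keeping the rounding in a pure function such as `alignedLength` makes the edge cases (length 0, length 1, an already aligned length) easy to check without touching the disk.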

Now, the RandomAccessFile javadocs say the following:

If the present length of the file as returned by the length method is smaller than the newLength argument then the file will be extended. In this case, the contents of the extended portion of the file are not defined.

There's a closed issue on the OpenJDK bug tracker (https://bugs.openjdk.java.net/browse/JDK-6606216) where the reporter shows that Windows does in fact zero-fill the bytes between the old and the new EOF. On POSIX-compliant systems (and on Linux), this behavior is guaranteed:

The lseek() function shall allow the file offset to be set beyond the end of the existing data in the file. If data is later written at this point, subsequent reads of data in the gap shall return bytes with the value 0 until data is actually written into the gap. Source: http://pubs.opengroup.org/onlinepubs/009695399/functions/lseek.html

(The GNU Classpath implementation of RandomAccessFile#setLength relies on this seek behavior internally: http://developer.classpath.org/doc/java/io/RandomAccessFile-source.html)
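The zero-fill behavior can also be checked empirically on a given platform and file system. The following is only a sketch: the class `ZeroFillCheck` and its method are hypothetical names, and a passing result on one machine does not turn the behavior into a guarantee of the Java API.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Hypothetical check, for illustration only.
final class ZeroFillCheck {
    // Writes oldLength non-zero bytes, extends the file to newLength with setLength,
    // then reports whether every byte in the extended region reads back as 0.
    static boolean extensionIsZeroFilled(int oldLength, int newLength) throws IOException {
        File tmp = File.createTempFile("zerofill", ".bin");
        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            byte[] data = new byte[oldLength];
            Arrays.fill(data, (byte) 0xFF);
            file.write(data);

            file.setLength(newLength);

            byte[] gap = new byte[newLength - oldLength];
            file.seek(oldLength);
            file.readFully(gap);
            for (byte b : gap) {
                if (b != 0) {
                    return false;
                }
            }
            return true;
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Extended region zero-filled: " + extensionIsZeroFilled(100, 4096));
    }
}
```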

Every major operating system (Windows, macOS and Linux) follows this behavior (probably for security reasons as well, since it avoids exposing stale disk contents), but because the Javadoc leaves it unspecified, I think this should be discussed.
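If relying on unspecified contents turns out to be a concern, one conservative alternative is to write the zero padding explicitly in a single call. This still avoids the byte-by-byte loop. The sketch below is not the change proposed above; `ExplicitPadding` and `padWithZeros` are hypothetical names, and a 4 KB sector size is assumed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical alternative, for illustration only.
final class ExplicitPadding {
    static final int SECTOR_SIZE = 4096; // assumed 4 KB sector size

    // Pads the file to the next multiple of 4 KB with one write of explicit zero bytes.
    static void padWithZeros(RandomAccessFile file) throws IOException {
        long length = file.length();
        int remainder = (int) (length & (SECTOR_SIZE - 1));
        if (remainder == 0) {
            return; // already aligned
        }
        long position = file.getFilePointer(); // remember where the caller was
        file.seek(length);
        // Java zero-initializes new arrays, so this writes only zero bytes.
        file.write(new byte[SECTOR_SIZE - remainder]);
        file.seek(position);
    }
}
```

The trade-off is one buffer allocation of at most 4095 bytes in exchange for contents that are defined by the Java API itself and not by the operating system.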
