Files are all over the Internet, and Go should provide a reasonable abstraction in its stdlib to handle this.
Files are everywhere. Local disk is the obvious place. But they're also in storage buckets like Google Cloud Storage and Amazon S3. They're in code repositories like GitHub and Google Source. They're in personal storage like Dropbox and Evernote. They're on other machines or filesystems on corporate Intranets. Tests which write to disk are much less error-prone if they write to volatile memory, such as a RAMFS.
Go 1's os package provides an os.File struct. The implicit assumption is that local disk storage is the only thing the stdlib should care about, and that external packages should provide access to these other sources. In 2007, this would have been a reasonable approach. In 2017, files are increasingly unlikely to be local.
For the Open Source Programs Office, we have a need to scan files, whether they exist on GitHub, Google Source, or locally. We want those scanners to be agnostic to how the file is opened and read, and those scanners need to be able to dictate how they walk the file system.
- io.Reader doesn't carry a path with it, so you'd need to put that in a structure along with it if you were to do any meaningful logging. You'll also need to carry along information about where the remote file repository is. You might be interested in other things, like os.Stat information. If you want to open other files from the same repository that a given file is in, you'll need a pointer to something that implements that too.
- filepath.Walk() takes a string and only walks the local disk.
- Not all systems you interact with can simulate a remote file repository as a local disk using FUSE, and this necessitates a new download dependency (separate from go get?).
- os.File is a struct, so you can't simulate a remote file repository as a local disk using native Go.
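To make the first point concrete, here is a minimal sketch of the kind of wrapper a scanner ends up passing around in place of a bare io.Reader. The names ScannedFile, Opener, Describe and newExample are hypothetical and exist only for illustration; they are not taken from any library.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// Opener is a hypothetical interface for whatever can open other files
// in the same repository as the current one.
type Opener interface {
	Open(path string) (io.ReadCloser, error)
}

// ScannedFile is a hypothetical wrapper that bundles everything a bare
// io.Reader does not carry.
type ScannedFile struct {
	io.Reader
	Path     string      // needed for meaningful logging
	Repo     string      // where the remote file repository is
	Info     os.FileInfo // os.Stat-style metadata; nil if unavailable
	Siblings Opener      // opens other files from the same repository
}

// Describe returns a label suitable for log messages.
func (f ScannedFile) Describe() string {
	return f.Repo + "/" + f.Path
}

// newExample builds a ScannedFile backed by an in-memory reader.
func newExample() ScannedFile {
	return ScannedFile{
		Reader: strings.NewReader("hello"),
		Path:   "README.md",
		Repo:   "github.com/golang/go",
	}
}

func main() {
	f := newExample()
	data, err := io.ReadAll(f)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes from %s\n", len(data), f.Describe())
}
```

Every scanner that accepts a ScannedFile is now coupled to this ad hoc type, which is the kind of friction a shared interface would avoid.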
As far as I know, there are two packages that deal with abstracting the file system:
At OSPO, we used Afero. We pass around an afero.Fs (file system) with a url.URL that has a fake scheme (e.g. github://golang/go/master/README.md), which the filesystem then parses and uses to find the correct method to retrieve a file (such as cloning the Git repository to /tmp and then pointing all future accesses to that cache).
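As a rough illustration of the parsing step, the sketch below uses only net/url to split such a fake-scheme URL into parts a backend could act on. The location type, the parseLocation helper, and the mapping of the URL host to a repository owner are assumptions made for this example; they do not describe Afero's API or our actual implementation.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// location is a hypothetical parsed form of a fake-scheme URL.
type location struct {
	Backend string // URL scheme, e.g. "github"; "file" when no scheme is given
	Owner   string // URL host, assumed here to name the repository owner
	Rest    string // remaining path without the leading slash
}

// parseLocation splits a raw URL into the pieces a backend would need.
func parseLocation(raw string) (location, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return location{}, err
	}
	if u.Scheme == "" {
		// No scheme: treat the input as a plain local path.
		return location{Backend: "file", Rest: u.Path}, nil
	}
	return location{
		Backend: u.Scheme,
		Owner:   u.Host,
		Rest:    strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	for _, raw := range []string{
		"github://golang/go/master/README.md",
		"/tmp/cache/README.md",
	} {
		loc, err := parseLocation(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", loc)
	}
}
```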
This is OK in practice, but forces the program and any API consumers to be aware of this library in a way that is inelegant. Having a standard means across all Go programs increases readability, reliability and code sharing.
Passing a filesystem as a parameter also has non-obvious limitations. For example, it is not possible to switch between repositories within the same function invocation: it has to use the file system it was passed. I bring this up not to discount that perhaps there was a better way to do it, but to note that such gotchas are hard to see before going far down an implementation, and that Gophers often look to the stdlib for the better way to do something.
This is very similar to how the sql package talks to different drivers. Outside of the main package, a Go program can be written almost entirely agnostic of the database driver. The main package can import the drivers it wants to allow, define a data source name adapter that means something to the driver, and pass around a *DB from that point on. This is fairly analogous to passing an afero.Fs. But it has all the benefits of being standardized: readability, writability, and the ability to share file system accessors trivially. Being able to simply download a new driver for a new database system in a matter of seconds without needing to rewrite code is awesome. This isn't possible for file repositories.
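For readers who have not looked at that pattern closely, here is a small runnable sketch of it. The stubDriver type and the canOpen helper are hypothetical and exist only so the example runs without a real database; sql.Register, sql.Open and driver.Driver are the standard library's own.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a hypothetical driver that never connects. It exists only
// so the registration pattern can run without a real database.
type stubDriver struct{}

// Open satisfies driver.Driver; this stub always refuses to connect.
func (stubDriver) Open(dsn string) (driver.Conn, error) {
	return nil, errors.New("stub: no database behind " + dsn)
}

// In a real program this registration lives in the driver's own package,
// and the main package pulls it in with a blank import.
func init() { sql.Register("stub", stubDriver{}) }

// canOpen reports whether a driver is registered under name. sql.Open
// checks the driver name but does not need to connect.
func canOpen(name string) bool {
	db, err := sql.Open(name, "example-dsn")
	if err != nil {
		return false
	}
	db.Close()
	return true
}

func main() {
	fmt.Println("stub registered:", canOpen("stub"))
	fmt.Println("missing registered:", canOpen("missing"))
}
```

Everything outside main and the driver package only ever sees a *sql.DB, which is what keeps the rest of the program driver-agnostic.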
I know that experience reports are supposed to be problems and not solutions, but in order to start the conversation, I'd take the approach from sql and modify it slightly. A Register() function can be used to register new file systems. os.Open is translated to something like fs.Open(driver, path string) fs.File (where fs.File is an interface, not a struct) and a new Walk(driver, root string, walkFn WalkFunc) is written that can walk these repositories.
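To give the conversation something to poke at, here is one possible shape for that idea, written as a single main package so it runs on its own. It is a sketch under several assumptions, not a proposed final API: the Driver interface, its List method, the error return on Open, the simplified WalkFunc signature, and the in-memory memDriver are all invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"sort"
	"strings"
)

// File is an interface rather than a struct, so any backend can supply one.
type File interface{ io.ReadCloser }

// Driver is a hypothetical contract a file system backend would implement.
type Driver interface {
	Open(path string) (File, error)
	List(root string) ([]string, error) // every file path under root
}

// WalkFunc is simplified here: it receives only a path and an error.
type WalkFunc func(path string, err error) error

var drivers = map[string]Driver{} // not goroutine-safe; fine for a sketch

// Register makes a backend available under name, much like sql.Register.
func Register(name string, d Driver) {
	drivers[name] = d
}

func lookup(name string) (Driver, error) {
	d, ok := drivers[name]
	if !ok {
		return nil, errors.New("unknown file system driver: " + name)
	}
	return d, nil
}

// Open resolves the driver by name and delegates to it.
func Open(driver, path string) (File, error) {
	d, err := lookup(driver)
	if err != nil {
		return nil, err
	}
	return d.Open(path)
}

// Walk calls walkFn for each file the driver lists under root.
func Walk(driver, root string, walkFn WalkFunc) error {
	d, err := lookup(driver)
	if err != nil {
		return err
	}
	paths, err := d.List(root)
	if err != nil {
		return walkFn(root, err)
	}
	for _, p := range paths {
		if err := walkFn(p, nil); err != nil {
			return err
		}
	}
	return nil
}

// memDriver is a toy in-memory backend mapping paths to contents.
type memDriver map[string]string

func (m memDriver) Open(path string) (File, error) {
	data, ok := m[path]
	if !ok {
		return nil, errors.New("mem: no such file: " + path)
	}
	return io.NopCloser(strings.NewReader(data)), nil
}

func (m memDriver) List(root string) ([]string, error) {
	var out []string
	for p := range m {
		if strings.HasPrefix(p, root) { // naive prefix match, enough for a sketch
			out = append(out, p)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable walks
	return out, nil
}

func init() { Register("mem", memDriver{"/a.txt": "alpha", "/dir/b.txt": "beta"}) }

func main() {
	Walk("mem", "/", func(path string, err error) error {
		fmt.Println("visited", path)
		return err
	})
}
```

A scanner written against Open and Walk never needs to know whether the driver name refers to local disk, memory, or a remote repository, and a test can register an in-memory driver instead of writing to disk.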