We need to store files related to projects somewhere. The files can be
large (one example was 1GB). Some of the files need to be processed
locally in the backend. An example is FOOB files, which get converted
into geojson using command-line tools.
Solutions
A) Current situation: files in postgres
Currently, we store the files in postgres as blobs.
Pros:
Simple to implement
Cons:
Probably won't scale in performance
Will need a large disk allocated for postgres
Files get received and served via our application backend: potential performance risk
Files are still stored in the database, so database performance may degrade
Questions:
How is the performance?
Backups?
B) Files on disk
Allocate a large disk for the files, keep paths to the files in the db.
Pros:
Fairly straightforward to implement
Can separate database storage from file storage (e.g. fast small disk for postgres, large slow one for files)
Cons:
Will need disk space management
Will need a separate backup strategy
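A minimal sketch of how option B could look in Java, assuming a hypothetical DiskFileStore class (not existing project code). File contents are streamed to a dedicated directory, and only the returned relative path would be kept in the db. The two-character directory sharding and the usableBytes check are illustrative choices, the latter as one possible starting point for disk space management.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical sketch of option B: bytes live on disk, the db keeps only a relative path. */
final class DiskFileStore {
    private final Path root;

    DiskFileStore(Path root) {
        this.root = root.toAbsolutePath().normalize();
        try {
            Files.createDirectories(this.root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Streams the content to a newly named file and returns the relative path to store in the db. */
    String save(InputStream content, String extension) {
        String id = UUID.randomUUID().toString();
        // Shard by the first two characters so that no single directory grows without bound.
        Path relative = Path.of(id.substring(0, 2), id + "." + extension);
        Path target = root.resolve(relative);
        try {
            Files.createDirectories(target.getParent());
            Files.copy(content, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return relative.toString();
    }

    /** Streams a stored file to the given sink and returns the number of bytes copied. */
    long copyTo(String relativePath, OutputStream sink) {
        Path target = root.resolve(relativePath).normalize();
        // Reject paths such as "../x" that would escape the storage root.
        if (!target.startsWith(root)) {
            throw new IllegalArgumentException("path escapes storage root: " + relativePath);
        }
        try {
            return Files.copy(target, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Free space on the underlying disk, in bytes. */
    long usableBytes() {
        try {
            return Files.getFileStore(root).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Streaming through InputStream and OutputStream rather than byte arrays keeps memory use flat for files in the 1GB range.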
C) Object storage
An object storage service like Hetzner's S3 clone is meant for WORM
[write once read many times] workloads like this.
Pros:
No need for separate backup strategy
Files can be served from the object storage directly, instead of via the backend
Files can be uploaded directly to the object storage from the user's browser, instead of via the backend
Capacity is effectively unlimited, so there is no disk to size or grow
Possibility to make the backend stateless in the future
Team is familiar with this approach (used in at least N.N's previous project)
Cons:
Will need more code
Will need a mock implementation for local development
Will need a migration script when we want to drop support for in-database blobs
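The mock implementation for local development could be quite small. The sketch below assumes a hypothetical FileStorage interface (the names are illustrative, not existing project code) that both an S3-backed implementation and an in-memory stand-in would implement.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction that the backend would code against. */
interface FileStorage {
    void put(String key, byte[] content);

    Optional<byte[]> get(String key);

    void delete(String key);
}

/** In-memory stand-in for local development and tests; nothing is persisted. */
final class InMemoryFileStorage implements FileStorage {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    @Override
    public void put(String key, byte[] content) {
        // Copy on write so later changes to the caller's array do not leak into the store.
        objects.put(key, content.clone());
    }

    @Override
    public Optional<byte[]> get(String key) {
        // Copy on read for the same reason.
        return Optional.ofNullable(objects.get(key)).map(bytes -> bytes.clone());
    }

    @Override
    public void delete(String key) {
        objects.remove(key);
    }
}
```

Byte arrays keep the sketch short. For files around 1GB a real interface would more likely take and return streams.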
Questions:
How do we handle local processing of e.g. FOOB files?
How do the costs of Hetzner's S3 compare to disk space?
Which library to use with the S3 API?
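One possible answer to the local processing question, as a sketch only: stage the object in a temporary directory, run the converter on it, stream the result back out, and clean up. Converter, LocalProcessing and the argument convention of the command (input path, then output path, appended to the command) are assumptions made for illustration; the actual command-line tools may expect different arguments.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical conversion step: reads the input file and writes the output file. */
interface Converter {
    void convert(Path input, Path output) throws IOException, InterruptedException;
}

final class LocalProcessing {
    private LocalProcessing() {}

    /** Wraps an external tool; ASSUMES it accepts "<command...> <input> <output>". */
    static Converter commandLine(List<String> command) {
        return (input, output) -> {
            List<String> full = new ArrayList<>(command);
            full.add(input.toString());
            full.add(output.toString());
            Process process = new ProcessBuilder(full).inheritIO().start();
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("converter exited with status " + exit);
            }
        };
    }

    /** Stages the source in a temp directory, converts it, and streams the result to the sink. */
    static void process(InputStream source, OutputStream sink, Converter converter) {
        Path workDir = null;
        try {
            workDir = Files.createTempDirectory("processing-");
            Path input = workDir.resolve("input");
            Path output = workDir.resolve("output");
            Files.copy(source, input);
            converter.convert(input, output);
            Files.copy(output, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("conversion interrupted", e);
        } finally {
            deleteRecursively(workDir);
        }
    }

    /** Best-effort cleanup of the working directory, deepest paths first. */
    private static void deleteRecursively(Path dir) {
        if (dir == null) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException ignored) {
            // Leftover temp files are not fatal for this sketch.
        }
    }
}
```

With object storage, source would be the download stream of the stored object and sink the upload of the converted result, so the backend only needs temporary disk space for files that are currently being processed.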
Chosen solution
Let's go with C: Hetzner's Object Storage. It feels like the modern
solution to storing files, and lets us not worry about backups &
quotas. There are multiple providers of the S3 API, including local
ones, so we can always change providers later if needed.
Implementation plan
Initially:
Use Hetzner's S3 via the AWS Java SDK v2
using the Java SDK directly was recommended by colleagues from XYZ
the Java SDK v2 lets us pull in only the S3 module instead of the whole AWS SDK
Use MinIO to run a local S3
for development purposes