An example of an Architecture Decision Record for a real project

Architecture Decision Records

This directory contains Architecture Decision Records (ADRs) that document technical decisions made during the development of the system.

A good ADR should have the following parts:

  • A date
  • People involved with the decision
  • Problem statement
  • Current situation with pros/cons
  • Solution options with pros/cons
  • Chosen solution (with date!) with justification
  • Implementation plan
  • Links to source code & Notion when applicable
  • Updates (with dates) if the decision gets revisited

File Storage

2024-12-04

Author: @opqdonut

Involved: N.N., A.B.

Problem statement

We need to store files related to projects somewhere. The files can be large (one example was 1GB). Some of the files need to be processed locally in the backend. An example is FOOB files, which get converted into geojson using command-line tools.
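
For context, a minimal sketch in Clojure of that kind of backend-local processing, shelling out to a converter. The tool name foob2geojson and its arguments are made up for illustration; the real converter differs.

```clojure
(ns example.foob-conversion
  (:require [clojure.java.shell :as shell]))

;; Converts a FOOB file to geojson by shelling out to a command-line tool.
;; `foob2geojson` and its flags are placeholders, not the real converter.
(defn foob->geojson! [foob-path geojson-path]
  (let [{:keys [exit err]} (shell/sh "foob2geojson" foob-path "-o" geojson-path)]
    (if (zero? exit)
      geojson-path
      (throw (ex-info "FOOB conversion failed" {:file foob-path :stderr err})))))
```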

Solutions

Current situation: files in postgres

Currently, we store the files in postgres as blobs.

Pros:

  • Simple to implement

Cons:

  • Probably won't scale in performance
  • Will need a large disk allocated for postgres
  • Files get received and served via our application backend: potential performance risk

A) Postgres large objects

Postgres large objects are meant for uses like this (a rough sketch of the API is at the end of this section).

Pros:

  • Drop-in replacement for current situation

Cons:

  • Not familiar to the development team
  • Files are still stored in the database, so degraded database performance is possible
  • Will need a large disk allocated for postgres
  • Files get received and served via our application backend: potential performance risk

Questions:

  • How is the performance?
  • Backups?
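
For reference, a rough sketch of what option A could look like via the PostgreSQL JDBC driver's large-object API. This assumes a plain java.sql.Connection with autocommit off (large-object calls must run inside a transaction); the function and variable names are illustrative.

```clojure
(ns example.large-objects
  (:require [clojure.java.io :as io])
  (:import (java.sql Connection)
           (org.postgresql PGConnection)
           (org.postgresql.largeobject LargeObjectManager)))

;; Streams `in` (an InputStream, File, etc.) into a new large object and
;; returns its oid, which would be kept in a normal table column in place
;; of today's blob.
(defn store-large-object! [^Connection conn in]
  (let [lom (.getLargeObjectAPI (.unwrap conn PGConnection))
        oid (.createLO lom LargeObjectManager/READWRITE)
        lo  (.open lom oid LargeObjectManager/WRITE)]
    (with-open [out (.getOutputStream lo)]
      (io/copy (io/input-stream in) out))
    oid))
```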

B) Files on disk

Allocate a large disk for the files and keep paths to the files in the db (a rough sketch is at the end of this section).

Pros:

  • Fairly straightforward to implement
  • Can separate database storage from file storage (e.g. fast small disk for postgres, large slow one for files)

Cons:

  • Will need disk space management
  • Will need a separate backup strategy
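
For comparison, a rough sketch of option B; the storage root and naming scheme below are made up for illustration.

```clojure
(ns example.disk-storage
  (:require [clojure.java.io :as io])
  (:import (java.util UUID)))

;; Example path only; in practice this would point at the dedicated large disk.
(def storage-root "/var/lib/app/files")

;; Copies an uploaded InputStream to disk and returns the relative path,
;; which would be stored in the db in place of the file contents.
(defn store-file! [in]
  (let [rel-path (str (UUID/randomUUID))
        target   (io/file storage-root rel-path)]
    (io/make-parents target)
    (io/copy in target)
    rel-path))
```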

C) Object storage

An object storage service like Hetzner's S3 clone is meant for WORM (write once, read many times) workloads like this.

Pros:

  • No need for separate backup strategy
  • Files can be served from the object storage directly, instead of via the backend
  • Files can be uploaded directly to the object storage from the user's browser, instead of via the backend
  • Effectively unlimited storage capacity: no disks to allocate or resize
  • Possibility to make the backend stateless in the future
  • Team is familiar with this approach (used in at least N.N.'s previous project)

Cons:

  • Will need more code
  • Will need a mock implementation for local development
  • Will need a migration script when we want to drop support for in-database blobs

Questions:

  • How do we handle local processing of e.g. FOOB files?
  • How do the costs of Hetzner's S3 compare to the cost of allocating disk space?
  • Which library to use with the S3 API?

Chosen solution

Let's go with C: Hetzner's Object Storage. It feels like the modern way to store files, and it means we don't have to worry about backups or disk quotas. There are multiple providers of the S3 API, including local ones, so we can switch providers later if needed.

Implementation plan

Initially:

  • Use Hetzner's S3 via the AWS Java SDK v2 (see the client sketch after this list)
    • using the Java SDK directly was recommended by colleagues from XYZ
    • the Java SDK v2 lets us pull in only the S3 module instead of the whole AWS SDK
  • Use minio to run a local S3 for development purposes
    • Nice guide here
    • Can also be used for integration & end-to-end tests
    • Smaller tests can use a simpler fake written in clojure
    • Also evaluated localstack, but it doesn't have persistence (objects disappear when the container restarts)
  • Handle uploads via the backend, just like now
    • FOOB handling can keep working as it does now
    • less risk of concurrency issues (e.g. a file is uploaded to S3 but registering it with the backend fails)
  • Serve files to the frontend directly from S3
    • Use presigned GET URLs
  • Keep both code paths: storing files in postgres; storing files in S3
    • Delays the need to migrate existing files
  • Keep geojsons in the database
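
A sketch of the initial setup with the AWS Java SDK v2 (only the s3 module on the classpath). The endpoint, region, credentials, bucket and function names below are placeholders: in production the endpoint would point at Hetzner's Object Storage and in development at the local minio container, since both speak the same S3 API.

```clojure
(ns example.s3
  (:import (java.net URI)
           (java.time Duration)
           (software.amazon.awssdk.auth.credentials AwsBasicCredentials StaticCredentialsProvider)
           (software.amazon.awssdk.core.sync RequestBody)
           (software.amazon.awssdk.regions Region)
           (software.amazon.awssdk.services.s3 S3Client)
           (software.amazon.awssdk.services.s3.model GetObjectRequest PutObjectRequest)
           (software.amazon.awssdk.services.s3.presigner S3Presigner)
           (software.amazon.awssdk.services.s3.presigner.model GetObjectPresignRequest)))

;; Placeholder config: in production the endpoint is Hetzner's Object Storage,
;; in development the local minio container.
(def config
  {:endpoint   "https://s3.example.invalid"
   :region     "example-region"
   :access-key "..."
   :secret-key "..."})

(defn- credentials [{:keys [access-key secret-key]}]
  (StaticCredentialsProvider/create
    (AwsBasicCredentials/create access-key secret-key)))

(defn make-client [{:keys [endpoint region] :as cfg}]
  (-> (S3Client/builder)
      (.endpointOverride (URI/create endpoint))
      (.region (Region/of region))
      (.credentialsProvider (credentials cfg))
      (.build)))

(defn make-presigner [{:keys [endpoint region] :as cfg}]
  (-> (S3Presigner/builder)
      (.endpointOverride (URI/create endpoint))
      (.region (Region/of region))
      (.credentialsProvider (credentials cfg))
      (.build)))

;; Uploads still go via the backend, as in the current code path.
(defn upload-file! [^S3Client s3 bucket object-key ^java.io.File file]
  (.putObject s3
              (-> (PutObjectRequest/builder) (.bucket bucket) (.key object-key) (.build))
              (RequestBody/fromFile file)))

;; Presigned GET URL: the frontend downloads the file straight from S3.
(defn presigned-get-url [^S3Presigner presigner bucket object-key]
  (let [get-req     (-> (GetObjectRequest/builder) (.bucket bucket) (.key object-key) (.build))
        presign-req (-> (GetObjectPresignRequest/builder)
                        (.signatureDuration (Duration/ofMinutes 15))
                        (.getObjectRequest get-req)
                        (.build))]
    (str (.url (.presignGetObject presigner presign-req)))))
```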

Next:

  • Upload files directly from the browser to S3 (see the presigned upload sketch after this list)
    • Hetzner's S3 doesn't support notifications, so the browser must tell the backend when the upload is done
    • The backend will need to download the file from S3 for things like FOOB conversion
  • Remove support for files stored in postgres
    • Migrate existing files to S3
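
A sketch of the direct-upload step, building on the presigner above: the backend hands the browser a presigned PUT URL, the browser uploads straight to S3, and (since there are no bucket notifications) then calls a backend endpoint to report that the upload finished. Names are illustrative.

```clojure
(ns example.s3-direct-upload
  (:import (java.time Duration)
           (software.amazon.awssdk.services.s3.model PutObjectRequest)
           (software.amazon.awssdk.services.s3.presigner S3Presigner)
           (software.amazon.awssdk.services.s3.presigner.model PutObjectPresignRequest)))

;; Presigned PUT URL: the browser uploads the file body directly to this URL.
;; Once the PUT succeeds the browser reports back to the backend, which
;; registers the object and, for e.g. FOOB files, downloads it again from S3
;; to run the conversion.
(defn presigned-put-url [^S3Presigner presigner bucket object-key]
  (let [put-req     (-> (PutObjectRequest/builder) (.bucket bucket) (.key object-key) (.build))
        presign-req (-> (PutObjectPresignRequest/builder)
                        (.signatureDuration (Duration/ofMinutes 15))
                        (.putObjectRequest put-req)
                        (.build))]
    (str (.url (.presignPutObject presigner presign-req)))))
```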

Future possibilities:

  • Store geojson data in S3 objects as well