Skip to content

Instantly share code, notes, and snippets.

@Xevion
Created November 11, 2024 17:57
Show Gist options
  • Save Xevion/1c255edf13eba661144c0ccf63c21fee to your computer and use it in GitHub Desktop.
Save Xevion/1c255edf13eba661144c0ccf63c21fee to your computer and use it in GitHub Desktop.

Why SQLite is the right database for Hydrus

If you were directed to this page, one or more of the following may apply to you:

  • you suggested porting Hydrus to another database,
  • you have misconceptions about SQLite,
  • you are uninformed about databases in general.

Don't worry though, you are not the first (hence this document). Please do read on.

Also note that this document is not official Hydrus documentation and was not written by the Hydrus Developer.

Reasons to use SQLite in Hydrus

It is portable and requires no setup

SQLite can be bundled and shipped with Hydrus. It is available on all operating systems that Hydrus can possibly support. It requires no additional setup, the user doesn't need to know anything about SQL or databases. The entire database is a single file, easily copied or backed up.

None of this is true for the alternatives.

Or as the SQLite website puts it:

SQLite does not need to be "installed" before it is used. There is no "setup" procedure. There is no server process that needs to be started, stopped, or configured. There is no need for an administrator to create a new database instance or assign access permissions to users. SQLite uses no configuration files. Nothing needs to be done to tell the system that SQLite is running. No actions are required to recover after a system crash or power failure. There is nothing to troubleshoot.

SQLite just works.

It's safe, stable and well tested

Being safe and stable is very important, since Hydrus stores all configuration, tags, image metadata, subscriptions, etc. in the database. If the database gets corrupted, you lose everything except the image files.

SQLite is by far the best tested and most stable freely available database software. It is tested to aviation-grade standards. Other databases don't even come close.

Let me just quote the SQLite website again:

"As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately 128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 711 times as much test code and test scripts - 91772.0 KSLOC."

Yeah, that is more than 91 million lines (!) of test code.

Of course not just the amount of test code but also the testing methods are important. Here is a summary of how SQLite is tested:

  • Three independently developed test harnesses
  • 100% branch test coverage in an as-deployed configuration
  • Millions and millions of test cases
  • Out-of-memory tests
  • I/O error tests
  • Crash and power loss tests
  • Fuzz tests
  • Boundary value tests
  • Disabled optimization tests
  • Regression tests
  • Malformed database tests
  • Extensive use of assert() and run-time checks
  • Valgrind analysis
  • Undefined behavior checks
  • Checklists

You can read a detailed description of SQLite testing here.

It is no wonder that basically everyone uses SQLite. It is found in:

  • Every Android device
  • Every iPhone and iOS device
  • Every Mac
  • Every Windows10 machine
  • Every Firefox, Chrome, and Safari web browser
  • Every instance of Skype
  • Every instance of iTunes
  • Every Dropbox client
  • Every TurboTax and QuickBooks
  • PHP and Python
  • Most television sets and set-top cable boxes
  • Most automotive multimedia systems
  • Countless millions of other applications

As of 2018, the United States Library Of Congress officially names SQLite as a recommended storage format for the preservation of digital content.

You are probably using several instances of SQLite right now on your PC, smartphone and other devices while not even knowing about it.

It is fast

SQLite is faster than using the filesystem directly. Since it runs in-process, it has no communication overhead like the "big" database systems. SQLite is thread safe. You can use SQLite from multiple threads without problems. In many cases it has comparable or better performance than MySQL or PostgreSQL.

Here are some (sadly out of date) benchmarks to give you an idea:

  • "SQLite 2.7.6 is significantly faster (sometimes as much as 10 or 20 times faster) than the default PostgreSQL 7.1.3 installation on RedHat 7.2 for most common operations."

  • "SQLite 2.7.6 is often faster (sometimes more than twice as fast) than MySQL 3.23.41 for most common operations."

It can do everything that Hydrus needs

SQLite is a full-featured SQL implementation. It has all basic features expected from a SQL database. It provides far more than what Hydrus uses or needs. Of course it has support for transactions too.

SQLite isn't just for "small" databases either. It handles terabyte-scale databases just fine (the file size limit for a SQLite database is around 140 terabytes - if you have single files of that size, you have much bigger problems than SQLite limits anyway).

Contrary to commonly spread myths, you could also run most websites from SQLite just fine (unless you have a very high amount of concurrent writes, which most websites don't).

It is free software

This almost goes without saying, but SQLite is 100% free in both senses of the word (in fact, it is in the public domain). This means that the source code is freely available and you can do with it whatever you want. Both the code comments and the accompanying documentation is extensive and high quality.

It is made by people who know what they are doing

This is increasingly important in a world where every code monkey shitter who can copypaste together some barely working framework-of-the-month flavored JavaScript from StackOverflow dreams himself a "software engineer" or "computer scientist".

Quoting the SQLite website once again:

"The intent of the developers is to support SQLite through the year 2050.

  • At this writing, 2050 is still 34 years in the future. Nobody knows what will happen in that time, and we cannot absolutely promise that SQLite will be viable or useful that far out. But we can promise this: we plan as if we will be supporting SQLite until 2050. [..]

  • Cross-platform: SQLite runs on any platform with an 8-bit byte, two's complement 32-bit and 64-bit integers, and a C compiler. It is actively tested on all currently popular CPUs and operating systems. [..]

  • Aviation-grade testing: Every machine-code branch instruction is tested in both directions. Multiple times. On multiple platforms and with multiple compilers. [..]

  • Disaster planning: Every byte of source-code history for SQLite is cryptographically protected and is automatically replicated to multiple geographically separated servers, in datacenters owned by different companies. Thousands of additional clones exist on private servers around the world. The primary developers of SQLite live in different regions of the world. SQLite can survive a continental catastrophe.

  • Old school: Nobody is completely immune to trends and fads, but the SQLite developers work hard to avoid being sucked into the latest programming fashion. Our aim is to produce timeless code that will be readable, understandable, and maintainable by programmers who have not yet been born.

[..]

Our goal is to make the content you store in SQLite today as easily accessible to your grandchildren as it is to you."

Reasons to not port Hydrus to another database

Other databases are complicated to set up and secure

Using another database engine would mean that the users themselves would have to install, configure and properly secure a separate database system. Most people don't know how to do that, nor should they be required to become database administrators just to use Hydrus. There are a lot of things that can break in a database server, and bundling SQLite avoids almost all of these problems.

It would be a lot of work, better spent elsewhere

Hydrus is large, both in number of features and amount of code. The entire project is run by a single developer working alone. Porting Hydrus to use another database engine would be a lot of work for minimal (if any) gains, and there are a lot of far more urgent issues to resolve.

Other topics

Why not use an ORM?

Ultimately, this is the choice of the developer. However, the value of using ORMs in general is questionable:

  • SQL is fine and it just works. Writing SQL by hand is really not that difficult. While SQL might not be the best design ever, it is far from bad and does its job just fine. The Python sqlite module which Hydrus uses is also well written and great to work with.

  • It's another layer of abstraction and another place where things can break. ORMs increase complexity while hiding how the database is used exactly, making optimizations and debugging more difficult.

  • ORMs are a leaky abstraction. If your use case is complicated enough (which Hydrus is), eventually you will end up fighting the ORM or working around it. Either way, you won't escape having to learn SQL.

  • It is propagating object oriented fetishism. OOP is not the solution to all (or most) problems and we have far too much low quality OOP code shat out by code monkeys who know nothing else already.

When to use other databases?

So if SQLite is so great, why are other databases in widespread use? Well, there are a lot of things SQLite can't do. Fortunately for us, none of them are relevant to Hydrus, but here is a(n incomplete) list of situations when you are better off with something else:

  • You need to access the database through the network. (No, the Hydrus Public Tag Repository is not an example of this.)

  • You have multiple users accessing the database and you need user account management, separate databases for different users, user right management and other multiuser features.

  • You have high volume applications with many concurrent users writing the database.

  • You have petabyte-scale or larger data sets. SQLite handles a few terabytes just fine. The file size limit of a SQLite database is around 140 terabytes.

  • You need to use a non-standard SQL extension or other advanced feature that is only found in some specific database engine.

  • You are a memelord and think that SQL is not "cool" enough. Enjoy reinventing the wheel and suffering from corrupt and inconsistent databases.

Further reading

Check out the SQLite website. The documentation is great, with many informational pages further explaining the above topics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment