@nikclayton
nikclayton / msla.md
Last active January 10, 2025 21:56
Mastodon SLAs part deux

And it's still a largely meaningless number.

We don't set SLAs because we like setting SLAs.

We set SLAs because we're trying to capture an idea of "How badly can our service perform while our users are still happy?"

As an aside, what we're really talking about here are SLOs ("Objectives"). An SLA is the agreement you have with a customer about what happens if you break the SLO. But since we started with SLA as the term, I'll stick with it.

They're not the only tool in our toolbox that helps with this. And like any tool, used as intended they can provide useful insight, but use them carelessly and they can be very misleading.

@nikclayton
nikclayton / Mastodon SLAs.md
Last active January 21, 2025 10:35
Mastodon SLAs.md

Note

My previous post got some feedback from junior SWEs arguing -- roughly, and I'm paraphrasing -- the costs were way off and anyway this wasn't important because last year Hachyderm had "99.996% uptime" relying on a handful of volunteers.

This feels like a teachable moment, so here's an excerpt from the discussion that followed. I'm paraphrasing and removing names because the goal here is not to call out people earlier in their career, it's to highlight some of the things you need to think about when deciding if you want to run a service that people can rely on.


99.996% uptime or something

Right, but that's meaningless without knowing what the goal was, and without an agreement on what "downtime" means. If the goal was higher, then Hachyderm failed.
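To make the point concrete, here's a rough sketch of the arithmetic behind an uptime percentage. The function names and the goal values are mine, chosen for illustration; the only figure from the discussion is 99.996%. It assumes a 365-day year and that "downtime" is measured in whole-service minutes, which is exactly the kind of definition you'd have to agree on first.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def allowed_downtime_minutes(target_percent: float,
                             period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    return (1 - target_percent / 100) * period_minutes


def met_goal(measured_percent: float, goal_percent: float) -> bool:
    """A measured uptime only means something relative to a stated goal."""
    return measured_percent >= goal_percent


if __name__ == "__main__":
    # Hypothetical goals, purely for comparison.
    for target in (99.9, 99.99, 99.996, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% -> {budget:.1f} minutes/year")
```

A measured 99.996% works out to roughly 21 minutes of downtime in a year. That passes a 99.9% goal and fails a 99.999% one, so the number on its own tells you nothing.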

Script to trigger the (possible) Dragonfly bug on a Mastodon account.

If this doesn't find any problems within the first limit=90 posts on your home timeline you can step back further by adjusting the first curl command in the script.

#!/bin/sh
#
# Assumes
#  - curl and jq are installed
#  - $TOKEN is your account's bearer token

To: Admins of tldr.nettime.org

Your server is configured with a reference to a URL on a server with an invalid SSL certificate.

To recreate this:

Fetch https://tldr.nettime.org/.well-known/nodeinfo

At the time of writing this has the following content (prettified):

[napkin sketch]

  1. Create 1-N test bot accounts that follow different subsets of each other. Have them post predictable content on a regular schedule.

  2. Monitor the bot posting software. Have it report errors and latency, and store the ID of each post when it's generated.

  3. Write a client that, authenticated as each bot, fetches various permutations of the bot's home timeline with/without min_id/max_id/limit set. Because each bot's posts are predictable you can predict the timeline contents for each query, using the status IDs stored earlier. Have the client report good/bad results and latency to the monitoring system.

  4. Alert on errors, and if latency exceeds some threshold.
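Step 3 is the interesting part, so here's a minimal sketch of the "predict the timeline contents" piece. It's not a full client. It assumes status IDs compare numerically, that the timeline is returned newest first, that `max_id` means "older than this ID", and that `min_id` means "the page immediately newer than this ID". That matches my understanding of the Mastodon API but check it against your server. `expected_timeline` and `check_timeline` are hypothetical names.

```python
from typing import Iterable, Optional


def expected_timeline(stored_ids: Iterable[str], *,
                      min_id: Optional[str] = None,
                      max_id: Optional[str] = None,
                      limit: int = 20) -> list[str]:
    """Predict the IDs a home timeline query should return, newest first.

    stored_ids are the status IDs recorded when the bots posted (step 2).
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ids = sorted(int(i) for i in stored_ids)  # oldest -> newest
    if max_id is not None:
        ids = [i for i in ids if i < int(max_id)]
    if min_id is not None:
        ids = [i for i in ids if i > int(min_id)]
        page = ids[:limit]   # assumed: the posts immediately newer than min_id
    else:
        page = ids[-limit:]  # otherwise the newest posts
    return [str(i) for i in reversed(page)]


def check_timeline(expected: list[str], actual: list[str]) -> dict:
    """Compare predicted and fetched IDs; report this to the monitoring system."""
    return {
        "ok": expected == actual,
        "missing": [i for i in expected if i not in actual],
        "unexpected": [i for i in actual if i not in expected],
    }
```

The client would call `expected_timeline` with the same parameters it sends to the server, then pass the fetched status IDs to `check_timeline` and emit the result, plus request latency, as metrics.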

If you're not sure about any of this you can contact the GoToSocial devs at https://codeberg.org/superseriousbusiness/gotosocial#contact -- there should be enough information here for them to determine if this is a misconfiguration of your server, a GoToSocial bug, or a Mastodon bug.

Your account name isn't displaying properly on Mastodon. It looks like this:

[image]

Note the :purplerabbit: in there.

This is because you have several emoji references in your account's display_name, but your account info doesn't include all the emoji details, so other servers can't show it.
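If you want to check for this yourself, here's a rough sketch. It assumes you have the account as JSON in the shape the Mastodon client API returns, with a `display_name` string and an `emojis` list whose entries each carry a `shortcode`. `missing_emoji` is a hypothetical helper, and the shortcode pattern is a simple heuristic, so text like a time written as `10:30:45` would produce a false positive.

```python
import re

# Heuristic: a shortcode is letters, digits, or underscores between two colons.
SHORTCODE = re.compile(r":([A-Za-z0-9_]+):")


def missing_emoji(account: dict) -> list[str]:
    """Return shortcodes used in display_name with no matching emojis entry."""
    known = {e["shortcode"] for e in account.get("emojis", [])}
    missing: list[str] = []
    for code in SHORTCODE.findall(account.get("display_name", "")):
        if code not in known and code not in missing:
            missing.append(code)
    return missing


if __name__ == "__main__":
    # Made-up account data for illustration.
    account = {
        "display_name": "Example :purplerabbit: :star:",
        "emojis": [{"shortcode": "star"}],
    }
    print(missing_emoji(account))
```

Any shortcode this reports is one that other servers have no image for, so they fall back to showing the raw `:shortcode:` text.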