- Status: draft
- Author: [email protected], @[email protected]
Version | Date | Changes |
---|---|---|
3.0 | 2024-01-22 | Re-write; include the operation information in the nodeinfo |
2.0 | 2023-09-25 | Re-write; replace the original suggestion to use the OpenAPI definition with a simpler specification |
1.0 | 2023-08-14 | Initial draft |
This document proposes an extension to the NodeInfo schema that would allow developers of Mastodon and Mastodon-like servers to unambiguously communicate the operations their servers support, and allow developers of clients for those servers to detect these features and update their UX accordingly.
This document is written for:
- The maintainers of the NodeInfo specification
- Developers of Mastodon and Mastodon-like servers
- Developers of clients for those servers
After reading this document you should:
- Understand the general problem this is intended to solve
- Understand the proposed solution
- Understand alternatives to the solution, and why they are not appropriate
- Be able to provide feedback on the proposal
For the purposes of this document a "Mastodon or Mastodon-like" server is a server that presents the Mastodon client API, optionally with extensions to that API that provide additional functionality. These servers include, but are not limited to:
- Mastodon
- Glitch
- Hometown
- Pleroma
- Akkoma
- Firefish
- Iceshrimp
- Sharkey
- Friendica
- GoToSocial
Clients of these servers have an API discovery problem. Since different servers support different (but similar) APIs the client has to determine what API operations the server supports.
Given the wide variety of servers that are available, and their many forks, it's not feasible for clients to maintain an accurate list of all the possible server software names while mapping the names to API features.
Instead the server should have a mechanism for advertising the operations it supports.
The client would use this when determining what features to show the user, without needing to employ complex, error-prone heuristics.
This would also provide a clear mechanism for Mastodon and Mastodon-like servers to incrementally deploy new features and deprecate old ones without inconveniencing clients.
It also provides a clear mechanism to advertise server functionality without continually bolting it on to the "instance info" mechanism in the inconsistent fashion that has been done so far.
The rest of this document sets out the specific problems I'm interested in solving, with motivating examples, and then describes how the new approach would solve these problems.
- Get feedback from developers of client and server software for Mastodon and Mastodon-like systems
- Get buy-in from client developers that they would support this in their apps if adopted
- Get buy-in from one or more server developers to support this in their servers
Changes are made to the Mastodon API in a manner that is not easily discoverable by clients.
For example, the pull request "Add POST /api/v1/conversations/:id/unread" by ClearlyClaire (Pull Request #25509, mastodon/mastodon) adds a new API endpoint (api/v1/conversations/:id/unread).
The only way a client can discover that this API exists is to maintain, per-client, a mapping between Mastodon server version and the API supported at each version.
This is:
- A lot of work for each client
- Something that every client needs to do
- Easy to get wrong
- Not scalable across the multitude of different servers
The Instance information contains a configuration block that has some, but not all, of the information necessary to determine the features a server supports.
Other servers have extended this information in incompatible ways (e.g., the pleroma block).
Mastodon-like servers implement some or all of the Mastodon API.
In many cases they also extend the API, providing additional functionality (local-only posting, quoting, markdown formatting, bookmarks, etc.).
In some cases that functionality has already been incorporated in Mastodon (e.g., bookmarks), in other cases there are plans to include that functionality in Mastodon (e.g., quoting, markdown formatting).
This leads to three problems.
- There is no simple way for clients to know which parts of the Mastodon API the server supports
- There is no simple way for clients to know if the server supports additional operations
- If Mastodon decides to implement an API that was first introduced in a Mastodon-like server there is no way for clients to detect this, without recompiling the client with new information about what features a given Mastodon server version implements
Server developers already have a lot of work to do. Any proposal should therefore be straightforward to implement. Additional complexity, such as changing the contents of existing API responses, or requiring developers of different servers to tightly coordinate when new functionality is introduced, is going to make it less likely that groups adopt any proposed solution.
A given Mastodon or Mastodon-like server supports a set of operations.
To expose those to the user a Mastodon client needs to know:
- Which operations does the server support?
- What's the overlap between the operations the server supports and the operations the client supports?
Therefore we need:
- A unique identifier for each operation that a set of servers supports identically
- A mechanism for a server to report the operations it supports
Operations are named after the reverse FQDN of the server software that first implemented that operation, then an arbitrary number of dot-separated components determined by the server authors.
This ensures that operation IDs are unique without needing tight coordination between different server developer groups.
For example:
org.joinmastodon.api.statuses.post
org.joinmastodon.api.statuses.translate
io.github.glitch-soc.api.statuses.bookmark
dev.iceshrimp.api.notes.reactions.create
[!NOTE] Precise reverse FQDN to use for each server is to be decided
These examples use the reverse FQDNs for the servers' primary websites or documentation sites, but each server group would determine and document the reverse FQDN for their server's operations.
[!NOTE] Dot-separated components do not have to map 1:1 to API endpoint components
In these examples the dotted components after the api component correspond to the path components of the API endpoint, but there is no requirement that they do so.
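The naming scheme can be sketched in a few lines. This is only an illustration: the helper names `reverse_fqdn` and `operation_id` are hypothetical, and the validation rule (lowercase letters, digits, and hyphens in each component) is an assumption, since this proposal does not define a grammar for operation IDs.

```python
import re

# Assumed validation rule, not part of the proposal: each component is made of
# lowercase letters, digits, and hyphens, and starts and ends with a letter or digit.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def reverse_fqdn(domain: str) -> str:
    """Reverse the labels of a domain name, e.g. 'joinmastodon.org' -> 'org.joinmastodon'."""
    return ".".join(reversed(domain.lower().split(".")))


def operation_id(domain: str, *components: str) -> str:
    """Build an operation ID from a server's domain and its own dotted components."""
    labels = reverse_fqdn(domain).split(".") + list(components)
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"invalid component: {label!r}")
    return ".".join(labels)


print(operation_id("joinmastodon.org", "api", "statuses", "post"))
print(operation_id("glitch-soc.github.io", "api", "statuses", "bookmark"))
```

Because the prefix is derived from a domain the server group controls, two groups can mint IDs independently without colliding.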
Each operation exists at one or more semver-compatible (v2.0.0) versions. Semver is used because it is a widely deployed standard, easily understandable, and client libraries that can parse this format are available across many different programming languages.
For example, in the Mastodon API documentation "Post a new status" describes the API for posting a new status. That API has existed in three forms in the Mastodon server implementation.
- Initial implementation
- Support for scheduled_at
- Support for poll
There are no backwards-incompatible breaking changes across those versions so this is the same operation at three different versions; per semver the major version stays the same and the minor version is incremented.
- 1.0.0 - initial implementation
- 1.1.0 - support for scheduled_at
- 1.2.0 - support for polls
[!IMPORTANT] These version numbers are unrelated to the version number of the software that introduced the operation
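As a rough sketch of how a server group might pick the next version number for an operation, assuming plain MAJOR.MINOR.PATCH versions with no pre-release or build metadata (the function name `next_version` is hypothetical):

```python
def next_version(current: str, breaking: bool) -> str:
    """Return the next operation version under semver rules.

    A backwards-compatible change bumps the minor version; a breaking
    change bumps the major version and resets minor and patch.
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        return f"{major + 1}.0.0"
    return f"{major}.{minor + 1}.0"


# Replay the "post a new status" history: two backwards-compatible additions.
history = ["1.0.0"]
for _change in ("scheduled_at", "poll"):
    history.append(next_version(history[-1], breaking=False))

print(history)  # ['1.0.0', '1.1.0', '1.2.0']
```

Patch-level bumps (bug fixes that do not change the contract) are left out of this sketch for brevity.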
Bookmarking statuses originated in the glitch-soc fork and was incorporated in to Mastodon.
Therefore, the IDs for the bookmark operations -- if they are compatible with the glitch-soc implementation -- use the io.github.glitch-soc.api prefix.
- io.github.glitch-soc.api.statuses.bookmark@1.0.0 - bookmark a status
- io.github.glitch-soc.api.statuses.unbookmark@1.0.0 - remove a status from bookmarks
- io.github.glitch-soc.api.timeline.bookmarks@1.0.0 - fetch a timeline of the user's bookmarks
- io.github.glitch-soc.api.timeline.bookmarks@1.1.0 - fetch a timeline of the user's bookmarks, supporting min_id and max_id simultaneously
Clients must be able to discover which operations the server supports and the endpoints to use for those operations.
To do this the nodeinfo (determined via /.well-known/nodeinfo) schema should be extended to support a new clientApis property.
The property's value is a map from a string key -- the operation ID -- to a set of one or more semver versions of the operation that the server supports.
For example:
"clientApis": {
...
"org.joinmastodon.api.some.operation": ["1.0.0", "1.1.0", "1.2.0", "2.0.0"]
...
}
[!NOTE] Unordered versions
The list of supported versions is not ordered; client code should treat this as a set, not a list.
[!NOTE] Not limited to Mastodon / Mastodon-like servers
This clientApis map is not limited to operations supported by Mastodon/Mastodon-like servers. This is a general mechanism that can be used by servers to expose information about their supported operations and could be used by other Fediverse software like Lemmy, KBin, etc.
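A client might read the property along these lines. This is a minimal sketch: it assumes clientApis sits at the top level of the nodeinfo document, the sample document is trimmed and hypothetical, and `parse_client_apis` is an illustrative name. The versions are the same ones used in the example above.

```python
import json

# Trimmed, hypothetical nodeinfo document carrying the proposed property.
NODEINFO_JSON = """
{
  "version": "2.1",
  "software": {"name": "example", "version": "0.0.1"},
  "clientApis": {
    "org.joinmastodon.api.some.operation": ["1.0.0", "1.1.0", "1.2.0", "2.0.0"]
  }
}
"""


def parse_client_apis(nodeinfo_text: str) -> dict[str, frozenset[str]]:
    """Return operation ID -> set of advertised versions.

    Versions are stored in a frozenset because their order carries no meaning.
    A server that does not implement the extension yields an empty map.
    """
    document = json.loads(nodeinfo_text)
    raw = document.get("clientApis", {})
    return {op_id: frozenset(versions) for op_id, versions in raw.items()}


apis = parse_client_apis(NODEINFO_JSON)
print(sorted(apis["org.joinmastodon.api.some.operation"]))
```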
Because of the semver rules for breaking changes, servers may omit earlier versions from the list if they are included in a later version. In the previous example the 1.0.0 and 1.1.0 versions can be omitted, as a server supporting v1.2.0 of an operation implicitly supports all preceding versions with the same major number.
"clientApis": {
...
"org.joinmastodon.api.some.operation": ["1.2.0", "2.0.0"]
...
}
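A server could derive the shortened list mechanically. The following sketch assumes plain MAJOR.MINOR.PATCH versions; `minimal_versions` is a hypothetical helper name, and the result is sorted only to make the output deterministic.

```python
def minimal_versions(versions):
    """Keep only the highest version for each major number."""
    highest = {}
    for text in versions:
        parsed = tuple(int(part) for part in text.split("."))
        major = parsed[0]
        # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0).
        if major not in highest or parsed > highest[major]:
            highest[major] = parsed
    return [".".join(str(n) for n in v) for v in sorted(highest.values())]


print(minimal_versions(["1.0.0", "1.1.0", "1.2.0", "2.0.0"]))  # ['1.2.0', '2.0.0']
```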
[!NOTE] There is no need to specify the operation semantics
The semantics of each {operation, version} pair are already known by the client (for each operation it supports). These semantics include whether the endpoints use GET, POST, DELETE, or PATCH, the exact names of the URL query parameters, the API endpoint, etc.
In other words, it is not permissible for a server to advertise an existing operation ID and change anything about how that operation works. The server developers should either register a new operation ID, or implement the operation as a new version (bumping the major version if it is a breaking change).
Servers where the set of supported operations is not user configurable would need to maintain a static map of operations to versions, and return that map as part of the nodeinfo response.
If the set of operations is user configurable (e.g., perhaps the server software supports a translation API but the server operator has not enabled translation support) the nodeinfo response would need to be dynamically generated from the current software configuration.
In both cases developing a new operation or changing an existing operation would require the developers to:
- Determine the operation's version number, following semver backwards-compatible rules
- Document the behaviour of the new operation / version
- Include the new operation / version in the server's response
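On the server side, both the static and the configurable case can be sketched as a table lookup. Everything named here is hypothetical (the `IMPLEMENTED` table, the `REQUIRES_FLAG` mapping, and the `translation_enabled` flag); a real server would wire this to its own configuration system.

```python
# Hypothetical static table: operation ID -> versions this software implements.
IMPLEMENTED = {
    "org.joinmastodon.api.statuses.post": ["1.2.0"],
    "org.joinmastodon.api.statuses.translate": ["1.0.0"],
}

# Hypothetical link between an operation and the config flag that enables it.
REQUIRES_FLAG = {
    "org.joinmastodon.api.statuses.translate": "translation_enabled",
}


def client_apis(config: dict[str, bool]) -> dict[str, list[str]]:
    """Build the clientApis map from the static table and the active configuration."""
    advertised = {}
    for op_id, versions in IMPLEMENTED.items():
        flag = REQUIRES_FLAG.get(op_id)
        # Operations without a flag are always on; flagged ones follow the config.
        if flag is None or config.get(flag, False):
            advertised[op_id] = list(versions)
    return advertised


print(client_apis({"translation_enabled": False}))
print(client_apis({"translation_enabled": True}))
```

A server with no configurable operations would simply return the static table unchanged.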
To provide the best user experience client developers would fetch the operations map when the user logs in.
If the client supports a particular operation at a particular version the client can query the map and determine whether the concrete version they need is in the map, or met by a higher version. Semver client libraries are available for Kotlin and Java (Android) and Swift (iOS), as well as many other languages.
If the server does not support the operation the client can fall back to a different operation, or disable the particular operation in the UI.
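The client-side check could look something like this. It is a sketch that assumes plain MAJOR.MINOR.PATCH versions and hand-rolls the comparison rather than using a semver library; `supports` and the sample `ADVERTISED` map are illustrative names.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    return tuple(int(part) for part in version.split("."))


def supports(client_apis, op_id: str, needed: str) -> bool:
    """True if the server advertises `needed` or a compatible higher version.

    Compatible means the same major number and an equal or higher
    minor/patch, following the semver rules described above.
    """
    want = parse(needed)
    for offered in client_apis.get(op_id, ()):
        have = parse(offered)
        if have[0] == want[0] and have >= want:
            return True
    return False


# Hypothetical map as fetched at login; the server omitted 1.0.0 and 1.1.0.
ADVERTISED = {"org.joinmastodon.api.statuses.post": {"1.2.0", "2.0.0"}}

print(supports(ADVERTISED, "org.joinmastodon.api.statuses.post", "1.1.0"))  # True
print(supports(ADVERTISED, "org.joinmastodon.api.statuses.post", "1.3.0"))  # False
```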
To use the example from earlier, consider the pull request "Add POST /api/v1/conversations/:id/unread" by ClearlyClaire (Pull Request #25509, mastodon/mastodon), which adds a new API endpoint (api/v1/conversations/:id/unread).
The server would report this as:
"clientApis": {
...
"org.joinmastodon.api.conversations.id.unread": ["1.0.0"]
...
}
and a client that wanted to conditionally support this would query the operations map for org.joinmastodon.api.conversations.id.unread with any version entry with a major version of 1, and if the operation/version pair is not found then disable the "Mark a conversation unread" UI affordances where they occur.
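In client code that decision might reduce to something like the following. The action names and the `conversation_actions` function are hypothetical stand-ins for whatever the client's UI layer actually uses.

```python
UNREAD_OP = "org.joinmastodon.api.conversations.id.unread"


def has_major(client_apis, op_id: str, major: int) -> bool:
    """True if any advertised version of op_id has the given major number."""
    return any(int(v.split(".")[0]) == major for v in client_apis.get(op_id, ()))


def conversation_actions(client_apis) -> list[str]:
    """Return the (hypothetical) menu actions to show for a conversation."""
    actions = ["open", "delete"]
    if has_major(client_apis, UNREAD_OP, 1):
        actions.append("mark_unread")
    return actions


print(conversation_actions({UNREAD_OP: ["1.0.0"]}))  # ['open', 'delete', 'mark_unread']
print(conversation_actions({}))                      # ['open', 'delete']
```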
Yes.
I have started implementing the client side of this in Pachli. At the moment this uses server version parsing heuristics to maintain a Pachli-specific map of operations and supported versions (Server.kt) and then query the server's reported capabilities and adjust the UI accordingly.
For example, this snippet conditionally enables the "edit filters" UI only if the user's server supports filtering.
Maintaining the server-specific operations map in Pachli is error prone, slow to update, and does not benefit the wider ecosystem of Mastodon clients and servers, hence this proposal.
This solves the problems described earlier:
- "[[#The supported API is not easily discoverable]]"
  - The client can easily discover the specific operations the server supports, and adjust UX accordingly
- "[[#No standard way for Mastodon servers to advertise that some functionality is disabled]]"
  - The clientApis property must reflect the active configuration of the server.
- "[[#No standard way for Mastodon-like servers to advertise their functionality to clients]]"
  - If a Mastodon-like server implements a Mastodon-compatible API endpoint it lists that endpoint using the relevant org.joinmastodon... operation identifier.
- "[[#Server developers have too much to do]]"
  - This proposal doesn't modify any existing API responses
  - For a given server the list of supported operations can be statically configured, and does not change after the server has launched
  - The work of developing a dictionary of supported operations can be sharded amongst different groups
  - Server developers have a vested interest in contributing details of operations specific to their server, so more third party clients support them
  - Client developers have a vested interest in reviewing and contributing details of operations specific to servers their users use, to make their clients more attractive to potential users
  - No coordination is required between different groups of server developers to develop operation IDs
  - Developers are incentivised to re-use existing operations instead of inventing new ones
  - Implementing an existing operation in a compatible manner with another server increases the speed with which your users will be able to use the feature in their preferred clients.
This proposal doesn't address how clients can discover any limits associated with the operations. For example, how many characters are allowed per post, or the number of options that can be included in a poll.
That information is already included in the server's /api/v2/instance call (in the language of this proposal, the org.joinmastodon.api.instance operation).
I did consider extending the clientApis definition so that each operation mapped to an object that contained multiple keys, like this:
"clientApis": {
"org.joinmastodon.api.statuses.post": {
"1.0.0": {
"endpoint": "/api/v1/statuses",
"limits": {
"max_characters": 500,
// ...
},
"mimeTypes": ["text/plain"],
// ...
},
"1.1.0": { /* ... */ }
}
}
That would significantly complicate this proposal, increasing the risk that it's not adopted. There's also no clear value in doing this.
It's tempting to think that operations could be broken down in to smaller parts.
For example, instead of different versions for the "post a status" operation you could include more specific capabilities in the operation description:
"clientApis": {
...
"org.joinmastodon.api.statuses.post": {
"contentWarning": true,
"polls": true,
"media": true,
...
}
...
}
This indicates this server supports the "post a new status" operation with statuses that include content warnings, polls, and media.
This proposal doesn't do that because it results in a combinatorial explosion of the different sub-types of operations that clients need to support, without any significant benefit.
Even the example above is incomplete; for example, some servers support including images in content warnings, so a simple boolean for the contentWarning property is insufficient.
So treating the thing-that-has-to-be-versioned as the operation (post a status, translate, reblog, etc) seems to be the better level of granularity.
A server could include metadata in each response that contains an object that describes the operations that can be performed on that object. For example, the Status object could be modified to include an operations property that looks like this:
{
"id": "103270115826048975",
"created_at": "2019-12-08T03:48:33.901Z",
...
"operations": {
"org.joinmastodon.api.statuses.reply": ["POST", "https://example.com/api/v1/statuses"],
"org.joinmastodon.api.statuses.view": ["GET", "https://example.com/api/v1/statuses/103270115826048975"],
"org.joinmastodon.api.statuses.favourite": ["POST", "https://example.com/api/v1/statuses/103270115826048975/favourite"],
... etc
}
}
This is the Hypermedia as the engine of application state (HATEOAS) model.
It's an interesting approach, and a possible future direction. But it would require significant work on the part of server developers to implement as it would affect every response returned by the server.
On the other hand the approach in this proposal is static content in the nodeinfo response. It's significantly easier to implement and iterate on.
This could go the other way, and instead require servers to have a consistent name and parseable version number, and expect clients to keep a map of "server A at version V can perform operations X, Y, and Z".
I think this is the wrong approach because:
- It requires every client development team to independently maintain a mapping between server versions and capabilities
- It requires client updates whenever a server is released that supports a capability the client already supports on another server
On that last point, a worked example might make it clearer.
Suppose there are two server types, A and B. A supports operations X and Y, B supports X, Y, and Z.
A client is released which supports operations X, Y, and Z, and is hardcoded with knowledge about which server type supports a given operation.
A new version of server type A is released which now supports operation Z as well. But users of the client who connect to server type A cannot benefit from this until a new version of the client is released with updated information about the capabilities of server type A.
With the proposal in this document this problem does not occur; if a client supports operation Z (at a given version) and a server advertises that it supports that operation then the client can choose to use it without needing a new release.
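The same scenario as a small simulation, using the abstract server types and operations from the worked example. The two function names are illustrative only.

```python
# Operations the released client knows how to use.
CLIENT_OPERATIONS = {"X", "Y", "Z"}

# Approach 1: server capabilities compiled into the client at release time.
HARDCODED = {"A": {"X", "Y"}, "B": {"X", "Y", "Z"}}


def usable_hardcoded(server_type: str) -> set[str]:
    """Operations the client will use when it trusts its built-in table."""
    return CLIENT_OPERATIONS & HARDCODED.get(server_type, set())


def usable_advertised(advertised: set[str]) -> set[str]:
    """Operations the client will use when the server advertises them itself."""
    return CLIENT_OPERATIONS & advertised


# Server type A ships a new release that adds operation Z.
new_a_release = {"X", "Y", "Z"}

print(sorted(usable_hardcoded("A")))              # ['X', 'Y']
print(sorted(usable_advertised(new_a_release)))   # ['X', 'Y', 'Z']
```

With the hardcoded table the client keeps hiding Z on server type A until the client itself is re-released; with advertised operations the same client binary picks it up immediately.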
This is better for our users.
OpenAPI is a popular schema for defining an API. The server could just return the OpenAPI schema for the API that it supports.
I did consider this (an earlier version of this proposal was built around it). But it complicates the data the client needs to process, and includes data that the client will ignore.
Consider the /api/v1/timelines/home endpoint, which would have an operation ID something like org.joinmastodon.api.timelines.home under this proposal.
This is the OpenAPI definition for that endpoint, copied from the GoToSocial project's OpenAPI definition (swagger.yaml, the descriptions have been deleted to keep this a reasonable length):
/api/v1/timelines/home:
  get:
    description: |-
      The statuses [... deleted ...]
    operationId: homeTimeline
    parameters:
      - description: [deleted]
        in: query
        name: max_id
        type: string
      - description: [deleted]
        in: query
        name: since_id
        type: string
      - description: [deleted]
        in: query
        name: min_id
        type: string
      - default: 20
        description: [deleted]
        in: query
        name: limit
        type: integer
    produces:
      - application/json
    responses:
      "200":
        description: Array of statuses.
        headers:
          Link:
            description: [deleted]
            type: string
        schema:
          items:
            $ref: '#/definitions/status'
          type: array
      "400":
        description: bad request
      "401":
        description: unauthorized
    security:
      - OAuth2 Bearer:
          - read:statuses
    summary: See statuses/posts by accounts you follow.
    tags:
      - timelines
Most of the information in that definition is redundant for the client.
It's absolutely essential information to have for the server developer, and for producing documentation.
But the client should already have this compiled in. The contract between the client and the server, if the server reports that it supports the org.joinmastodon.api.timelines.home operation at v1.0.0, is that:
- the valid parameters are max_id, since_id, min_id, and limit
- the response is JSON
- there will be pagination details in the Link header
So returning an OpenAPI definition to the client significantly complicates things for no benefit.
OpenAPI is also endpoint-oriented, by which I mean that the definition leads with the endpoint (/api/v1/statuses) and then describes the single operation that is present at that endpoint.
This is backwards to what we need, where the operation comes first, and multiple operations might be supported at the same endpoint.
Not an exhaustive list:
- Mastodon Issues
- Mastodon PRs (these all attempted to add OpenAPI definitions)
- docs(open-api): Add OpenAPI Specification by oneslash · Pull Request #20000 · mastodon/mastodon · GitHub
- Feat/add rswag in order to generate verified openapi docs by casaper · Pull Request #20607 · mastodon/mastodon · GitHub
- [proposal] Machine readable API specification via OpenAPI by takayamaki · Pull Request #25043 · mastodon/mastodon · GitHub
- Blog posts
- Mastodon-like servers