lfx-mc.md

What problem are we trying to solve?

Certain workloads, such as databases, rely on headless services for peer discovery and other network operations. Headless services behave differently from normal ClusterIP services: they are not assigned a virtual IP and are not load balanced. They are simply a convenience for grouping multiple pods under one name. A DNS lookup of a headless service returns the addresses of the individual pods behind it.
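
To make the difference concrete, here is a toy model in Go. It is a sketch only and is not how Kubernetes or CoreDNS implement DNS. In it, a ClusterIP service resolves to a single virtual IP, while a headless service (declared with clusterIP: None) resolves to one address per ready pod. StatefulSet pods behind the headless service also get stable per-pod names. Several things here are assumptions for illustration: the Pod and Service types, the IPs, the cluster.local domain, and a StatefulSet whose serviceName points at this headless service.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal stand-in for a ready pod backing a Service.
type Pod struct {
	Name string
	IP   string
}

// Service is a toy model of a Kubernetes Service, not the real k8s.io/api
// type. A ClusterIP of "None" marks it as headless, as in real manifests.
type Service struct {
	Name      string
	Namespace string
	ClusterIP string
	Pods      []Pod
}

// resolve returns the addresses a DNS lookup of the Service name yields in
// this toy model: one virtual IP for a ClusterIP Service, or one address per
// ready pod for a headless Service.
func resolve(s Service) []string {
	if s.ClusterIP != "None" {
		return []string{s.ClusterIP}
	}
	ips := make([]string, 0, len(s.Pods))
	for _, p := range s.Pods {
		ips = append(ips, p.IP)
	}
	return ips
}

// podHostnames returns the stable per-pod DNS names StatefulSet pods get
// behind a headless Service, assuming the default cluster.local domain.
func podHostnames(s Service) []string {
	names := make([]string, 0, len(s.Pods))
	for _, p := range s.Pods {
		names = append(names, fmt.Sprintf("%s.%s.%s.svc.cluster.local", p.Name, s.Name, s.Namespace))
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "my-cool-database-0", IP: "10.0.0.10"},
		{Name: "my-cool-database-1", IP: "10.0.0.11"},
	}
	clusterIP := Service{Name: "database-cool-service", Namespace: "default", ClusterIP: "10.96.0.42", Pods: pods}
	headless := Service{Name: "database-cool-service", Namespace: "default", ClusterIP: "None", Pods: pods}

	fmt.Println(resolve(clusterIP))         // [10.96.0.42]
	fmt.Println(resolve(headless))          // [10.0.0.10 10.0.0.11]
	fmt.Println(podHostnames(headless)[0]) // my-cool-database-0.database-cool-service.default.svc.cluster.local
}
```

The headless lookup returns every pod individually. This is what lets a database find and address each of its peers, rather than a single load-balanced endpoint.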

Linkerd currently supports multicluster communication with headless services. This feature is also known as StatefulSet support, because most workloads that rely on a headless service are deployed as StatefulSets. That resource type provides guarantees such as stable pod names and stable network identities.

The problem is that, in many cases, it is cumbersome to pass a growing list of headless services to a StatefulSet. For example, say we have a cluster called source. In this cluster we run a database called my-cool-database, deployed as a StatefulSet, which discovers its peers through a headless service called database-cool-service. The service is passed as a CLI argument: ./my-cool-database --discover database-cool-service.

Now say we also want to discover database nodes in other clusters. After all, adding more availability zones or replicas can improve database resiliency. We link two clusters, foo and bar, export database-cool-service in each of them, and mirror it into source. Source now has database-cool-service-foo, database-cool-service-bar and our original database-cool-service, which means we have to change the way we deploy:

    kubectl edit statefulset my-cool-database
    - ./my-cool-database --discover database-cool-service
    + ./my-cool-database --discover database-cool-service,database-cool-service-foo,database-cool-service-bar
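
The way this argument grows can be sketched in Go. The sketch assumes, as in the example, that each mirrored service is named after the original service plus a -cluster suffix (for instance database-cool-service-foo). discoverArg is a hypothetical helper and is not part of Linkerd or my-cool-database.

```go
package main

import (
	"fmt"
	"strings"
)

// discoverArg builds the value passed to my-cool-database's --discover flag:
// the local headless service followed by one mirrored service per linked
// cluster, named <service>-<cluster> as in the example.
func discoverArg(base string, linked []string) string {
	names := []string{base}
	for _, c := range linked {
		names = append(names, base+"-"+c)
	}
	return strings.Join(names, ",")
}

func main() {
	// No links, then foo and bar, then baz as well: the flag changes every time.
	fmt.Println("./my-cool-database --discover " + discoverArg("database-cool-service", nil))
	fmt.Println("./my-cool-database --discover " + discoverArg("database-cool-service", []string{"foo", "bar"}))
	fmt.Println("./my-cool-database --discover " + discoverArg("database-cool-service", []string{"foo", "bar", "baz"}))
}
```

The second line printed matches the edited command above. The third line shows that linking one more cluster changes the argument yet again.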

Now we link another cluster, baz, and have to go through the whole painful process again. Each change adds overhead: it requires manual intervention, patching the StatefulSet, and restarting the discovery process in my-cool-database. Removing a linked cluster, and with it a mirrored service, requires the same steps in reverse. It is not a robust mechanism.
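
A small Go sketch of the bookkeeping this forces on us follows. Given the linked clusters before and after a change, mirrorDiff (a hypothetical helper) lists the mirrored services that appear or disappear. It again assumes the service-plus-cluster-suffix naming. With one --discover entry per service, any non-empty result means patching the StatefulSet and restarting discovery.

```go
package main

import (
	"fmt"
	"sort"
)

// mirrorDiff reports which mirrored services (<service>-<cluster>) appear or
// disappear when the set of linked clusters changes from before to after.
// With per-service --discover arguments, any non-empty result means the
// StatefulSet must be patched and the discovery process restarted.
func mirrorDiff(base string, before, after []string) (added, removed []string) {
	inBefore := make(map[string]bool, len(before))
	for _, c := range before {
		inBefore[c] = true
	}
	inAfter := make(map[string]bool, len(after))
	for _, c := range after {
		inAfter[c] = true
	}
	for c := range inAfter {
		if !inBefore[c] {
			added = append(added, base+"-"+c)
		}
	}
	for c := range inBefore {
		if !inAfter[c] {
			removed = append(removed, base+"-"+c)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// Linking baz, then unlinking bar: each step is a manual patch today.
	added, removed := mirrorDiff("database-cool-service", []string{"foo", "bar"}, []string{"foo", "bar", "baz"})
	fmt.Println("added:", added, "removed:", removed)
	added, removed = mirrorDiff("database-cool-service", []string{"foo", "bar", "baz"}, []string{"foo", "baz"})
	fmt.Println("added:", added, "removed:", removed)
}
```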

If we can instead put all linked clusters under one umbrella, we can avoid adding and removing arguments as clusters come and go. We want to prototype this and see how well it works in real scenarios. This means we need to figure out how to implement it, how to test it (manually, at the very least), and what the constraints are. Not everything will be known upfront, but we need to be clear about which resources we plan to act on and how the prototype will implement this.

We also need to be clear about the dependencies. For example, we might enable the prototype only when StatefulSet support is enabled. Or, better yet, the prototype should not run while Linkerd's existing headless service/StatefulSet support is active, since otherwise the two mechanisms could create clashing resources. Or maybe it doesn't matter, and it is better to keep two types of resources to preserve backwards compatibility. Which way do we go?
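
One way to picture the umbrella is a controller that merges the endpoints of the local headless service and of every mirrored service into a single endpoint set. my-cool-database would then only ever need one fixed --discover value. The Go sketch is a hypothetical illustration of that merge step only. The Endpoint type, umbrellaEndpoints, and the service-plus-cluster-suffix lookup are assumptions, and the code does not call any Kubernetes or Linkerd API.

```go
package main

import (
	"fmt"
	"sort"
)

// Endpoint is a hypothetical, simplified record for one ready database pod.
type Endpoint struct {
	Hostname string // pod hostname, e.g. my-cool-database-0
	IP       string
	Cluster  string // linked cluster the pod runs in; "" means local
}

// umbrellaEndpoints merges the endpoints of the local headless service and of
// every service mirrored from a linked cluster (assumed to be named
// <service>-<cluster>) into one endpoint set. services maps a service name to
// its endpoints; services that are missing simply contribute nothing.
func umbrellaEndpoints(base string, linked []string, services map[string][]Endpoint) []Endpoint {
	names := []string{base}
	for _, c := range linked {
		names = append(names, base+"-"+c)
	}
	var merged []Endpoint
	for _, n := range names {
		merged = append(merged, services[n]...)
	}
	// Sort by cluster, then hostname, so the result is deterministic.
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Cluster != merged[j].Cluster {
			return merged[i].Cluster < merged[j].Cluster
		}
		return merged[i].Hostname < merged[j].Hostname
	})
	return merged
}

func main() {
	services := map[string][]Endpoint{
		"database-cool-service":     {{Hostname: "my-cool-database-0", IP: "10.0.0.10"}},
		"database-cool-service-foo": {{Hostname: "my-cool-database-0", IP: "10.1.0.10", Cluster: "foo"}},
		"database-cool-service-bar": {{Hostname: "my-cool-database-0", IP: "10.2.0.10", Cluster: "bar"}},
	}
	// Linking or unlinking a cluster only changes this slice; the database
	// keeps discovering peers through a single umbrella name.
	for _, e := range umbrellaEndpoints("database-cool-service", []string{"foo", "bar"}, services) {
		cluster := e.Cluster
		if cluster == "" {
			cluster = "local"
		}
		fmt.Printf("%s/%s %s\n", cluster, e.Hostname, e.IP)
	}
}
```

StatefulSet pod names such as my-cool-database-0 can repeat across clusters. For that reason the sketch records each endpoint's cluster alongside its hostname. A real prototype would still need to decide how to keep per-pod names unique, which is one of the constraints to pin down.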

mateiidavid/lfx-mc.md