(Transcript of the audio of https://youtu.be/OyuqM7RbX5E, thanks to OpenAI's Whisper turbo model and some minor tweaks and fixups)
Hi everyone, and welcome to Matrix Live Season 11 Episode 7 where you are stuck with me, Matthew, Project Lead for Matrix. I guess I'm going to be wearing both Matrix and Element CEO hats in this week's recording and I will try to identify which hat I'm wearing at any given point.
What I wanted to talk about this week is that there is clearly a lot of unhappiness being expressed out there about Matrix and Element this week. I think we've basically seen a blog post a day, sometimes twice, ending up on the front page of Hacker News or Lobste.rs or wherever, where folks are expressing quite reasonable disappointment and irritation at where Matrix and Element are at the moment. And so I basically wanted to try to respond, so was thinking of putting out a blog post to try to articulate this a bit more coherently. But I haven't had a chance to put it together, so let's risk me improvising one off the top of my head now.
I mean the root causes of the feedback that we're seeing are basically, well, manifold, but one of the main ones that keeps coming up is this feeling that most people experience Matrix via using Element, and Element currently has two clients: two apps out there. One is the classic Element mobile app which hasn't been updated for two years now; the other is Element X, which is obviously the rewrite that we've done over the last two/three years in order to have a much, much, much better platform to build on. However, the Element X apps have not got the full feature set of the Classic ones and so as several folks have pointed out, this means that you either have a choice of a stale app which has performance issues and UX issues and hasn't been updated really in a few years other than for security issues... or, an incomplete app.
Which I guess is entirely my fault as CEO of Element for green-lighting and in fact pushing for the whole rewrite in Element X! And this was not a failure mode that I anticipated, perhaps stupidly, in that the rationale here was that the improvements in usability and performance of Element X would be such that folks would be happy to forgive some of the more exotic features like Threads or Spaces being missing at first. And that's why back in September of last year, we went and made a big song and dance about launching it as a Signal or an iMessage or a WhatsApp-style replacement. I mean, none of those have spaces or threads for the messaging use case. I personally think that Element X is a really, really good app for supporting that use case.
However, in practice, we've had two problems. First of all, casual users just still end up installing the old app and don't even know that Element X exists and use that to gauge their opinion of Matrix. And then more interestingly, perhaps, existing power users are just not moving over to Element X: because it turns out people like spaces and threads way more than I personally ever realized. And I don't know; perhaps this is a weird self-selection bias where everyone assumes that people use apps similarly to how they see themselves and their colleagues and their friends and families using it. But in practice, I haven't used the classic Element app in over two years now. I've been busy dogfooding Element X and I personally have not remotely missed spaces and haven't really missed threads that much given that you do have compatibility: you can see the messages, you can respond in threads, even if they don't end up with dedicated UI like you get in the classic apps.
Either way, this is, to use the technical term, a cock-up. In that clearly there are a lot of very unhappy, angry people who have ended up feeling, basically, let down by Element due to being stuck sort of between two stools.
And it's a bit of a weird one because: this is in some ways a good problem to have.
If people were saying "Oh my God, Element is awful and there is no solution anywhere to be seen and here are all the terrible things wrong with it and they're never going to fix it", this would be like classic Mac OS back in the day, before Mac OS X, where it had been stuck with System 7 and then System 8 and 9 for years and years and years and years. And, you know, it could barely multitask. If an app crashed, it would take out the whole kernel. The performance was awful, the thing was stagnating, there weren't any new features, and speaking as a Mac user from back in the 90s when that was happening, almost everyone gave up. Everybody shifted over to Windows 95, some people started messing around with Linux, and everybody thought that Apple was toast. And there was no hope at all at that point.
Now, that's not where we are today with Matrix or Element.
In that: all of the hard stuff - the equivalent of building Mac OS X for Apple back in the day - has happened.
And it exists: it's Element X. You can even use it.
So, isn't it weird that the thing that has gone wrong, and the root of much of the negativity, is basically due to a failure of choreography and migration and positioning in terms of actually smoothly getting people over as rapidly as possible from the old to the new apps... as opposed to the fact that the new apps don't exist or that solutions don't exist to the problems which they're seeing.
So, I'm not saying that it's not a bad situation.
It's been made very abundantly clear to us that folks are unhappy with how it's played out, and I can only apologise.
The only good news I can give, though, is that the building blocks are sitting on the table to get out of this mess, even if they are not fully assembled yet.
So, I think one of the other things which has come up a lot is the perception that we're arrogant and defensive and don't listen. Hopefully, the last five minutes of monologue contradicts that in showing that we are listening!
We are trying to understand, we would be first to agree that we've screwed up here and are trying to fix it.
I do wonder sometimes whether the complaints people have about Element being arrogant or the Matrix project being arrogant are based on tribalism, where if they turn up in a chat room and say "Matrix is crap because X, Y, Z", and then some complete random, well-meaning random in the Matrix community responds: "No, you are wrong because A, B, C." - and then that person says, ah, the Matrix people are arrogant and don't listen, they then tar the whole project with that. That might be what's happening.
Or perhaps it's just because I spend too much time on Hacker News trying to justify why we've ended up where we are and trying to explain how we're trying to fix it, and that gets seen as defensive. Perhaps like this sort of Matrix Live! I don't know.
But the point I'm trying to make is that we do listen to the feedback here and try to act on it and are not sitting here saying "There is nothing wrong with Matrix, there is nothing wrong with Element, our thing is perfect". It is the opposite.
Of all the people, I particularly feel frustrated that we've ended up in a kind of worst possible case where we've done the work and we've put the effort in, and yet people are still unhappy because of failure to execute on the transition on our side. And also just on the personal level, because the idea of pushing Element X out as soon as possible (as early as possible, so that people could use it and capture value in it, and from all the work that we've done in that) may have backfired in this way.
Another thing that comes up a lot is performance.
Again, I would be the first to agree that Synapse is not the fastest chat server ever. In fact, it's almost certainly the slowest. And what's interesting are the reasons for this. Everyone has their own pet theory as to why Synapse is perceptually slow, and they're almost always wrong. It's always things like, "it's written in Python, and Python is intrinsically slow" which is just not true. I mean, sure, Python is probably 10 times slower than Rust for a typical operation, but that should not be noticeable for the sort of operations that a chat server does.
Instead, it's fairly subtle things, like there is a bug, I forget the number (element-hq/synapse#17722 and element-hq/synapse#13356), out there, which we've yet to track down somewhere in the depths of Twisted or our HTTP stack, which means that sometimes requests pause for a second or two before responding. Now, that's not great. If you're scrolling back and you hit the /messages API, you clearly expect an instant retrieval of a bunch of messages. Sometimes I think the record we've seen is about a 25-second delay. So, okay, I guess you might try to blame Python or Twisted or something for that. But in practice, I think you should probably just blame us for not having dug into that issue yet.
And, I don't know, there are other things like, again, on the /messages endpoint: do you go and try to pull in history from dead servers or slow servers when you call that endpoint? And today, you do. And that's avoidable. You know, if you have a bunch of events that you haven't been able to retrieve from other servers, you shouldn't keep trying to pull them in whenever you backpaginate. Instead, there is a new MSC (MSC4282), I think, proposed by Benjamin Bouvier as part of the threads work on Element X, that says: "look, have an interactive mode on /messages that says give me the messages you already have rapidly and worry about the others in the background."
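(To make the "interactive mode" idea concrete, here is a minimal sketch in Python. It is not taken from MSC4282 or from Synapse: `HypotheticalRoomStore`, `get_messages` and the `interactive` flag are made-up names, and the remote fetch is simulated with a short sleep. It only illustrates the difference between waiting on remote servers before answering and answering from local data while backfill runs in the background.)

```python
import asyncio
from dataclasses import dataclass


@dataclass
class HypotheticalRoomStore:
    """Toy stand-in for a homeserver's local event store (not a real Synapse API)."""

    local_events: list[str]
    missing_event_ids: list[str]

    async def backfill_from_remote(self, event_id: str) -> None:
        # Simulate asking a remote (possibly slow) server for one event.
        await asyncio.sleep(0.01)
        self.local_events.append(event_id)


async def get_messages(
    store: HypotheticalRoomStore,
    limit: int,
    interactive: bool,
    background_tasks: set,
) -> list[str]:
    if interactive:
        # Fast path: schedule the backfill, but do not wait for it.
        for event_id in store.missing_event_ids:
            task = asyncio.create_task(store.backfill_from_remote(event_id))
            # Keep a reference so the task is not garbage-collected early.
            background_tasks.add(task)
            task.add_done_callback(background_tasks.discard)
    else:
        # Slow path: the response is held up until every remote fetch returns.
        for event_id in store.missing_event_ids:
            await store.backfill_from_remote(event_id)
    # Return a snapshot of whatever is available locally right now.
    return store.local_events[-limit:]
```

(In the interactive case the caller gets the locally known events straight away, and the missing ones show up on a later request once the background tasks have finished.)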
What other performance stuff is there? There is the fact that we fixed the sync performance in Sliding Sync, but many, many people just don't experience that because they're not using Element X. And the fact that Element Web still doesn't have it, again, makes people feel that things are slow. I saw somebody complain that Fractal takes minutes to sync for them when they turn it on at the moment, earlier today. And, yeah, I mean, Fractal doesn't use Sliding Sync, yeah, despite being based on Rust SDK, which does provide Sliding Sync as, you know, out of the box if one implements against the API.
There are many reasons there.
And the question is, of course, it's been 11 years of Synapse. In fact, 12 years of Synapse, maybe? What year is it? No, it's been 11 years. And we still haven't fixed some of these performance bogeys.
And it really boils down to prioritization within Element, which obviously always implemented Synapse and is now officially the maintainer, rather than the kind of shadow maintainer as we were when we were donating the work on it to the Foundation. And as the maintainer, we have a choice between doing things which are going to get funding in to allow us to go back to investing properly in Matrix, versus optimizing performance. When in practice, typical government deployments of Matrix don't hit up against those sort of performance issues. Perhaps the accounts are too small. Perhaps they don't federate as much. Perhaps the federation is more reliable. But in practice, it's not something that comes up very much.
So it means that other features like, I don't know, compliance, or Kubernetes distributions, or horizontal scalability or stuff like that ends up taking priority over relatively basic performance work.
So what I would say is, on Synapse particularly, there are some relatively low-hanging things where people could contribute to help on things like helping track down the nightmare HTTP pipeline stalling bug that I mentioned. Otherwise, we will get to it eventually.
But unfortunately, Element is still in the mode of trying to put its own oxygen mask on before helping the community put on theirs, as it were.
Although we're getting closer to that: in the last couple of weeks, we've seen a bunch more government deployments starting to talk about scaling up. But we still suffer a lot from some governments failing to route any funding at all upstream, either to the Foundation or to Element. It won't be helpful to name names, but some of the biggest and most high-profile celebrated government Matrix deployments still don't fund us at all.
Also, the ones who do just move slowly due to being governments and being subject to their yearly budgeting cycles, and having to do big European tenders to justify when they spend large amounts of money and all this sort of thing.
Anyway, I don't want this to be a victimisation sob story. One of the other things that people complain about is us complaining too much about being victims! But more just trying to explain how we've ended up in this bonkers situation of having to prioritise the things that can be sold to governments in order to make money over really basic quality of life things, like how rapidly can you create a room, say, or invite somebody or send a message.
We will get there, though.
What other things have people been complaining about?
I think there is an elephant in the room that Element X Android is suffering a set of problems that Element X iOS doesn't have. And a lot of the folks at Element use iOS, like myself and Amandine and Manu (the engineering manager for Element X). And as a result, some of the places where we're having problems like Android push notification reliability haven't had the visibility that they should have.
In fact, it's only in the last couple of weeks I realised just how bad some people's experiences of push on Android are.
Which is weird because some people are massive Android users: Patrick Alberts, the Chief Product Officer at Element, is a big Android fan. And he says his push is absolutely fine, despite being in tons and tons and tons of rooms with huge amounts of push. In fact, he complains about getting too many notifications as opposed to ever losing any.
So there is something clearly screwy with the kind of background energy quotas that you get, and whether you have gone and explicitly had to configure the app on Android to have maximum quota and don't worry about draining the battery, etc. And I think that varies on device. It varies on operating system specifics. And it really shouldn't be a problem in the first place because, hey, iOS also has very strict quotas on what you can do in a push notification. And we've already fixed that. And iOS push is working absolutely rock solidly, thankfully, at last.
So long story short, there are some things like that going wrong (and also the Send Queue, which had managed to not get turned on on Android, which was just a feature flag mess), such that the folks who judge Element X on Android are getting a worse experience than on iOS and therefore wondering what all the fuss is about.
We're very aware of this.
We are trying to figure out how to shift the balance there so that the Android folks get more support in order to dig into this.
And again, it's all open source, so if you're listening to this and you are a 'leet Android developer and you want to help us figure out how to reduce the footprint of the push notification background task on Android, please turn up in the dev room and offer to help.
Then you've got "Element X is stagnating" as a failure mode. So we pushed a lot and shipped Element X in September of last year, knowingly missing out on Threads, Spaces and stable VoIP. Now, since then, we still don't have Threads, Spaces and stable VoIP. There has been a lot of work in the background on this, designing it, debugging the VoIP stability stuff that needs to be solved.
But a lot of it is architectural: on VoIP, it means moving from state events to to-device messages for end-to-end encryption so that it scales better and so that it's more reliable and doesn't fall foul of state replication quirks. On threads, it means going and trying to get the semantics better than on the previous apps. I think in retrospect, this was an error: we should just be shipping parity with the previous apps... and then improving how you subscribe to threads, or how you could have an activity center where you can look at what threads are globally happening and all that sort of thing. And we are changing now in order to try to ship threads as rapidly as possible by just focusing on parity with what we had.
Spaces, again: we have excellent designs which are quite comprehensive, and we're changing tack to really try to ship the minimal viable thing and then iterate on it rather than take longer and get a really solid thing.
And a lot of this is frankly a reaction to times gone by when we moved very fast and cut too many corners and everybody complained about how crap Element's UX was... whereas here, the team is trying to oversteer and make sure that this time nobody can complain because it will be perfect.
As I said, we're trying to find the happy path between two extremes.
Meanwhile, there has just been a lot of under-the-hood work.
Like on the Rust SDK team, there has been a huge amount of work on the event cache, which is needed for threads. It's needed for swiping between media, having proper media indexes, it's needed for offline support, it's needed for accurate room previews and all sorts of things. And that project ended up taking a lot of time to get right, and it landed and it's turned on.
But from an end-user perspective, I guess it's performance polish, effectively, and relatively minor features like being able to swipe between images in the gallery, rather than a big thing that was stopping people from using the app like threads and spaces.
And I go back to my point at the beginning that I'd totally mis-estimated how important Threads and Spaces are for power users and just average community users apparently.
What else?
People complaining about Matrix being bloated because it syncs DAGs. Honestly, I don't buy this one. Lots of folks saying that because Matrix replicates conversation history it's always going to be a big slow thing. This is just untrue. Look at Git. That syncs large DAGs, much larger than Matrix, and it does it very, very fast. It's a totally different set of trade-offs to XMPP.
And, you know, if you want a message-passing system, please use XMPP. And if you want a conversation-syncing system please use Matrix.
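(A rough illustration of why syncing a DAG need not be slow, sketched in Python. This is a simplification and not how Git or Matrix federation actually negotiate: `missing_events` is a made-up helper, and it assumes that if you already hold an event you also hold all of its ancestors. Starting from the remote side's newest events, you walk backwards and stop as soon as you hit something you already know, so the work is proportional to what you are missing rather than to the size of the whole history.)

```python
from collections import deque


def missing_events(
    remote_parents: dict[str, list[str]],
    remote_heads: list[str],
    known: set[str],
) -> list[str]:
    """Return the event IDs reachable from remote_heads that are not in known.

    remote_parents maps each event ID to the IDs of its parent events.
    The walk never goes past an event we already have (assumption: having
    an event implies having its ancestors).
    """
    missing: list[str] = []
    seen: set[str] = set()
    queue = deque(remote_heads)
    while queue:
        event_id = queue.popleft()
        if event_id in known or event_id in seen:
            continue  # already held locally, or already visited in this walk
        seen.add(event_id)
        missing.append(event_id)
        queue.extend(remote_parents.get(event_id, []))
    return missing
```

(For a room with a long history where you are only a handful of events behind, the walk touches only that handful.)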
Then we've got the problem that Element Web is stuck in many places. It's not yet moved over to Element X. We're worried about doing an Element X style rewrite having seen how things are going now on the mobile apps! Instead, what we're doing there is refactoring the UI so that it can run as MVVM components in isolation, which both gives us nice modular components but also means we can switch the SDK out from under it.
Now, what combination that ends up being, in terms of replacing what we have on the current app or moving the components over to Aurora, which is the codename of the Rust-based Element X Web idea, really still remains to be seen.
We're in the process of rewriting the timeline right now. And one of the things that again has come up a lot is scroll jumps. It drives me nuts that I cannot scroll up through chat rooms in Element Web or click on permalinks and then scroll around and not get caught in weird loops or OOMs, etc. All I can say is that, better late than never, we are rewriting the timeline component to address that, whilst also modularizing it as an MVVM module.
Meanwhile, we've got the new Left Panel, which has already been rewritten as MVVM and is a huge improvement. It's stuck in Labs but if you're not using it I very strongly recommend going to Labs on Element Web Nightly or Element Desktop Nightly and enabling it, because it will speed your app up, and it feels a hell of a lot better.
And the membership list is also already MVVM'd. So we've basically got the left bit done, and we've got the right hand bit done and we're now doing the middle bit - at which point the UI will have been completely reworked and set up also in future to support use with Rust SDK.
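(For anyone unfamiliar with MVVM, here is a tiny sketch in Python of the idea described above. It is not Element Web code, which is not written in Python: every class and function name is hypothetical. The point is only that the view model talks to an interface rather than to a specific SDK, so the backend can be swapped without touching the view model or the view.)

```python
from typing import Protocol


class RoomBackend(Protocol):
    """The only thing the view model needs from an SDK."""

    def joined_room_names(self) -> list[str]: ...


class HypotheticalLegacyBackend:
    """Stand-in for the existing SDK (illustrative only)."""

    def __init__(self, rooms: list[str]) -> None:
        self._rooms = list(rooms)

    def joined_room_names(self) -> list[str]:
        return list(self._rooms)


class HypotheticalRustBackend:
    """Stand-in for a Rust-SDK-backed implementation (illustrative only)."""

    def __init__(self, rooms: list[str]) -> None:
        self._rooms = tuple(rooms)

    def joined_room_names(self) -> list[str]:
        return list(self._rooms)


class RoomListViewModel:
    """Holds presentation state and logic; knows nothing about rendering."""

    def __init__(self, backend: RoomBackend) -> None:
        self._backend = backend
        self.filter_text = ""

    def visible_rooms(self) -> list[str]:
        needle = self.filter_text.lower()
        return sorted(
            name
            for name in self._backend.joined_room_names()
            if needle in name.lower()
        )


def render_room_list(view_model: RoomListViewModel) -> str:
    """The 'view': turns view-model state into output, with no SDK calls."""
    return "\n".join(f"# {name}" for name in view_model.visible_rooms())
```

(Because `render_room_list` and `RoomListViewModel` only ever see `RoomBackend`, the same component can be tested in isolation and pointed at either backend.)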
Some people complain about encryption problems.
Encryption problems should be solved as of last September. The only scenarios I know where you will still get UTDs today are:
- if your server is buggy
- or if you're talking with somebody on a client that is buggy (potentially including the old Classic Element apps, although in theory they should be alright).
- or if you have logged out of all of your devices and so you don't have any keys any more, which we are addressing by dehydrated devices. This is implemented in Element Web, but needs to be again brought out from Labs. It should work, however. And it needs to be implemented in Element X, and I think we have customers on the Element side willing to fund that now.
- or if you have a network outage such that the servers can't send keys directly to one another (so that's a bit of an edge case).
Personally, I just don't get them anymore and I am a big power user. I'm in many many rooms, talking to many people in different environments. So occasionally, when I get people turning up saying "I cannot believe that I still get unable to decrypt errors everywhere!": simply, that should not be happening. If you submit a bug report I will personally go and champion it all the way through the encryption team here who continue to put unable to decrypt reports as 'P0, Drop Everything', and figure out what's going on.
Honestly I think a lot of the folks hitting these might be talking about experiences from over a year ago before we fixed things, but if you are seeing them today and it's not because you logged out of all of your devices, seriously please let us know. Let me know! matthew at matrix.org, or @matthew:matrix.org. DM me, tell me about them.
And then, finally: spam.
Obviously the spam problems this year have been awful. I thought we'd said quite a lot about the work going into addressing this - but there was one issue on the element-meta repo on GitHub that hadn't been updated. Yesterday, Travis was good enough to give a really long update on the anti-abuse work we've been doing, which in fact I'll TWIM so that people can find the link.
It's kind of useful to see where folks are at but also a lot of feedback could be seen as: "Clearly Matrix doesn't care about spam" or "Element doesn't care about spam". We really do, and obviously we haven't done as good a job at keeping it under control as we should have. I wish I could get in a time machine and go back 11 years and invest way more into antispam right from day one. We've been catching up as fast as we can whilst also operating on a shoestring, whilst also just not having budget to work on it properly on the Foundation side at all.
So hopefully we've seen some improvements; we're continuing to work on it: you can see the details in Travis's post, but I am in no way trying to dismiss the horrible experience that so many people have had, and I can only apologise for it.
Then (finally) other things people complain about:
Device verification problems: we are continuing to iterate on crypto UX; it's not finished yet; it is certainly better on Element X and Element Web than it used to be, but there are still scenarios where verification confuses people. In an ideal world we would be using PRFs to store recovery keys, storing them on things like Yubikeys or passkeys, optionally. I did a big deep dive into this a few months ago and frankly the browser and OS support was still not there yet. As soon as it is we will be first in line to use it as a way to hopefully make the recovery dance and the self-verification dance disappear for people who are willing to trust passkeys or Yubikeys (or should I say FIDO2 keys).
Then other problems: just people having a bad time on badly administered servers, or servers where Synapse is disgracing itself somehow. See all of the performance stuff about Synapse earlier.
And then really finally, I think a large part of it is just disappointment where people have been fans, expecting and hoping for Matrix to move faster. We have basically over-promised and then failed to deliver in a fast enough manner. They've been waiting for two years for Element X to become the default, and it isn't, so they haven't started using it themselves, and so it might as well be that the Element X stuff hadn't even happened in the first place.
I get it. I'm sorry. I mean the only good thing I can say is that the criticism is being listened to, and it frankly makes me feel more determined than ever to succeed in this - and I think that's the vibe I'm seeing from the whole team here, in terms of wanting to prove the doubters wrong and get out the other side of this. And also to never ever do a rewrite where you end up stuck between the old app and the new app.
It's a bit like the Osborne Effect, if people know their 1980s microcomputers, where Osborne Computer Company announced the Osborne 2, failed to ship it, and everybody gave up and moved over to Commodores and TRS-80s and the like.
What else? A couple of other updates:
Premium accounts on matrix.org - what we call Operation Golden Eagle, although I'm not sure whether that codename is public or not. It's almost done! We got a lot of feedback that a 1MB limit for free accounts on matrix.org for file transfer was way too little, and ("amazingly!") we listened, and we fixed it, and we're going to launch with a 10MB file limit size instead, which hopefully will be a lot more palatable. And that will be going live next week, so lots of excitement on the horizon to hopefully encourage folks to pay for premium accounts so that we can help fund the running of matrix.org and the Foundation as a whole; particularly also for trust and safety work.
Then other big news: we pushed out the Terms of Use update for matrix.org this week (via the system alert system, which confused everybody as always, because there's no way to tell that it's an official alert other than the fact that it claims to be an official alert). It is official! And it caused further concerns by talking about the UK Online Safety Act compliance.
Now, the Online Safety Act is a very sore point. The fact is that Matrix.org or Element is based in the UK, but even if we weren't we would still be required to comply with the Online Safety Act for users who are in the UK, because it is British law.
We've tried to fight it extensively: I spent a huge amount of time last year going around trying to lobby and educate, and explain why turning the UK's internet system into a censorship regime like Russia's or China's might not be the wisest thing in the world. But I failed, and now it's law, and therefore we have no choice but to comply with it.
Now how we comply with it is still up in the air. Options include banning users under 18 years old who are in the UK on matrix.org, or banning everybody under 18, or banning NSFW content in general so we don't have to be subject to the age verification stuff. We're still figuring it out. Please understand this obviously only applies to the servers which we happen to run; anybody running their own server can make whatever choices they like themselves. None of this applies to Matrix as a network as a whole: if you don't have users in the UK or you don't care about adhering to UK law or you don't think the UK authorities have jurisdiction over wherever you're operating, do whatever you want - but we have to stick to the law as it is written. We'll give more info as it comes in, but trust me: we will do everything we can to avoid nightmare age verification stuff if we possibly can.
An interesting question would be if it's better to try to just block NSFW on the matrix.org instance in general, and basically kowtow to censorship, versus kowtow to the idea of age verification by some mechanism. Anyway, we want to avoid having a disaster there, particularly given the negative sentiment towards us at the best of times at the moment! So watch the space to see how that pans out. The reason we haven't announced specifics is that we're still working on trying to avoid worst case scenarios.
So there you go. Sorry for a very long and slightly drained monologue; I hope it was useful to hear where our thoughts are at this point.
Thank you to folks who have given constructive feedback, and huge thanks to people who stuck up for us and said "well you know there may be some good things about Matrix", over the last week. It's very much appreciated. If anybody wants to write a blog post that says "you know, perhaps Matrix isn't quite as bad as some people claim" that would be great too! But otherwise all I can say is that we will fix the things which people are complaining about, and come back more powerful than you can possibly imagine.
Right! Have fun everybody; have a great weekend; speak to you soon. Bye!