We want only public profiles to be searchable and visible in the user directory search. By public profiles ("canonical" maybe?) we mean the thing returned by GET /profile/USER_ID
.
We're getting names into the user directory (user_directory
and user_directory_search
) via calls to update_profile_in_user_dir
.
There are five calls:
main/user_directory.py
(regenerate the user directory)_populate_user_directory_process_users
. 1 call site, reads data from theprofiles
table. Only does stuff ifuser_directory_search_all_users
._populate_user_directory_process_rooms
. 1 call site, reads from theroom_memberships
table
handlers/user_directory.py
(incrementally update the user directory)handle_local_profile_change
. 4 call sites, all reading fromprofiles
_handle_profile_change
. 1 call site, reads from a room event. Seems to handle both local and remote users._handle_new_user
. 2 call sites, one reading from a room event, andother fromroom_memberships
.
I think the idea behind these two tables is:
- combine local user profile info (
profiles
) with remote user profile info (hinted at inroom_membership
) into a single place. (At the cost of maintaining a second source of truth?) - factor out the text search stuff into a separate table, so we can limit the scope of postgres/sqlite differences
Any per-room "private names" will appear in room membership events, which end up getting stored in room_memberships
. So per-room names can get into the user_directory table via the bolded entries above.
To fix the leak without changing the existing source code too much, we'd need to ensure that the bolded entries always refer to a "public profile". When the user in question is local to this homeserver, we can use query the profiles
table. Otherwise... well, we have no alternative but to query the remote user's homeserver. We'll have to do that in an efficient way to stop ourselves from spamming the same profile request in short succession if a lot of rename events come through.
Profile update events will come through to lots of rooms. Wait until we've had $THRESHOLD seconds without a duplicate request before firing one off. (Or don't fire one off unless the last time we tried was $THRESHOLD seconds ago?
I think the code paths would be simpler if we made the config flag user_directory_search_all_users
just decide which WHERE
clause to use when searching the user directory. I don't think it needs to control the data we put into the user_directory
table.
As a bonus, doing this would mean that a server admin can toggle this config flag live without having to restart their server.
The cost of doing so is an extra entry in the user_directory
and user_directory_search
table for users on this homeserver who aren't in any rooms. I think this is negligible. The search query we use will
- Remove the guards in
DeactivateAccounHandler:activate_account
,ProfileHandler:set_displayname
,ProfileHandler:set_avatar_url
,RegistrationHandler:register_user
. This means changes to local profiles will immediately propagate to theuser_directory
tables. - Remove the guards in
_populate_user_directory_createtables
and_populate_user_directory_process_users
so that these will always scan for local users.
TODO: want to exclude "support users" from directories at all stages. Can a user change from being support to not support? Will support users always be local to this homeserver?
The above can then be solely responsible for adding and updating local users' profiles to user_directories. So the UserDirectoryHandle can completely ignore any per-room profiles for local users.
- Change
_populate_user_directory_process_rooms
to only adds user_directory entries for remote users. Do this by skipping local users in the loop overusers_with_profile
. Note that we need to still consider these users for the purposes of computingusers_in_public_rooms
andusers_who_share_private_rooms
.- TODO: why do we skip over remote users if the room is not public? (last
else
block)
- Because we're only ever going to be searching
users_who_share_private_rooms
where the first user is local?
- TODO: What's an appservice? Why might we not want to add its users to
users_in_public_rooms
andusers_who_share_private_rooms
?
- TODO: why do we skip over remote users if the room is not public? (last
- Change
_handle_deltas
to continue iftyp == EventTypes.Member
,_get_key_change
indicates no membership change and the user is local
- Find a name for
_populate_user_directory_process_rooms
which hints that we'll also populate the "who's in what room" tables - Use an enum instead of
Optional[bool]
for the return type of_get_key_change
Clarify the role of UserDirectoryHandler
. I think it has three purposes:
- Deciding the result of a user search (
POST /user_directory/search
) - Exposing hooks for other parts of the application to call when local user profiles are added, changed or removed
- Respond to room changes
- publicness change
- user added or removed
- user changed their profile within that room
I suggest
- Move
notify_new_event
down belowhandle_user_deactivated
- Rename
handle_user_deactivated -> handle_local_user_deactivated
I think this means we'll need some kind of queuing mechanism to deduplicate and send out requests across the federation.
I think we should treat this as a separate piece of work because the surgey above already feels intimidating.