Skip to content

Instantly share code, notes, and snippets.

@DMRobertson
Last active August 31, 2021 12:36
Show Gist options
  • Save DMRobertson/b63d7caa1c7c1e39d88d98f27b233bc2 to your computer and use it in GitHub Desktop.
Save DMRobertson/b63d7caa1c7c1e39d88d98f27b233bc2 to your computer and use it in GitHub Desktop.

Goal

We want only public profiles to be searchable and visible in the user directory search. By public profiles ("canonical" maybe?) we mean the thing returned by GET /profile/USER_ID.

Where does the leak come from?

We're getting names into the user directory (user_directory and user_directory_search) via calls to update_profile_in_user_dir. There are five calls:

  • main/user_directory.py (regenerate the user directory)
    • _populate_user_directory_process_users. 1 call site, reads data from the profiles table. Only does stuff if user_directory_search_all_users.
    • _populate_user_directory_process_rooms. 1 call site, reads from the room_memberships table
  • handlers/user_directory.py (incrementally update the user directory)
    • handle_local_profile_change. 4 call sites, all reading from profiles
    • _handle_profile_change. 1 call site, reads from a room event. Seems to handle both local and remote users.
    • _handle_new_user. 2 call sites, one reading from a room event, andother from room_memberships.

I think the idea behind these two tables is:

  • combine local user profile info (profiles) with remote user profile info (hinted at in room_membership) into a single place. (At the cost of maintaining a second source of truth?)
  • factor out the text search stuff into a separate table, so we can limit the scope of postgres/sqlite differences

Any per-room "private names" will appear in room membership events, which end up getting stored in room_memberships. So per-room names can get into the user_directory table via the bolded entries above.

To fix the leak without changing the existing source code too much, we'd need to ensure that the bolded entries always refer to a "public profile". When the user in question is local to this homeserver, we can use query the profiles table. Otherwise... well, we have no alternative but to query the remote user's homeserver. We'll have to do that in an efficient way to stop ourselves from spamming the same profile request in short succession if a lot of rename events come through.

Profile update events will come through to lots of rooms. Wait until we've had $THRESHOLD seconds without a duplicate request before firing one off. (Or don't fire one off unless the last time we tried was $THRESHOLD seconds ago?

Proposed implementation: stop local users' per-room profiles from leaking

When local profiles change, always the update user_directory

I think the code paths would be simpler if we made the config flag user_directory_search_all_users just decide which WHERE clause to use when searching the user directory. I don't think it needs to control the data we put into the user_directory table.

As a bonus, doing this would mean that a server admin can toggle this config flag live without having to restart their server.

The cost of doing so is an extra entry in the user_directory and user_directory_search table for users on this homeserver who aren't in any rooms. I think this is negligible. The search query we use will

  • Remove the guards in DeactivateAccounHandler:activate_account, ProfileHandler:set_displayname, ProfileHandler:set_avatar_url, RegistrationHandler:register_user. This means changes to local profiles will immediately propagate to the user_directory tables.
  • Remove the guards in _populate_user_directory_createtables and _populate_user_directory_process_users so that these will always scan for local users.

TODO: want to exclude "support users" from directories at all stages. Can a user change from being support to not support? Will support users always be local to this homeserver?

Ignore room events describing local users' profile changes

The above can then be solely responsible for adding and updating local users' profiles to user_directories. So the UserDirectoryHandle can completely ignore any per-room profiles for local users.

  • Change _populate_user_directory_process_rooms to only adds user_directory entries for remote users. Do this by skipping local users in the loop over users_with_profile. Note that we need to still consider these users for the purposes of computing users_in_public_rooms and users_who_share_private_rooms.
    • TODO: why do we skip over remote users if the room is not public? (last else block)
    • Because we're only ever going to be searching users_who_share_private_rooms where the first user is local?
    • TODO: What's an appservice? Why might we not want to add its users to users_in_public_rooms and users_who_share_private_rooms?
  • Change _handle_deltas to continue if typ == EventTypes.Member, _get_key_change indicates no membership change and the user is local

Refactors

  • Find a name for _populate_user_directory_process_rooms which hints that we'll also populate the "who's in what room" tables
  • Use an enum instead of Optional[bool] for the return type of _get_key_change

Clarify the role of UserDirectoryHandler. I think it has three purposes:

  1. Deciding the result of a user search (POST /user_directory/search)
  2. Exposing hooks for other parts of the application to call when local user profiles are added, changed or removed
  3. Respond to room changes
  • publicness change
  • user added or removed
  • user changed their profile within that room

I suggest

  • Move notify_new_event down below handle_user_deactivated
  • Rename handle_user_deactivated -> handle_local_user_deactivated

TODO: stop remote users' per-room profiles from leaking

I think this means we'll need some kind of queuing mechanism to deduplicate and send out requests across the federation.

I think we should treat this as a separate piece of work because the surgey above already feels intimidating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment